While most Amazon Alexa certified products are hardware designs, Amazon's website also includes a Software Audio Front End (AFE) Dev Kits section listing software algorithms that optimize audio detection in noisy environments. The latest addition is Espressif's Audio Front-End algorithms, or ESP AFE for short, which have recently been qualified for Amazon Alexa devices.
It's not the first Alexa-certified solution from Espressif Systems, as both companies have worked together in the past on audio products such as the ESP32-PICO-V3-ZERO Alexa Connect Kit module and the ESP32-Vaquita-DSPG board.
The algorithms were created by Espressif's AI Lab team, who leveraged the AI and DSP instructions of the ESP32-S3 processor to optimize the code. The algorithms use only 12 to 20% of the CPU, as well as 220 KB of internal and 240 KB of external memory, leaving resources for other applications running on the wireless SoC. ESP AFE is said to achieve a wake-up rate of 100% (should that be 99.99%?), and a speech recognition rate of over 90% in low-SNR scenarios.
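To put the memory figures in perspective, here is a quick back-of-the-envelope calculation. Only the 220 KB and 240 KB numbers come from Espressif; the 512 KB of internal SRAM is the ESP32-S3 spec, and the 8 MB of PSRAM is an assumption based on a common module configuration. In practice ESP-IDF, Wi-Fi, and Bluetooth also claim part of the internal SRAM, so the real headroom is smaller.

```python
# Memory figures quoted for ESP AFE
AFE_INTERNAL_KB = 220
AFE_EXTERNAL_KB = 240

# ESP32-S3 internal SRAM size, plus an ASSUMED 8 MB PSRAM module
S3_INTERNAL_SRAM_KB = 512
PSRAM_KB = 8 * 1024


def remaining(total_kb, used_kb):
    """Return (KB left over, percentage of the total used by the AFE)."""
    return total_kb - used_kb, 100.0 * used_kb / total_kb


int_left, int_pct = remaining(S3_INTERNAL_SRAM_KB, AFE_INTERNAL_KB)
ext_left, ext_pct = remaining(PSRAM_KB, AFE_EXTERNAL_KB)

print(f"internal SRAM: {int_left} KB left ({int_pct:.1f}% used by the AFE)")
print(f"external PSRAM: {ext_left} KB left ({ext_pct:.1f}% used by the AFE)")
```

The AFE takes a significant share of internal SRAM (about 43%) but only a few percent of the assumed PSRAM.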
The algorithms include multi-channel acoustic echo cancellation (AEC), blind source separation (BSS), voice activity detection (VAD), and noise reduction. These help enhance voice user interfaces (VUI), especially in noisy environments, for example when a device plays music while listening for voice commands. Espressif also explains that ESP AFE can help design smaller devices, with the two microphones as little as 2 cm apart.
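Espressif's AEC implementation is distributed as a closed binary, so its internals are unknown. Still, the basic idea behind acoustic echo cancellation can be sketched with a generic single-channel NLMS (normalized least mean squares) adaptive filter. The filter learns the echo path from the speaker signal ("far end") to the microphone and subtracts its estimate. The echo path `h`, the tap count, and the step size below are hypothetical values chosen for illustration, not ESP AFE parameters.

```python
import math
import random


def nlms_echo_cancel(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Generic NLMS echo canceller: returns e[n] = mic[n] - w^T x[n]."""
    w = [0.0] * taps  # adaptive estimate of the echo path
    x = [0.0] * taps  # most recent far-end samples, x[0] is the newest
    out = []
    for n in range(len(mic)):
        x = [far_end[n]] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))  # estimated echo
        e = mic[n] - y                             # residual after cancellation
        norm = eps + sum(xi * xi for xi in x)      # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out


def power(sig):
    return sum(s * s for s in sig) / len(sig)


random.seed(0)
N = 4000
far = [random.gauss(0.0, 1.0) for _ in range(N)]  # music played by the speaker
h = [0.6, 0.0, -0.3, 0.1]                         # HYPOTHETICAL room echo path
mic = [sum(h[k] * far[n - k] for k in range(len(h)) if n - k >= 0)
       for n in range(N)]                          # echo only, no near-end talker

err = nlms_echo_cancel(far, mic)
before, after = power(mic[-1000:]), power(err[-1000:])
print(f"echo power reduced by {10 * math.log10(before / (after + 1e-20)):.1f} dB")
```

A real device also has near-end speech, noise, and non-linear speaker distortion, which is why production AEC adds double-talk detection and residual echo suppression on top of a linear filter like this one.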
You'll find a longer overview on Espressif's website, and more technical details on GitHub. The Amazon page says ESP AFE works with a 2-microphone array and costs $19.99. That may be the price of an ESP32-S3-DevKitC-1 development board coupled with a 2-mic array, or of a separate, yet-unannounced ESP32-S3 board with two microphones.
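Why is a 2 cm microphone spacing notable? The largest possible time difference of arrival between two microphones is the spacing divided by the speed of sound. The sketch below assumes 343 m/s (air at about 20 °C) and a 16 kHz sample rate, a typical rate for speech processing rather than a figure from the article.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate, air at ~20 °C
SAMPLE_RATE_HZ = 16000      # ASSUMED speech sample rate


def max_tdoa(spacing_m, sample_rate_hz=SAMPLE_RATE_HZ):
    """Maximum inter-microphone delay, in seconds and in samples."""
    delay_s = spacing_m / SPEED_OF_SOUND_M_S
    return delay_s, delay_s * sample_rate_hz


for spacing in (0.02, 0.05, 0.10):
    delay_s, delay_samples = max_tdoa(spacing)
    print(f"{spacing * 100:.0f} cm: {delay_s * 1e6:.1f} us = {delay_samples:.2f} samples")

delay_2cm_s, delay_2cm_samples = max_tdoa(0.02)
```

At 2 cm the maximum delay is under one sample (about 58 µs), so the two channels are very similar. Separating sources then depends on fine sub-sample phase differences, which makes small arrays harder to exploit than widely spaced ones.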
[Update: John Lee posted a photo of an internal ESP32-S3 board with a 2-mic array on Twitter.]
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.
Just added a photo of an ESP32-S3 audio board with a 2-mic array used internally by Espressif.
BSS is not beamforming but an alternative to it, and sometimes both are used, with a beamformer feeding one channel of the BSS input for optimal effect. "The algorithms only utilized 12 to 20% of the CPU, as well as 220 KB of internal and 240 KB of external memory": great info, though I just wish we had open source rather than blobs, as the tail of the AEC could likely be increased, and maybe others could enhance it with non-linear latency adjustment where, depending on PSRAM, we could assign more memory. Same with WakeNet, as I'm pretty sure the closed-source blob even…