Espressif ESP-SR enables on-device speech recognition framework on ESP32-S3 and ESP32 WiSoCs

Espressif ESP-SR is a framework enabling on-device speech recognition on ESP32 and ESP32-S3 wireless microcontrollers, with the latter recommended due to its vector extensions for AI acceleration and larger, high-speed octal SPI PSRAM.

The ESP-SR framework was first released on December 17, 2021 with version 1.0, and the v1.20 update followed in March of this year, but I only found out about the ESP-SR offline speech recognition solution through a tweet by John Lee showing an ESP-SR demo video by @ThatProject.

I was initially confused since ESP32 boards have supported speech recognition for years using the ESP-ADF framework. But the key difference is that the latter relies on online voice assistants such as Baidu DuerOS, Amazon Alexa, and Google Assistant, while the relatively new ESP-SR runs speech recognition locally, directly on the ESP32 CPU, so you don’t even need a network connection for this to work. We’ve written about various offline voice recognition modules in the last few years, but I didn’t know this was already implemented on the ESP32 chips.

The GitHub repository for ESP-SR lists four main components:

  • Audio Front-end (AFE)
  • WakeNet Wake Word Engine
  • MultiNet Speech Command Word Recognition
  • Speech Synthesis (only supports the Chinese language at this time)

If some of the components above ring a bell, that’s because they are existing solutions: we covered the ESP-AFE algorithms when they became Alexa-certified, while WakeNet and MultiNet are part of the ESP-SKAINET assistant introduced in 2019. What appears to be new are the test apps for speech recognition and text-to-speech conversion that were committed just 3 to 5 days ago.

Speech recognition workflow

So it looks like ESP-SR simply combines all those different projects as components to ease integration into customers’ projects. You’ll find documentation on the Espressif website, and the company recommends the ESP32-S3-Korvo-1 or ESP32-S3-Korvo-2 development boards to get started, although I’d assume it should also work on other ESP32-S3 smart audio devkits with microphones such as the ESP32-S3-BOX.
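
To picture how those components chain together, here is a minimal Python sketch of the general flow: a front-end cleans up each frame, a wake word stage waits for the trigger phrase, and a command stage then listens for a known phrase until it gives up. This is not the ESP-SR API. SpeechPipeline, front_end, process, the "hi esp" trigger, the command table, and the timeout value are all hypothetical names and values chosen for illustration, and plain text strings stand in for audio frames.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class State(Enum):
    WAITING_FOR_WAKE_WORD = auto()
    LISTENING_FOR_COMMAND = auto()


@dataclass
class SpeechPipeline:
    """Toy front-end -> wake word -> command pipeline (hypothetical, not ESP-SR)."""

    wake_word: str                 # trigger phrase (illustrative value)
    commands: Dict[str, int]       # known phrase -> command id
    timeout_frames: int = 3        # assumed: frames to wait for a command
    state: State = State.WAITING_FOR_WAKE_WORD
    _idle: int = 0

    def front_end(self, frame: str) -> str:
        # Stand-in for an audio front-end. A real one would clean up
        # microphone audio; here we only normalise the text.
        return frame.strip().lower()

    def process(self, frame: str) -> Optional[str]:
        clean = self.front_end(frame)

        if self.state is State.WAITING_FOR_WAKE_WORD:
            # Wake word stage: ignore everything except the trigger phrase.
            if clean == self.wake_word:
                self.state = State.LISTENING_FOR_COMMAND
                self._idle = 0
                return "wake"
            return None

        # Command stage: match against the fixed command table.
        if clean in self.commands:
            self.state = State.WAITING_FOR_WAKE_WORD
            return f"command:{self.commands[clean]}"

        # No match: count idle frames and fall back to waiting on timeout.
        self._idle += 1
        if self._idle >= self.timeout_frames:
            self.state = State.WAITING_FOR_WAKE_WORD
            return "timeout"
        return None
```

Feeding the frames "noise", "Hi ESP", and "turn on the light" into a pipeline built with that wake word and command would produce no event, then a wake event, then the command id. The point of the two-stage design is that the command stage only runs after the trigger phrase has been heard.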


3 Replies to “Espressif ESP-SR enables on-device speech recognition framework on ESP32-S3 and ESP32 WiSoCs”

      1. For some reason Willow has chosen to use Whisper for ASR, which, if you use the large model, needs some oomph for off-device ASR.
        ESP-SR does offer on-device ASR through MultiNet, but it’s really pushing past the device’s capability, hence why it broadcasts via a KW trigger to a central ASR.

        WakeNet, the KW part, also isn’t the best, and the claim that Willow is competitive is extremely optimistic, but the ADF does give a 2/3-mic BSS (blind source separation) algorithm that can really help with noise and far field.

        Still, it uses the $50 ESP32-S3-BOX dev kit, which is loaded with bloat as a technology demonstrator, and depending on your opinion, open source in this arena can be considered extremely optimistic or snake oil with its comparisons.

        What WakeNet does prove is that the ESP32-S3 has the potential to make a really good low-cost broadcast KW microphone, where maybe several could be used in a zone to increase coverage.
        If anybody is up for creating a dual-mic ADC shim for the ESP32-S3 or a low-cost, S3-specific mic dev kit, then please do.

        The current WakeNet suffers from poor datasets. If a community got together with an opt-in and did what Big Data does, collating an on-device dataset, that problem would be solved.
        There is no better dataset than one recorded on the device in use, and it’s a catch-22.
