[Update February 17, 2018: The kit was previously referred to as ESP32 LyRaTD MS1, but the company appears to have changed the name to ESP32-LyRaTD-MSC]
So apparently voice commands will represent 50% of all searches in the next two years, and everybody is jumping on the smart speaker bandwagon, with announcements from many companies at CES 2018, including Google’s Android Things + Assistant products, the official launch of NXP i.MX 8M, Amazon Alexa Voice Service (AVS) development kits from Amlogic and Allwinner, and more.
Espressif Systems is about to join the party with their ESP32 LyraTD MS1 HDK (Hardware Development Kit), which most people will likely remember as the “Audio Mic HDK” that was announced on Twitter.
Espressif Audio Mic HDK specifications:
- Wireless Module – ESP32-WROVER module with 802.11 b/g/n WiFi and Bluetooth 4.1 LE connectivity.
- DSP – 4-mic array chip
- Storage – micro SD card for audio files
- Audio
    - Audio driver chip
    - Earphone jack
    - Dual speaker output ports
    - 4x microphone array with up to 3 meter sensitivity while playing music
- Expansion
    - I2C/SPI header
    - 6-pin UART header
    - I2S header
    - Others undocumented
- Debugging – USB-UART micro USB interface (based on CP2102N), and JTAG header
- Misc – Power switch, 8x keys on top
- Power Supply – 5V via micro USB port
The kit can work over WiFi or Bluetooth, and supports major cloud voice vendors such as Amazon Alexa, Google Assistant, and Baidu DuerOS. Soft decoding and hot word recognition run directly on the ESP32 processor.
On Twitter, the company also said you could implement your own hotword/keyword by providing around 5000 unique recordings of your selected word, and that they expect to ship the board next week. However, it’s unclear when the board will be available for sale.
One of the commenters mentioned he made his own ESP32 Circle evaluation kit with an audio jack and a single microphone. If you are interested in that third-party board, you can purchase it on Taobao for 169 RMB (~$26). The official Espressif Audio Mic HDK should sell for a bit more due to the extra features.
[Update: Espressif ESP32-LyraTD-MSC is now sold for 44 Euros on Olimex]
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
With all those dev kits that have been announced recently, I expect that we will soon be flooded with a lot of new smart speakers 🙂
Maybe hard to compete when most others support 802.11ac WiFi?
Well, it is a very cheap microcontroller. Similar development boards usually cost over $200. On top of that, these kinds of applications hardly ever need 802.11ac-grade speed, since these processors cannot handle that speed.
And most importantly, which one supports 802.11ac?? Even the famous Amazon Echo does not support 802.11ac.
…is not needed for this use case as you said. But dual-band capability (802.11n @ 5 GHz is sufficient) can be important to escape overcrowded 2.4 GHz band in cities. In my area 2.4 GHz band is close to unusable in the evening hours and on weekends.
Oh yes. The need for 5 GHz makes sense indeed.
I agree with 5 GHz. In crowded areas like condos, apartments, dorms, offices, etc., 2.4 GHz is so oversaturated that it is almost useless. 5 GHz 11n is fine, no need for 11ac.
The market reports are suggesting that voice will be a big market going forward. One report suggests 143 million will use voice control by 2022.
I have no study data to go on, but my gut feeling is that there are too many predators in a pond with a limited food supply. Fingers are going to get burnt chasing double-digit growth. In my humble opinion.
I’ve received a sample. The chip in the middle of the main board is a Microsemi ZL38063 Microphone Array ASR-Assist Audio Processor.
There’s also a tiny chip on the back of the microphone board that reads N1309-3216.
What is the point of this board? ZL38063 is a $10 chip with an embedded CPU.
Might as well use an Allwinner H3 with RAM and SPI flash and get a real CPU for the same price. The H3 ($4) can boot from a $0.50 SPI flash, and you can use $0.80 128MB DRAM. I never did find a price for the X-Powers AC108, but it has to be in the $0.50 range.
I am unclear where the wake word processing is done on this board. Is it done in the ZL38063 or the ESP32?
@Jon Smirl
They said it’s done on ESP32 with assistance from ZL38063.
I’m wondering if the ESP32 will do hot word detection using a $1 I2S MEMS mic chip, which would avoid this $10 solution using the ZL38063. Another option is the X-Powers AC108, and then doing the beamforming on the ESP32.
I’ve emailed some people at Espressif; we’ll see if they have anything to say. I’m not convinced a $10 external DSP is really needed.