Espressif Systems has just introduced the ESP32-S3-BOX AI voice devkit, designed for developing applications with offline and online voice assistants. I find its design similar to the M5Stack Core2 devkit, but the applications will be different.
The ESP32-S3-BOX features the latest ESP32-S3 processor with WiFi and BLE connectivity, AI capabilities, as well as a 2.4-inch capacitive touchscreen display, a 2-mic microphone array, a speaker, and I/O connectors with everything housed in a plastic enclosure with a stand.
ESP32-S3-BOX specifications:
- WiSoC – ESP32-S3 dual-core Tensilica LX7 up to 240 MHz with Wi-Fi & Bluetooth 5, AI instructions, 512KB SRAM
- Memory and Storage – 8MB octal PSRAM and 16MB QSPI flash
- Display – 2.4-inch capacitive touchscreen display with 320×240 resolution
- Audio – Dual microphone, speaker
- USB – 1x USB Type-C port for power and debugging (JTAG/serial)
- Expansion – 2x Pmod-compatible headers for up to 16x GPIOs
- Misc
  - Power LED, Mute button and LED, boot mode button, reset button
  - 6-axis IMU Sensor (InvenSense ICM-42670)
  - Infrared “controller”
- Power Supply – 5V via USB Type-C connector or dock
- Dimensions – TBD
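To put the display and memory figures above in perspective, here is a quick back-of-the-envelope calculation of how much memory one full-screen framebuffer would need. It is only a sketch: the 16-bit RGB565 color depth is my assumption (the specifications do not state it), and the `framebuffer_bytes` helper is a hypothetical name used for illustration.

```python
# Back-of-the-envelope memory budget for a full-screen framebuffer.
# Assumption: 16 bits per pixel (RGB565); the spec list does not state the color depth.

WIDTH, HEIGHT = 320, 240          # display resolution from the spec list
BYTES_PER_PIXEL = 2               # assumed RGB565
SRAM_BYTES = 512 * 1024           # on-chip SRAM from the spec list
PSRAM_BYTES = 8 * 1024 * 1024     # octal PSRAM from the spec list


def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Return the size in bytes of one full-screen framebuffer."""
    return width * height * bytes_per_pixel


fb = framebuffer_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
print(f"One framebuffer: {fb} bytes ({fb / 1024:.0f} KiB)")
print(f"Share of SRAM:  {fb / SRAM_BYTES:.1%}")
print(f"Share of PSRAM: {fb / PSRAM_BYTES:.1%}")
```

Under that assumption, a single framebuffer takes 150 KiB, a bit under 30% of the 512KB SRAM but less than 2% of the 8MB PSRAM.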
The company says the ESP32-S3-BOX is ideal for the development of smart speakers, gateways, and IoT devices that require human-computer voice interaction. The development kit supports far-field voice interaction thanks to the built-in microphone array, offline voice wake-up and speech command recognition in Chinese and English, reconfigurable voice commands in the same two languages, as well as the ESP-RainMaker IoT development framework.
Software support for the ESP32-S3-BOX builds upon previous work done for ESP32, including the ESP-Skainet voice assistant and the ESP-DL library for machine learning, as well as third-party solutions like the Alexa for IoT SDK or the LVGL open-source graphics library used to develop HMI solutions. You’ll find the ESP-BOX AIoT development framework and documentation to get started with the AI voice development kit on GitHub.
A blog post goes into more detail about the current and future capabilities of the kit, notably the use of the Pmod connectors to add Zigbee and Thread connectivity with an ESP32-H2 module and/or even cellular IoT connectivity (5G, NB-IoT, LTE Cat-M1).
The ESP32-S3-BOX AI voice development kit can be pre-ordered for $45 on Amazon, Aliexpress, and Adafruit. At the time of writing, the devkit is only in stock on Aliexpress, and interestingly the seller is the just-opened official Espressif Systems store, so we may expect the company to sell future devkits that way going forward.

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.