$59 Voice “Preview Edition” adds an offline smart speaker to your Home Assistant server

Nabu Casa has just launched the Home Assistant Voice Preview Edition, a small ESP32-S3 device that adds offline smart speaker functions to your Home Assistant server over WiFi. It features an XMOS XU316 audio processor, a dual-microphone array, an internal speaker, and a 3.5mm audio jack.

If your Home Assistant server is powerful enough, voice processing will be done directly on your local hardware using Home Assistant Voice software, but with lower-end hardware like a Raspberry Pi 4, audio processing can be done via a privacy-focused cloud instead. The solution also supports expansion thanks to a Grove connector on the bottom of the device.

Voice Preview Edition specifications:

  • SoC – Espressif ESP32-S3 dual-core Xtensa LX7 @ up to 240 MHz with vector extension for ML acceleration, 2.4 GHz WiFi & Bluetooth 5.0 LE connectivity
  • Memory – 8 MB octal PSRAM
  • Storage – 16 MB flash
  • Audio
    • DSP/Processor – XMOS XU316 with 16 real-time logical cores, support for echo cancellation, stationary noise removal, auto gain control
    • TI AIC3204 DAC with 48 kHz sampling rate
    • Input – Internal dual-mic array
    • Output
      • Internal speaker
      • 3.5mm stereo headphone jack
  • Expansion
    • Grove port to connect sensors or other accessories
    • Exposed pads on PCB for modding
  • Misc
    • Multipurpose button
    • Rotary dial for volume and other input
    • Mute switch that physically cuts power to the microphone
  • Power Supply – 5V/2A via USB-C port
  • Dimensions – 84x84x21 mm
  • Weight – 96 grams
  • Temperature Range – 0°C to 30°C

USB-C and 3.5mm audio jack (top), mute switch (bottom)
Grove port (protected with plastic cover when not in use)

Nabu Casa says the ESPHome open-source firmware is preloaded on the ESP32, and the firmware for the XMOS chip is also open-source. All you need is Home Assistant already configured on another device. For fully local speech processing, it is recommended to have a Home Assistant system based on an Intel N100 or higher. Low-end hardware such as the Home Assistant Green or Raspberry Pi 4-powered hardware will require a Home Assistant Cloud subscription for optimal speech processing performance. In all cases, this is handled by the open-source Assist voice assistant part of the Home Assistant project.
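
The hardware recommendation above can be summarized as a small lookup. The following is only an illustrative sketch in Python: the hardware labels, the `SpeechBackend` enum, and the `recommended_backend` function are hypothetical names made up for this example and are not part of Home Assistant or any Nabu Casa API. It encodes nothing beyond the three hardware examples named in the paragraph.

```python
from enum import Enum


class SpeechBackend(Enum):
    """Where speech processing runs (hypothetical labels for this sketch)."""
    LOCAL = "local"   # processed on the Home Assistant server itself
    CLOUD = "cloud"   # processed through a Home Assistant Cloud subscription


# Hypothetical hardware labels, taken only from the examples in the article.
# "Intel N100 or higher" is reduced to a single entry here for simplicity.
LOCAL_CAPABLE = {"intel-n100"}
LOW_END = {"home-assistant-green", "raspberry-pi-4"}


def recommended_backend(hardware: str) -> SpeechBackend:
    """Return the backend the article recommends for a given hardware label."""
    key = hardware.strip().lower()
    if key in LOCAL_CAPABLE:
        return SpeechBackend.LOCAL
    if key in LOW_END:
        return SpeechBackend.CLOUD
    # The article gives no guidance for other hardware, so do not guess.
    raise ValueError(f"no recommendation in the article for {hardware!r}")
```

For instance, `recommended_backend("raspberry-pi-4")` returns `SpeechBackend.CLOUD`, while any hardware the article does not mention raises a `ValueError` rather than guessing.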

English, Spanish, and Portuguese are all fully supported through both local processing and the cloud service, but your mileage may vary with other languages. For example, Thai and Chinese (Mandarin) are not supported at all on the device, and more work is needed for the cloud service. You can check whether your language is supported on the Home Assistant Voice Control page. The documentation has more details to get started with the kit.

The Voice Preview Edition can be purchased today for $59/€59 (MSRP). Additional information can be found on the product page.

3 Comments
Luca Corsini
29 days ago

interesting, i wonder how this works compared to a fleet of m5stack microphones.
I am genuinely interested in a completely local assistant (and wonder if a rpi5 with an AI accelerator would be enough) driving some mics in almost every ambient.

MIcael
29 days ago

so cool, wonder if an RK3588 as server would be capable of processing the audio?

Stuart Naylor
27 days ago

Without doubt it's just strange that HA uses a large LLM such as Whisper for ASR whilst so many existing specific language models can match WER for far less process space. https://github.com/wenet-e2e/wenet is one; as well as being a much smaller and leaner model, its runtime is far more optimal, written in C rather than Python, so it is more suitable for embedded. It's curious why HA seems to be creating its own brand ecosphere rather than creating an open system and reusing existing work, as there are a number of projects to choose from that likely would welcome model contribution. It's an improvement on the…
