ReSpeaker is a development board combining an Atmel AVR MCU, a MediaTek MT7688 WiFi module running OpenWrt, a built-in microphone, an audio jack, and I/O headers to allow for voice control and output for IoT applications. That means you could make your own Amazon Echo-like device with the board and add-ons, use it as a voice-controlled home automation gateway, and more. The board was launched on Kickstarter a few days ago, and has already raised $100,000 from about 100 backers, but I’ve received an early sample, so I’ll provide some more information about the firmware, and show how to use it with some Python scripts leveraging Microsoft Bing Speech API.
You’ll need a micro USB to USB cable to connect the board to your computer (Linux, Windows, Mac OS…), and a speaker to connect to the board. Linux (OpenWrt) boots in a few seconds, and once it’s done all RGB LEDs will blink continuously.
I’m using a computer running Ubuntu 16.04, and ReSpeaker is detected by the system as an Arduino Leonardo board:
[ 5363.542637] usb 3-4.4.4: new full-speed USB device number 7 using ehci-pci
[ 5363.652356] usb 3-4.4.4: New USB device found, idVendor=2341, idProduct=0036
[ 5363.652361] usb 3-4.4.4: New USB device strings: Mfr=2, Product=1, SerialNumber=0
[ 5363.652364] usb 3-4.4.4: Product: Arduino Leonardo
[ 5363.652367] usb 3-4.4.4: Manufacturer: Arduino LLC
[ 5363.697606] cdc_acm 3-4.4.4:1.0: ttyACM0: USB ACM device
[ 5363.697994] usbcore: registered new interface driver cdc_acm
[ 5363.697998] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
That’s optional, but if you want, you can access the serial console with programs like Minicom, screen, PuTTY or HyperTerminal, setting the connection to 57600 8N1 to access the command line. Here’s the full boot log:
DDR Calibration DQS reg = 00008887 U-Boot 1.1.3 (Sep 10 2015 - 05:56:31) Board: Ralink APSoC DRAM: 128 MB relocate_code Pointer at: 87f68000 flash manufacture id: ef, device id 40 19 find flash:W25Q256FV *** Warning - bad CRC, using default environment ============================================ Ralink UBoot Version: 4.3.0.0 -------------------------------------------- C 7628_MP (Port5<->None) DRAM bus: 6 bit 1024 Mbits DDR, with 16 Tota memory: 12 MBytes Flsh componen: SPI Flash Date:Sep 10 2015 Time:05:56:31 ============================================ icache: sets:512, ways:4, linesz:32 ,total:65536 dcache: sets:256,ys:4,linesz:32 ,otal:32768 ##### The CPU freq = 580 MHZ #### estimate memory size =128 Mbyte RESET MT728 PHY!!!!!! GPIOMODE --> 50054404 GPOMODE2 --> 5540551 Please choose the operation: 1: Lystem code to SDRAM via TFTP. 2: Load system code then write to Flash via TFTP. 3: Boot system code via Flash (default). 4: Entr boot command lne interfac. 7: Load Boo Loader cod then writeto Flash via Seril. 9: Load Boot Loder code thn write to lash via TP. 3: SystemBoot systemcode via Flsh. ## Booing image a bc050000 .. Image Name: MIPS OpenWrt Linux-3.18.23 Image Type: MIPS Linux Kernel Iage (lzma cmpressed) Data Size: 1295088 Bytes = 1.2 MB Load Address: 80000000 Entry Poin 80000000 Verifyng Checksum... OK Uncompressing Kernel Image ... OK No initrd ## Transferring control to Linux at address 0000000) ... ## Giving linux memsize in MB, 128 Starting kernel ... 
[ 0.000000] Linux version 3.18.23 (pillar@server) (gcc version 48.3 (OpenWrt/Linaro GCC 6 [ 0.000000] Bard has DDR [ 0.00000] AnalogPMU set to hw conrol [ 0000000] Digtal PMU setto hw control [ 0.000000]oC Type: ediaTek MT7688 ver:1 eco:2 [ 0.000000] bootconsole [early0] enabled [ 0.000000] CPU0 revision s: 00019655(MIPS 24KEc [ 0.000000] MIPS: machine is MediaTek LinkIt Smart 7688 [ 0.000000] Determined physical RAM map: [ 0.000000] memory: 08000 00000000 (sable) [ 0.000000] Initrd not found or empty - disabling initrd [ 0.000000] Zone ranges: [ 0.000000] Normal mem 0x00000000-0x07ffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Earl memory nod ranges [ 0.00000] nod 0[mem 0x00000000-0x07fffff] [ .000000] Intmem setup ode 0 [mem x00000000-0x07fffff] [ 0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes. [ 0.000000]Primary dat cachB, 4-way, PIPT, no aliases, linesize 32 bytes [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 3252 [ 0.00000] Kernel command line: console=ttyS2,57600 rootfstype=squashfs,jffs2 [ 0.000000] PID hash table entries: 512 (order:, 2048 bytes) [ 0.000000] Dentry cache hash table entries: 16384 (order: 4, 6536 bytes) [ 0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 btes) [ 0.000000] Writing ErrCtl register=0007b250 [ 0.000000] Readback ErrCtl register=0007b250 [ 0.000000] Memory: 125868K/131072K available (2875 kernel code, 134K rwdata, 612K rodat) [ 0.000000] SLU: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 [ 0.000000] NR_IRQS:256 [ 0.00000]using register map from evicetree [ 0.000000] CPU Clock: 580MHz [ 0.000000] clocksource_of_init: no matching clocksources found [ 0.000000 Calibratin delay loop.. 
385.84 BogoMIPS (lpj=1929216) [ 0.060000] pid_max: default: 32768 mnimum: 301 [ .060000] Munt-cache hsh table enries: 1024 order: 0, 496 bytes) 0.07000] Mountpoin-cache hash tableentries: 104 (order: 0 4096 bytes [ 0.080000] pinctrl core: initialized pinctrl susystem [ 0.090000] NET: Registered protocol mily 16 [ 0.110000] mt7621_gpio 10000600.gpio: registering 32 gpios [ 0.120000] mt7621_gpio 10000600.gpio: reistering 32 gpios [ 0.130000] mt7621_gpio 10000600.gpioregistering 32 gpios [ 0.140000] i2c-ralink 10000900.i2c: loaded [ 0.140000] Advanced Linux Sound Architecture Driver Initialized. [ 0.150000] Switched to clocksource MIPS [ 0.160000] NET: Registered protocol family 2 [ 0.160000] TCP establied hash table entries: 1024 (order: , 4096 byte) [ 0.10000] TCP bnd hash table entries: 1024 (order: 0, 4096 bytes) [ 0.190000] TCP: Hash tables configred (establshed 1024 bind 1024) [ 0.200000] TCP: reno registered [ 0.210000] UDP hash table entries 256 (order 0,96 bytes) [ 0.22000] UDP-Litehash table ntries: 256(order: 0, 096 bytes) [ 0.230000] NET: Registered protocol family 1 [ 0.240000] futex hash table entries: 256 (order: -, 3072 byte) [ 0.260000] squashfs: ve4.0 (2009/01/31) Phillip Lougher [ 0.270000] jffs2: version 2.2 (NAND) (SUMARY) (LZMA)(RTIME) (CMODE_PRIORITY) (c) 2001-2. 
[ 0.300000] msmni has bee set to 245 [ 0.300000] io scheduler noop egistered [ 0.310000] io scheduler deadline regist(default) [ 0.320000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 0.340000] 10000c00.uartlite: ttyS0 at MMIO 0x10000c00 (irq = 28, base_baud = 2500000) A [ 0.350000] 10000d00.uart1: ttyS1 at MMIO 0x10000d00 (irq = 29, base_baud = 20000) is a A [ 0.370000] console [ttyS2] disabled [ 0.380000] 10000e00.uart2: ttyS2 a MMIO 0x1000e00 (irq =30, base_bad = 2500000 is a 1650 [ 0.40000] consol [ttyS2] enbled [ .400000] console ttyS2 enabled [ 0.410000 bootconsol [early0] dsable 0.410000] bootconsole [early0] disabled [ 0.4303 [ 0.450000] m25p80 spi32766.0: found w25q56, expected mx25l25635e [ 0.470000]25p80 spi32766.0: w25q256 (32768 Kbytes) [ 0.48000] m25p80 pi32766.0: sing chunke io [ 0.480000] 4 ofpart partitons found o MTD device spi32766.0 [ 0.500000] Creating 4 MTD partitions on "spi32766.0": [ 0.510000] 0x000000000000-000000003000 : "u-boot" [ 00] 000000003000-0x00000004000 : "" [ 0.530000] 0x000000040000-0x000000050000 "factory" [ 0.540000] 0x000000050000-0x000002000000 : "firmware" [ 0.610000] 2 uge-fw partitions found on MTD device firmware [ 0.630000] 0x000000050000-0x00000018c330 : "kernel" [ 0.640000] 0x00000018c330-0x000002000000 : "rotfs" [ 0.650000] mtd: device 5 (rootfs) set to be root filesystem [ 0.660000] 1 squashfs-split partitions founn MTD device rootfs [ 0.670000] 0x0000017f00000x00000200000 : "rootf_data" [ 0.700000] ralink_soc_eth 10100000.ethernet eth0: ralink a 0xb0100000 irq 5 [ 0.710000] i2c /dev entries driver [ 0.720000] mt7621_dt 10000120watchdog: Iitialized [ 0.730000] Enable Ralink GDMA Contoller Modul [ 0.740000] GDMA IP Version=3 [ 0.750000] TCP: cubic regisered [ 0.750000] NE: Registere protocol fmily 17 [ 0.760000] bridge: automatic filting via arp/ip/ip6tables has been deprected. Updateyou. 
[ 0.790000] 8021q: 802.1Q VLAN Support v1.8 [ 0.800000] *******Enter codec_wm8960_i2c_probe******** [ 0.810000] soc-audo soc-audio ASoC: maine MTK APSoC I2S should use snd_soc_register_car) [ 0.830000] wm8960 0-001a: No platform data supplied [ 0.840000] ****** wm8960_preinit ****** [ 1.350000] soc-audio soc-audio: wm8960-hii <-> mt76x-i2s mapping ok [ 1.36000] ALSA devce: [ 1.370000] #0: MTK APSoC IS [ 1.380000] VFS: Mounted root (squashfs filesystem) readonly on devie 31:5. [ 1.400000] Freeing unused kernel memory: 144K (8038c000 - 803b000) [ 2.71000] init: nsole is alive [ 2.72000] init: watchdog - [ 4.690000] usbcore: regisered new interfac drivr usbfs [ 4.700000] usbcre: registeed new inteface driverhub .71000] usbcore: egistered nw device drver usb [ 4.730000] exFAT: Version 1.2.9 [ 4.750000] SCSI subsystem iniialized [ 4.760000] ehci_hcd: US 2.0 'Enhaned' Host Cotrollr (EHCI) Drver [ 4.780000] ehci-platform: EHC generic pltform driver [ 4.990000] phy phy-usbphy.0: remote usb device wakeup disable [ 5.000000] phy phy-usbphy.0: UTMI 16bit 30MHz [ 5.010000] ehclatform 101c0000.ehci: EHCI Host Controller [ 5.020000] ehci-platform 101c0000.ehci: new USB bus registered, assigned bus number 1 [ 5.030000] ehci-platform 101c0000.ehci: irq 26, i mem 0x101c000 [ 5.070000] ehci-platform 101c0000.ehci: USB 2.0 started,EHCI 1.00 [ 5.08000] hub 1-0:10: USB hub ound [ 5.090000] hub 1-0:1.0: 1 port detected [ 5.100000] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 5.110000] ohci-platform: OHCI generic platform driver [ 5.120000] ohci-platform 101c1000.ohci: Generic Platform OHCI controller [ 5.130000] ohci-platform 101c1000.ohci: new USB bus regstered, assigned bus number 2 [ 5.150000] ohci-platform 101c1000.ohci: ir 26, io mem101c1000 [ 5.220000] hub 2-0:1.0: USB hub found [ 5.230000] hub 2-0:1.0: 1 port detected [ 5.240000] platform gpio-leds: Driver les-gpio requsts probe deferral [ 5.260000] MTK MSDC dev init. 
[ 5.260000] SET_IOS: CLK(0kHz), BUS(PSHPULL), BW1), PR(UP), VDD(.30v), TIMIG(LEGACY)SET_) [ 5.30000 mtk-sd: MeiaTek MT657 MSDC Driver [ 5.340000] sdhci: Secure Digital Host Controller Interface driver [ 5.350000] sdhci: Copyright(c Pierre Ossan [ 5] sdhci-pltfm: SDHCI platform nd OF driver helper [ 5.370000] usbcore: registered new interface driver usb-storage [ 5.390000] SET_IOS: CLK(0kHz), BUS(OPENDRAIN), BW(1), PWR(OFF, VDD(1.50v, TIMING(LEGACY) [ 5.400000] latform gpio-le Driver les-gpio requests probe deferral [ 5.720000] init: - preinit - [ 4.680000] usbcore: registered new interface driver usbfs [ 4.690000] usbcore: registered nw intrface drive hub [ 4.700000] usbcore: registered new device driver usb [ 4.720000] exFAT: Version 1.2.9 [ 4.740000] SCSI subsystem initialized [ 4.750000] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 4.770000] ehci-platform: EHCI generic platform driver [ 4.980000] phy phy-usbphy.0: remote usb device wakeup disabled [ 4.990000] phy phy-usbphy.0: UTMI 16bit 0MHz [ 5.000000] ehci-platform 101c0000.ehci: EHCI Host Controller [ 5.010000] ehci-platform 101c0000.ehci: new USB bus registered, assigned bus number 1 [ 5.020000] ci-platform 101c0000.ehci: irq 26, io mem 0x101c0000 [ 5.060000] ehci-plaform 101c000.ehci: USB 2.0 started, EHCI 1.00 [ 5.070000] hub 1-0:1.0: USB hub found [ .080000] hub 1-0:1.0: 1 port detected [ 5.090000] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 5.100000] ohci-platform: OHCI generic pltform drive [ 5.110000] ohci-platform 101c1000.ohci: Generic Platform OHCI controller [ 5.1000] ohci-platform 101c1000.ohci: new USB bus registered, assigned bu number 2 [ 5.140000] ohci-platform 101c1000.ohci: irq 26, io mem 0x101c1000 [ 5.21000] hub 2-0:.0: USB hubfound [ 5.220000] hub 2-0:1.0: 1 port detected [ 5.230000] platform gpioleds: Drive leds-gpio equess probe defrral [ 5.250000] MTK MSDC device init. 
[ 5.250000] SET_IOS: CLK(0kHz), BUS(PUSHPULL), BW(1), PWR(UP), VDD(3.30v), TIMING(LEGACY) [ 5.320000] mtk-sd: MediaTek MT6575 MSDC Diver [ 5.3000] SET_IOS: CLK(0kHz), BUS(OPENDRAN), BW(1), WR(OFF), VD(1.50v), TIING(LEGACY) [ 5.340000] platform gpio-leds: Driver leds-gpio requests probe deferral [ 5.360000] sr [ 5.370000] sdhci: Copyright(c) Pie Ossman [ 5.380000] sdhci-pltfm: SDHCI platform andr [ 5.390000] usbcore: registered new interface driver usb-storage [ 5.730000] init: - preinit - [ 6.760000] rt305x-esw 10110000.esw: link changed 0x00 [ 6.920000] random: procd urandom read wi0 bits of enropy availale Press the [f] key and hit [enter] to enterfailsafe moe Press the [1], [2], [3] or [4] key and hit [enter] to select the debug leve [ 9.480000] jffs2: notice: (389) jffs2_build_xattr_subsyste: complete uilding xatr subsy. [ 9.520000] mo_root: switching to jffs2 overlay [ 9.550000] procd: - early - [ 9.560000] procd: - watchdog - [ 10.380000] procd: - ubus - [ 11.400000] procd: - init - Please press Enter to activte thole. BusyBox v1.23.2 (2016-07-14 20:06:12 CST) built-in shell (ash) _______ ________ __ | |.-----.-----.-----.| | | |.----.| |_ _____|| __|_____|__|__||________||__| |___| _| ------------------------------------------------- CHAOS CAMER (Chaos Calmer, r48532) ----------------------------------------------------- * 1 1/2 ozGin Shake with lassful 1/4 oz Triple Sec of broken ice and pour * 3/4 oz Lime Juice unstrained into a goblet. * 1 1/2 oz Orange Juice * 1 tsp. Grenadine Syrup -------------------------------------------------- root@(noe):/# |
If you think something is odd here, that’s because the serial connection misses some characters. This happened with two computers and different USB cables. Hopefully this is a specific issue with my sample, or, if it is a general issue, it will be fixed by the time boards ship to Kickstarter backers [Update: The company explained to me that it’s because the Atmel 32u4 and MediaTek MT7688 share the same USB port]. So instead of using the serial console, I’ll use SSH, which means I have to connect to the ReSpeaker WiFi access point first, and configure it.
ReSpeaker will show as LinkIt_Smart_7688_XXXXX, because the WiFi module is exactly the same as in the LinkIt Smart 7688 IoT board, and unsurprisingly the configuration interface is exactly the same. First set the root password, and log in with that password.
Then go to the Network tab, select station mode, and connect to your access point by entering your password. Click Configure, and you’re done. As you can see on the right above, you can also use OpenWrt’s LuCI interface to configure networking.
Now find the ReSpeaker’s IP address via your router’s DHCP client list, arp-scan, or another method:
sudo apt install arp-scan
sudo arp-scan --localnet
You can now connect to the board via SSH:
ssh root@respeaker_ip_address
and use the password you set in the web interface.
Now let’s check some CPU information:
cat /proc/cpuinfo
system type : MediaTek MT7688 ver:1 eco:2
machine : MediaTek LinkIt Smart 7688
processor : 0
cpu model : MIPS 24KEc V5.5
BogoMIPS : 385.84
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VCED exceptions : not available
VCEI exceptions : not available
We’ve got a MediaTek MT7688 MIPS 24KEc processor as advertised, so let’s check a few more details:
root@mylinkit:~# uname -a
Linux mylinkit 3.18.23 #259 Mon Aug 8 17:04:04 CST 2016 mips GNU/Linux
root@mylinkit:~# df -h
Filesystem                Size      Used Available Use% Mounted on
rootfs                    8.1M    504.0K      7.6M   6% /
/dev/root                22.5M     22.5M         0 100% /rom
tmpfs                    61.5M    260.0K     61.3M   0% /tmp
/dev/mtdblock6            8.1M    504.0K      7.6M   6% /overlay
overlayfs:/overlay        8.1M    504.0K      7.6M   6% /
tmpfs                   512.0K         0    512.0K   0% /dev
root@mylinkit:~# free -m
             total         used         free       shared      buffers
Mem:        126012        36052        89960          260         4660
-/+ buffers:              31392        94620
Swap:            0            0            0
The board runs Linux 3.18.23, has 7.6MB available storage, and 128MB RAM in total.
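The figures printed by free above appear to be in kilobytes even though -m was passed, which is why a 128MB board shows a total of 126012. As a rough sketch, the numbers can be converted with a few lines of Python; the helper names parse_mem_kb and kb_to_mib are made up for this illustration, and the sample text is copied from the output above.

```python
# Hypothetical helpers: parse the "Mem:" line of the `free` output shown above.
# Assumption: the values are in kB, as their magnitude suggests.
FREE_OUTPUT = """\
             total         used         free       shared      buffers
Mem:        126012        36052        89960          260         4660
-/+ buffers:              31392        94620
Swap:            0            0            0
"""


def parse_mem_kb(free_output):
    """Return (total, used, free) in kB from the 'Mem:' line."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free = (int(v) for v in fields[1:4])
            return total, used, free
    raise ValueError("no 'Mem:' line found")


def kb_to_mib(kb):
    """Convert kilobytes to mebibytes."""
    return kb / 1024.0


total, used, free = parse_mem_kb(FREE_OUTPUT)
print("total: %.1f MiB, used: %.1f MiB, free: %.1f MiB"
      % (kb_to_mib(total), kb_to_mib(used), kb_to_mib(free)))
```

That works out to roughly 123 MiB visible to userspace, with the remainder of the 128MB presumably reserved by the kernel.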
I’m now going to test the audio features with command line tools and Python scripts, and also include a video demo at the end of this review. Since I don’t have the ReSpeaker Microphone Array add-on, I have to be fairly close to the microphone for it to work well, maybe one meter at most, or the volume would be really low.
I’ll start by checking audio recording and playback without any API or internet access requirements.
We can record audio at a 16,000 Hz sample rate, 16-bit width, 1 channel using the following command:
arecord -M -f S16_LE -r 16000 -c 1 --buffer-size=204800 -v /tmp/sample.wav
and play it back with aplay:
aplay -M /tmp/sample.wav --buffer-size=204800 -v
It worked OK for me, although the volume seemed quite low.
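To put a number on "quite low", one option is to inspect the recorded file with Python's standard wave module. The sketch below is not part of the ReSpeaker scripts, and describe_wav is a name I made up; it assumes a 16-bit PCM file such as the one produced by the arecord command above, and reports the format plus the peak level as a fraction of full scale.

```python
import struct
import wave


def describe_wav(path):
    """Return basic format info and the peak level of a 16-bit PCM WAV file."""
    wav = wave.open(path, "rb")
    try:
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    finally:
        wav.close()
    if width != 2:
        raise ValueError("expected 16-bit samples, got %d-bit" % (width * 8))
    # WAV samples are little-endian signed 16-bit integers.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    peak = max(abs(s) for s in samples) if samples else 0
    return {
        "rate": rate,
        "channels": channels,
        "bits": width * 8,
        "seconds": n_frames / float(rate),
        "peak_fraction": peak / 32768.0,
    }
```

Calling describe_wav("/tmp/sample.wav") on the board should report a rate of 16000, 1 channel and 16 bits if the recording used the options above; a small peak_fraction would confirm that the capture level is low rather than the speaker.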
Now we can do something a little more interesting, as Seeed Studio developed a few text-to-speech and speech-to-text Python scripts. You can retrieve the scripts from ReSpeaker’s GitHub account, and install one dependency to set up the board:
git clone https://github.com/respeaker/microsoft_cognitive_services.git
cd microsoft_cognitive_services
pip install monotonic
The scripts use Microsoft Speech API, but in theory you could use any other speech API. Since Seeed Studio has already done all the hard work, I simply applied for a Microsoft Speech API key in order to be able to use the demo.
That’s free for testing / evaluation, but if you intend to use it in commercial products, or if your own use exceeds 5,000 transactions per month, you’d need to purchase a subscription.
You’ll find three Python scripts in the directory, namely bing_voice.py, bing_stt_with_vad.py, and tts.py. Look for BING_KEY inside each script, and paste your own key.
Time to have some fun, starting with the speech to text script:
python bing_stt_with_vad.py
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
* recording
000000000111000000000000000000000000000000000000000000000000000000000011111+11111111111111100000000-
* done recording
Bing:你好
* recording
000000000000000000000000000000000000000000000000000000000^C0
* done recording
It’s pretty slow to start (about 15 seconds), and then there are a few error messages before you see the “* recording” message and can talk, with Bing returning the result: “Bing:你好”. Chinese? Yep, as the default is currently Chinese, but if it is not your strongest language, you can edit bing_stt_with_vad.py, and change the language by replacing zh-CN with en-US or another language string:
# text = bing.recognize(data, language='zh-CN')
text = bing.recognize(data, language='en-US')
# text = bing.recognize(data, language='th-TH')
And English works too (sort of):
python bing_stt_with_vad.py
* recording
0000000000000000000000000000000000011111+111111111111111111111111111111111111111111111001111111111111110000011111000111111111100000000-
* done recording
Bing:hello world next software
* recording
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111+11111111111111111111111100001111111111111111111111111100111111111111100000000-
* done recording
Bing:what's the weather like today
In the first sentence, I said “Hello World! Welcome to CNX Software today”, but it came out as “hello world next software”, maybe because of my accent, but I doubt it…
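The runs of 0s and 1s printed between “* recording” and “* done recording” look like per-frame voice activity flags, with 1 meaning speech was detected. That is my assumption based on the script name (bing_stt_with_vad.py); I have not checked how the script actually computes them, nor what the “+” and “-” markers mean. The sketch below only illustrates the general idea with a simple energy threshold. The function names, the 20 ms frame size and the threshold are all made up for the example.

```python
import math
import struct


def frame_flags(pcm, frame_samples=320, threshold=500.0):
    """Return a string of '0'/'1' flags, one per frame of 16-bit LE mono PCM.

    A frame is flagged '1' when its RMS level exceeds the threshold.
    320 samples correspond to 20 ms at 16 kHz. Trailing partial frames
    are ignored.
    """
    flags = []
    frame_bytes = frame_samples * 2
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[start:start + frame_bytes]
        samples = struct.unpack("<%dh" % frame_samples, frame)
        rms = math.sqrt(sum(s * s for s in samples) / float(frame_samples))
        flags.append("1" if rms > threshold else "0")
    return "".join(flags)


def make_tone(n_samples, amplitude, rate=16000, freq=440.0):
    """Synthesize a sine tone as 16-bit LE PCM, for demonstration only."""
    values = [int(amplitude * math.sin(2 * math.pi * freq * i / rate))
              for i in range(n_samples)]
    return struct.pack("<%dh" % n_samples, *values)


silence = b"\x00\x00" * 640           # two silent frames
speech_like = make_tone(960, 8000)    # three loud frames
print(frame_flags(silence + speech_like + silence))  # prints 0011100
```

A real detector would be more robust to background noise than a fixed threshold, but the printed pattern has the same shape as the script output above: zeros while waiting, ones while talking, zeros again before the recording stops.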
Then I wanted to try the Thai language, but I got an API failure, simply because the number of languages supported by Microsoft Speech API is limited, as shown in the table below.
language-Country | language-Country | language-Country | language-Country |
---|---|---|---|
ar-EG* | en-IN | fr-FR | pt-BR |
ca-ES | en-NZ | it-IT | pt-PT |
da-DK | en-US | ja-JP | ru-RU |
de-DE | es-ES | ko-KR | sv-SE |
en-AU | es-MX | nb-NO | zh-CN |
en-CA | fi-FI | nl-NL | zh-HK |
en-GB | fr-CA | pl-PL | zh-TW |
If your language is not listed here, then you could use Google Speech API instead, and it’s likely Seeed Studio or the community will have written compatible scripts by the time ReSpeaker boards ship to backers.
So you now know how to convert your voice to text, and you can use that text to send a web search, or toggle GPIOs, but you may also want to get an audio answer to your action. The tts.py script is there for you, and is very easy to use:
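To avoid finding out about an unsupported language only when the API call fails, the table above can be turned into a quick check before calling recognize(). This is a sketch of my own, not something found in the Seeed Studio scripts; SUPPORTED_LANGUAGES and check_language are hypothetical names, and the list simply mirrors the table at the time of writing.

```python
# Language codes copied from the table above (ar-EG carries an asterisk there).
SUPPORTED_LANGUAGES = {
    "ar-EG", "ca-ES", "da-DK", "de-DE", "en-AU", "en-CA", "en-GB",
    "en-IN", "en-NZ", "en-US", "es-ES", "es-MX", "fi-FI", "fr-CA",
    "fr-FR", "it-IT", "ja-JP", "ko-KR", "nb-NO", "nl-NL", "pl-PL",
    "pt-BR", "pt-PT", "ru-RU", "sv-SE", "zh-CN", "zh-HK", "zh-TW",
}


def check_language(code):
    """Return the code unchanged, or raise ValueError if it is not in the table."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError("%s is not in the supported language list" % code)
    return code


for candidate in ("en-US", "th-TH"):
    try:
        print("%s: OK" % check_language(candidate))
    except ValueError as error:
        print(error)
```

With such a check, the th-TH attempt would have been rejected locally instead of through an API failure.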
python tts.py "Hello World! Welcome to CNX Software"
It did not really feel realistic, but at least I could understand the female voice in the speakers. Looking in the script, I did not see any language settings, so I assumed the API would automatically detect the language, and entered a string in French instead, but all I heard was gibberish. Finally I found that you can change the voice language in the bing_voice.py script, which contains most of the code:
#def synthesize(self, text, language="en-US", gender="Female"):
def synthesize(self, text, language="fr-FR", gender="Male"):
I replaced the US female voice with a French male voice, and tried a “famous French saying”:
python tts.py "Salut mon gars. Comment ca va?"
At least it was understandable, but Microsoft still has some work to do, as the audio output was more like “Salut mon gars. commencer a va?”. The reason could also be that the correct spelling is “Comment ça va”, but the terminal (set to UTF-8) did not let me input “ç”.
You can watch all those demos in the video below to get a better feel for the audio quality, delays, and capabilities of Microsoft Bing Speech API.
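Two possible workarounds, neither of which I have tried on the board: since language and gender are keyword arguments in the synthesize() signature shown above, they can be passed by the caller instead of editing the defaults, and the “ç” can be written as a Unicode escape inside a Python string when the terminal refuses to accept it. In the sketch below, synthesize() is only a stand-in with the same parameters as the method in bing_voice.py; it does not call the Bing API, and what it returns is invented for the illustration.

```python
def synthesize(text, language="en-US", gender="Female"):
    """Stand-in (hypothetical) mirroring the parameters of the bing_voice.py method.

    It only packages the arguments so the call can be inspected; the real
    method sends the text to the speech service.
    """
    return {
        "text": text.encode("utf-8"),  # bytes that would be sent
        "language": language,
        "gender": gender,
    }


# \u00e7 is the Unicode escape for the c with cedilla, so no special key is needed.
text = u"Salut mon gars. Comment \u00e7a va?"

# Override the defaults per call rather than editing the function definition.
request = synthesize(text, language="fr-FR", gender="Male")
print(request["language"], request["gender"])
```

Whether the real script then pronounces the sentence correctly would still need to be checked on the board.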
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
They didn’t send you the extra microphone array? Seems to be most interesting thing in that KS
@JM
The core board is OK for evaluation with its single mic, but if you need something more useful and with more range, then yes, the Mic array is nice to have.
By the time you’ve bought the Core board, the Mic Array addon board, the Meow King driver unit and spent many hours messing around with it, you may wanna cut your losses and buy Google Home or Amazon Echo. Should save you some time.
@Gary
You’ll also be giving up control and passing your data straight over to Amazon or Google who will mine it down to the last tick in ways you probably don’t even imagine.
This may take more time, but at least it’s yours to control and choose what to do, what services to use, etc.
If you’re not into customisation and lots of DIY you may be reading the wrong website 🙂
@JM
Care to read the text above? This respeaker thing sends all things you speak at your home to microsoft and google.
@anon
@JM Well, it was a hypothetical comparison on price and effort, not how demonic Google are, nevertheless if you represent this site and based on one comment suggest it’s not for me, maybe you’re right.
As for your delusion on having control of anything….mwaaahahahaha!
@anon
I’m not sure you understand what you’ve read, but here you’re obviously free to choose any service, not just Microsoft or Google.
This includes Nuance, IBM Bluemix or even your own voice processing stack.
I guess it won’t be the primary feature users are looking for but I can comment on this hardware in terms of audio latency.
As I could guess from the article, this is the very same base image and kernel as in the LinkIt Smart 7688, which I own. Even the audio codec is the same component.
From my tests, I could not get to a latency below 17.4 milliseconds for audio playback. This is way above the maximum latency I’d have hoped for, in the context of making musical instruments.
It might be possible to improve the limiting I2S driver to reach decent timings.
@Gary @JM @anon
I suppose the debate is at an even higher level than that: Google and Amazon are probably selling devices for *end users*, whereas the ReSpeaker is designed for developers/hackers/makers. Two very different products!