August 19, 2019August 19, 2019 by Jean-Luc Aufranc (CNXSoft) - 43 Comments

Allwinner R328 Smart Speaker & System-on-Module Spotted in China

Earlier this year, Allwinner introduced some AIoT (AI + IoT) processors including Allwinner R328 dual-core Cortex-A7 processor for “low-cost voice interaction solutions” aka low-cost smart speakers. I did not pay too much attention at the processor at the time, but since then, the company has released a product brief with some more details about the processor.

We can see it integrates 64MB to 128MB DDR3 memory which should be enough to run Linux without external memory, and truly provide a low-cost solution for smart speakers, and I was told the chip may cost around $3. I was also asked whether Allwinner R328 smart speakers were already shipping.

A Google search in English did not help, so I had to switch to Chinese, and after visiting several sites, I could see some Allwinner A328 platforms including a smart speaker and a system-on-module were showcased at some event in China.

We’ve got a photo, but that much more info about the speaker itself. Processors are not typically mentioned in smart speakers specifications, so I went to Taobao and 1688.com websites to look for smart speakers that look like the one above.

“Tmall Elf Sugar R” (R) matches the design above, and sells for only 100 RMB ($14.2 US) on the website. There are several variants, so it’s a bit confusing since I can’t read Chinese. Based on the description the speaker comes with a 2-microphone array, support Bluetooth 4.2, and runs AliGenie Voice Assistant. According to Wikipedia, AliGenie Voice is an open-platform intelligent personal assistant launched and developed by Alibaba Group used in the Tmall Genie smart speaker. AliGenie is capable of smart home control, music playback, voice shopping with Taobao and Tmall, voice recognition, voiceprint recognition, as well as semantic understanding and speech synthesis.

Beside the smart speaker, I also noticed an Allwinner R328 system-on-module named CB-L 2-S1R07-6236, but a web search did not yield any results. We can still the module comes with R328-S3 processor, and Ampak AP6236 802.11n WiFi 4 and Bluetooth 4.2 module, as well as a flash of unknown capacity. The company behind the design is apparently called CB which stands for… City Brand (Thanks Milkboy! See comments for details and link) ~~well I’m not sure since the photo is blurry, but it looks like “City Biguo” or similar. Hopefully, Chinese readers may help on that one.~~

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.

43 Replies to “Allwinner R328 Smart Speaker & System-on-Module Spotted in China”

dgp says:

August 19, 2019 at 12:41

It’s expensive but there is one board with this chip on taobao:

https://item.taobao.com/item.htm?spm=a230r.1.14.112.1ce816e7zyh333&id=600269429132&ns=1&abbucket=8#detail

Reply
1. Jean-Luc Aufranc (CNXSoft) says:
  
  August 19, 2019 at 14:06
  
  I can see that board is using R328-S2. I wonder what the differences are between R328-S3 and R328-S2. Maybe just the memory 64MB vs 128MB?
  
  Reply
  1. Jean-Luc Aufranc (CNXSoft) says:
    
    August 19, 2019 at 14:11
    
    Oh. It’s in the product brief… R328-S2: 64MB DDR2; R328-S3: 128MB DDR3
    
    Reply
    1. dgp says:
      
      August 19, 2019 at 15:21
      
      The physical size and pinout of the two seems to differ as well. That probably means you can’t put an S3 on an S3 board if you find you need more memory after you’ve done your PCB layout.
      
      Reply
      1. Jean-Luc Aufranc (CNXSoft) says:
        
        August 19, 2019 at 16:14
        
        LFBGA 143balls(R328-S2), LFBGA 144balls(R328-S3):
        • 9.25mm x 9.15mm body size,0.65mm ball pitch,0.35mm ball size(R328-S2)
        • 9.15mm x 11.1mm body size,0.65mm ball pitch,0.35mm ball size(R328-S3)
        
        The package size is different but the ball size and ball pitch are the same, and S3 only has one more pin, so maybe it’s possible to design a PCB that works for both?
      2. dgp says:
        
        August 19, 2019 at 16:23
        
        >a PCB that works for both?
        
        Looking at the board above again you can see two silkscreen boxes for the SoC. I guess the bigger outline is for the S3 so the layout guy didn’t put components in the way.
milkboy007 says:

August 19, 2019 at 15:25

“There are several variants,”
There are 2 variant essentially
官方标配 = just the devices, regular and Model “R”. usually expensive ones are just new stock with new design, with minimal HW changes
anything with *套餐 means bundling, see link bellow
https://img.alicdn.com/imgextra/i4/1115488308/O1CN01AJ7frj2BF73jpufzX_!!1115488308.jpg

“The company behind the design is apparently called CB which stands for”
Its CITY BRAND, http://www.citybrandhk.com/news_view.aspx?TypeId=4&Id=452&Fid=t2:4:2

Reply
1. Jean-Luc Aufranc (CNXSoft) says:
  
  August 19, 2019 at 16:14
  
  Thanks!
  
  Reply
Moonxi says:

August 19, 2019 at 15:32

Thanks, Jean-Luc nice work!
As per Allwinner website: OS: Linux 4.9
R328-S2 is square 9×9 mm
R328-S3 is rectangular 9×11 mm

Looking for SKD/Datasheet/User Manual

Chinese speaking friends please chime in. If there is this speaker for 14$ with R328-S3 I’m getting one ASAP to Europe
to disassemble for all of us to see whats inside.

Reply
1. dgp says:
  
  August 19, 2019 at 16:13
  
  Would be nice if internally it uses a SoM like the one pictured. So you can take the SoM out and dump the rest of it.
  That said RK3308 SoMs with wifi are $10 on taobao (https://item.taobao.com/item.htm?spm=a230r.1.14.89.7d592d11kcApkR&id=595669877515&ns=1&abbucket=8#detail).
  
  Reply
  1. Jon Smirl says:
    
    August 19, 2019 at 21:09
    
    Is PMIC integrated on RK3308? There is no PMIC on this module.
    
    Reply
    1. dgp says:
      
      August 19, 2019 at 21:19
      
      There are 5 dumb DC-DC supplies on the board so you likely won’t be able to scale any voltages along with clock frequencies.
      FWIW I went for this SoM instead for my first RK3308 experience. It’s more expensive but I don’t like SPI NAND or RTL wifi -> https://item.taobao.com/item.htm?spm=a230r.1.14.191.66dc2d11tXHjWG&id=600801250471&ns=1&abbucket=8#detail
      Haven’t gotten the board yet but it looks like dumb DC-DC supplies like the other one.
      
      Reply
      1. Jon Smirl says:
        
        August 19, 2019 at 21:29
        
        There is also this one:
        https://www.cnx-software.com/2019/06/08/mcuzone-rk3308-som-mdk3308-ek-evaluation-kit-smart-voice-applications/
      2. dgp says:
        
        August 19, 2019 at 21:39
        
        I considered that one too but the headers make it massive. It does seem to have a rockchip PMIC though so maybe the power consumption will be better.
2. Jon Smirl says:
  
  August 19, 2019 at 21:05
  
  You can certainly build and retail a smart speaker using the R328 for $14. I’ve heard that the R328 is under $3 in volume. So that SOM in the photo costs about $6 to mass produce. Throw in power supply, case, speakers – maybe $10 wholesale. That supports the $14 retail price.
  
  The board on taobao is using an AC101 for the array mics. That is a more expensive solution which I’m not sure is worth the added costs. The analog condenser mics do have better sound reproduction, but is it needed? This solution also allows you to input the amp-out signal into the AC101. Now when you sample you can get the 4 mics and amp-out simultaneously for easier echo cancelation. AC101 is I2S based.
  
  The cheap solution is to directly attach four PDM digital mic to the SOC. The SOC supports that. You then wire amp-out to line-in. Downside — the echo cancelling software is more complicated because the sampling is not synchronized. I wonder if there is a simple circuit you could make that turns amp-out into a PDM signal?
  
  Reply
  1. dgp says:
    
    August 19, 2019 at 22:03
    
    > I’ve heard that the R328 is under $3 in volume
    
    That would make it just shy of the cheapest 32bit Cortex A with memory you can buy today (AFAIK the cheapest is this: https://item.taobao.com/item.htm?spm=a230r.1.14.14.31bb2e0eYacUFC&id=596354183127&ns=1&abbucket=8#detail at ~$2.6). If only it wasn’t in such a horrible to work with package and you didn’t need to order thousands of them.
    
    Reply
    1. Jon Smirl says:
      
      August 19, 2019 at 22:55
      
      Many of the Hisilicon chips with embedded memory are sub-$3. But they are all ARM9.
      
      Reply
      1. dgp says:
        
        August 19, 2019 at 23:29
        
        >But they are all ARM9.
        
        That means the are probably on pre-DT 3.x kernels and makes getting them running with a mainline kernel a massive pain. Cortex A means if you have a working u-boot you can probably get from no kernel support to booting a buildroot initramfs in about 3 changes (a new arm/mach- dir and skeleton machine file, a device tree with the memory, gic and arch timer and some config options to enable a debug uart) to the latest upstream kernel.
      2. Jon Smirl says:
        
        August 20, 2019 at 04:00
        
        They are all Linux 3.4 and all of the h.,264/265 support is closed source so you can’t switch kernels. On the other hand, a smart speaker doesn’t care that much about the kernel. You do have the source to the 3.4 kernel, just not all of their modules. So you can keep applying patches to it. 3.4 has received a lot of patches due to it being used in Android 4.2 (?)
        
        If you don’t care about the camera and are a glutton for punishment, there is no reason you couldn’t bring these chips up to mainline.
      3. dgp says:
        
        August 20, 2019 at 14:22
        
        >So you can keep applying patches to it. 3.4
        
        I think 3.4 has been dead for a while as almost no one uses Android 4 anymore. If you don’t mind working with something that out of date and are never going to put in on a network I guess you can get away with it but I think you’d be setting yourself up for a lot of pain in the long run.
        
        >If you don’t care about the camera and are a glutton for punishment,
        >there is no reason you couldn’t bring these chips up to mainline.
        
        I did that for the $2.5 chip above. Cortex A has enough standardised that it’s actually possible to do without destroying your sanity.
    2. Jon Smirl says:
      
      August 19, 2019 at 23:05
      
      That $3 number is from single source. That ‘s not today’s spot price, more like commit to 1M chips spread over a year kind of price.
      
      Reply
gamiee says:

August 19, 2019 at 15:35

Hmm, the SoC have old AW logo, kinda strange

Reply
1. Jon Smirl says:
  
  August 19, 2019 at 20:49
  
  It is brand new chip. These may be sample units not produced on the high volume production lines.
  
  Reply
geokon says:

August 19, 2019 at 16:11

This is a bit tangential, but what’s a good option for getting a microphone array to play around with on the computer? I’m struggling to find good info and I thought someone might know a thing or two here

I saw the ReSpeaker-Mic-Array-v2-0 which is sorta what I want, but it’s a bit pricey but probably mostly do to the on-board DSP stuff which I don’t need. I’m actually more interested in trying my hand at implementing the DOA-type algorithms on the computer side myself instead of having it all done for me on-chip.

But even if I get a ReSpeaker and use it as a 4 mic sound card, I’m unclear as to how to get the microphone lines in-sync. For instance the ReSpeaker wiki shows recording from all 4 speakers at once in Audacity – but in code, how would I make sure the sound I’m reading off is lined up timewise? From what little sound programming I’ve done it seems you’re usually just pulling bytes off of some sound buffer and there aren’t exactly time stamps to go with it… (On a DSP ofcourse this is a nonissue!)

I could have some reference tone or impulse for calibrating the lines time-wise, but it’d be nice to avoid that.

If anyone has some pointers, let me know 🙂

Reply
1. Diego says:
  
  August 19, 2019 at 19:55
  
  I’d expect all ADCs to run of the same clock as a bare minimum.
  
  Reply
2. Jon Smirl says:
  
  August 19, 2019 at 20:47
  
  The absolute cheapest way is to buy a Playstation Eye. $7 and they are USB. 4 mic channels and ALSA driver already in Linux mainline.
  https://blog.michaelamerz.com/wordpress/trying-respeaker-mic-array-v2-0/
  
  Don’t worry about syncing. The syncing is done in hardware. When you read from the ALSA device you will get four samples at a time (one from each mic), those samples were simultaneously captured by the hardware and then serialized later for you to read them.
  
  Reply
  1. geokon says:
    
    August 19, 2019 at 21:00
    
    Thanks so much for the pointers Jon. The price isn’t a huge turn off, but it’s good to know my options. I think the Play station eye not being omnidirectional will needlessly complicate things for me at the moment. I will look into ALSA. I had done some stuff through Java’s sound API and it may has obfuscated the 4 samples-at-a-time feature b/c of it’s own API
    
    Reply
    1. Jon Smirl says:
      
      August 19, 2019 at 21:26
      
      You can plug the Playstation Eye into your desktop to allow for easy software development.
      
      Also, PulseAudio supports microphone arrays.
      https://arunraghavan.net/2016/06/beamforming-in-pulseaudio/
      
      Reply
      1. geokon says:
        
        August 20, 2019 at 09:24
        
        What’s funny is that I have the same exact webcam and I never realized it had two microphones 🙂
        I just confirmed I get both mics in Audacity (though I needed a reboot after plugging it in). Looks like I’ve got enough to get started. Thanks again for the links and info. It’s been very helpful for getting started with this
      2. Stuart Naylor says:
        
        April 19, 2020 at 05:45
        
        Gone this route also beamforming doesn’t work and is already dropped from webrtc and will be the same on the next update of pulse audio.
        
        I am not trolling you John but jusy went through all this and honestly I have read the same and they are bum steers. 🙂
      3. Jon Smirl says:
        
        April 19, 2020 at 06:59
        
        Beamforming was dropped from google’s webrtc because no one has array mics in their web browsers. AFAIK libwebrtc-audio-processing is going to keep the beamforming code. Pulse audio uses libwebrtc-audio-processing.
        https://freedesktop.org/software/pulseaudio/webrtc-audio-processing/
        There are no updates to libwebrtc-audio-processing. from after when beamforming was removed from google webrtc.
        
        It is not a huge amount of code.
    2. Jean-Luc Aufranc (CNXSoft) says:
      
      August 20, 2019 at 10:12
      
      There’s also a discussion about PS3 eye and Kinect (See comments) @ https://www.cnx-software.com/2017/08/31/those-charts-show-the-benefit-of-microphone-arrays-for-hot-word-detection/
      
      Reply
      1. geokon says:
        
        August 20, 2019 at 15:20
        
        Oh my. The rabbit hole keeps going deeper. I assume that that’s the Kinect V1.0. They’re a bit harder to get on Taobao but I did find this interesting PDF about beamforming in Java with the Kinect
        
        https://fivedots.coe.psu.ac.th/~ad/kinect/ch15/kinectMike.pdf
        
        Some details are a bit vague. He can only use two mics at a time (may be a Java issue) He spend a lot of time setting up the Java Sound API but then when he gets to beam-forming he drops it entirely and uses a library (that I think works with the Kinect SDK)
        
        The mics are also rather weirdly spaced on the Kinect.. so there are some unknowns and room for complications
        
        I think working directly with ALSA like Jon suggested is probably the more no-nonsense route. And his direct testimonial about the PS3 Eye leaves me with a bit more confidence 😛
        
        And umm.. any pointers for deciphering what Drone said: ” mic polling can be surprisingly effective, sometimes easily outperforming complex synthetic mic arrays.” ?
        
        From what I understand polling is about how you get the data from the system and synthetic mic arrays is a postprocessing thing – but I’m prolly misunderstanding something
      2. Jon Smirl says:
        
        August 20, 2019 at 22:24
        
        I’d get something working first and then worry about the PhD research stuff. You will need hot word detection – here is a free open source version.
        https://github.com/nyumaya/nyumaya_audio_recognition
      3. geokon says:
        
        August 21, 2019 at 17:39
        
        Thanks for that but I’ve actually got a few small projects in mind that aren’t ML related. More in the art/visualization space and point to point data over audio stuff. Nothing smart-speaker related! 🙂
      4. Jon Smirl says:
        
        August 21, 2019 at 20:06
        
        Example source for data over audio
        https://github.com/voice-engine/hey-wifi
      5. Jean-Luc Aufranc (CNXSoft) says:
        
        August 22, 2019 at 09:56
        
        Another option is Chirp which partnered with Arduino recently: https://www.cnx-software.com/2019/08/13/arduino-partners-with-chirp-to-enable-data-over-sound-m2m-connectivity/
        
        But it’s not open source.
      6. geokon says:
        
        August 22, 2019 at 13:57
        
        The library they use is very interesting: https://github.com/quiet/quiet
        As well as it’s dependencies liquid-dsp and libfec
        
        I’ll need to dig into this
      7. Jean-Luc Aufranc (CNXSoft) says:
        
        September 6, 2019 at 12:03
        
        Have you had time to look into it? I’ve asked somebody to test hey-wifi on Raspberry Pi, and he’s struggling to make it work.
  2. Stuart Naylor says:
    
    April 19, 2020 at 05:42
    
    PS3eye often gets recommended but the drivers in linux don’t work 100%. You get alsactl errors simple uitils like alsamixer don’t work.
    At one stage it did and then the changed the multuchannel IEC setups in linux and its not that great.
    Even the sample rate it states is all over the place.
    
    The best prob for the pi is the respeaker 2mic as then capture/playback are on the same clock and you might have a chance of software AEC.
    
    Without AEC if you have any media playing you go into forrest mode and are unable to tell it to stop or at least it will not hear you as its full of echo.
    
    PS3eye for a media playing AI is a piece of junk and where the last one somone advised me now resides.
    
    Reply
    1. Jon Smirl says:
      
      April 19, 2020 at 07:08
      
      PS3eye is only recommend because it is dirt cheap and has open source support. You could fix up the drivers and send a patch into the kernel if you want. It is not a complicated piece of hardware, shouldn’t be very hard to fix the drivers.
      
      The respeaker stuff works, but all of the interesting bits (beamforming, aec, agc) are closed source. That’s fine if you want a black box solution.
      
      Many ARM CPUs support PDM mics. Like this one:
      https://www.adafruit.com/product/3492
      Several times I have wired up four of these to the PDM inputs on RK1808, V5, H6, etc… It is very easy to do with common jumper wires. I solder the wires into the PDM boards and then plug the other ends into the dev board connectors.
      
      Reply
MisterTechBlog says:

August 21, 2019 at 10:14

Really cool gadget! (Like an upgraded Amazon Echo). BTW the official name seems to be “TMall Genie”, says so on the front of the gadget (in recessed text). I did a quick translation of the fun features… https://docs.google.com/presentation/d/1K1GtkdrRbwVOaXWCqo0TiFOKYjn1-CD3gy7PCjFdy64/edit?usp=sharing

Reply
Jean-Luc Aufranc (CNXSoft) says:

December 16, 2019 at 14:12

Somebody tried Tmall Elf Sugar and shot a video. It’s in Chinese, but turning on caption translation turns out mostly OK, once the “cat sperm” is out of the way…
https://www.youtube.com/watch?v=8Dcrb_ODwA8

Reply

Boardcon CM3588 Rockchip RK3588 System-on-Module designed for AI and IoT applications

43 Replies to “Allwinner R328 Smart Speaker & System-on-Module Spotted in China”

Leave a Reply Cancel reply

Leave a Reply