GreenWaves GAP8 is a Low Power RISC-V IoT Processor Optimized for Artificial Intelligence Applications

GreenWaves Technologies, a fabless semiconductor startup based in Grenoble, France, has designed the GAP8 IoT application processor. GAP8 is based on the RISC-V architecture and optimized for image and audio algorithms, including convolutional neural network (CNN) inference. It achieves high energy efficiency thanks to an 8-core computational cluster combined with a hardware convolution accelerator. The design is based on the open-source Parallel Ultra Low Power (PULP) computing platform, which is itself built on RISC-V.

The new processor targets industrial and consumer products that integrate artificial intelligence and advanced classification, such as image recognition, counting people and objects, machine health monitoring, home security, speech recognition, consumer robotics, wearables and smart toys.

Some of GAP8 processor specifications:

  • 1x extended RISC-V fabric controller core with 16 kB data and 4 kB instruction cache for system control
  • 8x extended RISC-V compute cores with 64 kB shared data memory and 16 kB shared instruction cache
  • 1x Hardware optimized synchronization unit
  • 1x Hardware Convolution Engine (HWCE)
  • Multi-channel 1D/2D DMA, plus a specialized multi-channel micro-DMA for autonomous peripheral support
  • Programmable Voltage Regulator
  • Real Time Clock
  • 2x programmable clocks
  • Secured execution support with Memory Protection Unit
  • 512 kB State Retentive L2 Memory
  • Optional external high speed low power SDRAM up to 16 MB, through HyperBus
  • 32 kHz external quartz, up to 250 MHz internal clock
  • I/O interfaces
    • 128 Mb/s LVDS IEEE compliant
    • Serial I/Q
    • UART
    • Quad SPI Master + additional SPI Master, SPI Slave
    • 1x I2S
    • 1x I2C
    • 1x Camera parallel interface
    • HyperBus (External Flash and RAM)
    • Up to 32 GPIOs
    • 4x PWM
  • Supply Voltage
    • 1.0 V to 1.2 V core VDD supply
    • 1.8 V to 3.3 V for I/Os
  • aQFN 84 package
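
The 8 compute cores share a single 64 kB data memory, so a common way to use them is to split the iterations of a loop between the cores. Below is a minimal sketch of static work partitioning, assuming each core knows its own index (0 to 7). `core_chunk` and `NUM_CLUSTER_CORES` are hypothetical names for illustration, not GAP8 SDK APIs.

```c
#include <stddef.h>

/* Hypothetical constant: the GAP8 cluster has 8 compute cores. */
#define NUM_CLUSTER_CORES 8

/* Compute the half-open range [*start, *end) of items that core `core_id`
 * should process when `n` items are split as evenly as possible across
 * NUM_CLUSTER_CORES cores. The first (n % NUM_CLUSTER_CORES) cores each get
 * one extra item. Generic static scheduling, not an SDK function. */
void core_chunk(size_t n, unsigned core_id, size_t *start, size_t *end)
{
    size_t base = n / NUM_CLUSTER_CORES;
    size_t rem = n % NUM_CLUSTER_CORES;
    /* number of "extra" items handed out to lower-numbered cores */
    size_t extra_before = core_id < rem ? core_id : rem;

    *start = core_id * base + extra_before;
    *end = *start + base + (core_id < rem ? 1 : 0);
}
```

Because all cores see the same shared memory, each one can simply work on its own slice of a common buffer without copying data between cores.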

The processor is capable of delivering up to 8 GOPS at a few tens of mW, or up to 200 MOPS at 1 mW, thanks in part to hardware-accelerated 5×5 convolutions. The company compared the (theoretical) performance of GAP8 and the STM32H7 (Cortex-M7) MCU on a CNN graph, and the results below clearly show the massive advantage the new processor has for that particular task.
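
For reference, here is the arithmetic behind a 5×5 convolution: each output pixel needs 25 multiply-accumulates, which is the work the hardware convolution engine is there to speed up. This is a portable plain-C sketch, assuming 16-bit fixed-point data and a "valid" (no padding) output. It is not the HWCE programming interface or GreenWaves' code.

```c
#include <stdint.h>

#define K 5 /* kernel size: 5x5 */

/* Reference "valid" 2D convolution (strictly a cross-correlation, as used in
 * CNN layers) of an h x w int16 image with a 5x5 int16 kernel stored
 * row-major in k[25]. The output is (h-4) x (w-4). Each sum is accumulated
 * in 32 bits, then right-shifted by `shift` to return to a fixed-point scale. */
void conv5x5_ref(const int16_t *in, int h, int w,
                 const int16_t *k, int shift, int32_t *out)
{
    int out_w = w - K + 1;

    for (int y = 0; y + K <= h; y++) {
        for (int x = 0; x + K <= w; x++) {
            int32_t acc = 0;
            /* 25 multiply-accumulates per output pixel */
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    acc += (int32_t)in[(y + ky) * w + (x + kx)] * k[ky * K + kx];
            out[y * out_w + x] = acc >> shift;
        }
    }
}
```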

Target | Clock | Time | Cycles | Active power
STM32H7 | 216 MHz | 99.1 ms | 21,405,600 | 60 mW
GAP8 | 15.4 MHz | 99.1 ms | 1,527,232 | 3.7 mW
GAP8 | 175 MHz | 8.7 ms | 1,527,232 | 70 mW
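
The columns of this comparison are linked by two simple relations: time = cycles / clock frequency, and energy per inference = active power × time. Here is a small sketch using only the figures from the comparison. It counts active power only and ignores sleep and idle consumption.

```c
/* Time for one inference in seconds: cycle count divided by clock frequency. */
double inference_time_s(double cycles, double clock_hz)
{
    return cycles / clock_hz;
}

/* Energy for one inference in millijoules: active power (mW) x time (s). */
double inference_energy_mj(double cycles, double clock_hz, double active_power_mw)
{
    return active_power_mw * inference_time_s(cycles, clock_hz);
}
```

With these figures, one inference costs roughly 5.9 mJ on the STM32H7, 0.37 mJ on GAP8 at 15.4 MHz and 0.61 mJ on GAP8 at 175 MHz. In other words, running GAP8 faster at higher power still uses only about a tenth of the STM32H7's energy per inference.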

Clocked at 15.4 MHz, GAP8 completes the task as fast as the STM32H7 while using only a fraction of the power (3.7 mW vs. 60 mW). Clocked at 175 MHz, it runs the task over 10 times faster (8.7 ms vs. 99.1 ms) with only slightly higher active power (70 mW vs. 60 mW). Another way to look at power consumption is the company's claim that the processor can classify a QVGA image every three minutes for 10 years on a small 3.6 Wh battery.
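
The battery-life claim can be turned into an energy budget. Below is a back-of-the-envelope sketch, assuming 365.25-day years and that the battery's full nominal capacity is usable. `battery_budget` and `budget` are illustrative names only.

```c
/* Assumption: 365.25-day years. */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

typedef struct {
    double avg_power_uw;         /* average power the battery can sustain (µW) */
    double classifications;      /* number of classification periods over the lifetime */
    double energy_per_period_mj; /* energy per period, active + sleep (mJ) */
} battery_budget;

battery_budget budget(double battery_wh, double years, double period_s)
{
    battery_budget b;
    double energy_j = battery_wh * 3600.0;   /* 1 Wh = 3600 J */
    double lifetime_s = years * SECONDS_PER_YEAR;

    b.avg_power_uw = energy_j / lifetime_s * 1e6;
    b.classifications = lifetime_s / period_s;
    b.energy_per_period_mj = energy_j / b.classifications * 1e3;
    return b;
}
```

For 3.6 Wh, 10 years and one classification every 180 s, `budget(3.6, 10.0, 180.0)` gives about 1.75 million classifications. That works out to an average budget of about 41 µW, or about 7.4 mJ per three-minute period. This has to cover the classification itself plus all the sleep time around it, and the estimate ignores battery self-discharge.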

Some typical use cases include:

  • Always-on face detection with a few mW of power
  • Indoor people counting / presence detection with years of autonomy
  • Sub $15 machine vision and voice control solutions for consumer robotics
  • Single-chip processing for 4-microphone voice capture and 10-word speaker-independent keyword spotting

The company also offers a development kit consisting of the GAPDUINO board, a sensor board, and a QVGA camera module. Besides the GAP8 IoT processor, the Arduino-compatible board features the following:

  • Memory / Storage – 256 Mbit SPI flash, I2C EEPROM, HyperBus combo chip with 512 Mbit flash + 64 Mbit DRAM
  • Camera connector for an external camera (e.g. Himax HM01B0)
  • USB port
  • USB to GAP8 JTAG + UART
  • Misc – Reset button, Configurable I/O voltage
  • Battery holder (SAF17500), DC connector
  • Arduino Uno compatible Master/Shield

GAP8 can be programmed like any MCU thanks to the GAP8 SDK, which includes:

  • The RISC-V GCC/GDB toolchain, with optimizer extensions for the extra instructions GreenWaves has added to GAP8
  • The MCU/Fabric Controller side tools include 2 OS choices (this list will be extended in the future): PULP OS, or Arm Mbed OS (for RISC-V/GAP8)
  • Cluster-side development tools – GAP8 AutoTiler, which generates C code that automates the movement of data between L2 or external memory and the cluster's shared L1 memory.
  • Code generators for the cluster – GAP8 Generator Library including different algorithms developed using the GAP8 AutoTiler. It includes CNN layers, FFT, Matrix Operations, FIR Filters, and more.
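
The AutoTiler is there to handle data movement between large, slower memory (L2 or external) and the cluster's 64 kB shared data memory. The following is a hand-written sketch of that general pattern, not AutoTiler output. An image is processed tile by tile: each tile is copied into a small local buffer, processed there, and written back. `copy_2d` is a hypothetical memcpy stand-in for the 2D DMA, and the 64×32 tile size is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative tile size (2 kB); the real cluster data memory is 64 kB and shared. */
#define TILE_W 64
#define TILE_H 32

/* Stand-in for a 2D DMA transfer: copy `rows` rows of `row_bytes` bytes, with
 * independent row pitches for source and destination. On GAP8 the cluster DMA
 * would do this in hardware; here it is plain memcpy. */
static void copy_2d(uint8_t *dst, size_t dst_stride,
                    const uint8_t *src, size_t src_stride,
                    size_t rows, size_t row_bytes)
{
    for (size_t r = 0; r < rows; r++)
        memcpy(dst + r * dst_stride, src + r * src_stride, row_bytes);
}

/* Invert an 8-bit image held in large memory ("L2") one tile at a time, using
 * a small local buffer ("L1"). Edge tiles are clipped to the image bounds. */
void invert_image_tiled(uint8_t *img, size_t width, size_t height)
{
    static uint8_t l1_tile[TILE_H * TILE_W];

    for (size_t ty = 0; ty < height; ty += TILE_H) {
        for (size_t tx = 0; tx < width; tx += TILE_W) {
            size_t h = (height - ty < TILE_H) ? height - ty : TILE_H;
            size_t w = (width - tx < TILE_W) ? width - tx : TILE_W;
            uint8_t *src = img + ty * width + tx;

            copy_2d(l1_tile, TILE_W, src, width, h, w);   /* L2 -> L1 */
            for (size_t r = 0; r < h; r++)                /* compute on the tile */
                for (size_t c = 0; c < w; c++)
                    l1_tile[r * TILE_W + c] = (uint8_t)(255 - l1_tile[r * TILE_W + c]);
            copy_2d(src, width, l1_tile, TILE_W, h, w);   /* L1 -> L2 */
        }
    }
}
```

A real tiler would likely also overlap transfers with computation (double buffering) and split each tile across the 8 cores. This sketch does neither and only shows the basic copy-in, compute, copy-out loop.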

You can find more details about the GAP8 processor, and/or pre-order the development kit (199 Euros) scheduled to ship in April 2018 on the GreenWaves website. The company is also attending Embedded World in Germany at the RISC-V Foundation booth (Hall 3A, Booth 3A-419).

Thanks to TLS for the tip.

12 Comments
jacky
6 years ago

Do the compute cores support the RISC-V vector extension?

blu
6 years ago

@jacky
Not according to this earlier article: https://www.cnx-software.com/2016/04/06/pulpino-open-source-risc-v-mcu-is-designed-for-iot-and-wearables/

BTW, has the vector extension been finalized already?

Philipp Blum
6 years ago

I mean, I like RISC-V. We need more open hardware. But the chip is still kind of expensive. There are cheaper ARM M3 and M0 chips out there. I hope we will see a big price drop in the future. Maybe you can get a huge discount when you buy 10.000.

aware
6 years ago

Wow. Philipp completely misses the point of this 8.1 core shared memory MCU…

geokon
6 years ago

Comparing to an STM32H7 seems a bit apples to oranges, no? Wouldn’t their target applications be better suited to some kind of CPU/GPGPU combination? Or is this filling in the gaps that GPUs can’t handle?

At the end of the day, this seems incredibly tricky to program, and relying on their custom libraries seems a bit scary b/c good matrix/FFT libraries are incredibly tricky to write. Even a big player like ARM can’t provide decent FFT libraries for their NEON instruction set. Maybe OpenCL-on-CPU is viable? I haven’t tried it, but I hear it’s okay-ish.

geokon
6 years ago


Gotcha, that makes sense. Thanks for the explanation. I figured there were low-power GPU-like solutions.

Martin Croome
6 years ago

@jacky

Hi Jacky, it does not support the ‘official’ vector extension which I believe has not been standardized yet. I might be wrong on this.

The direction of the official vector extension is very much towards HPC and we are very much oriented towards low energy so we/PULP designed our own. The vector extension works on both 8 bit and 16 bit fixed point operands and includes an extremely useful single cycle vector dot product with accumulate.

Hope this helps

Martin Croome
6 years ago

@geokon Hi. We compared to the H7 since Arm was publishing benchmarks on the M7 targeting exactly the same market as us. I’m not aware of any GPU that runs at the energy levels that we do. GAP8’s fabric controller/MCU core is pretty much as easy to program as any MCU on the market. You are correct that the 8-core cluster is a more difficult engine to program, but it follows a pretty classical OpenMP-type programming model. We are releasing some tools and pre-made examples to help with understanding the cluster. The SDK comes with open source…

blu
6 years ago

@Martin Croome
The M7 comparison you’ve carried is formidable. Do you have other power measurements as well (not necessarily against competitors) — FFT, FIR?

Martin Croome
6 years ago

@blu
Yes. On FFT we have sum stuff and we are working on keyword spotting performance as well MFCC -> DNN.

We will be publishing figures on these as blogs on our site over the next few weeks.

Martin Croome
6 years ago

some not sum 🙁
