Quadric Chimera GPNPU IP combines NPU, DSP, and real-time CPU into one single programmable core

A typical chip for AI or ML inference would include an NPU, a DSP, a real-time CPU, plus some memory, an application processor, an ISP, and a few more IP blocks. Quadric Chimera GPNPU (general purpose neural processor unit) IP combines the NPU, DSP, and real-time CPU into one single programmable core.

According to Quadric, the main benefit of such design is simplifying system-on-chip (SoC) hardware design and subsequent software programming once the chip is available thanks to a unified architecture for machine learning inference as well as pre-and-post processing. Since the core is programmable it should also be future-proof.

Three “QB series” Chimera GPNPU cores are available:

Chimera QB1 – 1 TOPS machine learning, 64 GOPS DSP capability
Chimera QB4 – 4 TOPS ML, 256 GOPS DSP
Chimera QB16 – 16 TOPS ML, 1 TOPS DSP

Quadric says the Chimera cores can be used with any (modern) manufacturing process and can achieve up to 1 GHz operation using 16nm or 7nm processes. Two or more Chimaera cores can be paired together for applications requiring more performance.

The Chimera GPNPU architecture is said to deliver ML inference performance similar to the efficiency of dedicated CNN offload engines, but with full programmability and the ability to run any ML operator, which may be critical if data scientists find better models requiring new operators that cannot run on existing fixed-function accelerators.

Quadric’s architecture combines the neural network graphs and C++ code into a single software code stream and only one toolchain is used for scalar, vector, and matrix computations. Memory bandwidth is also optimized by a single unified compilation stack that also helps lowering power consumption. Custom operators can be added by a software developer by writing a C++ kernel utilizing the Chimera Compute Library (CCL) API, then compiling that kernel using the Chimera SDK. The task of software engineers should be further simplified since they only work with one core and do not need to handle a multi-core heterogenous system.

It may take a while before the solution gets into actual chips, however. While the Chimera architecture has already been tested in silicon, Quadric is looking for customers whose chip design teams can start an IP evaluation this fall or winter, so I assume it may be a year or more before commercial SoCs based on the Quadic Chimera architecture become available. Additional details can be found on the products page and the press release.

Thanks to TLS for the tip.

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.