Ambarella CV75S AI SoC brings Vision Language Models (VLM) and Vision Transformer Networks to cameras

Ambarella has been expanding its AI SoC portfolio, and the latest addition is the CV75S family of 5nm chips. The company claims this family introduces the most cost- and power-efficient SoC option for running the latest AI-based image processing like vision language models (VLMs) and vision transformer networks in security, robotics, conferencing, and sports cameras.

The CV75S family is the first in Ambarella’s lineup to integrate the latest CVflow 3.0 AI engine, which delivers three times the performance of the previous generation. CVflow 3.0 is a chip architecture built around the core algorithms used in computer vision. It features a dedicated vision-processing engine that is programmed through a high-level algorithm description and works with TensorFlow, Caffe, and PyTorch. This engine enables the SoC to perform trillions of operations per second at a fraction of the power consumption of leading GPUs and general-purpose CPU solutions.

These chips also feature the newest generation of the company’s image signal processor, two 1.6 GHz Arm Cortex-A76 cores, 4Kp30 H.264/H.265 video encoding, and USB 3.2 connectivity.

According to Ambarella’s VP of marketing and business development, this new family of SoCs will enable mass-market product designers to use the latest vision transformer technologies and VLMs for zero-shot image classification and multi-modal inferencing in real-time visual analytics. For instance, the CV75S chip can run a multi-modal VLM like CLIP (Contrastive Language-Image Pre-training) inside the camera to scan footage in real time and provide instantaneous results without requiring training before installation.
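To make the "no training before installation" point concrete, here is a minimal sketch of how CLIP-style zero-shot classification scores an image against free-text labels. It is not Ambarella's implementation and does not run a real model: the three-dimensional embedding vectors are made-up toy values standing in for the outputs of a CLIP image encoder and text encoder, and the function names and the temperature value are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, temperature=100.0):
    """Score one image embedding against every text-label embedding.

    `temperature` scales the similarities before the softmax; the value
    here is an illustrative assumption, not a tuned parameter.
    """
    labels = list(text_embeddings)
    logits = [
        temperature * cosine_similarity(image_embedding, text_embeddings[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Toy embeddings (hypothetical). In a real system these would come from
# the model's text encoder and image encoder respectively.
text_embeddings = {
    "a photo of a person": [0.9, 0.1, 0.0],
    "a photo of a car": [0.1, 0.9, 0.1],
    "a photo of a dog": [0.0, 0.2, 0.9],
}
image_embedding = [0.2, 0.8, 0.1]

scores = zero_shot_classify(image_embedding, text_embeddings)
best_label = max(scores, key=scores.get)
print(best_label)
```

The point of the sketch is that the label set is just a dictionary of text prompts: adding a new category means adding a new prompt and embedding it, not retraining the classifier.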

Ambarella introduced the N1 SoC series in January this year, and these chips differ from the new CV75S family in the AI models they target. The company pre-ports and optimizes the N1 chips to run LLMs (generative AI) and LLaVA models for multi-modal vision analysis, which come trained and fine-tuned to analyze multiple video streams (up to 32 cameras), as in video surveillance solutions. The CV75S, on the other hand, will run pre-trained and fine-tuned multi-modal VLMs and vision transformer networks inside cameras to identify things like scenes and objects from the camera feed in real time. These AI models are handy for autonomy applications in robots, drones, and cars. Pre-porting on both chip families helps reduce the customer’s time to market.

Object Detection and Identification in an Autonomous Car

As with other Ambarella AI systems, the CV75S is supported by the Cooper Developer Platform, which provides a flexible and modular developer environment comprising core, foundation, vision, and UX components to shorten time to market.

The CV75S is currently sampling, and Ambarella plans to bring this advanced AI-based image processing technology to cameras across a broad range of price points to suit different applications. I could not find any product page for the CV75S family of 5nm SoCs at the time of writing, so the only information I have is from the press release. What I am certain about is the power efficiency and performance of 5nm chips, which should make these cutting-edge image processing solutions feasible in a wide range of cost- and power-constrained cameras.

Thanks to TLS for the tip

Dennis Mwihia

Dennis Mwihia is a technical writer specializing in IoT, PCBs, SBCs, and single-board microcontrollers. He has worked with several companies in those areas and has over 5 years of research, writing, and software development experience.