Tomeu Vizoso has been working on an open-source driver for the NPU (Neural Processing Unit) found in the Rockchip RK3588 SoC over the last couple of months. The project has progressed nicely: object detection now runs at 30 fps using the SSDLite MobileDet model on just one of the AI accelerator's three cores.
Many recent processors include AI accelerators that only work with closed-source drivers. However, we had already seen reverse-engineering work on the Allwinner V831's NPU a few years ago, and earlier this year we noted that Tomeu Vizoso had released the Etnaviv open-source driver for the Vivante NPU in the Amlogic A311D. Tomeu has now also started porting his Teflon TensorFlow Lite driver to the Rockchip RK3588 NPU, which is closely based on NVIDIA's open-source NVDLA IP.
He started in March, leveraging the reverse-engineering work already done by Pierre-Hugues Husson and Jasbir Matharu, and was quickly able to run TensorFlow Lite's Conv2D and DepthwiseConv2D operations. Only two weeks later, the MobileNetV1 model was running on the Pine64 QuartzPro64 SBC with the same performance as the blob (closed-source binary).
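For readers unfamiliar with the two operations mentioned above, the minimal pure-Python sketch below contrasts them. It assumes stride 1, no padding, a channels-last layout, and a depth multiplier of 1; the function names are hypothetical and this is only an illustration of the math, not how Teflon or the RK3588 NPU actually compute these ops. In a regular Conv2D every output channel sums over all input channels, while DepthwiseConv2D filters each channel on its own, which is why MobileNet-style models lean on it to cut the multiply-accumulate (MAC) count.

```python
# Illustrative sketch: standard vs depthwise 2D convolution.
# Assumptions: stride 1, no padding, channels-last, depth multiplier 1.
# Hypothetical helper names; not Teflon's or the NPU's implementation.

def conv2d(x, w):
    """x: [H][W][Cin], w: [KH][KW][Cin][Cout] -> [H-KH+1][W-KW+1][Cout]."""
    H, W, Cin = len(x), len(x[0]), len(x[0][0])
    KH, KW, Cout = len(w), len(w[0]), len(w[0][0][0])
    out = []
    for i in range(H - KH + 1):
        row = []
        for j in range(W - KW + 1):
            pix = []
            for co in range(Cout):
                # Each output channel accumulates over ALL input channels.
                s = 0
                for ki in range(KH):
                    for kj in range(KW):
                        for ci in range(Cin):
                            s += x[i + ki][j + kj][ci] * w[ki][kj][ci][co]
                pix.append(s)
            row.append(pix)
        out.append(row)
    return out


def depthwise_conv2d(x, w):
    """x: [H][W][C], w: [KH][KW][C] -> [H-KH+1][W-KW+1][C]."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    KH, KW = len(w), len(w[0])
    out = []
    for i in range(H - KH + 1):
        row = []
        for j in range(W - KW + 1):
            pix = []
            for c in range(C):
                # Each channel is filtered independently by its own kernel.
                s = 0
                for ki in range(KH):
                    for kj in range(KW):
                        s += x[i + ki][j + kj][c] * w[ki][kj][c]
                pix.append(s)
            row.append(pix)
        out.append(row)
    return out


def macs_conv2d(oh, ow, kh, kw, cin, cout):
    # Multiply-accumulates for a standard convolution layer.
    return oh * ow * kh * kw * cin * cout


def macs_depthwise(oh, ow, kh, kw, c):
    # Multiply-accumulates for a depthwise convolution layer.
    return oh * ow * kh * kw * c


if __name__ == "__main__":
    # A 112x112 output with 3x3 kernels and 32 channels in and out.
    print(macs_conv2d(112, 112, 3, 3, 32, 32))  # 115605504
    print(macs_depthwise(112, 112, 3, 3, 32))   # 3612672
```

With 32 output channels the depthwise layer needs 32 times fewer MACs than the equivalent standard convolution, although in practice it is paired with a 1x1 (pointwise) Conv2D to mix channels.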
The work was much easier than on the VeriSilicon Vivante NPU because much of the reverse-engineering had already been done, and because NVDLA is open-source, at least some documentation was available, which was not the case for the Vivante NPU. As a result, it took only four weeks (not full-time) to get the object detection demo shown below working on the Rockchip RK3588's NPU at 30 FPS.
You'll find the source code for the Teflon project on the Freedesktop website, and you can follow the project's status on Tomeu's blog. Next up, Tomeu plans to write a kernel driver for mainline Linux in the drivers/accel subsystem. There's still much work to be done and it's unclear how long it will take, especially since he is working on several NPUs and will have to split his time between them unless additional contributors join the project(s).
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full-time later in 2011.
Minor correction: according to Tomeu’s blog post, that 30fps is when running on just one of the three cores!
Oops. I completely misread that part. I updated the post.
I did too at first!
It's the first time I've heard about 3 cores inside the NPU of the 3588.
More info on the internals of the RK3588 NPU in my post, where I discuss its use for LLMs.
Nicely done! Have you considered collaborating with Tomeu?
Rockchip is also working on its LLM support using its closed-source driver, but it takes a while, so it looks quite challenging to implement… That’s why people are only using the GPU for now.
https://www.cnx-software.com/2024/02/27/testing-ai-and-llm-on-rockchip-rk3588-using-mixtile-blade-3-sbc-with-32gb-ram/
The RK LLM SDK has already been released as a beta. Performance varies depending on the model, so you need to set your expectations accordingly.
Creepy
How so? Great to see you here Megi! Looking forward to the next update of your xnux blog
Not the work on the accelerator, just the object/face recognition tech in general. 🙂
Ah! Yes…
Exciting! Hardware wise these RK3588 boards are perfect for NVRs. It’s nice to see the software progress toward taking advantage of that on mainline.
It's just incredible to measure the amount of work, patience, talent, and so on needed to develop such an open-source project. Starting from close to nothing, without any reliable documentation, on quite an advanced and new topic, this is remarkable.
Excellent work!
Do you know which available cameras work with the RK3588? Last time I checked, not many were supported by the libcamera framework, and this limits the usability quite a lot.