Practical Applications and Benchmarks of GPU Computing via RenderScript and OpenCL with ARM Mali-T6XX GPU

Since the announcement of the ARM Mali-T604 in 2010, ARM has explained that GPGPU (General Purpose computing on GPU), aka GPU Compute, would be one of the key features of its new Mali graphics processors, and the company now expects GPGPU to become mainstream in embedded and mobile devices in 2014 and beyond. I’ve just come across a presentation by Roberto Mijat, technical marketing manager at ARM, entitled “Unleashing the benefits of GPU Computing with ARM Mali”, which shows practical applications and use cases where RenderScript or OpenCL can deliver massive performance improvements, at much lower power consumption, over the same parallel tasks processed by the CPU only. Let’s have a look at some of the most interesting slides.

GPU compute can be used for multiple applications in mobile, multimedia, and automotive sectors.

GPU Compute for H.265 / HEVC

HEVC, aka H.265, is the next-generation codec, delivering the same quality as H.264 at about half the bitrate. The problem is that most SoCs today don’t have VPUs supporting this new standard, CPUs are not quite powerful enough for 1080p decoding, and software decoding on the CPU requires a lot of energy and quickly drains the battery.

HEVC Processing Blocks

Luckily, many of the tasks in HEVC decoding involve parallel data processing, and these can be partially offloaded from the CPU to the newer GPUs supporting OpenCL or RenderScript. Several companies, including Ittiam, have developed HEVC implementations leveraging the GPU in ARM SoCs with very good results.
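To give an idea of what "parallel data processing" means here, the sketch below shows one simple step found in video decoders: adding the residual to the prediction and clipping the result to the valid sample range. It is only an illustration written in plain Python, not Ittiam's implementation nor the partitioning used in the ARM slides, and the names `reconstruct_sample` and `reconstruct_block` are made up for this example. The point is that each output sample depends only on its own inputs, so the samples can be processed in any order, which is what allows a GPU to assign one work-item per sample.

```python
def reconstruct_sample(pred, resid, bit_depth=8):
    """Kernel body: one output sample needs only one prediction and one residual."""
    max_val = (1 << bit_depth) - 1
    return min(max(pred + resid, 0), max_val)


def reconstruct_block(pred_block, resid_block, order=None):
    """Apply the kernel to every sample; `order` only changes the visiting order."""
    n = len(pred_block)
    out = [0] * n
    indices = order if order is not None else range(n)
    for i in indices:  # on a GPU, each index i would be a separate work-item
        out[i] = reconstruct_sample(pred_block[i], resid_block[i])
    return out


# Hypothetical sample values, chosen to exercise clipping at both ends.
pred = [120, 250, 3, 64]
resid = [10, 20, -8, 0]

forward = reconstruct_block(pred, resid)
backward = reconstruct_block(pred, resid, order=reversed(range(len(pred))))
print(forward, forward == backward)
```

Since the result is identical whichever order the samples are visited in, there are no dependencies between iterations, and that is the property OpenCL and RenderScript kernels rely on.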

CPU usage has been reduced by 50%, the frame rate has doubled, and energy consumption has been reduced by 20 to 30%.

GPU Compute for Image and Video Processing

Nvidia already touted the GPU compute capabilities of the Tegra 4 for computational photography, and in the ARM slides, we can see order-of-magnitude improvements over CPU processing.

High Dynamic Range (HDR) imaging is a technique that takes two shots (foreground/background) to generate a better image. This is computationally intensive, and GPU compute (OpenGL) can provide a speed-up of about 16x over a CPU-only implementation on an Arndale board with a Mali-T604 GPU.
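To see why merging two shots maps well to a GPU, here is a minimal sketch of a per-pixel blend in plain Python. It is not the HDR algorithm benchmarked in the ARM slides: the weighting scheme (favoring whichever pixel is closer to mid-grey) and the names `well_exposedness`, `fuse_pixel` and `fuse_images` are assumptions made purely for illustration. Images are represented as flat lists of 8-bit grey values.

```python
def well_exposedness(v):
    """Weight that is highest near mid-grey (127.5) and close to zero at 0 and 255."""
    return 1.0 - abs(v - 127.5) / 127.5 + 1e-6  # small epsilon avoids a zero weight sum


def fuse_pixel(a, b):
    """Blend one pixel from each shot, trusting the better-exposed one more."""
    wa, wb = well_exposedness(a), well_exposedness(b)
    return round((wa * a + wb * b) / (wa + wb))


def fuse_images(shot_a, shot_b):
    """Each output pixel depends only on the two input pixels at the same position."""
    return [fuse_pixel(a, b) for a, b in zip(shot_a, shot_b)]


# Hypothetical pixel values: crushed black, mid-grey, and blown-out white in shot A.
shot_a = [0, 128, 255]
shot_b = [128, 128, 100]
fused = fuse_images(shot_a, shot_b)
print(fused)
```

Every output pixel is computed independently of its neighbors, so the list comprehension could be replaced by one GPU work-item per pixel without changing the result.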

Other image processing algorithms are also greatly sped up, by between 3.5x and 15.7x, as shown in the table on the right. This time, the tests were performed on a Nexus 10 tablet (Exynos 5250 with Mali-T604) running Android, using RenderScript with software implemented by MulticoreWare.

GPGPU can also be used for super-resolution techniques, which aim to increase the resolution of imaging systems, as well as for video pre- and post-processing, leading to performance improvements of at least 3x and power consumption reduced by up to 80%.

GPU Compute for Computer Vision

Computer vision entails the acquisition, processing, analysis and understanding of sensor data (images) in order to derive information that enables decisions to be made. It seems particularly suited to GPU compute: a face detection algorithm (OpenCV) accelerated with OpenCL achieves 8.7x more detections per second and consumes 83% less energy, both on average, compared to the CPU-only implementation.


If you’re a developer and have an application that may leverage the GPU compute capabilities of the newer Mali GPUs, you may want to have a look at the Mali OpenCL SDK (Linux) and/or visit the Android Developer’s RenderScript page.
