ARM NEON Tutorial in C and Assembler

The Advanced SIMD extension (aka NEON or “MPE” Media Processing Engine) is a combined 64- and 128-bit single instruction multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications similar to MMX, SSE and 3DNow! extensions found in x86 processors. Doulos has a video tutorial showing how you can exploit NEON instructions in assembler, how to modify your C code and provides the compile options for gcc to enable NEON during the build. Abstract: With the v7-A architecture, ARM has introduced a powerful SIMD implementation called NEON™. NEON is a coprocessor which comes with its own instruction set for vector operations. While NEON instructions could be hand coded in assembler language, ideally we want our compiler to generate them for us. Automatic analysis whether an iterative algorithm can be mapped to parallel vector operations is not trivial not the least because the C language is […]

Faster JPEG decoding on ARM with libjpeg-turbo and NEON Instructions

libjpeg-turbo is based on libjpeg, but uses SIMD instructions (MMX, SSE2, etc.) to accelerate JPEG compression and decompression on x86 targets. On such systems, libjpeg-turbo is generally 2-4x as fast as the original version of libjpeg with the same hardware. ARM does not support MMX or SSE2 instructions, but it has its own SIMD instructions processed by the NEON Engine on ARM Cortex Core A5, A8, A9 and A15. ARM claims that “NEON technology can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis by at least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD.” Linaro worked on libjpeg-turbo and added NEON support to it. The code is available on launchpad at https://code.launchpad.net/~tom-gall/linaro/libjpeg-turbo Linaro has also provide benchmark result for libjpeg-turbo with a 12 Mpixel image on TI OMAP4 (Pandaboard) using the […]

UP 7000 x86 SBC