Compile with ARM Thumb-2 to Reduce Memory Footprint and Improve Performance

ARM claims that Thumb-2 instructions (for ARM Cortex cores and all ARMv7 processors) provide performance improvements and smaller code size:

Thumb-2 technology is the instruction set underlying the ARM Cortex architecture which provides enhanced levels of performance, energy efficiency, and code density for a wide range of embedded applications.

For performance optimized code Thumb-2 technology uses 31 percent less memory to reduce system cost, while providing up to 38 percent higher performance than existing high density code, which can be used to prolong battery-life or to enrich the product feature set. Thumb-2 technology is featured in the  processor, and in all ARMv7 architecture-based processors.

Dave Martin (Linaro) recently posted a message entitled “ARM/Thumb-2 kernel size comparison” on the Linaro mailing list:


The results provided by Linaro are not as high as those claimed by ARM, but a 20% code size reduction is still impressive.

If you want to use Thumb-2 to compile your applications for a Cortex-A8/A9 core with GCC, export the following:

export CFLAGS="-mthumb -march=armv7-a"

You may also add -mtune=cortex-a8 or -mtune=cortex-a9 depending on your core.
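
As a minimal sketch of combining these flags, the snippet below picks the -mtune value from a core name. The function name thumb2_cflags is hypothetical and only used for illustration; the flags themselves are the ones given above.

```shell
# Hypothetical helper: build Thumb-2 CFLAGS for a given Cortex-A core.
# Only cortex-a8 and cortex-a9 get an -mtune option; anything else
# falls back to the generic ARMv7-A Thumb-2 flags.
thumb2_cflags() {
    core="$1"
    case "$core" in
        cortex-a8|cortex-a9)
            printf '%s\n' "-mthumb -march=armv7-a -mtune=$core"
            ;;
        *)
            printf '%s\n' "-mthumb -march=armv7-a"
            ;;
    esac
}

# Example: target a Cortex-A9 core.
CFLAGS="$(thumb2_cflags cortex-a9)"
export CFLAGS
echo "CFLAGS=$CFLAGS"
```

Most autotools-style builds and many Makefiles pick up CFLAGS from the environment, but that depends on the project's build system.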

The Linaro team also ran CoreMark, an embedded systems benchmark, in January 2011 with different compilation options, including arm, armv7-a, thumb and thumb-2, on a 1 GHz processor featuring a Cortex-A9 core.

The best options overall, and for armv7-a, Thumb-2 and Thumb-1:

  • The best is -O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
  • The best armv7-a is -O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8 at 95.2% of overall best
  • The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of overall best
  • The best Thumb-1 is -O2 -mthumb -march=armv5te -mtune=cortex-a8 at 64.4% of overall best
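
To try the four flag sets above on your own code, a rough sketch could look like the following. The cross-compiler name (arm-linux-gnueabi-gcc) and the source file (bench.c) are assumptions, not part of the Linaro run; when either is missing, the loop only prints the commands it would execute. The object file size in bytes is a crude proxy for code size, and a cross binutils size tool would be more precise.

```shell
# Assumptions: compiler name and source file are placeholders, override as needed.
CC="${CC:-arm-linux-gnueabi-gcc}"
SRC="${SRC:-bench.c}"

n=0
# One flag set per line, copied from the list above.
while IFS= read -r flags; do
    n=$((n + 1))
    out="bench-$n.o"
    if command -v "$CC" >/dev/null 2>&1 && [ -f "$SRC" ]; then
        # $flags is left unquoted on purpose so it splits into separate options.
        "$CC" $flags -c "$SRC" -o "$out" && echo "$flags: $(wc -c < "$out") bytes"
    else
        echo "would run: $CC $flags -c $SRC -o $out"
    fi
done <<'EOF'
-O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
-O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8
-O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8
-O2 -mthumb -march=armv5te -mtune=cortex-a8
EOF
```

This compares object size only; reproducing the performance figures would require running the benchmark on the target hardware.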

Thumb-1 code is slower, but that is to be expected since it focuses on code size. Thumb-2 code should yield similar or even faster results than armv5 code; I suppose it does not yet because the code and compiler are still being optimized, and Thumb-2 will become faster later on.

See the full details at Linaro Coremark Run.
