Arm CPU Roadmap to 2022 – Matterhorn and 64-bit only Makalu CPU Cores

The Arm DevSummit 2020, previously known as Arm TechCon, is taking place virtually this week until Friday the 9th, and besides the expected discussions about NVIDIA’s purchase of Arm, there have been some technical announcements, notably a high-performance CPU roadmap for the next two years: Matterhorn (Cortex-A79?) in 2021, and Makalu (Cortex-A80?), the first 64-bit-only Arm CPU core, in 2022.

[Chart: Arm roadmap – peak performance of Matterhorn & Makalu]

The company did not provide many details about the new cores, but it expects a peak performance uplift of up to 30% from the Cortex-A78 to the future Makalu generation. It should be noted that while performance keeps improving, the curve has flattened a bit.

But the main announcement is that starting in 2022, all high-end Arm CPU cores (i.e. the “big” cores) will be 64-bit only. So far, most Cortex-A cores have supported both the 32-bit (AArch32) and 64-bit (AArch64) instruction sets, and as we noted four years ago, the latter not only makes it possible to address more memory, but 64-bit instructions can also boost performance by 15 to 30% compared to 32-bit instructions.

Arm explains it made the move because complex digital immersion experiences and compute-intensive workloads from future AI, XR (AR and VR), and high-fidelity mobile gaming experiences require 64-bit for improved performance and compute capabilities. The move to 64-bit only should also lower the cost and time-to-market of mobile apps, since developers of apps targeting high-end devices will be able to focus on 64-bit development only.

https://twitter.com/ArmMobile/status/1313864805615370241

Since most phones will likely ship with DynamIQ SoCs combining 64-bit-only big cores and 32-bit/64-bit LITTLE Arm cores, “legacy” 32-bit apps should still be able to run on those phones, but only on the LITTLE cores. What’s a little confusing is that Arm talks about “64-bit only mobile devices expected to arrive by 2023”, implying 32-bit apps will not be supported at all. We’ll have to wait a little longer to understand the implications of the move.
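
To make the asymmetric scenario concrete, here is a minimal Linux C sketch of how a process can be restricted to a subset of cores, the way 32-bit tasks would be confined to the AArch32-capable LITTLE cores. The assumption that the LITTLE cores are CPUs 0-3 is made up for this example; on real asymmetric SoCs the kernel scheduler would enforce this itself for 32-bit tasks.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    /* Hypothetical layout: the LITTLE cores are CPUs 0-3 */
    for (int cpu = 0; cpu < 4; cpu++)
        CPU_SET(cpu, &set);

    /* Restrict the calling process (pid 0 = self) to those cores */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("Now confined to the (assumed) LITTLE cores\n");
    return 0;
}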

11 Replies to “Arm CPU Roadmap to 2022 – Matterhorn and 64-bit only Makalu CPU Cores”

  1. SBC and TV box manufacturers can always work on better cooling to add more GHz to their designs, something GPU manufacturers have more experience in.

    That would help with the flattening performance curve.

    Are the A53 and A55 in lower-end phones 64-bit instruction sets?

    1. All Cortex-A7x/A5x cores are 64-bit, but they also support 32-bit instructions. Makalu will not support 32-bit instructions at all.

      1. Yes, I follow that, but with the lower cores still around, mid-range to bottom-budget products that run both 32-bit and 64-bit could remain viable for several years to come.

        Unless Google says that’s it and removes all 32-bit APKs from Android.

        TV, TV boxes, etc.

        Is Amazon Fire TV 64-bit only, or both 64-bit and 32-bit?

        And what is Roku OS: 32-bit, 64-bit, or both?

        1. I think that a (painful) model where the small cores support 32-bit but not the large ones will appear and last a long time. While most modern OSes have no problem with 64-bit and will trivially switch (after absorbing roughly 30% code size inflation and 50% data size inflation), other mainstream OSes like Windows are having a much more painful migration path due to being LLP64. For those not used to this, it means that a “long” remains 32-bit and can no longer hold a pointer once you switch to 64-bit. You have to store it in a “long long”, but then that breaks 32-bit ports (hence the more desirable intptr_t; a minimal sketch follows below). Making such changes is not only a matter of reviewing your own code, but also of adapting the API of all the libs you depend on, and those cannot change without breaking changes, so it’s never the best moment to switch. This explains why so many x86 tools on Windows are still 32-bit. And Windows is probably not a negligible platform for Arm, so this will definitely require some effort from everyone involved, and the solution consisting of letting old apps run on a reasonably fast A55 could suit most users.
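
          A minimal C sketch of the LLP64 trap described above; the variable names are made up for illustration:

          #include <stdint.h>
          #include <stdio.h>

          int main(void)
          {
              int value = 42;

              /* Legacy pattern: stashing a pointer in a long. Fine on ILP32
               * (long = pointer = 32 bits) and on LP64 Linux (both 64 bits),
               * but broken on LLP64 Windows, where long stays 32-bit while
               * pointers grow to 64 bits, so the cast truncates the address:
               *
               *     long stashed = (long)&value;   -- unsafe on LLP64
               */

              /* Portable pattern: intptr_t is defined to be wide enough to
               * hold a pointer on every data model. */
              intptr_t stashed = (intptr_t)&value;
              int *restored = (int *)stashed;

              printf("sizeof(long)=%zu sizeof(void *)=%zu value=%d\n",
                     sizeof(long), sizeof(void *), *restored);
              return 0;
          }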

        2. The cut-off is usually the amount of memory: on machines with less than 1GB, one would definitely want to run 32-bit code, while on machines with more than 2GB, one would definitely want to run 64-bit code. In the middle it could be either one, often depending on other factors like existing software. (A rough sketch of this rule of thumb in code appears at the end of this comment.)

          This means that any machine with a lot of RAM tends to have a 64-bit core (A35/A53/A55/A7x) out of necessity, while low-end machines tend to run exclusively 32-bit code even if they have 64-bit capable cores. All the current Amazon Fire tablets fall into this category, same for Android Go phones and a lot of TV boxes, presumably including Roku OS and Fire TV (I have no specific information on those).

          From Google’s previous announcements, I would assume that they plan to remove support for mixed 32/64 environments in the future, but I expect pure 32-bit userland on 64-bit CPUs (with either 32-bit or 64-bit kernel) to stay around for much longer on those machines, presumably until 2GB LP-DDR4 becomes so cheap that it’s not worth bothering with less.

          Also, the rules that Google makes for its app store don’t necessarily apply to others like F-Droid, Amazon, Tencent or Huawei, which each get to make their own decisions about what binaries they allow on which platform.
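
          A rough Linux-only C sketch of the RAM-based cut-off heuristic from the top of this comment. The 1GB/2GB thresholds are the ones I gave above, a rule of thumb rather than any official rule:

          #include <stdio.h>
          #include <sys/sysinfo.h>

          int main(void)
          {
              struct sysinfo si;

              if (sysinfo(&si) != 0) {
                  perror("sysinfo");
                  return 1;
              }

              /* totalram is expressed in units of mem_unit bytes */
              unsigned long long total =
                  (unsigned long long)si.totalram * si.mem_unit;
              double gib = total / (1024.0 * 1024.0 * 1024.0);

              if (gib < 1.0)
                  printf("%.1f GiB RAM: a 32-bit userland is the likely choice\n", gib);
              else if (gib > 2.0)
                  printf("%.1f GiB RAM: a 64-bit userland is the likely choice\n", gib);
              else
                  printf("%.1f GiB RAM: either could make sense\n", gib);
              return 0;
          }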

          1. Hi Arnd! I generally agree with your point above, but there are two exceptions: even with less than 1GB you may be interested in 64-bit to get the new instructions that come with it by default (e.g. idiv) or optionally (crypto, crc32, etc.; a runtime probe for these is sketched below). And even with more than 2GB you may be interested in a more compact and more efficient instruction set that often provides higher performance in userland thanks to better L1I cache hit rates, as long as your application does not need more than 2GB per process.

            That’s where the compatibility mode is interesting. I’ve been doing that in my build farm, where the kernel runs in 64-bit and eases memory management, while the compilers are built in thumb2 and are significantly faster than when built with the armv8 instruction set (typically +20%).
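
            On Linux, the presence of those optional instructions can be probed from the auxiliary vector. A minimal AArch64-only C sketch (on 32-bit Arm the equivalent bits live in AT_HWCAP2 instead):

            #include <stdio.h>
            #include <sys/auxv.h>
            #include <asm/hwcap.h>   /* HWCAP_AES, HWCAP_CRC32 on arm64 */

            int main(void)
            {
                unsigned long hwcap = getauxval(AT_HWCAP);

                /* Each optional extension is advertised as a hwcap bit */
                printf("AES instructions:   %s\n", (hwcap & HWCAP_AES) ? "yes" : "no");
                printf("CRC32 instructions: %s\n", (hwcap & HWCAP_CRC32) ? "yes" : "no");
                return 0;
            }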

          2. I am guessing there are a good three to five years of low-price product life left for 32-bit/64-bit. People still buy the RPi Zero.

          3. Right, I was oversimplifying. When you have a specific application you are optimizing for, there are clearly additional factors and you may end up going the other way, as you describe.

            On the specific example of compilers, I have done some experiments as well with 32-bit binaries on x86. What I found there was that x32 compiler binaries tend to be the fastest by a good margin, but the complexity of maintaining them doesn’t seem worth the effort.
            64-bit compilers appear to suffer mostly from the additional memory and cache usage of larger pointers within the internal data structures, while i386 compilers appeared to be worse at cross-compiling code for 64-bit targets (regardless of target architecture) because of the problems of doing 64-bit integer math in a 32-bit binary. I don’t have those numbers any more, and it’s possible that I’m misremembering things.

          4. You have not misremembered your numbers, I made the exact same observations. I’m seeing about 30% performance loss when building 64-bit binaries with a 32-bit compiler, making it less interesting to use armv7 than armv8 to cross-compile x86_64. Like you, I concluded that the heavy 64-bit calculations on 32-bit were the culprit (the kind of arithmetic sketched below). And I also gave up on x32 given that nobody uses it, some issues remain in the configure scripts, and it’s a bit of a pain to set up. Do you know if there’s anything equivalent in the ARM world? I’ve tried armv8l, which in fact seems to be purely 32-bit (e.g. likely for Cortex-A32), and it resulted in similar code size and performance as armv7 (and probably the exact same code in fact, just the aarch32 instruction set).
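
            For illustration, the kind of 64-bit arithmetic that penalizes a 32-bit compiler binary when it targets 64-bit:

            #include <stdint.h>
            #include <stdio.h>

            /* On AArch64 this compiles to a single MUL; on 32-bit ARMv7 the
             * compiler has to synthesize it from UMULL/MLA sequences, with
             * each operand occupying two registers. A compiler cross-compiling
             * for a 64-bit target does this kind of math constantly (offsets,
             * constant folding, target address computations). */
            static uint64_t mul64(uint64_t a, uint64_t b)
            {
                return a * b;
            }

            int main(void)
            {
                printf("%llu\n",
                       (unsigned long long)mul64(123456789ULL, 987654321ULL));
                return 0;
            }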

          5. There is an equivalent of x32 for arm64, but we never merged it into mainline because the experience with x32 (and MIPS n32) showed that there just isn’t enough demand compared to the work involved in properly supporting an extra ABI in distros.

            The arm64 ILP32 patches for the kernel and glibc are cleanly implemented and I believe they still have a small number of users, and compiler support is all there, but the current approach of continuing to support AArch32 on the “little” Arm cores while moving the “big” cores to 64-bit only is a better outcome in my opinion.

            Systems that use the big cores are rarely the ones that benefit from 32-bit user space, and removing the old instructions allows a better overall CPU design in terms of performance/power/area (PPA).
            The Cortex-A34 didn’t really benefit that much because it only left out the 32-bit instruction path rather than being designed from scratch for A64.

            It will be interesting to see which direction future little cores take, given that there is clearly a benefit to designing cores primarily for 64-bit, but there is also still demand for 32-bit workloads, both for legacy reasons and for efficiency.

          6. Interesting issues in choosing between 32- and 64-bit mode.

            Back when the i386 was new, I wanted to support both 16-bit and 32-bit code on UNIX. This would have allowed the majority of programs to be compiled compactly, with only the few that would benefit from 32-bit compiled that way. But the 16-bit mode was so inferior that the UNIX distribution (System V Release 4) didn’t want to support 16-bit.

            With x86 and ARM it would be a lot easier to pull this 32/64 dual mode off.

            It used to be the case that Sun SPARC kernels were all 64-bit but most SPARC programs were 32-bit. I assumed that this was because the penalty in density was much higher on SPARC than on x86. I vaguely think that this is true for AIX on Power too.

            PS: the 32-bit ARM architecture supports PAE (Arm’s LPAE extension), so it can handle more than 4GiB of RAM. However, I infer that many SoCs don’t have enough address pins for more than 4GiB.
