Arm has just announced its 2020 Arm Mobile IP portfolio with no fewer than five IP blocks: the Arm Cortex-A78 CPU, the Arm Mali-G78 and Mali-G68 GPUs, the Arm Ethos-N78 neural processing unit, and the Cortex-X Custom program, starting with the Cortex-X1, the most powerful Arm core to date.
Arm Cortex-A78 CPU

Cortex-A78 highlights:
- Architecture – Armv8-A (Harvard)
- Extensions – Armv8.1, Armv8.2, Cryptography, and RAS; Armv8.3 (LDAPR instructions only)
- ISA support – A64, A32, and T32 (at EL0 only)
- Microarchitecture
  - Pipeline – Out of order
  - Superscalar
  - Neon / Floating Point Unit
  - Optional Cryptography Unit
  - Max number of CPUs in cluster – 4
  - Physical Addressing (PA) – 40-bit
- Memory system and external interfaces
  - 32KB to 64KB L1 I-Cache / D-Cache
  - 256KB to 512KB L2 Cache
  - Optional 512KB to 4MB L3 Cache
  - ECC and LPAE support
- TrustZone security
Cortex-A78 delivers 20% more performance than Cortex-A77 within the same one-watt power budget, but its peak performance is only about 7% higher, and machine learning performance is essentially unchanged. The real benefit of Cortex-A78 is therefore efficiency: it consumes 50% less power than Cortex-A77 at the same performance level, which should translate into either more consistent performance or longer battery life.
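The 20% and 50% figures describe two different operating points, so they imply different efficiency gains. As a rough back-of-the-envelope sketch, assuming performance per watt is simply relative performance divided by relative power at each point (real power/performance curves are non-linear), Arm's claims translate into these Cortex-A78 vs Cortex-A77 ratios:

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Return new/old performance-per-watt from relative performance and power."""
    return perf_ratio / power_ratio

# Same one-watt budget, 20% more performance (Arm's claim).
iso_power = perf_per_watt_ratio(perf_ratio=1.20, power_ratio=1.0)

# Same performance, 50% less power (Arm's claim).
iso_perf = perf_per_watt_ratio(perf_ratio=1.0, power_ratio=0.5)

print(f"Efficiency gain at iso-power: {iso_power:.2f}x")
print(f"Efficiency gain at iso-performance: {iso_perf:.2f}x")
```

The gain depends on where on the curve the comparison is made, most likely because power grows much faster than performance as clock speeds approach their maximum.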
We’ll provide more data and charts below in our comparison with the Cortex-X1 processor.
More details about Cortex-A78 can be found on the product page, announcement post, and developer website.
Arm Mali-G78 GPU

As usual, Arm is announcing an accompanying GPU, display processor, and NPU alongside its latest Arm Cortex-A core. This time, the GPU is the Mali-G78, with the following key features and specifications:
- Architecture – Second-generation Valhall architecture
- Number of Cores – 7 to 24 cores
- API support – OpenGL ES 1.1, 2.0, 3.1, 3.2; Vulkan 1.1, 1.2; OpenCL 1.1, 1.2, 2.0 Full profile
- AMBA 4 ACE, ACE-LITE, and AXI bus interface
- Configurable 512KB – 2MB L2 cache
- 4x/8x/16x MSAA Anti-aliasing
- Adaptive Scalable Texture Compression (ASTC) – Low Dynamic Range (LDR) and High Dynamic Range (HDR)
- Arm Frame Buffer Compression (AFBC) v1.3
Compared to Mali-G77, Mali-G78 is said to deliver up to 25% more GPU performance and up to 15% better on-device ML performance. It is also more efficient thanks to a new Fused Multiply-Add (FMA) unit in the execution engine, which cuts that unit’s energy consumption by 30%. The new Asynchronous Top Level feature, together with improved tiler and fragment dependency tracking, plays a key role in boosting performance in PC-like games such as Fortnite and PUBG.
There’s also a new “sub-premium” Mali-G68 GPU with many of the same features as Mali-G78, but limited to 6 cores for lower cost and power consumption.
More information can be found in the blog post, on the product page, and on the developer website.
Ethos-N78 NPU
The Ethos-N78 NPU supports up to 90 different configurations with performance ranging from 1 TOPS to 10 TOPS, and lets licensees tune area efficiency (inferences/s/mm²), throughput (inferences/s), and average DRAM bandwidth (GB/s). The new NPU also delivers up to twice the peak performance of Ethos-N77, 25% better performance efficiency, and 40% better DRAM bandwidth efficiency.
Visit the product page and/or read Arm’s blog post for more details.
Arm Cortex-X1 and Arm Cortex-X Custom Program
Usually, Arm would stop here with its new IP announcements, but this year is a little different: the company has also introduced the Cortex-X Custom (CXC) program, in which partners can work with Arm engineers to design a CPU that closely matches their specific requirements and goes beyond Cortex-A performance, power, and area (PPA).

The first CPU in the CXC program is the Arm Cortex-X1. It delivers a 30 percent peak performance improvement over the Arm Cortex-A77 and 22 percent over the just-announced Cortex-A78. The Cortex-X1 also offers twice the machine learning (ML) performance of the Cortex-A77.
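The Cortex-X1 figures can be cross-checked against the Cortex-A78 numbers given earlier. Assuming both the 30% (Cortex-X1) and roughly 7% (Cortex-A78) peak gains are measured against the same Cortex-A77 baseline and workload, which Arm’s materials may not guarantee, dividing one ratio by the other should land close to the quoted 22%:

```python
# Peak performance relative to Cortex-A77 (A77 = 1.0), from Arm's claims.
a78_peak = 1.07  # Cortex-A78: about 7% higher peak performance
x1_peak = 1.30   # Cortex-X1: 30% higher peak performance

# Relative gain of Cortex-X1 over Cortex-A78 implied by the two figures.
x1_vs_a78 = x1_peak / a78_peak
print(f"Cortex-X1 vs Cortex-A78: +{(x1_vs_a78 - 1) * 100:.1f}%")
```

This gives about +21.5%, in line with the roughly 22% announced, so the two claims appear consistent with each other.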
Thanks to DynamIQ technology, one Cortex-X1 core can also be combined with Cortex-A78 and Cortex-A55 cores to boost single-core performance (+30 percent), at the cost of a larger cluster area due to the bigger core and the larger 8MB L3 cache.
There’s no product page or developer information for Cortex-X1 just yet, so it may take a bit longer to come to market. You can still read more about it in the Arm Community blog post.

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.