Arm has just announced its 2020 Arm Mobile IP portfolio with no fewer than five IP blocks, including the Arm Cortex-A78 CPU, the Arm Mali-G78 and Mali-G68 GPUs, the Arm Ethos-N78 neural processing unit, and the Cortex-X Custom program, starting with Cortex-X1, the most powerful Arm core to date.
Arm Cortex-A78 CPU
Cortex-A78 highlights:
- Architecture – Armv8-A (Harvard)
- Extensions – Armv8.1, Armv8.2, Cryptography, and RAS; Armv8.3 (LDAPR instructions only)
- ISA support – A64, A32, and T32 (at EL0 only)
- Microarchitecture
  - Pipeline – Out of order
  - Superscalar
  - Neon / Floating Point Unit
  - Optional cryptography unit
  - Max number of CPUs in cluster – 4
  - Physical Addressing (PA) – 40-bit
- Memory system and external interfaces
  - 32KB to 64KB L1 I-Cache / D-Cache
  - 256KB to 512KB L2 Cache
  - Optional 512KB to 4MB L3 Cache
  - ECC and LPAE support
- TrustZone security
Cortex-A78 delivers 20% more performance than Cortex-A77 within the same power budget (one watt), but peak performance is only about 7% higher, and machine learning performance is basically unchanged. So the real benefit of Cortex-A78 is higher efficiency, which should lead to either more sustained performance or longer battery life, with Cortex-A78 consuming 50% less power than Cortex-A77 at the same performance.
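To put the two efficiency claims side by side, here is a minimal sketch that converts each one into a performance-per-watt ratio. It normalizes Cortex-A77 to 1.0 for both performance and power, which is an assumption made purely for illustration; the variable names are hypothetical, and the article does not state the operating point behind the 50% figure.

```python
# Cortex-A77 normalized to 1.0 (illustrative baseline, not a measured value).
A77_PERF = 1.0
A77_POWER = 1.0
a77_perf_per_watt = A77_PERF / A77_POWER

# Claim 1: +20% performance at the same power budget.
a78_perf_same_power = A77_PERF * 1.20
gain_same_power = (a78_perf_same_power / A77_POWER) / a77_perf_per_watt

# Claim 2: the same performance at 50% less power.
a78_power_same_perf = A77_POWER * (1 - 0.50)
gain_same_perf = (A77_PERF / a78_power_same_perf) / a77_perf_per_watt

print(f"perf/W gain at equal power:       {gain_same_power:.2f}x")
print(f"perf/W gain at equal performance: {gain_same_perf:.2f}x")
```

Under these assumptions the first claim works out to 1.20x and the second to 2.00x. The two figures differ because they describe different points on the power/performance curve, so they should not be read as a single efficiency number.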
We’ll provide more data below in our comparison with the Cortex-X1 processor.
More details about Cortex-A78 can be found on the product page, announcement post, and developer website.
Arm Mali-G78 GPU
As usual, Arm has announced an accompanying GPU and NPU alongside its latest Arm Cortex-A core, this time the Mali-G78 GPU with the following key features and specifications:
- Architecture – Second-generation Valhall architecture
- Number of Cores – 7 to 24 cores
- API support – OpenGL ES 1.1, 2.0, 3.1, 3.2; Vulkan 1.1, 1.2; OpenCL 1.1, 1.2, 2.0 Full profile
- AMBA 4 ACE, ACE-LITE, and AXI bus interface
- Configurable 512KB – 2MB L2 cache
- 4x/8x/16x MSAA Anti-aliasing
- Adaptive Scalable Texture Compression (ASTC) – Low Dynamic Range (LDR) and High Dynamic Range (HDR).
- Arm Frame Buffer Compression (AFBC) v1.3
Compared to Mali-G77, Mali-G78 is said to provide a GPU performance boost of up to 25% and to improve on-device ML capabilities by up to 15%. Mali-G78 is also more efficient, with a new Fused Multiply-Add (FMA) unit in the execution engine leading to a 30% energy reduction in that unit. The new Asynchronous Top Level feature, together with tiler and fragment dependency tracking improvements, plays a key role in increasing the performance of PC-like games such as Fortnite and PUBG.
There’s also a new “sub-premium” Mali-G68 GPU with many of the same features as Mali-G78 but limited to 6 cores for lower costs and power consumption.
More information can be found on the blog post, product page, and developer’s website.
Ethos-N78 NPU
The Ethos-N78 NPU supports up to 90 different configurations, with performance ranging from 1 TOPS to 10 TOPS and customizable area (inferences/s/mm²), throughput (inferences/s), and average DRAM bandwidth (GB/s). The new NPU also delivers up to twice the peak performance of Ethos-N77, 25% better performance efficiency, and 40% greater DRAM bandwidth efficiency.
Visit the product page and/or read Arm's blog post for more details.
Arm Cortex-X1 and Arm Cortex-X Custom Program
Usually, Arm would stop here with their new IP announcements, but this year is a little different, as the company has also introduced the Cortex-X Custom (CXC) program where partners can work in collaboration with Arm engineers to design a CPU closely meeting their specific requirements and go beyond Cortex-A performance, power, and area (PPA).
The first CPU to come out of the CXC program is the Arm Cortex-X1. It brings a 30% peak performance improvement over the Arm Cortex-A77 CPU and 22% over the just-announced Cortex-A78 core. The Cortex-X1 also delivers twice the machine learning (ML) performance of Cortex-A77.
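As a quick consistency check, the two Cortex-X1 figures can be combined to see what they imply for Cortex-A78 versus Cortex-A77. This is only a sketch: it assumes both percentages were measured on the same workload under the same conditions, which the announcement does not confirm, and the variable names are illustrative.

```python
# Peak performance ratios quoted for Cortex-X1.
x1_over_a77 = 1.30  # +30% over Cortex-A77
x1_over_a78 = 1.22  # +22% over Cortex-A78

# If both ratios share a baseline workload, dividing them gives A78 relative to A77.
a78_over_a77 = x1_over_a77 / x1_over_a78
implied_gain_pct = (a78_over_a77 - 1) * 100

print(f"Implied Cortex-A78 peak gain over Cortex-A77: {implied_gain_pct:.1f}%")
```

The result is roughly 6.6%, which lines up with the "about 7%" peak performance gain quoted for Cortex-A78 earlier in the article.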
It’s also possible to make full use of DynamIQ technology by combining one Cortex-X1 core with Cortex-A78 and Cortex-A55 cores to bring a specific boost to single-core performance (+30 percent) at the cost of a larger cluster area due to the more powerful core and larger 8MB L3 cache.
There’s neither a product page nor developer info for Cortex-X1 just yet, so it may take a bit longer to come to market. You can still read more about it in the Arm community blog post.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
Marketing at work. Scream 20% higher performance for a given power envelope, but put in the fine print the fact that your comparison was between 2.6 GHz in 7FF and 3.0 GHz in 5FF. Or celebrate +100% ML performance by merely doubling the cache. Who could have expected such great gains from such a sliver of parameter increase?
Yeah definitely beware of marketing 🙂 But in the last few years, Anandtech reviews have shown that ARM claims were rather accurate.
The +30% over Cortex-A77 for Cortex-X1 looks much more interesting. I hope we’ll soon get laptops with that chip.
I noticed that too. It's a 13.3% increase in clock frequency, which yields an actual 6.7% performance increase at the same clock rate.
That’s exactly what’s written on an ARM slide: +7% at ISO frequency. And that’s no fine print.
Yet these will be multi-$100 devices. I wish Arm would work a little with the Arm media box market to promote standardisation of Linux drivers for peripherals, etc. Also, since these devices are mains powered, with no battery worries or lack of a heatsink, allow higher clocking, etc. A standard 3- or 4-port PCI would allow custom use too.
Because Arm does not want to be outside of mobile or highly controllable environments. They want the markets that don’t care about kernels or drivers or peripherals. They’re happy to leave that dying market to Intel. Instead they are focused on disposable devices that don’t require longterm software support or servers that are happy to run LTS kernels for years instead of mainline.
>servers that are happy to run LTS kernels for years instead of mainline.
Maybe the lowend server stuff. People doing high performance/high throughput stuff might actually want to be able to use all of the new stuff happening with eBPF etc without having to backport it.
RHEL 7 is running a 3.10 kernel and RHEL 8 is on 4.18. ARM is targeting companies who use Redhat style distributions, not Ubuntu.
>RHEL 7 is running a 3.10 kernel and RHEL 8 is on 4.18.
Redhat kernels are heavily patched with all sorts of backported stuff so those numbers mean very little.
>ARM is targeting companies who use Redhat style distributions, not Ubuntu.
ARM doesn’t sell hardware. They are selling designs to whatever company thinks they can turn those designs into something that’ll make money.
Maybe there are companies making chips that want RHEL because they somehow think they can beat generic x86 machines in the “stack it high, sell it cheap” bog-standard hosting space, but I doubt it, because we haven’t seen anyone outside of Amazon actually manage it, and people have been going on about it for a long time.
IMHO it’s more likely that if you’re thinking of putting a core like this into a design you’re going to couple it with some very specific stuff that RHEL and LTS kernels have no hope in hell supporting.. and you want all the fun stuff that Facebook, Google etc are working on like eBPF, XDP because you aren’t spending all of that money just to serve wordpress blogs with apache or nginx.
So what silicon powers Roku, Amazon Fire TV, Nvidia Shield TV, Nintendo Switch, IoT boards, smart meters, routers, wireless earbuds, smart cars, digital video cameras, TVs, and smartwatches?
>So what silicon powers
Whatever is cheap to license and can be produced on whatever second hand fab has extra capacity that week.
Those products don’t use one core over another because of some quasi-religious love of how it runs poorly written C code that came out of some ancient version of GCC.
The point is arm is more than a one trick smartphone pony.
>The point is arm is more than a one trick smartphone pony.
Well I would hope so considering they are a fabless ip company. But you know licensing their high end cores for smartphones is probably a good chunk of their income these days.
Those all count as “highly controllable” environments. Random people are not trying to run random PCI-E cards they found on eBay for $10 on those. They are locked down in firmware, kernel, and supported accessory interfaces.
This whole industry is broken. Just look at it. We have new chips every year, often without any software support or, if there is any, with a Linux kernel that is already at least a year old. Why should they care when they can sell a new chip the following year? The problem is that they get rewarded for this behavior when they should get punished. Being sustainable is what should be rewarded.
Would have been nice to see this with one of the newer v8 versions with the new security extensions and so forth.
Doesn't this X1 look like a candidate for Apple's new ARM laptop?
I don’t think so yet considering the lower IPC (5 vs 7 or so). However I suspect that ARM wants to send a signal to such makers and gauge their interest. If there is some, they might issue an X2 or so with different performance levels focusing on more aggressive and expensive optimizations (maybe X3/X5/X7 to remind people of atoms or core iX).