Intel AVX10 SIMD instructions will succeed AVX-512 instructions. AVX10.2 will add support for Intel E-cores to bring multimedia and AI acceleration to low-power cores, while the earlier AVX10.1 will add version-based enumeration and make 512-bit instructions optional, but will still only work on Intel (Xeon) P-cores.
The new Intel Advanced Vector Extensions 10 (AVX10) architecture was unveiled in an update to the Advanced Performance Extensions (Intel APX), bringing AVX-512-like support to new hybrid processors with P-cores and E-cores, as well as potentially their entry-level versions with E-cores only.
Intel explains that the new SIMD architecture includes all the capabilities and features of the Intel AVX-512 ISA, both for processors with a 256-bit maximum vector register size and for processors with 512-bit vector registers. The AVX10 ISA adds a new version-based enumeration scheme that reduces the number of CPUID feature flags that need to be checked for feature support. As mentioned in the introduction, AVX10 works on both P-core and E-core-based processors, while AVX-512 could only be implemented on Performance cores.
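To make the version-based enumeration more concrete, here is a minimal sketch of how software might decode it. The bit positions are an assumption based on my reading of Intel's AVX10 specification (a presence bit in CPUID leaf 07H, sub-leaf 1, and a version number plus vector-length bits in leaf 24H) and should be checked against the current document. The type and function names (`avx10_info`, `avx10_decode`, `avx10_usable`) are hypothetical and only used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the AVX10 enumeration. Field names are ours, not Intel's. */
typedef struct {
    bool     supported; /* AVX10 is enumerated at all */
    unsigned version;   /* 1 for AVX10.1, 2 for AVX10.2, ... */
    bool     has_256;   /* 256-bit vectors available */
    bool     has_512;   /* 512-bit vectors available */
} avx10_info;

/* ASSUMED bit layout, to be verified against Intel's specification:
 *   leaf7_1_edx: EDX of CPUID.(EAX=07H, ECX=01H), bit 19 = AVX10 present
 *   leaf24_ebx : EBX of CPUID.(EAX=24H, ECX=00H), bits 7:0 = version,
 *                bit 17 = 256-bit support, bit 18 = 512-bit support */
avx10_info avx10_decode(uint32_t leaf7_1_edx, uint32_t leaf24_ebx)
{
    avx10_info info = { false, 0, false, false };

    if (((leaf7_1_edx >> 19) & 1u) == 0)
        return info; /* no AVX10: leaf 24H must not be interpreted */

    info.supported = true;
    info.version   = leaf24_ebx & 0xFFu;
    info.has_256   = ((leaf24_ebx >> 17) & 1u) != 0;
    info.has_512   = ((leaf24_ebx >> 18) & 1u) != 0;
    return info;
}

/* One version comparison plus one vector-length check, instead of a list
 * of per-feature flag tests. min_bits must be 256 or 512. */
bool avx10_usable(avx10_info info, unsigned min_version, unsigned min_bits)
{
    assert(min_bits == 256 || min_bits == 512);

    if (!info.supported || info.version < min_version)
        return false;
    return min_bits == 512 ? info.has_512 : info.has_256;
}
```

On x86 builds with GCC or Clang, the two register values could be read with `__get_cpuid_count()` from `<cpuid.h>`. The decoding is kept separate from the CPUID call here so the logic can be exercised on any machine.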
As stated in the technical paper accompanying the announcement, Intel decided to work on the new SIMD ISA for three main reasons:
- To continue to support a high performance, vector ISA with all the features of the existing Intel AVX-512 ISA.
- To create a converged vector ISA based on Intel AVX-512 that will be supported on all future Intel processors.
- To ease the developer task of verifying CPUID feature support.
If I understand correctly, AVX10 will be implemented on all future Intel processors and might not be optional like the AVX-512 ISA, but this will have to be confirmed.
Intel further clarifies that the converged version of the Intel AVX10 vector ISA will include Intel AVX-512 vector instructions with an AVX512VL feature flag, a maximum vector register length of 256 bits, eight 32-bit mask registers, and new versions of 256-bit instructions supporting embedded rounding. Optional 512-bit vector use is possible on supporting P-cores, but not on E-cores, and future P-core-based Xeon processors will continue to support all Intel AVX-512 instructions for backward compatibility.
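The 32-bit mask width follows from simple arithmetic: a mask needs one bit per vector lane, so a 256-bit vector of 8-bit elements needs 256 / 8 = 32 bits, while a 512-bit vector needs 64. The sketch below shows that calculation together with a scalar model of a merge-masked 256-bit add. It is an illustration only, not SIMD code, and the function names are hypothetical. The real counterpart of the second function would be the AVX-512VL intrinsic `_mm256_mask_add_epi32()`.

```c
#include <assert.h>
#include <stdint.h>

/* One mask bit per lane: mask bits needed = vector bits / element bits. */
unsigned opmask_bits(unsigned vector_bits, unsigned elem_bits)
{
    assert(elem_bits != 0 && vector_bits % elem_bits == 0);
    return vector_bits / elem_bits;
}

/* Scalar model of a merge-masked add on eight 32-bit lanes (one 256-bit
 * vector). Lanes whose mask bit is set receive a[i] + b[i] (wrapping),
 * all other lanes keep the value from src[i]. */
void masked_add_u32_256(uint32_t dst[8], const uint32_t src[8], uint8_t mask,
                        const uint32_t a[8], const uint32_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = ((mask >> i) & 1u) ? a[i] + b[i] : src[i];
}
```

With 32-bit elements, only 8 of the 32 mask bits are used for a 256-bit vector. The full 32 bits are only needed for byte-sized elements.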
The new vector ISA will also bring some performance benefits:
- Intel AVX2-compiled applications, re-compiled to Intel AVX10, should realize performance gains without the need for additional software tuning.
- Intel AVX2 applications sensitive to vector register pressure will gain the most performance due to the 16 additional vector registers and new instructions.
- Highly-threaded vectorizable applications are likely to achieve higher aggregate throughput when running on E-core-based Intel Xeon processors or on Intel products with performance hybrid architecture.
Existing Intel AVX-512 applications, many of them already using maximum 256-bit vectors, should see the same performance when compiled to Intel AVX10/256 at iso-vector length, while applications that can leverage greater vector lengths will be supported on Intel P-cores only.
Intel says that version 1 of the AVX10 vector ISA (AVX10.1) will first be implemented on Intel Xeon “Granite Rapids” processors that, according to some media reports, are expected to launch by 2024 or 2025, so it will likely take a long while before AVX10.2 is implemented on processors with E-cores.
Via Tom’s Hardware
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
What Intel says exactly:
“A “converged” version of Intel AVX10 with maximum vector lengths of 256 bits and 32-bit opmask registers will be supported across all Intel processors, while 512-bit vector registers and 64-bit opmasks will continue to be supported on some P-core processors.”
This means that, starting in 2025, all new Intel CPUs will support a 256-bit subset of AVX-512 and that AVX-512 has been rebranded as AVX10 (because most CPUs will no longer support 512-bit registers and instructions).
Only some CPUs, i.e. the server CPUs that contain only P-cores, will support the full AVX-512.
This is still great news, because the 512-bit length is just a minor feature of AVX-512. What matters are the mask registers and the many useful instructions that are missing in AVX.
Intel had to do this, otherwise their policy of using E-cores in most products would have made them completely non-competitive with AMD.
It looks complicated at first, but I think this is because the current situation is being described, and it is already quite complicated. With a bit of luck the future will be a bit simpler, and it’s very possible that new applications will simply ignore the old flags and consider anything before AVX10 as legacy and not supported, period.
Also, something I would like to see in AVX-whatever version is explicit support for atomics, particularly compare-and-swap. These act at the cache line level anyway, and it would be so much better to be able to pre-load a cache line into an AVX512 register, modify some parts in it and try to commit it! It would become hexadeci-CAS 🙂