Midgard architecture for Embedded GPUs (Mali-T604 / Mali-T658)

I’ve attended a webinar entitled “Harness the power and flexibility of the Midgard architecture for Embedded GPUs”, presented by Steve Steele, Product Manager at ARM Media Processing Division, and sponsored by EETimes.

Steve starts with the current GPU architecture, “Utgard”, used in the Mali-200, Mali-300 and Mali-400MP. These GPUs support resolutions up to 1080p and are used in many smartphones today, including the Samsung Galaxy S2 (Mali-400MP), which provides great graphics performance.

He then explains how mobile devices are used today and what performance we may expect in the future:

  • Mobile As Main compute platform:
    • New UI and Augmented Reality
    • Social Networks and emails
    • Content Creation/consumption
    • One device to multiple screens (e.g. LCD screen and TV via HDMI)
  • Evolving Processing Demand:
    • Graphics Complexity multiplied by 25
    • Increase in screen size (1080p resolution support).
    • Graphics API: Khronos OpenGL ES, Microsoft DirectX 11
    • Compute API: OpenCL, Renderscript Compute and Direct Compute.
Figure: Evolving Processing Demands (25x in content complexity per pixel and increased screen size)

After this overview, he moves on to the Midgard architecture and gives the agenda for this part of the presentation:

  • Native 64-bit GPU Architecture.
  • Job Manager hardware.
  • Driver compatibility across Mali GPUs.
  • GPU compute.
  • ARM Heterogeneous compute approach.

Native 64-bit GPU Architecture

He describes the main advantages of the Midgard architecture:

  • Balanced and efficient architecture (performance and power trade-off).
  • Multi-threaded architecture that allows hiding of memory latency.
  • 64-bit ALUs for both integers and (IEEE 754) floating point.
  • Coherency in GPU memory which is important for GPU compute.
  • Hardware job manager
  • Scalability to meet future requirements, which allows driver compatibility with future GPUs.
  • Desktop capability with 64-bit support (compatible with ARMv8 Architecture) and IEEE-754 Floating Point Double Precision.
  • GPU compute: Up to 1 Terabyte memory allocation, allows real-time image analysis and processing.
  • Integrated CPU/GPU application development for:
    • Augmented reality
    • Advanced user interface features such as physics, speech recognition and gestures.
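
To put the “up to 1 Terabyte memory allocation” figure from the list above in perspective, here is a small arithmetic sketch. It assumes “1 Terabyte” means 2^40 bytes (the webinar summary does not say whether a binary or decimal terabyte is meant) and simply shows why such an allocation cannot be addressed with 32-bit pointers.

```python
# Assumption: "1 Terabyte" is taken as a binary terabyte (2**40 bytes).
ONE_TERABYTE = 2**40

# Number of address bits needed to reach the last byte of the allocation.
bits_needed = (ONE_TERABYTE - 1).bit_length()

# Largest range a 32-bit address can cover (4 GiB).
max_32bit_range = 2**32

print(bits_needed)                      # 40
print(ONE_TERABYTE // max_32bit_range)  # 256
```

In other words, a 1 Terabyte allocation needs 40 address bits and is 256 times larger than what a 32-bit address space can cover, which is where a native 64-bit architecture helps.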

He then introduces the first Midgard GPU, the Mali-T604, which enables GPU computing and offers great performance, flexibility and graphics. It includes a hardware job manager and implements different techniques to reduce bandwidth, such as texture compression. This GPU is available now and is currently used by ARM partners.

The new Mali-T658 is an evolution of the Mali-T604 and supports up to 8 cores (vs. 4 for the T604) and 4 arithmetic pipelines (vs. 2 for the T604). Its graphics performance is 10 times greater than that of the Mali-400MP, and its compute performance is 4 times greater than that of the Mali-T604.
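
The 4x compute figure is consistent with simple arithmetic on the numbers above. The sketch below assumes the pipeline counts are per core and that compute throughput scales linearly with the total number of arithmetic pipelines at equal clock speed; neither assumption is stated in the webinar summary, and the dictionary and function names are only illustrative.

```python
# Figures quoted in the paragraph above (maximum configurations).
T604 = {"max_cores": 4, "pipelines_per_core": 2}
T658 = {"max_cores": 8, "pipelines_per_core": 4}

def total_pipelines(gpu):
    """Total arithmetic pipelines in a fully populated configuration."""
    return gpu["max_cores"] * gpu["pipelines_per_core"]

# Assumption: throughput scales linearly with pipeline count at the
# same clock speed; real workloads may scale differently.
ratio = total_pipelines(T658) / total_pipelines(T604)

print(total_pipelines(T604), total_pipelines(T658), ratio)  # 8 32 4.0
```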

Midgard Job Manager

In the next section, he explains the role of the hardware job manager, which handles load balancing and power management of the cores in hardware instead of in the driver. This reduces CPU load. I’ll skip the details here.
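
To give a rough idea of what load balancing and power management across cores mean, here is a toy software model. It is not ARM's algorithm (the real job manager is hardware and its policy is not described in this summary); it assumes a simple greedy policy where each job goes to the least-loaded core, and cores that receive no work become candidates for powering down. The function names and job costs are hypothetical.

```python
import heapq

def dispatch(job_costs, num_cores):
    """Assign each job to the currently least-loaded core (greedy)."""
    # Heap of (accumulated_load, core_id); the smallest load pops first.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    for job_id, cost in enumerate(job_costs):
        load, core = heapq.heappop(heap)
        assignment[core].append(job_id)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: load for load, core in heap}
    return assignment, loads

def cores_to_power_down(assignment):
    """Cores that received no jobs are candidates for power gating."""
    return sorted(core for core, jobs in assignment.items() if not jobs)

# Five jobs with made-up costs spread over four cores.
assignment, loads = dispatch([5, 3, 8, 2, 4], num_cores=4)
print(assignment)  # {0: [0], 1: [1], 2: [2], 3: [3, 4]}
print(loads)       # {0: 5, 1: 3, 2: 8, 3: 6}

# With only two jobs, two of the four cores stay idle.
light_assignment, _ = dispatch([5, 3], num_cores=4)
print(cores_to_power_down(light_assignment))  # [2, 3]
```

Doing this kind of bookkeeping in hardware is what lets the driver, and therefore the CPU, stay out of the scheduling loop.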

Drivers and Software Compatibility Across GPUs

Lower development costs can be achieved as:

  • The Midgard architecture supports standard APIs, allowing software reuse:
    • Graphics APIs: Khronos OpenGL ES and OpenVG, Microsoft DirectX 11
    • Compute APIs: Khronos OpenCL, Google Renderscript and Microsoft DirectCompute.
  • The same drivers can be used for all Midgard GPUs

GPU Compute (aka GPGPU)

GPU compute allows a range of new applications and performance improvements for:

  • Visual Computing
  • Physics Engines
  • Image Processing
  • Augmented Reality
  • Natural Speech Recognition
  • Computational Photography
  • Cryptography
  • 3D Graphics
Figure: GPU Compute Enables Future Applications (GPGPU, i.e. General Purpose GPU, computing applications)

ARM Heterogeneous Computing

Figure: ARM Heterogeneous System Design (big CPU, LITTLE CPU, GPU, Memory and Interconnect)

In the final section of the webinar, Steve explains ARM’s philosophy of using the right processor for the right task and optimizing performance and power efficiency at the system level.

They can achieve this by using ARM Cortex A processors (including big.LITTLE Processing with Cortex A7 and A15), Mali GPUs and the CoreLink Interconnect. Tasks will be switched between Cortex A7 and Cortex A15 in hardware depending on the workload, and the same will be done in the GPU thanks to the job manager. This will be transparent to the drivers.

Cortex A7 cores would be used for low-power, light processing tasks such as video playback and social networks, and Cortex A15 cores would be used for high-power, heavy processing tasks such as web browsing and augmented reality. The GPU would be used to accelerate 2D and 3D graphics, as well as to perform general purpose GPU computing (GPU Compute) where applicable.
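
The “right processor for the right task” idea can be sketched as a simple selection rule. This is only an illustration: in the system described in the webinar the switching is done by hardware and is transparent to the drivers, whereas the function below, its `threshold` parameter and the demand values (normalized between 0 and 1) are all made up for the example.

```python
from enum import Enum

class Unit(Enum):
    LITTLE = "Cortex A7"
    BIG = "Cortex A15"
    GPU = "Mali GPU"

def pick_unit(cpu_demand, data_parallel=False, threshold=0.5):
    """Pick a processing unit for a task (hypothetical policy).

    cpu_demand: made-up normalized load estimate between 0 and 1.
    data_parallel: True for graphics or GPU compute style workloads.
    """
    if data_parallel:
        return Unit.GPU
    return Unit.BIG if cpu_demand >= threshold else Unit.LITTLE

# Task names come from the paragraph above; the numbers are invented.
tasks = [
    ("video playback", 0.2, False),
    ("social networks", 0.3, False),
    ("web browsing", 0.7, False),
    ("augmented reality", 0.9, False),
    ("3D graphics", 0.6, True),
]
for name, demand, parallel in tasks:
    print(f"{name}: {pick_unit(demand, parallel).value}")
```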

The next generation “heterogeneous” processors will use Cortex A15 (big CPU) and Cortex A7 (LITTLE CPU) with Mali-T600 Series GPU and CoreLink CCI-400.

Finally, he mentions ARM’s ecosystem with an infographic listing companies and projects involved in operating systems and standards (e.g. Khronos, Linaro, …), gaming (e.g. Unity, Gameloft, …), middleware and services (e.g. Movial, Metaio, …), user interfaces (e.g. Qt, Mentor Graphics, …) and semiconductors (e.g. Samsung, NXP, …).

Further information is available for developers at http://www.malideveloper.com

The webinar is available on-demand at http://w.on24.com/r.htm?e=376015&s=1&k=00DD161285CBFA4D9D7786FB5D47EC4B and you can also download the presentation slides.
