Perfetto Profiler Now Supports Mali GPU Hardware Counters via Panfrost

Perfetto is an open-source system profiler, app tracer, and trace analyzer for Linux, Android & Chrome platforms, and user-space apps. The program can already visualize CPU and memory usage, as well as power consumption. GPU support is more limited, as the program can only sample the GPU frequency when the driver outputs that information via ftrace. But Perfetto is also extensible thanks to a Tracing C++ SDK that “allows userspace applications to emit trace events and add more app-specific context to a Perfetto trace”. Collabora made use of the tracing SDK to add support for Mali Midgard GPU performance profiling to the gfx-pps project, using the Mali GPU hardware counters exposed by the open-source Panfrost Mali GPU driver. After following the installation instructions, you’ll be able to run the following executables for tracing and profiling:
traced – tracing service
traced_probes – OS probes service
perfetto – command-line tool for recording traces
producer-gpu – providing the Panfrost data […]

GNOME Renders on Arm Mali-G31 Bifrost GPU with Fully Open Source Code

We first wrote about the Panfrost open-source Arm Mali GPU driver getting initial support for the Mali-G31 Bifrost GPU in late April, when engineers at Collabora managed to run some basic demos. Progress has been fast, as the company has now implemented support for all major features of OpenGL ES 2.0 and some features of OpenGL 2.1. That means hardware based on the Arm Mali-G31 GPU, such as the ODROID Go Advance (used for testing), can run Wayland compositors with zero-copy graphics, including GNOME 3, every scene in the glmark2-es2 benchmark, and some 3D games such as Neverball. All without any binary blobs. The company also claims support for the hardware-accelerated video players mpv and Kodi. In that setup, the Panfrost driver renders the user interface, while the open-source Amlogic video decoder driver developed by BayLibre handles hardware video decoding. All changes are already included in upstream Mesa with no out-of-tree patches required, and Bifrost support […]

Arm Announces Cortex-A78 CPU, Mali-G78 GPU, Ethos-N78 NPU and Custom Cortex-X1 Core

Arm has just announced its 2020 Arm Mobile IP portfolio with no fewer than five IP blocks, including the Arm Cortex-A78 CPU, Arm Mali-G78 and G68 GPUs, the Arm Ethos-N78 neural processing unit, and the custom Cortex-X program starting with Cortex-X1, the most powerful Arm core to date.

Arm Cortex-A78 CPU

Cortex-A78 highlights:
Architecture – Armv8-A (Harvard)
  Extensions – Armv8.1, Armv8.2, Cryptography, and RAS; Armv8.3 (LDAPR instructions only)
  ISA support – A64, A32, and T32 (at EL0 only)
Microarchitecture
  Pipeline – Out of order
  Superscalar
  Neon / Floating Point Unit
  Optional cryptography unit
  Max number of CPUs in cluster – 4
  Physical Addressing (PA) – 40-bit
Memory system and external interfaces
  32KB to 64KB L1 I-Cache / D-Cache
  256KB to 512KB L2 Cache
  Optional 512KB to 4MB L3 Cache
  ECC and LPAE support
Trustzone security

Cortex-A78 delivers 20% more performance than Cortex-A77 at the same power budget (one Watt), but peak […]

Panfrost Gets First 3D Renders on Bifrost GPU (Mali-G31) including Basic Texture Support

Collabora has been working on the Panfrost open-source Arm Mali GPU driver for over a year. The driver aims to support both the Midgard and Bifrost families, but so far the company had mostly focused on Midgard (Mali-T6xx/T7xx) GPUs, with, for example, experimental OpenGL ES 3.0 support announced last February. Collabora engineers, such as Alyssa Rosenzweig, have now started to work on Bifrost support, and have made good progress, managing to have Panfrost render the first 3D graphics with basic texture support on a platform with an Arm Mali-G31 GPU. Alyssa notes that while Midgard and Bifrost have similar command streams that required only a few changes, the Bifrost instruction set is completely different and required building a new compiler from scratch. This led to changes to the Intermediate Representation (IR), 16-bit data support, a different register allocation mechanism to adapt to irregular vector architectures, and the latter […]

Collabora & Microsoft to Bring OpenCL 1.2 and OpenGL 3.3 to DirectX 12 enabled Windows Devices

Collabora has been working on open-source graphics projects for a while, including the Panfrost open-source driver for Arm Midgard and Bifrost GPUs, which got experimental OpenGL ES 3.0 support earlier this year. But the company has also been working with Microsoft to provide an OpenCL 1.2 & OpenGL 3.3 translation layer for Windows devices compatible with DirectX 12. Their solution relies on Mesa 3D's open-source OpenCL and OpenGL implementations, with three main components:
an OpenCL compiler using LLVM and the SPIRV-LLVM-Translator to generate SPIR-V representations of OpenCL kernels. The data goes through a SPIR-V to NIR translator (NIR is Mesa’s internal representation for GPU shaders), and finally through NIR-to-DXIL, which generates a DXIL compute shader and metadata understood by DirectX 12 (D3D12)
a custom OpenCL runtime doing a direct translation to DirectX 12 (not based on Mesa's Clover implementation)
a Gallium driver that builds and executes command-buffers on the […]

Panfrost Open-Source Arm Mali GPU Driver Gets Experimental OpenGL ES 3.0 Support

Panfrost is the open-source driver being developed for Arm Midgard and Bifrost GPUs. The first versions focused on support for OpenGL ES 2.0, but the more recent OpenGL ES 3.0 enables faster and more realistic rendering. The good news is that experimental OpenGL ES 3.0 support for Panfrost has landed in Mesa, according to a recent post on the Collabora blog. Specifically, Panfrost now supports instanced rendering, primitive restart, uniform buffer objects, 3D textures, and multiple render targets (on Mali T760 and up), all of which are OpenGL ES 3.0 features. People who are not into graphics development may not know the purpose of those features, but Alyssa Rosenzweig, a free software graphics hacker leading Panfrost, explains: … instanced rendering and primitive restart allow developers to write faster graphics applications, to render efficiently scenes more complex than possible in ES 2.0. … uniform buffer objects and 3D texture give developers […]

MediaTek Helio P95 Processor Launched with APU 2.0 AI Accelerator, Faster Graphics

MediaTek has just unveiled an upgrade to its Helio P90 processor. The MediaTek Helio P95 still features the same APU 2.0 AI accelerator, but with a 10% performance improvement based on the ETH Zurich benchmark, and the PowerVR GM 9446 GPU clock has also been increased to deliver a 10% graphics performance boost. The company also claims a “60% shorter GPU rendering-to-display pipeline”, but it’s unclear whether the comparison is against the P90 processor or other “typical” Arm processors.

MediaTek Helio P95 key features and specifications:
CPU – 2x Arm Cortex-A75 processors @ 2.2GHz and 6x Arm Cortex-A55 processors @ 2.0GHz using DynamIQ technology
GPU – Imagination PowerVR GM 9446 GPU
NPU – MediaTek APU 2.0 fusion AI architecture with 1127 GMACs
MediaTek CorePilot for sustainable peak performance, longer battery life, and lower operating temperature
System Memory – Up to 8GB of dual-channel LPDDR4x memory @ 1866 MHz
Storage […]

NetBSD 9.0 Released with Aarch64 Support, Arm ServerReady Compatibility

Yesterday, we wrote about the Raspberry Pi 4 getting UEFI+ACPI firmware for Arm SBBR compliance, allowing the board to run operating systems designed for “Arm ServerReady” servers out of the box. NetBSD 9.0 was just released on February 14, 2020, with support for AArch64 (64-bit Arm), which had been in the works for a few years, and includes support for “Arm ServerReady” compliant machines (SBBR+SBSA).

NetBSD 9.0 main changes related to hardware support:
Support for AArch64 (64-bit Armv8-A) machines
  Compatibility with “Arm ServerReady” compliant machines (SBBR+SBSA) using ACPI. Tested on Amazon Graviton and Graviton2 (including bare metal instances), AMD Opteron A1100, Ampere eMAG 8180, Cavium ThunderX, Marvell ARMADA 8040, QEMU w/ Tianocore EDK2
  Symmetric and asymmetrical multiprocessing support (big.LITTLE)
  Support for running 32-bit binaries via COMPAT_NETBSD32 on CPUs that support it
  Single GENERIC64 kernel supports ACPI and device tree based booting
  Supported SoCs
    Allwinner A64, H5, H6
    Amlogic S905, S805X, S905D, […]

UP 7000 x86 SBC