Perfetto is an open-source system profiler, app tracer, and trace analyzer for Linux, Android, and Chrome platforms, as well as user-space apps. It can already visualize CPU and memory usage, as well as power consumption. GPU support is more limited: the program can only sample the GPU frequency when the driver exposes that information via ftrace.
But Perfetto is also extensible thanks to a Tracing C++ SDK that “allows userspace applications to emit trace events and add more app-specific context to a Perfetto trace”. Collabora used the tracing SDK to add support for Mali Midgard GPU performance profiling in the gfx-pps project, relying on the Mali GPU hardware counters exposed by the open-source Panfrost Mali GPU driver.
After following the installation instructions, you’ll be able to run the following executables for tracing and profiling:
traced: tracing service.
traced_probes: OS probes service.
perfetto: command-line tool for recording traces.
producer-gpu: provides the Panfrost data source.
There’s also a gpu.cfg config file, found in the gfx-pps/scripts directory, which is fed to Perfetto as input and describes what to trace.
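To give an idea of what such a file contains, here is a minimal sketch that writes a Perfetto trace config in text format (the format perfetto reads with --txt). It is not the gpu.cfg shipped with gfx-pps: the file name example-gpu.cfg, the buffer size, the 10-second duration, the sampling period, and especially the GPU data source name "gpu.counters" are assumptions, so check gfx-pps/scripts/gpu.cfg for the name producer-gpu actually registers. The linux.ftrace source with the power/gpu_frequency event corresponds to the ftrace-based GPU frequency sampling mentioned earlier.

```shell
#!/usr/bin/env bash
# Sketch only: writes an illustrative Perfetto text-format trace config.
# This is NOT the gpu.cfg shipped with gfx-pps.
# Assumption: "gpu.counters" is a hypothetical data source name here;
# the name registered by producer-gpu may differ.
cat > example-gpu.cfg <<'EOF'
buffers {
  size_kb: 16384
  fill_policy: RING_BUFFER
}
duration_ms: 10000
data_sources {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "power/gpu_frequency"
    }
  }
}
data_sources {
  config {
    name: "gpu.counters"
    gpu_counter_config {
      counter_period_ns: 1000000
    }
  }
}
EOF
```

With the services running, such a file could then be recorded with perfetto --txt -c example-gpu.cfg -o trace.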
Run the following commands to get started quickly:
traced &
traced_probes &
producer-gpu &
perfetto --txt -c gpu.cfg -o trace
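Since traced, traced_probes, and producer-gpu keep running until they are stopped, it can be handy to wrap the four commands in a small script that also cleans up afterwards. The sketch below does that; the helper names missing_tools and capture_gpu_trace are hypothetical, the one-second pause before starting the producers is a crude heuristic, and the script assumes all four executables are on PATH.

```shell
#!/usr/bin/env bash
# Sketch only: missing_tools and capture_gpu_trace are hypothetical helpers
# wrapping the commands from the article.

# Print each required executable that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in traced traced_probes producer-gpu perfetto; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Start the services, record one trace, then stop the services.
# Usage: capture_gpu_trace [config] [output]
capture_gpu_trace() {
  local cfg="${1:-gpu.cfg}" out="${2:-trace}" missing
  missing="$(missing_tools)"
  if [ -n "$missing" ]; then
    echo "not found on PATH: $missing" >&2
    return 1
  fi

  traced &
  local traced_pid=$!
  sleep 1  # crude wait so traced can set up before the producers connect
  traced_probes &
  local probes_pid=$!
  producer-gpu &
  local producer_pid=$!

  # perfetto records until the config's duration elapses (or it is interrupted)
  perfetto --txt -c "$cfg" -o "$out"
  local status=$?

  # Stop the background services started above and reap them.
  kill "$producer_pid" "$probes_pid" "$traced_pid" 2>/dev/null
  wait "$producer_pid" "$probes_pid" "$traced_pid" 2>/dev/null
  return "$status"
}
```

For example, after sourcing the script, capture_gpu_trace gpu.cfg trace records one trace; the GPU workload can be started from another terminal while perfetto is recording.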
Run whatever GPU workload you’d like to profile, and once tracing is complete, open the trace file with ui.perfetto.dev in the Chrome browser (I first tried Firefox, but the trace would not load).
The screenshot above shows some of the GPU parameters, including CPU/GPU balancing (to make sure the CPU is not the bottleneck), Vertex/Fragment balancing, and Tripipe (Arithmetic/Load-Store/Texture) balancing, using a trace of WebGL Aquarium captured on a Rockchip RK3399 SoC equipped with a Mali-T860MP4 Midgard GPU.
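As a rough illustration of what a “balancing” figure can mean, the sketch below computes how busy each job slot was as a share of GPU-active cycles. It assumes counters analogous to Midgard’s GPU_ACTIVE, JS0_ACTIVE (fragment jobs), and JS1_ACTIVE (vertex/compute jobs); the helper name slot_share and the sample counts are hypothetical, and gfx-pps does not necessarily compute its values this way.

```shell
#!/usr/bin/env bash
# Sketch only: slot_share is a hypothetical helper, and the counter values
# below are made-up numbers, not measurements from the RK3399 trace.

# Print slot-active cycles as an integer percentage of GPU-active cycles.
# Usage: slot_share <slot_active_cycles> <gpu_active_cycles>
slot_share() {
  local slot="$1" gpu="$2"
  if [ "$gpu" -le 0 ]; then
    echo 0  # avoid dividing by zero when the GPU was idle
    return
  fi
  echo $(( 100 * slot / gpu ))
}

gpu_active=1000000   # assumed GPU_ACTIVE-style cycle count
js0_active=900000    # assumed fragment job slot cycle count
js1_active=300000    # assumed vertex/compute job slot cycle count

echo "fragment: $(slot_share "$js0_active" "$gpu_active")%"
echo "vertex/compute: $(slot_share "$js1_active" "$gpu_active")%"
```

With these made-up numbers the script prints 90% for fragment and 30% for vertex/compute, which would point to a fragment-bound workload. Because the job slots can run concurrently, the two percentages do not have to add up to 100.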
The gfx-pps project is under active development on FreeDesktop’s GitLab and released under the MIT license. You’ll find the Panfrost data source in the gfx-pps/src/gpu/panfrost/gpu_ds.h file, and more GPU data sources are planned.
More details about the gfx-pps project can be found on Collabora’s blog.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.
This is awesome for older hardware. I modified a Galaxy S6, and we (mobile tech enthusiasts) all know it has the 14nm Samsung-fabbed Exynos 7420, with 4x Arm Cortex-A57 + 4x Cortex-A53 cores clocked at up to 2.1GHz, but the important part is the Mali-T760 MP8. I took the logic board out of the phone’s glass sandwich and put a custom micro copper-finned heatsink on the SoC; it looks like a mini ZALMAN. This Mali-T760 MP8 runs flat out at 772 MHz. When it’s clocked that high and holds its frequency without throttling, you’d be surprised at what this GPU can do. It was compared against an Adreno 530 in a Snapdragon 821, and it matched an overclocked Adreno 530 in some GPU workloads. What’s so amazing is that the Adreno 530 has 256 ALUs (GPU cores, floating-point vector units, whatever they want to call them). 256 ALUs is about the same as the Maxwell-based NVIDIA Tegra X1, which coincidentally has 256 CUDA cores, but at 1GHz. Pretty powerful little GPUs. Don’t get me going on the LG V40 with the ROG Phone 1 2.96GHz clock mod and the Adreno 630 with 512 ALUs clocked at 725MHz. Yeah, almost 1 teraflop of FP32 compute from that setup, but that’s another story. There’s my 2 cents; I hope someone learned something new from this. Maybe I’ll post another story if no one else replies…