Modular Mojo claims to be over 36,000 times faster than Python for AI workloads

Modular Mojo is a new programming language designed for AI developers that is said to combine the usability of Python with the performance of C with over 36,000 times the performance of Python on a matrix multiplication workload.

Modular Mojo programming language was not in the initial plan of the company but came about when the company’s founders – who focused on building a platform to unify the world’s ML/AI infrastructure – realized that programming across the entire stack was too complicated and also ended up writing a lot of MLIR (Multi-Level Intermediate Representation) by hand.

Modular Mojo vs Python matmul

The “over 36,000 times speedup” claim comes with the matmul.py script performing a 128×128 matrix multiplication in Python with a throughput of 0.00215 GFLOP/s and another script doing 512×512 vectorized + parallelized matrix multiplication in Mojo at 79.636 GFLOP/s.

The claim looks dubious and that’s odd they used different matrix sizes, but some are ready to spend a lot of money on the company, as Monitor claims to have raised $100 million dollars. A discussion thread on Twitter/X reveals the results may not be that relevant because as Sachin Shekhar puts it:

All ML libraries use C under the hood. Python is merely an interface. Given, C is still being used, it isn’t easy to replace a programming language. You need to give something special. Nobody’s running big loops using native Python in production.

We can’t reproduce the test above just yet, because the Mojo SDK will only be released in September, but the documentation is available and includes a such about matrix multiplication. Since Mojo is a superset of Python, both matmul scripts are identical:


They just changed the function name to “matmul_untyped” in the Mojo script. Here’s the code for the Python matrix multiplication script:


The throughput is shown to be 0.0016717199881536883 GFLOP/s in the documentation.

Now we can look at the Mojo script which also imports modules specific to Mojo:


The output from the script is:


So when using the same 128 x 128 matrix, Mojo is 17.5 times faster than Python so the 36,000 times speedup claim feels fake to me as it’s only due to the larger matrix, and makes Monitor look dishonest… A 17.5 times speedup is still pretty good, but if most Python programs for AI workloads indeed rely on C libraries, it may not be relevant.

Since Mojo is a superset of Python, I suppose it won’t run on low-end hardware like microcontroller boards, but should be fine on any Linux platforms.

Share this:

Support CNX Software! Donate via cryptocurrencies, become a Patron on Patreon, or purchase goods on Amazon or Aliexpress

Radxa Orion O6 Armv9 mini-ITX motherboard
Subscribe
Notify of
guest
The comment form collects your name, email and content to allow us keep track of the comments placed on the website. Please read and accept our website Terms and Privacy Policy to post a comment.
6 Comments
oldest
newest
Boardcon CM3588 Rockchip RK3588 System-on-Module designed for AI and IoT applications