[Update May 7, 2019: Giggle Score has been updated to use 7-zip to benchmark the boards instead of sysbench, and the “best value” rankings are now quite different]
People like to compare single board computers and usually want a simple answer as to which one is best. In practice that’s impossible: the beauty of SBCs is that they are versatile and can be used in a wide variety of projects, so the “best board” may be completely useless to you if it lacks a critical feature or interface for YOUR project, be it H.265 video encoding or a MIPI DSI display interface.
Still, it’s always fun to look at benchmark scores and try to compare SBCs, and for projects that mostly require CPU processing power it may also be useful. Robbie Ferguson has been developing and maintaining NEMS (Nagios Enterprise Monitoring Server) Linux for single board computers. It runs an after-hours benchmark once per week and logs the server’s score anonymously and securely, meaning he has a database with benchmarks of hundreds of boards running NEMS, most of which are Raspberry Pi 3 Model B/B+ boards.
The NEMS Performance Score (NPS) is then weighted by the selling price of the board to derive the Giggle Score, providing a list of the boards with the best value. As the name implies, you may not want to take it too seriously, but the results are in: the Amlogic S922X based ODROID-N2 board with 2GB RAM is the board with the best value, followed by the 4GB RAM version, with RockPro64 (4GB) in third.
At the other end of the ranking, the three boards with the worst value all come from the Raspberry Pi Foundation: Raspberry Pi 3 Model B, Raspberry Pi 2, and Raspberry Pi 1/Zero, which is dead last.
If network monitoring is indeed a low CPU usage task, the Raspberry Pi Zero may ironically be the best value at $10 for running NEMS Linux, as all the processing power potentially delivered by ODROID-N2 may just be wasted if the system is idle most of the time. I’d assume it all depends on the size of your network.
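The article does not spell out the exact Giggle Score formula, so the following is only a minimal sketch of the general idea of weighting a performance score by price. The function name, the plain score-divided-by-price formula, and the sample boards and numbers are all assumptions made for illustration; none of them are NEMS data.

```python
def value_score(performance: float, price_usd: float) -> float:
    """Performance points per dollar; higher means better value.

    Assumed formula for illustration only, not the actual Giggle Score.
    """
    if price_usd <= 0:
        raise ValueError("price must be positive")
    return performance / price_usd


# Hypothetical boards: (name, performance score, price in USD). Not real data.
boards = [
    ("board-a", 3000.0, 60.0),
    ("board-b", 1200.0, 35.0),
    ("board-c", 200.0, 10.0),
]

# Sort from best value to worst value.
ranked = sorted(boards, key=lambda b: value_score(b[1], b[2]), reverse=True)

for name, perf, price in ranked:
    print(f"{name}: {value_score(perf, price):.1f} points per dollar")
```

A ranking like this is only as meaningful as the performance score that feeds it, and it ignores whether the cheapest board is already fast enough for the job.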
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
Shell script that runs sysbench…Why not just call it sysbench score?
Why not call it ‘benchmarking gone wrong’ or ‘collection of numbers without meaning’?
The guy seems to not look at the numbers he collects and publishes. If ODROID XU4 achieves the same score as RPi 3 A+ (180) and the RPi 3 B+ then scores just 139 then you know your numbers are simply BS and should be deleted instead of being published.
We should start a performance review site together and do it correctly. You in? XD
Count me as an extra ; )
This is a specific use case but seems to do a fair job: http://wiki.ant-computing.com/Choosing_a_processor_for_a_build_farm
You’re right. There needs to be an unbiased review site with competent, relevant benchmarks – that would cut down on the BS boards and let the good boards succeed.
There indeed seems to be something seriously wrong. This benchmark apparently favors only 6-core and 64-bit boards. The 4-core ones are way lower, and the 32-bit ones are lower still. XU4 has 4 big + 4 little in 32-bit and is completely disadvantaged. It might be that the workload is realistic and really depends on the hardware, but I doubt it. It’s also possible that the distro and installed packages matter most. I’ve seen in the past that some workloads making intensive use of scripts could run more than twice as fast by switching to simpler shells or simply removing…
It’s simply a matter of using the wrong tool. Sysbench when used correctly can provide insights but usually sysbench is only used in ‘fire and forget’ mode by clueless people who never thought a single second about the results they get. Sysbench’s cpu test is a number crunching application calculating prime numbers inside the CPU’s L1 caches. No dependency on memory performance at all but 100% dependency on compiler optimizations and ISA (guess why the XU4 scores are that low or why sysbench executes 15 times faster on the RPi 3 once it runs in aarch64 state? Since ARMv8 64-bit…
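To make the point about the prime test concrete, here is a minimal Python sketch of a trial-division prime counter of the kind the comment describes. It is an illustration written for this thread, not sysbench’s actual code; the function name `count_primes` and the limit of 10000 are assumptions. The working set is just a handful of integers, so the loop says nothing about memory, storage or network performance.

```python
import math
import time


def count_primes(max_prime: int) -> int:
    """Count primes c with 3 <= c < max_prime using trial division."""
    count = 0
    for c in range(3, max_prime):
        # Try every divisor from 2 up to the integer square root of c.
        for divisor in range(2, math.isqrt(c) + 1):
            if c % divisor == 0:
                break  # c is composite
        else:
            count += 1  # no divisor found, c is prime
    return count


start = time.perf_counter()
found = count_primes(10000)
elapsed = time.perf_counter() - start
print(f"found {found} primes in {elapsed:.3f} s")
```

The elapsed time depends almost entirely on how fast the integer modulo and the loop run on the given interpreter and instruction set, which is why such a number transfers poorly to other workloads.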
I expected to see a comment by Jerry here…
He’s probably busy learning the new propaganda material from RPi foundation 🙂
> If networking monitoring is indeed a low CPU usage task
Depends mostly on the number of hosts and services you monitor and how the data is processed. With a lot of services you’re likely to be bottlenecked by (integer) CPU, memory and IO performance. The NEMS guy decided to use the least reliable benchmark for this use case in the most stupid mode: https://github.com/Cat5TV/nems-scripts/blob/master/benchmark.sh
Sysbench calculates prime numbers inside the CPU’s L1 caches. It has no relationship to the performance requirements of a Nagios monitoring host and, even worse, different sysbench versions spit out totally different scores while his benchmark script…
The most amazing part is that if the tool is not found he installs tons of libs on the system and builds them from source (using make -j, unbounded), then installs, then runs the tool. *This* compilation phase could have served as a hint of the board’s performance and is way more relevant to a Nagios workload than what the resulting sysbench executable might produce! But the compilation time is not measured here, so the only useful phase of this script is lost.
Can we agree that the only meaningful benchmarks are ones that help predict the performance of the app we care about? So, here’s someone with an app and his ‘benchmark’ isn’t even testing the performance of his app. I mean, really, here you are writing an app and you don’t use critical bits of the app as a benchmark? That’s just *facepalm*
For this type of ‘app’ there’s really no need to benchmark individual SBCs. It’s more important to educate users about what bottlenecks a Nagios install so they know where to spend their money. Obviously there’s some CPU horsepower needed (integer, not floating point). For this type of use case I consider 7-zip’s benchmark mode still sufficient. If a 7z b run scores twice as high on board A compared to B then A will be able to monitor roughly twice as many services as B as long as storage requirements are also met. And that means sufficient sequential and…
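The rule of thumb in this comment (twice the 7z b score, roughly twice the services) is a simple proportion. A minimal sketch, assuming the scaling really is linear and that storage is not the bottleneck; the function name and the numbers are hypothetical and not measured on any board:

```python
def estimate_services(known_services: int,
                      known_score: float,
                      candidate_score: float) -> int:
    """Scale a known-good service count by the ratio of benchmark scores.

    Assumes capacity grows linearly with the score, which is only a rough
    approximation and ignores storage and memory limits.
    """
    if known_score <= 0 or candidate_score < 0:
        raise ValueError("scores must be positive")
    return int(known_services * candidate_score / known_score)


# Hypothetical: board B handles 500 services and scores 2000,
# board A scores 4000 in the same benchmark run.
print(estimate_services(500, 2000.0, 4000.0))
```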
Thank you. This is a much more productive post. I was looking at 7zr b as a possible benchmark to include, so it’s good to see a bit more details about why this would be beneficial.
Nobody has submitted a PR, but this certainly helps. Watch for some new commits to improve the accuracy based in part on your feedback.
Thank you.
Robbie // The Bald Nerd
I opened up a Github issue pointing back to the discussion here: https://github.com/Cat5TV/nems-scripts/issues/2
I’d argue that as far as ARM boards go, the Jetson Nano Developer Kit offers even better value than the odroid-N2 when one takes into consideration its 4GB of RAM, GPU and AI hardware and Nvidia support.
Not CPU-wise – N2 is a clear winner there. GPU-wise they’re comparable; of course if one needs CUDA Nano has no alternative.
First off, thank you Jean-Luc for taking notice of my little project. I’ll try to answer as many comments as possible. The intention behind the Giggle Score is not really to compare a RockPro64 to a Raspberry Pi Zero. As noted in the post, a Pi Zero might be the perfect SBC for your project even though its Giggle Score shows it as expensive for the performance it provides. BUT it is still only $10 USD! Giggle Score doesn’t say it is expensive: It says it is providing less performance for its price. If it meets your needs, it may…
> As long as the same tests are performed, and the same numbers compared for each board, the comparison should prove reasonably accurate
Absolutely not (read the reasons above, everything is already explained in detail — sysbench’s cpu test is great for people calculating prime numbers for a living but has absolutely nothing to do with any real-world application and especially a Nagios host).
I appreciate your knowledge, and I’m sure there is a suggestion in here somewhere and not just bashing sysbench (which is a great tool) and me (also a great tool). What CPU test would you prefer see part of the Giggle Score algorithm (which is not a benchmark)?
Amen to this. I see tkaiser posting a lot of problems but zero solutions. As someone who is just getting into the sbc game it would be good to know which test(s) he thinks is most relevant to getting a meaningful baseline.
> I see tkaiser posting a lot of problems but zero solutions.
This is not accurate. He proposed 7z a few times as one candidate. I’m personally not completely fond of this one either but I admit that it will depend on CPU, the memory controller and RAM performance at least, which is already much better than just integer divisions which are only relevant in specific number crunching applications. The ratios between the high-end and the low-end results here are definitely misleading and while they possibly place the top and bottom at their right place, the intermediary ones are not…
> bashing sysbench
Seriously, if this is your impression I can’t help you. Sysbench when used correctly can be used to determine the performance of specific database servers (that’s where it comes from). The problem is the way people use it: without thinking for a second about what the sysbench tests really do. Care to explain how it’s possible that a RPi 3 when running Raspbian (32-bit) scores 15 times lower compared to running a 64-bit Debian? Why upgrading an RPi from Jessie to Stretch results in faster sysbench scores? Why using a different sysbench version (differing by the 2nd decimal point)…
> Giggle Score algorithm (which is not a benchmark)
Sorry, which algorithm are you referring to? You use one specific application benchmark called sysbench in an absolutely questionable way to generate some sort of ‘hardware performance’ indicator. And then you factor in a hypothetical list price which has rather limited meaning if you compare real prices with list prices for the majority of users in this world (talking about well known additional costs like shipping, VAT/taxes, necessary accessories like an additional PSU since Micro USB is crap and so on). I spent some time today to explain why using sysbench to…
Hi. I review SBCs, and once made a video about benchmarking tools for SBCs. I show there that sysbench comes in different versions and performs differently on armhf and arm64. It is not reliable for comparing SBCs with each other. And it isn’t a useful test. Here is my video: https://www.youtube.com/watch?v=EZMHo3bVnOo&t=156s I find your findings very misleading. They take nothing else into account. For example, the N2 doesn’t have WiFi, has only 1 lane of USB3, and no PCIe/SATA/M.2. Compared to the NanoPi M4 it lags behind in all these things. The list is also very limited. The NanoPi Fire3 for example would have…
Nico, your video is nice, hopefully it will teach some people how numbers alone are meaningless until you put them in their context. I noticed two things that you will possibly be interested in taking into account for a future series of tests:
– you place a fan on all your boards to prevent them from throttling, this is great for people like a few of us who are willing to invest in cooling to get the highest performance, but it’s not accurate for most users who will simply use the board inside its enclosure. For example an RPi…
You think a 1.4GHz 8xA53 is going to be faster than 1.8GHz 4xA72 and 1.9GHz 2xA53? The former with no cooling and the latter with a massive heatsink? The former with 1GB of slow DRAM and the latter with 4GB of fast DRAM?
What’s your benchmark? Because it can’t be anything that requires fast single core or multi-core CPU. It can’t be something that depends on memory speed. It can’t be anything that requires fast storage access. Maybe the benchmark is interfacing to a variety of LCD displays as that seems to be something the NanoPi Fire3 is good at.
It depends. After having tried a program called cpuminer to test my boards’ stability under extreme heating, I noticed that some algos were more sensitive to CPU frequency than architectural optimizations and were running at almost the same speed on the A72 as on the A53. Anything using crypto or CRC is mostly sensitive to frequency. In this case having 8 cores simply is faster than 6 cores. And the difference is not bad as can be seen in the numbers I reported here after testing on various boards: https://github.com/bschn2/rainforest/issues/15#issuecomment-488729894 As you can see, the NanoPI Fire3 at 8×1.6…
Your actions reflect those of someone whose goal is not to provide unbiased feedback, but to sell something.
Performance / $ would put Pi on top. You completely neglect software. You’ve ignored every constructive suggestion others have clearly made and claimed the responders are being “mean”.
Fortunately in the U.S. there are laws that fine advertising without disclosure. So if anyone in the U.S. buys such a product based on your “analysis” and finds it incorrect, you are liable for false advertising and damages.
Take off the tinfoil hat. He isn’t selling anything. Quite the opposite actually. I know others have questioned the benchmarking used, but I can tell you for a fact that out of all of those SBCs, there is no way in hell the Pi gives the best performance per dollar.
>I can tell you for a fact that out of all of those SBCs, there is no way in hell the Pi gives the best performance per dollar.
I think that almost everyone on this site (except Jerry) will agree on this 🙂
Musta hit the nail on the head. Triggered a reaction and the obvious deflection is to call someone crazy.
Pi Zero is $5. Pi Zero W is $10. Performance/$ is basic math, if division is too difficult, enjoy your internet circle jerk lol
> Regarding /tmp… you’re being alarmist, but you’re right in this: It would indeed be better practice to move this operation away from a user-accessible folder.
Check manual page of mktemp, chmod, do a web search for ‘temp file vulnerability’ and spend a few hours on the issue please 🙂
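The benchmark script under discussion is a shell script, where `mktemp` is the usual tool. The same idea can be sketched in Python with the standard `tempfile` module: let the library pick an unpredictable name and create the file exclusively, readable and writable only by its owner, instead of writing to a fixed path under /tmp. The helper name `write_private_temp` and the `bench-` prefix are made up for this example.

```python
import os
import tempfile


def write_private_temp(data: bytes) -> str:
    """Write data to a freshly created private temp file and return its path."""
    # mkstemp creates the file atomically with an unpredictable name, so an
    # attacker cannot pre-create it or plant a symlink at a known path.
    fd, path = tempfile.mkstemp(prefix="bench-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
    except BaseException:
        os.unlink(path)  # do not leave a partial file behind
        raise
    return path


path = write_private_temp(b"benchmark output\n")
print(path, oct(os.stat(path).st_mode & 0o777))
os.unlink(path)  # the caller is responsible for cleanup
```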
Honestly man, learn some humility, there is no need to be a jerk 24/7.
Some, hopefully helpful, criticism. First, decide what it is you want to convey or measure in your benchmark. Usually for an app to have a benchmark, the intent is to measure the performance of some critical piece of the app itself. For example, if your app does some large matrix math with BLAS, then that’s going to be a good indicator of performance of your app. So, the process really starts with understanding the needs of your app. Do you spend a lot of time processing data? If so, how much data? Does it fit in the L1 or L2…
Thank you so much. Having taken the criticism constructively, v2 was released this past weekend. It addresses all of the concerns, and adds some new calculations to further improve the usefulness of GiggleScore.com
I haven’t tested it, but there’s already the Phoronix Test Suite, which should be compatible with ARM as it builds from source.
https://phoronix-test-suite.com/
I checked this site out the other day when I saw it linked on the odroid forum. Obviously the testing method leaves a little to be desired (my complaint is that the testing method isn’t documented) but this is a really cool idea. It reminds me of the Backblaze drive stats that they publish.
For everyone complaining, is there an alternative to this that exists? I have searched high and low for a way to compare SBCs and have found nothing. This is the first site I have seen that even attempts to tackle this issue.
It really is not easy. Most workloads depend on a mix of integer, floating point, cache size and latency, memory, context switching speed, architectural choices and extensions, storage bandwidth, storage latency, network bandwidth, network latency, etc. And any application will have a different set of factors. One reasonable possibility consists in using a small set of tools, each depending on a small set of factors (possibly with a bit of overlap), synthesizing these results into 3-4 integers, and letting users know that depending on their use case, they should focus on this or that column. Then it is possible…
Hi everyone, As the community here so kindly pointed out (hehe), the first iteration of my algorithms was less than accurate. In its defense, I would like to point out that GiggleScore.com was an idea that came to me just last month, so what was criticized here was the earliest iteration of a rushed hobby project. I appreciate all the excellent information that was shared in this thread, and chose to take it all constructively. This past weekend, I launched v2, which implements the changes brought about by the comments I received here. And you will find the Raspberry Pi…
> I launched v2, which implements the changes brought about by the comments I received here
I don’t think so for various reasons:
1) it’s impossible to generate a performance chart relying on a single metric/tool. The use cases those SBCs are used for are too different and the architecture differences (especially with big.LITTLE/DynamIQ designs) make it impossible to use one score to rank different SoCs or boards. The importance of single vs multi threaded for various use cases alone prevents the use of a single score
2) your math is flawed. You can not add the single-threaded 7-zip score…
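The idea of a few per-factor columns instead of one combined score can be sketched as follows. This is only an illustration: the column names, the board names and every number are placeholders invented for the example, and the choice of three columns is an assumption rather than something proposed in the thread.

```python
from typing import List, NamedTuple


class BoardResult(NamedTuple):
    name: str
    cpu: int      # e.g. derived from a CPU-bound tool
    memory: int   # e.g. derived from a memory bandwidth/latency tool
    storage: int  # e.g. derived from a storage I/O tool


# Placeholder numbers, not measurements.
results = [
    BoardResult("board-a", cpu=900, memory=400, storage=150),
    BoardResult("board-b", cpu=500, memory=700, storage=300),
    BoardResult("board-c", cpu=300, memory=350, storage=800),
]


def rank_by(rows: List[BoardResult], column: str) -> List[str]:
    """Return board names ordered from best to worst for one column."""
    ordered = sorted(rows, key=lambda r: getattr(r, column), reverse=True)
    return [r.name for r in ordered]


# Each use case reads the column it cares about; no single overall winner.
for column in ("cpu", "memory", "storage"):
    print(column, rank_by(results, column))
```

With the placeholder data each column produces a different ordering, which is exactly the information a single blended score would hide.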