A Look at a Small NanoPi NEO4 Build Farm

FriendlyElec NanoPi NEO4 is currently the cheapest and smallest SBC powered by the Rockchip RK3399 hexa-core processor, which packs two Cortex-A72 “fast” 64-bit cores and four Cortex-A53 “efficiency” cores, so it should be an obvious candidate if you plan on building an Arm build farm, thanks to its low cost, small form factor, and relatively good performance.

As part of his work on the HAProxy load balancer, Willy Tarreau often has to run time-consuming builds for Arm targets, and to speed up the builds he’s put together several Arm-based build farms powered by low-cost development boards / SBCs. Up to now he had a build farm powered by five MiQi boards featuring the Rockchip RK3288 processor with four Cortex-A17 “fast” 32-bit cores, and controlled with a ClearFog Pro networking board. He’s now decided to build another similar build farm, but with NanoPi NEO4 boards instead.

NANOPI NEO4 Build Farm

Willy goes through the hardware setup and software in great detail in a blog post, so I’ll try to give a summary highlighting the key points in this article.

Hardware Build and DIY Heatsink

As you can see from the photo above, the setup comes with five NanoPi NEO4 boards connected to a ClearFog Pro board over Gigabit Ethernet. But there’s something funny about the heatsink: it does not look like the heatsink provided by FriendlyElec at all… Willy wanted to save some horizontal space, so instead he made his own heatsinks out of an L-shaped aluminum block that is 5.2cm wide and comes with a 2mm thick aluminum corner.

DIY Heatsink NanoPi NEO4

He drilled four mounting holes into the aluminum block, and added a ceramic pad for cooling the processor at first, but eventually switched to copper pads coupled with screws and springs to keep the processor, pad and aluminum block in good contact without too much force applied.

Copper heatsink NanoPi NEO4

Once all five boards are prepared in a similar fashion, we’re still not finished: the five boards themselves are connected to a much larger heatsink taken from an old Pentium2 system, using a large band of thermal tape since going with screws was not practical.

Build Farm Heatsink Cooling

That part was fun, and I could even have published a separate post just about the DIY heatsink. The hardware build needs to be completed with USB-C cables and a 5-port USB power supply such as this one.

Software Installation, and Optimization (Overclocking)

The default FriendlyElec Ubuntu images were installed to the eMMC flash module, and Willy disabled “systemd related crap”, as well as graphics mode since it’s not needed for the use case. An important tip during installation is not to duplicate the flash from one board to install on the others, as it will also duplicate the MAC address, and you obviously don’t want 5 boards with the same MAC address in your local network 🙂
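
If you do end up cloning images, a quick sanity check is to collect the MAC address of each board (on Linux it can be read from /sys/class/net/<interface>/address) and look for duplicates. The sketch below is only an illustration: the host names and MAC addresses are made up, and how you gather the inventory is left open.

```python
from collections import defaultdict


def find_duplicate_macs(host_to_mac):
    """Return {mac: [hosts, ...]} for every MAC seen on more than one host."""
    by_mac = defaultdict(list)
    for host, mac in host_to_mac.items():
        # Normalize case and whitespace so "AA:BB" and "aa:bb\n" compare equal.
        by_mac[mac.strip().lower()].append(host)
    return {mac: sorted(hosts) for mac, hosts in by_mac.items() if len(hosts) > 1}


# Hypothetical inventory: neo4-3 was flashed from a copy of neo4-1's eMMC.
farm = {
    "neo4-1": "02:00:00:00:00:01",
    "neo4-2": "02:00:00:00:00:02",
    "neo4-3": "02:00:00:00:00:01",
}

for mac, hosts in find_duplicate_macs(farm).items():
    print(f"MAC {mac} is shared by: {', '.join(hosts)}")
```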

Since that’s a build farm for C programs, distcc was installed as well as various compilers from kernel.org.
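
To give an idea of what the distcc side looks like, the client needs a list of build hosts, usually passed in the DISTCC_HOSTS environment variable where each entry can carry a job limit as host/limit. The sketch below builds such a string; the host names and the choice of six jobs per board (one per core) are assumptions for illustration, not Willy's actual configuration.

```python
def distcc_hosts(nodes, jobs_per_node=6):
    """Build a DISTCC_HOSTS value such as "neo4-1/6 neo4-2/6".

    Each entry uses distcc's "host/limit" form, where the limit caps the
    number of jobs sent to that host at once.
    """
    return " ".join(f"{node}/{jobs_per_node}" for node in nodes)


# Hypothetical host names for the five boards.
nodes = [f"neo4-{i}" for i in range(1, 6)]
hosts = distcc_hosts(nodes)

# The resulting line can be pasted into a shell before running make.
print(f'export DISTCC_HOSTS="{hosts}"')
```

With five boards and six jobs each, a parallel make with around 30 jobs would keep every core busy, although the best value depends on the project and the network.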

In order to get the most performance out of the boards, he overclocked them, with the Cortex-A72 cores reaching up to 2.2 GHz and the Cortex-A53 cores up to 1.8 GHz, except for one board where they had to be limited to 1.7 GHz, so in the end all boards had their LITTLE cores limited to 1.7 GHz for practical reasons. The overclocking requires some kernel patches for Linux 4.4.138 and a script to change the frequencies.
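
Willy's own script is not reproduced here. As a rough sketch of the idea only, the standard Linux cpufreq sysfs interface lets you cap each cluster by writing a value in kHz to scaling_max_freq. The assumptions below are that policy0 covers the Cortex-A53 cluster and policy4 the Cortex-A72 cluster, as is commonly the case on RK3399, and that the nominal 1.7 GHz / 2.2 GHz values match entries the patched kernel actually exposes.

```python
from pathlib import Path

# Standard cpufreq location on Linux; writing here requires root.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")

# Assumed mapping for RK3399: policy0 = Cortex-A53 (LITTLE), policy4 = Cortex-A72 (big).
# Values are nominal kHz figures taken from the article, not a real OPP table.
LIMITS_KHZ = {"policy0": 1_700_000, "policy4": 2_200_000}


def set_max_freqs(limits_khz, root=CPUFREQ_ROOT):
    """Write each limit to <root>/<policy>/scaling_max_freq and return what was written."""
    written = {}
    for policy, khz in limits_khz.items():
        target = Path(root) / policy / "scaling_max_freq"
        target.write_text(f"{khz}\n")
        written[policy] = khz
    return written


# On a board, as root, this would be called as:
#   set_max_freqs(LIMITS_KHZ)
```

Without the kernel patches mentioned above, the stock frequency table would not offer the overclocked values in the first place, so a script like this only selects among what the kernel already provides.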

Final words

Some improvements are also discussed, such as replacing the overpowered ClearFog Pro board with a NanoPi NEO2 and an 8-port Gigabit Ethernet switch, as well as adding a reset button to the board to facilitate debugging.

Willy concludes as follows:

This constitutes a nice upgrade to the previous farm and I feel more confident hacking a bit with it thanks to the removable eMMC that I can easily re-flash from my PC. The boards are easy to hack on since all sources and docs are available, which is a real joy.

34 Comments
dgp
6 years ago

> as well as adding a reset button to the board to facilitate debugging.

I have an Allwinner v3s board that would probably be good for something like this.
The v3s has 8 uarts and 8 load switches connected to it and you can control it over mqtt to bring up machines on demand/trigger reboots. Potentially it could interact with u-boot on the boards to do flashing via tftp.

willy
6 years ago

This could indeed be an option. Over time I also found that using USB hubs with cheap USB UARTs is nice as well. When you see the cheap CH340G and its SOP8 successor whose name I already forgot, it’s trivial to connect them to any device in fact. I haven’t worked on this part yet, for now I don’t need the reset that much.

willy
6 years ago

Not much in fact, the two first ones are related to internal heat sinks, and the last one provides medium-sized ones. I’m interested in having a huge one sharing the heat between all devices. I have a much better design in mind, I’ll draw it soon. In short it would consist in using the L-shaped alu blocks to conduct the heat towards a thick bottom plate, on which a rear large heatsink would be installed. This would remove all the thermal tape and allow for screws to be used for more efficient contact.

theguyuk
6 years ago

True, but too thick a thermal tape can hinder heat transfer, also having the main heat sink away from the heat source hinders heat transfer. That is why some GPUs use heat pipes to transfer heat. A low voltage sub 3V fan and a cone cylinder pipe, with a cowl to pull air through would work fine, oh well.

willy
6 years ago

This quick-n-ugly drawing shows how I’m seeing it :

http://1wt.eu/nanopi4/neo4-cooling.png

It should even shorten the distance between the CPU and the big heat sink, and using a thick plate it will reduce the resistance. Also the benefit is that the plate will also help spread some of the heat into the support (table or whatever lies under it).

Since the 4 holes represent an exact square, I can turn the L-shaped block 90 degrees and try this once I find the plate I need 😉

tkaiser
6 years ago

After I did some tests a while ago I came to the conclusion to rely on a heatsink solution with as little thermal mass of its own as possible and with heatsink fins as large as possible with sufficient spacing in between (later operated vertically so convection can help): https://github.com/ThomasKaiser/Knowledge/blob/master/articles/Heatsink_Efficiency.md For five NEO4 I would probably end up with five copper shims (20x20x1mm) and then order on Aliexpress ‘245mm+60mm+25mm full aluminum E Heatsink for Power amplifier’. All NEO4 would be mounted next to each other directly to the heatsink using the copper shims and two thin films of thermal compound in between.…

roel
6 years ago

tkaiser, if you can get your hands on a defective PSU there are plenty of heatsinks inside. No need to buy one on aliexpress.

willy
6 years ago

That’s approximately the type of heatsink I have for the MiQi farm. And yes, spacing is important for passive convection. Picking a dense heatsink designed for a CPU with an active fan gives horrible results. For compactness, I really want to mount the boards vertically. I thought about mounting them horizontally on a large heat sink as you suggest but that would have been too large for me (it’s a matter of taste).

theguyuk
6 years ago

I think you are asking a lot for the heat transfer, heat likes going up in warm air, and the base and the back will both be heat sources, I suspect causing heat spots.

eas
6 years ago

Did you measure any temperatures with your existing arrangement, either assembled or in pieces?

You said you did a cpu burn test before attaching things to the big heatsink and it was “OK.” What constitutes “OK?”

It seems like you aren’t going to get enough of a thermal gradient between the big heatspreader and the heatsink to move much power through that 2mm cross-section of aluminum.

willy
5 years ago

> Did you measure any temperatures with your existing arrangement, either assembled or in pieces?

Yes, with cpuburn it stabilizes around 86 degrees. But it’s not very hot in the room, which is why I’d like to improve the design to better stand summer. I ordered a thermal camera 3 months ago for this but it still didn’t arrive 🙁

> What constitutes “OK?”

Not throttling for a while, and making sure the L-shaped plates are as hot as the rest of the mainboard, indicating the contact between the CPU and the plate is good.

> you aren’t going to…

theyguyuk
5 years ago

I still suspect you will get slower air pockets between each SBC causing a warm blanket effect.
Water flows through pipes but heat radiates, so needs a push or pull for maximum cooling effect. IMO

roel
6 years ago

Willy,

where did you get the springs to put over the bolts? I did a similar design some while ago (just for one board), and used some springs from a heatsink from an old graphics card. Now I want to do a similar design for another board, but I don’t have donor material anymore.

willy
6 years ago

I took similar springs, that I had to cut because 1) I didn’t have enough of them, and 2) even once compressed they were too thick. So I cut them in 3 to have around 3 turns each at most. I really don’t like my design there but the alternative was to use 4.5 washers per hole and I didn’t have enough either, and this .5 was becoming problematic to make 🙂

I also thought about using some 3-4mm heat shrink tube to replace springs. I think it can work as it will resist compression quite a bit.

roel
6 years ago

Yes, maybe a piece of rubber tube should do also. Good idea!

tkaiser
6 years ago

As already asked over there https://www.cnx-software.com/2018/10/10/nanopi-neo4-cheapest-smallest-rk3399-board/#comment-559659 I would be really interested in knowing results with mainline kernel (due to higher memory performance) and a quick 7-zip benchmark comparison between the MiQi and the NEO4 cluster.

willy
6 years ago

Not tested yet! Too many questions start to flood back here at the same time 🙂 OK, here it comes :

# 7z b

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US,Utf16=on,HugeFiles=on,64 bits,6 CPUs LE)

LE
CPU Freq:  2200  2202  2201  2202  2201  2202  2201  2202  2202

RAM size:     967 MB,  # CPU hardware threads:   6
RAM usage:    675 MB,  # Benchmark threads:      6

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       5142   503    994   5002  |     111592…

tkaiser
6 years ago

RK3288 (Tinkerboard running with 4.14) with ‘conservative’ settings scores ~5350 (the 1.8 GHz cpufreq OPP results in just 1730 MHz as confirmed by your mhz tool, no idea about DRAM controller settings), while a NEO4 with slight overclock (2.0/1.5 GHz) scores between 6750 (4.19) and 6500 (4.4). Results (details in the rightmost column): https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md

Would be interesting to get a 7-zip score also from one of your tuned MiQi’s 🙂 I would expect something slightly above 6000 (would then be in sync with your compile time differences comparing MiQi and NEO4 cluster).

willy
6 years ago

Hmmm quite interesting 🙂 I definitely need to test on 4.19 then. I’ll use my M4 which is spare now for this, it will be easier than hacking in the farm. In the past the rockchip blobs used to provide much higher DDR performance but this might have changed. Regarding the test on the MiQis it will not be easy, it’s a hand-made distro, there’s not even a C++ toolchain for it so we’ll need a static 7z utility to try it.

willy
6 years ago

By the way, which one is supposed to be the correct value here ? There are plenty on the Avr and Tot column. I suppose the rightmost on the Tot column ?

tkaiser
6 years ago

I would believe with preserved formatting it’s somewhat obvious 🙂

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       5142   503    994   5002  |     111592   518   1835   9517
23:       4761   509    953   4851  |     109394   521   1816   9466
24:       4482   513    939   4819  |     106414   521   1792   9340
----------------------------------  | ------------------------------
Avr:             509    962   4891  |              520   1814   9441
Tot:             514   1388   7166

…
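
For anyone wanting to pull the single headline number out of a 7z b run, the rating at the right end of the Tot: line is the one usually compared between boards, and in the output quoted in this thread it equals the mean of the two Avr: ratings. The sketch below is only an illustration that assumes the p7zip 16.02 layout shown in this thread; other versions may print the summary differently.

```python
def parse_7z_ratings(output):
    """Extract compression, decompression and total ratings (MIPS) from `7z b` text."""
    ratings = {}
    for line in output.splitlines():
        # Drop the column separator so both halves split into plain fields.
        fields = line.replace("|", " ").split()
        if not fields:
            continue
        if fields[0] == "Avr:":
            # Layout: Avr: usage R/U rating (compress), then usage R/U rating (decompress)
            ratings["compress"] = int(fields[3])
            ratings["decompress"] = int(fields[6])
        elif fields[0] == "Tot:":
            # Layout: Tot: usage R/U rating
            ratings["total"] = int(fields[3])
    return ratings


# Summary lines copied from the benchmark output quoted in this thread.
sample = """\
Avr:             509    962   4891  |              520   1814   9441
Tot:             514   1388   7166
"""

ratings = parse_7z_ratings(sample)
print(ratings["total"])
print((ratings["compress"] + ratings["decompress"]) / 2)
```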

willy
6 years ago

> I would believe with preserved formatting it’s somewhat obvious
I have absolutely no idea how to present the output like you do!

Anyway thanks for the explanation, now I see where you took the numbers and next time I can give you just the number you need 😉

willy
6 years ago

OK, good to know, thank you!

Gégé
6 years ago

There’s a typo in Willy’s name …

willy
6 years ago

By the way, thank you Jean-Luc for relaying this article, I really feel like it was worth writing it if people find it interesting 🙂

e97
6 years ago

What issues are you having with the power supply?

AC110-220V to 5V/60A DC Switching Power Supply for LEDs is about $25. The 40A version is about $20.

The Tizen and LAVA folks use a plain ol’ ATX power supply.

willy
6 years ago

with this one, none.

e97
6 years ago

Cool. Looks like you’re already using the LED power supply 🙂

In your article:

> “Possible improvements”
> …
> “The power supply would still remain an issue though.”

What does this mean ?

willy
6 years ago

Ah got it! Well it’s simple, in order to power this number of boards you need a *strong* power supply. The cheap USB ones whose real rating needs to be halved are often not enough to power 5 boards, hence the big 5V/30A one I purchased there. But since this one doesn’t have USB outputs, I had to make the adapter board myself. Another solution would be to use one single-port PSU per board, but it requires many mains plugs.

e97
6 years ago

Yes that is troublesome. The adapter you made is neat, makes it modular.

I chose to cut the USB A ports and wire directly to the PSU outputs. This also decreased the voltage drop and let me tidy up the cables to the exact length needed.

willy
6 years ago

Just noticed this :

> has to run time-consuming builds for Arm targets

In fact not even, I use this farm to build for x86_64 and run it on my laptop. Building everything in 3 seconds is quite pleasant and doesn’t make you think before running “make clean”. And for the kernel, it’s the same, I’m building x86 kernels as well. This is why I’m using cross-compilers there. I noticed that gcc 7 is almost twice as slow as was gcc 4.7, to produce code that is not always as good… Fortunately the kernel is still compatible with 4.6 and…
