We’ve already seen how to assemble the NanoPi M4V2 metal case kit, which offers an Arm mini PC solution with support for NVMe SSDs. The new NanoPi M4V2 Rockchip RK3399 SBC is an evolution of the M4 board that brings faster LPDDR4 memory and adds power and recovery buttons.
Since we’ve already tested several RK3399 SBCs and TV boxes, I planned to focus the review on thermal design evaluation (i.e. how well the board cools) and on how memory bandwidth evolved from LPDDR3 to LPDDR4.
I wanted to do so with both Linux and Android, since I could compare against NanoPC-T4 (LPDDR3) benchmarks in Android. But this requires an eMMC flash module, and I don’t own any. I then planned to run Armbian, since it supports armbian-monitor for nice temperature charts, but it’s not working on this board just yet. So I’ve done all tests with FriendlyCore Desktop (rk3399-sd-friendlydesktop-bionic-4.4-arm64-20190926.img) based on Ubuntu 18.04.
System Information
The desktop environment will auto-login, but if you want to log in over SSH you can use the username root and the password fa. Just a few details about the system:
root@NanoPi-M4v2:~# uname -a
Linux NanoPi-M4v2 4.4.179 #1 SMP Thu Aug 29 17:08:23 CST 2019 aarch64 aarch64 aarch64 GNU/Linux
root@NanoPi-M4v2:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
root@NanoPi-M4v2:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           385M  1.1M  384M   1% /run
overlay          24G  761M   23G   4% /
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           385M   12K  385M   1% /run/user/1000
/dev/sda4       200G  174G   24G  89% /media/pi/USB3_BTRFS
/dev/sda3       245G  163G   82G  67% /media/pi/USB3_EXFAT
/dev/sda2       241G  181G   48G  80% /media/pi/USB3_EXT4
/dev/sda1       245G  182G   63G  75% /media/pi/USB3_NTFS
/dev/mmcblk0p2   24G  761M   23G   4% /media/pi/userdata
tmpfs           385M     0  385M   0% /run/user/0
root@NanoPi-M4v2:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           3.8G        260M        2.3G         17M        1.2G        3.3G
Swap:            0B          0B          0B
Note that I connected my USB 3.0 test hardware as well, which explains the four /dev/sda1..4 partitions.
Loaded modules:
root@NanoPi-M4v2:~# lsmod
Module                  Size  Used by
btrfs                 995328  1
xor                    20480  1 btrfs
raid6_pq              102400  1 btrfs
joydev                 20480  0
sg                     40960  0
bcmdhd               1314816  0
binfmt_misc            20480  1
uio_pdrv_genirq        16384  0
uio                    20480  1 uio_pdrv_genirq
sch_fq_codel           20480  3
bnep                   24576  2
ip_tables              28672  0
x_tables               36864  1 ip_tables
GPIOs appear to be properly configured:
root@NanoPi-M4v2:~# ls /sys/class/gpio/
export  gpiochip0  gpiochip128  gpiochip32  gpiochip64  gpiochip96  unexport
root@NanoPi-M4v2:~# cat /sys/kernel/debug/gpio
GPIOs 0-31, platform/pinctrl, gpio0:
 gpio-1   (                    |vcc_sd              ) out hi
 gpio-4   (                    |bt_default_wake_host) in  hi
 gpio-5   (                    |GPIO Key Power      ) in  hi
 gpio-9   (                    |bt_default_reset    ) out hi
 gpio-10  (                    |reset               ) out hi
 gpio-13  (                    |?                   ) out lo

GPIOs 32-63, platform/pinctrl, gpio1:
 gpio-34  (                    |int-n               ) in  hi
 gpio-46  (                    |vsel                ) out lo
 gpio-49  (                    |vsel                ) out lo

GPIOs 64-95, platform/pinctrl, gpio2:
 gpio-83  (                    |bt_default_rts      ) out lo
 gpio-90  (                    |bt_default_wake     ) out hi

GPIOs 96-127, platform/pinctrl, gpio3:
 gpio-111 (                    |mdio-reset          ) out hi

GPIOs 128-159, platform/pinctrl, gpio4:
 gpio-154 (                    |vbus-5v             ) out lo
 gpio-157 (                    |enable              ) out lo
 gpio-158 (                    |vcc_lcd             ) out lo
SBC Bench
The SBC Bench script is great for benchmarking Arm SBCs and checking whether CPU throttling occurs under various workloads, so let’s install it:
wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
chmod +x sbc-bench.sh
root@NanoPi-M4v2:~# ./sbc-bench.sh -m
Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
07:48:27: 1008/1416MHz  0.08   1%   0%   0%   0%   0%   0%  51.7°C
The system reports a CPU temperature of 51.7°C at idle, and I measured around 41°C on the top of the enclosure with an IR thermometer. The ambient temperature was around 28-29°C. Note that I do not have any NVMe SSD, and if you do use one, the temperature may be slightly higher. There’s a fan which will only rotate when the temperature rises further, and when it does it’s really noisy: I can hear it from another room (if the door is open) about 6 meters away from the board.
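If you would rather poll the temperature yourself than keep the monitoring mode running, the Linux thermal sysfs interface exposes it in millidegrees Celsius. The sketch below is only an illustration: the function name is made up, and the assumption that thermal_zone0 is the CPU sensor on this board has not been verified here.

```python
from pathlib import Path

# Assumption: thermal_zone0 is the CPU sensor on this board (not verified).
DEFAULT_ZONE = "/sys/class/thermal/thermal_zone0/temp"


def read_temp_c(path=DEFAULT_ZONE):
    """Return the temperature in degrees Celsius from a sysfs thermal zone file."""
    raw = Path(path).read_text().strip()
    # The kernel reports the value in millidegrees Celsius, e.g. "51700".
    return int(raw) / 1000.0


# On the board itself one might call:
#   print(f"{read_temp_c():.1f}°C")
```

Taking the path as a parameter keeps the helper usable with other thermal zones, or with a plain file when trying it on a machine without that sysfs entry.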
Time to run the benchmarks. It will take a while.
./sbc-bench.sh -c

sbc-bench v0.6.9

Installing needed tools. This may take some time... Done.
Checking cpufreq OPP... Done.
Executing tinymembench. This will take a long time... Done.
Executing OpenSSL benchmark. This will take 3 minutes... Done.
Executing 7-zip benchmark. This will take a long time... Done.
Executing cpuminer. This will take 5 minutes... Done.
Checking cpufreq OPP... Done.

ATTENTION: Throttling might have occured on CPUs 0-3. Check the log for details.

ATTENTION: Throttling might have occured on CPUs 4-5. Check the log for details.

Memory performance (big.LITTLE cores measured individually):
memcpy: 1302.1 MB/s (0.5%)
memset: 4654.9 MB/s (0.4%)
memcpy: 2613.9 MB/s (0.7%)
memset: 4758.7 MB/s (0.8%)

Cpuminer total scores (5 minutes execution): 7.82,7.76,7.75,7.73,7.71,7.70,7.69,7.64,7.63,7.62,7.61,7.60,7.57,7.56,7.54,7.53,7.51,7.49,7.48,7.46,7.45,7.44,7.42,7.41,7.40,7.37,7.36,7.34,7.32,7.29,7.28,7.26,7.25,7.24,7.23,7.22,7.21,7.20,7.19,7.18,7.17 kH/s

7-zip total scores (3 consecutive runs): 5875,5492,5560

OpenSSL results (big.LITTLE cores measured individually):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    124065.11k   368155.37k   705491.20k   947014.31k  1052090.37k  1058908.84k
aes-128-cbc    358389.56k   807787.75k  1159997.95k  1278035.63k  1345746.26k  1347988.14k
aes-192-cbc    118232.62k   327556.91k   576749.91k   730951.34k   791800.49k   795858.26k
aes-192-cbc    337263.40k   739913.49k   980548.69k  1132787.71k  1185980.42k  1185147.56k
aes-256-cbc    114553.00k   300478.08k   499657.81k   611518.12k   653445.80k   654939.48k
aes-256-cbc    325593.91k   660635.97k   904857.86k   977412.78k  1017637.55k  1018451.29k

Full results uploaded to http://ix.io/1ZWO. Please check the log for anomalies (e.g.
swapping or throttling happenend) and otherwise share this URL.
I also had ./sbc-bench.sh -m running in a separate window to monitor the temperature more often, and the CPU temperature rose up to 71.1°C. The top of the enclosure was only slightly warmer at 42°C.
CPU Throttling
No problem for the single-core benchmarks, but we can see a tiny bit of throttling in the 7-zip multi-threaded benchmark:
System health while running 7-zip multi core benchmark:

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
09:47:58: 1800/1416MHz  4.60   3%   0%   2%   0%   0%   0%  59.4°C
09:48:19: 1800/1416MHz  5.06  79%   1%  77%   0%   0%   0%  67.8°C
09:48:40: 1800/1416MHz  5.18  77%   1%  76%   0%   0%   0%  66.7°C
09:49:00: 1800/1416MHz  5.76  90%   1%  88%   0%   0%   0%  69.4°C
09:49:20: 1800/1416MHz  5.52  73%   1%  72%   0%   0%   0%  67.8°C
09:49:40: 1800/1416MHz  5.51  84%   1%  82%   0%   0%   0%  69.4°C
09:50:00: 1800/1416MHz  5.80  78%   1%  76%   0%   0%   0%  69.4°C
09:50:21: 1008/1416MHz  6.21  92%   1%  90%   0%   0%   0%  70.6°C
09:50:44: 1800/1416MHz  5.92  78%   1%  76%   0%   0%   0%  70.0°C
09:51:05: 1800/1416MHz  5.88  81%   1%  79%   0%   0%   0%  67.2°C
09:51:25: 1800/1416MHz  5.90  87%   2%  85%   0%   0%   0%  70.0°C
but it happens much more frequently with cpuminer:
System health while running cpuminer:

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
09:51:38:  600/1200MHz  5.77   4%   0%   3%   0%   0%   0%  68.9°C
09:52:02: 1800/1200MHz  5.96 100%   1%  98%   0%   0%   0%  68.9°C
09:52:26: 1008/1200MHz  6.03 100%   1%  98%   0%   0%   0%  70.6°C
09:52:50:  600/1200MHz  6.02 100%   1%  98%   0%   0%   0%  69.4°C
09:53:14: 1800/1200MHz  6.15 100%   1%  98%   0%   0%   0%  68.3°C
09:53:38: 1800/1008MHz  6.23 100%   1%  98%   0%   0%   0%  68.9°C
09:54:02:  408/1416MHz  6.21 100%   1%  98%   0%   0%   0%  70.6°C
09:54:26:  600/1416MHz  6.19 100%   1%  98%   0%   0%   0%  69.4°C
09:54:50: 1800/1416MHz  6.21 100%   1%  98%   0%   0%   0%  68.9°C
09:55:14:  600/1416MHz  6.15 100%   1%  98%   0%   0%   0%  68.3°C
09:55:38:  600/1416MHz  6.29 100%   1%  98%   0%   0%   0%  70.0°C
09:56:03:  600/1008MHz  6.24 100%   1%  98%   0%   0%   0%  70.0°C
09:56:28: 1800/1008MHz  6.22 100%   1%  98%   0%   0%   0%  71.1°C
It looks as if the system will drop the clock speed of the big cores to around 600 MHz when the CPU temperature goes over 70°C.
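To spot throttled samples without eyeballing the log, the monitoring lines can be parsed with a short script. This is only a sketch: the regular expression is written against the line layout shown in this review, the 1800 MHz ceiling for the big cores is taken from the tables above, and the `Sample` and `parse_sample` names are illustrative. The sample lines are copied from the cpuminer run.

```python
import re
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    time: str
    big_mhz: int
    little_mhz: int
    temp_c: float


# Matches lines such as "09:52:26: 1008/1200MHz 6.03 100% ... 70.6°C".
# The first clock is taken to be the big cluster, as in the "big.LITTLE" header.
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):\s+(\d+)/\s*(\d+)MHz\b.*\s([\d.]+)°C\s*$"
)


def parse_sample(line: str) -> Optional[Sample]:
    """Parse one monitoring line; return None for headers and other text."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Sample(m.group(1), int(m.group(2)), int(m.group(3)), float(m.group(4)))


LOG = """\
Time big.LITTLE load %cpu %sys %usr %nice %io %irq Temp
09:52:02: 1800/1200MHz 5.96 100% 1% 98% 0% 0% 0% 68.9°C
09:52:26: 1008/1200MHz 6.03 100% 1% 98% 0% 0% 0% 70.6°C
09:54:50: 1800/1416MHz 6.21 100% 1% 98% 0% 0% 0% 68.9°C
09:55:38: 600/1416MHz 6.29 100% 1% 98% 0% 0% 0% 70.0°C
"""

BIG_MAX_MHZ = 1800  # highest big-core clock seen in the tables above

samples = [s for s in map(parse_sample, LOG.splitlines()) if s is not None]
big_throttled = [s for s in samples if s.big_mhz < BIG_MAX_MHZ]

for s in big_throttled:
    print(f"{s.time}: big cores at {s.big_mhz} MHz, {s.temp_c:.1f}°C")
```

Because the monitor only samples every 20 seconds or so, a reading just below 70°C can still coincide with a reduced clock, so the exact trip point cannot be read off such a listing.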
Memory Bandwidth
NanoPi M4 results for the big cores (2x Arm Cortex-A72) taken from SBC-Bench results database:
- memcpy: 4080 MB/s
- memset: 8270 MB/s
So now we have a software/configuration issue on our hands, as the NanoPi M4V2 with the supposedly faster memory is actually much slower:
- memcpy: 2613.9 MB/s
- memset: 4758.7 MB/s
The M4V2 runs Ubuntu 18.04 64-bit with Linux 4.4, while the M4 board was tested with Debian Stretch 64-bit and Linux 4.19. Here’s the kernel boot log for reference, but I can’t see anything about DDR.
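To put a number on the gap, the big-core figures quoted above can be compared directly. The snippet is plain arithmetic on the values in this section; the dictionary and function names are illustrative only.

```python
# Big-core (Cortex-A72) tinymembench figures quoted above, in MB/s.
M4_BIG = {"memcpy": 4080.0, "memset": 8270.0}      # NanoPi M4, sbc-bench results database
M4V2_BIG = {"memcpy": 2613.9, "memset": 4758.7}    # NanoPi M4V2, this review


def relative_bandwidth(new, old):
    """Return new/old for every benchmark present in `old`."""
    return {name: new[name] / old[name] for name in old}


ratios = relative_bandwidth(M4V2_BIG, M4_BIG)
for name, ratio in ratios.items():
    print(f"{name}: M4V2 reaches {ratio:.1%} of the M4 figure")
```

That works out to roughly 64% of the M4's memcpy bandwidth and 58% of its memset bandwidth, keeping in mind that the two boards were tested with different kernels and distributions.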
Improving Air Flow
The way the case is designed the fan faces the desktop, and the case is only slightly elevated via rubber pads. One way to potentially improve cooling is to turn the enclosure upside down with the fan facing up, but instead, I elevated the case with four HDMI connector caps.
Let’s repeat the test:
./sbc-bench.sh -c

Average load is 0.1 or higher (way too much background activity). Waiting...

System too busy for benchmarking: 13:22:38 up 8:29, 5 users, load average: 0.11, 1.08, 2.41
System too busy for benchmarking: 13:22:43 up 8:29, 5 users, load average: 0.10, 1.06, 2.39

sbc-bench v0.6.9

Installing needed tools. This may take some time... Done.
Checking cpufreq OPP... Done.
Executing tinymembench. This will take a long time... Done.
Executing OpenSSL benchmark. This will take 3 minutes... Done.
Executing 7-zip benchmark. This will take a long time... Done.
Executing cpuminer. This will take 5 minutes... Done.
Checking cpufreq OPP... Done.

ATTENTION: Throttling might have occured on CPUs 0-3. Check the log for details.

ATTENTION: Throttling might have occured on CPUs 4-5. Check the log for details.

Memory performance (big.LITTLE cores measured individually):
memcpy: 1336.6 MB/s (0.6%)
memset: 4675.6 MB/s
memcpy: 2594.7 MB/s (0.9%)
memset: 4790.5 MB/s (0.6%)

Cpuminer total scores (5 minutes execution): 9.51,9.42,8.89,8.72,8.63,8.59,8.58,8.53,8.52,8.48,8.47,8.44,8.43,8.42,8.40,8.39,8.38,8.37,8.36,8.34,8.29,8.28,8.26,8.23,8.22,8.21,8.19,8.18,8.17,8.16,8.15,8.14,8.12,8.11,8.10,8.09 kH/s

7-zip total scores (3 consecutive runs): 6097,6091,6131

OpenSSL results (big.LITTLE cores measured individually):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    124053.61k   368255.06k   706825.39k   947712.34k  1052890.45k  1058805.08k
aes-128-cbc    358191.40k   807757.33k  1160006.83k  1282757.29k  1342816.26k  1348059.14k
aes-192-cbc    118253.40k   326713.71k   577089.45k   730836.31k   792005.29k   796027.56k
aes-192-cbc    337222.84k   739990.25k   984173.23k  1131525.12k  1186556.59k  1189811.54k
aes-256-cbc    114629.87k   301376.96k   499779.24k   611570.69k   653781.67k   655742.29k
aes-256-cbc    325630.63k   670746.20k   904601.77k   977996.46k  1017951.57k  1013841.92k

Full results uploaded to http://ix.io/1ZYx. Please check the log for anomalies (e.g.
swapping or throttling happenend) and otherwise share this URL.
Sadly throttling still occurred, but there are some improvements: it did not happen with 7-zip at all (4°C lower), and it happened less often with cpuminer:
System health while running 7-zip multi core benchmark:

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
14:02:16: 1800/1416MHz  4.64   7%   0%   7%   0%   0%   0%  55.0°C
14:02:37: 1800/1416MHz  4.89  77%   0%  76%   0%   0%   0%  64.4°C
14:02:57: 1800/1416MHz  4.55  75%   1%  74%   0%   0%   0%  59.4°C
14:03:17: 1800/1416MHz  4.89  91%   1%  90%   0%   0%   0%  62.5°C
14:03:37: 1800/1416MHz  4.54  70%   0%  69%   0%   0%   0%  63.8°C
14:03:58: 1800/1416MHz  4.64  82%   1%  80%   0%   0%   0%  62.5°C
14:04:18: 1800/1416MHz  4.61  77%   1%  75%   0%   0%   0%  63.1°C
14:04:38: 1800/1416MHz  4.68  81%   1%  80%   0%   0%   0%  63.8°C
14:04:58: 1800/1416MHz  4.81  82%   1%  81%   0%   0%   0%  65.0°C
14:05:18: 1800/1416MHz  5.06  79%   1%  78%   0%   0%   0%  63.1°C
14:05:41: 1800/1416MHz  5.39  86%   1%  84%   0%   0%   0%  66.1°C

System health while running cpuminer:

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
14:05:45: 1800/1416MHz  5.52   8%   0%   7%   0%   0%   0%  68.3°C
14:06:08: 1800/1416MHz  5.89 100%   0%  99%   0%   0%   0%  67.8°C
14:06:31: 1800/1416MHz  5.92 100%   0%  99%   0%   0%   0%  68.9°C
14:06:53: 1800/1416MHz  6.06 100%   0%  99%   0%   0%   0%  68.9°C
14:07:16: 1800/1416MHz  6.04 100%   0%  99%   0%   0%   0%  69.4°C
14:07:39: 1800/1416MHz  6.09 100%   0%  99%   0%   0%   0%  68.9°C
14:08:02: 1800/1416MHz  6.06 100%   0%  99%   0%   0%   0%  70.0°C
14:08:25: 1800/1416MHz  6.10 100%   0%  99%   0%   0%   0%  70.0°C
14:08:48: 1800/1416MHz  6.19 100%   0%  99%   0%   0%   0%  69.4°C
14:09:12: 1800/1416MHz  6.14 100%   0%  99%   0%   0%   0%  67.2°C
14:09:35: 1800/ 600MHz  6.15 100%   0%  99%   0%   0%   0%  68.9°C
14:09:59:  408/ 600MHz  6.15 100%   0%  99%   0%   0%   0%  70.0°C
14:10:22: 1800/1416MHz  6.11 100%   0%  99%   0%   0%   0%  70.0°C
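The effect of the extra clearance can also be read from the scores of the two runs. The sketch below compares the mean 7-zip scores and the lowest cpuminer readings reported by sbc-bench before and after raising the case; the helper name is illustrative and the numbers are copied from the two outputs.

```python
from statistics import mean

# 7-zip total scores (3 consecutive runs) from the two sbc-bench outputs.
SEVEN_ZIP_BEFORE = [5875, 5492, 5560]
SEVEN_ZIP_AFTER = [6097, 6091, 6131]

# Lowest cpuminer reading (kH/s) in each run, i.e. the most throttled state.
CPUMINER_MIN_BEFORE = 7.17
CPUMINER_MIN_AFTER = 8.09


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


seven_zip_gain = percent_gain(mean(SEVEN_ZIP_BEFORE), mean(SEVEN_ZIP_AFTER))
cpuminer_gain = percent_gain(CPUMINER_MIN_BEFORE, CPUMINER_MIN_AFTER)

print(f"7-zip mean score: +{seven_zip_gain:.1f}%")
print(f"cpuminer lowest reading: +{cpuminer_gain:.1f}%")
```

With these figures the mean 7-zip score is about 8% higher and the lowest cpuminer reading about 13% higher with the case elevated, from a single run in each configuration.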
GPU Acceleration and VPU Hardware Decoding
FriendlyCore comes with a set of programs preinstalled including some that allow us to test whether 3D graphics acceleration and hardware video decoding work.
glmark2-es2 is preinstalled and runs fine…
But the glmark2 score is on the low side at 54 because, as I understand it, vsync is enabled so the maximum score is 60 fps:
pi@NanoPi-M4v2:~$ glmark2-es2
=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     ARM
    GL_RENDERER:   Mali-T860
    GL_VERSION:    OpenGL ES 3.2 v1.r14p0-01rel0-git(966ed26).f44c85cb3d2ceb87e8be88e7592755c3
=======================================================
[build] use-vbo=false: FPS: 50 FrameTime: 20.000 ms
[build] use-vbo=true: FPS: 47 FrameTime: 21.277 ms
[texture] texture-filter=nearest: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=linear: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=mipmap: FPS: 60 FrameTime: 16.667 ms
[shading] shading=gouraud: FPS: 59 FrameTime: 16.949 ms
[shading] shading=blinn-phong-inf: FPS: 60 FrameTime: 16.667 ms
[shading] shading=phong: FPS: 60 FrameTime: 16.667 ms
[shading] shading=cel: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=high-poly: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=normals: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=height: FPS: 60 FrameTime: 16.667 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 59 FrameTime: 16.949 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 60 FrameTime: 16.667 ms
[pulsar] light=false:quads=5:texture=false: FPS: 60 FrameTime: 16.667 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 60 FrameTime: 16.667 ms
[desktop] effect=shadow:windows=4: FPS: 60 FrameTime: 16.667 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 30 FrameTime: 33.333 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 30 FrameTime: 33.333 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 30 FrameTime: 33.333 ms
[ideas] speed=duration: FPS: 59 FrameTime: 16.949 ms
[jellyfish] <default>: FPS: 59 FrameTime: 16.949 ms
[terrain] <default>: FPS: 27 FrameTime: 37.037 ms
[shadow] <default>: FPS: 57 FrameTime: 17.544 ms
[refract] <default>: FPS: 32 FrameTime: 31.250 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 60 FrameTime: 16.667 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 60 FrameTime: 16.667 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 60 FrameTime: 16.667 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 60 FrameTime: 16.667 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 60 FrameTime: 16.667 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 60 FrameTime: 16.667 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 60 FrameTime: 16.667 ms
=======================================================
                                  glmark2 Score: 54
=======================================================
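The vsync cap is easy to see in the numbers. The sketch below lists the per-scene FPS values from the run above, converts FPS to frame time, and averages them. Treating the final score as the mean FPS across scenes is an assumption on my part, although it is consistent with the reported 54.

```python
# Per-scene FPS values from the glmark2-es2 run above, in output order.
FPS = [
    50, 47,              # build
    60, 60, 60,          # texture
    59, 60, 60, 60,      # shading
    60, 60, 60,          # bump
    59, 60,              # effect2d
    60,                  # pulsar
    60, 60,              # desktop
    30, 30, 30,          # buffer
    59, 59, 27, 57, 32,  # ideas, jellyfish, terrain, shadow, refract
    60, 60, 60,          # conditionals
    60, 60,              # function
    59, 60, 60,          # loop
]


def frame_time_ms(fps: float) -> float:
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


mean_fps = sum(FPS) / len(FPS)
capped = sum(1 for f in FPS if f >= 59)  # scenes sitting at the 60 Hz ceiling

print(f"scenes: {len(FPS)}, mean FPS: {mean_fps:.2f}")
print(f"scenes at 59-60 FPS: {capped}")
print(f"frame time at 60 FPS: {frame_time_ms(60):.3f} ms")
```

Most scenes sit at 59 to 60 FPS, so the average is pulled down only by the handful of heavier scenes and can never exceed 60 while vsync is on.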
I could test 4K video playback in FriendlyELEC Player and I was able to play H.265, H.264 and VP9 videos with hardware video decoding.
However, .ts files won’t play at all, and as such I could not play any 10-bit H.264 or 10-bit H.265 4K videos since all my samples use the TS container format. I was also unable to change the display resolution from 1920×1080 to 4K resolutions such as 3840×2160 or 4096×2160 since those were simply not detected despite the board being connected to a 4K TV.
Qt Development & Demos
I could not help but also notice the Qt demos on the desktop. There was also an OpenCV demo but it requires a USB or MIPI camera and I could not find my USB webcam.
The Qt5 QML demos include an image browser and a smartphone UI. If you want to easily get started with Qt UI development on Arm, FriendlyELEC has a “Develop Qt Applications” section in their wiki, so the NanoPi M4/M4V2 may be a good starting point. They also have several Qt repositories with demos in their GitHub account.
Final words
The NanoPi M4V2 metal case kit does the job of keeping the Rockchip RK3399 board cool enough in most conditions. But you must be aware the fan is really noisy when it kicks in, and I did not test the kit with an NVMe SSD, which may generate further heat.
If you plan to use Android, you’ll need to purchase an additional eMMC flash module, but with FriendlyCore Desktop tested above a MicroSD card will suffice. I used a 32GB class A1 MicroSD card and performance was satisfactory. The software appears to be fairly solid and should be a good base for product development. There may be further tweaks needed to extract more performance, as we’ve found out memory bandwidth was about half of boards with DDR3 memory.
If you’d like to purchase the hardware used for this review you can do so for $98 plus shipping. Just make sure to also select the “Metal Case w/ Cooling Fan(NVMe SSD Adapter included) (+$28.00)” option.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
Great review. I’ve also got one to review. I’m waiting for official Armbian to run on it. I also noticed it’s performing a little worse than the M4 in FriendlyDesktop.
That’s not to say that LPDDR4 should perform better than DDR3. I think the LPDDR4 modules are just cheaper than the DDR3 ones. LPDDR4 might have worse latency than DDR3, which makes it perform worse.
I like the case. But I’ve removed the plate under the fan. With it on there the fan makes a lot of noise. Even on 5V. Without that plate you almost can’t hear it running.
Cheers.
> as we’ve found out memory bandwidth was about half of boards with DDR3 memory
This is most probably just a kernel setting. Rockchip’s Android BSP uses CONFIG_HZ=1000 by default while most Linux distros choose CONFIG_HZ=250 for both server and desktop scenarios which will end up with much better memory bandwidth benchmark numbers, see for example: https://github.com/armbian/build/issues/1142
How fast memory access really is depends on the boot BLOB in question (but maybe Rockchip in the meantime published DRAM initialization code as they promise for years now — no idea since I’m pretty much done with this ARM mess anyway).
Back on x86 then or found an unmessy arm? Or something completely different?
I’m not sure CONFIG_HZ alone explains such a difference. I already noticed bad memory performance on RK3399. Each A53 alone has 1/3 the read performance of the A72 but higher write performance! However when using two A53 at once for reads I noticed that their total performance more than doubles. Typically (I don’t remember the exact numbers) you’d get 400ns DRAM access time for a single core and only 250ns for two cores! The smaller 64-bit read datapath of the A53 alone cannot explain this, and the CPU supports assigning rotating priorities to access the L2 cache which I’ve long suspected not to be optimal, and to leave idle slots when not enough cores are in use, and to reduce the burst length. Also mixing one A72 with one A53 would further increase the A53 performance. I spent some time reading the data sheet but failed to figure out working changes.
Note that I already noticed this one year ago on the H96 with DDR4 and Rockchip’s BSP so it’s not limited to the NanoPi.
I gave up searching since in my build farm most cores are in use together anyway, but definitely for me there is something totally wrong in the RK3399’s memory controller.
This chip is getting old now, it’s not worth trying to investigate into this anymore IMHO.
> I’m not sure CONFIG_HZ alone explains such a difference.
It does in this case (looking at numbers generated with one specific benchmark).
Please see the four RockPro64 entries here: https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md. An RK3399 running with mainline kernel resulted in twice the memory bandwidth compared to Rockchip’s 4.4 Android BSP. But there’s no magic, just the real difference between Rockchip’s BSP kernel and mainline being simply CONFIG_HZ=1000 vs. CONFIG_HZ=250. Once this is adjusted in RK’s 4.4 kernel, memory-related benchmark numbers improve.
> This chip is getting old now, it’s not worth trying to investigate into this anymore IMHO
Sure, let’s move on to some other ARM adventure where software support becomes mature once the SoC is obsolete 😉
Seriously: something like exchanging the type of DRAM modules resulting in a board not being able to run any ‘standard images’ without digging into the world of smelly boot blobs or outdated u-boot variants from 5 years ago is pathetic. But pretty much represents the ARM world today if it’s not about some selected ARM based ‘real servers’.
> Seriously: something like exchanging the type of DRAM modules resulting in a board not being able to run any ‘standard images’ without digging into the world of smelly boot blobs or outdated u-boot variants from 5 years ago is pathetic. But pretty much represents the ARM world today if it’s not about some selected ARM based ‘real servers’.
This is inevitable in a world of multiple vendors and least-effort/first-to-market forces at play where a SKU is not meant to see much RAM variation. Under such conditions there will always be the SoC vendor who ships something with barely-running RAM support, and the board vendor who takes that and runs away with it. Luckily, there are non-server Arm ‘desktop’ boards (of the mITX variety) where RAM is not soldered, and thus vendors have no choice but to provide a BL1/2 that does robust RAM support. And the more SoC vendors aim for u/mITX, the more robust RAM support we will get. Marvell and NXP are two examples of vendors who don’t skimp on RAM support. E.g. macchiatobin (A8040) runs with DDR4 DIMMs from a plethora of vendors, in both ECC and non-ECC variants, just like any (xeon) desktop.
> Marvell and nxp are two examples of vendors who don’t skimp on ram support. E.g. macchiatobin (A8040)
Not the brightest future according to this rumor: https://forum.armbian.com/topic/11873-helios64-annoucement/?do=findComment&comment=88300
But there’s a lot more that sucks with ARM if you’re simply looking for stable software support with server and storage use cases in mind (like I do).
> Not the brightest future according to this rumor
This one doesn’t even surprise me, I expected it when seeing they acquired Cavium. Far too much overlap between the two, especially when their consumer devices are likely sufficient to pay the bills.
I guess that explains why A8040 has been phased out by some of its notable customers since this summer. Unfortunately, I don’t see Cavium meeting demand for entry-level SoCs where fewer cores, better ST performance is needed. Their TX is lacking in ST, and their TX2 is completely out of the price segment. NXP seem to be picking up with the LX series, but Marvell leaving the segment would be a bummer indeed.
> Please see the four RockPro64 entries
Impressive difference, indeed. Something’s terribly wrong in this kernel. Maybe they’re flushing the cache on each context switch to work around whatever bug :-/
> Seriously: …pathetic
Agreed. Also with DDR training blobs everywhere it’s not possible to even remotely imagine such machines one day becoming almost plug-n-play and extensible with modular components. I was truly impressed when I saw the mcbin take a normal DDR4 stick. Then I realized that it’s been this way for 4 decades in the PC world… In the ARM world, most machines require a full rebuild of some crappy BSP every time you need to adjust whatever minor setting, just as if what they put in them was extremely precious. But I’m still hoping to see such idiocies change, maybe I’m wrong.
Is it bootable from the NVMe SSD?
Can the fan speed be reduced?
Only with Armbian bootable from NVMe. But official Armbian does not yet work on it. (soon)
The fan speed can’t be adjusted to my knowledge. It would be simple to solder a potentiometer in between the wires.
> Only with Armbian bootable from NVMe.
Nope since no RK3399 device can ‘boot from NVMe’ since this SoC has no boot support for PCIe at all. The only possibilities are to boot from SD card, eMMC or (non existing) SPI NOR flash. And even then it’s technically still not booting from NVMe but instead loading a patched u-boot from the supported boot media which can then in turn load the kernel from wherever it is. Radxa guys claim to provide this with RockPi 4 starting with V1.4: https://www.cnx-software.com/2019/10/08/ecopi-starter-cute-mini-pc-kit-rockpi-4b-sbc-m2-nvme-ssd/#comment-567004
The SPI NOR flash then kind of acts like a BIOS on x86 providing driver support to load the OS from unsupported media.
In Armbian I added NVMe capabilities to ‘nand-sata-install’ last year but this is still booting from either SD card or eMMC on such a NanoPi (in absence of SPI NOR flash) which means both u-boot and kernel have to reside on traditional boot media but just the rootfs is then located on an NVMe attached SSD.
What a pity: NVMe and SBC look “made for each other”: both small, low power consumption, not too expensive.
See RockPi 4 (V 1.4 or above) with an NVMe capable u-boot in its SPI NOR flash. That’s all it needs.
I ran sbc-bench on the NanoPi M4 v2 using an Armbian build with the “default” kernel (from the nanopi-m4v2-u-boot-v2019.10-ddr-miniloader branch, which uses updated blobs from Rockchip). Results are quite similar to those of the review:
Memory performance (big.LITTLE cores measured individually):
memcpy: 1351.5 MB/s (0.4%)
memset: 4803.8 MB/s (0.3%)
memcpy: 2648.2 MB/s
memset: 4871.5 MB/s (0.9%)
While:
pask@nanopim4v2:~/sbc-bench$ zcat /proc/config.gz |grep HZ
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
I guess NicoD’s hypothesis is right: sadly the lpddr4 chips used by friendlyelec on the m4v2 are worse than the ddr3 ones used on the previous version of this board.
Or perhaps the rk3399 doesn’t work well with lpddr4 memory. It would be interesting to compare those results with the rockpi4’s ones.
Full results here: http://ix.io/22Oj
> lpddr4 chips used by friendlyelec on the m4v2 are worse than the ddr3 ones used on the previous version of this board
Why do you think about hardware differences when you use a closed source piece of software that initializes memory and therefore determines its performance?