With 10GbE becoming more widespread and often found in entry-level hardware, the CPU may become the bottleneck, so I’ll explain how to use iperf3 in multi-thread mode to fully saturate the 10GbE bandwidth even with a system based on a relatively low-end multi-core processor.
I’m currently reviewing the iKOOLCORE R2 Max mini PC with two 10GbE interfaces and an entry-level Intel Processor N100 quad-core CPU. I have two mostly identical R2 Max systems: one fanless unit running an OpenWrt fork (QWRT) and acting as the server, and one actively cooled unit running Proxmox VE without a guest OS. When I test the upload speed with iperf3, it’s fine at 9.41 Gbps, but the download speed is limited to about 8.6 Gbps, and bidirectional is worse:
root@ikoolcore-r2-max-cnx:~# iperf3 -t 60 -c 192.168.4.1 -i 10 --bidir
Connecting to host 192.168.4.1, port 5201
[  5] local 192.168.4.253 port 44438 connected to 192.168.4.1 port 5201
[  7] local 192.168.4.253 port 44444 connected to 192.168.4.1 port 5201
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-10.00  sec  10.9 GBytes  9.40 Gbits/sec    0   3.15 MBytes
[  7][RX-C]   0.00-10.00  sec  7.56 GBytes  6.49 Gbits/sec
[  5][TX-C]  10.00-20.00  sec  10.9 GBytes  9.37 Gbits/sec    0   3.91 MBytes
[  7][RX-C]  10.00-20.00  sec  7.38 GBytes  6.33 Gbits/sec
[  5][TX-C]  20.00-30.00  sec  11.0 GBytes  9.41 Gbits/sec    0   3.91 MBytes
[  7][RX-C]  20.00-30.00  sec  7.57 GBytes  6.51 Gbits/sec
[  5][TX-C]  30.00-40.00  sec  10.9 GBytes  9.41 Gbits/sec    0   3.91 MBytes
[  7][RX-C]  30.00-40.00  sec  7.39 GBytes  6.35 Gbits/sec
[  5][TX-C]  40.00-50.00  sec  10.9 GBytes  9.40 Gbits/sec    0   3.91 MBytes
[  7][RX-C]  40.00-50.00  sec  7.86 GBytes  6.75 Gbits/sec
[  5][TX-C]  50.00-60.00  sec  10.9 GBytes  9.41 Gbits/sec    0   3.91 MBytes
[  7][RX-C]  50.00-60.00  sec  7.70 GBytes  6.61 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-60.00  sec  65.7 GBytes  9.40 Gbits/sec    0   sender
[  5][TX-C]   0.00-60.00  sec  65.7 GBytes  9.40 Gbits/sec        receiver
[  7][RX-C]   0.00-60.00  sec  45.5 GBytes  6.51 Gbits/sec    0   sender
[  7][RX-C]   0.00-60.00  sec  45.5 GBytes  6.51 Gbits/sec        receiver

iperf Done.
The overall system CPU usage is around 30% during the test, but we can clearly see that only one CPU core is used with 100% usage.
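For readers who want numbers rather than a screenshot, here is a minimal sketch that derives per-core utilisation from two samples of /proc/stat taken one second apart. The function name `per_core_busy` and the `--live` switch are made up for this example, and it assumes a Linux system with the usual /proc/stat column layout (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# Sketch: per-core CPU utilisation from two concatenated /proc/stat snapshots.
# The first "cpuN" line seen for a core is the "before" sample, the second is "after".
per_core_busy() {
    awk '/^cpu[0-9]+ / {
        total = 0
        # Sum user..steal only; guest time is already included in user/nice.
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        idle = $5 + $6                      # idle + iowait
        if ($1 in t0) {
            dt = total - t0[$1]
            busy = (dt > 0) ? 100 * (dt - (idle - i0[$1])) / dt : 0
            printf "%s %.0f%%\n", $1, busy
        } else {
            t0[$1] = total
            i0[$1] = idle
        }
    }'
}

# Live use on Linux: run with --live while iperf3 is busy in another terminal.
if [ "${1:-}" = "--live" ]; then
    { cat /proc/stat; sleep 1; cat /proc/stat; } | per_core_busy
fi
```

With a single-threaded iperf3 run, one would expect one `cpuN` line close to 100% and the others mostly idle, which averages out to the modest overall figure mentioned above.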
People using servers with high-speed 40+ Gbps Ethernet have done this type of testing for a while, but it required multiple commands. Luckily, iperf 3.16 has added support for multithreading and it’s now much easier. But I was unable to find clear instructions with a web search, so here we are.
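For context, the older multi-command approach usually meant starting one single-threaded iperf3 server per port and one client per server. The sketch below only prints such commands (a dry run) rather than executing them. The function name `legacy_parallel_cmds`, the base port, and the instance count are assumptions chosen for illustration, not something taken from a specific setup.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for N parallel single-threaded iperf3 pairs.
# Usage: legacy_parallel_cmds HOST COUNT [BASE_PORT]
legacy_parallel_cmds() {
    host=$1
    count=$2
    base_port=${3:-5201}
    i=0
    while [ "$i" -lt "$count" ]; do
        port=$((base_port + i))
        # -D runs the server as a daemon; each instance listens on its own port.
        echo "server: iperf3 -s -p $port -D"
        # Each client is put in the background so that all of them run concurrently.
        echo "client: iperf3 -c $host -p $port -t 60 &"
        i=$((i + 1))
    done
}

legacy_parallel_cmds 192.168.4.1 2
```

The per-instance results then had to be added up by hand, which is the part the multi-threaded `-P` option makes unnecessary.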
First, we need to check that the iperf3 version on both systems is indeed 3.16 or greater.
OpenWrt/QWRT:
root@QWRT:~# iperf3 -v
iperf 3.17.1 (cJSON 1.7.15)
Linux QWRT 6.12 #0 SMP Sun Nov 17 14:40:37 2024 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, bind to device, support IPv4 don't fragment, POSIX threads
Proxmox VE 8.3:
root@ikoolcore-r2-max-cnx:~# iperf3 -v
iperf 3.12 (cJSON 1.7.15)
Linux ikoolcore-r2-max-cnx 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64
Optional features available: CPU affinity setting, IPv6 flow label, SCTP, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, authentication, bind to device, support IPv4 don't fragment
It’s fine on OpenWrt, but the version is too old on Proxmox VE, so I built the freshly released iperf 3.18 from source:
wget https://github.com/esnet/iperf/releases/download/3.18/iperf-3.18.tar.gz
tar xvf iperf-3.18.tar.gz
cd iperf-3.18
sudo apt install build-essential
./configure
make -j4
cd src
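Before running any test, it may be worth confirming that the binary about to be used really is 3.16 or newer. The following is a small sketch, not part of the original procedure. The helper names `version_ge` and `iperf_version` and the `IPERF3` variable are invented for this example, and it assumes the first line of `iperf3 -v` has the form `iperf X.Y[.Z] (...)`, as in the outputs shown earlier.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A >= B, compared numerically per field.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0                  # missing fields count as 0
            yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# iperf_version: read `iperf3 -v` output on stdin and print the version number.
iperf_version() {
    awk 'NR == 1 && $1 == "iperf" { print $2 }'
}

# Check the freshly built binary by default; set IPERF3 to test another one.
bin=${IPERF3:-./iperf3}
if [ -x "$bin" ]; then
    ver=$("$bin" -v | iperf_version)
    if version_ge "$ver" 3.16; then
        echo "$bin is $ver: multi-threaded -P should be available"
    else
        echo "$bin is $ver: older than 3.16, -P streams will share one thread"
    fi
fi
```

Comparing field by field avoids the trap of treating versions as decimal numbers, where 3.9 would wrongly sort above 3.16.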
We can now launch iperf3 in server mode on the OpenWrt machine as usual:
iperf3 -s
and add the -P parameter on the client side (Proxmox VE) to run iperf3 with four parallel streams, each handled by its own thread, using -R (reverse mode) for the download test:
./iperf3 -t 60 -c 192.168.4.1 -P 4 -i 10 -R
Here’s the output:
root@ikoolcore-r2-max-cnx:~/iperf-3.18/src# ./iperf3 -t 60 -c 192.168.4.1 -P 4 -i 10 -R
Connecting to host 192.168.4.1, port 5201
Reverse mode, remote host 192.168.4.1 is sending
[  5] local 192.168.4.253 port 42850 connected to 192.168.4.1 port 5201
[  7] local 192.168.4.253 port 42856 connected to 192.168.4.1 port 5201
[  9] local 192.168.4.253 port 42866 connected to 192.168.4.1 port 5201
[ 11] local 192.168.4.253 port 42870 connected to 192.168.4.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  1.86 GBytes  1.60 Gbits/sec
[  7]   0.00-10.01  sec  3.66 GBytes  3.14 Gbits/sec
[  9]   0.00-10.01  sec  3.66 GBytes  3.14 Gbits/sec
[ 11]   0.00-10.01  sec  1.79 GBytes  1.54 Gbits/sec
[SUM]   0.00-10.01  sec  11.0 GBytes  9.41 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  10.01-20.01  sec  1.57 GBytes  1.35 Gbits/sec
[  7]  10.01-20.01  sec  3.65 GBytes  3.14 Gbits/sec
[  9]  10.01-20.01  sec  3.65 GBytes  3.14 Gbits/sec
[ 11]  10.01-20.01  sec  2.08 GBytes  1.79 Gbits/sec
[SUM]  10.01-20.01  sec  11.0 GBytes  9.41 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  20.01-30.01  sec  1.67 GBytes  1.44 Gbits/sec
[  7]  20.01-30.01  sec  3.65 GBytes  3.14 Gbits/sec
[  9]  20.01-30.01  sec  3.65 GBytes  3.14 Gbits/sec
[ 11]  20.01-30.01  sec  1.98 GBytes  1.70 Gbits/sec
[SUM]  20.01-30.01  sec  11.0 GBytes  9.41 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  30.01-40.01  sec  1.84 GBytes  1.58 Gbits/sec
[  7]  30.01-40.01  sec  3.65 GBytes  3.14 Gbits/sec
[  9]  30.01-40.01  sec  3.65 GBytes  3.14 Gbits/sec
[ 11]  30.01-40.01  sec  1.82 GBytes  1.56 Gbits/sec
[SUM]  30.01-40.01  sec  11.0 GBytes  9.41 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  40.01-50.01  sec  1.84 GBytes  1.58 Gbits/sec
[  7]  40.01-50.01  sec  3.65 GBytes  3.14 Gbits/sec
[  9]  40.01-50.01  sec  3.65 GBytes  3.14 Gbits/sec
[ 11]  40.01-50.01  sec  1.82 GBytes  1.56 Gbits/sec
[SUM]  40.01-50.01  sec  11.0 GBytes  9.41 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  50.01-60.01  sec  1.84 GBytes  1.58 Gbits/sec
[  7]  50.01-60.01  sec  3.65 GBytes  3.14 Gbits/sec
[  9]  50.01-60.01  sec  3.65 GBytes  3.14 Gbits/sec
[ 11]  50.01-60.01  sec  1.82 GBytes  1.56 Gbits/sec
[SUM]  50.01-60.01  sec  11.0 GBytes  9.42 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.01  sec  10.6 GBytes  1.52 Gbits/sec    0   sender
[  5]   0.00-60.01  sec  10.6 GBytes  1.52 Gbits/sec        receiver
[  7]   0.00-60.01  sec  21.9 GBytes  3.14 Gbits/sec    0   sender
[  7]   0.00-60.01  sec  21.9 GBytes  3.14 Gbits/sec        receiver
[  9]   0.00-60.01  sec  21.9 GBytes  3.14 Gbits/sec    0   sender
[  9]   0.00-60.01  sec  21.9 GBytes  3.14 Gbits/sec        receiver
[ 11]   0.00-60.01  sec  11.3 GBytes  1.62 Gbits/sec    0   sender
[ 11]   0.00-60.01  sec  11.3 GBytes  1.62 Gbits/sec        receiver
[SUM]   0.00-60.01  sec  65.8 GBytes  9.42 Gbits/sec    0   sender
[SUM]   0.00-60.01  sec  65.8 GBytes  9.41 Gbits/sec        receiver

iperf Done.
root@ikoolcore-r2-max-cnx:~/iperf-3.18/src#
The output is quite verbose with an interval of 10 seconds and four threads, but the important part is that we got the full 9.41 Gbps bandwidth.
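Since only the totals matter here, the verbose text output can be trimmed to its end-of-test [SUM] lines. This is a minimal sketch with a made-up helper name, `sum_only`. It assumes the default human-readable output format shown above and is not an iperf3 feature.

```shell
#!/bin/sh
# sum_only: keep only the final [SUM] sender/receiver lines of iperf3 text output
# and print the stream tag, the bitrate with its unit, and the role.
sum_only() {
    awk '$1 ~ /^\[SUM\]/ && ($NF == "sender" || $NF == "receiver") {
        for (i = 2; i <= NF; i++)
            if ($i ~ /bits\/sec$/) print $1, $(i - 1), $i, $NF
    }'
}

# Demo on three lines copied from the run above; the per-interval line is dropped.
sum_only <<'EOF'
[SUM]   0.00-10.01  sec  11.0 GBytes  9.41 Gbits/sec
[SUM]   0.00-60.01  sec  65.8 GBytes  9.42 Gbits/sec    0   sender
[SUM]   0.00-60.01  sec  65.8 GBytes  9.41 Gbits/sec        receiver
EOF

# Live use (not executed here); using nproc for the stream count is an assumption
# that one stream per core is wanted and that coreutils is installed:
#   ./iperf3 -t 60 -c 192.168.4.1 -P "$(nproc)" -i 10 -R | sum_only
```

The same filter also works on `--bidir` output, where the first field becomes `[SUM][TX-C]` or `[SUM][RX-C]`.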
Let’s now try a full-duplex (i.e. bidirectional) test:
./iperf3 -t 60 -c 192.168.4.1 -i 20 -P 4 --bidir
Output:
root@ikoolcore-r2-max-cnx:~/iperf-3.18/src# ./iperf3 -t 60 -c 192.168.4.1 -i 20 -P 4 --bidir
Connecting to host 192.168.4.1, port 5201
[  5] local 192.168.4.253 port 58126 connected to 192.168.4.1 port 5201
[  7] local 192.168.4.253 port 58130 connected to 192.168.4.1 port 5201
[  9] local 192.168.4.253 port 58140 connected to 192.168.4.1 port 5201
[ 11] local 192.168.4.253 port 58148 connected to 192.168.4.1 port 5201
[ 13] local 192.168.4.253 port 58160 connected to 192.168.4.1 port 5201
[ 15] local 192.168.4.253 port 58168 connected to 192.168.4.1 port 5201
[ 17] local 192.168.4.253 port 58184 connected to 192.168.4.1 port 5201
[ 19] local 192.168.4.253 port 58188 connected to 192.168.4.1 port 5201
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-20.02  sec  5.02 GBytes  2.16 Gbits/sec    0   1.50 MBytes
[  7][TX-C]   0.00-20.02  sec  2.40 GBytes  1.03 Gbits/sec    0    710 KBytes
[  9][TX-C]   0.00-20.02  sec  7.33 GBytes  3.14 Gbits/sec    1   2.69 MBytes
[ 11][TX-C]   0.00-20.02  sec  7.17 GBytes  3.08 Gbits/sec    1   2.41 MBytes
[SUM][TX-C]   0.00-20.02  sec  21.9 GBytes  9.40 Gbits/sec    2
[ 13][RX-C]   0.00-20.02  sec  3.40 GBytes  1.46 Gbits/sec
[ 15][RX-C]   0.00-20.02  sec  9.70 GBytes  4.16 Gbits/sec
[ 17][RX-C]   0.00-20.02  sec  4.67 GBytes  2.00 Gbits/sec
[ 19][RX-C]   0.00-20.02  sec  4.14 GBytes  1.77 Gbits/sec
[SUM][RX-C]   0.00-20.02  sec  21.9 GBytes  9.40 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]  20.02-40.02  sec  4.98 GBytes  2.14 Gbits/sec    0   1.50 MBytes
[  7][TX-C]  20.02-40.02  sec  2.43 GBytes  1.04 Gbits/sec    0    710 KBytes
[  9][TX-C]  20.02-40.02  sec  7.31 GBytes  3.14 Gbits/sec    0   2.69 MBytes
[ 11][TX-C]  20.02-40.02  sec  7.16 GBytes  3.08 Gbits/sec    0   2.41 MBytes
[SUM][TX-C]  20.02-40.02  sec  21.9 GBytes  9.40 Gbits/sec    0
[ 13][RX-C]  20.02-40.02  sec  4.06 GBytes  1.75 Gbits/sec
[ 15][RX-C]  20.02-40.02  sec  8.44 GBytes  3.63 Gbits/sec
[ 17][RX-C]  20.02-40.02  sec  4.61 GBytes  1.98 Gbits/sec
[ 19][RX-C]  20.02-40.02  sec  4.77 GBytes  2.05 Gbits/sec
[SUM][RX-C]  20.02-40.02  sec  21.9 GBytes  9.40 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]  40.02-60.02  sec  4.99 GBytes  2.14 Gbits/sec    0   1.50 MBytes
[  7][TX-C]  40.02-60.02  sec  2.43 GBytes  1.04 Gbits/sec    0    710 KBytes
[  9][TX-C]  40.02-60.02  sec  7.32 GBytes  3.15 Gbits/sec    0   2.69 MBytes
[ 11][TX-C]  40.02-60.02  sec  7.14 GBytes  3.07 Gbits/sec    0   2.41 MBytes
[SUM][TX-C]  40.02-60.02  sec  21.9 GBytes  9.40 Gbits/sec    0
[ 13][RX-C]  40.02-60.02  sec  3.58 GBytes  1.54 Gbits/sec
[ 15][RX-C]  40.02-60.02  sec  7.91 GBytes  3.40 Gbits/sec
[ 17][RX-C]  40.02-60.02  sec  5.45 GBytes  2.34 Gbits/sec
[ 19][RX-C]  40.02-60.02  sec  4.95 GBytes  2.12 Gbits/sec
[SUM][RX-C]  40.02-60.02  sec  21.9 GBytes  9.40 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-60.02  sec  15.0 GBytes  2.15 Gbits/sec    0   sender
[  5][TX-C]   0.00-60.02  sec  15.0 GBytes  2.15 Gbits/sec        receiver
[  7][TX-C]   0.00-60.02  sec  7.26 GBytes  1.04 Gbits/sec    0   sender
[  7][TX-C]   0.00-60.02  sec  7.26 GBytes  1.04 Gbits/sec        receiver
[  9][TX-C]   0.00-60.02  sec  22.0 GBytes  3.14 Gbits/sec    1   sender
[  9][TX-C]   0.00-60.02  sec  22.0 GBytes  3.14 Gbits/sec        receiver
[ 11][TX-C]   0.00-60.02  sec  21.5 GBytes  3.07 Gbits/sec    1   sender
[ 11][TX-C]   0.00-60.02  sec  21.5 GBytes  3.07 Gbits/sec        receiver
[SUM][TX-C]   0.00-60.02  sec  65.7 GBytes  9.40 Gbits/sec    2   sender
[SUM][TX-C]   0.00-60.02  sec  65.7 GBytes  9.40 Gbits/sec        receiver
[ 13][RX-C]   0.00-60.02  sec  11.0 GBytes  1.58 Gbits/sec    0   sender
[ 13][RX-C]   0.00-60.02  sec  11.0 GBytes  1.58 Gbits/sec        receiver
[ 15][RX-C]   0.00-60.02  sec  26.1 GBytes  3.73 Gbits/sec    0   sender
[ 15][RX-C]   0.00-60.02  sec  26.1 GBytes  3.73 Gbits/sec        receiver
[ 17][RX-C]   0.00-60.02  sec  14.7 GBytes  2.11 Gbits/sec    0   sender
[ 17][RX-C]   0.00-60.02  sec  14.7 GBytes  2.11 Gbits/sec        receiver
[ 19][RX-C]   0.00-60.02  sec  13.9 GBytes  1.98 Gbits/sec    0   sender
[ 19][RX-C]   0.00-60.02  sec  13.9 GBytes  1.98 Gbits/sec        receiver
[SUM][RX-C]   0.00-60.02  sec  65.7 GBytes  9.40 Gbits/sec    0   sender
[SUM][RX-C]   0.00-60.02  sec  65.7 GBytes  9.40 Gbits/sec        receiver

iperf Done.
Looking at the [SUM][TX-C] and [SUM][RX-C] summary over the one-minute test, we can see 9.40 Gbps transfers in both directions. Success!
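As a sanity check on why roughly 9.4 Gbps counts as saturating a 10GbE link, here is a short calculation sketch. It assumes a standard 1500-byte MTU, IPv4, no VLAN tag, and TCP timestamps enabled. None of these are stated in the tests above, so treat them as assumptions. It also shows how the Transfer column (GBytes, counted in units of 2^30 bytes) maps to the Bitrate column (Gbits/sec, counted in units of 10^9 bits). The function names are made up for this example.

```shell
#!/bin/sh
# tcp_goodput_gbps LINK_GBPS MTU: best-case TCP payload rate over Ethernet.
tcp_goodput_gbps() {
    awk -v link="$1" -v mtu="$2" 'BEGIN {
        payload = mtu - 20 - 20 - 12     # IPv4 header, TCP header, TCP timestamp option
        wire    = mtu + 14 + 4 + 8 + 12  # Ethernet header, FCS, preamble, inter-frame gap
        printf "%.2f\n", link * payload / wire
    }'
}

# gib_to_gbps GBYTES SECONDS: convert an iperf3 Transfer value to a bitrate.
gib_to_gbps() {
    awk -v g="$1" -v s="$2" 'BEGIN { printf "%.2f\n", g * 1073741824 * 8 / s / 1e9 }'
}

tcp_goodput_gbps 10 1500    # prints 9.41
gib_to_gbps 65.7 60.02      # prints 9.40
```

Under those assumptions, 1448 payload bytes travel in every 1538 bytes on the wire, so the measured 9.40 to 9.41 Gbps is effectively the ceiling for TCP on this link.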
htop shows multiple iperf3 threads nicely distributed over all four cores of the Intel N100 processor.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
Thanks!
On my Ubuntu 24.04, “$ iperf3 --version … iperf 3.16”. So time to upgrade
Iperf2 has had thread support for decades. It also has many new stats when -e is used. Maybe give it a try.
need to be
multithreading 😉
It’s a 4 core machine, so those are identical.