Software configuration tips for Raspberry Pi clusters & parallel-ssh command

I missed that linux.conf.au 2021 took place on January 23-25 2021, and while browsing the schedule I noticed a talk entitled “Building Raspberry Pi Supercomputers” by Federico Lucifredi, Product Management Director for Ceph Storage at Red Hat.

In the talk, he mostly focuses on the software part, and besides some basic steps, I learned about some new commands that can be useful to people managing clusters of Raspberry Pi or other Linux boards or hosts.

Configuring a cluster

He used the Picocluster image in his example, but for people wanting a 64-bit OS, he recommends Ubuntu or Fedora images until Raspberry Pi OS 64-bit becomes stable. The first part of the configuration is making sure the main user is the same on all boards, disabling SSH for root, and configuring run levels (X is not needed on clusters). Networking is configured with fixed IP addresses for Ethernet, and DHCP for WiFi.
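As a rough sketch of two of these steps on a systemd-based image, the function below forces PermitRootLogin no in an sshd configuration file, and the trailing comments show how the default target could be switched to text mode. None of this is taken from the talk: the function name disable_root_ssh is made up for illustration, and it assumes bash with GNU sed and grep.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the talk): force "PermitRootLogin no"
# in the sshd configuration file given as the first argument.
disable_root_ssh() {
    local conf="$1"
    local pattern='^[[:space:]]*#?[[:space:]]*PermitRootLogin[[:space:]]'
    if grep -qE "$pattern" "$conf"; then
        # Rewrite the existing (possibly commented-out) directive in place.
        sed -i -E "s/${pattern}.*/PermitRootLogin no/" "$conf"
    else
        # No directive present: append one.
        echo 'PermitRootLogin no' >> "$conf"
    fi
}

# On a real node, as root, this could be used as follows
# (the service is named "ssh" on Debian/Ubuntu and "sshd" on Fedora):
#   disable_root_ssh /etc/ssh/sshd_config && systemctl restart ssh
#   systemctl set-default multi-user.target   # boot to text mode, no X
```

Trying the function on a copy of sshd_config first is a cheap way to check the result before touching a live node.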

He also configured passwordless SSH (i.e. public/private keys), using the ssh-copy-id command to install the public key on all boards, as well as NTP with the timedatectl command, and an NFS share on the master node to share data.
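The key distribution step could look something like the sketch below, which loops over a file containing one user@host entry per line and calls ssh-copy-id for each. The function name copy_key_to_nodes and the DRY_RUN switch are my own assumptions, not something shown in the talk. By default the commands are only printed, so nothing gets contacted.

```shell
#!/usr/bin/env bash
# Sketch: push the local public key to every host listed in a file
# (one user@host entry per line). With DRY_RUN=1, the default here,
# the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

copy_key_to_nodes() {
    local nodes_file="$1" node
    while IFS= read -r node || [[ -n "$node" ]]; do
        # Skip blank lines and comments.
        if [[ -z "$node" || "$node" == \#* ]]; then
            continue
        fi
        if [[ "$DRY_RUN" == 1 ]]; then
            echo "ssh-copy-id $node"
        else
            # Redirect stdin so ssh does not swallow the rest of the list.
            ssh-copy-id "$node" < /dev/null
        fi
    done < "$nodes_file"
}

# Time synchronization can then be enabled on each node with:
#   sudo timedatectl set-ntp true
```

Running it once with the default DRY_RUN=1 shows what would be executed, and DRY_RUN=0 then performs the actual copy, prompting for each node's password one last time.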

Some of the useful commands from the Picocluster image include:

restartAllNodes.sh used to restart all of the nodes in the cluster.
stopAllNodes.sh used to stop or shut down all of the nodes in the cluster.
genKeys.sh used to generate an SSH key and distribute it to all nodes.
testAllNodes.sh runs df -h on each node to indicate that each node is up and running.

Parallel-ssh command

But the command that interested me the most from the talk was parallel-ssh which can be installed on the master node as follows:

sudo apt install pssh

The manpage describes pssh as “a program for executing ssh in parallel on a number of hosts. It provides features such as sending input to all of the processes, passing a password to ssh, saving output to files, and timing out”.

Let’s take the first example from Federico’s talk:

parallel-ssh -h nodes "cat /etc/hosts"

The nodes file contains the list of hosts in the following format:

user@ip_address
user@ip_address
user@hostname
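For a cluster whose nodes have consecutive fixed IP addresses, the file does not need to be typed by hand. Here is a minimal sketch: the function name make_nodes_file, the user name pi, and the address range are placeholders I picked for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print a hosts file for nodes with consecutive IPv4 addresses.
# Arguments: user, network prefix (first three octets), then the first
# and last value of the final octet.
make_nodes_file() {
    local user="$1" prefix="$2" first="$3" last="$4" i
    for ((i = first; i <= last; i++)); do
        printf '%s@%s.%d\n' "$user" "$prefix" "$i"
    done
}

# Example with three hypothetical nodes, written to a file named "nodes":
#   make_nodes_file pi 10.1.10 241 243 > nodes
```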

I tried it with two Linodes that I own and had already configured with private/public keys:

parallel-ssh -h nodes "cat /etc/hosts"
[1] 16:40:06 [SUCCESS] user@173.230.156.xxx
[2] 16:40:07 [SUCCESS] user@172.104.243.xxx

It only reports whether the command succeeded on each host, without showing the output from the command. If we want the output, we need to add the --inline option:

parallel-ssh -h nodes --inline "cat /etc/hosts"
[1] 16:42:48 [SUCCESS] user@172.104.243.xxx
127.0.0.1	localhost

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
[2] 16:42:48 [SUCCESS] user@173.230.156.xxx
127.0.0.1	localhost.localdomain localhost
173.230.156.xxx xyz.cnx-software.com xyz

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
2600:3c01::f03c:91ff:xxxx:yyyy xyz.cnx-software.com xyz
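The manpage features quoted earlier, saving output to files and timing out, correspond to the -o, -e, and -t options, while -p limits how many connections run at once. Here is a minimal sketch that was not shown in the talk: the function name run_on_nodes, the out and err directory names, and the values 30 and 8 are arbitrary choices of mine. The command is only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: build a parallel-ssh command line that saves per-host output
# to files. With DRY_RUN=1, the default here, it is only printed.
DRY_RUN="${DRY_RUN:-1}"

run_on_nodes() {
    local nodes_file="$1"
    shift
    # -t 30: give up on a host after 30 seconds
    # -p 8:  open at most 8 connections at a time
    # -o/-e: save each host's stdout/stderr under ./out and ./err
    local cmd=(parallel-ssh -h "$nodes_file" -t 30 -p 8 -o out -e err "$*")
    if [[ "$DRY_RUN" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Example: run_on_nodes nodes uptime
```

Saving to files is handy when the output is long, since each host gets its own file instead of everything being interleaved on the terminal.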

The third command checks that each node can reach the master host by pinging its IP address in Federico’s cluster configuration. ping’s -c option requires a packet count, so I use a single packet here:

parallel-ssh -h nodes "ping -c 1 10.1.10.240"

The fourth command also uses ping, but this time to test DNS resolution and Internet connectivity, with the output from the command shown for each host:

parallel-ssh -h nodes --inline "ping -c 5 www.github.com"
[1] 16:51:30 [SUCCESS] user@172.104.243.xxx
PING github.com (140.82.121.4) 56(84) bytes of data.
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=1 ttl=59 time=0.720 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=2 ttl=59 time=0.727 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=3 ttl=59 time=0.670 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=4 ttl=59 time=0.700 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=5 ttl=59 time=0.783 ms

--- github.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4057ms
rtt min/avg/max/mdev = 0.670/0.720/0.783/0.037 ms
[2] 16:51:30 [SUCCESS] user@173.230.156.xxx
PING github.com (192.30.255.113) 56(84) bytes of data.
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=1 ttl=56 time=21.2 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=2 ttl=56 time=21.3 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=3 ttl=56 time=21.3 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=4 ttl=56 time=21.4 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=5 ttl=56 time=21.3 ms

--- github.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 21.246/21.341/21.418/0.057 ms

We can also check the ping time, and it’s easy to find out which host is based in the US and which one in Europe.

The last command from the slide is specific to Raspberry Pi, and is used to check whether it’s possible to read the temperature (without actually returning it), so I’ll skip it.

I can see many useful use cases for parallel-ssh. For example, if you had to install a program on multiple boards, you’d only need to type the command once, and the output would show where the installation was successful, and potentially where it failed.
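With more than a handful of boards, picking out the failures by eye gets tedious. The small filter below is a sketch of my own, not something from the talk: it reads the status lines on stdin and prints only the hosts that did not report [SUCCESS]. It assumes the "[n] time [STATUS] host" layout shown above, and the name failed_hosts is made up.

```shell
#!/usr/bin/env bash
# Sketch: read parallel-ssh status lines on stdin and print the hosts
# whose status is anything other than [SUCCESS].
failed_hosts() {
    awk '$1 ~ /^\[[0-9]+\]$/ && $3 ~ /^\[[A-Z]+\]$/ && $3 != "[SUCCESS]" { print $4 }'
}

# Possible usage on a real cluster (htop is just a placeholder package,
# -t 0 disables the timeout, and passwordless sudo on the nodes is assumed):
#   parallel-ssh -h nodes -t 0 "sudo apt-get install -y htop" | failed_hosts
```

The resulting list can be saved to a new hosts file to retry the command on the failed boards only.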

You can watch the full talk in the video above, and consult the slides, which also include examples for the Message Passing Interface (MPI) open library standard for distributed memory parallelization, although that part was quickly skipped in the video due to time constraints.

Jean-Luc Aufranc (CNXSoft)
