I missed that linux.conf.au 2021 took place on January 23-25 2021, and while browsing the schedule I noticed a talk entitled “Building Raspberry Pi Supercomputers” by Federico Lucifredi, Product Management Director for Ceph Storage at Red Hat.
In the talk, he mostly focuses on the software side, and besides some basic steps, I learned about some new commands that can be useful to people managing clusters of Raspberry Pi or other Linux boards or hosts.
Configuring a cluster
He used a Picocluster image in his example, but for people wanting to use a 64-bit OS, he recommends Ubuntu or Fedora images until Raspberry Pi OS 64-bit becomes stable. The first part of the configuration is making sure the main user is the same on all boards, disabling SSH for root, and configuring run levels (X is not needed on clusters). Networking is configured with fixed IP addresses for Ethernet, and DHCP for WiFi.
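The talk does not list the exact commands for these steps, so here is only a minimal sketch of two of them, assuming a systemd-based image with OpenSSH and GNU sed. The helper name disable_root_ssh is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the talk): force "PermitRootLogin no" in an
# sshd_config-style file. Assumes GNU sed for the -i option.
# Usage: disable_root_ssh /etc/ssh/sshd_config   (as root, then restart sshd)
disable_root_ssh() {
    config_file="$1"
    if grep -Eq '^[#[:space:]]*PermitRootLogin' "$config_file"; then
        # Rewrite the existing (possibly commented-out) directive in place.
        sed -i -E 's/^[#[:space:]]*PermitRootLogin.*/PermitRootLogin no/' "$config_file"
    else
        # No directive present: append one.
        echo 'PermitRootLogin no' >> "$config_file"
    fi
}

# Boot to a text console instead of X (the systemd counterpart of run level 3):
#   sudo systemctl set-default multi-user.target
```

Keeping the edit in a function that takes the file path makes it easy to try on a copy of the configuration file before touching the real one.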
He also configured passwordless SSH (i.e. public/private keys), using the ssh-copy-id command to install the keys on all boards, as well as NTP with the timedatectl command, and an NFS share on the master node to share data.
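As a rough sketch of the key distribution step, assuming a file with one user@host entry per line: the helper name copy_key_to_nodes and the DRY_RUN switch are hypothetical, and only ssh-keygen, ssh-copy-id and timedatectl are real commands.

```shell
#!/bin/sh
# Hypothetical helper: copy the local public key to every host in a nodes file
# (one user@host entry per line). Set DRY_RUN=1 to only print the commands.
copy_key_to_nodes() {
    nodes_file="$1"
    # Read the list on fd 3 so stdin stays free for password prompts.
    while read -r node <&3 || [ -n "$node" ]; do
        # Skip empty lines.
        [ -n "$node" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id $node"
        else
            # Asks for the node's password once; later logins use the key.
            ssh-copy-id "$node"
        fi
    done 3< "$nodes_file"
}

# One-off steps on the master node (shown as comments, not executed here):
#   ssh-keygen -t ed25519           # create a key pair if none exists yet
#   sudo timedatectl set-ntp true   # enable network time synchronisation
#   timedatectl status              # check that the clock is synchronised
```

Running it once with DRY_RUN=1 shows which hosts would be touched before any password is typed.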
Some of the useful commands from the Picocluster image include:
- restartAllNodes.sh used to restart all of the nodes in the cluster.
- stopAllNodes.sh used to stop or shut down all of the nodes in the cluster.
- genKeys.sh used to generate an SSH key and distribute it to all nodes.
- testAllNodes.sh runs df -h on each node to indicate that each node is up and running.
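The Picocluster scripts themselves are not reproduced here, so the following is not their code, only a guess at what a check in the spirit of testAllNodes.sh could look like. The function name check_all_nodes and the SSH_CMD override are assumptions for illustration.

```shell
#!/bin/sh
# Not the Picocluster script: a hypothetical check_all_nodes helper that runs
# "df -h" on every host of a nodes file and reports which ones answered.
# SSH_CMD can be overridden (e.g. SSH_CMD=echo) to try it without a cluster.
check_all_nodes() {
    nodes_file="$1"
    failed=0
    # Read the list on fd 3 so ssh cannot swallow it through stdin.
    while read -r node <&3 || [ -n "$node" ]; do
        [ -n "$node" ] || continue
        if ${SSH_CMD:-ssh} -o ConnectTimeout=5 "$node" df -h > /dev/null 2>&1; then
            echo "$node: up"
        else
            echo "$node: DOWN"
            failed=1
        fi
    done 3< "$nodes_file"
    # Non-zero exit status if at least one node did not answer.
    return "$failed"
}
```

A sequential loop like this is fine for a handful of boards, which is also where the parallel approach described next starts to pay off.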
Parallel-ssh command
But the command that interested me the most from the talk was parallel-ssh which can be installed on the master node as follows:
```shell
sudo apt install pssh
```
The manpage describes pssh as “a program for executing ssh in parallel on a number of hosts. It provides features such as sending input to all of the processes, passing a password to ssh, saving output to files, and timing out”.
Let’s take the first example from Federico’s talk:
```shell
parallel-ssh -h nodes "cat /etc/hosts"
```
The nodes file contains the list of hosts in the following format:
```
user@ip_address
user@ip_address
user@hostname
```
I tried it with two Linodes I own and already configured with private/public keys:
```
parallel-ssh -h nodes "cat /etc/hosts"
[1] 16:40:06 [SUCCESS] user@173.230.156.xxx
[2] 16:40:07 [SUCCESS] user@172.104.243.xxx
```
It just returns whether the command worked, without the output from the command. If we want the output, we need to add the --inline option:
```
parallel-ssh -h nodes --inline "cat /etc/hosts"
[1] 16:42:48 [SUCCESS] user@172.104.243.xxx
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
[2] 16:42:48 [SUCCESS] user@173.230.156.xxx
127.0.0.1 localhost.localdomain localhost
173.230.156.xxx xyz.cnx-software.com xyz

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
2600:3c01::f03c:91ff:xxxx:yyyy xyz.cnx-software.com xyz
```
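The manpage quote above also mentions saving output to files. A minimal sketch of that, using the -o (output directory), -e (error directory) and -t (timeout in seconds) options of pssh: the wrapper name run_and_save, the directory layout and the 30-second timeout are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command on all nodes and keep per-host output
# in files rather than on the terminal. PSSH can be overridden for a dry run
# (e.g. PSSH=echo prints the arguments instead of contacting any host).
run_and_save() {
    nodes_file="$1"; out_dir="$2"; shift 2
    mkdir -p "$out_dir/stdout" "$out_dir/stderr"
    # -o/-e: directories for per-host stdout/stderr; -t: timeout in seconds.
    ${PSSH:-parallel-ssh} -h "$nodes_file" \
        -o "$out_dir/stdout" -e "$out_dir/stderr" -t 30 "$@"
}

# Example:
#   run_and_save nodes results "cat /etc/hosts"
```

Files are handier than inline output once the command prints more than a screenful per host.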
The third command just checks which hosts are alive with a ping command to the master host IP address in Federico’s cluster configuration:
```shell
# -c expects a packet count before the address; a count of 1 is assumed here
parallel-ssh -h nodes "ping -c 1 10.1.10.240"
```
The fourth command also uses ping, but this time to test DNS connectivity, with the output from the command shown for each node:
```
parallel-ssh -h nodes --inline "ping -c 5 www.github.com"
[1] 16:51:30 [SUCCESS] user@172.104.243.xxx
PING github.com (140.82.121.4) 56(84) bytes of data.
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=1 ttl=59 time=0.720 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=2 ttl=59 time=0.727 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=3 ttl=59 time=0.670 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=4 ttl=59 time=0.700 ms
64 bytes from lb-140-82-121-4-fra.github.com (140.82.121.4): icmp_seq=5 ttl=59 time=0.783 ms

--- github.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4057ms
rtt min/avg/max/mdev = 0.670/0.720/0.783/0.037 ms
[2] 16:51:30 [SUCCESS] user@173.230.156.xxx
PING github.com (192.30.255.113) 56(84) bytes of data.
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=1 ttl=56 time=21.2 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=2 ttl=56 time=21.3 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=3 ttl=56 time=21.3 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=4 ttl=56 time=21.4 ms
64 bytes from lb-192-30-255-113-sea.github.com (192.30.255.113): icmp_seq=5 ttl=56 time=21.3 ms

--- github.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 21.246/21.341/21.418/0.057 ms
```
We can also check the ping time, and it’s easy to find out which host is based in the US and which one in Europe.
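With more than a couple of hosts, reading ping times by eye gets tedious. Here is a small sketch of a filter that reduces the inline output to one line per host, assuming the status-line and rtt-summary formats seen in my output; the function name summarize_rtt is made up.

```shell
#!/bin/sh
# Hypothetical filter: read "parallel-ssh --inline ... ping" output on stdin
# and print "<host> <average rtt in ms>" for each host.
summarize_rtt() {
    awk '
        # Status line, e.g. "[1] 16:51:30 [SUCCESS] user@host": remember host.
        /^\[[0-9]+\] / { host = $4 }
        # Summary line, e.g. "rtt min/avg/max/mdev = a/b/c/d ms": b is the avg.
        /^rtt / { split($0, part, "/"); print host, part[5] }
    '
}

# Example:
#   parallel-ssh -h nodes --inline "ping -c 5 www.github.com" | summarize_rtt
```

Fed with the output of my two Linodes, it would print 0.720 for the first host and 21.341 for the second.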
The last command from the slide is specific to Raspberry Pi, and used to check whether it’s possible to read the temperature (without actually returning it), so I’ll skip it.
I can see many useful use cases for parallel-ssh. For example, if you had to install a program on multiple boards, you’d only need to type the command once, and the output would show where the installation was successful, and potentially where it failed.
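A sketch of that installation scenario, assuming passwordless sudo on the nodes and that failed hosts are reported with a [FAILURE] status line in the same layout as the [SUCCESS] lines: the function name failed_hosts, the retry_nodes file name and the htop package are arbitrary examples.

```shell
#!/bin/sh
# Hypothetical filter: read parallel-ssh status lines on stdin and print the
# hosts marked [FAILURE], so the command can be retried on those only.
failed_hosts() {
    awk '$3 == "[FAILURE]" { print $4 }'
}

# Example (-t 0 disables the per-host timeout for a slow installation):
#   parallel-ssh -h nodes -t 0 "sudo apt-get install -y htop" | failed_hosts > retry_nodes
#   parallel-ssh -h retry_nodes -t 0 --inline "sudo apt-get install -y htop"
```

The second run uses --inline so that the error messages of the remaining hosts are visible straight away.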
You can watch the full talk in the video above. Sadly, Federico has not shared the presentation slides, at least not yet. They also include examples about the Message Passing Interface (MPI) open library standard for distributed memory parallelization, but that part was quickly skipped in the video due to time constraints.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
I also find cssh (https://linux.die.net/man/1/cssh) very useful. Especially when you have to run a similar (but not identical) sequence of commands on a number of machines.
Interesting Article! Can’t wait to try some of this. And I’m not Wendy, browser auto filled that (facepalm)
Ugh, don’t bother with parallel-ssh. Ansible has replaced it completely, will do the same things you want there along with a whole lot more you can get it to do as you need it.
You seem to be confusing administration and automation. But you’re not the only one, it’s frequent.
I’m using pdsh for the same thing. I tried pssh a while ago but didn’t figure how to get the output from the machines so I didn’t seek further, since if it requires to read a man page to use it once in a while, it’s not as convenient:
Anyway, when it comes to just remotely restarting services, flashing images or rebooting all nodes, I guess they’re basically equivalent. Pssh has a “pscp” command which I’ve not tried yet but which has no equivalent in pdsh, so that might be convenient to upload files, especially kernels and flash images to be burnt using nandwrite.