CNXSoft: This is a guest post by Drew Moseley, Technical Solutions Architect at Toradex, explaining how the company updates the firmware of Linux IoT devices with OSTree (aka libostree), an open-source operating system build and deployment tool, as well as Docker software containers.
Every day more and more connected devices are being brought to market and estimates for the total size of the Internet of Things (IoT) market are as high as $1.5 trillion by 2027. Gas pumps, medical devices, and point of sale systems are increasingly connected, making it virtually impossible to avoid interacting with these devices, even for complete Luddites. In the home, devices such as power meters, light switches, and security cameras are commonly internet-enabled allowing for smart home functionality.
The level of complexity in the software for these devices increases with the functionality, and the number of devices with software defects in the field is growing. In many cases, these systems are designed, produced, and shipped without any consideration given to providing software updates beyond the initial program load. That’s a serious problem, and it can extend far beyond causing problems for the owner of the device, or adding warranty or recall expense for the manufacturer. IoT devices can be aggregated into large botnets that, due to their sheer numbers, have been used for large-scale attacks on critical pieces of infrastructure, such as the Distributed Denial of Service (DDoS) attack against the Dyn Domain Name Service (DNS) provider, which resulted in service interruptions for large organizations such as Twitter and GitHub.
Why update devices?
The most obvious reason to provide software updates to devices in the field is to address software vulnerabilities. Not all vulnerabilities become exploits resulting in large-scale attacks as mentioned above, but the risk to your brand is significant. And with more and more devices in your users’ homes, it may be possible to chain vulnerabilities from multiple devices together to get broader access to your users’ data. In one memorable incident, an unnamed casino had its high-roller database breached through a vulnerability in an internet-enabled fish tank thermometer. To put the size of the attack surface in perspective, there are approximately 14 million lines of code in the 787 Dreamliner jet (likely limited to the avionics system, and not including things such as the in-flight entertainment systems), compared with about 28 million lines of code in just the Linux kernel (as of January 2020). Keep in mind that the Linux kernel is only one part of a Linux system, so you can start to get a feel for the scale of the problem. These many lines of code will undoubtedly contain many errors needing fixes throughout your product’s lifetime.
Providing an update capability for your devices also enables you to deliver new features to your users. Depending on your business model this can be helpful for long-term customer retention or simply for providing up-sell capabilities and increasing revenue. Given the benefits of update capabilities, you may wonder why any device would be shipped without this. I struggle to define a use case where software updates would be completely unneeded.
OTA Server
Any fully automated OTA update solution requires a server that manages the fleet and allows your operations staff to manage the devices. Discussing the server-side in depth is out of scope for this article but there are numerous options available. In general, you will want to pick an end-to-end solution meaning both the update server and the update clients have been developed to be a full solution, or at the very least that the combination of server and client have been well tested and integrated with each other.
Update methods
There are a few common methods to allow for software updates.
- In-place package-based updates: this is the mechanism used by most desktop Operating Systems. Basically, an installer application or packaging system is run in the currently active OS image. This can install anything needed by the system but it may be difficult to ensure all the devices in your fleet are running the exact same binaries that you have tested in your design labs.
- Asymmetric image updates: this method generally uses a separate installer partition that is able to download the appropriate images and overwrite the primary OS partition. This eliminates the concern of partially installed package sets that can happen with in-place updates but can result in long downtimes for your users. This is the method that, until recently, was employed by most mobile phone updates, and I’m sure we have all been annoyed by the amount of time these updates take.
- Symmetric image updates (commonly called dual A/B updates): this method uses fully redundant partitions containing an active and a passive partition. While running in the active partition, the update client can download and install a full image into the passive partition. Since this can all happen in the background while your application code is active, it removes the downtime concerns that come with asymmetric images. However, since it uses fully redundant partitions, it generally takes more block device storage than the other methods.
- OSTree based updates: This is the subject of the next section and provides a good mix of features, allowing for minimal device downtime, and not requiring extra storage to house the redundant partitions.
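To make the symmetric (A/B) method more concrete, here is a minimal Python sketch of the slot bookkeeping an update client might perform. It is only a toy model under stated assumptions: the `Slot` and `ABDevice` names are hypothetical, and a real client would write to block devices and coordinate with the bootloader rather than update a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str             # hypothetical partition label, "A" or "B"
    version: str          # version of the image currently stored in the slot
    bootable: bool = True


class ABDevice:
    """Toy model of a device with two redundant root filesystem slots."""

    def __init__(self, initial_version: str) -> None:
        self.slots = {
            "A": Slot("A", initial_version),
            "B": Slot("B", "empty", bootable=False),
        }
        self.active = "A"

    @property
    def passive(self) -> str:
        return "B" if self.active == "A" else "A"

    def install(self, version: str) -> None:
        # Write the new image into the passive slot; the active slot keeps running.
        self.slots[self.passive] = Slot(self.passive, version)

    def switch(self) -> None:
        # Stands in for the reboot: the other slot becomes the active one.
        if not self.slots[self.passive].bootable:
            raise RuntimeError("passive slot holds no bootable image")
        self.active = self.passive

    def rollback(self) -> None:
        # The previous image is still intact in the other slot.
        self.switch()
```

Calling `install("1.1")` followed by `switch()` moves the device to the new image, and `rollback()` returns it to the old one. The storage cost mentioned above is visible in the model: both slots hold a complete image at all times.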
OSTree
The documentation for the OSTree project defines it as follows:
libostree is both a shared library and suite of command line tools that combines a “git-like” model for committing and downloading bootable filesystem trees, along with a layer for deploying them and managing the bootloader configuration.
This is a bit nebulous so let’s work through an example. First, we will create an empty repository in a subdirectory on our development workstation. OSTree is normally used for entire filesystems but for simplicity, we will use a directory for this example.
    $ ostree init --repo repo
    $ tree -F repo
    repo
    ├── config
    ├── extensions/
    ├── objects/
    ├── refs/
    │   ├── heads/
    │   ├── mirrors/
    │   └── remotes/
    ├── state/
    └── tmp/
        └── cache/

    9 directories, 1 file
We have initialized an empty repository. You can see that it has created a number of empty directories and a single config file. The repository metadata has much in common with git. You see familiar directories such as refs/heads which are used similarly. Let’s now add a file to the repository:
    $ mkdir -p rootfs/etc
    $ echo 'enabled=1' > rootfs/etc/config_file.txt
    $ ostree --repo=repo commit --branch=main rootfs
    1d0f1459b634bfbd5f573ae56a55a03a35b8941d4c45ab07dbdd6edc462c3e77
    $ tree -F .
    .
    ├── repo/
    │   ├── config
    │   ├── extensions/
    │   ├── objects/
    │   │   ├── 1d/
    │   │   │   └── 0f1459b634bfbd5f573ae56a55a03a35b8941d4c45ab07dbdd6edc462c3e77.commit
    │   │   ├── 2a/
    │   │   │   └── 28dac42b76c2015ee3c41cc4183bb8b5c790fd21fa5cfa0802c6e11fd0edbe.dirmeta
    │   │   ├── 92/
    │   │   │   └── d6c7afcaedabd4504d2e16de3ffc200cd156ac733306cf7b8991f56859bcd5.file
    │   │   ├── af/
    │   │   │   └── 98568126959bb302d2b86a25646360bd7f4fafa657490da635edf1450b32d4.dirtree
    │   │   └── ef/
    │   │       └── 8396cb2291fe95f7b187a90fce5c6c973388c793730be1f698143bf7b01eb0.dirtree
    │   ├── refs/
    │   │   ├── heads/
    │   │   │   └── main
    │   │   ├── mirrors/
    │   │   └── remotes/
    │   ├── state/
    │   └── tmp/
    │       └── cache/
    └── rootfs/
        └── etc/
            └── config_file.txt
We see that many new objects have been created. The .dirmeta, .dirtree, and .commit files track the file and directory metadata (permissions, ownership, etc.), the directory tree structure, and the commit metadata, respectively. The refs/heads/main file contains the commit hash for the new commit:
    $ ostree --repo=repo show $(cat repo/refs/heads/main)
    commit 1d0f1459b634bfbd5f573ae56a55a03a35b8941d4c45ab07dbdd6edc462c3e77
    ContentChecksum: f34528e7549c000b13abceb403b65298f9d54b1c2987bd02ab0a3ea1169fec27
    Date: 2021-05-27 20:21:01 +0000
    (no subject)
Note also that the object created with the .file extension is identical to the file we created. This is commonly called content-addressable storage which simply means that the files in the object store are named based on their content. The name of the object (in this case 92/d6c7afcaedabd4504d2e16de3ffc200cd156ac733306cf7b8991f56859bcd5.file) is generated from the sha256sum of the file itself, as well as the file attributes.
    $ sha256sum rootfs/etc/config_file.txt $(find repo -name '*.file')
    16f4affa3003cb1ae3f22d5e0be86a5b6fc16bbf40662629df0aa5ad0ff52e15 rootfs/etc/config_file.txt
    16f4affa3003cb1ae3f22d5e0be86a5b6fc16bbf40662629df0aa5ad0ff52e15 repo/objects/92/d6c7afcaedabd4504d2e16de3ffc200cd156ac733306cf7b8991f56859bcd5.file
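The idea of content-addressable storage can be sketched in a few lines of Python. This is a simplified illustration, not OSTree's actual algorithm: the hypothetical `store_object` helper hashes only the file content, whereas OSTree also mixes the file attributes into the checksum, so the object names it produces will not match the ones in the listing above.

```python
import hashlib
import tempfile
from pathlib import Path


def store_object(objects_dir: Path, content: bytes) -> Path:
    """Store content under a name derived from its SHA-256 digest."""
    digest = hashlib.sha256(content).hexdigest()
    # Same fan-out layout as in the listing: two hex characters for the
    # directory, the remaining 62 for the file name.
    target = objects_dir / digest[:2] / f"{digest[2:]}.file"
    if not target.exists():  # identical content is only ever stored once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    return target


def demo():
    objects = Path(tempfile.mkdtemp()) / "objects"
    first = store_object(objects, b"enabled=1\n")
    second = store_object(objects, b"enabled=1\n")  # same content, same object
    third = store_object(objects, b"enabled=0\n")   # changed content, new object
    return first, second, third
```

Because the name is a function of the content, storing the same file twice yields the same path, and any change to the content produces a different object.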
Note that these two files are actually hard links to the same inode, and therefore to the same filesystem blocks. This is an important principle of OSTree and is what makes it space-efficient: any files that are unchanged between versions are not duplicated, resulting in significant savings both in block storage on the device and in download bandwidth when pulling new revisions.
    $ stat rootfs/etc/config_file.txt repo/objects/92/d6c7afcaedabd4504d2e16de3ffc200cd156ac733306cf7b8991f56859bcd5.file | grep Inode:
    Device: 10301h/66305d Inode: 58746231 Links: 2
    Device: 10301h/66305d Inode: 58746231 Links: 2
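The same inode check can be reproduced with a short Python sketch. The directory names (`deploy-v1`, `deploy-v2`) and the `checkout_by_hardlink` helper are hypothetical and only mimic the idea of two filesystem versions sharing one stored object; the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile
from pathlib import Path


def checkout_by_hardlink(obj: Path, dest: Path) -> None:
    """'Check out' a stored object by hard-linking it instead of copying it."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    os.link(obj, dest)


def demo():
    root = Path(tempfile.mkdtemp())
    obj = root / "objects" / "config.file"  # hypothetical object name
    obj.parent.mkdir(parents=True)
    obj.write_text("enabled=1\n")

    # Two "versions" of the filesystem that both contain the unchanged file.
    v1 = root / "deploy-v1" / "etc" / "config_file.txt"
    v2 = root / "deploy-v2" / "etc" / "config_file.txt"
    checkout_by_hardlink(obj, v1)
    checkout_by_hardlink(obj, v2)
    return obj.stat(), v1.stat(), v2.stat()
```

All three paths report the same inode number, and the link count grows with each checkout while the data itself is stored only once.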
Just as with git, we can remove the file and then check it back out from the repository.
    $ rm -rf rootfs
    $ ostree --repo=repo checkout main rootfs && tree -F rootfs
    rootfs
    └── etc/
        └── config_file.txt

    1 directory, 1 file
Now let’s add a full root filesystem to the repository. I created a simple filesystem for a QEMU Arm device using Buildroot.
    $ git clone git://git.buildroot.net/buildroot -b 2021.02.2
    $ cd buildroot
    $ make qemu_arm_versatile_defconfig
    $ make
    $ sudo mount output/images/rootfs.ext2 /mnt
    $ rm -rf rootfs
    $ mkdir rootfs
    $ sudo tar -C /mnt -cf - . | tar -C rootfs/ -xf -
    $ ostree --repo=repo commit --branch=main --subject="Initial Linux system" rootfs
Now we will make a second version of the filesystem. I’ve used the previous Buildroot configuration and added the bc utility which is listed under Target packages\Miscellaneous when running make menuconfig.
    $ rm -rf rootfs
    $ mkdir rootfs
    $ sudo tar -C /mnt -cf - . | tar -C rootfs/ -xf -
    $ ostree --repo=repo commit --branch=main --subject="Added bc binary" rootfs
    $ ostree --repo=repo show main
    commit e4568293f591b80cdf68451f6329683fbecd42ec9a5915ea44b8c77bbbf47a41
    ContentChecksum: 4a9765852b6134a39444751734cc0d70ee207999b73216919644cd69ad0b3d50
    Date: 2021-05-28 13:24:47 +0000

    Added bc binary

    $ ostree --repo=repo diff main
    M /usr/bin/bc
    M /usr/bin/dc
Now, let’s say we decide we no longer want the bc binary. Without rebuilding, we can simply roll back to the previous release. First, we check that bc exists in our current filesystem; then we roll back by checking out the previous commit; finally, we verify that bc is once again a symlink to busybox:
    $ ls rootfs/usr/bin/bc -l
    .rwxr-xr-x 63k dmoseley 27 May 17:18 rootfs/usr/bin/bc*
    $ rm -rf rootfs
    $ ostree --repo=repo checkout 819beafa993f254adc0fc4f80bbc417a59099271317e9d3e0fe7768234dfacc3 rootfs
    $ ls rootfs/usr/bin/bc -l
    lrwxrwxrwx 17 dmoseley 27 May 17:14 rootfs/usr/bin/bc -> ../../bin/busybox
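Rollback works because every commit remains in the repository and a branch is just a pointer to the most recent one. The toy Python model below illustrates that idea; `ToyRepo` is a hypothetical stand-in that keeps file trees in memory, and its checksum scheme is an assumption made for illustration, not OSTree's on-disk format.

```python
import hashlib
import json
from typing import Dict, Optional


class ToyRepo:
    """Minimal commit history: each commit records its parent and a file tree."""

    def __init__(self) -> None:
        self.commits: Dict[str, dict] = {}
        self.refs: Dict[str, str] = {}

    def commit(self, branch: str, tree: Dict[str, str], subject: str) -> str:
        record = {"parent": self.refs.get(branch), "subject": subject, "tree": tree}
        payload = json.dumps(record, sort_keys=True).encode()
        checksum = hashlib.sha256(payload).hexdigest()
        self.commits[checksum] = record
        self.refs[branch] = checksum  # the branch now points at the new commit
        return checksum

    def _resolve(self, rev: str) -> str:
        # Accept either a branch name or a commit checksum.
        return self.refs.get(rev, rev)

    def checkout(self, rev: str) -> Dict[str, str]:
        return dict(self.commits[self._resolve(rev)]["tree"])

    def parent_of(self, rev: str) -> Optional[str]:
        return self.commits[self._resolve(rev)]["parent"]
```

After two commits to `main`, `checkout(parent_of("main"))` returns the first tree again, which mirrors checking out the earlier commit hash in the listing above.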
The last feature that is important for an over-the-air update system is remote repositories. Similar to how git uses repositories, these are remotely accessible data stores containing OSTree metadata. This example is run on a Toradex Verdin i.MX8M Mini system running Torizon which is an industrial embedded Linux system based on OSTree. We connect it to the Toradex TorizonCore OSTree repository and check out the latest nightly release. There are additional details, not covered here, related to switching to the new version of the filesystem on boot so that it is an atomic operation.
    root@verdin-imx8mm-06759464:~# ostree admin status
    * torizon 8c514a595469283bb69603f2fa32395b5dbbc3d80a8f9d824dc661f1b68c4d82.0
        Version: 5.1.0+build.1
        origin refspec: torizon:5/verdin-imx8mm/torizon/torizon-core-docker/release
    root@verdin-imx8mm-06759464:~# ostree remote add --no-gpg-verify torizon https://feeds.toradex.com/ostree/
    root@verdin-imx8mm-06759464:~# ostree pull torizon:5/verdin-imx8mm/torizon/torizon-core-docker/nightly
    403 metadata, 2772 content objects fetched; 170803 KiB transferred in 54 seconds
    root@verdin-imx8mm-06759464:~# ostree log 5/verdin-imx8mm/torizon/torizon-core-docker/nightly
    commit befaaa1691c26881cbcbf804509761dffc767da5679fba895fdcf8d3cc898b80
    ContentChecksum: 8f6aec54c23fa0e453526d44c5d03f759af512d24e303e5f43b1fa8ea19e356a
    Date: 2021-05-26 04:02:04 +0000
    Version: 5.3.0-devel-20210525+build.308

    5.3.0-devel-20210525+build.308

    << History beyond this commit not fetched >>
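The atomic switch to a new filesystem version is not covered in this article, but the general idea behind such a switch can be sketched. The Python example below shows a common POSIX technique: prepare a new symlink next to the old one, then rename it into place. This is a generic illustration under stated assumptions, not a description of how OSTree implements the switch; the `atomic_switch` helper and the `deploy-*` directory names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def atomic_switch(link: Path, new_target: str) -> None:
    """Repoint `link` at `new_target` with no window where the link is missing."""
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink():
        tmp.unlink()             # clean up after an interrupted earlier attempt
    os.symlink(new_target, tmp)  # 1. prepare the new link beside the old one
    os.replace(tmp, link)        # 2. rename() swaps it in atomically on POSIX


def demo() -> str:
    root = Path(tempfile.mkdtemp())
    for version in ("deploy-5.1.0", "deploy-5.3.0"):  # hypothetical names
        (root / version).mkdir()
    current = root / "current"
    atomic_switch(current, "deploy-5.1.0")
    atomic_switch(current, "deploy-5.3.0")
    return os.readlink(current)
```

If power is lost before the rename, the old link is still in place; after it, the new one is. There is no intermediate state in which `current` points nowhere, which is the property an update system needs.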
With the set of features shown by OSTree, we have the basic features needed for an OTA update system:
- The capability of storing multiple versions of the entire filesystem.
- Retention of the older versions is used to provide a robust rollback facility.
- Hard links are used to optimize storage space.
- Remote repositories allow connected devices to download updates over the “air”.
Containers
Containers are a form of OS-level virtualization, allowing for isolated environments to run applications. They differ from Virtual Machines in that they do not virtualize the entire hardware platform and do not run a full Operating System. Their primary use case is to encapsulate an application with all the dependencies, libraries, etc., it needs to operate. Setting up an application in a container allows you to ensure that all the dependencies are met without having to install additional packages into the base OS. For example, if you are running a NodeJS application, you will package it into a container with the JavaScript runtime and all other components needed. You can specify the versions of each of the dependencies, test everything together, and then deploy the exact combination; this removes the worry that the base OS image might have a different version of a specific package. Additionally, it allows different containers to contain different versions of components if needed; for instance, you can run one container with an application that needs Python v2, and another container with an application that needs Python v3.
In addition to the dependency management, containers can provide isolation from other components of the system, potentially increasing security. Using standard features of the base OS kernel, containers can be limited to only certain parts of the filesystem, certain devices, and even limited to a specific CPU in a multicore system. Depending on the container runtime you are using, you may be able to limit the overall CPU or memory usage of individual containers, allowing your system to adjust based on usage patterns.
The third main feature of container systems is their built-in delivery mechanism. Using Docker, one of the most popular container engines, you can create new containers that inherit functionality from many base images that are provided by various software providers. If you have an application written in Python v3 and you want to run it in a Debian-style environment, you can create a Dockerfile that looks like the following:
    FROM python:buster
    COPY myapp.py /myapp.py
    CMD python /myapp.py
You then create the myapp.py file in the same directory containing your application. We will use the following as our test app:
    print("Hello from myapp")
You can now build and run this container directly on your build system with the following steps:
    $ docker build . -t myapp:latest
    Sending build context to Docker daemon 404.5kB
    Step 1/3 : FROM python:buster
     ---> a6a0779c5fb2
    Step 2/3 : COPY myapp.py /myapp.py
     ---> f8e6a139e325
    Step 3/3 : CMD python /myapp.py
     ---> Running in 408690452f74
    Removing intermediate container 408690452f74
     ---> 179d244081b9
    Successfully built 179d244081b9
    Successfully tagged myapp:latest
    $ docker run --rm myapp
    Hello from myapp
The first command builds the image and tags it with the name myapp and the revision latest. The second command runs the container with the start command specified by the CMD statement in the Dockerfile. Note that we are explicitly not running on our embedded device at this point. You can use containers to do a lot of development and troubleshooting on your desktop. For many application development tasks, this is more efficient than doing development directly on the embedded device. When you are ready to run the container on your embedded device, you can copy the entire working directory to your device and rerun the above commands.
This will work when you are testing or dealing with a small number of devices; however, as your fleet size increases, you need a better delivery mechanism. Docker provides a convenient mechanism for sharing images. We have already used this when we specified the FROM statement in our Dockerfile, which instructs Docker to base our image on the python:buster image that is available on Docker Hub. You can also push your images to Docker Hub or to any other Docker registry; a private registry is useful when you want to keep your containers private. Once you have created a Docker Hub account, you can create your custom image for the target architecture and publish it using the following:
    $ docker login
    Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
    Username: myuser
    Password:
    Login Succeeded
    $ docker buildx create --name mybuilder --use
    mybuilder
    $ docker buildx build --platform linux/arm64,linux/amd64,linux/arm/v7 -t myuser/myapp:latest --push .
    [+] Building 200.2s (16/16) FINISHED
    <snip>
     => pushing layers 176.3s
     => pushing manifest for docker.io/drewmoseley/myapp:latest 1.3s
     => [auth] myuser/myapp:pull,push token for registry-1.docker.io 0.0s
Since we created versions for Arm32, Arm64, and AMD64, you can run your image from any system based on those architectures as follows:
    colibri-imx8x-06748684:~$ docker run --rm myuser/myapp:latest
    Unable to find image 'myuser/myapp:latest' locally
    latest: Pulling from myuser/myapp
    c54d9402d498: Already exists
    a91bbb3592d6: Already exists
    08d8c28b9129: Already exists
    c70b3803033b: Already exists
    4d0a70a69a4a: Already exists
    c35b5a95ec3d: Already exists
    45d8b9f6ef0c: Already exists
    07fa13da7fc4: Already exists
    f87548a0f8e2: Already exists
    ccc4c71f2e3a: Pull complete
    Digest: sha256:099010e9ad1a7719a418422b29eecc2bafdd4c95edff6bc36d7dbb8f98f303c1
    Status: Downloaded newer image for myuser/myapp:latest
    Hello from myapp
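A single tag can serve several architectures because the registry stores a list of per-platform manifests, and the client picks the entry matching the machine it runs on. The Python sketch below illustrates that selection step only. The manifest list is a heavily trimmed, hypothetical stand-in with made-up digests, and the strict matching rule is a simplification of what a real container engine does.

```python
from typing import List, Optional

# Hypothetical, trimmed stand-in for a multi-platform manifest list.
MANIFEST_LIST = [
    {"platform": {"os": "linux", "architecture": "amd64"}, "digest": "sha256:aaa"},
    {"platform": {"os": "linux", "architecture": "arm64"}, "digest": "sha256:bbb"},
    {"platform": {"os": "linux", "architecture": "arm", "variant": "v7"},
     "digest": "sha256:ccc"},
]

# Map `uname -m` style machine names to the platform fields used above.
MACHINE_TO_PLATFORM = {
    "x86_64": ("amd64", None),
    "aarch64": ("arm64", None),
    "armv7l": ("arm", "v7"),
}


def select_manifest(machine: str, manifests: List[dict] = MANIFEST_LIST) -> Optional[str]:
    """Return the digest of the entry matching this machine, or None."""
    arch, variant = MACHINE_TO_PLATFORM.get(machine, (None, None))
    for entry in manifests:
        platform = entry["platform"]
        if (platform["os"] == "linux"
                and platform["architecture"] == arch
                and platform.get("variant") == variant):
            return entry["digest"]
    return None
```

On the Arm64 board used above, `select_manifest("aarch64")` would pick the `arm64` entry, while a desktop running `x86_64` would get the `amd64` one from the very same tag.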
The combination of a flexible, familiar software environment, as well as the built-in packaging and delivery mechanisms, make a compelling case for using containers as your application deployment environment.
OTA Updater
The combination of OSTree and containers provides a rich feature set from which we can develop full OTA update capabilities. Taken together, these two projects deliver a powerful system, capable of handling the needs of the connected devices being developed today.
As discussed above, OSTree provides:
- A stable, power-tolerant mechanism for managing your Operating System binaries.
- Efficient use of storage and download bandwidth, with the ability to reuse any unmodified files.
- Rollback functionality to ensure a broken update does not render your fleet lifeless.
Using containers to house your application stack provides:
- A flexible and familiar software development environment. Your developers can continue to use the tools they are already familiar with in their desktop Linux systems.
- Built-in packaging and delivery. We don’t need to reinvent the wheel here and can take advantage of industry-proven solutions.
- Active developer community. The amount of documentation, blog posts, and ready-made containers available for reuse is vast and can be used as a starting point for your development efforts.
Using such a system allows for updating any component in the system, including the kernel, device tree, and application code. Deployed properly, you can ensure the health of your device fleet throughout its lifetime.
For the update capability to be the most useful, it needs to be automatic, unattended, and available over a network connection of some kind; commonly called over-the-air (OTA) but this does not necessarily imply a wireless connection. Users of these devices do not think of them as computers requiring maintenance, but rather they think of them as appliances that should “just work”. If updating requires user intervention, then you are likely to have many out-of-date devices in your fleet. There are use cases, such as medical devices, where the connectivity of the devices may be deliberately restricted but even in these cases the update should be automated as much as is feasible such as when the device is connected to its docking station for charging.
Security
While security is not the point of this article, we would be remiss if we didn’t discuss it at least a bit. One of the biggest threats to any software system is the ability to run arbitrary code. Since the whole point of an OTA update system is to get new code installed and running on a system, extreme care must be taken to ensure that the code being installed has not been tampered with and is the expected image for your system.
Consider the following points:
- Physical Security: You likely have control over the server infrastructure, so make sure you lock down physical access as much as feasible. Note, however, that this will likely not be possible for the client devices.
- Transport Encryption: You must ensure that the transport between the client and server is properly encrypted. Ideally, you will use proper TLS certificate verification on both endpoints to ensure that you are talking to the expected device.
- Image Verification: Your client devices need a mechanism to validate the images being installed. Cryptographic validation should be used to protect against arbitrary software installation.
- Security Key Management: Any security architecture will rely on keys of some kind. The architecture should provide mechanisms to expire old keys and rotate in new keys, as well as providing appropriate protection of the keys.
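The image verification point can be illustrated with a small Python sketch that uses only the standard library. It is deliberately simplified: the HMAC with a shared key is a stand-in for the asymmetric signatures a production system would use (so that devices hold only a public key), and the `sign_manifest` and `verify_image` names and the manifest layout are hypothetical.

```python
import hashlib
import hmac


def sign_manifest(key: bytes, image: bytes) -> dict:
    """Server side: record the image digest and authenticate it."""
    digest = hashlib.sha256(image).hexdigest()
    tag = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return {"sha256": digest, "tag": tag}


def verify_image(key: bytes, image: bytes, manifest: dict) -> bool:
    """Client side: accept the image only if both checks pass."""
    expected_tag = hmac.new(key, manifest["sha256"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_tag, manifest["tag"]):
        return False  # the manifest was not produced by the key holder
    actual = hashlib.sha256(image).hexdigest()
    # The downloaded bytes must match the digest the manifest vouches for.
    return hmac.compare_digest(actual, manifest["sha256"])
```

The client refuses to install anything whose manifest fails authentication or whose bytes do not match the authenticated digest. Key expiry and rotation, mentioned in the last bullet, are not modeled here.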
There are open-source frameworks that provide extremely secure designs that you can use in implementing your OTA update system. The Update Framework and Uptane are two notable projects that you should consider if you need to design a custom update system.
There are numerous open-source projects that implement OTA update systems that you can integrate into your design to avoid the overhead and risk of designing your own system. The Torizon Platform is the project I am currently involved with, and it implements the full OTA system as described in this post. OSTree provides limited security features such as cryptographically signing commit and delta objects. Torizon is based on the Uptane architecture, providing a ready-built, highly secure end-to-end OTA update solution.
Conclusion
We have discussed how to combine several Open Source projects as infrastructure to create a fully automated, end-to-end OTA update solution.
The use of OSTree for our primary operating system storage allows for a very space-efficient solution and does not require the use of fully redundant partitions. It provides atomic, transactional updates resulting in minimal downtime for device users. OSTree has been carefully designed to be resilient to unpredictable power cycles and allows for rollback when issues are detected with an update.
The use of containers for the application stack provides a convenient packaging and delivery mechanism that can be handled independently of the base operating system. Containers are relatively easy to use and many developers already have skills in working with them. You can choose a base image that matches your desktop Linux distribution, which will allow you to work in a familiar environment with a rich set of tools at your disposal. Or you can choose a container-optimized base image (such as Alpine Linux) that is designed to be small and inherently more secure; after all, the most secure software is that which is not installed.
Combining containers and OSTree gives the best of all worlds when considering build reproducibility, maintainability, and flexibility for your developers. Using a system, such as Torizon, provides a ready-made solution that uses the architecture described here. This allows you to quickly get started developing your application without worrying about the details of OSTree, containers, and OTA updates, while safe in the knowledge that you have a solid solution for managing the lifetime of your device fleet.
Providing proper updates to your devices should be considered a must-have for any modern connected device design. The risk to your users and your brand is too great to overlook.