Radxa Fogwise Airbox AI box review – Part 2: Llama3, Stable Diffusion, imgSearch, Python SDK, YOLOv8

After checking out Radxa Fogwise Airbox hardware in the first part of the review last month, I’ve now had time to test the SOPHGO SG2300x-powered AI box with an Ubuntu 20.04 Server image preloaded with CasaOS as well as Stable Diffusion and Llama3 containers.

I’ll start the second part of the review by checking out the pre-installed Stable Diffusion text-to-image generator and Llama3 AI chatbot, then manually install imgSearch AI-powered image search engine in CasaOS web dashboard, test the Python SDK in the command line, and run some AI vision models, namely Resnet50 and YOLOv8.

Radxa Airbox Fogwise review

Radxa Fogwise Airbox OS installation

Radxa only provided an Ubuntu Server 20.04 image last month with only the basics pre-installed. The company has now improved the documentation and also made two images available for the Radxa Fogwise Airbox:

  • Base image (1.2GB) – Based on Ubuntu Server 20.04; contains only Sophon base SDK and backend.
  • Full image (9.5GB) – Same as above, but adding the Radxa LLM frontend, CasaOS, and demos for common LLMs.

Beginners should go with the second even if it takes longer to download, as it will make everything much easier to test out of the box. So that is the image I went with (Radxa Airbox B5 0606), and I used USBImager to flash it to a 32GB microSD card, before inserting the microSD card into the Fogwise Airbox and monitoring the installation with the serial console as I did in the first part of the review.

After installation is complete, we can see airbox.local’s TCP port 81 is open and we can access the CasaOS dashboard using radxa and radxa as login credentials.


Fogwise Airbox CasaOS

Using Radxa Fogwise Airbox with CasaOS

CasaOS Stable Diffusion Llama3

It will show some system information (I also connected a USB hard drive), but the most important is that Stable Diffusion and Llama3 are already installed. You may consider changing CasaOS’s username and password, and accessing the Linux terminal through SSH with username (linaro) and password (linaro) to change the password there as well…

CasaOS change username password

Here is some extra information about the system from inxi.


Only 2.99GB RAM is available to the system, although I have a machine with 16GB of RAM. That’s because the RAM is shared between the NPU (Neural Processing Unit), VPU (Video Processing Unit), and VPP (Graphics Acceleration Unit). We can check the settings with the memory_edit utility as follows:


That means we are left with 4096 MB for the system with this device tree file. If needed, it can be changed with the script as follows:


I have not changed it because the current memory configuration works fine for the AI models used.
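The split can be sanity-checked with a little arithmetic. The sketch below assumes the three accelerator regions are carved out of the 16GB total. The individual NPU/VPU/VPP sizes are illustrative placeholders chosen to leave 4096 MB for the system; they are not values copied from the memory_edit output.

```python
TOTAL_MB = 16 * 1024  # 16GB of RAM on the review unit


def system_memory_mb(total_mb: int, npu_mb: int, vpu_mb: int, vpp_mb: int) -> int:
    """Return what is left for Linux after reserving the accelerator regions."""
    reserved = npu_mb + vpu_mb + vpp_mb
    if reserved > total_mb:
        raise ValueError("reserved regions exceed total memory")
    return total_mb - reserved


# Hypothetical split (assumption): 7168 + 2048 + 3072 = 12288 MB reserved
left = system_memory_mb(TOTAL_MB, npu_mb=7168, vpu_mb=2048, vpp_mb=3072)
print(left)  # 4096
```

Giving more memory to the system means shrinking one of the reserved regions by the same amount, which may or may not be acceptable depending on the size of the models to load.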

Stable diffusion test

Let’s now click on the Stable Diffusion icon in the CasaOS dashboard to start and open it. The first time I tried, it looked like it would not work and Firefox was unable to connect. That’s simply because it takes time for the container to start, so you’ll want to wait a minute or two before trying again.
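If you would rather script the wait than reload the page by hand, a small polling loop does the job. This is a minimal sketch using only the Python standard library; the function name and the default timings are my own choices, and the URL to poll is whatever address CasaOS opens for the app.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 180.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until the server answers or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status: it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the container is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_http("http://airbox.local:PORT/")`, with PORT replaced by the port shown in the browser, returns True as soon as the Gradio page responds and False if nothing answers within three minutes.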

Gradio Stable Diffusion Select Model Controlnet

The web interface for Stable Diffusion is built with Gradio, which explains why the window is called “Gradio”. The first step is to select the Model and Controlnet and click on the Load Model button, although there’s not much of a choice here because each dropdown menu only has a single entry. It will take a little while to load the model (around 1 minute), and then we can try the text-to-image generator. I started with one of the examples provided at the bottom with both a prompt:

1girl, ponytail ,white hair, purple eyes, medium breasts, collarbone, flowers and petals, landscape, background, rose, abstract

and a negative prompt:

ugly, poor details, bad anatomy

plus various parameters including denoising strength and CFG (classifier-free guidance) scale.

Radxa Fogwise Airbox Stable Diffusion example

It took about 7 seconds to create the image. I then tried my own text prompt creating an image with a penguin surfing at a beach with some palm trees.

Local Stable Diffusion Surfing Penguin

It also took 5 to 7 seconds to create the image. Adding an input image may be quite fun and it allows us to better understand the denoising strength and CFG scale parameters. Since all processing is done locally, you don’t have to worry about sharing personal photos online. I used a stock photo with older people as a test.

Stable Diffusion Image Test

I set the denoising strength to the minimum to keep most of the original image, and the CFG scale to 0 to give it some flexibility. The resulting image is very close to the original.

Stable Diffusion Increase Denoising Strength

If I boost the denoising strength and play around with the CFG scale to get an acceptable result… Stable Diffusion still takes the input as a guide, but it has much more flexibility when creating a photo.

I’ve noticed that when human subjects are small in a photo the results don’t look that good. Let’s try another photo.

Make people younger stable diffusion

That’s better. Adjusting the denoising strength will create images further and further from the original. The CFG scale can create “monsters” with funny fingers and physical attributes. All tests I did completed in 5 to 10 seconds.
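To make the two parameters less abstract, here is a rough sketch of how they are commonly implemented in Stable Diffusion pipelines (for example, in the image-to-image pipeline of Hugging Face diffusers). The internals of the Airbox demo are not documented in what I tested, so treating it the same way is an assumption, and the function names are mine.

```python
def cfg_combine(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: start from the unconditional prediction and
    move toward the prompt-conditioned one by a factor of `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def img2img_steps(num_steps: int, strength: float) -> int:
    """Number of denoising steps actually applied to the input image.
    strength=0 leaves the input untouched; strength=1 runs every step."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_steps * strength), num_steps)


print(cfg_combine([0.0, 1.0], [1.0, 3.0], 7.5))  # [7.5, 16.0]
print(img2img_steps(20, 0.25))  # 5
```

With this formulation, a higher denoising strength simply means more of the schedule is re-run on the input photo, and a larger scale exaggerates the difference between "with prompt" and "without prompt", which would be consistent with the odd artifacts at high settings.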

Llama3 on Fogwise Airbox

Time to shut down the Stable Diffusion container, and start Llama3. We’re presented with a text prompt and a submit button at the bottom of the page. So I asked whether Llama3 knew anything about CNX Software.

Radxa Fogwise Airbox Llama3

Most of it gives a pretty good summary of what CNX Software does, although the starting year is wrong. But I was told (on X) that I should not expect Llama3 to spew accurate information. I guess it’s some art project then 🙂

We don’t have performance metrics in the web interface, so I shot a screencast in Ubuntu to show the rendering speeds.

YouTube video player

I then asked Llama3 to translate a relatively short text into English but this stopped in the middle of the answer with a warning reading “reach the maximal length, Llama3 would clear all history record”.

Llama3 warning reach the maximal length

So I stopped the container, changed the memory limit from 256 to 2048, and restarted Llama3.

CasaOS Llama3 change memory limit

But I got the same issue. Radxa told me it’s possible to change that:

Our Llama3 is fixed length input due with TPU design, for now is 512 length, if the total input + output > 512 the model would clear all of history information, if you want to increase the length of model, you can compile it to 1024 or more follow this link: https://github.com/sophgo/LLM-TPU/tree/main/models/Llama3 (chinese), but it would cost more inference time, or you can set --enable_history to False to ignore the history

I don’t think I can do that in CasaOS, but I’ll try again later in the command line.
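The behavior Radxa describes can be pictured with a toy model. The class below is purely illustrative: it is not code from LLM-TPU, the names are hypothetical, and it assumes the "length" is counted in tokens and that the history is wiped whenever history + input + output would exceed the fixed sequence length.

```python
class FixedLengthChat:
    """Toy model of a chat session with a fixed sequence length."""

    def __init__(self, seq_len: int = 512, enable_history: bool = True):
        self.seq_len = seq_len
        self.enable_history = enable_history
        self.history_tokens = 0

    def add_turn(self, prompt_tokens: int, answer_tokens: int) -> bool:
        """Record one exchange; return True if the limit was hit and the
        history had to be cleared."""
        if not self.enable_history:
            # Without history, every turn starts from an empty context.
            self.history_tokens = 0
        total = self.history_tokens + prompt_tokens + answer_tokens
        if total > self.seq_len:
            self.history_tokens = 0
            return True
        self.history_tokens = total
        return False


chat = FixedLengthChat(seq_len=512)
print(chat.add_turn(100, 300))  # False
print(chat.add_turn(50, 100))   # True
```

It also shows why disabling the history only helps with multi-turn chats: a single long translation can still exceed the limit on its own.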

Installing and running imgSearch in CasaOS

We’ve only used some preinstalled apps so far. But we can install extra apps manually including “Radxa whisper”, “Radxa ImageSearch”, and “Radxa chatdoc”. I’ll go with the imgSearch image search implementation.

The first step is to click on the + icon and select “Install a customized app”

CasaOS Install Customized App

Then we need to add the parameters for the docker image:

  • Docker Image – radxazifeng278/radxa_imgsearch_app:0.1.0
  • Title – Image_Search
  • Web UI port – 9007 (you can select any unused TCP port)
  • Host Port – 9007
  • Container Port – 8501
  • Host Devices – /dev
  • Container Devices – /dev
  • CPU Shares – Medium

Radxa Fogwise Airbox Image Search

Radxa Fogwise Airbox Image Search installation

Now click on the “Install” button to start the installation process, which only takes several seconds.

CasaOS Install App

We now have a new app called “Image_Search”.

CasaOS New App Installed

We can click on the app to start it. However, the first time I tried, it would get stuck forever in a loop showing “Running”.

Airbox ImgSearch fails

If I check the log in Ubuntu 20.04 we can see a few out-of-memory errors:


We can also check the log in CasaOS and the program is continuously being killed and restarted.

CasaOS log

The trick is to change the memory limit in the app settings. I changed it to 2048.

CasaOS app Increase memory limit

We can click on Save, which will reinstall the docker container with the new parameters, and this time around it can run:

Radxa Fogwise Airbox imgSearch running

My first idea was to select a directory on the hard drive attached to the Radxa Fogwise Airbox AI Box, but it’s not implemented that way. Instead, we need to manually upload a list of files. I was told the Streamlit Python framework used for this demo does not have a widget to load files from a directory. Nevertheless, I created a “Test 1” gallery with around 3 photos to get started. It could process the photos within a few seconds.

imgSearch Process Files

However, I was unable to run the test due to an error that reads “could not open ./results/EN/Test 1/index.faiss for reading: No such file or directory“.

imgSearch with Airbox file missing

If I go to the results/EN directory in the Terminal for the container, there’s a “Test 1” folder, but it’s empty.

imgSearch No Folder

Then I thought that maybe it does not like the space in the gallery name. So I changed that to “test2” and it worked after uploading a few recent pictures.
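The space in the name is only my guess at the cause, but if an application builds directory paths from user-provided names, normalizing them first avoids this class of problem. Here is a minimal sketch of such a helper; it is hypothetical and not part of imgSearch.

```python
import re


def safe_gallery_name(name: str) -> str:
    """Replace whitespace and other path-unfriendly characters with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip())
    return cleaned.strip("_") or "gallery"


print(safe_gallery_name("Test 1"))  # Test_1
```

Until something like that is done in the app itself, sticking to gallery names made of letters and digits seems to be the safe choice.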

imgSearch Airbox success

Then I created a “CNXSoftware” gallery trying to add all images from 2023 (about 4,000 of those), but it was stuck and nothing seemed to happen. So I just uploaded a few hundred from the directory, and I was able to search for “block diagram” from the list of photos and got relevant results.
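Since a few hundred files worked while about 4,000 did not, it can help to prepare the uploads in chunks. The sketch below only groups the image files of a directory into batches; the uploads themselves still go through the web interface. The batch size of 200 and the list of extensions are arbitrary assumptions.

```python
from pathlib import Path
from typing import Iterator

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def image_batches(directory: str, batch_size: int = 200) -> Iterator[list[Path]]:
    """Yield the sorted image paths found in `directory` in chunks of `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    files = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]
```

Each batch can then be copied to its own temporary folder and selected in the file picker in one go.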

imgSearch Airbox block diagram

Checking out the Python Virtual environment on Fogwise Airbox

Time to shut down all containers running in CasaOS, and try the Python Virtual environment that should give the user more flexibility. I’ll be following the instructions for Llama3 since I have some unfinished business.

At first, I did this in the root partition (/), but I quickly ran out of space.


So I switched to the 25GB “data” partition instead. Adding an M.2 2230 NVMe SSD to the Fogwise Airbox might be a good idea since AI models are large, and you may not want to delete or move the files around all of the time…
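A quick check of the free space before downloading a large model avoids finding out halfway through. This is a minimal sketch using the standard library; the path and the required size passed to it are up to the user.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3
```

For example, `has_free_space("/data", 10)` would tell whether 10GB is still available on the partition mounted at /data (both values are placeholders for illustration).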

Let’s get the Llama3 Python demo:


The Llama3 8B model can be downloaded as follows:


Let’s now setup the environment and install dependencies:


We can now start the Llama3 demo in the terminal (Note: it will take a while before we get to the prompt, so be patient):


Let’s ask the same question as before:


The AI box generates 9.566 tokens/s. If you want to reproduce the Gradio demo as in CasaOS, you can start the web_demo.py Python script instead:


Python Environment Llama3 Gradio demo

All good. I still have a 512-token limit with either method:

How to increase Llama3 answer limit

Let’s see if we can increase the limit to 1024 and to what extent it impacts performance. Note that I first started following those instructions on my Ubuntu 22.04 laptop with an Intel Core i5-13500H, 16GB RAM, and about 26GB of free space. But then I read the note at the end of the instructions:

Generating the bmodel takes about 2 hours or more. It is recommended to have 64G memory and over 200G of disk space, otherwise OOM or no space left errors are likely.

So that was not going to work on the laptop. If only I had a machine with 64GB RAM. Oh! wait… I do! So I installed Ubuntu 24.04 on the Cincoze DS-1402 embedded computer with an Intel Core i9-12900E processor, 64GB DDR5, and a 300GB Ubuntu partition. That should do. All instructions below are done on the x86 host unless otherwise stated.

We’ll first need to install the compiler:


We’ll now need to ask permission to download the Llama3 model by filling out the form at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main. Note that it requires manual approval. I asked on Saturday, and I could download it on Sunday. We need to install git-lfs (Large File Storage) first:


Then we have to generate a token to download the model and make sure to select “Read access to contents of all public gated repos you can access”. Let’s run the following command:


and input your username and token to retrieve the code. I did that in the root directory for the current user.

The first time, I tried with Python 3.12 preinstalled in Ubuntu 24.04, but the solution requires Torch-1.12.0, which is not available for Python 3.12. So I had to install Python 3.10 with miniconda3:


After restarting the system, we’ll see (base) added to the prompt, and Python 3.10 is now used:


We can go back to the Radxa instructions to create a virtual environment in the LLM-TPU/models/Llama3 directory:


Now copy modeling_llama.py to the transformers library in venv2, and install a few extra dependencies (apparently missed by the requirements.txt file):


We will need to edit compile/files/Meta-Llama-3-8B-Instruct/config.json with our selected length:


and copy the file to the Llama3 directory from Huggingface:


Now generate the onnx file using the downloaded Llama3 and a 1024-token length:


This part could be completed in a little over 6 minutes:


The next step would be to exit the Python virtual environment.


We can now start the compilation. I exited Docker above and also rebooted the Cincoze DS-1402, and the compilation part must take place in Docker. So we’ll need to reattach it, load the environment setting, and finally start the compilation:


There will be some warnings about the onnx model check, but those can be ignored:


This step could be completed in under 40 minutes on my machine:


Let’s copy the new model to the Fogwise Airbox:


The Cincoze DS-1402’s job is now done, so we need to continue on the Fogwise Airbox. First, let’s test the model:


It looks OK, but we can try to run the demo in the terminal:


It’s working fine:


But you’ll notice the speed has gone down to about 7.2 tokens generated per second. If I try to translate a few paragraphs in Thai, I’m still hitting the new 1024-token length limit, but it goes further:

Radxa Model Zoo – Resnet50 and YOLOV8

I decided to do one last test by following the instructions to install the Radxa Model Zoo and run the 8-bit integer (INT8) Resnet50 model (FP16 and FP32 are also available). We are back in the Fogwise Airbox terminal:


Grace Hopper

Again we need to set up a Python virtual environment and install dependencies:


Two samples are provided: one using OpenCV and the SAIL API, and the other using SAIL and “BMCV” processing. Let’s run the OpenCV demo:


Output:


The inference time was 4.23ms (236FPS), and the total time was 75.44ms. Results in the JSON file:


It seems to work, but the output is not exactly human-readable… So let’s try YOLOv8 object segmentation instead using the same Python virtual environment:


There are also two samples, but let’s keep using OpenCV:


Output:


The inference time was 16.18ms (61.8 FPS), and the total time was 246.72ms.
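The FPS figures quoted for both models are simply the inverse of the inference time, and comparing them with the total time shows how little of each run is spent on the NPU. Here is the arithmetic as a short sketch; what exactly "total time" covers (decoding, preprocessing, postprocessing) is my assumption.

```python
def fps(inference_ms: float) -> float:
    """Frames per second if only inference time counted."""
    return 1000.0 / inference_ms


def inference_share(inference_ms: float, total_ms: float) -> float:
    """Percentage of the total time spent on inference."""
    return 100.0 * inference_ms / total_ms


for name, inf_ms, total_ms in [("Resnet50", 4.23, 75.44), ("YOLOv8", 16.18, 246.72)]:
    print(f"{name}: {fps(inf_ms):.1f} FPS, "
          f"inference is {inference_share(inf_ms, total_ms):.1f}% of total time")
# Resnet50: 236.4 FPS, inference is 5.6% of total time
# YOLOv8: 61.8 FPS, inference is 6.6% of total time
```

In other words, the frame rates are upper bounds for the NPU alone, and the end-to-end throughput of these Python samples is dominated by everything around the inference.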

The JSON file is still not human-readable, but the demo also generates an image (or more if there are more input images) with descriptions and outlines for each object.


Radxa Airbox Fogwise Yolo8 segmentation demo

I added another larger image (1920×1080) with more objects to the test:


Inference is still fast, but postprocessing takes some time. The resulting image is pretty good.

YOLOV8 seg demo car pedestrian bicycle street

Let’s try the BMCV sample to see if the speed is better:


Preprocessing with BMCV is quite a bit faster than with OpenCV. That sample decodes with SAIL, but decoding is about as fast as with OpenCV. In both cases SAIL handles inference, so any difference in inference time is probably just variability between the tests.

Radxa Fogwise Airbox’s power consumption and fan noise

When I first tested the Fogwise Airbox with a 100W GaN power supply, I noted idle power consumption was about 30 Watts. Since then, I’ve received a power adapter from Radxa, and the idle power consumption is around 28 Watts. That’s still high. It goes up to 39W while Llama3 provides an answer, and jumps to about 49W when generating an image with Stable Diffusion. The power consumption varies depending on the image generated.
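To put the idle figure in perspective, here is what 28 Watts adds up to if the box is left on around the clock. It is a simple calculation that assumes the device idles the whole time, so actual usage with AI workloads would be higher.

```python
def kwh(power_w: float, hours: float) -> float:
    """Energy in kWh for a constant power draw over a number of hours."""
    return power_w * hours / 1000.0


idle_per_day = kwh(28, 24)
idle_per_year = kwh(28, 24 * 365)
print(f"{idle_per_day:.3f} kWh/day, {idle_per_year:.2f} kWh/year")
# 0.672 kWh/day, 245.28 kWh/year
```

Multiply the yearly figure by your local electricity rate to get the running cost of an always-on unit.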

The fan runs all the time and is quite noisy. For a device close to the user such as a mini PC that would be an issue, but considering it’s a headless system, it can always be placed in a room with Ethernet connectivity and adequate ventilation far from users.

Conclusion

Radxa Fogwise Airbox is a great little headless box for offline/onsite AI processing that works with generative AI such as LLMs and text-to-image generators, as well as computer vision models like Resnet50 or YOLOv8. It’s very easy to get started thanks to the Ubuntu + CasaOS image preloaded with Stable Diffusion and Llama3 containers making a plug-and-play AI box. There’s also a Python SDK to customize models or create your own.

The documentation is pretty good, although I often had to run extra commands to succeed, and in one case (recompiling Llama3), it did not work for me the first time despite my best efforts, and I had to work with Radxa quite a bit to complete this task. I still think that overall Radxa Fogwise Airbox is an interesting device for people wanting to experiment with on-device generative AI and computer vision, or even integrate it into a project. Power consumption may be an issue, but the 32 TOPS AI box should be compared to similar solutions such as NVIDIA Jetson modules.

I’d like to thank Radxa for sending the Fogwise Airbox AI box for review. The model reviewed here, with 16GB RAM and 64GB eMMC flash but no SSD and no WiFi, can be purchased on Aliexpress for $331 plus shipping.


10 Replies to “Radxa Fogwise Airbox AI box review – Part 2: Llama3, Stable Diffusion, imgSearch, Python SDK, YOLOv8”

  1. Interesting, especially how all is done locally.

    One comment regarding the wrong data on CNX, I heard usually such things don’t happen if the prompt asks to check out the data on the internet instead of just asking for facts. I didn’t try it myself though.

  2. Hmm, do the cores always run at 2300 MHz? What is the governor? Might explain the high power consumption. This is the absolute maximum the processor is rated for and I’ve never seen it run so high. If there was a way to reduce it to, say, 400 (minimum) at idle, it could be more power efficient. Reducing to 1150 is still a around the max on many systems (for example, raspberry pis 3 or zero 2w).

    1. Depends mostly on the process node, which I can find no info what the cpu was manufactured at.
      Being a new SoC its very likely much smaller than the Pi3’s 28nm and you can not compare core (A53) unless the process is also the same.
      Its strange they picked A53 but guess the licensing was cheap.
      Still though ‘the idle power consumption is around 28 Watts’ that is pretty awful

      1. The Cortex-A53 cores do very little on this machine. Each time I checked CPU usage was close to non-existent. The only exception may be the post-processing in YOLOv8.

    2. > Hmm, do the cores always run at 2300 MHz? What is the governor?

      These questions are the sole reason for sbc-bench’s -m/-r/-R modes. The idea is to start sbc-bench in one session and to execute the benchmarks in another to get everything monitored.

      1. Here’s the output from sbc-bench for reference:

        The governor is not reported.

        1. If I check manually, it’s set to performance after a reboot:

          1. You could find out the available governors and set it to others, or take some cores offline and see what it does to power consumption (if that’s something which interests you).

  3. I finally managed to recompile Llama3 with the help from Radxa. It was a pain, but I now have an up to 1024-token length for replies in Llama3 and while the speed is slower it’s still acceptable (7.2 tokens/s).

    1. The LLM-TPU SDK instructions are too confusing at best, not at all user friendly. Every time a new model is released (like Llama3.1) the amount of manual work involved with no no clear documentation in english is a big show stopper for putting anything in production for this system. Until Sophon takes these issues seriously and address the SDK with a clear standards the adoption curve is not going to happen with developer community.
