LLMStick – An AI and LLM USB device based on Raspberry Pi Zero W and optimized llama.cpp

YouTuber and tech enthusiast Binh Pham has recently built a portable plug-and-play AI and LLM device housed in a USB stick, called the LLMStick, built around a Raspberry Pi Zero W. The device demonstrates the concept of a local, plug-and-play LLM that can be used without an internet connection.

After DeepSeek shook the world with its performance and open-source accessibility, we have seen tools like Exo that allow you to run large language models (LLMs) on a cluster of devices such as computers, smartphones, and single-board computers, effectively distributing the processing load. We have also seen Radxa release instructions to run DeepSeek R1 (Qwen2 1.5B) on a Rockchip RK3588-based SBC with a 6 TOPS NPU.

LLMStick is a portable plug-and-play AI device

Pham chose the llama.cpp project as it is specifically designed for devices with limited resources. However, running llama.cpp on the Raspberry Pi Zero W wasn't straightforward, and he ran into an architecture incompatibility: the old Pi Zero W uses the older ARMv6 architecture, while llama.cpp includes optimizations for newer Arm architectures (like ARMv8-A found in the Raspberry Pi 5). These optimizations rely on Arm NEON instructions that are not available on the Pi Zero W's processor, which caused compilation errors.
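To give an idea of what such optimizations look like (this is a simplified sketch, not actual code from llama.cpp or llama.zero), SIMD kernels are typically wrapped in compiler guards such as __ARM_NEON, with a plain scalar fallback for CPUs that lack the instructions:

    #if defined(__ARM_NEON)
    #include <arm_neon.h>
    #endif

    /* Hypothetical dot product: a NEON fast path for newer Arm cores,
       and a scalar fallback that an ARMv6 core such as the Pi Zero W's
       ARM1176 can still execute. */
    float dot(const float *a, const float *b, int n) {
    #if defined(__ARM_NEON)
        float32x4_t acc = vdupq_n_f32(0.0f);
        int i = 0;
        for (; i + 4 <= n; i += 4)  /* process 4 floats per iteration */
            acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
        float sum = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
                  + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
        for (; i < n; i++)          /* handle leftover elements */
            sum += a[i] * b[i];
        return sum;
    #else
        float sum = 0.0f;
        for (int i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    #endif
    }

Build errors of the kind Pham hit typically occur when code assumes NEON is available whenever an Arm target is detected instead of checking __ARM_NEON, which is the kind of assumption that breaks an ARMv6 build.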

To solve these issues, he modified the llama.cpp source code, identified the architecture-specific optimizations, and manually removed or modified those parts of the code for compatibility with the ARMv6 architecture, creating the llama.zero project. This was a time-consuming and complex process, requiring a deep understanding of the codebase and Arm architectures. As a side note, he also mentions that compiling the project on this 23-year-old CPU design took around 12 hours.

LLM on USB Stick

Once the compilation process was complete, he started working on an interface and settled on running the Raspberry Pi in USB gadget mode, where the Pi shows up as a USB storage drive. To give a prompt to the LLM, you simply create a file containing the prompt, and the LLM populates the file with its answer. This setup effectively turns the Raspberry Pi Zero W into a portable plug-and-play AI device, allowing offline LLM interaction in a compact form factor.
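The device-side logic for such a file-based interface can be quite simple: poll the shared storage for a prompt file and write the model's reply back next to it. The sketch below is purely illustrative (the file names, mount point, and run_llama() stub are assumptions, not taken from the project), for a Pi exposing storage to the host via the USB gadget framework:

    #include <stdio.h>
    #include <unistd.h>

    /* Stand-in for the actual llama.cpp inference call. */
    static const char *run_llama(const char *prompt) {
        (void)prompt;
        return "(model output would be written here)";
    }

    int main(void) {
        char prompt[4096];
        for (;;) {
            /* Look for a prompt written by the host computer. */
            FILE *in = fopen("/mnt/usb_share/prompt.txt", "r");
            if (in) {
                size_t n = fread(prompt, 1, sizeof(prompt) - 1, in);
                fclose(in);
                if (n > 0) {
                    prompt[n] = '\0';
                    /* Write the model's answer back to the shared drive. */
                    FILE *out = fopen("/mnt/usb_share/answer.txt", "w");
                    if (out) {
                        fputs(run_llama(prompt), out);
                        fclose(out);
                    }
                }
            }
            sleep(1); /* poll once per second */
        }
    }

In practice, a mass-storage gadget complicates this a little, since host writes land in the backing image rather than a live filesystem, but the file-in, file-out idea is what makes the stick feel plug-and-play.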

While writing about this, I could not understand why he did not use the newer Raspberry Pi Zero 2 W board, as it would be a near drop-in replacement and would significantly boost performance, enabling larger, more practical models. The newer board uses the ARMv8 architecture, which would have eliminated the need for most of these modifications; I suspect he wanted to make the video more interesting. Either way, this portable plug-and-play AI device shows the potential of running LLMs on minimal hardware, even if performance remains a limiting factor. The project includes the modified version of llama.cpp, along with instructions for setting up the Pi Zero as a USB device, all of which can be found on Pham Tuan Binh's GitHub repo.

World’s First USB Stick with Local LLM – AI in Your Pocket!

Via Hackster.io




2 Replies to “LLMStick – An AI and LLM USB device based on Raspberry Pi Zero W and optimized llama.cpp”

  1. Well, first, if llama.cpp cannot build on Arm without NEON, then it's essentially either an issue with CMake's granularity (where build options are a bit painful, but if you manually pass -march that's OK), or maybe there are implied uses of NEON in the code whenever an Arm target is detected, in which case I hope he contributed the build fixes back. But that's not customization, it's just regular packaging. Second, I see no reason for removing unused code from the project. It makes me think of someone who's not used to building open-source projects and feels like they're customizing it even better, but the result is that as soon as that work is done, it's obsolete, since end users will not be able to benefit from the daily updates that land in the project. Third, nobody wants to build llama.cpp on such a small target device! It should be built on the developer's machine instead, using a regular cross-compiler for that target. I feel like this self-inflicted pain just serves as an example of how heroic this task was, while my understanding is that it's just someone who built llama.cpp for ARMv6.

    I’ve just started an ARMv6 build right now on an ARM board (MiQi) with ‘cmake -B build -DCMAKE_CXX_FLAGS=”-Ofast -marm -mcpu=arm1176jzf-s” -DCMAKE_C_FLAGS=”-Ofast -marm -mcpu=arm1176jzf-s” -DBUILD_SHARED_LIBS=OFF -DGGML_NATIVE=OFF’ and from what I can tell it’s building; it’s already at 55% in the time it took me to type the above.

    In the end, I feel like the project only consists of punching holes in a plastic box and placing an RPi Zero inside, just running some open-source software.

    Update: and it starts on CPU (no hw acceleration etc):

    system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

  2. You’re really overstating the extent of the changes he made. He edited 4 lines in the CMake file and wrote a README. The project is neat, but this article seems out of touch.

