YouTuber and tech enthusiast Binh Pham has recently built a portable plug-and-play AI/LLM device housed in a USB stick, called the LLMStick and built around a Raspberry Pi Zero W. The device demonstrates the concept of a local plug-and-play LLM that works without an internet connection.
After DeepSeek shook the world with its performance and open-source accessibility, we have seen tools like Exo that allow you to run large language models (LLMs) on a cluster of devices, such as computers, smartphones, and single-board computers, effectively distributing the processing load. We have also seen Radxa release instructions to run DeepSeek R1 (Qwen2 1.5B) on a Rockchip RK3588-based SBC with a 6 TOPS NPU.
Pham turned to the llama.cpp project, as it is specifically designed for devices with limited resources. However, running llama.cpp on the Raspberry Pi Zero W was not straightforward: he ran into an architecture incompatibility, since the old Pi Zero W uses the ARMv6 architecture, while llama.cpp carries optimizations for newer Arm architectures (such as the ARMv8-A found in the Raspberry Pi 5). These optimizations rely on Arm NEON instructions that are not available on the Pi Zero W's processor, which caused compilation errors.
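To see why the build breaks, consider the sketch below of a NEON-accelerated dot product, the kind of SIMD inner loop llama.cpp uses (this is illustrative, not the project's actual code). The <arm_neon.h> header and these intrinsics only exist for NEON-capable Arm targets, so compiling such a file for the Pi Zero W's ARMv6 core fails outright.

```c
/* Illustrative NEON-optimized inner loop (not llama.cpp's actual code).
 * Building this for the Pi Zero W's ARMv6/ARM1176 core fails, because
 * <arm_neon.h> and the intrinsics below are only available on
 * NEON-capable Arm targets such as ARMv8-A. */
#include <arm_neon.h>

/* Dot product of two float arrays, four lanes at a time.
 * Assumes n is a multiple of 4. */
float dot_neon(const float *a, const float *b, int n) {
    float32x4_t acc = vdupq_n_f32(0.0f);      /* four running sums */
    for (int i = 0; i < n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);    /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);    /* load 4 floats from b */
        acc = vmlaq_f32(acc, va, vb);         /* acc += va * vb, per lane */
    }
    /* Horizontal add of the four lanes into one scalar. */
    float32x2_t s = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    return vget_lane_f32(vpadd_f32(s, s), 0);
}
```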
To solve this, he dug into the llama.cpp source code, identified the architecture-specific optimizations, and manually removed or rewrote those parts for compatibility with the ARMv6 architecture, creating the llama.zero project. This was a time-consuming and complex process, requiring a deep understanding of the codebase and Arm architectures. As a side note, he also mentions that compiling the project on the Pi's 23-year-old CPU takes about 12 hours.
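In practice, a port like this boils down to replacing NEON-only blocks with portable scalar C that the ARMv6 compiler accepts: slower, but functionally identical. A hedged sketch of the kind of rewrite involved (again illustrative, not llama.zero's actual code):

```c
/* Scalar replacement for the NEON dot product above (illustrative of the
 * kind of change needed for an ARMv6 port, not llama.zero's actual code).
 * Plain C, so it compiles for ARMv6; the core's VFP unit still handles
 * the float math, just one element at a time instead of four. */
float dot_scalar(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}
```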
Once the compilation process was completed, he started working on an interface and settled on running the Raspberry Pi in USB gadget mode, where the Pi shows up as a USB storage drive. To give the LLM a prompt, you simply create a file containing the prompt on the drive, and the LLM populates the file with the answer. This setup effectively turns the Raspberry Pi Zero W into a portable plug-and-play AI device, allowing offline LLM interaction in a compact form factor.
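As a rough illustration of this file-based interface, a daemon on the Pi could poll the shared file's modification time, read any new prompt, and append the model's answer. The sketch below is hypothetical, not Pham's actual implementation; the file path and the llm_generate() hook are invented names, and a real gadget-mode setup also has to sync the mass-storage backing image so the Pi sees the host's writes.

```c
/* Hypothetical sketch of a file-based prompt loop (not Pham's actual
 * code; the path and llm_generate() are invented). The daemon watches
 * the file exposed to the host over USB mass storage, treats new
 * content as a prompt, and appends the model's reply. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>

/* Stand-in for a call into the llama.zero inference code. */
static void llm_generate(const char *prompt, char *out, size_t out_len) {
    snprintf(out, out_len, "(model reply to: %s)", prompt);
}

int main(void) {
    const char *path = "/mnt/usb_share/prompt.txt"; /* file the host sees */
    time_t last_mtime = 0;
    char prompt[4096], reply[4096];
    struct stat st;

    for (;;) {
        /* Re-read the file only when its modification time changes. */
        if (stat(path, &st) == 0 && st.st_mtime != last_mtime) {
            last_mtime = st.st_mtime;
            FILE *f = fopen(path, "r");
            if (f) {
                size_t n = fread(prompt, 1, sizeof(prompt) - 1, f);
                prompt[n] = '\0';
                fclose(f);

                llm_generate(prompt, reply, sizeof(reply));

                f = fopen(path, "a"); /* append the answer to the file */
                if (f) {
                    fprintf(f, "\n%s\n", reply);
                    fclose(f);
                }
                /* Record our own write so it doesn't retrigger the loop. */
                if (stat(path, &st) == 0)
                    last_mtime = st.st_mtime;
            }
        }
        sleep(1); /* poll once per second */
    }
    return 0;
}
```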
While writing about this, I could not understand why he was not using the newer Raspberry Pi Zero 2 W, as it would be a near drop-in replacement and would significantly boost performance, enabling larger, more practical models. Since the newer board uses the ARMv8 architecture, it would also have eliminated the need for most of these modifications; I suspect that, in the end, he wanted to make the video interesting. Either way, this portable plug-and-play AI device shows the potential of running LLMs on minimal hardware, even if performance remains a limiting factor. The project, including the modified version of llama.cpp and instructions for setting up the Pi Zero as a USB device, can be found on Pham Tuan Binh's GitHub repo.
Via Hackster.io

Debashis Das is a technical content writer and embedded engineer with over five years of experience in the industry. With expertise in Embedded C, PCB design, and SEO optimization, he effectively blends difficult technical topics with clear communication.