The Mouse That Roared: Building a Collapsed AI Lab on Limited Budget
Many AI enthusiasts assume that meaningful development requires at least one NVIDIA RTX 5090. While these cards are undeniably fast, they are often "VRAM-starved" for large-scale work, effectively limiting their utility to models and contexts that can fit within approximately 30GB of VRAM. Supporting a full stack—including embedding models, rerankers, and transcription engines like the one discussed here—would typically necessitate additional cards and host systems, which drastically inflates cost, complexity, noise, and power consumption.
In response, I have developed the Collapsed AI Lab, a setup I call the "Roaring Mouse". This system is low-cost, has a minimal physical footprint, and maintains low power consumption. Crucially, it can run massive models that would exceed the memory capacity of an RTX 5090, forcing that high-end hardware into a performance crawl by comparison. It serves as the ideal home-lab environment for development and Proof-of-Concept (PoC) configurations.

======================================================
NODE PROFILE: mf-x1 | STRIX POINT
======================================================
[ CPU ]
Model: AMD Ryzen AI 9 HX 470 w/ Radeon 890M
Topology: 24 Threads
Load: 0.01, 0.03, 0.00
[ MEMORY ]
System: 54Gi Total (37Gi Used)
(actually 64Gi total, 8GB set as dedicated VRAM)
[ GPU & COMPUTE ]
Hardware: AMD Radeon 890M (GFX1150)
VRAM/GTT: 8192 MiB Dedicated / 49152 MiB Shared
Vulkan API: 1.4.318
NPU Status: XDNA Driver Loaded (AI Ready)
[ STORAGE ]
Root FS: 773G / 1.9T (43% used)
Disk: zd0 (50G)
Disk: zd16 (50G)
Disk: zd32 (150G)
Disk: zd48 (300G)
Disk: nvme0n1 (1.9T) KINGSTON
[ NETWORK & OS ]
Local IP: 172.17.212.252/24 on enp194s0
Kernel: 6.17.0-23-generic
Uptime: up 1 week, 2 days, 22 hours, 7 minutes
======================================================
llama@mf-x1:~$
The Philosophy: Physical Models, Virtual Incubator
To maximize performance, the lab is designed with a strict architectural split:
The Physical Layer (The Muscle): The heavyweight models—transcription (Whisper) and reasoning/summarization (Qwen)—run directly on the host OS (Ubuntu 24.04). This gives them bare-metal access to the GPU and RAM without virtualization overhead.
The Virtual Layer (The Incubator): The applications run within isolated environments managed by LXC. This acts as a software incubator, housing various projects under different accounts while they "call out" to the physical layer for AI processing.
What is LXC?
LXC (Linux Containers) is a system-level virtualization method for running multiple isolated Linux systems on a single control host. Unlike traditional Virtual Machines (VMs) that emulate hardware and run their own kernels, containers share the host's kernel, making them incredibly lightweight and efficient. My setup uses a management layer that handles both high-density Containers and full Virtual Machines, allowing me to toggle environments as needed.
Current Incubator Status:
+-------------+---------+----------------------------+-----------------+
| NAME | STATE | IPV4 | TYPE |
+-------------+---------+----------------------------+-----------------+
| Dev-Station | RUNNING | 172.17.212.246 (enp5s0) | VIRTUAL-MACHINE |
| cognivault | RUNNING | 172.17.212.247 (enp5s0) | VIRTUAL-MACHINE |
| Atrus | STOPPED | | VIRTUAL-MACHINE |
| WeCa | STOPPED | | VIRTUAL-MACHINE |
| WebRAG | STOPPED | | VIRTUAL-MACHINE |
| sd-forge | STOPPED | | CONTAINER |
+-------------+---------+----------------------------+-----------------+
Note: Active projects like Dev-Station and cognivault run as VMs within this layer, while specialized tools like sd-forge are kept in lightweight containers.
Hardware: Minisforum AI X1 Pro-470
The "Roaring Mouse" is built on a headless workstation profile tuned for high-capacity memory tasks:
The Engine Room: Physical Layer Services
Currently, the transcription engine is hosted directly on the metal to provide instant access to the Radeon 890M:
1. High-Precision Transcription (Port 8090)
I run whisper-server using the large-v3-turbo model, at 6 bit.
whisper-server -m models/whisper-large-v3-turbo-q6_k.gguf --host 0.0.0.0 --port 8090
2. Real-World Performance
In a recent test, I processed a ~39.5-minute audio file (2373.7 seconds) using high-accuracy settings:
Settings: Auto-language detection, Beam Size 5, Best of 5.
Result: The system finished in 112 seconds
Efficiency: This is roughly 20x real-time speed while maintaining elite "5-beam" precision.
New Resident: The Anti-Clickbait Utility
The latest project in th incubator is the Anti-Clickbait Utility. It leverages the lab's distinct layers to act as a guardian of your time:
UI & Orchestration: The front-end resides in the Dev-Station VM, where it manages user requests and retrieves YouTube audio. It orchestrates the entire workflow: sending audio to the host's Whisper server, receiving the transcript, forwarding it to the Qwen 3.6 35B engine for analysis, and finally displaying the results to the user.
Transcription: This process is offloaded to the Physical Layer, utilizing whisper-server and the Whisper model for high-speed, bare-metal transcription.\
Critique: The raw transcript is analyzed by the Qwen 35B reasoning engine, also running on the physical layer to ensure maximum performance during complex semantic evaluations.
Verdict: If the AI determines the video is overdramatized fluff, it issues a clickbait warning. If the content is legitimate, the system produces a detailed, chaptered summary, allowing the user to extract the most important information at a glance.System Status: Stability Under Load
Even with these massive models resident in memory, the system maintains plenty of breathing room:
total used free shared buff/cache available
Mem: 54Gi 37Gi 8.6Gi 3.4Gi 12Gi 17Gi
Currently using 37Gi of memory to provide high-end AI services to the entire virtual incubator.
Next up: I'll be sharing the specific logic and prompts used to determine if a video is actually worth your 10 minutes or just chasing the algorithm.
UPDATE
Pairing this machine with a
NVIDIA DGX Spark Founders Edition EU
But this is another 5K so no more low budget!
In any case, if anyone feels like giving me a present for some reason (nameday, birthday, I would think of a reason if one is needed!) I would not mind one!
No comments:
Post a Comment