Repurposing Enterprise GPUs: The Tesla P40 Home Lab Story
A practical guide to building an AI inference server from retired enterprise GPUs. Four Tesla P40s, 96GB of VRAM, $2,500 all-in — and the benchmarks to prove it was worth it.
If AI follows a Jevons trajectory — and five pieces in this series argue that it does — the investment question isn't whether demand will expand, but where the expansion creates bottlenecks. This piece maps the Jevons framework onto concrete investment layers: energy, physical infrastructure, custom silicon, and the application tier. It also addresses the most common objection — that GPU diminishing returns cap the expansion — and explains why multiple overlapping cost curves make the case stronger, not weaker.
I used Claude Code with Opus 4.6 and the Claude Agent SDK to generate three technical handbooks — totaling over 2,400 pages and 232 chapters — from real project source code. The key was a framework that launches 10-12 AI agents in parallel, each reading actual codebases and writing LaTeX chapters simultaneously. This post describes how the system works, what goes right, what goes wrong, and what it means for the future of technical documentation.
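The fan-out pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the framework's actual code: `write_chapter` stands in for a real agent call (e.g. via the Claude Agent SDK), and its name, signature, and the chapter specs are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one agent run: read a source tree, emit a LaTeX chapter.
# A real agent would inspect the files under `source` before drafting the body.
def write_chapter(spec):
    source, title = spec
    return f"\\chapter{{{title}}}\n% drafted from {source}\n"

def generate_book(chapter_specs, max_agents=12):
    # Launch up to `max_agents` workers concurrently; `map` preserves chapter order,
    # so the assembled book comes out in the sequence the specs were given.
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return list(pool.map(write_chapter, chapter_specs))

chapters = generate_book([("src/core/", "The Core Engine"),
                          ("src/api/", "The Public API")])
```

The point of the pattern is that chapters are independent units of work, so wall-clock time scales with the slowest chapter rather than the sum of all of them.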
Steve Yegge's "The AI Vampire" describes AI-driven burnout as an extraction problem — companies capture the productivity surplus while workers absorb the cognitive toll. But what he's actually describing is Jevons Paradox applied to human attention. AI makes cognitive output dramatically cheaper, demand for it expands exactly as the model predicts, and the expansion concentrates on the one input that can't scale: human judgment.
Matt Shumer's widely shared essay warns that AI will displace half of entry-level white-collar jobs within five years. He's right about the capability curve. He's wrong about what follows from it — because he commits the same analytical error that has produced incorrect predictions at every prior technological inflection point: modeling displacement without modeling demand expansion.
AI inference costs are declining at a rate that mirrors the early decades of Moore's Law. If the cost per token continues to fall by an order of magnitude every two to three years, the implications extend far beyond making current AI applications cheaper. This piece explores what becomes possible — not what gets displaced — when intelligence-per-dollar follows the same exponential curve that turned computing from a military luxury into a pocket commodity.
Using Claude to build a complete Z80 emulator under clean-room constraints — no existing emulator source code referenced, only the Zilog Z80 CPU User Manual. An LLM clean-room implementation covering all official instructions, ACIA serial emulation, and CP/M support.
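At the heart of any such emulator is a fetch/decode/execute loop driven by the opcode tables in the Zilog manual. The sketch below shows the shape of that loop with just three opcodes wired up; the class and register names are illustrative, not taken from the project described above.

```python
# Minimal fetch/decode/execute loop in the style of a Z80 emulator core.
# Only NOP, LD A,n, and HALT are implemented; a clean-room build fills in the
# rest from the instruction descriptions in the Zilog Z80 CPU User Manual.
class CPU:
    def __init__(self, memory):
        self.memory = memory      # flat 64 KB address space
        self.pc = 0x0000          # program counter
        self.a = 0x00             # accumulator
        self.halted = False

    def step(self):
        opcode = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF   # wrap at 64 KB, as the real PC does
        if opcode == 0x00:                 # NOP: do nothing
            pass
        elif opcode == 0x3E:               # LD A,n: load immediate byte into A
            self.a = self.memory[self.pc]
            self.pc = (self.pc + 1) & 0xFFFF
        elif opcode == 0x76:               # HALT: stop until interrupt
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")

cpu = CPU(bytearray([0x3E, 0x2A, 0x76]))   # LD A,0x2A ; HALT
while not cpu.halted:
    cpu.step()
# cpu.a now holds 0x2A
```

The clean-room discipline lives entirely in how that dispatch table gets filled: each branch is written from the manual's prose description of the instruction, never from another emulator's source.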