<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io</title><link>https://tinycomputers.io/</link><description>Hands-on hardware projects and deep dives into embedded systems, Z80 retro computing, FPGAs, Rust on microcontrollers, PCB design, 3D printing, and AI on AMD GPUs.</description><atom:link href="https://tinycomputers.io/rss.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
&lt;!-- div style="width: 100%" --&gt;
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /&gt; Creative Commons Attribution-ShareAlike&lt;/a&gt;&amp;nbsp;|&amp;nbsp;
&lt;!-- /div --&gt;
</copyright><lastBuildDate>Fri, 08 May 2026 19:39:30 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Building Stalker: A Mid-Cap Trading Bot and the Data Network That Feeds It</title><link>https://tinycomputers.io/posts/building-stalker-a-mid-cap-trading-bot-and-the-data-network-that-feeds-it.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/building-stalker-a-mid-cap-trading-bot-and-the-data-network-that-feeds-it_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;60 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Three years ago I &lt;a href="https://tinycomputers.io/posts/a-little-rust-a-little-python-and-some-openai-custom-company-stock-reports.html"&gt;built a Slack bot that generated company stock reports&lt;/a&gt;. You'd type &lt;code&gt;/report TSLA&lt;/code&gt; and a Lambda fan-out would pull yfinance data, scrape recent news through BeautifulSoup, run technical indicators with the &lt;code&gt;ta&lt;/code&gt; library, and ask GPT-4 to write three paragraphs about what was happening with the stock. The output landed back in Slack and on a static S3 site. It was fun. It was a toy.&lt;/p&gt;
&lt;p&gt;It told you what one stock looked like. It didn't tell you what to do about it.&lt;/p&gt;
&lt;p&gt;Stalker is what I built after I stopped wanting toys.&lt;/p&gt;
&lt;p&gt;It's an autonomous mid-cap equity trading bot. It runs on AWS Lambda. It reads a daily macro brief, ranks a 300-name universe by factor scores, asks Claude Sonnet 4.6 to propose orders against the current portfolio, runs the proposed plan through a deterministic risk gate, and submits the survivors to Alpaca's paper trading API with deterministic client-order IDs so re-fires are idempotent. It logs every decision, every fill, every rejection. It emails me a daily report. It runs without me touching it.&lt;/p&gt;
&lt;p&gt;Right now it has 12 positions in a paper account modeled as a synthetic \$1,000 seeded deposit. Inception-to-date return on that \$1,000 baseline is +3.76% against SPY's +2.12% — so +1.65pp of alpha over four weeks of live operation. (To head off a misreading of the next number: the position book currently shows \$1,576 of market value, which is &lt;strong&gt;not&lt;/strong&gt; \$576 of gains on the seed. It's over-deployment from a bug I'll cover later — the bot was reading its cash limit as auto-replenishing every brief day instead of flowing from the order ledger, so cumulative buys ran past what a real \$1,000 account could have funded. The fix is in; the bot is currently trimming positions back to within the seed. The +3.76% number is the honest baseline-relative return.) The numbers don't matter at four weeks anyway — that's noise. What matters is that the system is running, the architecture is settled, and the methodology for measuring whether any of it actually works is pre-registered.&lt;/p&gt;
&lt;p&gt;Stalker is one of six related projects. The other five are data sources that feed it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Headwater&lt;/strong&gt; is a daily financial-newsletter aggregator that emits a structured macro brief twice each weekday, morning and afternoon. It reads the writers I follow, classifies what they're saying, and ships a JSON document with a regime call, sector tilts, themes, and a watchlist.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Estuary&lt;/strong&gt; does cross-source consensus detection. When five different writers all flag the same ticker in the same week, that's a cluster, and clusters end up in a daily brief at 23:30 UTC.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PrivateEye&lt;/strong&gt; decodes paywalled-newsletter teasers into ticker picks. The actual decoded stock recommendations land in a daily digest.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tributary&lt;/strong&gt; ingests SEC EDGAR 8-K filings, extracts each item-level event, and classifies materiality. NT 10-K late filings, 4.02 restatements, 1.03 bankruptcies, 5.02 executive departures: Stalker reads these as risk-and-opportunity overlays.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Goldfinch&lt;/strong&gt; pulls federal prime contract awards from USAspending.gov, maps recipient legal entities to public tickers, and emits the matches as confirmatory revenue-disclosure signals.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each one is its own AWS account-scope project — its own Lambdas, its own DynamoDB tables, its own SES inbound endpoint. They're loosely coupled: each emits a JSON document on a known schema; Stalker reads from each one through a producer-specific loader at analyze time. Adding a new feeder is a Pydantic model plus a load function plus an SES rule. The architecture is "more sources are better, but no source is required."&lt;/p&gt;
&lt;p&gt;This is the post that explains how all of it hangs together.&lt;/p&gt;
&lt;h3&gt;The Data Network&lt;/h3&gt;
&lt;p&gt;The shape of the network matters. Each feeder is a separate project because each one solves a different problem. Trying to put all the signal extraction inside Stalker would have produced a monolith that does five things badly. The split lets each project focus on one thing — newsletter aggregation, consensus detection, teaser decoding, EDGAR ingestion, contract scraping — and lets Stalker focus on the trading layer alone.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Headwater&lt;/strong&gt; generates the macro overlay. The structured brief carries a &lt;code&gt;regime&lt;/code&gt; field (one of &lt;code&gt;risk_off&lt;/code&gt;, &lt;code&gt;transitional&lt;/code&gt;, &lt;code&gt;neutral&lt;/code&gt;, &lt;code&gt;transitional_risk_on&lt;/code&gt;, &lt;code&gt;risk_on&lt;/code&gt;), a list of &lt;code&gt;sector_tilts&lt;/code&gt; with &lt;code&gt;view&lt;/code&gt; and &lt;code&gt;strength&lt;/code&gt;, a list of &lt;code&gt;thematic_views&lt;/code&gt; with &lt;code&gt;affected_sectors&lt;/code&gt;, a &lt;code&gt;key_risks&lt;/code&gt; list with horizons, and a &lt;code&gt;watchlist&lt;/code&gt; of tickers under active discussion. Stalker doesn't trade the watchlist names directly — those are mostly megacaps that fall outside the mid-cap filter — but the macro block converts directly into multipliers on the combined factor score. A bullish-Energy tilt at high strength multiplies every Energy mid-cap's &lt;code&gt;combined_z&lt;/code&gt; by 1.20 before ranking. A bearish-Tech tilt at medium strength multiplies it by 0.90. The factor selection is sector-tilted by the macro view at the rank-construction stage, not by Claude's interpretation at the prompt stage. The macro pipe into selection is deterministic and traceable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Estuary&lt;/strong&gt; plays a different role. It tracks what individual newsletter writers are publishing across a window, computes consensus when multiple writers converge on the same ticker within a few days, and emits the clusters along with the per-writer high-conviction picks. A cluster of five sources flagging a single name in the same week is a stronger signal than five separate calls scattered across the year. Stalker reads Estuary's output as a confirmatory soft signal: when a name in the candidate universe also has Estuary cluster support, Claude can use that as a tie-breaker between similarly-ranked candidates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PrivateEye&lt;/strong&gt; is the cheapest unit-economics piece in the stack. Financial-newsletter teasers are designed to make you subscribe; they're written to gesture at a recommendation without giving it away. PrivateEye reads the teasers and extracts the underlying ticker pick. The decoded picks ship in a daily digest. Most of them are megacaps that don't apply to Stalker, but the fraction that falls in the \$2B–\$10B band becomes another confirmatory soft signal alongside the Estuary clusters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tributary&lt;/strong&gt; is the structural-events feed. The SEC publishes 8-K filings continuously, and many of them are noise: boilerplate amendments, routine disclosures, exhibit lists. The interesting ones are the 5.02 executive departures, the 1.03 bankruptcy filings, the 4.02 restatements (non-reliance on previously issued financials), the 5.07 favorable shareholder votes, and the 8.01 "other events" items that announce material acquisitions. Tributary classifies each item-level event by materiality and salience, summarizes the substance, and emits one record per filing. Stalker uses high-materiality Tributary events as risk overlays: a recent NT 10-K on a holding is a reason to consider trimming. The architecture treats negative-materiality items as exit triggers and positive items as additional context.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Goldfinch&lt;/strong&gt; is the newest feeder. Federal contract awards land in USAspending.gov within a few days of action; they're a real fundamental disclosure that often precedes earnings discussion of the same revenue. The challenge is the recipient-to-ticker mapping — federal awards are made to legal entities, not to listed companies, and many awards go to subsidiaries or government-services divisions whose parent ticker isn't obvious. Goldfinch handles the mapping (with a confidence rating per match), filters to material awards, and emits the matched records. A \$300M Department of Defense award to a \$5B mid-cap defense contractor is a real near-term revenue event; the same award to a \$200B megacap is rounding error. Stalker weighs the signal accordingly.&lt;/p&gt;
&lt;p&gt;Each feeder ships through SES inbound mail to its own recipient at &lt;code&gt;in.stalker.bot&lt;/code&gt;. &lt;code&gt;briefs@&lt;/code&gt; is Headwater. &lt;code&gt;estuary@&lt;/code&gt; is Estuary. &lt;code&gt;events@&lt;/code&gt; is Tributary. &lt;code&gt;picks@&lt;/code&gt; is PrivateEye. Goldfinch is a special case — it writes directly to a shared DynamoDB table because it runs in the same AWS account, but conceptually it's the same pattern: structured JSON, schema-versioned, lenient on unknown fields. SES drops each inbound message into S3, an EventBridge notification fires the Stalker ingest Lambda, the Lambda dispatches to the producer-specific parser, validates against a Pydantic model, archives the parsed payload, indexes a row in DynamoDB, and (for Headwater only) emits a &lt;code&gt;BriefIngested&lt;/code&gt; event that triggers the analysis layer.&lt;/p&gt;
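&lt;p&gt;The recipient-to-parser dispatch described above can be sketched roughly as follows. This is a minimal illustration, not Stalker's actual ingest code: the recipient local parts come from the post, but the &lt;code&gt;ParsedPayload&lt;/code&gt; dataclass (a stand-in for the real Pydantic models), the &lt;code&gt;schema_version&lt;/code&gt; field name, and both function names are assumptions.&lt;/p&gt;

```python
import json
from dataclasses import dataclass

# Recipient local parts are from the post; everything else is hypothetical.
PRODUCER_BY_RECIPIENT = {
    "briefs": "headwater",
    "estuary": "estuary",
    "events": "tributary",
    "picks": "privateeye",
}


@dataclass
class ParsedPayload:
    """Stand-in for a producer-specific Pydantic model."""
    producer: str
    schema_version: str
    body: dict


def parse_inbound(recipient: str, raw_json: str) -> ParsedPayload:
    """Route one inbound message to its producer and check the envelope."""
    local_part = recipient.split("@", 1)[0].lower()
    producer = PRODUCER_BY_RECIPIENT.get(local_part)
    if producer is None:
        raise ValueError(f"no parser registered for {recipient!r}")
    doc = json.loads(raw_json)
    if "schema_version" not in doc:
        raise ValueError("payload is missing schema_version")
    # Lenient on unknown fields: keep the whole document, require only the envelope.
    return ParsedPayload(producer, str(doc["schema_version"]), doc)


def should_trigger_analysis(payload: ParsedPayload) -> bool:
    """Only Headwater briefs emit BriefIngested, per the post."""
    return payload.producer == "headwater"
```

&lt;p&gt;Keeping the mapping in one dictionary is what makes "a new feeder is a model plus a loader plus an SES rule" cheap: the dispatch code itself does not change.&lt;/p&gt;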
&lt;p&gt;That's the inbound side. The outbound side is one Lambda — the analyzer.&lt;/p&gt;
&lt;h3&gt;The Trading Core&lt;/h3&gt;
&lt;p&gt;Stalker's actual decision logic is layered. There's a deterministic factor stack at the bottom, an LLM call in the middle, and a deterministic risk gate at the top. The LLM has freedom in the middle layer; everything below and above is mechanical.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The universe.&lt;/strong&gt; Every morning at 04:00 UTC, a Lambda hits FMP's screener API for US-listed stocks with market cap between \$2 billion and \$10 billion that are tradable on Alpaca. The result is a partition in the &lt;code&gt;stalker-universe&lt;/code&gt; DynamoDB table keyed on &lt;code&gt;refresh_date&lt;/code&gt;. The mid-cap band is the strategy's first commitment: Stalker doesn't trade megacaps (they're efficient, no edge available) and doesn't trade microcaps (liquidity, regulatory, and size-premium hazards). Mid-cap is the band where factor strategies historically have shown the most edge over passive indexing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Factor scoring.&lt;/strong&gt; Fifteen minutes after the universe refresh, a second Lambda computes per-name factor scores. Three factors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Momentum&lt;/strong&gt;: trailing 12-month return excluding the most recent month, the classical Jegadeesh-Titman 12-1 factor. The skip-month removes short-term reversal noise.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quality&lt;/strong&gt;: trailing-twelve-month ROIC plus operating margin, equally weighted. This is the post-ADR-011 production definition. (Stalker tracks load-bearing strategy decisions in numbered Architecture Decision Records — short Markdown documents in &lt;code&gt;docs/adr/&lt;/code&gt; that record the context, the evidence, the chosen option, and the reasoning. ADR-011 was the one that switched the quality factor.) The original ROE-plus-gross-margin definition turned out to be silently destructive in backtest, dragging alpha by 31 percentage points over the 3-year survivorship-bias-free window. The ADR walks through the seven candidate definitions, the alpha and Sharpe under each, and the reasoning for picking ROIC-plus-operating-margin over single-metric ROIC even though the latter showed a slightly higher headline alpha. Subsequent ADRs in this post (009, 010, 013, 014) follow the same pattern.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low-vol&lt;/strong&gt;: the negative of 60-day realized volatility. Lower-vol names get higher z-scores.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each factor is z-scored within sector — a Financials name's quality is graded against other Financials, not against Tech. The sector-neutral approach prevents structural concentration: REITs and banks naturally have high ROE and low vol, so cross-sectional z-scoring would otherwise dump the whole portfolio into one or two sectors regardless of the macro view. Sectors with fewer than five names skip scoring; their z-scores would be noise.&lt;/p&gt;
&lt;p&gt;The three factor z-scores combine into a &lt;code&gt;combined_z&lt;/code&gt; via configurable weights. The current production weights are 0.55 momentum, 0.225 quality, 0.225 low-vol, momentum-tilted because the bias-corrected backtest sweep showed monotonic alpha improvement up through 0.55 with diminishing returns beyond that. The factor stack writes the top 300 names by &lt;code&gt;combined_z&lt;/code&gt; back to DynamoDB with &lt;code&gt;in_universe=true&lt;/code&gt;.&lt;/p&gt;
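&lt;p&gt;To make the sector-neutral scoring concrete, here is a small sketch under stated assumptions. The weights and the five-name minimum are from the post; the row layout (&lt;code&gt;ticker&lt;/code&gt;, &lt;code&gt;sector&lt;/code&gt;, and one raw value per factor, with &lt;code&gt;low_vol&lt;/code&gt; already negated), the function names, and the use of population standard deviation are illustrative choices, not the production code.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import mean, pstdev

WEIGHTS = {"momentum": 0.55, "quality": 0.225, "low_vol": 0.225}  # from the post
MIN_SECTOR_SIZE = 5  # sectors with fewer names skip scoring


def sector_zscores(rows: list[dict], factor: str) -> dict[str, float]:
    """Z-score one raw factor within each sector; small sectors are skipped."""
    by_sector = defaultdict(list)
    for row in rows:
        by_sector[row["sector"]].append(row)
    out = {}
    for members in by_sector.values():
        if MIN_SECTOR_SIZE > len(members):
            continue
        values = [m[factor] for m in members]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # no dispersion in this sector, nothing to rank
        for m in members:
            out[m["ticker"]] = (m[factor] - mu) / sigma
    return out


def combined_z(rows: list[dict]) -> dict[str, float]:
    """Weighted sum of the three sector-neutral z-scores."""
    z = {f: sector_zscores(rows, f) for f in WEIGHTS}
    # Only names scored on every factor get a combined score.
    tickers = set.intersection(*(set(v) for v in z.values()))
    return {t: sum(WEIGHTS[f] * z[f][t] for f in WEIGHTS) for t in tickers}
```

&lt;p&gt;Because each name is graded only against its own sector, a sector whose raw numbers are structurally high cannot crowd out the rest of the ranking.&lt;/p&gt;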
&lt;p&gt;&lt;strong&gt;The macro overlay applies here.&lt;/strong&gt; Before the factor scores are written, Headwater's sector tilts are pulled in and applied as multiplicative adjustments to &lt;code&gt;combined_z&lt;/code&gt; per sector. This is the "macro pipe" mentioned earlier: the sector view from the structured brief becomes a factor in the rank itself. A &lt;code&gt;bullish/high&lt;/code&gt; Energy tilt floats Energy names up; a &lt;code&gt;bearish/medium&lt;/code&gt; Tech tilt sinks Tech names. The multipliers are deterministic — defined in &lt;code&gt;macro_sizing.py&lt;/code&gt; and unit-tested — so a given brief plus a given factor snapshot produces a reproducible rank. No LLM in this layer.&lt;/p&gt;
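&lt;p&gt;A minimal sketch of that overlay, assuming a simple lookup table. Only the two multipliers quoted earlier (1.20 for bullish/high, 0.90 for bearish/medium) come from the post; the neutral 1.0 fallback, the &lt;code&gt;sector&lt;/code&gt; key on each tilt, and the function name are assumptions, and the real table in &lt;code&gt;macro_sizing.py&lt;/code&gt; presumably covers more combinations.&lt;/p&gt;

```python
# Only these two multipliers are stated in the post. Any other
# (view, strength) pair falls back to 1.0 here, which is an assumption.
TILT_MULTIPLIER = {
    ("bullish", "high"): 1.20,
    ("bearish", "medium"): 0.90,
}


def apply_sector_tilts(scores, sector_of, tilts):
    """Return a new ticker -> combined_z map with sector multipliers applied.

    scores:    ticker -> combined_z before the overlay
    sector_of: ticker -> sector name
    tilts:     list of dicts with "sector", "view", and "strength" keys
    """
    by_sector = {
        t["sector"]: TILT_MULTIPLIER.get((t["view"], t["strength"]), 1.0)
        for t in tilts
    }
    # Note: a plain multiplier also scales negative scores away from zero.
    # How the production module treats that case is not stated in the post.
    return {tk: z * by_sector.get(sector_of[tk], 1.0) for tk, z in scores.items()}
```

&lt;p&gt;The function is pure: the same brief and the same factor snapshot always yield the same adjusted scores, which is the reproducibility property the paragraph above relies on.&lt;/p&gt;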
&lt;p&gt;&lt;strong&gt;The analyze layer.&lt;/strong&gt; When a Headwater brief lands in S3 and the ingest Lambda emits &lt;code&gt;BriefIngested&lt;/code&gt;, the analyze Lambda fires. It loads the brief, fetches the current Alpaca account state, fetches today's universe partition with factor scores, queries the four producer feeder loaders for recent confirmatory signals, builds a structured user message, and calls Claude Sonnet 4.6 with a tool-use schema that constrains the output to a &lt;code&gt;propose_trade_plan&lt;/code&gt; JSON object. The system prompt explains the strategy posture, the hard rules the risk gate enforces, and the role of each feeder. The user message contains the macro block, the top-50 candidates by &lt;code&gt;combined_z&lt;/code&gt; with their factor scores and Kelly-derived suggested allocations, the current positions, the universe whitelist, and the soft-signal sections from each feeder.&lt;/p&gt;
&lt;p&gt;Claude's job is constrained selection. It picks 6–10 names from the candidate list (or the existing positions) and assigns target dollar values. It cannot trade outside the universe whitelist. It cannot propose a single trade larger than 15% of NAV. It cannot exceed 25% of NAV in any single position post-trade. It cannot buy a name with earnings inside seven days. The system prompt makes these constraints explicit, but the risk gate enforces them mechanically — the LLM is one defense layer, not the only one.&lt;/p&gt;
&lt;p&gt;The proposed plan flows into &lt;code&gt;risk.evaluate()&lt;/code&gt;, a pure-Python function that takes the state and the proposed orders and returns one of four decisions: &lt;code&gt;approved&lt;/code&gt;, &lt;code&gt;needs_human_approval&lt;/code&gt;, &lt;code&gt;rejected&lt;/code&gt;, or — under specific edge cases — a noise band that triggers a re-prompt. The risk gate enforces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;25% per-position cap (post-trade)&lt;/li&gt;
&lt;li&gt;15% per-trade cap on buys&lt;/li&gt;
&lt;li&gt;2% minimum cash buffer&lt;/li&gt;
&lt;li&gt;31-day IRS wash-sale block on buys of recently sold-at-loss names&lt;/li&gt;
&lt;li&gt;Earnings veto (no buys within 7 days of next earnings)&lt;/li&gt;
&lt;li&gt;Daily drawdown halt (-5% intraday → no new buys)&lt;/li&gt;
&lt;li&gt;Total drawdown halt (-15% from inception → no new buys)&lt;/li&gt;
&lt;li&gt;Losing-streak halt (3 consecutive losing sells → no new buys)&lt;/li&gt;
&lt;li&gt;Universe whitelist (any non-whitelisted ticker is rejected)&lt;/li&gt;
&lt;/ul&gt;
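&lt;p&gt;For illustration, a pure-function gate covering four of the rules above might look like the sketch below. The thresholds are from the list; the &lt;code&gt;Order&lt;/code&gt; shape, the argument names, the two-outcome return value, and the choice to apply the cash-buffer check only to plans that contain buys are assumptions, and the wash-sale, earnings, drawdown, and losing-streak checks are left out.&lt;/p&gt;

```python
from dataclasses import dataclass

MAX_POSITION_PCT = 0.25  # post-trade cap per position
MAX_TRADE_PCT = 0.15     # cap on any single buy
MIN_CASH_PCT = 0.02      # cash buffer that must remain after buys


@dataclass(frozen=True)
class Order:
    ticker: str
    side: str        # "buy" or "sell"
    notional: float  # dollars


def evaluate(nav, cash, positions, whitelist, orders):
    """Return ("approved", []) or ("rejected", reasons). No I/O, no side effects."""
    reasons = []
    cash_after = cash
    book = dict(positions)  # ticker -> market value, copied so input is untouched
    for o in orders:
        if o.ticker not in whitelist:
            reasons.append(f"{o.ticker}: not in universe whitelist")
            continue
        if o.side == "buy":
            if o.notional > MAX_TRADE_PCT * nav:
                reasons.append(f"{o.ticker}: buy exceeds 15% of NAV")
            book[o.ticker] = book.get(o.ticker, 0.0) + o.notional
            cash_after -= o.notional
            if book[o.ticker] > MAX_POSITION_PCT * nav:
                reasons.append(f"{o.ticker}: position exceeds 25% of NAV")
        else:
            book[o.ticker] = book.get(o.ticker, 0.0) - o.notional
            cash_after += o.notional
    has_buys = any(o.side == "buy" for o in orders)
    # Assumption: sell-only plans pass the buffer check so trimming stays possible.
    if has_buys and MIN_CASH_PCT * nav > cash_after:
        reasons.append("cash buffer below 2% of NAV")
    return ("approved", []) if not reasons else ("rejected", reasons)
```

&lt;p&gt;Collecting every violated rule, instead of stopping at the first, makes the logged decision more useful when a plan is rejected.&lt;/p&gt;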
&lt;p&gt;The risk gate's outputs are logged in DynamoDB regardless of whether execution proceeds. If the plan is &lt;code&gt;approved&lt;/code&gt;, the executor Lambda picks it up via EventBridge and submits each order to Alpaca with a deterministic &lt;code&gt;client_order_id&lt;/code&gt; formed from the SHA-1 of &lt;code&gt;(plan_id, ticker, action)&lt;/code&gt;. The deterministic ID makes re-fires idempotent: if the executor crashes after submitting three of five orders, the next invocation tries to submit the same orders, hits a 422 collision on the three already-submitted, fetches their existing state, and proceeds with the remaining two.&lt;/p&gt;
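&lt;p&gt;The idempotency argument is easier to see in code. In this sketch the SHA-1 over &lt;code&gt;(plan_id, ticker, action)&lt;/code&gt; is from the post; the &lt;code&gt;|&lt;/code&gt; delimiter, the case normalization, and the in-memory &lt;code&gt;FakeBroker&lt;/code&gt; with its &lt;code&gt;DuplicateOrder&lt;/code&gt; exception are hypothetical stand-ins for the real broker client and its 422 response.&lt;/p&gt;

```python
import hashlib


def client_order_id(plan_id: str, ticker: str, action: str) -> str:
    """SHA-1 over (plan_id, ticker, action); same inputs always give the same ID."""
    key = "|".join((plan_id, ticker.upper(), action.lower()))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


class DuplicateOrder(Exception):
    """Stand-in for the rejection a broker returns on a reused client order ID."""


class FakeBroker:
    """Hypothetical in-memory broker used only to demonstrate the retry path."""

    def __init__(self):
        self.orders = {}

    def submit(self, coid, ticker, action):
        if coid in self.orders:
            raise DuplicateOrder(coid)
        self.orders[coid] = {"ticker": ticker, "action": action, "status": "accepted"}
        return self.orders[coid]

    def get(self, coid):
        return self.orders[coid]


def execute_plan(broker, plan_id, legs):
    """Submit every leg; on a duplicate, fetch the existing order and move on."""
    results = []
    for ticker, action in legs:
        coid = client_order_id(plan_id, ticker, action)
        try:
            results.append(broker.submit(coid, ticker, action))
        except DuplicateOrder:
            results.append(broker.get(coid))  # already submitted on an earlier run
    return results
```

&lt;p&gt;Running &lt;code&gt;execute_plan&lt;/code&gt; twice with the same plan leaves the broker with one order per leg, which is the property the crash-and-retry scenario depends on.&lt;/p&gt;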
&lt;p&gt;There's one nuance in the executor that took an incident to find. Alpaca's wash-trade protection rejects new orders on a symbol when an opposite-side order is already open (labeled "potential wash trade" in the rejection message). We discovered this when a sell got blocked because a stale stop-loss was still on the book. The fix is a preflight: before each submission, the executor lists open Alpaca orders for the symbol, cancels them, syncs the matching &lt;code&gt;stalker-orders&lt;/code&gt; rows to &lt;code&gt;cancelled&lt;/code&gt;, and then submits the new order. Belt and suspenders: Alpaca's enforcement still works, but we don't rely on it.&lt;/p&gt;
&lt;h3&gt;The Seeded Account&lt;/h3&gt;
&lt;p&gt;There's a subtle constraint baked into the architecture that took two iterations to get right: the bot is supposed to behave as if I'd actually deposited \$1,000 of real money, not as if it had access to Alpaca's \$100,000 paper-account default. Live trading caps the position size to what a \$1,000 retail account would actually do; paper testing should exercise the same sizing logic.&lt;/p&gt;
&lt;p&gt;The first cut of this used a simple &lt;code&gt;min(real_cash, NAV_CAP)&lt;/code&gt; cap on the cash field reported to the LLM and risk gate. That worked when the bot had no positions. Once positions accumulated, the cap silently broke the percentage math: a position with \$230 of market value plus a proposed \$40 buy is \$270 against the real \$2,500 NAV (a benign 10.8% of portfolio), but the cap reported the NAV as \$1,000, making the same position read as 27% — over the hard 25% cap. Three plans got rejected over a 36-hour window before the bug surfaced.&lt;/p&gt;
&lt;p&gt;The first fix tightened that: cap the cash, not the NAV. NAV becomes &lt;code&gt;capped_cash + sum(position market values)&lt;/code&gt;, which is a coherent number that reflects real portfolio percentages while preserving the small-account sizing exercise. That fixed the rejection bug.&lt;/p&gt;
&lt;p&gt;But it introduced a deeper problem: the cash kept reading \$1,000 every brief, replenished from Alpaca's bottomless paper-account seed. A real \$1,000 account doesn't work that way. A real account spends \$200 to buy a position; cash goes to \$800. The bot was effectively redeploying \$1,000 of fresh capital every brief day. Over four weeks the cumulative buys totaled \$2,123 against \$557 of sells — 2.1× the intended seed. A real account would have hit a cash wall after about ten buys.&lt;/p&gt;
&lt;p&gt;The honest fix is the second iteration: cash flows from the order ledger. The seeded cash at any moment is &lt;code&gt;$1,000 + sum(filled_sells) − sum(filled_buys)&lt;/code&gt;. The function scans &lt;code&gt;stalker-orders&lt;/code&gt; for filled and partially-filled rows, sums the &lt;code&gt;filled_qty * filled_avg_price&lt;/code&gt; per side, and returns the seeded cash. NAV is &lt;code&gt;seeded_cash + positions_mv&lt;/code&gt;. Buying power is &lt;code&gt;max(0, seeded_cash)&lt;/code&gt; — no margin in the seeded model. When seeded cash goes negative (the bot is over-deployed relative to the seed), the LLM sees the negative number with an inline note instructing trim-first behavior, and the risk gate's existing 2% cash-buffer check naturally enforces rebalance-only mode until sells refill the seed.&lt;/p&gt;
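&lt;p&gt;The ledger arithmetic is small enough to sketch directly. The formulas and the &lt;code&gt;filled_qty&lt;/code&gt; and &lt;code&gt;filled_avg_price&lt;/code&gt; column names are from the post; the &lt;code&gt;side&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt; keys, the status strings, and the function names are assumptions about how the &lt;code&gt;stalker-orders&lt;/code&gt; rows are shaped.&lt;/p&gt;

```python
SEED = 1000.0  # synthetic deposit from the post


def seeded_cash(order_rows, seed=SEED):
    """seed + sum(filled sells) - sum(filled buys), scanning ledger rows."""
    cash = seed
    for row in order_rows:
        if row["status"] not in ("filled", "partially_filled"):
            continue  # open, cancelled, and rejected rows never moved cash
        notional = row["filled_qty"] * row["filled_avg_price"]
        cash += notional if row["side"] == "sell" else -notional
    return cash


def account_state(order_rows, positions_mv, seed=SEED):
    """State reported to the LLM and the risk gate under the seeded model."""
    cash = seeded_cash(order_rows, seed)
    return {
        "cash": cash,
        "nav": cash + positions_mv,
        "buying_power": max(0.0, cash),  # no margin in the seeded model
    }
```

&lt;p&gt;Plugging in the totals quoted earlier (\$2,123 of buys against \$557 of sells) gives seeded cash of -\$566 and zero buying power, so the over-deployment shows up as a negative number instead of being silently refilled.&lt;/p&gt;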
&lt;p&gt;After the second fix, the bot's reported state is &lt;code&gt;cash = -$566, nav = $993, buying_power = $0&lt;/code&gt;. It will spend the next several brief cycles trimming positions back into the seed before it can buy again. That's exactly how a real \$1,000 account would behave coming off a 4-week over-deployment streak. Self-healing — no manual reset, no position rebalancing required.&lt;/p&gt;
&lt;p&gt;The lesson here generalizes beyond Stalker: when a paper-test wrapper diverges from the live behavior it's supposed to mirror, the divergence accumulates silently. The only protection is to define the wrapper's semantics carefully and test the boundary conditions. "What happens when positions exceed the cap" was the question I should have asked at design time.&lt;/p&gt;
&lt;h3&gt;Backtesting Without Lying To Yourself&lt;/h3&gt;
&lt;p&gt;Backtesting a strategy against historical data is the easiest way to fool yourself in finance. The classic failure mode is &lt;strong&gt;survivorship bias&lt;/strong&gt;: you backtest against today's universe of public companies, applied retroactively. The names that delisted, got acquired, or went to zero aren't in your test set, because they're not in today's universe. Your universe is by construction a sample of survivors. You're testing how well the strategy would have done on the names that turned out to be successful — which is not the same as testing how well it would have done in real time.&lt;/p&gt;
&lt;p&gt;Stalker's backtest engine handles this through a point-in-time (PIT) universe archive. The bootstrap process queries FMP's &lt;code&gt;/delisted-companies&lt;/code&gt; endpoint to recover the names that left the public market, then queries &lt;code&gt;/historical-market-cap&lt;/code&gt; per ticker to determine the cap band membership at each historical date. The result is a JSON archive at &lt;code&gt;~/.cache/stalker-backtest/pit_universe/&amp;lt;window&amp;gt;.json&lt;/code&gt; that answers the question "which tickers were in the \$2B–\$10B mid-cap band on date X?" for any X in the bootstrap window. The current archive covers 2023-01-01 → 2026-04-30 and contains roughly 2,000 tickers ever in band, of which several hundred have a &lt;code&gt;delisted_date&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Every backtest run can take a &lt;code&gt;--pit&lt;/code&gt; flag. With it on, the rebalance candidate set at each weekly rebalance date is filtered to symbols whose market cap was actually in the \$2B–\$10B band on that date. With it off, the rebalance uses today's universe applied retroactively — the survivorship-biased path, kept around for legacy comparability.&lt;/p&gt;
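&lt;p&gt;A rough sketch of the point-in-time filter, assuming a particular archive layout. The band limits and the &lt;code&gt;delisted_date&lt;/code&gt; field are from the post; the &lt;code&gt;caps&lt;/code&gt; key holding date-sorted &lt;code&gt;(iso_date, market_cap)&lt;/code&gt; pairs, the "latest value on or before the date" lookup, and the function names are hypothetical and may not match the real archive schema.&lt;/p&gt;

```python
from bisect import bisect_right

BAND_LOW, BAND_HIGH = 2e9, 10e9  # the 2B to 10B USD mid-cap band


def cap_on(history, as_of):
    """Latest known market cap on or before as_of, or None if there is none.

    history is a list of (iso_date, market_cap) pairs sorted by date.
    ISO dates compare correctly as plain strings.
    """
    dates = [d for d, _ in history]
    i = bisect_right(dates, as_of)
    return history[i - 1][1] if i else None


def pit_candidates(archive, as_of):
    """Tickers still listed on as_of whose cap sat inside the band that day."""
    out = []
    for ticker, rec in archive.items():
        delisted = rec.get("delisted_date")
        if delisted is not None and as_of >= delisted:
            continue  # already gone from the market on this date
        cap = cap_on(rec["caps"], as_of)
        if cap is not None and BAND_HIGH >= cap >= BAND_LOW:
            out.append(ticker)
    return sorted(out)
```

&lt;p&gt;The important behavior is that a name that later delisted or outgrew the band still appears in the candidate set for the dates when it genuinely qualified.&lt;/p&gt;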
&lt;p&gt;The first thing the PIT archive did was discredit a previous result. ADR-009 had bumped the production momentum weight from a balanced 0.40 / 0.30 / 0.30 to 0.55 / 0.225 / 0.225 based on a non-PIT backtest showing +24pp alpha over a 1-year window. When I re-ran the same configuration with PIT, alpha collapsed to -12pp. The original number was a survivorship-bias artifact. The momentum tilt looked dominant because the survivors were the names with strongest momentum — by definition.&lt;/p&gt;
&lt;p&gt;ADR-010 superseded ADR-009 with the bias-corrected result: at 0.55 momentum on PIT, the strategy beats SPY by 21.6pp over the 3-year window. Better than equal-weight but a fraction of what the survivorship-biased number suggested. Honest accounting hurts.&lt;/p&gt;
&lt;p&gt;The PIT archive also enabled ADR-011's quality-factor switch. The original quality definition (ROE plus gross margin) was producing essentially zero contribution in the bias-corrected backtest. The natural diagnostic question was whether quality is a noisy factor at our universe size, or whether ROE-plus-GM is the wrong measurement of quality. Adding a &lt;code&gt;quality_definition&lt;/code&gt; knob to the engine and sweeping seven definitions pointed to the second answer: switching to ROIC-plus-operating-margin lifted alpha by 31 percentage points at production weights and improved Sharpe from 1.23 to 1.44. ROE rewards leverage (which varies enormously across mid-cap capital structures), and gross margin is industry-structural (SaaS at 80%, distribution at 8%), which sector-neutral z-scoring only partially undoes. ROIC is leverage-neutral and operating margin captures pricing power within sector, so both are tighter signals at our universe size.&lt;/p&gt;
&lt;p&gt;The PIT archive turns "backtest" from a marketing exercise into a real measurement. It's not perfect — historical FMP fundamental data has its own gaps and revisions — but it removes the dominant bias.&lt;/p&gt;
&lt;h3&gt;The Meta-Experiment&lt;/h3&gt;
&lt;p&gt;The factor stack has been validated end-to-end on bias-corrected backtests. The risk gate is a pure-Python module with full test coverage. The executor is mechanical. What hasn't been validated, in any rigorous way, is the LLM layer in the middle. The brief-driven analyze step might be adding alpha by combining macro context, position-aware reasoning, and human-style synthesis the factors can't see. Or it might be a wash. Or it might be subtracting alpha by overriding good factor picks with brief-narrative picks that don't survive in the data.&lt;/p&gt;
&lt;p&gt;The cost of getting this wrong in either direction is asymmetric. A non-additive LLM layer costs roughly \$200–700 a year in API plus an ongoing complexity tax — debugging an LLM-mediated trade path is harder than debugging a deterministic one. An additive LLM layer foregone is real alpha left on the table. Either way, the answer should come from data, not intuition.&lt;/p&gt;
&lt;p&gt;ADR-013 is the pre-registered A/B test that mechanizes the question. Two arms, identical except for the selection signal: the brief arm (current Stalker, with Claude reading the brief and picking 6–10 names) versus a factors-only arm (top-N by &lt;code&gt;combined_z&lt;/code&gt;, equal-weighted, same risk gates). The factors-only arm runs as a daily shadow book: a separate DynamoDB table tracking what the deterministic strategy would have done at each weekday close. The pairwise comparison is a paired t-test on daily return differences, pooled over the elapsed window.&lt;/p&gt;
&lt;p&gt;The pre-registration document locks the methodology before any data is examined. Hypotheses are pinned. The test statistic is &lt;code&gt;mean(d) / (sd(d) / sqrt(N))&lt;/code&gt; where &lt;code&gt;d&lt;/code&gt; is the daily return difference. The threshold is 30 basis points per month of alpha at p &amp;lt; 0.05. The decision rule is mechanical:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Brief arm wins by &amp;gt;30bp/mo at p&amp;lt;0.05 → keep&lt;/li&gt;
&lt;li&gt;Brief arm loses by &amp;gt;30bp/mo at p&amp;lt;0.05 → simplify (retire the LLM)&lt;/li&gt;
&lt;li&gt;Inconclusive → simplify (default; the burden of evidence is on the LLM layer to justify itself)&lt;/li&gt;
&lt;/ul&gt;
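&lt;p&gt;As a worked illustration, the statistic and the decision rule fit in a few lines. The test statistic, the 30bp/month threshold, and the 0.05 level are from the pre-registration as described; the 21-trading-day month used to convert daily differences, the normal approximation to the t distribution for the p-value, and the function names are assumptions of this sketch.&lt;/p&gt;

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

THRESHOLD_BP_PER_MONTH = 30.0
ALPHA = 0.05
TRADING_DAYS_PER_MONTH = 21  # assumption for converting daily to monthly


def paired_test(brief_returns, shadow_returns):
    """Return (t_stat, two_sided_p, monthly_diff_bp) for paired daily returns."""
    if len(brief_returns) != len(shadow_returns):
        raise ValueError("the two return series must be the same length")
    d = [b - s for b, s in zip(brief_returns, shadow_returns)]
    n = len(d)
    t_stat = mean(d) / (stdev(d) / sqrt(n))  # mean(d) / (sd(d) / sqrt(N))
    # Large-sample normal approximation to the t distribution (assumption).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    monthly_bp = mean(d) * TRADING_DAYS_PER_MONTH * 10_000
    return t_stat, p, monthly_bp


def decide(p, monthly_bp):
    """Only a significant win above the threshold keeps the LLM layer."""
    if ALPHA > p and monthly_bp > THRESHOLD_BP_PER_MONTH:
        return "keep"
    return "simplify"  # covers both a significant loss and an inconclusive result
```

&lt;p&gt;Note that the rule has only two outcomes even though the list has three rows: a loss and an inconclusive result both resolve to "simplify".&lt;/p&gt;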
&lt;p&gt;The horizon is 12 months from inception (2026-05-04 to 2027-05-04). At horizon end, the test runs once on the pooled paired returns, the decision rule is applied, and the ADR's status updates from &lt;code&gt;proposed&lt;/code&gt; to one of &lt;code&gt;accepted (kept)&lt;/code&gt;, &lt;code&gt;accepted (simplified)&lt;/code&gt;, or — if the data supports a conditional-fire hybrid — &lt;code&gt;accepted (hybrid)&lt;/code&gt; with a follow-up ADR scoping the hybrid design.&lt;/p&gt;
&lt;p&gt;Mid-flight changes invalidate the pre-registration. If the factor weights change, the test restarts. If the prompt changes, the test restarts. The whole point of pre-registration is that it converts a tempting post-hoc optimization into a principled experiment, with a documented decision rule that doesn't move once data starts coming in.&lt;/p&gt;
&lt;p&gt;The shadow book runs on a daily 16:35 CT cron that ticks the factors-only portfolio through one rebalance, marks positions to today's close, and writes a row to &lt;code&gt;stalker-shadow-performance&lt;/code&gt;. A weekly Monday-morning cron joins the live and shadow performance series, computes the running paired-diff statistic, and posts a status comment to the project's tracking ticket. For the first 12 months the weekly job is informational only: the running stats do not drive any decision, and the keep-versus-simplify rule fires once at horizon close, not every week.&lt;/p&gt;
&lt;p&gt;There's a sub-experiment I ran during the pre-registration drafting that's worth noting because it killed a tempting alternative. ADR-013's locked baseline is equal-weight factors-only, but the live system uses Kelly-derived suggestions to bias Claude's sizing. A reasonable concern was: if Kelly-as-binding-sizer (using the &lt;code&gt;combined_z&lt;/code&gt; to set position weights directly, not just suggest them to the LLM) beats equal-weight by a lot in backtest, then the equal-weight baseline is a weak counterfactual and the brief arm is being given an unfair advantage. ADR-014 ran the offline comparison: at top_n=30, Kelly-as-binding-sizer appeared to beat equal-weight by +47.75pp alpha. Striking number. I almost wrote it up as a win.&lt;/p&gt;
&lt;p&gt;The result didn't survive sensitivity checking. At top_n=10 (where Kelly's per-name weights actually fit within the cash budget), Kelly underperformed equal-weight by 43pp. The +47.75pp at top_n=30 was an implementation artifact: with the unnormalized Kelly weights summing to over 100% of NAV, the engine's per-buy &lt;code&gt;cost = min(delta, cash)&lt;/code&gt; clamp was starving the lower-ranked names and concentrating capital in the highest-z names. Kelly was acting as both selector and sizer at top_n=30 by exhausting cash on the top names. The apparent edge was concentration, not better relative weighting. ADR-014 was rejected: Kelly stays as advisory input to Claude rather than a binding sizer, and the equal-weight baseline for ADR-013 is defensible.&lt;/p&gt;
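&lt;p&gt;The clamp artifact is easy to reproduce in a toy allocator. Only the &lt;code&gt;cost = min(delta, cash)&lt;/code&gt; line reflects the post; the function name, the rank-ordered buy loop, and the made-up target weights in the usage note are assumptions chosen to show the effect.&lt;/p&gt;

```python
def allocate(nav, targets):
    """Buy in rank order, clamping each buy to the cash that is left.

    targets is a list of (ticker, target_weight) pairs sorted best-first.
    Returns ticker -> dollars actually deployed.
    """
    cash = nav
    fills = {}
    for ticker, weight in targets:
        delta = weight * nav
        cost = min(delta, cash)  # the clamp described in the post
        if cost > 0:
            fills[ticker] = cost
        cash -= cost
    return fills


# Hypothetical unnormalized weights: 30 names at 25% each, 750% of NAV in total.
targets = [(f"T{i:02d}", 0.25) for i in range(1, 31)]
fills = allocate(1000.0, targets)
```

&lt;p&gt;With these illustrative weights only the first four names receive capital and the other twenty-six get nothing, so a nominal top-30 book behaves like a concentrated top-4 book. Normalizing the weights to sum to at most 100% before allocating would remove the hidden selection effect.&lt;/p&gt;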
&lt;p&gt;This kind of pre-flight check is the discipline pre-registration enables. The temptation to ship a +47pp number is real. The discipline of asking "but does it survive the obvious sensitivity check?" is what separates a research finding from a marketing claim.&lt;/p&gt;
&lt;h3&gt;The Data Extraction Layer&lt;/h3&gt;
&lt;p&gt;Most of what makes the data network valuable is the upstream work — the work of getting the data into a normalized, schema-versioned form that Stalker can read. Each feeder has its own extraction story. Headwater reads HTML email digests and parses them into structured records. Estuary tracks individual writer feeds and computes consensus. PrivateEye decodes paywalled teasers into ticker picks, which is its own can of worms. Tributary subscribes to SEC EDGAR's filing stream and runs a structured extraction over the 8-K text. Goldfinch hits USAspending.gov's API and runs the recipient-to-ticker mapping.&lt;/p&gt;
&lt;p&gt;The orchestration for the heavier extraction work runs on a Bosgame M5 mini PC in my basement — the same machine that handles DirtScout's tax-list PDF extraction. It's a Ryzen-class system with 128GB of RAM and decent local inference horsepower for the structured-extraction passes that don't need frontier-model quality. The split between cloud and on-prem is roughly: cloud handles the trade-path quality-sensitive work (Claude Sonnet 4.6 for analysis), and on-prem handles the batch extraction work (PDF parsing, structured field extraction, classification at volume). Same pattern I described in &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;the economics of owning your own inference&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Cron jobs on the Bosgame run the weekly gradient sweep that probes the factor weight space, the daily shadow-book tick for the ADR-013 A/B, and the periodic ADR validation runs that check whether shipped strategy changes are tracking their predicted alpha. Each script is wrapped in a &lt;code&gt;cron_wrapper.sh&lt;/code&gt; that does a &lt;code&gt;git pull --ff-only&lt;/code&gt; before exec, so changes pushed from my laptop propagate to the on-prem cron without manual SSH. The bash wrapper is forty lines long and has saved me hours.&lt;/p&gt;
&lt;p&gt;The point of the on-prem layer is operational independence. The cloud Lambdas are for the trading path, so they need to be reliable, fast, and well-observable. The on-prem cron is for the background research path: it can take twenty minutes to run a backtest sweep, and that's fine. Putting the long-running work on Lambda would burn timeout budget and money for no benefit. Putting the trading work on the on-prem machine would bind the strategy's uptime to my home internet. Splitting the work this way is the cleanest option operationally.&lt;/p&gt;
&lt;h3&gt;What This Is Actually For&lt;/h3&gt;
&lt;p&gt;Stalker is paper-trading. It hasn't moved a dollar of real money. The Alpaca account is paper, the brokerage credentials are paper-mode, and the seeded \$1,000 is synthetic. The point isn't to make money in the next four weeks. The point is to validate the architecture under realistic conditions before any real money is at stake.&lt;/p&gt;
&lt;p&gt;The criteria I want to satisfy before live cutover are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The strategy beats SPY meaningfully on PIT-corrected backtest (currently +21.6% over 3 years at production weights — done).&lt;/li&gt;
&lt;li&gt;The factor definitions are documented in ADRs and the rationale is reproducible (done).&lt;/li&gt;
&lt;li&gt;The risk gate is unit-tested with full coverage of every guardrail (done).&lt;/li&gt;
&lt;li&gt;The executor handles real-world edge cases like opposite-side wash-trade rejection and partial fills (done).&lt;/li&gt;
&lt;li&gt;The seeded-account model is validated end-to-end including the over-deployment failure mode (done).&lt;/li&gt;
&lt;li&gt;Six months of paper-trading without operational incidents — no missed briefs, no failed analyses, no risk-gate false positives, no executor failures that get past idempotency.&lt;/li&gt;
&lt;li&gt;The brief-versus-factors-only A/B has run to horizon and the LLM layer is justified (12 months — in progress).&lt;/li&gt;
&lt;li&gt;The bot's behavior under drawdown halts has been observed in practice (waiting for a real correction).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's a long list. The 6-month operational checkpoint and the 12-month ADR-013 close-out are the slow parts. The other items are mostly done.&lt;/p&gt;
&lt;p&gt;What this isn't: a money-printing scheme. The factor stack has documented edge in academic literature and bias-corrected backtest, but mid-cap factor strategies are well-known and don't have huge mispricing margins. The expected outcome at the upper end is something like SPY +5–15pp annualized at modestly higher volatility. That's a decent risk-adjusted return, not a free lunch. If the strategy underperforms in real time, the answer is to deconstruct what changed — regime shift, factor decay, prompt drift, brief-stream change — not to tweak parameters until the curve looks right.&lt;/p&gt;
&lt;p&gt;What this is: an exercise in building the smallest deployable instance of a real-money trading system, then validating its components individually before scaling. The \$1,000 seed is the smallest amount that exercises live-account semantics (rounding, fractional shares, cash management) without trivial scaling. The mid-cap focus is the band where factor edges historically existed. The pre-registered A/B is the methodology that converts "I think the LLM is doing useful work" into "the data either supports keeping the LLM or it doesn't."&lt;/p&gt;
&lt;p&gt;The other five projects in the data network exist for the same reason. Each one is its own small thing that does one job well, with a published schema, with structured output, with a clean handoff to whoever's downstream. Headwater could feed any number of consumers; Stalker is just one. Tributary's 8-K events could drive any number of overlays; Stalker happens to read them as risk filters. The architecture is a sequence of clean producer-consumer interfaces with no shared state, no implicit dependencies, no monolith. Adding a seventh project tomorrow would take the same four steps: schema, ingest path, loader, prompt mention. Same pattern, different signal.&lt;/p&gt;
&lt;h3&gt;What I'd Do Differently&lt;/h3&gt;
&lt;p&gt;A few things, in retrospect.&lt;/p&gt;
&lt;p&gt;I'd start with the PIT archive sooner. ADR-009 shipped a non-PIT backtest result that turned out to be largely a survivorship-bias artifact. The corrected number from ADR-010 is still a real edge, but it's a fraction of what the original number suggested. If I'd built the PIT bootstrap as part of the initial backtest infrastructure rather than retrofitting it, I would have shipped fewer ADRs that needed superseding. The cost of the PIT bootstrap is real — the FMP &lt;code&gt;/delisted-companies&lt;/code&gt; endpoint takes a few minutes to walk and the per-ticker historical-cap queries take an hour the first time — but the cost is one-time, and the cost of being wrong about a strategy parameter is worse.&lt;/p&gt;
&lt;p&gt;I'd pre-register the LLM-versus-factors test earlier. ADR-013's pre-registration locks methodology before data examination. I drafted it after the live system had been running for three weeks, which means the locked-baseline decision was already informed by the running paper performance. That's not strictly invalidating, because the locked threshold (30bp/mo at p&amp;lt;0.05) doesn't move with data I had already seen, but a properly clean pre-registration runs before any live data is observed. The lesson is to instrument the meta-experiment before instrumenting the experiment.&lt;/p&gt;
&lt;p&gt;I'd separate the risk constants from the strategy constants more cleanly. Things like &lt;code&gt;MAX_POSITION_PCT&lt;/code&gt; and &lt;code&gt;WASH_SALE_DAYS&lt;/code&gt; live in the same module as factor weights, which conflates policy-decision parameters (where changes are sensitive and should be ADR-driven) with implementation parameters (where changes are routine). The current structure works, but a clean separation would make the policy boundary more obvious.&lt;/p&gt;
&lt;p&gt;I wouldn't change the producer-consumer pattern. That's the most reusable architectural decision in the stack. Each feeder being its own project with its own schema and its own SES inbound endpoint means I can add or remove sources without touching the consumer logic. Stalker reads from each loader independently and degrades gracefully if any loader returns empty — DDB throttle, S3 hiccup, brief stream paused, whatever. The system stays operational on partial signal. That property has been worth every minute of the architecture-discipline cost.&lt;/p&gt;
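&lt;p&gt;A minimal sketch of that degrade-gracefully property, assuming each loader is a plain callable that returns a list of records. The loader names mirror the feeder projects, but the bodies and the record shape are hypothetical.&lt;/p&gt;

```python
def load_all(loaders):
    """Call each loader independently; one failure or empty result never blocks the rest."""
    signals = {}
    for name, loader in loaders.items():
        try:
            records = loader()
        except Exception as exc:  # e.g. a throttle or a storage hiccup
            print(f"{name}: loader failed ({exc}); continuing without it")
            records = []
        signals[name] = records or []
    return signals


def headwater_loader():
    return [{"ticker": "ABC", "stance": "bullish"}]  # hypothetical record


def tributary_loader():
    raise TimeoutError("simulated throttle")


def goldfinch_loader():
    return []  # stream paused


signals = load_all({
    "headwater": headwater_loader,
    "tributary": tributary_loader,
    "goldfinch": goldfinch_loader,
})
available = [name for name, records in signals.items() if records]
print(available)
```

&lt;p&gt;The consumer always gets a dictionary with every source present, so downstream code only has to handle "empty", never "missing" or "raised".&lt;/p&gt;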
&lt;h3&gt;Cross-Project Notes&lt;/h3&gt;
&lt;p&gt;If you've read &lt;a href="https://tinycomputers.io/posts/building-dirtscout-a-land-acquisition-platform-with-claude-code.html"&gt;the DirtScout post&lt;/a&gt;, some of the patterns here will look familiar. CDK in Python for infrastructure-as-code. Python Lambdas with deterministic IDs for idempotency. DynamoDB instead of a relational database. SES inbound for the producer-mail pattern. Static Next.js export on CloudFront for the human-facing dashboard. The same architectural style that worked for a land-acquisition platform works for a trading system, because both are read-heavy event-driven workloads with bursty inbound and structured persistence.&lt;/p&gt;
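&lt;p&gt;The deterministic-ID idea can be sketched in a few lines. This is an illustration rather than the code from either project: the field names are hypothetical, and an in-memory dictionary stands in for the table. In DynamoDB the same effect would usually come from a conditional write on the key.&lt;/p&gt;

```python
import hashlib
import json


def deterministic_id(event):
    """Derive a stable ID from the fields that define the event's identity."""
    # Sorting keys makes the serialization independent of dict ordering.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


store = {}


def put_if_absent(event):
    """Write the event once; a retried delivery maps to the same key and is skipped."""
    key = deterministic_id(event)
    if key in store:
        return False
    store[key] = event
    return True


order = {"ticker": "ABC", "side": "buy", "trade_date": "2026-05-08"}
first = put_if_absent(order)
# Same content delivered again, with the keys in a different order.
retry = put_if_absent(dict(reversed(list(order.items()))))
print(first, retry, len(store))
```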
&lt;p&gt;The differences are at the edges. DirtScout deals with parcels — slow-moving, geographically-bound, low-cardinality. Stalker deals with tickers — fast-moving, market-state-dependent, high-correlation. DirtScout's risk model is "did we accidentally surface a parcel that's not for sale." Stalker's risk model is "did we put 30% of NAV into one ticker right before its earnings miss." The shape of the failure modes determines the shape of the safeguards.&lt;/p&gt;
&lt;p&gt;The other shared piece is the &lt;a href="https://tinycomputers.io/posts/vibecoding-the-controversial-art-of-letting-ai-write-your-code-friend-or-foe.html"&gt;vibecoding&lt;/a&gt; approach to the codebase itself. I direct the architecture, make the load-bearing decisions, and review the diffs. The actual lines of code mostly come from conversations. Stalker is around 8,500 lines of Python plus 800 lines of CDK plus 3,400 lines of TypeScript (the dashboard). I wrote almost none of that by hand. I directed all of it.&lt;/p&gt;
&lt;p&gt;That's a real distinction. "Directing" means owning the architecture, the policy decisions, the risk constants, the methodology for evaluation, the criteria for live cutover. It means saying "no, this isn't how we should structure that" or "actually let's pull this up to its own ADR before we ship it." It's design and review, not typing. The typing is a commodity. The design isn't.&lt;/p&gt;
&lt;h3&gt;The Forward Path&lt;/h3&gt;
&lt;p&gt;The 12-month ADR-013 horizon ends 2027-05-04. Between now and then, the system runs on its own. The shadow book accumulates daily. The weekly stats post lands in the project tracker every Monday. If the strategy hits a real drawdown, the halt logic engages and I find out whether the halt criteria are calibrated correctly. If a brief stream goes down for a day, the bot still has factor signal to fall back on. If a feeder schema changes, the lenient Pydantic models tolerate the addition until I update the consumer.&lt;/p&gt;
&lt;p&gt;At horizon, the close-out runs the t-test, applies the decision rule, and updates ADR-013's status. If the brief arm wins by the pre-registered margin, the LLM layer keeps its place in the architecture. If it loses or is inconclusive, I retire the LLM analysis path: &lt;code&gt;analyze_handler.py&lt;/code&gt; becomes a thin "select top-N by combined_z" function and the brief stream becomes pure observability rather than the trading driver. Both outcomes are structurally fine; the point is the choice is made by data.&lt;/p&gt;
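&lt;p&gt;For a sense of how thin that fallback would be, here is a sketch of a top-N selector. The function name, the input shape, and the equal weighting (chosen to match the ADR-013 baseline) are assumptions, not the actual handler.&lt;/p&gt;

```python
def select_top_n(scores, top_n):
    """Return the top_n tickers by combined_z, highest first, equally weighted."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    picks = [ticker for ticker, _ in ranked[:top_n]]
    weight = 1.0 / len(picks) if picks else 0.0
    return {ticker: weight for ticker in picks}


# Hypothetical combined_z scores.
scores = {"AAA": 1.8, "BBB": -0.3, "CCC": 0.9, "DDD": 2.4}
portfolio = select_top_n(scores, top_n=2)
print(portfolio)
```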
&lt;p&gt;The five upstream projects keep running regardless. Headwater publishes its briefs. Estuary computes its consensus clusters. PrivateEye decodes its teasers. Tributary classifies its 8-Ks. Goldfinch matches its contracts to tickers. Each project is independently valuable; each one feeds Stalker; none depends on Stalker for its purpose.&lt;/p&gt;
&lt;p&gt;The five-feeders-and-a-trader architecture is the part I'm most certain about. The factor stack might need to evolve. The LLM layer might get retired at horizon. The risk constants might need adjustment under live conditions. But the producer-consumer pattern, the per-project SES inbound endpoint, the lenient Pydantic schemas, the deterministic IDs, the bias-corrected backtest discipline, the pre-registered A/B for the load-bearing architectural choice — those are durable. They're the parts I'd rebuild the same way if I started over.&lt;/p&gt;
&lt;p&gt;Three years ago I built a Slack bot that generated stock reports. It told you what one stock looked like.&lt;/p&gt;
&lt;p&gt;Stalker tells you what to do about it. It runs on its own. It's written down what it expects to see and how it'll know whether it was right. And it's surrounded by five other projects that do the work of making structured data available in the first place — because the trading layer is the smallest part of the system, and the data layer is where most of the leverage lives.&lt;/p&gt;
&lt;p&gt;The work continues.&lt;/p&gt;</description><category>ai</category><category>alpaca</category><category>aws</category><category>backtest</category><category>claude code</category><category>dynamodb</category><category>estuary</category><category>factor investing</category><category>finance</category><category>goldfinch</category><category>headwater</category><category>lambda</category><category>mid-cap</category><category>paper trading</category><category>privateeye</category><category>python</category><category>stalker</category><category>trading</category><category>tributary</category><guid>https://tinycomputers.io/posts/building-stalker-a-mid-cap-trading-bot-and-the-data-network-that-feeds-it.html</guid><pubDate>Fri, 08 May 2026 18:00:00 GMT</pubDate></item><item><title>What Terra Populus Taught Me About Cancelling Quiver</title><link>https://tinycomputers.io/posts/what-terra-populus-taught-me-about-cancelling-quiver.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/what-terra-populus-taught-me-about-cancelling-quiver_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;35 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;I joined Terra Populus in 2012 as a senior engineer. It was a project at the Minnesota Population Center, now part of the Institute for Social Research and Data Innovation at the University of Minnesota, and it was funded by an NSF cooperative agreement totaling about \$8 million across the life of the project. The output was free to the research community. The work that produced it was not.&lt;/p&gt;
&lt;p&gt;In 2013 the original leadership of the project left and I started inheriting the work. Nobody hands you the lead role on a federally-funded data integration project the way you'd hand someone a clean repo and a Friday lunch. You inherit it the way you inherit a house with the heating system halfway through a retrofit — there's a budget, there's a deadline, there's a researcher in Massachusetts who needs a specific harmonized variable to ship a paper, and there's a stack of NSF reporting requirements that don't pause while you figure out what's going on. By 2014 the role was formalized and I led the engineering through to the project's wind-down in late 2016 and early 2017.&lt;/p&gt;
&lt;p&gt;I've been thinking about that experience for the last week because I'm cancelling a Quiver Quantitative subscription. The Quiver tier I'm on costs somewhere in the \$30 to \$70 per month range and it's the one with the alternative-data feeds I needed for a recent research project. The project is done. The subscription renews in a few weeks. I have a decision to make.&lt;/p&gt;
&lt;p&gt;What surprised me, sitting with the decision, is how much my comfort with making it draws on the years I spent building the academic version of what Quiver does. The two services look nothing alike on the surface. Underneath, they are doing exactly the same shape of work. And the question of who pays for that work — and how — is more interesting than any cost-per-month arithmetic.&lt;/p&gt;
&lt;h3&gt;What Terra Populus Actually Did&lt;/h3&gt;
&lt;p&gt;Terra Populus integrated three things that don't natively join: population microdata (people, households, decade-by-decade), environmental data (climate, land cover, atmospheric measurements), and land-use data (parcels, agricultural designations, administrative boundaries). The pitch to NSF was that researchers studying the relationship between human populations and the natural environment shouldn't have to spend two years writing crosswalks before they can ask their question.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ipums-terra-landing.png" alt="Screenshot of the IPUMS Terra landing page. Header reads 'What is IPUMS Terra?' with the description 'IPUMS Terra integrates the world's population and environmental data including: Population censuses and surveys; Land cover data classified from satellite imagery; Temperature, precipitation, and related climate data; Land use data derived from censuses and surveys in combination with remotely sensed data.' A central diagram shows three interconnected nodes labeled MICRODATA, RASTER DATA, and AREA-LEVEL DATA, linked by arrows. To the right are three output cards: Microdata Output (characteristics of individual people with attached contextual variables derived from area-level and/or raster data), Area-level Output (characteristics of geographic units including aggregate population data and/or summaries from raster data), and Raster Data Output (data in spatial grids potentially derived from area-level data). A left sidebar lists Available Datasets (Microdata, Area-level, Raster) and Tutorials." style="max-width: 100%; border: 1px solid #ddd; border-radius: 8px;"&gt;&lt;/p&gt;
&lt;p&gt;The user-facing version of the project shipped under the IPUMS Terra brand and ran inside the existing IPUMS data-extraction infrastructure. The screenshot above is what a researcher saw when they came to ask the question we were trying to make askable: pick a population dataset, pick an environmental or land-use dataset, choose the join structure that fits your analysis (microdata, area-level, or raster), and pull a custom extract. The three little nodes in the middle of that diagram look simple. Most of the engineering effort in the project was in making them simple.&lt;/p&gt;
&lt;p&gt;The hardest specific problem we ran into, over and over, was geographic boundary harmonization across census decades. Census tracts redraw every ten years. A neighborhood that was one tract in 1990 might be three tracts in 2000 and two tracts in 2010 with completely different shapes. A researcher who wants to study, say, neighborhood-level income trajectories from 1980 to 2010 cannot just join the 1980 file to the 2010 file on tract ID. The tract IDs don't refer to the same places. The boundaries shift. Pieces split. Pieces merge. Pieces get absorbed into adjacent tracts when populations decline. The whole map breathes between decades.&lt;/p&gt;
&lt;p&gt;The honest answer to "what does the same place look like across these boundary changes" is that there is no single right answer. There are defensible answers that depend on what you're studying. If you're tracking land-use change you probably want fractional area weighting: if 60% of a 1990 tract's area falls inside a 2010 tract, that 2010 tract receives sixty percent of the 1990 measurement. If you're tracking population movement you want something different, because populations don't distribute uniformly across area. If you're tracking voting patterns you want yet another thing, because voting precincts don't align with tracts at all.&lt;/p&gt;
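&lt;p&gt;Fractional area weighting reduces to a weighted sum once the overlap fractions are known. The sketch below assumes those fractions have already been computed by a GIS overlay; the tract IDs and values are invented for illustration and this is not the Terra Populus implementation.&lt;/p&gt;

```python
def areal_weighted_estimate(overlaps, source_values):
    """Estimate a target-zone total from source zones using area fractions.

    overlaps maps a source zone id to the fraction of that source zone's
    area that falls inside the target zone.
    """
    return sum(source_values[zone] * fraction for zone, fraction in overlaps.items())


# Hypothetical 1990 tract measurements (say, hectares of cropland).
values_1990 = {"tract_A": 100.0, "tract_B": 40.0}
# 60% of tract_A and 25% of tract_B lie inside one 2010 tract.
overlap_with_2010_tract = {"tract_A": 0.60, "tract_B": 0.25}

estimate = areal_weighted_estimate(overlap_with_2010_tract, values_1990)
print(estimate)
```

&lt;p&gt;The uniform-density assumption is baked into the multiplication, which is exactly why the same code is a poor fit for population counts.&lt;/p&gt;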
&lt;p&gt;We made decisions. We documented them. We defended them in deliverable reports to NSF every year. We versioned them, because once a researcher cited a specific harmonized variable in a published paper we couldn't silently change what that variable meant in the next data release. Schema discipline as a research infrastructure problem, not a code problem. Every variable we shipped came with a paper trail that explained what it was, what it wasn't, and what alternative we considered.&lt;/p&gt;
&lt;p&gt;Spatial harmonization was only the worst-named version of a more general problem. The same shape of decision had to be made on the temporal axis. Population microdata in the US is decadal. Environmental data is hourly, daily, monthly, depending on the source. Land-use data is irregular — it changes when somebody updates the parcel records, which can happen on any timescale from weeks to decades depending on jurisdiction. Joining a 2010 census tract to MODIS satellite imagery for the same area requires a decision about what time window of imagery counts as "matching" the 2010 measurement. We documented that decision too. It was different from the spatial decision, defended in different sections of the same NSF report, and tested by different researchers downstream.&lt;/p&gt;
&lt;p&gt;The harmonized output sat on a server somewhere and was free to download. The labor of arriving at it was funded at roughly \$1.6 million per year for five years. Engineers, researchers, project managers, the costs of running servers, the costs of presenting at conferences and writing the reports that justified the next year's allocation. It was a lot of money. It produced a finite, citable, queryable dataset that hadn't existed before, and would not have existed without that money.&lt;/p&gt;
&lt;h3&gt;The Funding Model&lt;/h3&gt;
&lt;p&gt;NSF cooperative agreements are a particular kind of grant. Unlike a standard research grant, where the agency funds you and then mostly leaves you alone, a cooperative agreement keeps NSF involved in the project's direction. There's a program officer who attends meetings. There are quarterly check-ins. There are annual reports that include not just what you spent and what you produced but where you're going next and why. There are presentations, in person, to people whose job is to ask hard questions about whether the work is worth what it costs.&lt;/p&gt;
&lt;p&gt;This structure exists because the agency is making a public-good argument. Federal money pays for outputs that meet certain criteria — they have to be broadly useful, they have to serve a research community that can actually use them, and they have to produce evidence of impact. If your data is sitting on a server and nobody is downloading it, the next funding cycle is going to be a hard conversation.&lt;/p&gt;
&lt;p&gt;The trade-off is that this model only works for cross-domain data services that have a research community organized enough to push for them. The Census has one. Climate data has one. Land-use and population integration has one. Alternative financial data — the kind of feeds that say "here's what Pelosi traded last week" or "here's what Cramer mentioned on Mad Money" — does not. There is no academic constituency lobbying NSF to fund harmonized politician-trading disclosures. The data exists; the demand is real; the funding model isn't there.&lt;/p&gt;
&lt;p&gt;The other thing the academic model doesn't solve is what happens when the grant ends. Terra Populus wound down in late 2016 and early 2017. New harmonization work, new boundary updates, new integrations: those mostly stopped when the funding stopped. Sustainability is the fragile part of the academic public-good model. You can produce a high-quality dataset for five years and then watch it slowly decay because there's no organizational reason to keep the engineering staffed. I left the University of Minnesota in early 2019, so I know less about what the legacy of Terra Populus is at ISRDI; I'm sure lessons were learned, and some of the data itself likely shifted into other projects.&lt;/p&gt;
&lt;h3&gt;What Quiver Actually Sells&lt;/h3&gt;
&lt;p&gt;Quiver Quantitative sells the same shape of work for financial alternative data. They are a commercial service, not a federally-funded one, and the visible product is a paywalled API and a public-facing dashboard. But the engineering underneath is recognizable. I know it on sight because I spent seven years doing the academic version of it.&lt;/p&gt;
&lt;p&gt;A few examples of what Quiver actually does, beyond "scraping public data":&lt;/p&gt;
&lt;p&gt;A politician's personal trade disclosure shows up on the House STOCK Act feed under whatever name the politician uses. That name has to map to a BioGuide ID. The BioGuide ID has to join to the politician's party affiliation, chamber, and committee assignments — &lt;em&gt;at the date of the trade&lt;/em&gt;, not at the date of the disclosure, which can be months later, and not as of today, because the politician may have switched committees or parties since. The join logic has to handle people who change their legal name mid-term. It has to handle filers who use a spouse's name on the disclosure. None of this is intellectually deep. All of it is the kind of thing that fails silently in a hand-rolled scraper if you're not careful.&lt;/p&gt;
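&lt;p&gt;The "at the date of the trade" requirement is an as-of lookup. A minimal sketch, assuming each legislator's history is stored as a list of (start date, value) pairs; the committee names and dates are invented, and nothing here reflects how Quiver actually implements it.&lt;/p&gt;

```python
import bisect
from datetime import date


def as_of(history, when):
    """Return the value in effect on `when` from a list of (start_date, value) pairs."""
    history = sorted(history)
    starts = [start for start, _ in history]
    # Index of the last entry whose start date is on or before `when`.
    idx = bisect.bisect_right(starts, when) - 1
    if idx == -1:
        return None  # nothing was in effect yet
    return history[idx][1]


# Hypothetical committee history for one legislator.
committee_history = [
    (date(2019, 1, 3), "Agriculture"),
    (date(2021, 1, 3), "Financial Services"),
]

trade_date = date(2020, 11, 15)
disclosure_date = date(2021, 2, 1)
print(as_of(committee_history, trade_date))
print(as_of(committee_history, disclosure_date))
```

&lt;p&gt;Joining on the disclosure date instead of the trade date returns a different committee for the same trade, which is the silent failure the paragraph above describes.&lt;/p&gt;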
&lt;p&gt;A corporate jet flight is reported by tail number. The tail number is owned by an LLC. The LLC is owned by a holding company. The holding company is owned by an executive, or by a fractional-ownership program, or by a trust whose beneficiary is the executive. The executive worked at Company A from 2018 to 2022 and at Company B since then. If you want to attribute the flight history to "the Company A CEO's plane" correctly across that transition you need to maintain the ownership graph and apply it temporally, with an understanding of when M&amp;amp;A and management changes happened.&lt;/p&gt;
&lt;p&gt;An insider trade reported on Form 4 belongs to an officer, a director, a 10% beneficial owner, or a relative. The role hierarchy matters because the predictive content of the trade depends on the insider's position. The company's filing history runs through name changes, ticker swaps, and reorganizations that would break a naive join on stock symbol.&lt;/p&gt;
&lt;p&gt;A &lt;a href="https://www.reddit.com/r/wallstreetbets/"&gt;WallStreetBets&lt;/a&gt; ticker mention has to be disambiguated from English words ("ANY", "REAL", "GO", "ALL"), from financial terms used as common nouns ("CASH", "BANK", "DD"), and from the constantly-shifting community vocabulary that makes "Roaring Kitty" mean something specific to the reader on a specific date. Sentiment scoring on those mentions has to track what counts as bullish in the WSB dialect, which is not the same as what sentiment models trained on news text think bullish means.&lt;/p&gt;
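&lt;p&gt;A toy version of the disambiguation step, assuming a known-ticker universe and a stoplist of symbols that double as ordinary words. Both sets are tiny placeholders, and the cashtag rule is one plausible heuristic rather than a description of any vendor's pipeline.&lt;/p&gt;

```python
import re

# Hypothetical lists; a real universe and stoplist would be far larger.
KNOWN_TICKERS = {"GME", "ALL", "GO", "CASH", "TSLA"}
AMBIGUOUS = {"ALL", "GO", "CASH", "ANY", "REAL", "BANK", "DD"}

# An optional leading "$" (a cashtag) followed by 1 to 5 capital letters.
TOKEN = re.compile(r"(\$?)\b([A-Z]{1,5})\b")


def extract_tickers(text):
    """Keep a symbol if it is a known ticker and either unambiguous or cashtagged."""
    found = []
    for cashtag, symbol in TOKEN.findall(text):
        if symbol not in KNOWN_TICKERS:
            continue
        if symbol in AMBIGUOUS and not cashtag:
            continue
        found.append(symbol)
    return found


post = "GO ALL in on GME, my DD says $CASH is next"
print(extract_tickers(post))
```

&lt;p&gt;A static stoplist handles the English-word collisions; the shifting community vocabulary is the part that needs ongoing curation.&lt;/p&gt;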
&lt;p&gt;The hard part of all of these is maintenance. Source formats change. Disclosure requirements add fields. Politicians change parties. Companies merge. Tail numbers transfer. New tickers IPO; old tickers delist; ETFs split into share classes. Terra Populus wrote its decisions about changes like these into NSF reports; Quiver writes them into customer-facing changelogs and SLAs instead. The decisions are no less real.&lt;/p&gt;
&lt;p&gt;What you pay for, when you pay for Quiver, is the labor of making those decisions and the sustainability of having an organization that keeps making them. The data feed is the visible artifact. The engineering function is the actual product.&lt;/p&gt;
&lt;h3&gt;What You Take On When You Cancel&lt;/h3&gt;
&lt;p&gt;A third funding model exists alongside the academic-grant and commercial-subscription models: pay with your own time. Build it yourself. Maintain it yourself. Absorb every format change personally, on your own schedule, without an SLA or a changelog or a research community to share the burden.&lt;/p&gt;
&lt;p&gt;If you cancel a service like Quiver, here is the inventory of what you take on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Name disambiguation.&lt;/strong&gt; Politicians, companies, executives, all changing identifiers and roles over time. Every join needs to know which version of the entity it's joining.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ticker stability through corporate actions.&lt;/strong&gt; M&amp;amp;A, name changes, share class splits, delistings. The historical price series for Activision Blizzard ends in October 2023 because Microsoft acquired it; if you want to study Activision's history you need to know that and stitch it together.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Schema normalization across changing source formats.&lt;/strong&gt; House STOCK Act PDFs and Senate eFD filings have different schemas. SEC EDGAR Form 4, which is filed as XML rather than XBRL, has had three or four major schema revisions in the last decade. Every change is a parser update on your end.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Audit trail and reproducibility.&lt;/strong&gt; If you want anyone — including future you — to be able to re-run your analysis or cite your data, you need to record what you used, when you pulled it, and what cleaning decisions you made. This is the part that sounds like overhead until it isn't.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Format-change response time.&lt;/strong&gt; When the upstream source publishes a new PDF template with shifted column boundaries, you have days to fix the parser before the data gap shows up in your analysis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adversarial inputs on the source side.&lt;/strong&gt; The upstream entities have their own incentives. A politician's disclosure form filled out the legal minimum way is not necessarily the version that makes downstream joins clean. Filers transpose digits in CUSIPs. They report a transaction date that's the settlement date instead of the trade date. They list a partial name for a holding LLC that you have to disambiguate against three plausible matches. None of this is fixed by trying harder; it's fixed by accumulating a library of known patterns and a tolerance for ambiguity.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What you give up: coverage, consistency, the audit trail that makes the data citable in research papers, and the option to forget about the data layer entirely while you focus on the question you actually wanted to answer. What you keep: full control over the corner cases that matter to &lt;em&gt;your&lt;/em&gt; specific question, and the ability to make different cleaning decisions than the service made.&lt;/p&gt;
&lt;p&gt;I just finished a study of Cramer's &lt;em&gt;Mad Money&lt;/em&gt; recommendations as a quant filter signal — write-up coming separately — and the data work for that project is what made the engineering layer concrete to me again. I needed specific segment classifications that didn't quite match Quiver's defaults. I needed corner-case handling for tickers that were renamed mid-window. I built most of that infrastructure anyway. The subscription was a useful starting point, but the actual analysis lived in code I owned.&lt;/p&gt;
&lt;h3&gt;I Live Here Already: DirtScout&lt;/h3&gt;
&lt;p&gt;The Cramer study is the recent example. The longer-running one is &lt;a href="https://tinycomputers.io/posts/building-dirtscout-a-land-acquisition-platform-with-claude-code.html"&gt;DirtScout&lt;/a&gt;, a land acquisition platform I've been building. Strip away the user-facing parts and DirtScout is essentially a series of custom data ingestors for county- and state-level real property information. It is the DIY funding model lived, not described.&lt;/p&gt;
&lt;p&gt;The pipelines that keep DirtScout fed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Minnesota statewide parcel data&lt;/strong&gt;, refreshed quarterly. Every county in the state has its own schema for what counts as a parcel record, with different field names, different geometry encodings, and different conventions for whether vacant land is a separate row or a flag on the residential record. Harmonizing those into a single queryable parcel table is the largest single piece of cleaning work in the project.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;St. Louis County weekly&lt;/strong&gt;, with tax-delinquent enrichment. The county publishes a delinquent register that uses different identifiers than the parcel records, and the join is fragile in exactly the way you'd expect — name spellings drift, owner addresses change, parcel splits get attributed differently between the assessor's database and the treasurer's database. Cross-source enrichment is where most of the corner-case handling lives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tax Forfeited Land sale lists&lt;/strong&gt;, scraped daily off an SSRS-rendered report. The format is defined by Microsoft's reporting service and emitted as the world's least-loved HTML table. The report definition changes occasionally and the parser has to track it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elevation profiles&lt;/strong&gt;, computed from USGS topographic data and joined to parcel polygons on demand. Cross-domain integration: real property data and digital elevation models live in different schemas, projections, and coordinate systems, and joining them properly requires a stack of decisions that look exactly like the ones we made joining environmental rasters to census tracts in Terra Populus.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I keep all of it running with cron jobs on a Linux box and weekly integration tests. The cron jobs say &lt;em&gt;pull the new data;&lt;/em&gt; the integration tests say &lt;em&gt;scream if the join logic broke.&lt;/em&gt; Together they constitute exactly the same kind of ongoing engineering that went into NSF reports, and exactly the same kind of function Quiver writes into customer-facing changelogs and SLAs. The funding model is different — DirtScout is funded by my time and the electricity bill on a workhorse machine — but the labor is recognizably the same shape.&lt;/p&gt;
&lt;p&gt;That's the DIY model in actual practice rather than as an abstraction. When I cancel Quiver, I'm not adding a new category of work to my life. I'm adding one more pipeline to the maintenance surface I already carry. The cancel decision is about whether the marginal pipeline is worth the marginal time, given the alternative of paying for the labor.&lt;/p&gt;
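&lt;p&gt;As a rough illustration of what a "scream if the join logic broke" test can look like, here is a minimal sketch. Everything in it is hypothetical: the function names, the 95% floor, and the idea of measuring a match rate between delinquent-register identifiers and parcel identifiers are assumptions made for the example, not DirtScout's actual test suite.&lt;/p&gt;

```python
def join_match_rate(delinquent_ids, parcel_ids):
    """Fraction of delinquent-register IDs that resolve to a known parcel ID."""
    if not delinquent_ids:
        return 1.0
    known = set(parcel_ids)
    matched = sum(1 for pid in delinquent_ids if pid in known)
    return matched / len(delinquent_ids)


def check_join(delinquent_ids, parcel_ids, minimum_rate=0.95):
    """Raise if the match rate drops below the floor (hypothetical 95%)."""
    rate = join_match_rate(delinquent_ids, parcel_ids)
    if minimum_rate > rate:
        raise AssertionError(
            f"join match rate {rate:.1%} is below {minimum_rate:.0%}"
        )
    return rate


# Made-up identifiers for illustration only.
parcels = ["010-0010-00010", "010-0010-00020", "010-0010-00030"]
delinquent = ["010-0010-00010", "010-0010-00030"]
print(check_join(delinquent, parcels))
```

&lt;p&gt;A weekly cron entry could run a check like this after each pull and alert on a non-zero exit. The useful property is that a silent schema change upstream shows up as a falling match rate rather than as quietly missing rows.&lt;/p&gt;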
&lt;h3&gt;The Decision&lt;/h3&gt;
&lt;p&gt;The arithmetic is simple. Thirty to seventy dollars per month, times twelve, equals three hundred sixty to eight hundred forty per year for the Quiver tier I needed. The cleaning labor for the Cramer study, in hours of my own time, came out below that — partly because the project window was bounded (no ongoing maintenance burden after publication), partly because the corner cases I cared about were corner cases Quiver wouldn't optimize for in any case, and partly because the subscription's marginal value to me dropped to near zero the moment the study finished its primary analysis.&lt;/p&gt;
&lt;p&gt;So I'm cancelling.&lt;/p&gt;
&lt;p&gt;This is a verdict on the use case, not on the service. Quiver is correctly priced for what it does. The labor it absorbs on behalf of its customers is real, valuable, and the kind of work I have personal sympathy for because I used to do its academic equivalent. The question is not "is this overpriced." The question is "does what they do match what I need on this specific project, on this specific timeline, with this specific willingness to absorb maintenance personally."&lt;/p&gt;
&lt;p&gt;For me, right now, the answer is no. For someone running a fund and consuming alt-data continuously, the answer is obviously yes. For an academic researcher with grant funding and a multi-year project, the answer probably depends on whether the institution will pay. For a hobbyist with one specific question and time on their hands, the answer is probably no. Different use cases, different absorption preferences, different right answers. The subscription is a tool, and like any tool it's correctly chosen against a specific job.&lt;/p&gt;
&lt;h3&gt;Three Funding Models, One Bill&lt;/h3&gt;
&lt;p&gt;The labor doesn't disappear in any of these models. The variable is who absorbs it.&lt;/p&gt;
&lt;p&gt;Academic grants socialize the cost across taxpayers in exchange for outputs that meet public-good criteria and serve a research community. The output is free to use. The sustainability is fragile, because the moment the grant ends the engineering staff disperses and the dataset starts ossifying. Terra Populus is one of dozens of NSF-funded data integration projects that produced excellent work for a fixed window and then went into a maintenance-mode purgatory or were eventually decommissioned.&lt;/p&gt;
&lt;p&gt;Commercial subscriptions price the cost into the customer relationship. The output is paywalled. The sustainability is more robust, because as long as renewals cover engineering payroll the work continues — but the paywall means the data is unevenly distributed. Some questions that should be asked about Cramer or Pelosi or insider trading get asked only by people willing to pay for the data layer.&lt;/p&gt;
&lt;p&gt;DIY personalizes the cost into your own time. The output is yours. The sustainability is one upstream format change away from breaking, and the audit trail depends entirely on your discipline. Most DIY data work decays for the same reason most personal projects decay: the maintenance burden is invisible until it isn't.&lt;/p&gt;
&lt;p&gt;When somebody tells you a commercial data service is overpriced, ask them what they're proposing to do with the labor. The answer is always one of those three models, and the right answer depends on what you're actually optimizing for. If you want broad public access to a citable dataset, the academic model is correct and the price tag (which is real, just paid by someone else) is justified. If you want consistent, maintained, audit-trailed access on a service-level agreement, the commercial model is correct. If you want full control over the corner cases and have time to spend, DIY is correct.&lt;/p&gt;
&lt;p&gt;There's a &lt;a href="https://tinycomputers.io/posts/the-feedback-loop-that-jevons-couldnt-name.html"&gt;Jevons-flavored&lt;/a&gt; pattern hiding here. Cheaper data access creates demand for more data work — more analyses, more cross-cuts, more downstream products built on top. The labor savings at one layer (the cleaning is already done; just query it) get spent at the next layer (now the question becomes "what does this dataset tell me," and that question has no upper bound). The bill is real either way. The form it takes shifts depending on how the supply side is funded.&lt;/p&gt;
&lt;p&gt;The same logic, weirdly, applies to a different kind of subscription decision I &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;wrote about a few weeks back&lt;/a&gt;. Local LLM inference versus a Claude Max subscription is the same shape of question as Quiver versus DIY scraping: who absorbs the cost, what coverage do you get, how fragile is the sustainability, and what specifically are you optimizing for. The arithmetic is different. The structure isn't.&lt;/p&gt;
&lt;p&gt;The subscription cancellation in front of me this week is small — a few hundred dollars saved per year, in exchange for some weekend hours I'd have spent on the data layer anyway. The bigger thing it's clarified is the model I want to use when I make the next one.&lt;/p&gt;</description><category>alternative data</category><category>build-vs-buy</category><category>cooperative agreements</category><category>data engineering</category><category>economics</category><category>harmonization</category><category>ipums</category><category>isrdi</category><category>mpc</category><category>nsf</category><category>quiver quantitative</category><category>terra populus</category><guid>https://tinycomputers.io/posts/what-terra-populus-taught-me-about-cancelling-quiver.html</guid><pubDate>Sun, 26 Apr 2026 13:00:00 GMT</pubDate></item><item><title>What a Commercial PCB Placer Does That My Open-Source One Can't</title><link>https://tinycomputers.io/posts/what-a-commercial-pcb-placer-does-that-my-open-source-one-cant.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/what-a-commercial-pcb-placer-does-that-my-open-source-one-cant_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;28 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;div class="sponsor-widget"&gt;
&lt;div class="sponsor-widget-header"&gt;&lt;a href="https://baud.rs/wdr0dP"&gt;Quilter&lt;/a&gt; · Partner&lt;/div&gt;
&lt;p&gt;I partnered with &lt;a href="https://baud.rs/wdr0dP"&gt;Quilter.ai&lt;/a&gt; on this post. They provided access to their commercial placement service at no cost so I could run this comparison. The open-source placer I built to benchmark against them, the evaluation methodology, and every number in this post are mine.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;This is an interlude in a running series on designing a level-shifter shield for the &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1&lt;/a&gt;. The main arc is about the design itself — how to get 72 channels of logic-level translation across a 155mm x 90mm board using ten SN74LVC8T245 shifters, in a repeatable Python-scripted workflow. Previous posts covered the &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;original Fiverr design&lt;/a&gt;, the &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;Claude Code redesign&lt;/a&gt;, &lt;a href="https://tinycomputers.io/posts/how-a-pin-numbering-bug-killed-a-pcb.html"&gt;a pin-numbering bug&lt;/a&gt;, and the &lt;a href="https://tinycomputers.io/posts/what-routing-314-nets-taught-me-about-ai-assisted-pcb-design.html"&gt;v0.4 respin&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This post is an honest comparison of two ways to place the components on that board: Quilter, a commercial service aimed at exactly this problem, and pyplacer, an open-source simulated-annealing placer I wrote over two weekends specifically to have something to benchmark Quilter against. The results are not flattering to pyplacer, but they are interesting. They also surface a concrete and narrow thing that commercial placement services do right that a naive first pass does not.&lt;/p&gt;
&lt;h3&gt;Why Write a Placer At All&lt;/h3&gt;
&lt;p&gt;A PCB has two distinct layout problems. The first is &lt;em&gt;placement&lt;/em&gt;: deciding where each component goes on the board. The second is &lt;em&gt;routing&lt;/em&gt;: deciding how to connect the pins once the components are placed. These are usually solved by different tools, and the quality of the placement determines how well the routing can go. A bad placement makes a good routing impossible; a good placement makes routing almost boring.&lt;/p&gt;
&lt;p&gt;There is no good open-source placement tool. Freerouting, which is genuinely excellent as an autorouter, does not do placement — it takes placed components as input. KiCad has a rudimentary "move to group" feature and nothing else. Eagle has nothing. pcb-rnd has nothing. For decades, PCB placement has been a human-only activity, and the industry's answer to "automate placement" has been to employ better humans.&lt;/p&gt;
&lt;p&gt;A few commercial services are now genuinely trying to change that. &lt;a href="https://baud.rs/wdr0dP"&gt;Quilter&lt;/a&gt; is one of them: you upload a KiCad board with component libraries and a netlist, and it returns a placed-and-routed board. When I approached Quilter about partnering on a post, I decided that a straight review would be uninteresting. What I actually wanted to know was: how much of the gap between Quilter and "do nothing" could be closed by a motivated hobbyist with a weekend, some Python, and a simulated annealer? If the gap is small, Quilter is an incremental improvement. If the gap is large, Quilter is doing something specific that is hard to reproduce.&lt;/p&gt;
&lt;p&gt;So I built &lt;a href="https://baud.rs/tcrfa0"&gt;pyplacer&lt;/a&gt;. The goal was not to beat Quilter — that would have been foolish. The goal was to produce a credible open-source baseline, run both tools on the same board under the same constraints, and see how the numbers fell.&lt;/p&gt;
&lt;h3&gt;The Board&lt;/h3&gt;
&lt;p&gt;The benchmark is &lt;a href="https://baud.rs/pOawfA"&gt;giga_shield v0.4&lt;/a&gt; — the same board discussed in the previous post. Relevant specs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;155mm x 90mm, 4-layer stack (two signal, two plane)&lt;/li&gt;
&lt;li&gt;161 nets across 64 components&lt;/li&gt;
&lt;li&gt;Ten SN74LVC8T245PW level-shifter ICs in TSSOP-24 packages&lt;/li&gt;
&lt;li&gt;Two 36-pin 2x18 headers (J9 and J10) along the right edge for the Z80 side&lt;/li&gt;
&lt;li&gt;Several 8-pin and 26-pin single-row headers for the Arduino Giga side&lt;/li&gt;
&lt;li&gt;Forty 0603 bypass and DIR-control caps/resistors&lt;/li&gt;
&lt;li&gt;All components on the top side of the board&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a &lt;em&gt;dense&lt;/em&gt; layout for 161 nets on 155x90mm. The level shifters are the routing chokepoint: each one has 8 A-side pins that connect to one header and 8 B-side pins that connect to another, plus VCCA/VCCB/GND/DIR. Get the shifters wrong relative to the headers and the board is unroutable at 4 layers.&lt;/p&gt;
&lt;p&gt;The honest baseline for comparison is not "random placement." It is &lt;em&gt;my&lt;/em&gt; hand-placed v0.4 layout, which Freerouting routes to 100% at 0.3mm clearance in about 45 minutes of wall-clock time. I know that because I spent several weeks iterating to produce it. The goal of any automated placer is to do that work so I do not have to.&lt;/p&gt;
&lt;h3&gt;Quilter: The Commercial Option&lt;/h3&gt;
&lt;p&gt;Quilter presents as a web service: you upload your &lt;code&gt;.kicad_pcb&lt;/code&gt; (unplaced) with its component libraries and netlist, set a few parameters (board outline, fixed components, target layer count), and they generate candidate placements.&lt;/p&gt;
&lt;p&gt;For this comparison I submitted giga_shield v0.4 with the ten shifters and all passives unplaced, the four headers and mounting holes locked in position (because those are mechanical constraints tied to the Arduino Giga's hardware), and the board outline fixed. Quilter returned &lt;code&gt;Candidate_1&lt;/code&gt; after 1 hour 10 minutes 48 seconds of compute.&lt;/p&gt;
&lt;div style="text-align: center; margin: 30px 0;"&gt;
&lt;img src="https://tinycomputers.io/images/giga-shield/quilter-top.png" alt="Top-down render of Quilter's Candidate_1 placement. Ten level shifters distributed across the board: three in a row along the top edge near J5, two in the bottom third near J8, and five clustered between J9 and J10 on the right-hand side. Passives are scattered across the routing channels in short local clusters. The board has routing traces visible on the top and inner layers." style="max-width: 100%; border: 1px solid #ddd; border-radius: 8px;"&gt;
&lt;p style="color: #666; font-size: 12px; margin-top: 10px;"&gt;Quilter Candidate_1, rendered via &lt;code&gt;kicad-cli pcb render&lt;/code&gt;. Ten shifters spread into three clusters, each one closest to the connectors it serves. Routing threads through every available channel.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Freerouting, given Quilter's placement and the same 0.3mm clearance rule I used for my hand-placement, routed 99.4% of the board: 1 net unrouted out of 161. The single unrouted net was a non-critical DIR control signal that could have been fixed with a manual jumper or a small placement tweak. For practical purposes, the board is fabricatable as Quilter produced it.&lt;/p&gt;
&lt;p&gt;That is a strong result. It is in the same quality neighborhood as my hand-placement, produced in about seventy minutes of their wall-clock time versus my several weeks of iteration.&lt;/p&gt;
&lt;h3&gt;pyplacer: The Open-Source Attempt&lt;/h3&gt;
&lt;p&gt;pyplacer is about 2,000 lines of Python. It does &lt;a href="https://baud.rs/rZysbW"&gt;simulated annealing&lt;/a&gt; over component positions on a fixed board outline, with a cost function that combines half-perimeter wirelength, bounding-box overlap penalties, an out-of-bounds penalty, a grid-cell congestion estimate from probing L-shaped routes across a coarse mesh, and a small pad-exit direction bias that tries to align TSSOP-24 pads with their natural routing sides.&lt;/p&gt;
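&lt;p&gt;Half-perimeter wirelength is the standard cheap proxy for routed length: for each net, take the bounding box of its pins and add the box's width to its height. A minimal sketch follows; the function names and the coordinates are made up for illustration and are not taken from pyplacer's source.&lt;/p&gt;

```python
def hpwl(pin_positions):
    """Half-perimeter wirelength of one net: bounding-box width plus height."""
    xs = [x for x, _ in pin_positions]
    ys = [y for _, y in pin_positions]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum HPWL over a mapping of net name to a list of (x, y) pins in mm."""
    return sum(hpwl(pins) for pins in nets.values())


# Hypothetical pin coordinates in millimeters.
example_nets = {
    "D23": [(140.0, 20.0), (128.0, 24.0)],
    "D25": [(140.0, 22.5), (128.0, 24.65), (100.0, 30.0)],
}
print(total_hpwl(example_nets))
```

&lt;p&gt;The measure is attractive for annealing because it is cheap to recompute after a single component moves, but it knows nothing about whether the space between the pins is actually free.&lt;/p&gt;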
&lt;p&gt;The overall architecture is pragmatic rather than clever:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A KiCad board loader that reads footprints, pads, and netlist&lt;/li&gt;
&lt;li&gt;A heuristic seed placement that puts each shifter at the midpoint between its two dominant connectors (the connectors that account for most of its signal pins), with special handling for clusters of shifters sharing the same connector pair&lt;/li&gt;
&lt;li&gt;An SA loop: propose a move (shift, swap, or component-to-component swap), evaluate the cost delta, accept with &lt;a href="https://baud.rs/09b03A"&gt;Metropolis probability&lt;/a&gt; at the current temperature, cool&lt;/li&gt;
&lt;li&gt;A DSN exporter that writes a Specctra file with the placed positions and the netlist, ready to feed to Freerouting&lt;/li&gt;
&lt;/ol&gt;
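&lt;p&gt;Step 3 of the list above can be sketched in a few lines. This is a generic annealing loop on a toy problem, not pyplacer's code: the names &lt;code&gt;anneal&lt;/code&gt;, &lt;code&gt;toy_cost&lt;/code&gt;, and &lt;code&gt;toy_shift&lt;/code&gt;, the default schedule, and the two pin coordinates are all assumptions for the example, and only a "shift" move is implemented.&lt;/p&gt;

```python
import math
import random


def anneal(cost, propose, state, seed=0, t_start=1.0, t_end=1e-3,
           cooling=0.9, moves_per_temperature=50):
    """Simulated annealing with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current_cost = cost(state)
    best, best_cost = state, current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temperature):
            candidate = propose(state, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis rule: improvements are always accepted; a worse
            # move is accepted with probability exp(-delta / temperature).
            accept_probability = math.exp(-max(delta, 0.0) / temperature)
            if accept_probability > rng.random():
                state, current_cost = candidate, candidate_cost
                if best_cost > current_cost:
                    best, best_cost = state, current_cost
        temperature *= cooling
    return best, best_cost


# Toy problem: one movable part wired to two fixed pins (hypothetical, in mm).
PIN_A = (0.0, 0.0)
PIN_B = (10.0, 4.0)


def toy_cost(position):
    """Sum of the two nets' half-perimeter wirelengths."""
    x, y = position
    return (abs(x - PIN_A[0]) + abs(y - PIN_A[1])
            + abs(x - PIN_B[0]) + abs(y - PIN_B[1]))


def toy_shift(position, rng):
    """Propose a small random displacement of the part."""
    x, y = position
    return (x + rng.gauss(0.0, 1.0), y + rng.gauss(0.0, 1.0))


best, best_cost = anneal(toy_cost, toy_shift, (50.0, 50.0))
print(best, best_cost)
```

&lt;p&gt;The toy cost bottoms out at 14 anywhere inside the rectangle spanned by the two pins, which is a small-scale version of the problem this post is about: wirelength alone does not care where in that rectangle the part lands.&lt;/p&gt;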
&lt;p&gt;The cost function I arrived at after maybe a dozen tuning rounds:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;W_HPWL&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;wirelength&lt;/span&gt;
&lt;span class="n"&gt;W_OVERLAP_HARD&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1000.0&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;two&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;components&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;may&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;overlap&lt;/span&gt;
&lt;span class="n"&gt;W_OVERLAP_SOFT&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;50.0&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;keepout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;zones&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;around&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;components&lt;/span&gt;
&lt;span class="n"&gt;W_OUT_OF_BOUNDS&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;200.0&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stay&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;board&lt;/span&gt;
&lt;span class="n"&gt;W_CONGESTION&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;40.0&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;probe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;based&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;congestion&lt;/span&gt;
&lt;span class="n"&gt;W_PAD_EXIT&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;toward&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;natural&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;routing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Everything else — rotation moves, attachment-to-dominant-connector, displacement-from-heuristic — I tried and disabled, because each one either produced worse results or did not produce a measurable improvement.&lt;/p&gt;
&lt;p&gt;The SA run itself is 5,000 iterations at 0.96 cooling with a 1e-4 final-temperature ratio, taking about 90 seconds on a single core of a Mac M3 Pro. That is fast enough that I could do broad seed sweeps cheaply.&lt;/p&gt;
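&lt;p&gt;Those three numbers pin down the shape of the schedule if you assume geometric cooling applied once per temperature step, with the iterations divided evenly across steps. That assumption is mine; the post does not say how pyplacer spreads its iterations. A quick check of the arithmetic:&lt;/p&gt;

```python
import math

iterations = 5000
cooling = 0.96
final_ratio = 1e-4

# Smallest n for which cooling**n is at or below final_ratio.
temperature_steps = math.ceil(math.log(final_ratio) / math.log(cooling))
# Moves available at each temperature if iterations are split evenly.
moves_per_step = iterations // temperature_steps

print(temperature_steps, moves_per_step)
```

&lt;p&gt;Under that reading the schedule has 226 temperature steps of 22 moves each, which is a short stay at each temperature for a 64-component board.&lt;/p&gt;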
&lt;p&gt;For the benchmark, I ran pyplacer on 64 distinct random seeds across three machines (my Mac, a 32-core Linux host, a 64-core Linux host), routed every resulting placement through Freerouting at the same 0.3mm clearance, and recorded the number of unrouted nets. The seed distribution matters because SA is stochastic: different seeds give different local minima, and the spread of outcomes is informative.&lt;/p&gt;
&lt;h3&gt;The Numbers&lt;/h3&gt;
&lt;p&gt;Over 64 seeds:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Best result: 10 unrouted nets (93.8% routed) — this was the early best from the first batch of runs&lt;/li&gt;
&lt;li&gt;Typical plateau result: 16–20 unrouted nets (87.6%–90.1% routed)&lt;/li&gt;
&lt;li&gt;Worst result: 114 unrouted nets (29.2% routed)&lt;/li&gt;
&lt;li&gt;Median: 21 unrouted nets (87.0% routed)&lt;/li&gt;
&lt;/ul&gt;
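&lt;p&gt;The percentages in this post all come from the same conversion, unrouted nets out of 161. A small helper (the name &lt;code&gt;routed_percent&lt;/code&gt; is mine) makes it easy to reproduce them:&lt;/p&gt;

```python
TOTAL_NETS = 161


def routed_percent(unrouted, total=TOTAL_NETS):
    """Routing completion as a percentage of all nets on the board."""
    return 100.0 * (total - unrouted) / total


results = [
    ("Quilter Candidate_1", 1),
    ("pyplacer best", 10),
    ("pyplacer median", 21),
    ("pyplacer worst", 114),
]
for label, unrouted in results:
    print(f"{label}: {routed_percent(unrouted):.1f}% routed")
```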
&lt;p&gt;Every single pyplacer run was worse than Quilter's &lt;code&gt;Candidate_1&lt;/code&gt;. The gap between the best pyplacer seed and Quilter is about 5.6 percentage points of routing completion (93.8% versus 99.4%), which on a 161-net board is nine missing traces; measured from the good end of the plateau, it is fifteen. In practice, a board with 10–16 unrouted nets is not fabricatable as-is. It needs hand-fixing, which is exactly the work the placer was supposed to save.&lt;/p&gt;
&lt;div style="text-align: center; margin: 30px 0;"&gt;
&lt;img src="https://tinycomputers.io/images/giga-shield/pyplacer-top-routed.png" alt="Top-down render of a pyplacer placement after Freerouting. All ten level shifters clustered tightly in a horizontal band across the middle of the board, immediately to the left of J9. Routing traces visible, but the shifter cluster is visibly cramped and many nets run long distances to reach it." style="max-width: 100%; border: 1px solid #ddd; border-radius: 8px;"&gt;
&lt;p style="color: #666; font-size: 12px; margin-top: 10px;"&gt;A pyplacer placement after a Freerouting pass. Notice how the shifters are crammed into a single cluster hugging the left edge of J9, and the routing channel on that edge is visibly dense. This layout routes to about 90%; the unrouted nets are all in the congested zone on the left side of J9.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Compare that image to the Quilter render above. The topological difference is immediate: Quilter spreads its shifters into several clusters, each close to the connector or connector-pair it serves. pyplacer piles them all into the shortest-wirelength position, which happens to be a narrow strip where routes cannot fan out. The visual story matches the numbers.&lt;/p&gt;
&lt;h3&gt;Which Nets Failed&lt;/h3&gt;
&lt;p&gt;I logged Freerouting's warning-level output across the entire 64-seed sweep and extracted the nets that consistently failed to route on the best seed's plateau. Seventeen candidate nets, of which sixteen appear in any given plateau pass. The surprising thing is not the count — it is the topology:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Net&lt;/th&gt;
&lt;th&gt;Pins&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;D23&lt;/td&gt;
&lt;td&gt;J9-4 ↔ U8-3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D25&lt;/td&gt;
&lt;td&gt;J9-6 ↔ U8-4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D27&lt;/td&gt;
&lt;td&gt;J9-8 ↔ U8-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D29&lt;/td&gt;
&lt;td&gt;J9-10 ↔ U8-7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D31&lt;/td&gt;
&lt;td&gt;J9-12 ↔ U8-8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D33&lt;/td&gt;
&lt;td&gt;J9-14 ↔ U8-9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D35&lt;/td&gt;
&lt;td&gt;J9-16 ↔ U8-10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D37&lt;/td&gt;
&lt;td&gt;J9-18 ↔ U9-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D39&lt;/td&gt;
&lt;td&gt;J9-20 ↔ U9-3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D41&lt;/td&gt;
&lt;td&gt;J9-22 ↔ U9-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D43&lt;/td&gt;
&lt;td&gt;J9-24 ↔ U7-3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D45&lt;/td&gt;
&lt;td&gt;J9-26 ↔ U7-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D47&lt;/td&gt;
&lt;td&gt;J9-28 ↔ U7-8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D49&lt;/td&gt;
&lt;td&gt;J9-30 ↔ U7-10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D51&lt;/td&gt;
&lt;td&gt;J9-32 ↔ U10-4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PJ5&lt;/td&gt;
&lt;td&gt;J10-16 ↔ U8-13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PB1&lt;/td&gt;
&lt;td&gt;J1-4 ↔ U2-5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Fifteen of the seventeen are the exact same topological class: a short digital signal from a level shifter to a sequential pin on the J9 double-row header. These should be the easiest routes on the board. They fail because pyplacer has piled the four shifters that serve J9 (U7, U8, U9, U10) up against J9's left edge, and sixteen signals are trying to enter J9 through a 10mm-wide routing channel that can physically only accommodate maybe eight parallel traces.&lt;/p&gt;
&lt;p&gt;This is a connector-edge congestion failure. It is not a routing problem. It is a placement problem that the router cannot undo.&lt;/p&gt;
&lt;h3&gt;Why pyplacer Gets This Wrong&lt;/h3&gt;
&lt;p&gt;Half-perimeter wirelength rewards placing U7/U8/U9/U10 as close to J9 as possible, because every one of their B-side pins connects to J9 and the distance matters. The bounding-box penalty keeps them from overlapping each other. The congestion estimate &lt;em&gt;does&lt;/em&gt; catch some of this — it is why pyplacer's placements are not as awful as they would be without it — but the resolution of the congestion grid is too coarse to catch what is happening at J9's edge specifically. The cost function sees "OK, the shifters are stacked but they are not literally on top of each other, and the average 2x2mm cell is not saturated." The router sees "sixteen traces trying to enter the same five-millimeter gap, no thanks."&lt;/p&gt;
&lt;p&gt;Fixing this correctly is not just a matter of tuning weights. The cost function is missing a dimension. Specifically, it needs a term that penalizes &lt;em&gt;incoming pin density at a fixed connector's edge&lt;/em&gt; — a count of how many signal nets want to enter a given side of each fixed header per millimeter of edge, with a soft threshold above which the penalty climbs sharply. A naive implementation of this adds about 50 lines to the cost function and a per-move incremental update. I have not yet written it because the interesting finding of this benchmark is not "how do I close the gap" — that is the next post. The interesting finding is &lt;em&gt;where&lt;/em&gt; the gap lives.&lt;/p&gt;
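&lt;p&gt;To make the shape of that missing term concrete, here is one way it could look. This is a sketch under stated assumptions, not the term I will eventually ship: the quadratic form, the function names, and the 0.8 nets-per-millimeter threshold (borrowed from the rough "eight traces in a 10mm channel" estimate above) are all placeholders, and the hard part, deciding which nets approach which side of a connector, is left out entirely.&lt;/p&gt;

```python
def edge_density_penalty(nets_entering, edge_length_mm,
                         threshold_per_mm=0.8, weight=1.0):
    """Quadratic penalty on nets-per-millimeter above a soft threshold."""
    density = nets_entering / edge_length_mm
    excess = max(0.0, density - threshold_per_mm)
    return weight * excess ** 2


def connector_edge_cost(edge_loads, threshold_per_mm=0.8, weight=1.0):
    """Sum the penalty over (connector, side) -> (net count, edge length mm)."""
    return sum(
        edge_density_penalty(count, length, threshold_per_mm, weight)
        for count, length in edge_loads.values()
    )


# Hypothetical loads: sixteen nets crowding one 10 mm stretch of J9's left
# edge, versus the same sixteen split evenly across two 10 mm stretches.
piled = {("J9", "left"): (16, 10.0)}
spread = {("J9", "left-upper"): (8, 10.0), ("J9", "left-lower"): (8, 10.0)}
print(connector_edge_cost(piled), connector_edge_cost(spread))
```

&lt;p&gt;The piled case is penalized and the spread case costs nothing, which is the gradient the annealer currently lacks. This version recomputes from scratch; the per-move incremental update would only touch the edges whose net counts a move changes.&lt;/p&gt;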
&lt;p&gt;Quilter, whatever it is doing internally, is not piling shifters up against J9. It spreads them. U7, U8, U9, and U10 in Quilter's placement sit in a 4x1 column a few millimeters away from J9's edge, with gaps between the chips that are large enough for routes to fan out between them. That does not minimize wirelength. It maximizes &lt;em&gt;routability&lt;/em&gt;. The two are not the same objective, and pyplacer only optimizes the first.&lt;/p&gt;
&lt;h3&gt;Speed and Cost&lt;/h3&gt;
&lt;p&gt;A fair comparison also needs to account for what each approach costs to produce.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Quilter&lt;/strong&gt;: &lt;code&gt;Candidate_1&lt;/code&gt; took 1h 10m 48s of their wall-clock compute. Pricing is on their website; I received this candidate through my partnership with Quilter and did not personally pay. One good candidate is enough for a finished board, so the total spend for a project like this is whatever a single candidate costs, not a subscription or a retainer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pyplacer&lt;/strong&gt;: A single seed takes about 90 seconds of single-core wall-clock time. A 64-seed sweep with partial parallelism across three machines took about 35 minutes of real time and consumed roughly three CPU-hours. The Freerouting pass that evaluates each placement takes 10–15 minutes per candidate, so evaluating all 64 placements took an additional ~15 CPU-hours, mostly on Linux hosts. Electricity cost is negligible — pennies. But the human-hours cost of writing, tuning, and debugging pyplacer was maybe 25 hours of my weekend time spread across two weeks.&lt;/p&gt;
&lt;p&gt;So: Quilter produces a usable placement in about seventy minutes of their compute, at whatever their per-candidate list price is. pyplacer cost 25 hours of engineering time, some CPU, and still does not produce a usable placement. The break-even point, assuming my engineering time is valued at even a modest hourly rate, favors Quilter by a wide margin for single-project work. The calculus only flips if you are going to place hundreds of boards and amortize the pyplacer development cost across all of them. For a hobby project or a one-off prototype, it is not close.&lt;/p&gt;
&lt;h3&gt;What About Future Improvements?&lt;/h3&gt;
&lt;p&gt;The obvious counter is that pyplacer is a weekend project and Quilter has been in development for years with a team. If I added the connector-edge congestion term, fixed the coarse grid resolution, added rotation moves, and let the SA run for 10x longer, would I close the gap?&lt;/p&gt;
&lt;p&gt;Probably partially. Maybe to 95–97%. I do not think I would close it fully. Here is why.&lt;/p&gt;
&lt;p&gt;Quilter is doing things that are not obvious from its output but are visible in the spacing patterns: it seems to have a model of the router's behavior, of how close is "too close" for vias to cluster, of how routing channels compose across layers. A simulated annealer can be taught any cost function you can express, but the cost function has to know to include these things. Writing a cost function that captures what Quilter captures is not a weekend project. It is months of work by someone who understands routing at a deep level, which I do not.&lt;/p&gt;
&lt;p&gt;I could also throw more compute at the problem. The 64-seed sweep took ~18 CPU-hours total. If I ran 1,000 seeds on a cluster, I would likely find a placement better than the best seed from 64, by selection pressure alone. But the ceiling set by the cost function's limitations is not breached by more samples; it is breached by a better objective function. And that objective function is where the commercial tool is earning its fee.&lt;/p&gt;
&lt;h3&gt;What This Means for the Series&lt;/h3&gt;
&lt;p&gt;The immediate practical takeaway for the GigaShield series is that if I wanted to save time on future revisions of this board, running it through Quilter and accepting &lt;code&gt;Candidate_1&lt;/code&gt; would be a sensible default — especially for the v1.0 redesign where I want to add thermal pads, improve the layer stack, and shuffle the power rails. That is a genuine upgrade over the hand-placement workflow, which, while competent, took me multiple weeks of iteration to get right.&lt;/p&gt;
&lt;p&gt;The broader takeaway is that placement is a harder problem than routing. Freerouting is an excellent autorouter because routing has a clean, well-posed objective: find paths that respect clearance rules and minimize length. Placement has no such clean objective. A "good" placement is one the router can finish, which is not something you can compute directly from the placement — you only find out by running the router. That feedback loop makes placement the sort of problem where stochastic search without a good model does not close the gap with humans or with informed commercial tools.&lt;/p&gt;
&lt;p&gt;I am going to keep iterating on pyplacer because I find the problem interesting and because an open-source baseline has value even when it is worse than the commercial alternative. The connector-edge congestion term is the first thing I will add. After that, I want to look at rotation moves in the SA, because the ability to flip a shifter's A/B-side orientation is exactly the sort of thing that can unclog a congested connector edge at the cost of a small HPWL penalty.&lt;/p&gt;
&lt;p&gt;But I am not going to pretend I am closing the gap to Quilter through cleverness alone. Whatever Quilter is doing, it is working. For a dense board with real constraints and a real fabrication deadline, their output is better than mine.&lt;/p&gt;
&lt;h3&gt;The Honest Review&lt;/h3&gt;
&lt;p&gt;Companies that partner with me on posts sometimes expect the post to soften the criticism of their tool. Quilter, to their credit, did not. I uploaded my own board, they ran it through their service, and they let me analyze the output however I wanted. The result is that this is a post about an open-source placer that loses to a commercial one, not a post about a commercial tool defeating a straw man.&lt;/p&gt;
&lt;p&gt;For the specific use case I built this around — dense four-layer mixed-signal shields with 150+ nets and tight routing channels — Quilter's &lt;code&gt;Candidate_1&lt;/code&gt; is better than what I can produce with a weekend placer. The gap is not close. The gap is specifically at connector-edge congestion, which is a place where my naive objective function fails and theirs does not. That is a real and informative finding, and the right conclusion is that whatever Quilter charges per candidate is a reasonable price for what you get.&lt;/p&gt;
&lt;p&gt;I would recommend them for boards like this one. I would not recommend them for dead-simple boards where a hand placement in KiCad takes twenty minutes — that is still the cheapest path for trivial work. The interesting threshold is somewhere in the range of 50–100 nets and a non-trivial connector layout; below that, do it yourself; above that, pay the pros.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Previous posts in the GigaShield series: &lt;a href="https://tinycomputers.io/posts/what-routing-314-nets-taught-me-about-ai-assisted-pcb-design.html"&gt;What Routing 314 Nets Taught Me About AI-Assisted PCB Design&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/how-a-pin-numbering-bug-killed-a-pcb.html"&gt;How a Pin Numbering Bug Killed a PCB&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;Redesigning with Claude Code (Part 1)&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Fiverr PCB Design (\$468)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description><category>arduino giga</category><category>benchmark</category><category>freerouting</category><category>level shifter</category><category>open-source</category><category>pcb design</category><category>placement</category><category>pyplacer</category><category>quilter</category><category>retroshield</category><category>simulated annealing</category><category>z80</category><guid>https://tinycomputers.io/posts/what-a-commercial-pcb-placer-does-that-my-open-source-one-cant.html</guid><pubDate>Fri, 24 Apr 2026 15:30:00 GMT</pubDate></item><item><title>Architecture Verified, Mythology Intact: Running OpenMythos on a Strix Halo</title><link>https://tinycomputers.io/posts/architecture-verified-mythology-intact.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/architecture-verified-mythology-intact_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;37 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Anthropic has a rumored upcoming model called Mythos. The weights are not public, the architecture is not published, and Anthropic has said nothing official about how it works. That has not stopped people from guessing.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/w4wo4T"&gt;OpenMythos&lt;/a&gt; is one of those guesses: an open-source "theoretical reconstruction" by Kye Gomez, built from publicly available research on what Anthropic's architecture might look like. The repository's disclaimer is blunt: "an independent, community-driven theoretical reconstruction based solely on publicly available research and speculation. It is not affiliated with, endorsed by, or connected to Anthropic."&lt;/p&gt;
&lt;p&gt;The architecture Gomez bets on is called a Recurrent-Depth Transformer. That's a specific and unusual design choice. Most current language models, like GPT or Llama, are feed-forward: tokens enter at the bottom, flow through dozens of distinct layers stacked on top of each other, and exit as predicted next tokens. A Recurrent-Depth Transformer splits that stack differently. A small number of ordinary layers run once at the start and once at the end. In between, a single layer runs many times in sequence, with the output of each run fed back in as the input to the next. Same weights. More computation.&lt;/p&gt;
&lt;p&gt;You can pip install OpenMythos. It has configurations from &lt;code&gt;mythos_1b&lt;/code&gt; (one billion parameters, toy scale) up to &lt;code&gt;mythos_1t&lt;/code&gt; (one trillion, frontier scale). The README shows you how to instantiate the 1B version in about ten lines of Python and run a forward pass.&lt;/p&gt;
&lt;p&gt;I ran that 1B variant on my Strix Halo box (a Ryzen AI MAX+ 395 with an integrated Radeon 8060S GPU, 60 GB of unified memory, running PyTorch on ROCm). The question is not whether it runs. The question is what running it can tell you. The answer turns out to be interesting in both directions: more than expected about the &lt;em&gt;architecture&lt;/em&gt;, and exactly nothing about the &lt;em&gt;model&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;The Setup&lt;/h3&gt;
&lt;p&gt;The Strix Halo has one GPU. OpenMythos targets distributed training via FSDP, but the forward and inference paths work single-GPU. Install path:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pip install --no-deps open-mythos
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;--no-deps&lt;/code&gt; matters. The &lt;code&gt;pyproject.toml&lt;/code&gt; pins &lt;code&gt;torch = "2.11.0"&lt;/code&gt;, which does not match the torch version my gfx1151 wheels provide, and the package's actual runtime requirements are satisfied by any torch &amp;gt;=2.1. Skipping deps keeps my ROCm stack intact.&lt;/p&gt;
&lt;p&gt;One ROCm-specific patch was needed. The &lt;code&gt;DepthWiseLoRA&lt;/code&gt; module's forward has this line:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loop_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That creates a 0-dim tensor and passes it to an &lt;code&gt;nn.Embedding&lt;/code&gt;. On gfx1151 this produces a HIP launch failure. The fix is a one-line change to index the embedding weight directly:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;loop_t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Same semantics, different kernel path, no crash. Expect similar papercuts in any research code run on non-reference hardware.&lt;/p&gt;
&lt;p&gt;With that done, &lt;code&gt;mythos_1b&lt;/code&gt; instantiates cleanly. Parameter count: 1,064,028,034. A forward pass at batch 1, sequence 128, 16 loops, using bfloat16 mixed precision (a numerical format that halves memory versus regular float32 with negligible quality loss for inference): 2.07 seconds. Peak GPU memory: 6.44 GB. Well within the Strix Halo's envelope.&lt;/p&gt;
&lt;p&gt;That gives me a working model. The rest of this post is what I did with it.&lt;/p&gt;
&lt;h3&gt;Why a Looped Transformer, Briefly&lt;/h3&gt;
&lt;p&gt;Before the experiments, a quick tour of what's specifically weird about this architecture, because everything downstream depends on it.&lt;/p&gt;
&lt;p&gt;A standard transformer has roughly 32 to 100 distinct layers. Each layer has its own parameters. A prompt passes through every layer once. The parameter count is roughly proportional to the layer count times the square of each layer's width.&lt;/p&gt;
&lt;p&gt;A looped transformer keeps only one "inner" layer but runs it many times. Training on 32 "effective layers" requires only 1 layer's worth of parameters. Inference with more loops is equivalent to running a deeper model, without actually storing a deeper model. The architectural bet: if you can get this to work, you get a deeper reasoning model for a fraction of the memory.&lt;/p&gt;
&lt;p&gt;There are two reasons to care about this for a model like Claude Mythos. First, memory efficiency at scale. A trillion-parameter model is expensive to serve; a looped model with the capability of a trillion-parameter feed-forward model but 1/16th the parameters would be dramatically cheaper. Second, reasoning depth. A 2025 paper by Saunshi et al. showed that a looped transformer run for T loops can simulate T steps of chain-of-thought reasoning (the now-familiar "let me think step by step" trick that makes large models better at hard problems), except the "thoughts" happen in continuous latent space inside the model rather than being emitted as visible text tokens. If Mythos is doing that, it would explain why the model seems to do multi-step reasoning without the user ever seeing intermediate "scratch" tokens.&lt;/p&gt;
&lt;p&gt;The catch is that training a looped transformer is notoriously unstable. If the single inner layer amplifies the signal each time it runs, that amplification compounds. A 5% boost per loop becomes a 63% boost after 10 loops, and a model with a 63% boost per forward pass either explodes in training or produces outputs that don't resemble language. Most attempts at looped transformers over the last decade failed for exactly this reason.&lt;/p&gt;
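&lt;p&gt;The compounding is easy to check directly. A minimal sketch, assuming the per-loop gain is a single constant factor (a simplification; a real layer's gain varies by direction and by input). The function name is illustrative only:&lt;/p&gt;

```python
def compounded_gain(per_loop_gain, n_loops):
    """Total amplification after applying the same gain n_loops times."""
    return per_loop_gain ** n_loops

# A 5% per-loop boost, compounded over increasing loop counts.
for loops in (1, 10, 16, 32):
    total = compounded_gain(1.05, loops)
    print(f"{loops:2d} loops: x{total:.2f} ({(total - 1) * 100:.0f}% boost)")
```

&lt;p&gt;The same function with a gain below 1 shows the opposite behavior: the product shrinks toward zero instead of growing.&lt;/p&gt;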
&lt;p&gt;The fix that makes OpenMythos (and the hypothesized Mythos) workable is borrowed from a 2026 paper called Parcae (Prairie et al.). It introduces a clever parameterization of the "gain" of the recurrent update. Instead of letting the model learn arbitrary weights in the core recurrence, Parcae constrains one piece of the architecture to always have its largest amplification factor strictly less than 1. In dynamical-systems terms, the "spectral radius" of the update matrix is always less than one. That guarantee is what makes the loops stable: any signal gets damped by each repeated application, so repeated iteration converges toward a useful fixed point instead of blowing up.&lt;/p&gt;
&lt;p&gt;This is the claim I'm about to test. The spectral radius should be less than 1 by construction. Without that constraint, training should break. And the failure mode should match what Parcae predicts.&lt;/p&gt;
&lt;h3&gt;What I Verified&lt;/h3&gt;
&lt;h4&gt;1. The spectral radius at initialization&lt;/h4&gt;
&lt;p&gt;The cleanest possible check: the &lt;a href="https://baud.rs/Qp3fEh"&gt;Parcae paper&lt;/a&gt; claims a specific mathematical structure for the stability-guaranteeing matrix. Starting from the parameters' default initialized values of zero, the formula works out to a single number: &lt;code&gt;exp(-1) = 0.3679&lt;/code&gt;. If the code matches the paper, a fresh &lt;code&gt;mythos_1b&lt;/code&gt; should have its key matrix set to exactly that value everywhere.&lt;/p&gt;
&lt;p&gt;The parameterization in the code is:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;A = exp(-exp(log_dt + log_A))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;log_A&lt;/code&gt; and &lt;code&gt;log_dt&lt;/code&gt; both initialized to zero, that becomes &lt;code&gt;exp(-exp(0)) = exp(-1) = 0.3679&lt;/code&gt;. The matrix in question is diagonal, meaning it's effectively a list of numbers rather than a two-dimensional grid, so the spectral radius (technical definition: magnitude of the largest eigenvalue) reduces to the largest absolute value in that list. At initialization, every entry is the same 0.3679.&lt;/p&gt;
&lt;p&gt;Measurement on the instantiated 1B model:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;log_A init value (first 5): [0. 0. 0. 0. 0.]
A min: 0.367879
A max: 0.367879
rho(A) at init: 0.367879
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Matches the theoretical prediction to six decimal places. The constraint is doing what the paper claims. This is the kind of thing you can only verify by actually running the code, because documentation and papers often drift from implementations, and subtle bugs in implementations of clever mathematical constructions are common.&lt;/p&gt;
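&lt;p&gt;The arithmetic behind that check can be reproduced without a GPU. A minimal sketch using only the standard library; &lt;code&gt;stable_gain&lt;/code&gt; is an illustrative name, not a function from OpenMythos, and it evaluates the formula for one diagonal entry rather than a tensor:&lt;/p&gt;

```python
import math

def stable_gain(log_dt, log_a):
    """One diagonal entry: A = exp(-exp(log_dt + log_A))."""
    return math.exp(-math.exp(log_dt + log_a))

# Both parameters at their zero initialization.
print(f"{stable_gain(0.0, 0.0):.6f}")  # 0.367879

# Sweep the summed log-parameter: the gain stays strictly inside (0, 1).
for s in (-5.0, -1.0, 0.0, 1.0, 3.0):
    print(s, stable_gain(s, 0.0))
```

&lt;p&gt;The inner &lt;code&gt;exp&lt;/code&gt; is always positive, so the outer &lt;code&gt;exp&lt;/code&gt; always sees a negative argument. That is the whole mechanism: no value of the learnable parameters maps to a gain of 1 or more.&lt;/p&gt;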
&lt;h4&gt;2. The loops are not a no-op&lt;/h4&gt;
&lt;p&gt;Next question: do the loops actually do anything? The README claims each loop iteration is "functionally equivalent to one step of chain-of-thought." In practical terms, that means running more loops should produce different (and presumably better) output than running fewer. If the recurrent block has learned to do nothing, or if the architecture happens to be set up such that the injection of the original input drowns out everything the loop contributes, then all loop counts would produce identical outputs and the whole looped-transformer idea is moot.&lt;/p&gt;
&lt;p&gt;The cleanest test: take the same input, the same random-initialized model, and run it with different loop counts. Compare the outputs. For each token position, the model predicts a probability distribution over the next token. Two different ways to compare:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Argmax agreement&lt;/strong&gt; is just "for what fraction of positions does the most-likely-next-token come out the same?" If two runs pick the same top token 95% of the time, they mostly agree. If they agree 35% of the time, they're meaningfully different. The comparison below uses the 16-loop run as the reference.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KL divergence&lt;/strong&gt; is a standard measure of how different two probability distributions are, expressed in nats (units of the natural logarithm). Zero means identical distributions. Higher means more different. Intuitively: how much information is lost if you model a distribution as something other than itself.&lt;/p&gt;
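&lt;p&gt;Both metrics fit in a few lines. The sketch below uses plain Python lists and made-up logits for four positions over a three-token vocabulary; the numbers are hypothetical, and the KL direction (reference run first) is an assumption, since the post does not say which direction was measured:&lt;/p&gt;

```python
import math

def softmax(logits):
    """Convert one position's logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_nats(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def argmax_agreement(logits_a, logits_b):
    """Fraction of positions where both runs pick the same top token."""
    same = sum(
        1 for a, b in zip(logits_a, logits_b)
        if a.index(max(a)) == b.index(max(b))
    )
    return same / len(logits_a)

# Hypothetical logits: a reference run and a run with fewer loops.
reference = [[2.0, 0.5, 0.1], [0.1, 1.5, 0.3], [0.2, 0.1, 3.0], [1.0, 0.9, 0.2]]
fewer = [[1.8, 0.7, 0.1], [1.2, 0.9, 0.3], [0.1, 0.2, 2.5], [0.8, 1.1, 0.2]]

mean_kl = sum(
    kl_nats(softmax(r), softmax(f)) for r, f in zip(reference, fewer)
) / len(reference)
print(f"argmax agreement: {argmax_agreement(reference, fewer):.1%}")
print(f"mean KL: {mean_kl:.3f} nats")
```

&lt;p&gt;Argmax agreement only looks at the top token, so two runs can agree on every position while still differing in how confident they are. KL divergence catches that difference, which is why the two numbers are reported together.&lt;/p&gt;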
&lt;p&gt;Running a fresh, &lt;em&gt;untrained&lt;/em&gt; &lt;code&gt;mythos_1b&lt;/code&gt; with a fixed input and seed:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;n_loops= 1: argmax agreement with 16-loop run = 35.2%   KL = 0.19 nats
n_loops= 2: argmax agreement                  = 65.6%   KL = 0.10 nats
n_loops= 3: argmax agreement                  = 72.7%   KL = 0.07 nats
n_loops= 4: argmax agreement                  = 80.5%   KL = 0.06 nats
n_loops= 6: argmax agreement                  = 88.3%   KL = 0.04 nats
n_loops= 8: argmax agreement                  = 90.6%   KL = 0.02 nats
n_loops=12: argmax agreement                  = 96.1%   KL = 0.005 nats
n_loops=16: argmax agreement                  =100.0%   KL = 0 nats
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Even with random initialization, the loops do substantive work. After a single loop, only 35% of the 128 output tokens match what the model produces after 16 loops. By three loops, 73% match. By twelve, 96%. The KL divergence tells the same story from a different angle: the probability distributions converge monotonically toward the 16-loop baseline as loop count rises.&lt;/p&gt;
&lt;p&gt;This is exactly the signature of a well-behaved recurrent system settling toward a fixed point. The loops aren't a no-op. They also aren't chaotic: each successive loop gets closer to convergence, which is what the stability guarantee predicts.&lt;/p&gt;
&lt;h4&gt;3. The stability constraint does its job&lt;/h4&gt;
&lt;p&gt;The reconstruction becomes load-bearing here. The Parcae paper claims the constraint on the matrix A is not just a nice-to-have but a requirement. Without it, they say, training diverges at aggressive learning rates. With it, training is stable.&lt;/p&gt;
&lt;p&gt;The test: build three otherwise-identical small models (shrunk to a 128-dimensional hidden state for training speed while keeping the full looped architecture). The only difference is how the matrix A is parameterized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;stable&lt;/strong&gt;: the shipped &lt;code&gt;LTIInjection&lt;/code&gt; that uses the &lt;code&gt;exp(-exp(...))&lt;/code&gt; construction to keep A in the stable range by construction&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;unstable (start at 0.368)&lt;/strong&gt;: replace the clever construction with a raw learnable parameter initialized to the same value the stable version starts at&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;unstable (start at 0.95)&lt;/strong&gt;: same raw parameter, but initialized close to the stability boundary, to see whether training pushes it over&lt;/li&gt;
&lt;/ul&gt;
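&lt;p&gt;The difference between the constrained and raw parameterizations can be sketched without a model. The example below assumes a worst case in which every optimizer step moves the parameter by the full learning rate in the direction that increases the gain; that update rule and the variable names are hypothetical, chosen only to show what each parameterization permits:&lt;/p&gt;

```python
import math

def constrained_gain(s):
    """Gain under the exp(-exp(.)) construction; s stands in for log_dt + log_A."""
    return math.exp(-math.exp(s))

lr = 0.05
raw = 0.95  # raw learnable gain, initialized near the stability boundary
s = 0.0     # constrained version, at its zero initialization

raw_trace, constrained_trace = [], []
for _ in range(40):
    raw += lr  # nothing stops a raw parameter from crossing 1
    s -= lr    # lowering s raises the constrained gain toward 1
    raw_trace.append(raw)
    constrained_trace.append(constrained_gain(s))

print(f"raw gain after 40 steps:         {raw_trace[-1]:.3f}")
print(f"constrained gain after 40 steps: {constrained_trace[-1]:.3f}")
```

&lt;p&gt;Under the same adversarial pressure, the raw parameter leaves the stable range within a couple of steps, while the constrained gain creeps toward 1 without reaching it.&lt;/p&gt;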
&lt;p&gt;Trained at a deliberately high learning rate of 0.05 with 8 recurrent loops per forward pass, for 300 steps, on random next-token prediction. (The point isn't to train a good model. It's to stress-test the stability mechanism under conditions where unstable training would be expected to break.)&lt;/p&gt;
&lt;p&gt;The metric is &lt;code&gt;max|A|&lt;/code&gt;, the largest absolute value on the diagonal. For the stable version, this is the spectral radius and the theory guarantees it stays below 1. For the unstable versions, nothing guarantees anything; we're watching whether training happens to keep it bounded.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: right;"&gt;Step&lt;/th&gt;
&lt;th style="text-align: center;"&gt;Stable&lt;/th&gt;
&lt;th style="text-align: center;"&gt;Unstable (0.368)&lt;/th&gt;
&lt;th style="text-align: center;"&gt;Unstable (0.95)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;0&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.368&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.418&lt;/td&gt;
&lt;td style="text-align: center;"&gt;1.000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;20&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.496&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.719&lt;/td&gt;
&lt;td style="text-align: center;"&gt;1.319&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;60&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.480&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.749&lt;/td&gt;
&lt;td style="text-align: center;"&gt;1.340&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;100&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.477&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.736&lt;/td&gt;
&lt;td style="text-align: center;"&gt;1.315&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;200&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.474&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.700&lt;/td&gt;
&lt;td style="text-align: center;"&gt;1.251&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;299&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.469&lt;/td&gt;
&lt;td style="text-align: center;"&gt;0.666&lt;/td&gt;
&lt;td style="text-align: center;"&gt;1.190&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Within 20 training steps, both unstable variants push at least one entry of A well above anything the stable version reaches. The 0.95-init case hits 1 at the very first recorded step, is well past it by step 20, and stays there. Above 1 is the forbidden regime: a diagonal entry greater than 1 in magnitude means that dimension's contribution to the hidden state grows with every loop instead of shrinking. The Parcae paper says this is fatal. Does it actually kill training?&lt;/p&gt;
&lt;p&gt;Mostly, yes. The stable variant kept producing meaningful gradients the whole way through and the loss moved (noisily, because the training data was random). Both unstable variants had their gradient norm collapse to machine zero within 20 steps and stay there. Their loss froze at &lt;code&gt;log(512) = 6.238&lt;/code&gt;, which is the entropy of a uniform distribution over the 512-token vocabulary we used: the training signal became meaningless because the model was outputting a flat "I have no preference about any token" distribution regardless of input.&lt;/p&gt;
&lt;p&gt;This isn't the classic way training fails. It's not the "loss explodes to infinity and the whole job crashes" failure mode most people think of. It's subtler: the recurrent state grows large enough that the final output saturates to uniform, every possible update to the weights produces the same (wrong) uniform output, so the gradients go to zero and the optimizer stops making progress. Training is effectively dead, silently.&lt;/p&gt;
&lt;p&gt;That is a specific failure mode the Parcae paper warns about, and it is exactly what happens here when the constraint is removed.&lt;/p&gt;
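&lt;p&gt;The frozen loss value is worth a quick sanity check, because it shows why the gradient carries no signal. A small sketch, assuming a standard cross-entropy loss measured in nats; the helper name is illustrative:&lt;/p&gt;

```python
import math

VOCAB = 512  # vocabulary size used in the stress test

def cross_entropy(probs, target):
    """Negative log-likelihood (in nats) of the target token."""
    return -math.log(probs[target])

# A saturated model predicts the same flat distribution everywhere.
uniform = [1.0 / VOCAB] * VOCAB

# The loss is identical no matter which token was the correct one.
for target in (0, 17, 511):
    print(target, f"{cross_entropy(uniform, target):.3f}")

print(f"log({VOCAB}) = {math.log(VOCAB):.3f}")  # log(512) = 6.238
```

&lt;p&gt;Because a flat prediction scores the same against every target, the loss sits at this value regardless of the data.&lt;/p&gt;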
&lt;h4&gt;4. Hidden states blow up by exactly the predicted factor per loop&lt;/h4&gt;
&lt;p&gt;The previous experiment showed that removing the stability constraint breaks training. This one looks at the mechanism underneath. What does "the recurrent state grows unboundedly" actually look like numerically?&lt;/p&gt;
&lt;p&gt;The theory predicts that if the spectral radius is ρ, then after each loop the magnitude of the hidden state grows (or shrinks) by a factor of roughly ρ. With ρ &amp;lt; 1, repeated shrinkage by less than one converges toward a fixed value. With ρ &amp;gt; 1, repeated growth by more than one goes to infinity exponentially.&lt;/p&gt;
&lt;p&gt;Setup: instrument the model to record the magnitude of the hidden state at each loop iteration. Force A to specific values from stable (0.37) through borderline (1.0) through clearly unstable (2.0). Disable ACT halting (an early-exit mechanism explained below in experiment 5) so all 8 loops run and we can see the full trajectory. Each number below is the magnitude of the hidden state measured after that loop iteration completes.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: right;"&gt;Loop&lt;/th&gt;
&lt;th style="text-align: right;"&gt;ρ=0.37&lt;/th&gt;
&lt;th style="text-align: right;"&gt;ρ=0.9&lt;/th&gt;
&lt;th style="text-align: right;"&gt;ρ=1.0&lt;/th&gt;
&lt;th style="text-align: right;"&gt;ρ=1.2&lt;/th&gt;
&lt;th style="text-align: right;"&gt;ρ=1.5&lt;/th&gt;
&lt;th style="text-align: right;"&gt;ρ=2.0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;1&lt;/td&gt;
&lt;td style="text-align: right;"&gt;91&lt;/td&gt;
&lt;td style="text-align: right;"&gt;91&lt;/td&gt;
&lt;td style="text-align: right;"&gt;91&lt;/td&gt;
&lt;td style="text-align: right;"&gt;91&lt;/td&gt;
&lt;td style="text-align: right;"&gt;91&lt;/td&gt;
&lt;td style="text-align: right;"&gt;92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;td style="text-align: right;"&gt;124&lt;/td&gt;
&lt;td style="text-align: right;"&gt;172&lt;/td&gt;
&lt;td style="text-align: right;"&gt;182&lt;/td&gt;
&lt;td style="text-align: right;"&gt;200&lt;/td&gt;
&lt;td style="text-align: right;"&gt;228&lt;/td&gt;
&lt;td style="text-align: right;"&gt;274&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;3&lt;/td&gt;
&lt;td style="text-align: right;"&gt;136&lt;/td&gt;
&lt;td style="text-align: right;"&gt;246&lt;/td&gt;
&lt;td style="text-align: right;"&gt;272&lt;/td&gt;
&lt;td style="text-align: right;"&gt;330&lt;/td&gt;
&lt;td style="text-align: right;"&gt;432&lt;/td&gt;
&lt;td style="text-align: right;"&gt;638&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;4&lt;/td&gt;
&lt;td style="text-align: right;"&gt;141&lt;/td&gt;
&lt;td style="text-align: right;"&gt;312&lt;/td&gt;
&lt;td style="text-align: right;"&gt;363&lt;/td&gt;
&lt;td style="text-align: right;"&gt;487&lt;/td&gt;
&lt;td style="text-align: right;"&gt;738&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1367&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;td style="text-align: right;"&gt;143&lt;/td&gt;
&lt;td style="text-align: right;"&gt;371&lt;/td&gt;
&lt;td style="text-align: right;"&gt;454&lt;/td&gt;
&lt;td style="text-align: right;"&gt;675&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1199&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2825&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;td style="text-align: right;"&gt;143&lt;/td&gt;
&lt;td style="text-align: right;"&gt;425&lt;/td&gt;
&lt;td style="text-align: right;"&gt;544&lt;/td&gt;
&lt;td style="text-align: right;"&gt;901&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1889&lt;/td&gt;
&lt;td style="text-align: right;"&gt;5740&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;7&lt;/td&gt;
&lt;td style="text-align: right;"&gt;144&lt;/td&gt;
&lt;td style="text-align: right;"&gt;473&lt;/td&gt;
&lt;td style="text-align: right;"&gt;635&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1172&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2924&lt;/td&gt;
&lt;td style="text-align: right;"&gt;11570&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;td style="text-align: right;"&gt;144&lt;/td&gt;
&lt;td style="text-align: right;"&gt;517&lt;/td&gt;
&lt;td style="text-align: right;"&gt;726&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1498&lt;/td&gt;
&lt;td style="text-align: right;"&gt;4476&lt;/td&gt;
&lt;td style="text-align: right;"&gt;23232&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Two things to notice. First, the stable column (ρ=0.37) converges to a fixed value around 144 and stops moving. Each loop moves the hidden state toward an equilibrium by a smaller increment than the last, and then it settles. This is the intended behavior: a useful, computation-performing recurrent system that's doing work but not running away.&lt;/p&gt;
&lt;p&gt;Second, the ρ=2.0 column grows by almost exactly 2× per loop after the first couple: 274 → 638 → 1367 → 2825 → 5740 → 11570 → 23232. The last three ratios average 2.02×, which is as close to the theoretical 2.0 as you'd expect given the transformer block itself contributes nonlinear noise on top of the linear dynamics. The prediction is tight.&lt;/p&gt;
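&lt;p&gt;A toy scalar model is enough to reproduce the shape of the table above. The sketch assumes the update is h ← ρ·h + e with a constant injection e = 91, a value read off the loop-1 row. That form is a simplification made here for illustration, not OpenMythos's actual update, which is vector-valued and includes the nonlinear transformer block:&lt;/p&gt;

```python
def simulate(rho, n_loops=8, injection=91.0):
    """Iterate h = rho * h + injection, starting from h = 0."""
    h, trace = 0.0, []
    for _ in range(n_loops):
        h = rho * h + injection
        trace.append(h)
    return trace

# One column per spectral radius, rounded like the table.
for rho in (0.37, 0.9, 1.0, 1.2, 1.5, 2.0):
    print(rho, [round(h) for h in simulate(rho)])

# For rho below 1 the iteration settles at injection / (1 - rho).
print(round(91.0 / (1 - 0.37), 1))  # 144.4
```

&lt;p&gt;Despite ignoring everything nonlinear, this recurrence lands within about one percent of the measured columns. It also makes the three regimes explicit: a fixed point for ρ below 1, linear growth at ρ = 1, and geometric growth above 1.&lt;/p&gt;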
&lt;p&gt;By the fourth loop, ρ=2.0 has taken the hidden state from 92 to 1367, already a 15× blowup. Sixteen loops (the designed inference depth for &lt;code&gt;mythos_1b&lt;/code&gt;) would push it to around 6×10^6. That is far beyond float16's maximum of 65,504. bfloat16 shares float32's exponent range and can still represent a number that size, but activations of that magnitude are the kind of runaway state that saturated the output in experiment 3.&lt;/p&gt;
&lt;p&gt;That is the stability analysis verified on actual silicon. The clever &lt;code&gt;exp(-exp(...))&lt;/code&gt; construction does what the paper says it does, and removing it produces exactly the divergence the paper says it produces, at the rate the paper says it produces it.&lt;/p&gt;
&lt;h4&gt;5. Throughput on a consumer APU&lt;/h4&gt;
&lt;p&gt;The numbers for curiosity, not for the thesis. Single prompt of 128 tokens, bfloat16, running on the integrated Radeon 8060S:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: right;"&gt;n_loops&lt;/th&gt;
&lt;th style="text-align: right;"&gt;Latency&lt;/th&gt;
&lt;th style="text-align: right;"&gt;Tokens/sec&lt;/th&gt;
&lt;th style="text-align: right;"&gt;Peak GB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;1&lt;/td&gt;
&lt;td style="text-align: right;"&gt;141 ms&lt;/td&gt;
&lt;td style="text-align: right;"&gt;910&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6.29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;td style="text-align: right;"&gt;254 ms&lt;/td&gt;
&lt;td style="text-align: right;"&gt;503&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6.37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;4&lt;/td&gt;
&lt;td style="text-align: right;"&gt;501 ms&lt;/td&gt;
&lt;td style="text-align: right;"&gt;255&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6.42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1021 ms&lt;/td&gt;
&lt;td style="text-align: right;"&gt;125&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6.44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;16&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2071 ms&lt;/td&gt;
&lt;td style="text-align: right;"&gt;62&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6.44&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Latency is almost perfectly linear in loop count, which is what you'd expect: double the loops, double the compute, double the wall-clock time. The knee of the inference-time scaling curve, to the extent one exists, sits around 6 to 8 loops. By then the output already agrees with the 16-loop run roughly 90% of the time, and past that point the "more reasoning" benefit stops being worth the "wait twice as long" cost.&lt;/p&gt;
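&lt;p&gt;The linearity claim can be checked with an ordinary least-squares fit over the five rows of the table. A minimal sketch using only the standard library, with the latencies copied from the table:&lt;/p&gt;

```python
loops = [1, 2, 4, 8, 16]
latency_ms = [141, 254, 501, 1021, 2071]  # from the table above

n = len(loops)
mean_x = sum(loops) / n
mean_y = sum(latency_ms) / n

# Closed-form least-squares slope and intercept for one predictor.
sxx = sum((x - mean_x) ** 2 for x in loops)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loops, latency_ms))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"{slope:.1f} ms per loop, intercept {intercept:.1f} ms")

# Largest gap between the fitted line and a measured latency.
worst = max(abs(slope * x + intercept - y) for x, y in zip(loops, latency_ms))
print(f"worst residual: {worst:.1f} ms")
```

&lt;p&gt;The fit comes out at roughly 129 ms per loop with an intercept close to zero, so nearly all of the forward-pass time at this sequence length is spent inside the recurrent block.&lt;/p&gt;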
&lt;p&gt;The architecture has a feature called Adaptive Computation Time (ACT) that is supposed to help here. ACT learns, per-position in the prompt, whether that token's representation has "converged enough" to stop looping. Easy tokens (a period, a common function word) should halt after a couple of loops; hard tokens (the key answer in a math problem) keep looping. In theory, this saves compute on easy tokens.&lt;/p&gt;
&lt;p&gt;In practice, ACT had no measurable effect on throughput in my runs. Two reasons. First, ACT only breaks the loop early if &lt;em&gt;every&lt;/em&gt; position in the batch has halted, because the GPU runs all positions in parallel and can't just skip one. With 128 positions in the prompt, the probability that every single position happens to halt simultaneously is effectively zero, so the early-exit path never fires. Second, the halting decision is made by a learned predictor, and my model was randomly initialized (not trained). An untrained model doesn't know which tokens are easy. You'd need actual training plus a mix of easy-and-hard positions for ACT to help. Neither was present in my experiments.&lt;/p&gt;
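&lt;p&gt;The "effectively zero" part can be made concrete with a back-of-the-envelope calculation. This sketch assumes, purely for illustration, that each position halts independently with the same probability &lt;code&gt;p&lt;/code&gt; at a given loop. Real halting decisions are correlated and &lt;code&gt;p&lt;/code&gt; is a made-up parameter, so treat the output as an order-of-magnitude argument only.&lt;/p&gt;

```python
# Assumption (not from OpenMythos): every position halts independently
# with the same probability p at a given loop iteration.
def all_halted_probability(p, positions=128):
    """Chance that every position has halted, so the early exit can fire."""
    return p ** positions

for p in (0.5, 0.9, 0.99):
    prob = all_halted_probability(p)
    print(f"p = {p:.2f}: P(all 128 halted) = {prob:.3e}")
```

&lt;p&gt;Under that assumption, even a 99% per-position halting rate lets the whole-batch exit fire on only about a quarter of the checks, and at 90% it drops to roughly one in a million.&lt;/p&gt;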
&lt;h3&gt;The Architectural Ceiling I Didn't Expect&lt;/h3&gt;
&lt;p&gt;While trying to run 24 loops for an ablation that's not in this post, I hit this error:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;IndexError&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It turned out the architecture has a small per-loop adaptation component, a tiny learnable "tweak" that's different for each loop iteration, letting loop #1 behave slightly differently from loop #8. That component is implemented as a lookup table with exactly &lt;code&gt;max_loop_iters&lt;/code&gt; entries. If you configured the model to train on 16 loops, you have 16 entries in the table. Trying to run a 17th loop means looking up entry 16 in a 16-entry table, which fails.&lt;/p&gt;
&lt;p&gt;This matters because one of the headline claims about looped transformers is &lt;em&gt;depth extrapolation&lt;/em&gt;: train the model on (say) 5-step reasoning chains, then at inference time let it run 10 or 20 loops to handle harder problems than it ever saw during training. The theoretical argument is that running more loops = more reasoning depth, and this should emerge as a free capability at inference time.&lt;/p&gt;
&lt;p&gt;The OpenMythos implementation supports depth extrapolation only up to &lt;code&gt;max_loop_iters&lt;/code&gt;. Past that, the per-loop adaptation lookup fails. You can extend the table, but only by reinitializing it larger and resuming training. You cannot simply crank a knob at inference time.&lt;/p&gt;
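&lt;p&gt;The failure mode is easy to reproduce in miniature. The sketch below uses a plain Python list as a stand-in for the per-loop table. The names &lt;code&gt;MAX_LOOP_ITERS&lt;/code&gt;, &lt;code&gt;loop_table&lt;/code&gt;, and &lt;code&gt;run_loops&lt;/code&gt; are hypothetical and do not come from the OpenMythos source, which works on PyTorch tensors.&lt;/p&gt;

```python
# Hypothetical stand-in for the per-loop adaptation table: one entry per
# loop iteration the model was configured for. Not the OpenMythos code.
MAX_LOOP_ITERS = 16
loop_table = [0.01 * i for i in range(MAX_LOOP_ITERS)]

def run_loops(n_loops, table):
    """Apply one per-loop tweak per iteration and return the running sum."""
    state = 0.0
    for i in range(n_loops):
        state += table[i]  # raises IndexError once i reaches len(table)
    return state

run_loops(16, loop_table)  # fine: indices 0..15 all exist

try:
    run_loops(24, loop_table)
except IndexError as err:
    print("loop 17 failed:", err)

# Growing the table removes the error, but the new entries are untrained
# placeholders, which is why more training is needed afterwards.
extended = loop_table + [0.0] * (24 - MAX_LOOP_ITERS)
run_loops(24, extended)
```

&lt;p&gt;The extended table runs all 24 loops, but entries 16 through 23 carry no learned information, so the extra loops are not the free inference-time capability the theory describes.&lt;/p&gt;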
&lt;p&gt;That's a genuine constraint on the "more loops = deeper reasoning at inference" story, and it's the kind of thing you find only by trying to cross the boundary. Nothing in the README warns you about it. It's the sort of detail that disappears when a paper's theoretical claim ("more loops at inference!") becomes a concrete implementation ("a table indexed by loop number, which has a fixed size").&lt;/p&gt;
&lt;h3&gt;What I Could Not Verify&lt;/h3&gt;
&lt;p&gt;Nothing I did tells you anything about Claude Mythos.&lt;/p&gt;
&lt;p&gt;The architecture OpenMythos implements could be exactly the Mythos architecture. It could be a reasonable guess that shares some features with Mythos. It could be entirely wrong. You and I have no way to check, because Anthropic has not published the architecture. The &lt;code&gt;mythos_1b&lt;/code&gt; I trained is a 1B-parameter looped transformer that &lt;em&gt;behaves&lt;/em&gt; consistently with published research on looped transformers. It is not Mythos.&lt;/p&gt;
&lt;p&gt;This is the epistemic limit that the repo's disclaimer is trying to name. Running a speculative reconstruction tells you whether the reconstruction is internally coherent, and whether it matches the published research it claims to match. It tells you nothing about whether the reconstruction maps to the thing it was reconstructed from. No amount of running it closes that gap. The gap is closed only by information the model's creators chose not to release, and running silicon against latent belief doesn't produce that information.&lt;/p&gt;
&lt;p&gt;So "I verified that OpenMythos's architecture works as claimed" is a real and useful statement. "I verified that Claude Mythos uses this architecture" is not something I can say, and nobody outside Anthropic can, no matter how thoroughly they run the reconstruction.&lt;/p&gt;
&lt;h3&gt;What an Open Reconstruction Is Good For&lt;/h3&gt;
&lt;p&gt;It's good for three things.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;One, as a teaching artifact.&lt;/strong&gt; There's a live research line on looped transformers. Most descriptions of it are paper-shaped: dense with notation, theorem statements, ablation tables. OpenMythos is one of the few places you can read the whole architecture as working code in a single file, with every piece named and addressable. The stability guarantee that takes several pages of the Parcae paper to motivate resolves to one line of Python:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_A&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_dt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_A&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That's the whole guarantee. You can read it, you can run it, you can verify it on your bench. The paper claim goes from abstract math to a concrete object you can measure. For anyone who wants to understand why looped transformers work and has been bouncing off the academic literature, that's worth the install.&lt;/p&gt;
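&lt;p&gt;The reason one line can be a guarantee: for any real input, the inner &lt;code&gt;exp&lt;/code&gt; is positive, so the outer &lt;code&gt;exp&lt;/code&gt; of its negation lies between 0 and 1. The sketch below checks that numerically with Python's &lt;code&gt;math&lt;/code&gt; module instead of PyTorch. The function &lt;code&gt;decay&lt;/code&gt; is my scalar restatement of the expression above, not code from the repository, and at the far end of the clamp floating-point arithmetic rounds the result to exactly 0.0.&lt;/p&gt;

```python
import math

def decay(log_dt, log_a):
    """Scalar version of exp(-exp(clamp(log_dt + log_a, -20, 20)))."""
    x = max(-20.0, min(20.0, log_dt + log_a))
    return math.exp(-math.exp(x))

# Sweep the combined exponent well past both clamp limits.
values = [decay(x, 0.0) for x in range(-40, 41)]

print("smallest:", min(values))
print("largest: ", max(values))
print("all within [0, 1]:", all(1.0 >= v >= 0.0 for v in values))
```

&lt;p&gt;If this value is what scales the hidden state from one loop to the next, as the name suggests, a magnitude of at most 1 means repeated looping cannot amplify the state through this term, no matter what the learned parameters are.&lt;/p&gt;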
&lt;p&gt;&lt;strong&gt;Two, as a testbed for your own ideas.&lt;/strong&gt; If you want to try modifying the architecture (swap which attention variant it uses, change how the experts are routed, make the loops behave differently at different depths) the code is about 1000 lines of clean PyTorch. You don't need to build a looped transformer from scratch; you can start from a working baseline and modify. The research is live and you can participate in it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Three, as a way to calibrate your expectations.&lt;/strong&gt; My 1B-parameter &lt;code&gt;mythos_1b&lt;/code&gt; processes about 62 tokens per second at 16 loops on the 8060S. A full-scale Mythos would be far larger and would presumably run a similar or higher loop count per token. If Mythos is actually a Recurrent-Depth Transformer, that tells you something about the real cost of running it: every token pays for the full loop count, regardless of how "easy" it is. That's a different cost shape from a standard transformer, which also spends a fixed amount of compute per token but spreads it over a fixed stack of distinct layers rather than repeated passes through the same block. You can form a rough sense of the compute-per-token ratio that a looped architecture would imply for a frontier deployment, which is useful even if you never run a frontier model yourself.&lt;/p&gt;
&lt;p&gt;None of those three things are "I now know how Claude Mythos works." They are all "I now know things about looped transformers that I did not know before." For the blogger running an 8060S in a home lab, that's the realistic upside, and it's a larger upside than zero.&lt;/p&gt;
&lt;h3&gt;Coda: The Reconstruction as Thing&lt;/h3&gt;
&lt;p&gt;I wrote a &lt;a href="https://tinycomputers.io/posts/the-thing-and-the-endpoint.html"&gt;philosophy piece earlier this week&lt;/a&gt; about Heidegger's distinction between things and endpoints. A Z80 on a RetroShield is a thing: it gathers a world of silicon, engineers, software history, and your own hands. A cloud API is an endpoint: it offers a contract and deliberately hides everything behind it.&lt;/p&gt;
&lt;p&gt;Claude Mythos is an endpoint. You send tokens, you get tokens, the weights are not yours, the architecture is not yours, and if Anthropic swaps the backing model nothing changes for the caller by design. That's the whole value proposition. It refuses to gather.&lt;/p&gt;
&lt;p&gt;OpenMythos is a thing. I have its weights. I know the parameter count down to the last entry: 1,064,028,034. I measured its internal stability matrix at initialization and watched it move across training. I watched the hidden state blow up to 23,000 when I forced the instability and disabled the halting mechanism. I know how long a forward pass takes on my specific GPU with my specific ROCm wheel on a specific Tuesday in April 2026. It gathers a whole lineage: the 2026 Parcae paper that explained the stability trick, the older research on looped transformers that it built on, Kye Gomez's speculative synthesis of the two into a candidate architecture for Mythos, the AMD gfx1151 toolchain that lets me run any of this on a Ryzen APU at all, the one-line patch I had to apply to the code, my own debugging session, and thirty minutes of my GPU's fans running at full tilt.&lt;/p&gt;
&lt;p&gt;The thing gathers. What it gathers, though, is the &lt;em&gt;reconstruction&lt;/em&gt;. Not the reconstructed. My 1B model is a physical artifact with measurable behavior. It is not a window onto Anthropic's internals. Those internals remain an endpoint, and the endpoint remains abstract.&lt;/p&gt;
&lt;p&gt;Mythology intact. Architecture verified. That is what a home lab buys you in 2026.&lt;/p&gt;</description><category>ai</category><category>claude</category><category>looped transformer</category><category>mythos</category><category>openmythos</category><category>philosophy</category><category>reconstruction</category><category>recurrent depth transformer</category><category>rocm</category><category>strix halo</category><guid>https://tinycomputers.io/posts/architecture-verified-mythology-intact.html</guid><pubDate>Mon, 20 Apr 2026 13:00:00 GMT</pubDate></item><item><title>What Routing 314 Nets Taught Me About AI-Assisted PCB Design</title><link>https://tinycomputers.io/posts/what-routing-314-nets-taught-me-about-ai-assisted-pcb-design.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/what-routing-314-nets-taught-me-about-ai-assisted-pcb-design_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;43 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;div class="sponsor-widget"&gt;
&lt;div class="sponsor-widget-header"&gt;&lt;a href="https://baud.rs/youwpy"&gt;&lt;img src="https://tinycomputers.io/images/pcbway-logo.png" alt="PCBWay" style="height: 22px; vertical-align: middle; margin-right: 8px;"&gt;&lt;/a&gt; Sponsored Hardware&lt;/div&gt;
&lt;p&gt;The GigaShield boards discussed in this post were fabricated by &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;. Their DFM review caught the clearance issue that forms most of this story's middle act — which, as it turns out, is exactly the right kind of thing to catch before the copper is cut. PCBWay offers PCB prototyping, assembly, CNC machining, and 3D printing services with turnaround times starting at 24 hours. &lt;a href="https://baud.rs/youwpy"&gt;pcbway.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;This is the fourth post in a running series about designing a level-shifter shield for the &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1&lt;/a&gt; using &lt;a href="https://baud.rs/Z6Oq4k"&gt;Claude Code&lt;/a&gt; and open-source command-line EDA tools. A brief map of what's come before, since the series grew past its original planned scope:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Fiverr PCB Design (\$468)&lt;/a&gt; — the first GigaShield, designed by a freelance contractor in KiCad. It worked for most things but broke against the Z80's tri-state bus because the auto-sensing level shifters couldn't cope with floating signals.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;Redesigning a PCB with Claude Code and Open-Source EDA Tools (Part 1)&lt;/a&gt; — the v0.2 redesign. Replaced the TXB0108 auto-sensing shifters with SN74LVC8T245 driven shifters, generated the PCB programmatically from a Python script, autorouted, and shipped Gerbers to fabrication. (The "Part 1" in its title was meant to lead directly into a Part 2 about assembly and bring-up — but a bug derailed that plan.)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/how-a-pin-numbering-bug-killed-a-pcb.html"&gt;How a Pin Numbering Bug Killed a PCB&lt;/a&gt; — the unplanned post-mortem on why the v0.3 board didn't work. A pin-numbering convention mismatch on the dual-row headers put every signal at the wrong physical position. The fabricated board was perfect; the design was wrong.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post is about the v0.4 respin. It is also an honest accounting of what AI-assisted PCB design actually looks like in practice: the places where the workflow is miraculous, the places where it is agonizing, and the hour-long arguments between Freerouting and pcb-rnd about whether two pieces of copper were three tenths of a millimeter apart.&lt;/p&gt;
&lt;p&gt;If you came here looking for a triumphant "I built a PCB with AI and it just worked" story, this is not that post. If you came here looking for a nuanced report from someone who spent days on this and now has calibrated opinions about where it's worth doing, welcome.&lt;/p&gt;
&lt;h3&gt;The Setup, Briefly&lt;/h3&gt;
&lt;p&gt;For readers who haven't read the earlier posts: the GigaShield is a 155mm x 90mm PCB that sits between an Arduino Giga R1 (3.3V logic) and a RetroShield Z80 (5V logic). It has ten SN74LVC8T245PW level-shifter ICs translating 72 channels between the two voltage domains. The Giga plugs into the bottom (3.3V headers), the RetroShield plugs into the top (5V headers), and the shifters bridge the two. The first design used TXB0108 level-shifter ICs, but their auto-sensing direction control did not play nicely with the Z80's tri-state signals.&lt;/p&gt;
&lt;p&gt;The entire PCB is generated by a Python script. Not the schematic — there is no schematic. Not the layout — there is no graphical layout. The script emits a &lt;a href="https://baud.rs/1J64T5"&gt;pcb-rnd&lt;/a&gt; board file directly: component placements, pad definitions, netlist, board outline, silkscreen text, all in text format. Running &lt;code&gt;python3 build_giga_shield.py&lt;/code&gt; produces &lt;code&gt;giga_shield.pcb&lt;/code&gt; in a fraction of a second. The board is then routed by &lt;a href="https://baud.rs/freer"&gt;Freerouting&lt;/a&gt; and exported to Gerbers via pcb-rnd's command-line tools.&lt;/p&gt;
&lt;p&gt;Everything happens in the terminal. No GUIs, no mouse clicks, no "did I save the layout?" anxiety.&lt;/p&gt;
&lt;p&gt;The v0.3 board — subject of the previous post — failed because of a subtle bug in this pipeline. The v0.4 board fixes that bug and several others I didn't know about. This post covers what it took to find and fix them.&lt;/p&gt;
&lt;h3&gt;Bug Class #1: The Kind AI Caught Easily&lt;/h3&gt;
&lt;p&gt;Before refabricating, I asked Claude Code to audit the Python generator for any additional bugs. I framed it as a careful code review with specific checkpoints: pin numbering in both the pcb-rnd and KiCad generators, SN74LVC8T245 pin mapping against the TI datasheet, critical Z80 signal routing, and component placement validation.&lt;/p&gt;
&lt;p&gt;The audit took maybe fifteen minutes of wall-clock time — during which Claude Code read the entire codebase, cross-referenced it against the datasheet, and produced a severity-graded report. Most findings were cosmetic (stale comments, a print statement that said "1x10 header" for an 11-pin connector, harmless inconsistencies between the two build scripts). One finding was genuinely important:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;H1 — Comment "default A→B" is factually wrong, and U10's pulldown produces incorrect default behavior without user intervention.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;DIR=LOW per the SN74LVC8T245 datasheet means B→A (B-side drives A-side). The comment states the opposite. More importantly, U10 uses DIR for CLK, RESET, INT, NMI — signals that must flow A→B (Giga→Z80). With R10 pulling DIR_U10 to GND, U10 powers up in the B→A direction, which means the 5V Z80 side would drive the 3.3V Giga side on these control pins. Backwards and potentially damaging if the Z80 is outputting signals while the Giga is also driving those pins.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This bug is worth dwelling on, because it highlights something important about how I've been working on this project. I didn't write that code. Claude Code did, three weeks earlier, as part of the initial generator script. The wrong comment and the wrong pulldown direction were both introduced by the same LLM that later caught the mistake on audit.&lt;/p&gt;
&lt;p&gt;That sounds like it should be embarrassing for the workflow — the tool that generates bugs is also the tool that reviews them — but I think it's actually the right shape of the argument. A single LLM pass is fallible. A single human pass is also fallible. What matters is whether the combined system catches bugs reliably before they ship. In this case, the first pass (generation) introduced a subtle error, the second pass (audit, explicitly framed as a datasheet-cross-referencing review) caught it. The human's job was to know that the second pass was worth requesting — and to recognize, when the audit report came back, which findings mattered and which were cosmetic.&lt;/p&gt;
&lt;p&gt;This is exactly the kind of bug that a human engineer might catch on a good day and miss on a bad day. The comment and the circuit were internally consistent but both wrong. The datasheet was authoritative but three hundred pages away. Claude Code's audit didn't find it by being clever — it found it by mechanically cross-referencing every pin mapping against the datasheet, every comment against the actual behavior, with the patience of a machine that doesn't get bored.&lt;/p&gt;
&lt;p&gt;The fix was a one-line change: swap R10 from a pulldown-to-GND to a pullup-to-+3V3. U10 now powers up in the correct direction without any firmware intervention.&lt;/p&gt;
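&lt;p&gt;The rule the audit applied can be written down as a small check. This is a sketch, not the project's generator code: the function names and the &lt;code&gt;pull&lt;/code&gt; argument are hypothetical, and the only fact taken from the datasheet is the one quoted in the audit (DIR low means B→A, so DIR high means A→B).&lt;/p&gt;

```python
# Direction rule for the SN74LVC8T245 as quoted in the audit:
# DIR low selects B-to-A, DIR high selects A-to-B.
DIR_TO_DIRECTION = {0: "B_TO_A", 1: "A_TO_B"}

def default_direction(pull):
    """Direction at power-up when DIR is only held by a pull resistor."""
    level = {"pulldown": 0, "pullup": 1}[pull]
    return DIR_TO_DIRECTION[level]

def check_default(refdes, pull, required):
    """Return a human-readable verdict for one shifter's default state."""
    actual = default_direction(pull)
    verdict = "ok" if actual == required else "WRONG"
    return f"{refdes}: {pull} gives {actual}, needs {required}: {verdict}"

# U10 carries CLK, RESET, INT, NMI, which must flow A-to-B (Giga to Z80).
print(check_default("U10", "pulldown", "A_TO_B"))  # the original bug
print(check_default("U10", "pullup", "A_TO_B"))    # after the R10 fix
```

&lt;p&gt;A table like this, kept next to the netlist definition, turns a datasheet fact into something a script can assert on every regeneration instead of something a comment can get wrong.&lt;/p&gt;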
&lt;p&gt;I'll be transparent: the wrong direction would not have destroyed the boards immediately. The 5V CMOS outputs driving a 3.3V CMOS input is within the chips' absolute maximum ratings for short periods, especially if the Giga's pins are configured as inputs during that window. But in the steady state, with the Arduino trying to drive those pins as outputs, you'd have a lot of current flowing through ESD diodes, probably latch-up, almost certainly failure. A bug that would have cooked several \$70 Arduino Giga R1s over the life of the project.&lt;/p&gt;
&lt;p&gt;Fifteen minutes of LLM review. One avoided burn-up. A comfortable argument for the workflow.&lt;/p&gt;
&lt;h3&gt;Bug Class #2: The Kind AI Didn't Help With At All&lt;/h3&gt;
&lt;p&gt;Then came the Freerouting clearance saga.&lt;/p&gt;
&lt;p&gt;After the v0.4 build script was fixed, regenerated, and re-routed with Freerouting at the familiar 0.254mm trace width and 0.254mm clearance rule, I packaged up the Gerbers and uploaded them to &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;. Their automated DFM check failed almost immediately with a message I'd never seen before:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Failed reason: The spacing between copper traces and pads should be larger than 0.1mm.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A few hours later, a QA engineer at PCBWay followed up with a screenshot from their internal Gerber inspection tool. They had loaded the top-copper layer, zoomed into the crowded region around the level-shifter ICs, and drawn yellow arrows at eight or ten spots where traces were running uncomfortably close to pads. At one spot they'd highlighted in cyan, their measurement tool showed &lt;code&gt;D=0&lt;/code&gt; — literal zero distance between a trace and a pad.&lt;/p&gt;
&lt;div style="text-align: center; margin: 30px 0;"&gt;
&lt;img src="https://tinycomputers.io/images/giga-shield/pcbway-dfm-violations.png" alt="PCBWay QA engineer's Gerber inspection screenshot — yellow arrows mark violation spots, cyan measurement shows D=0 between a trace and a pad" style="max-width: 100%; border: 1px solid #ddd; border-radius: 8px;"&gt;
&lt;p style="color: #666; font-size: 12px; margin-top: 10px;"&gt;The screenshot PCBWay's QA engineer sent back. Yellow arrows mark the violations. Cyan highlight shows their measurement tool reporting D=0 — a trace touching a pad that it shouldn't be touching.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;This should not have been possible. Freerouting had been told to maintain 0.254mm clearance between all copper. 0.254mm is 10 mil, which is more than double PCBWay's 0.1mm minimum. If the router respected its rules, PCBWay's automated check should have sailed through.&lt;/p&gt;
&lt;p&gt;But it hadn't.&lt;/p&gt;
&lt;p&gt;I went through the usual debugging motions. Ran pcb-rnd's DRC on the routed board: 121 clearance violations. Opened the board in a Gerber viewer: confirmed the violations were real, not artifacts. Tightened Freerouting's clearance to 0.3mm, narrowed traces to 0.2mm to give more room, added explicit clearance rules for every object-pair type (wire-pin, wire-via, smd-pin, pin-pin, and so on):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rule&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;smd_smd&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;smd_via&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;smd_pin&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pin_pin&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pin_via&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;via_via&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wire_wire&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wire_via&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wire_pin&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clearance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m m-Double"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wire_smd&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Re-ran Freerouting. Ninety minutes of autorouting and optimization later, pcb-rnd's DRC still reported 121 clearance violations. Identical count. As if the clearance setting had been ignored entirely.&lt;/p&gt;
&lt;p&gt;This went on for several iterations. I tried 0.35mm clearance. 0.4mm clearance. I made the trace narrower. I made it wider. I added explicit rules for each layer. The violation count remained 121. Freerouting's optimization phase consistently took ninety minutes and improved the design by "about 52%" each time. The DRC report was unmoved.&lt;/p&gt;
&lt;p&gt;Claude Code was with me through all of this. It read the Freerouting log. It parsed the DRC output. It spotted the &lt;code&gt;p_shape is not bounded&lt;/code&gt; warnings (only three occurrences, probably unrelated). It suggested hypotheses — maybe Freerouting measured clearance from via hole centers rather than copper rings, maybe the padstack path shapes were confusing the bounds calculation, maybe pcb-rnd's DRC was using a threshold different from what Freerouting was told. Each hypothesis was plausible. None of them panned out. The violation count stayed at 121.&lt;/p&gt;
&lt;p&gt;The breakthrough came not from any clever insight but from a dumb experiment: run the DRC on the &lt;em&gt;unrouted&lt;/em&gt; board. Zero violations. Run it after Freerouting: 121 violations. Same 121, every time. Re-route with completely different settings: still 121 violations at roughly the same coordinates.&lt;/p&gt;
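&lt;p&gt;The experiment boils down to comparing sets of violation locations between runs. A minimal sketch, assuming the DRC output has already been parsed into (x, y) coordinates in millimeters: the coordinates below are invented for illustration, and the parsing step is left out because it depends on the report format.&lt;/p&gt;

```python
def snap(points, grid_mm=0.05):
    """Round coordinates to a coarse grid so near-identical spots compare equal."""
    return {(round(x / grid_mm), round(y / grid_mm)) for x, y in points}

def shared_fraction(run_a, run_b):
    """Fraction of run_a's violation spots that also appear in run_b."""
    a, b = snap(run_a), snap(run_b)
    return len(a.intersection(b)) / len(a) if a else 0.0

# Invented example data: two routing runs with different clearance settings.
run_030 = [(41.20, 17.85), (41.20, 19.12), (88.43, 52.00)]
run_040 = [(41.21, 17.85), (41.20, 19.11), (88.43, 52.01)]

print(f"shared spots: {shared_fraction(run_030, run_040):.0%}")
```

&lt;p&gt;A shared fraction near 100% across runs with very different router settings suggests the findings come from something those settings never touch, which is a cue to look at the checker instead of the router.&lt;/p&gt;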
&lt;p&gt;The coordinates were the tell. The 121 "violations" weren't scattered — they were clustered at specific, consistent locations. And those locations, when I finally examined them carefully, were pin-to-trace junctions. Where a trace ended at a pad on its own net. Where the copper legitimately overlapped because they were the same electrical node.&lt;/p&gt;
&lt;p&gt;pcb-rnd's DRC was flagging every legitimate trace-pad connection as a "shorted nets: net too close to other net" violation. Not because the nets were shorted, but because pcb-rnd's DRC algorithm didn't properly account for the fact that two pieces of copper on the &lt;em&gt;same&lt;/em&gt; net are supposed to touch. They're supposed to be connected. That's the whole point.&lt;/p&gt;
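&lt;p&gt;Once the cause is known, false positives of this kind can be screened out mechanically. The sketch below assumes each DRC finding has been reduced to a record naming the nets of the two copper objects involved. That record layout and the net names are hypothetical, not pcb-rnd's actual report format.&lt;/p&gt;

```python
from collections import namedtuple

# Hypothetical record: one DRC finding between two copper objects.
Finding = namedtuple("Finding", "x_mm y_mm net_a net_b gap_mm")

def real_violations(findings, min_gap_mm):
    """Keep only findings between different nets that are closer than the rule."""
    return [
        f for f in findings
        if f.net_a != f.net_b and min_gap_mm > f.gap_mm
    ]

findings = [
    Finding(41.20, 17.85, "D0_5V", "D0_5V", 0.0),    # trace ends on its own pad
    Finding(41.20, 19.12, "A3_3V3", "A3_3V3", 0.0),  # same net again
    Finding(88.43, 52.00, "CLK_5V", "GND", 0.08),    # different nets, too close
]

for f in real_violations(findings, min_gap_mm=0.1):
    print(f"{f.net_a} vs {f.net_b} at ({f.x_mm}, {f.y_mm}): {f.gap_mm} mm")
```

&lt;p&gt;Only the different-net finding survives the filter. Zero-gap contact between two objects on the same net is a connection, not a short.&lt;/p&gt;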
&lt;p&gt;All 121 "violations" were false positives. Freerouting had been maintaining 0.3mm clearance the whole time, and pcb-rnd had been misreporting it the whole time. PCBWay's automated DFM, meanwhile, had been reacting to a related effect: spots where same-net connections appeared to violate clearance under their algorithm. Their tool was smart enough to recognize most same-net connections, so it flagged only the ones where the overlap happened to look particularly bad in the Gerber rendering.&lt;/p&gt;
&lt;p&gt;The fix, once I understood this, was to increase clearance (which I'd been doing) until the visual overlap was conservative enough that PCBWay's tool stopped complaining, and to stop trusting pcb-rnd's DRC entirely. The winning configuration was 0.254mm traces with 0.3mm clearance and the explicit per-type clearance rules. PCBWay's DFM passed on that submission.&lt;/p&gt;
&lt;p&gt;Total time spent on this: maybe six hours across two days, counting the autorouting time, the iteration loops, and the eventual diagnosis. Nothing Claude Code did made this faster. Claude Code could parse logs and suggest hypotheses as quickly as I could read them, but it couldn't see what I couldn't see. The problem wasn't in any file — it was in the interaction between three tools' different assumptions about what "clearance" means. That kind of bug lives in the interfaces, not in any single artifact. LLMs are not good at debugging interfaces they can't run.&lt;/p&gt;
&lt;p&gt;I don't think this is a damning critique of AI-assisted workflows. But it is a calibrating one. If you're choosing between "write a Python script and fight Freerouting and pcb-rnd and PCBWay's DFM" versus "click around in KiCad for four hours," the second path has fewer interfaces to break. Graphical tools eat their own complexity internally. The CLI workflow exposes every seam.&lt;/p&gt;
&lt;h3&gt;Bug Class #3: The Kind AI Made Worse&lt;/h3&gt;
&lt;p&gt;At some point during the clearance debugging, I decided to reduce the board from 6 layers to 4 layers. Freerouting had been routing successfully on four, so paying for six was wasteful. I updated the pcb-rnd build script to emit a 4-layer stack, changed the Groups() directive to &lt;code&gt;"1,c:2:3:4,s:5:6:7"&lt;/code&gt;, regenerated, exported DSN, routed, imported, exported Gerbers.&lt;/p&gt;
&lt;p&gt;The resulting Gerber package had five copper layers.&lt;/p&gt;
&lt;p&gt;Not six. Not four. &lt;em&gt;Five.&lt;/em&gt; With traces scattered across them in a way that suggested pcb-rnd, on &lt;code&gt;SaveTo&lt;/code&gt;, had rewritten my Groups() string into something of its own invention (&lt;code&gt;"6:8:1,c:2:3:5:10:11:4,s:9:7"&lt;/code&gt;) that created phantom layers and assigned them roles my build script hadn't intended. The Gerber file named &lt;code&gt;intern.copper.none.12.gbr&lt;/code&gt; had 1,406 traces. This was not a layer I had defined.&lt;/p&gt;
&lt;p&gt;I asked Claude Code to help me understand pcb-rnd's Groups() syntax. It gamely tried to parse the string. It offered three different interpretations, each of which would have produced a different layer stack than what pcb-rnd actually did. It couldn't fix the problem because it couldn't run pcb-rnd and observe the behavior. I couldn't fix it either, for the same reason with more forgivable excuses.&lt;/p&gt;
&lt;p&gt;After maybe two hours of going in circles, I reverted to the original 6-layer Groups() string, re-routed, exported Gerbers (getting six copper Gerbers: four with traces, two with only padstack pads), and deleted the two trace-free ones before zipping. PCBWay doesn't care what my source PCB file says. They fabricate what's in the Gerber bundle. If I submit four copper layers, I get a 4-layer board.&lt;/p&gt;
&lt;p&gt;This is a pattern worth noting. Sometimes the right solution to a tool-chain problem isn't to fix the tool chain. It's to work around it. Claude Code is good at helping you solve problems correctly; it's not as good at helping you recognize that the correct solution is to stop trying to solve the problem. That's a uniquely human skill: knowing when to quit.&lt;/p&gt;
&lt;h3&gt;The Component Placement Lesson&lt;/h3&gt;
&lt;p&gt;One more story worth telling. The v0.3 design had the ten level-shifter ICs arranged as U1-U5 across the top of the board and U6-U10 stacked in a single vertical column on the right side. It looked tidy. It routed successfully on six layers. It also concentrated an enormous amount of signal traffic into a narrow vertical channel between the shifter cluster and the 2x18 headers — so narrow that Freerouting couldn't route all 308 nets on four layers without leaving some unrouted.&lt;/p&gt;
&lt;p&gt;Claude Code flagged this the first time I tried the 4-layer route. Not by proposing a new placement — it didn't. By noting that the unrouted nets all terminated in the same congested region, and asking whether I'd considered spreading U6 through U10 into a staggered two-column layout between J9 and J10. I hadn't. The suggestion was obvious in hindsight, which is the clearest sign that it was useful. A lot of engineering is exactly this: you know the answer once someone mentions it, but no one mentions it and you don't think to ask yourself.&lt;/p&gt;
&lt;p&gt;The new placement — U6, U8, U9 in one column closer to J9, U7 and U10 offset in another column closer to J10 — routed cleanly on four layers with 314/314 nets connected. The LLM didn't solve a geometric problem I couldn't solve. It pointed at a geometric problem I hadn't recognized as a problem. That's a different and narrower contribution than "AI designed my board," but it's also real and repeatable.&lt;/p&gt;
&lt;p&gt;For the curious: my mental model for why the original layout was bad had been "I want them in a tidy column near the connector." The LLM's implicit mental model, derived from having read every PCB design textbook ever digitized, was "signal traffic wants to spread, not concentrate." Both are defensible. The second is more useful.&lt;/p&gt;
&lt;h3&gt;What Actually Works&lt;/h3&gt;
&lt;p&gt;Stepping back from the debugging, here is my honest breakdown of where Claude Code earned its keep on this project versus where it didn't.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Where it was essentially indispensable:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Parsing hierarchical S-expression formats.&lt;/strong&gt; The original KiCad design from &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;the Fiverr engineer&lt;/a&gt; was a &lt;code&gt;.kicad_sch&lt;/code&gt; file with nested sheets, positional net labels, and implicit connections across hierarchical boundaries. Extracting the 72-channel signal mapping from that file would have taken me half a day by hand and maybe an hour of Python if I were patient. Claude Code did it in about twenty minutes of interactive conversation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generating boilerplate code with domain-specific constraints.&lt;/strong&gt; The &lt;code&gt;tssop24_element()&lt;/code&gt; function that produces the SN74LVC8T245PW footprint is three dozen lines of boring arithmetic — pad positions, coordinate transforms, string formatting. I asked for a pcb-rnd Element that matched the SN74LVC8T245PW footprint and got working code on the first try — Claude Code fetched the TI datasheet, read the pcb-rnd format reference, and produced the correct geometry without me having to hand it either document. I would have gotten it wrong on the first try if I'd written it by hand, because I would have flipped the B-side pin order (pins 13-24 run bottom-to-top on this package, which is easy to miss).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cross-referencing mechanical information against specifications.&lt;/strong&gt; The audit I mentioned earlier — reading my entire codebase and checking it against the TI datasheet — would have been tedious and error-prone to do manually. Claude Code did it reliably and caught a bug I would have shipped.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;File format debugging.&lt;/strong&gt; pcb-rnd's error messages are unhelpful by the standards of modern software. The Specctra DSN format has undocumented quirks. Gerber apertures have version-specific formatting differences. Every time I hit an error I didn't understand, Claude Code could read the file, diff it against a working example, and tell me what was different. This is LLM work at its best — bulk pattern-matching across reference material I don't have memorized.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Bridging the SSH workflow.&lt;/strong&gt; pcb-rnd runs on Linux and I work on a Mac. Half of the pipeline was shell commands to upload files, run pcb-rnd on a remote machine, download results, and iterate. Claude Code managed that SSH shuffle cleanly across hundreds of invocations. Not a hard problem, but a tedious one. Delegating it was valuable.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Where it didn't help:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Diagnosing cross-tool interaction bugs.&lt;/strong&gt; The 121-false-positive clearance saga was a problem that lived between three tools' different mental models. Claude Code could read each tool's output but couldn't run experiments to validate hypotheses. I had to do that myself, slowly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Making placement decisions involving physical intuition.&lt;/strong&gt; Claude Code helped with the U6-U10 restacking only after I'd ruled out other options. It didn't lead the design choice. For pure physical intuition — "this will be too dense," "this trace will pick up noise from this inductor," "this via is a mechanical weakness" — the LLM was helpful as a second opinion but not a primary source.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Knowing when to stop debugging and accept a workaround.&lt;/strong&gt; When pcb-rnd's &lt;code&gt;SaveTo&lt;/code&gt; mangled my layer stack, Claude Code happily kept trying to fix it. It took me longer than it should have to realize the fix was to not use &lt;code&gt;SaveTo&lt;/code&gt;. LLMs are optimized for "helpfully continue the task." Human judgment is needed for "recognize the task is wrong."&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;The Economics&lt;/h3&gt;
&lt;p&gt;It's fair to ask whether any of this is worth it. I'll try to be honest.&lt;/p&gt;
&lt;p&gt;Time invested in the v0.4 respin, from identifying the pin-numbering bug through submitting the second Gerber package:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Designing the fix: 2 hours (including the code audit that caught R10)&lt;/li&gt;
&lt;li&gt;Regenerating and re-routing: ~4 hours of wall-clock time, mostly Freerouting optimization&lt;/li&gt;
&lt;li&gt;The clearance debugging saga: ~6 hours&lt;/li&gt;
&lt;li&gt;The layer-stack misadventure: ~2 hours&lt;/li&gt;
&lt;li&gt;Documentation, git commits, this blog post: ~4 hours&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Total: ~18 hours of my engineering time, plus a lot of Freerouting CPU-hours.&lt;/p&gt;
&lt;p&gt;If I had paid a Fiverr freelancer to redo the design in KiCad from scratch, it would have cost about \$400 and taken about a week of wall-clock time. If I had learned KiCad properly and done the layout myself, it would have taken probably 20 to 30 hours for a 314-net 4-layer board, given I've never used KiCad's layout editor seriously. If I had used a commercial autorouter (Altium's, say), I would have paid thousands of dollars in licensing and probably spent 10 hours on the project.&lt;/p&gt;
&lt;p&gt;The Python-and-Claude-Code workflow cost me 18 hours of engineering time and the frustration of debugging interactions between three different CLI tools. PCBWay sponsored the fabrication, so the direct cash outlay was zero on this project — but that's an artifact of the sponsorship, not the workflow. A non-sponsored hobbyist running the same pipeline would spend roughly \$50 per prototype fabrication run at PCBWay's standard rates, so two runs (v0.3 plus v0.4) would be in the ballpark of \$100 total. The boards will arrive next week.&lt;/p&gt;
&lt;p&gt;Is that a good trade? It depends on what you value. For me, the workflow is repeatable — the next board I design will take a fraction of this effort because the tool chain is now debugged. For someone doing one PCB in their lifetime, this would be a terrible trade. For someone who plans to iterate on a design family over months or years, it's probably a good one. The upfront cost amortizes.&lt;/p&gt;
&lt;p&gt;I don't think the right question is "AI-assisted PCB workflow yes or no." I think it's "what kind of PCB work are you doing, and does your workflow compound?" If your boards are one-offs, use KiCad. If your boards are a family that evolves over time, scripted workflows start paying for themselves around the third iteration. Add AI assistance on top and the break-even point moves earlier.&lt;/p&gt;
&lt;h3&gt;What I'd Do Differently&lt;/h3&gt;
&lt;p&gt;A few hard-won principles from this project, in case anyone tries something similar:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Validate the pipeline on a dead-simple board before using it for a real one.&lt;/strong&gt; If I had generated a toy two-net board first, routed it, exported Gerbers, and submitted the files to PCBWay's online DFM check as a dry run, I would have caught the pin-numbering convention bug without wasting a fabrication run on v0.3. The total time for the dry run would have been maybe two hours. It would have saved a two-week turnaround — and, more importantly, not burned a sponsored fabrication run on a design that was never going to work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Run DRC with multiple tools and compare.&lt;/strong&gt; pcb-rnd's DRC was actively misleading on this board. If I had also run the Gerbers through a third-party tool like &lt;a href="https://gerber-viewer.com/"&gt;Gerber Viewer&lt;/a&gt; or through KiCad's DRC after importing, I'd have noticed the disagreement and dug into it earlier. Never trust a single tool's DRC as authoritative.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Keep PCBWay's DFM as the source of truth.&lt;/strong&gt; Their automated check has seen more boards than any open-source DRC. It's tuned for actual manufacturability. When the open-source tools say "clean" and PCBWay says "fail," believe PCBWay.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Write test fixtures for the scripted generator.&lt;/strong&gt; A few unit tests that verify "net X has pins Y and Z" would have caught the pin-numbering bug in the build step rather than the fabrication step. For a scripted PCB workflow, the Python code needs the same test discipline as any other production code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Budget for the LLM's failure modes.&lt;/strong&gt; The LLM is fast and confident but can spiral into unproductive debugging loops. When a fix doesn't work on the second or third try, that's the signal to stop and think rather than let the LLM keep trying variations. Six hours on the clearance bug should have been two.&lt;/p&gt;
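&lt;p&gt;As a rough sketch of the test-fixture principle above, a footprint check could pin down where specific header pins must sit. Everything here is an assumption for illustration: &lt;code&gt;header_pads&lt;/code&gt; is a hypothetical stand-in for the real generator, and the 2.54mm pitch is assumed rather than taken from the build script.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import unittest

PITCH_MM = 2.54  # assumed 0.1 inch header pitch


def header_pads(ncols, nrows):
    """Hypothetical stand-in for the generator: pin number to (x, y) in mm, zigzag order."""
    pads = {}
    pin = 0
    for row in range(nrows):          # rows on the outside: numbering alternates columns
        for col in range(ncols):
            pin += 1
            pads[pin] = (col * PITCH_MM, row * PITCH_MM)
    return pads


class HeaderNumberingTest(unittest.TestCase):
    def test_pin_2_is_across_from_pin_1(self):
        pads = header_pads(2, 18)
        self.assertEqual(pads[1], (0.0, 0.0))
        self.assertEqual(pads[2], (PITCH_MM, 0.0))

    def test_pin_33_is_left_column_row_16(self):
        pads = header_pads(2, 18)
        self.assertEqual(pads[33], (0.0, 16 * PITCH_MM))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Saved as a test module, this runs with &lt;code&gt;python -m unittest&lt;/code&gt;. The point of the design is that the expected positions are written down independently of the loop that generates them, so a column-first generator fails the second assertion immediately.&lt;/p&gt;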
&lt;h3&gt;The Broader Question&lt;/h3&gt;
&lt;p&gt;There's a cultural current in software circles right now that frames AI coding assistants as either revolutionary or fraudulent. Neither frame captures what I experienced on this project.&lt;/p&gt;
&lt;p&gt;Claude Code didn't replace my expertise. I still had to know what a level shifter is, why the Z80 tri-states its bus during IO cycles, why the annular ring on a via matters for fab yield, when a pull-up is safer than a pull-down. Without that domain knowledge, I couldn't have directed Claude Code at the right problems, and I couldn't have recognized when its suggestions were wrong.&lt;/p&gt;
&lt;p&gt;Claude Code also didn't slow me down. The audit that caught the R10 bug was pure leverage. The file-format debugging was pure leverage. The SSH shuffle was pure leverage. The CLI workflow I'm using would not be tractable without an LLM assistant — too many file formats, too many tools, too much boilerplate. Claude Code didn't enable the workflow, but it made it something I could actually use instead of abandoning it for KiCad's GUI after the first pcb-rnd error message.&lt;/p&gt;
&lt;p&gt;What Claude Code is, for PCB design, is a competent junior collaborator with encyclopedic memory, infinite patience, and no physical intuition. It won't design your board for you. It'll help you design your board, if you know what you're doing. That's a different and less exciting claim than the marketing suggests, but it's also a more durable one.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The v0.4 boards are scheduled to arrive from &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; in a few weeks. If they work — if the Z80 actually responds to its clock, if the data bus reads are clean, if &lt;code&gt;/IORQ&lt;/code&gt; is no longer stuck at ground — I'll write a Part 4 covering the bring-up and the test results. If they don't work, I'll write a Part 4 about whatever new bug I've introduced.&lt;/p&gt;
&lt;p&gt;In the meantime, the Python build script, pcb-rnd source files, Gerber outputs, Arduino test sketch, and every piece of infrastructure discussed in these posts is open source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/ajokela/giga-shield"&gt;giga-shield&lt;/a&gt;&lt;/strong&gt; — Complete design files, build pipeline, and test firmware&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you're interested in scripted PCB design workflows, I'd genuinely like to hear from people who've tried similar approaches — or, more interestingly, tried and given up. The body of public literature on "I attempted this and it didn't work for me" is much smaller than on "I succeeded, here's how," and I think the former is more useful.&lt;/p&gt;
&lt;p&gt;Previous posts in this series: &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;Redesigning with Claude Code (Part 1)&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/how-a-pin-numbering-bug-killed-a-pcb.html"&gt;How a Pin Numbering Bug Killed a PCB&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Fiverr PCB Design (\$468)&lt;/a&gt;&lt;/p&gt;</description><category>ai</category><category>arduino</category><category>arduino giga</category><category>claude code</category><category>freerouting</category><category>hardware</category><category>level shifter</category><category>open-source</category><category>pcb design</category><category>pcb-rnd</category><category>pcbway</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/what-routing-314-nets-taught-me-about-ai-assisted-pcb-design.html</guid><pubDate>Sun, 19 Apr 2026 23:00:00 GMT</pubDate></item><item><title>How a Pin Numbering Bug Killed a PCB</title><link>https://tinycomputers.io/posts/how-a-pin-numbering-bug-killed-a-pcb.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/how-a-pin-numbering-bug-killed-a-pcb_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;30 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;div class="sponsor-widget"&gt;
&lt;div class="sponsor-widget-header"&gt;&lt;a href="https://baud.rs/youwpy"&gt;&lt;img src="https://tinycomputers.io/images/pcbway-logo.png" alt="PCBWay" style="height: 22px; vertical-align: middle; margin-right: 8px;"&gt;&lt;/a&gt; Sponsored Hardware&lt;/div&gt;
&lt;p&gt;The boards in this post were fabricated by &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;, who sponsored the GigaShield v0.3 level converter project. PCBWay offers PCB prototyping, assembly, CNC machining, and 3D printing services with turnaround times starting at 24 hours. Whether you're prototyping a single board or scaling to production, check them out at &lt;a href="https://baud.rs/youwpy"&gt;pcbway.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield/giga-shield-v03-board.jpeg" alt="GigaShield v0.3 PCB — black solder mask, ten SN74LVC8T245PW level shifters, fabricated by PCBWay" style="float: right; max-width: 420px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;The boards arrived from &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; in perfect condition. Black solder mask, clean silkscreen, precise drill hits. The fabrication quality was excellent — 6-layer board, 6/6 mil trace/space, HASL finish, delivered in about two weeks from order to doorstep. PCBWay's online Gerber viewer had flagged one component overlap before manufacturing (a decoupling capacitor crowding a level shifter IC), their team asked about it, and we resolved it in a single email exchange. Everything about the fabrication was smooth.&lt;/p&gt;
&lt;p&gt;Then I plugged in the &lt;a href="https://baud.rs/87wbBL"&gt;RetroShield Z80&lt;/a&gt;, uploaded a test sketch to the &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1&lt;/a&gt;, and nothing worked.&lt;/p&gt;
&lt;p&gt;Not "mostly worked with some issues." Nothing. The Z80 didn't respond to its clock. The data bus read all zeros. One control signal — &lt;code&gt;/IORQ&lt;/code&gt; — was stuck permanently LOW while the others sat HIGH. Sixteen bus cycles of silence where there should have been a Z80 booting up and fetching instructions from address 0x0000.&lt;/p&gt;
&lt;p&gt;The board wasn't defective. PCBWay had fabricated exactly what I asked them to fabricate. The problem was that what I asked them to fabricate was wrong.&lt;/p&gt;
&lt;h3&gt;The GigaShield v0.3&lt;/h3&gt;
&lt;p&gt;For context: the GigaShield is a level-shifter shield that sits between an Arduino Giga R1 (3.3V logic) and a RetroShield Z80 (5V logic). It has ten SN74LVC8T245PW 8-channel bidirectional level translators, translating 72 signal channels between the two voltage domains. The design was &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;covered in detail in Part 1&lt;/a&gt; of this series — the short version is that the entire PCB was generated programmatically from a Python script, exported to &lt;a href="https://baud.rs/1J64T5"&gt;pcb-rnd&lt;/a&gt; format, autorouted with &lt;a href="https://baud.rs/wdr0dP"&gt;Quilter.ai&lt;/a&gt;, and sent to PCBWay as Gerber files.&lt;/p&gt;
&lt;p&gt;The board has twelve connectors. Ten are single-row pin headers (1xN) for the standard Arduino shield headers, the analog breakout, and a direction-control header. Two are dual-row headers (2x18) that carry the high-pin-count digital signals — Arduino digital pins 22 through 53 on one side, level-shifted to 5V on the other.&lt;/p&gt;
&lt;p&gt;Those two dual-row headers are where the bug lives.&lt;/p&gt;
&lt;h3&gt;First Contact&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield/giga-shield-stack.jpeg" alt="Full test stack: Arduino Giga R1 with GigaShield level converter and RetroShield Z80, connected with jumper wires for DIR control" style="float: right; max-width: 420px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;The first test was simple: read the Z80's control lines with no clock running. The SN74LVC8T245 level shifters have explicit direction control (that's why I &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;chose them over the TXB0108&lt;/a&gt;), so with the direction defaulting to B-to-A (5V side drives, Arduino reads), I should see the Z80's active-low control outputs all sitting HIGH — their idle state.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="o"&gt;===&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Control&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;signals&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;===&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;M1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;RD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;WR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;MREQ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;IORQ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;With&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;should&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inactive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Four out of five correct. &lt;code&gt;/IORQ&lt;/code&gt; reading LOW was the first clue that something was wrong, but I initially dismissed it as a possible floating pin or a pull-down issue on the RetroShield side.&lt;/p&gt;
&lt;p&gt;The real alarm came when I tried to boot the Z80. The test sketch drives CLK, releases RESET, and captures the first 16 bus cycles:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;=== Test 5: Z80 boot — first 16 bus cycles ===
  RESET released
  T00: /M1=1 /RD=1 /MREQ=1 DATA=0x00
  T01: /M1=1 /RD=1 /MREQ=1 DATA=0x00
  T02: /M1=1 /RD=1 /MREQ=1 DATA=0x00
  ...
  T15: /M1=1 /RD=1 /MREQ=1 DATA=0x00
  RESET asserted — Z80 halted
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Sixteen clock cycles, and the Z80 never responded. No &lt;code&gt;/M1&lt;/code&gt; going low for an opcode fetch. No &lt;code&gt;/MREQ&lt;/code&gt; for a memory request. No &lt;code&gt;/RD&lt;/code&gt; for a read cycle. The Z80 was either not getting a clock signal, or not getting a valid RESET sequence, or both.&lt;/p&gt;
&lt;p&gt;I checked the obvious things first. Is the 3.3V rail powered? Yes. Is the 5V rail powered? Yes. Is the direction control for U10 (the control-output shifter carrying CLK) set correctly? Yes — J11 pin 10 tied to 3.3V, which sets DIR HIGH for A-to-B (Giga drives Z80). Is the RetroShield seated properly? Yes.&lt;/p&gt;
&lt;h3&gt;Looking at the Schematic&lt;/h3&gt;
&lt;p&gt;With solder bridges ruled out and the electrical fundamentals verified, I went back to the Python build script — the single source of truth for the entire PCB design.&lt;/p&gt;
&lt;p&gt;The board's ten single-row headers all worked. The level shifters were passing signals correctly (the control input test proved that U9 was translating). The problem was specific to the Z80 not receiving CLK and RESET, and &lt;code&gt;/IORQ&lt;/code&gt; being stuck at ground.&lt;/p&gt;
&lt;p&gt;All three of those signals route through the 2x18 dual-row headers: J9 (3.3V side) and J10 (5V side). I started tracing nets.&lt;/p&gt;
&lt;p&gt;The build script generates pin headers with a function called &lt;code&gt;pin_header_element&lt;/code&gt;. For a 2x18 header, it iterates:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ncols&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;      &lt;span class="c1"&gt;# 0, then 1&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;   &lt;span class="c1"&gt;# 0 through 17&lt;/span&gt;
        &lt;span class="n"&gt;pin_num&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produces &lt;strong&gt;column-first&lt;/strong&gt; numbering: pins 1–18 run down column 0, pins 19–36 run down column 1.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Col 0         Col 1
Pin 1         Pin 19
Pin 2         Pin 20
Pin 3         Pin 21
...           ...
Pin 18        Pin 36
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;But the net arrays that assign signals to pin numbers were written in &lt;strong&gt;zigzag&lt;/strong&gt; order — the standard convention for dual-row pin headers, where pin numbers alternate between columns:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Col 0         Col 1
Pin 1         Pin 2
Pin 3         Pin 4
Pin 5         Pin 6
...           ...
Pin 35        Pin 36
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The J9 net array:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;j9&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'+5V'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'+5V'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s1"&gt;'D22'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'D23'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'D24'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'D25'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
      &lt;span class="s1"&gt;'D52'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'D53'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s1"&gt;'GND'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'GND'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This assumes zigzag: pin 3 = D22 at (col 0, row 1), pin 4 = D23 at (col 1, row 1). But the footprint generator puts pin 3 at (col 0, row 2) and pin 4 at (col 0, row 3): same column, one row apart, instead of across from each other.&lt;/p&gt;
&lt;p&gt;Every signal on J9 and J10 was at the wrong physical position.&lt;/p&gt;
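&lt;p&gt;The two numbering schemes are easy to compare side by side in a few lines of Python. This is a minimal sketch, not code from the build script: the function names &lt;code&gt;zigzag_pos&lt;/code&gt; and &lt;code&gt;column_first_pos&lt;/code&gt; are hypothetical, and both return a zero-based (col, row) pair for a 2x18 header.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;ROWS = 18  # rows in a 2x18 header


def zigzag_pos(pin):
    """Zero-based (col, row) when pin numbers alternate between the two columns."""
    index = pin - 1
    return (index % 2, index // 2)


def column_first_pos(pin, rows=ROWS):
    """Zero-based (col, row) when pins count down column 0, then down column 1."""
    index = pin - 1
    return (index // rows, index % rows)


# Print pin number, zigzag position, column-first position.
for pin in (1, 3, 4):
    print(pin, zigzag_pos(pin), column_first_pos(pin))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Pin 1 agrees under both schemes, while pins 3 and 4 come out at the positions quoted above.&lt;/p&gt;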
&lt;h3&gt;The Geometry of the Bug&lt;/h3&gt;
&lt;p&gt;To understand why this specific mismatch is catastrophic, consider what happens to a few critical signals.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;D52 (CLK):&lt;/strong&gt; In the net array, D52 is at index 32 (pin 33). In zigzag, pin 33 is at (col 0, row 16): left column, second-to-last row. In column-first, pin 33 is at (col 1, row 14): right column, two rows higher. The RetroShield's CLK pin physically touches the pad at zigzag position (col 0, row 16), but the GigaShield routed CLK to column-first position (col 1, row 14). Different column, different row. The Z80 never sees a clock.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;D53 (/IORQ):&lt;/strong&gt; Pin 34 in zigzag is at (col 1, row 16). In column-first, pin 34 is at (col 1, row 15) — one row off. But at the zigzag (col 1, row 16) position, the column-first numbering places pin 35, which the net array assigns to &lt;strong&gt;GND&lt;/strong&gt;. The RetroShield's &lt;code&gt;/IORQ&lt;/code&gt; pin is physically sitting on a ground pad. That's why it reads LOW — it's hard-wired to ground through the PCB trace.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;D38 (RESET):&lt;/strong&gt; Pin 19 in zigzag is at (col 0, row 9). In column-first, pin 19 is at (col 1, row 0): the very first row of the second column instead of the middle of the first. RESET goes to a completely unrelated position.&lt;/p&gt;
&lt;p&gt;The pattern holds for every signal on the 36-pin header. The first two pins (+5V, +5V) happen to be at matching positions for both conventions (pin 1 is always col 0 row 0, and pin 2's mismatch doesn't matter since both are power). After that, every signal diverges.&lt;/p&gt;
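&lt;p&gt;The positions quoted above can be recomputed with a few lines of Python. This is a minimal sketch, assuming 1-based pin numbers and an 18-row, 2-column header; the helper names are hypothetical and do not come from the build script.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```python
# Sketch only: helper names are hypothetical, not from the giga-shield repo.
ROWS = 18  # a 2x18 header has 18 rows and 2 columns


def zigzag_position(pin):
    """(col, row) of a 1-based pin under zigzag (row-first) numbering."""
    index = pin - 1
    return (index % 2, index // 2)


def column_first_position(pin, rows=ROWS):
    """(col, row) of a 1-based pin when one full column is numbered first."""
    index = pin - 1
    return (index // rows, index % rows)


def column_first_pin_at(col, row, rows=ROWS):
    """Which column-first pin number sits at a given physical position."""
    return col * rows + row + 1


for name, pin in [("D52", 33), ("D53", 34), ("D38", 19)]:
    expected = zigzag_position(pin)            # where the RetroShield pin lands
    routed = column_first_position(pin)        # where the trace actually goes
    lands_on = column_first_pin_at(*expected)  # net wired to the touched pad
    print(name, "pin", pin, "expected", expected, "routed", routed,
          "touches column-first pin", lands_on)
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The last column of the output is the useful one for debugging: it names the column-first pin, and therefore the net, that each RetroShield pin is physically sitting on.&lt;/p&gt;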
&lt;h3&gt;Why Nothing Caught It&lt;/h3&gt;
&lt;p&gt;This bug is invisible to every standard verification step in the PCB pipeline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DRC (Design Rule Check):&lt;/strong&gt; All traces meet clearance and width rules. The traces connect the correct logical pin numbers — the netlist is internally consistent. DRC validates geometry, not pin-numbering conventions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Visual inspection:&lt;/strong&gt; The board renders look correct. Traces run from shifter pads to header pads in clean, routed paths. You can't tell from a PNG rendering that a header pad at row 14 should be at row 16. The footprint silkscreen shows pin 1, and the rest are just a grid of identical-looking through-holes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Quilter.ai:&lt;/strong&gt; The autorouter takes a netlist and routes traces between named pads. It has no concept of "this pad should be at this physical position" — it just connects pad A to pad B using copper. If the pad positions are wrong in the input file, the router dutifully routes to the wrong positions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gerber review:&lt;/strong&gt; PCBWay's Gerber viewer (and any standard Gerber viewer) shows copper layers, drill hits, and silkscreen. It doesn't cross-reference pad positions against any external standard. The Gerbers were valid files describing a valid board — just not the board I intended.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pcb-rnd:&lt;/strong&gt; The PCB editor displays the board as defined. It doesn't know that a 2x18 header should use zigzag numbering. It renders what the file says.&lt;/p&gt;
&lt;p&gt;The bug exists in the gap between two conventions: the net array assumes zigzag ordering (which is the industry standard for dual-row headers and what KiCad uses in its standard footprint library), while the footprint generator implements column-first ordering (a natural but incorrect choice when iterating &lt;code&gt;for col... for row...&lt;/code&gt;). Both halves are internally consistent. The error is in their interaction.&lt;/p&gt;
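&lt;p&gt;A check aimed at that interaction is small. The sketch below is an assumption about how one might be written, not part of the actual pipeline: &lt;code&gt;generate_pads&lt;/code&gt; is a stand-in that reproduces the column-first numbering, and &lt;code&gt;zigzag_reference&lt;/code&gt; and &lt;code&gt;misplaced_pins&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```python
# Sketch only: generate_pads stands in for the real footprint generator,
# whose actual interface is not shown in this post.
def generate_pads(ncols, rows):
    """Buggy column-first numbering: returns {pin: (col, row)}."""
    pads = {}
    pin_num = 0
    for col in range(ncols):
        for row in range(rows):
            pin_num += 1
            pads[pin_num] = (col, row)
    return pads


def zigzag_reference(ncols, rows):
    """Reference numbering: pins count across each row before moving down."""
    return {row * ncols + col + 1: (col, row)
            for row in range(rows)
            for col in range(ncols)}


def misplaced_pins(pads, ncols, rows):
    """Pins whose generated position differs from the zigzag reference."""
    reference = zigzag_reference(ncols, rows)
    return sorted(pin for pin, pos in pads.items() if reference[pin] != pos)


bad = misplaced_pins(generate_pads(2, 18), 2, 18)
print(len(bad), "of 36 pins are misplaced; first few:", bad[:5])
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Run before routing, a non-empty result from a check like this would stop the build while the fix still costs nothing.&lt;/p&gt;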
&lt;h3&gt;The One-Line Fix&lt;/h3&gt;
&lt;p&gt;The fix is almost comically small relative to the damage:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Before (column-first — WRONG for standard dual-row headers):&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ncols&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;pin_num&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="c1"&gt;# After (zigzag — correct):&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ncols&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;pin_num&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Swap the loop order. That's it. Row-first iteration produces zigzag numbering: pin 1 at (col 0, row 0), pin 2 at (col 1, row 0), pin 3 at (col 0, row 1), pin 4 at (col 1, row 1), and so on. This matches the KiCad convention, the IPC convention, and what every dual-row connector in the world expects.&lt;/p&gt;
&lt;p&gt;Single-row headers (J1 through J8, J11) are unaffected because column-first and zigzag are identical when there's only one column. Only J9 and J10 — the two 2x18 headers — had the wrong pinout.&lt;/p&gt;
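&lt;p&gt;The single-column claim is easy to confirm. The sketch below numbers pins with both loop orders and compares the results; &lt;code&gt;number_pins&lt;/code&gt; is a hypothetical helper, and the 1x8 size is just an illustrative single-row header, not a dimension taken from the design files.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```python
def number_pins(ncols, rows, row_first):
    """Return [(pin, col, row), ...] for either loop order shown above."""
    if row_first:
        # Zigzag: walk across each row before moving to the next row.
        positions = [(col, row) for row in range(rows) for col in range(ncols)]
    else:
        # Column-first: walk down one column before starting the next.
        positions = [(col, row) for col in range(ncols) for row in range(rows)]
    return [(pin, col, row)
            for pin, (col, row) in enumerate(positions, start=1)]


single_same = number_pins(1, 8, True) == number_pins(1, 8, False)
dual_same = number_pins(2, 18, True) == number_pins(2, 18, False)
print("1x8 identical:", single_same)
print("2x18 identical:", dual_same)
```
&lt;/pre&gt;&lt;/div&gt;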
&lt;h3&gt;Can Software Fix It?&lt;/h3&gt;
&lt;p&gt;My first instinct was to work around the bug in firmware — remap which Arduino GPIO pins the sketch uses so that signals arrive at the correct physical positions despite the wrong traces. If the board routes D36 to where D52 should be, just use D36 for CLK in the sketch.&lt;/p&gt;
&lt;p&gt;It doesn't work, for two reasons.&lt;/p&gt;
&lt;p&gt;First, some signals map to power pins. D53 (&lt;code&gt;/IORQ&lt;/code&gt;) physically sits on a GND pad. You can't drive a signal through a ground trace in software. The pad is connected to the ground plane. It's not a GPIO — it's copper bonded to zero volts.&lt;/p&gt;
&lt;p&gt;Second, the level shifters have shared direction control. Each SN74LVC8T245 has eight channels and one DIR pin. All eight channels shift in the same direction. If you remap CLK (which needs Giga-to-Z80 direction) to go through a shifter that also carries address bus signals (which need Z80-to-Giga direction), you can't set both directions simultaneously. The shared DIR creates an unsolvable constraint when signals that need opposite directions land on the same shifter.&lt;/p&gt;
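&lt;p&gt;The shared-DIR constraint can be stated as a simple check. In the sketch below, the signal names, shifter labels, and groupings are hypothetical and are not taken from the real GigaShield netlist; the only rule encoded is the one described above, that every channel on a shifter must need the same direction.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```python
# Hypothetical data for illustration; not the real GigaShield assignment.
SIGNAL_DIRECTION = {
    "CLK": "giga_to_z80",   # clock is driven by the Giga
    "A0": "z80_to_giga",    # address lines are driven by the Z80
    "A1": "z80_to_giga",
}


def conflicting_shifters(assignment, directions=SIGNAL_DIRECTION):
    """Return {shifter: directions} for shifters that would need both ways.

    assignment maps signal name to shifter label. One DIR pin per shifter
    means exactly one direction is allowed for all of its channels.
    """
    needed = {}
    for signal, shifter in assignment.items():
        needed.setdefault(shifter, set()).add(directions[signal])
    return {shifter: dirs for shifter, dirs in needed.items()
            if len(dirs) != 1}


workable = {"CLK": "U1", "A0": "U2", "A1": "U2"}
remapped = {"CLK": "U2", "A0": "U2", "A1": "U2"}
print("workable conflicts:", conflicting_shifters(workable))
print("remapped conflicts:", conflicting_shifters(remapped))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Any firmware remap that produces a non-empty result here cannot be rescued in software, because no single DIR setting satisfies the shifter.&lt;/p&gt;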
&lt;p&gt;The board needs a respin.&lt;/p&gt;
&lt;h3&gt;The Respin&lt;/h3&gt;
&lt;p&gt;With the bug identified and the fix trivial, the path forward is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Fix the loop order in &lt;code&gt;pin_header_element&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Regenerate &lt;code&gt;giga_shield.pcb&lt;/code&gt; from the Python script&lt;/li&gt;
&lt;li&gt;Export to DSN format via pcb-rnd&lt;/li&gt;
&lt;li&gt;Re-route the traces&lt;/li&gt;
&lt;li&gt;Export new Gerbers&lt;/li&gt;
&lt;li&gt;Send to &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; for fabrication&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The entire pipeline — from fix to fabrication-ready Gerbers — takes about twenty minutes. That's the advantage of the text-based, scriptable workflow described in &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;Part 1&lt;/a&gt;. Change one line of Python, re-run the pipeline, get a new board. No GUI interactions, no manual routing, no "did I remember to update the footprint" anxiety.&lt;/p&gt;
&lt;p&gt;The v0.3 board was originally routed on six layers because the autorouter couldn't find paths for every net with the components packed tightly together. For v0.4, we rearranged the level shifter ICs into a staggered two-column layout between the dual-row headers, giving the router more room to work with. The result: all 313 nets routed cleanly on just four layers. We're also using &lt;a href="https://baud.rs/bdZw62"&gt;Freerouting&lt;/a&gt; for the v0.4 routing — we had initially planned to use &lt;a href="https://baud.rs/wdr0dP"&gt;Quilter.ai&lt;/a&gt;, but their recent release introduced some parsing issues that made it unreliable for our KiCad files. Freerouting's v1.9 codepath, while older, has been rock-solid for this board.&lt;/p&gt;
&lt;p&gt;PCBWay's turnaround on prototype boards is fast: I've consistently gotten boards in under two weeks from order placement to delivery, including the &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;original v0.1 boards from the Fiverr design&lt;/a&gt;. For the v0.4 respin, I'm using slightly different specs: 4-layer, 1.6mm FR-4, black solder mask, HASL finish, standard 6/6 mil trace/space. PCBWay's pricing for prototype quantities (5-10 boards) is genuinely hard to beat, and the quality has been consistently good across every order.&lt;/p&gt;
&lt;p&gt;One thing I appreciate about PCBWay's process: the pre-production review. Before they start cutting boards, their engineering team reviews the Gerbers and flags potential issues. They caught the C29/U10 overlap on the v0.3 boards — a decoupling capacitor footprint that crowded a TSSOP-24 IC. We agreed to leave C29 unpopulated (it was one of 29 bypass caps, not critical), and PCBWay proceeded with fabrication. That kind of proactive communication saves real time and money. If I'd caught the pin numbering bug at that stage, the whole issue would have been avoided. But pin numbering convention mismatches aren't the kind of thing that shows up in a Gerber review — the files were technically correct.&lt;/p&gt;
&lt;h3&gt;What PCBWay Offers&lt;/h3&gt;
&lt;p&gt;For readers who haven't used PCBWay before, a brief overview of what they provide beyond basic PCB fabrication:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PCB Prototyping:&lt;/strong&gt; 1 to 8 layers, multiple surface finishes (HASL, ENIG, OSP, immersion silver/tin), controlled impedance, blind/buried vias, flex and rigid-flex boards. Minimum trace/space of 3.5/3.5 mil for standard process. They handle both small prototype runs (5 boards) and production quantities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PCB Assembly (PCBA):&lt;/strong&gt; Full turnkey assembly with component sourcing, SMT and through-hole placement, and testing. For a board like the GigaShield with forty-six SMD components (ten TSSOP-24 ICs, twenty-seven 0603 caps, nine 0603 resistors), assembly service eliminates the most tedious part of the build. TSSOP-24 packages are hand-solderable with a fine-tip iron and flux, but doing ten of them with twenty-four 0.65mm-pitch pins each is several hours of careful work. PCBWay's pick-and-place machines do it in minutes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3D Printing and CNC Machining:&lt;/strong&gt; Useful for enclosures, mounting brackets, and custom mechanical parts. Multiple materials available — PLA, resin, nylon, aluminum, steel. I haven't used these services for this project, but for projects that need a custom case or mounting hardware, having it from the same vendor simplifies ordering.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stencil Service:&lt;/strong&gt; Solder paste stencils for reflow soldering. If you're doing your own assembly with a hot plate or reflow oven, a properly cut stencil makes paste application dramatically faster and more consistent than syringe application.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Design-for-Manufacturing (DFM) Review:&lt;/strong&gt; As mentioned above, PCBWay reviews your files before production and flags potential issues. This caught the C29 overlap on my boards. For someone iterating on a design — especially a design generated programmatically where visual review of the physical layout is less intuitive — this review is valuable.&lt;/p&gt;
&lt;p&gt;The pricing model scales well: prototype quantities are cheap enough to iterate without stress (important when you're, say, debugging a pin numbering convention), and production quantities get volume discounts. The online quoting system gives you a price instantly when you upload Gerbers, so you know the cost before committing.&lt;/p&gt;
&lt;h3&gt;Lessons&lt;/h3&gt;
&lt;p&gt;Every post-mortem needs a "what did we learn" section. Here's mine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Test with hardware before ordering quantity.&lt;/strong&gt; If I'd breadboarded the 2x18 connection with jumper wires before committing to fabrication, I'd have caught the mismatch immediately. The single-row headers all work; I could have validated those and assumed the dual-row headers were fine. Testing the full signal path end-to-end, from Giga GPIO through the level shifter to the RetroShield's Z80, would have caught it in an hour. One of the reasons I did not breadboard the design first is that I was unable to find breadboardable SN74LVC8T245PW level shifters. I have &lt;a href="https://baud.rs/JyytXb"&gt;TXB0104 Bi-Directional Level Shifters&lt;/a&gt; but no driven level shifter breakouts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Convention mismatches are the hardest bugs.&lt;/strong&gt; The code was correct by its own logic. The net arrays were correct by the KiCad convention. The footprint was correct by its own convention. The bug was in the assumption that both sides used the same convention. No single piece of code was wrong — the error was in the interface between two correct pieces. This is the class of bug that code review, static analysis, and automated testing all miss, because each component passes its own tests.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Text-based PCB design cuts both ways.&lt;/strong&gt; The scriptable pipeline that let me generate and route a board in twenty minutes also let me ship a subtle pin-numbering bug to fabrication in twenty minutes. A graphical PCB editor would have forced me to visually place the header footprint and see the pin numbers on screen, which might have triggered a "wait, that doesn't look right" moment. The speed of automation is a liability when the automation is wrong. The counterargument is that graphical editors have their own class of invisible bugs — accidentally moved components, stray traces from mis-clicks, forgotten net connections. Text-based design doesn't eliminate bugs; it changes which bugs are likely.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pin numbering standards exist for a reason.&lt;/strong&gt; The IPC standard for dual-row connector numbering is zigzag. KiCad follows it. Every 2xN header footprint in every major footprint library follows it. When you write your own footprint generator, you need to follow it too. The column-first iteration (&lt;code&gt;for col... for row...&lt;/code&gt;) is a natural coding pattern — it's how you'd iterate a 2D array in most languages. It's also wrong for connector pin numbering. Convention over intuition.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The fabrication was perfect.&lt;/strong&gt; I want to emphasize this because it's easy to conflate "the board doesn't work" with "the board was made badly." &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; manufactured exactly what the Gerber files specified, with excellent quality. Every trace, via, drill hit, and solder mask opening matched the design files. The bug was in my design files, not their manufacturing process. The distinction matters: when a board comes back dead, the first question should be "is my design correct?" not "did the fab house make an error?"&lt;/p&gt;
&lt;h3&gt;The Fix in Context&lt;/h3&gt;
&lt;p&gt;This is the second failure mode for this project, and both have been instructive. The &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;v0.1 board&lt;/a&gt; failed because the TXB0108 auto-sensing level shifters couldn't handle Z80 tri-state bus conditions — a component selection problem. The v0.3 board failed because of a pin numbering convention mismatch in the software that generates the PCB — a toolchain problem. Neither was a manufacturing defect. Both were design errors that passed every automated check and only surfaced when physical hardware was connected.&lt;/p&gt;
&lt;p&gt;The v0.4 respin will fix the pin numbering and go back to &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; for fabrication. The turnaround time from fix to new boards is probably ten days — twenty minutes for the software pipeline, a few days for PCBWay's production, and a few days for shipping. In the meantime, the v0.3 boards are useful as physical references for component placement and as evidence that the level shifters themselves work correctly (the single-row header signals all translate properly through the SN74LVC8T245s).&lt;/p&gt;
&lt;p&gt;The Python build script, pcb-rnd source files, Gerber outputs, and the test sketch are all open source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/pOawfA"&gt;giga-shield&lt;/a&gt;&lt;/strong&gt; — Complete design files, build pipeline, and test firmware&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;Part 2 of this series was supposed to cover assembled boards and Z80 bus captures. It will — just with v0.4 boards instead of v0.3. In the meantime, the v0.4 Gerbers are being generated and will be sent to PCBWay for the respin. The fix is one line. The lesson was worth more.&lt;/p&gt;</description><category>arduino</category><category>arduino giga</category><category>claude code</category><category>debugging</category><category>hardware</category><category>level shifter</category><category>open-source</category><category>pcb design</category><category>pcbway</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/how-a-pin-numbering-bug-killed-a-pcb.html</guid><pubDate>Sat, 18 Apr 2026 15:00:00 GMT</pubDate></item><item><title>The Thing and the Endpoint: Why a Z80 Gathers a World and an API Doesn't</title><link>https://tinycomputers.io/posts/the-thing-and-the-endpoint.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-thing-and-the-endpoint_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;28 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;A Z80 DIP-40 weighs 5.7 grams. Run Zork on it, or run Zork in a browser emulator. The bytes execute the same way. One of these is a thing. The other isn't.&lt;/p&gt;
&lt;p&gt;That distinction has a name. Heidegger called it &lt;em&gt;Das Ding&lt;/em&gt;, the thing. He meant it in a specific sense that has nothing to do with how we normally use the word. A thing, for him, is something that gathers a world. A wine jug gathers earth (the clay, the grape), sky (the rain that watered the vines, the sun that ripened them), the mortals who drink from it and made it, and the occasion of its use. The jug is not a container that happens to have history. The gathering is the jug's being a jug.&lt;/p&gt;
&lt;p&gt;That sounds mystical on first read. On second read it describes something you already know. A Z80 RetroShield running CP/M and Zork at 2 a.m. on a workbench gathers a world in this specific sense. A request to an OpenAI endpoint does not, and cannot, and was deliberately designed not to. This essay is about why that difference matters, and why the people who build home labs and retro computing setups feel it even when they can't name it.&lt;/p&gt;
&lt;h3&gt;What the RetroShield Gathers&lt;/h3&gt;
&lt;p&gt;Start with the chip. The Z80 on my bench was fabricated by Zilog sometime in the late 1990s, which I know because the date code stamped on the plastic reads 9734. The silicon die underneath that plastic implements an instruction set designed in 1975 by Masatoshi Shima, the engineer who had already co-designed the Intel 4004 and 8080, and Federico Faggin, who had defected from Intel in 1974 to found Zilog. The Z80's register set inherits the 8080's. The opcode encoding is backwards-compatible with 8080 binaries. The chip in my hand is a physical artifact of a specific engineering defection.&lt;/p&gt;
&lt;p&gt;The plastic package is a DIP-40. Two rows of twenty pins, 0.6 inches between rows, 0.1 inches pin-to-pin. When you drop it into a machined socket, the pins bind slightly before seating. That's not sloppy tolerance, it's designed in: the socket contacts have to wipe against the pin to break through the oxide layer that forms on the tin plating. Every retro computer from the TRS-80 to the ZX Spectrum to the MSX used this package.&lt;/p&gt;
&lt;p&gt;The RetroShield is an Arduino shield. Erturk Kocalar published the original design on GitLab as open hardware. His version fits one Z80 on a 2x18 Arduino Mega bus header. Mine is a revision that fits two Z80s on the same shield, shared address and data buses, separate control signals on a supplementary header. The Gerbers were exported from pcb-rnd, a fork of the original gEDA PCB program maintained by Tibor Palinkas. The traces were placed by Freerouting, which Alfons Wirtz originally wrote on Oracle's dime, then re-released as MIT-licensed after Oracle lost interest. The board was fabricated by PCBWay in Shenzhen with a four-week turnaround. I soldered the DIP-40 sockets myself and discovered on the first power-up that every bus line was shorted to ground because the ground fill polygon's clearance cutouts hadn't fully encircled the pins in the Gerber export. Part 2 of this series is the story of finding that out.&lt;/p&gt;
&lt;p&gt;When the chip executes, it reads an opcode. 0xDB is &lt;code&gt;IN A, (n)&lt;/code&gt;, read a byte from an I/O port. The Arduino Mega firmware intercepts that read, treats the Z80 as if it were a CPU attached to a port-mapped terminal, and feeds bytes back. The bytes are a Z3 storyfile: Zork I, compiled in 1980 by the Dynamic Modeling Group at MIT's Laboratory for Computer Science using a language called ZIL, originally written in MDL on a PDP-10 and ported into a virtual machine that could run on any 8-bit or 16-bit home computer of the era. Infocom ported it to CP/M. CP/M ran on the Z80. The chain closes on itself.&lt;/p&gt;
&lt;p&gt;Typing "GO NORTH" into the serial terminal produces, after a pause, the text "You are in an open field west of a big white house, with a boarded front door." That pause is not latency in any network sense. It is the Z80, at 4 MHz, running the Z-machine interpreter through thirty or so thousand clock cycles, each of which is a real transition of real silicon on real power.&lt;/p&gt;
&lt;p&gt;That is the gathering. It is not decoration. Zilog is present. MIT is present. Kocalar is present. PCBWay is present. pcb-rnd is present. The engineers who decided in 1975 that the return instruction should be one byte are present, because the silicon they designed is still decoding that byte at 4 MHz on my bench. My own ground-fill debugging is present, because the fill polygon is gone from this revision and that absence has a history. The thing gathers.&lt;/p&gt;
&lt;h3&gt;What the Cloud Endpoint Gathers&lt;/h3&gt;
&lt;p&gt;You can play Zork in a browser. archive.org hosts a Frotz build compiled to WebAssembly. You click a link, a Z3 interpreter materializes in a JavaScript sandbox, a virtual screen renders a virtual terminal, and "GO NORTH" produces "You are in an open field west of a big white house." Bit for bit, the same bytes of output. The game is the same. The Z-machine is the same. The story file is the same.&lt;/p&gt;
&lt;p&gt;But nothing gathers.&lt;/p&gt;
&lt;p&gt;The browser tab is not a thing in Heidegger's sense. It is a runtime. Runtimes are designed to be interchangeable. Run the same Frotz build in Chrome, in Firefox, in Safari. Run it on a phone, on a desktop, on a Chromebook in a school. Each one produces identical output from identical input. That interchangeability is not an accident or a failure. It is the entire engineering accomplishment of the web stack. A Z-machine interpreter that only ran on one specific browser on one specific machine would be a lesser piece of software, not a greater one.&lt;/p&gt;
&lt;p&gt;This is even clearer if the emulator is on a cloud-hosted runtime. You click a link to play-zork.com, it spawns a container in some datacenter, the container runs Frotz, the output streams back to you over HTTPS. Where is Zork running right now? Physically, electrically, in which building? You do not know. You are not meant to know. The service's value proposition depends on you not knowing. If US-East-1 fails over to US-East-2, your session survives with at most a reconnect. If Vercel goes under and the operator moves to Cloudflare Workers, your experience is identical. The gathering is suppressed by design.&lt;/p&gt;
&lt;p&gt;The same is true at a higher level of abstraction. A call to &lt;code&gt;api.openai.com/v1/chat/completions&lt;/code&gt; hits some cluster of H100s somewhere. Maybe in Texas. Maybe in Iowa. Maybe in Norway. The model behind the endpoint has weights, trained on hardware you will never see by engineers you will never meet. Tomorrow OpenAI might swap the backing model. Or add a 429 quota limit. Or migrate the inference stack to Blackwell. Your code does not change. That is the contract. The contract is the endpoint. The thing behind the endpoint is deliberately, structurally, invisible.&lt;/p&gt;
&lt;p&gt;This is not a complaint. The contract is useful. A company running a Rails app wants exactly this: stable interface, invisible infrastructure, someone else's problem. But the cost of that abstraction, the thing you pay with, is the gathering. The endpoint cannot gather a world because the world behind it is required to be interchangeable with any other world that can satisfy the contract.&lt;/p&gt;
&lt;h3&gt;Heidegger's Jug&lt;/h3&gt;
&lt;p&gt;In 1950 Heidegger gave a lecture called &lt;em&gt;Das Ding&lt;/em&gt;. He spent most of it talking about a wine jug. The essay is notoriously hard to read and almost comically literal. He describes the jug's sides, its base, its void. He distinguishes the jug from a cup and from a bottle. He asks what it means for a jug to be a jug.&lt;/p&gt;
&lt;p&gt;His answer is that a jug is not defined by its shape, its material, or its containing function. A jug is defined by what it gathers. When wine is poured from the jug, there gathers in that pouring: the earth (the grape that grew in soil, the clay fired into the vessel), the sky (the rain, the sun), the mortals (the drinker, the potter, the host), and what he calls the divinities (the toast, the libation, the occasion that makes this pouring different from running tap water into a glass). The fourfold, he called it. Earth, sky, mortals, divinities.&lt;/p&gt;
&lt;p&gt;The fourfold is the part of the essay that reads as mystical. Ignore the specific terminology if it grates. The structural claim underneath is simpler: a thing is a thing to the extent that it is a node in a web of presence. The jug is not just a container. The jug is a place where a whole world becomes, briefly and locally, present.&lt;/p&gt;
&lt;p&gt;Heidegger's other example, in a later essay, is the bridge. The old bridge at Heidelberg is a thing in his sense. It gathers the two banks, the river underneath, the road that runs across it, the people who cross. The bridge is what makes those things into a coherent place. The hydroelectric plant on the Rhine, which he treats in &lt;em&gt;The Question Concerning Technology&lt;/em&gt;, is the counter-example: it is not a thing. It is a piece of what he called the standing-reserve, &lt;em&gt;Bestand&lt;/em&gt;. The plant converts the river into potential electrical output, on demand, interchangeable with any other kilowatt on the grid. The plant does not gather. It extracts.&lt;/p&gt;
&lt;p&gt;This is the same distinction that separates the Z80 on my bench from the cloud-hosted Frotz emulator. The Z80 is a bridge. The cloud emulator is a power plant.&lt;/p&gt;
&lt;h3&gt;What's Actually Different&lt;/h3&gt;
&lt;p&gt;The functional output is the same. That is the central puzzle. The bytes of Zork's output are identical. The game is playable in either location. The player's subjective experience of "GO NORTH" producing a description of the open field is the same to within the tolerance of noticing.&lt;/p&gt;
&lt;p&gt;What is different is what each running copy &lt;em&gt;means&lt;/em&gt;, in a sense of meaning that is not about semantics but about presence.&lt;/p&gt;
&lt;p&gt;The Z80 running Zork on my bench means: Zilog's 1975 design decisions, Infocom's 1980 implementation, Kocalar's open hardware, my four-week wait for PCBWay, my ground-fill debugging session, the specific 4 MHz crystal that drives this specific chip tonight. The game is the surface. The gathering is what makes the game &lt;em&gt;this&lt;/em&gt; game and not an abstract instance of gameplay.&lt;/p&gt;
&lt;p&gt;The cloud-hosted Zork means: the game. That's the whole content. The infrastructure is interchangeable by contract, the hardware is invisible by design, the history is irrelevant to the service. You play Zork. That is the product. The product is the endpoint. The endpoint is the product.&lt;/p&gt;
&lt;p&gt;This is why people who run home labs can tell you war stories and people who use APIs cannot. "Remember the fan seizing on the P40 in July." "Remember when the ground fill shorted every bus line." "Remember the first time Forth actually loaded and we watched OK appear on the terminal." These stories are possible because the thing is specific, present, and has its own biography. "Remember that 503 from OpenAI last Tuesday" is not a story. It is a status page entry. The difference is not nostalgia or sentimentality. The difference is that one event happened to a thing and the other event happened to a contract.&lt;/p&gt;
&lt;h3&gt;The Enframing Connection&lt;/h3&gt;
&lt;p&gt;I wrote earlier about &lt;em&gt;Enframing&lt;/em&gt;, Heidegger's term for the mode of revealing that dominates the modern technological era. Enframing, &lt;em&gt;Gestell&lt;/em&gt;, is the stance that frames everything in advance as standing-reserve: resources on call, available on demand, interchangeable for the purpose at hand. The hydroelectric plant enframes the river as kilowatts. The modern timber industry enframes the forest as board-feet. The cloud endpoint enframes computation as a billable unit.&lt;/p&gt;
&lt;p&gt;Enframing is not a villain in Heidegger's telling. It is not a mistake. It is a stance that reveals certain truths about things, specifically their exchangeability as resources, at the cost of concealing other truths, specifically their being as things.&lt;/p&gt;
&lt;p&gt;The cloud endpoint is what Enframing looks like at the level of infrastructure. The GPU cluster is enframed as tokens-per-second, which are enframed as dollars-per-million-tokens, which are enframed as a line item on an invoice. That enframing is what makes the cloud economically tractable. It is also what makes the cloud unable to gather.&lt;/p&gt;
&lt;p&gt;The Z80 on my bench resists Enframing. Not because it's old or small or personal, but because I haven't framed it that way. I haven't asked it to be interchangeable. I haven't said "give me CP/M compute on demand at the lowest price." I have said "here is this specific chip, running this specific program, in this specific session." That's not a resource request. That's a relationship with a thing.&lt;/p&gt;
&lt;p&gt;This essay is not a sequel to &lt;em&gt;Enframing the Code&lt;/em&gt;. It is a companion piece, addressing what Enframing costs. Enframing names the stance. This one names what falls out of view when the stance becomes total.&lt;/p&gt;
&lt;h3&gt;Why People Build&lt;/h3&gt;
&lt;p&gt;The retro computing and home lab communities do something that looks, from an economic standpoint, irrational. They spend four-week lead times and hundreds of dollars to produce hardware that they could replace with a five-minute browser session for free. They run LLMs on Tesla P40s pulled out of eBay auction lots when the equivalent API call would cost fractions of a cent. They solder DIP-40 sockets in their basements when the emulator is a click away.&lt;/p&gt;
&lt;p&gt;You can explain this as nostalgia, and people sometimes do. You can explain it as hobby, and that's also partly right. You can explain it as skill acquisition, which is closer but still not the reason. The economic irrationality goes away the moment you stop assuming that the only value of running Zork is playing Zork.&lt;/p&gt;
&lt;p&gt;People build because the thing gathers. The RetroShield is not just a way to run Zork. It is a way to make Infocom's 1980 engineering present in the room tonight. It is a way to put Faggin's chip design decisions into active service at 4 MHz. It is a way to hold the physical object that descends from Zilog's break with Intel, from MIT's Dynamic Modeling Group, from the whole genealogy of 8-bit personal computing, and to use that object for its intended purpose on a Tuesday evening fifty years after the design was finalized.&lt;/p&gt;
&lt;p&gt;None of that is available through the endpoint. The endpoint is a contract for Zork. It is not a gathering of Zork's world.&lt;/p&gt;
&lt;p&gt;The feeling that people describe when they say "running Zork on a real Z80 feels different" is not aesthetic preference. It is the presence of the gathering. Something is actually there that is not there when you run the emulator in a browser tab, and that something is not information. It is a specific thing's being a thing.&lt;/p&gt;
&lt;h3&gt;What This Predicts&lt;/h3&gt;
&lt;p&gt;A test of the claim: this framework predicts that communities will form around specific hardware and not around cloud providers, and it predicts which specific hardware will gather the most.&lt;/p&gt;
&lt;p&gt;Communities form around the Tesla P40. Around the Raspberry Pi. Around the RetroShield. Around specific FPGA boards like the ULX3S and the Tang Nano 9K. Around the PDP-11 (still). Around the Apple IIe. Around the Amiga. Around AMD's Strix Halo in my own recent posts. The common feature: these are things with specific histories, specific constraints, specific failure modes, specific communities of use.&lt;/p&gt;
&lt;p&gt;Communities do not form around "the API endpoint for a frontier LLM." They do not form around "managed Postgres." They do not form around "us-east-1." There are users of those things, and there are engineers who get very good at using them, but the thing itself is not a gathering point because the thing is structurally interchangeable. You can run managed Postgres on AWS or GCP or Azure. It doesn't matter. That's the value. That's also why no one has a tattoo of managed Postgres.&lt;/p&gt;
&lt;p&gt;Within the cloud, communities do sometimes form, but they form around thing-like artifacts: specific open-source projects like Kubernetes or Postgres itself, specific hardware generations like the original A100 launch or the H200 launch, specific incidents like the us-east-1 outage of December 2021. The gathering happens when the abstraction fails or when a specific thing peeks through.&lt;/p&gt;
&lt;p&gt;This is not a prediction that cloud computing will fail or that people will abandon it. They won't. The endpoint is too useful. The prediction is narrower: the cloud will never gather the way things gather, and people will keep building physical hardware in their basements even when it is economically irrational, because the gathering is not available any other way.&lt;/p&gt;
&lt;h3&gt;The Chip on the Bench&lt;/h3&gt;
&lt;p&gt;I started with the weight of a Z80, 5.7 grams. End there. The chip is still on my bench. It is in a socket. The socket is on a PCB. The PCB is in a Mega header. The Mega is connected to my laptop by USB. The laptop is rendering a serial terminal. The terminal is showing the Zork prompt. The prompt is waiting.&lt;/p&gt;
&lt;p&gt;The physical object in front of me is small. It fits under my thumb. It was designed fifty years ago by an engineer who had just quit Intel. It has been sitting in a drawer for some years. Tonight it is running. Tonight a specific piece of silicon, fabricated in 1997, is decoding instructions written in 1980 by people in Cambridge, Massachusetts, to produce text that was designed to be read by someone sitting at a CRT terminal in a dorm room in 1982. That whole world is present on my bench, gathered by this chip, for as long as I keep the 4 MHz crystal running.&lt;/p&gt;
&lt;p&gt;When I type "GO NORTH" and the text appears, I am not receiving a service. I am participating in a thing that is thinging, in Heidegger's awkward verb form. I am one of the mortals in the fourfold. Faggin is one. Shima is one. The Infocom implementers are some. Kocalar is one. PCBWay's fabrication technicians are some. We are all gathered around the chip for the duration of this session.&lt;/p&gt;
&lt;p&gt;The API endpoint offers me none of this. The API endpoint offers me Zork. That's a different thing entirely, and most of the time it's what I want. But sometimes, on a Tuesday evening in 2026, it isn't, and the reason why has a name.&lt;/p&gt;</description><category>abstraction</category><category>cloud</category><category>das ding</category><category>hardware</category><category>heidegger</category><category>home lab</category><category>philosophy</category><category>retro computing</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/the-thing-and-the-endpoint.html</guid><pubDate>Thu, 16 Apr 2026 13:00:00 GMT</pubDate></item><item><title>Designing a Dual Z80 RetroShield: Ground Planes, Ghost Shorts, and the Fix (Part 2)</title><link>https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-2.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/designing-a-dual-z80-retroshield-part-2_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;29 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;div style="float: right; max-width: 480px; margin: 0 0 1em 1.5em;"&gt;
&lt;img src="https://tinycomputers.io/images/dual-z80/IMG_4436.jpeg" alt="The assembled dual Z80 RetroShield plugged into an Arduino Mega 2560, with colored jumper wires running from the J2 control header to the Mega's free digital pins" style="width: 100%; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;
&lt;em style="display: block; font-size: 0.85em; margin-top: 0.4em;"&gt;The assembled dual Z80 RetroShield on the bench. Two Z80 CPUs socketed, jumper wires from J2 to the Arduino Mega's free pins, ready for testing.&lt;/em&gt;
&lt;/div&gt;

&lt;p&gt;In &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;Part 1&lt;/a&gt;, I designed a dual-Z80 RetroShield PCB entirely from the command line: two Z80 CPUs sharing an address and data bus, with independent control signals on a supplementary header. The Gerber files went to PCBWay. The boards arrived. I soldered everything up, plugged the shield into an &lt;a href="https://baud.rs/CKQf4B"&gt;Arduino Mega&lt;/a&gt;, wired jumpers from the J2 control header to the Mega's free pins, loaded a test sketch, and...&lt;/p&gt;
&lt;p&gt;Nothing. Both Z80s appeared to be alive (the diagnostic showed bus activity after reset), but they couldn't execute a single instruction. The data bus was completely unresponsive. The SMP kernel I'd written—a 52-byte symmetric multiprocessing demo where both CPUs boot the same code, pull tasks from a shared scheduler, and sum arrays in parallel—hit its cycle limit and returned zeroes.&lt;/p&gt;
&lt;p&gt;What followed was a multi-day debugging session that taught me more about PCB design than the entire design phase did. The root cause turned out to be a subtle interaction between pcb-rnd's ground fill polygon and its Gerber exporter. This is the story of finding it.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;The boards came back from PCBWay after the usual four-week turnaround. Clean fabrication, good silkscreen, no obvious defects on visual inspection. I soldered &lt;a href="https://baud.rs/pcKTdF"&gt;DIP-40 sockets&lt;/a&gt; for both &lt;a href="https://baud.rs/CJA3JT"&gt;Z80s&lt;/a&gt;, the 2×18 J1 bus header, the 2×6 J2 control header, and the bus activity LED. The Z80 chips dropped into their sockets with satisfying precision.&lt;/p&gt;
&lt;p&gt;The J2 header needed &lt;a href="https://baud.rs/eiPjaE"&gt;jumper wires&lt;/a&gt; to the Arduino Mega's free pins (D0–D21). I chose a deliberate mapping based on the pin functions: D9 for CPU2's clock (Timer1 OC1A, which can generate a hardware PWM signal for a stable 4 MHz clock), D4 for RESET, D5–D6 for INT/NMI, D7–D8 for MREQ/IORQ, D10–D11 for RD/WR, and D12–D13 for BUSRQ/BUSAK. That's ten signal jumpers, plus +5V and GND: twelve wires in total, one for each pin of the 2×6 header.&lt;/p&gt;
&lt;p&gt;The plan was to run a five-test validation suite: each Z80 solo (write a signature byte to a known address), shared RAM persistence (both CPUs write to different locations, verify both persist), a relay test (CPU1 computes a value, CPU2 picks it up and continues), and a loop counter (DJNZ loop to verify branch instructions work). After that, the SMP kernel.&lt;/p&gt;
&lt;p&gt;None of it worked.&lt;/p&gt;
&lt;video controls style="width: 100%; max-width: 640px; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1.5em 0; margin-bottom: -1em;"&gt;
&lt;source src="https://tinycomputers.io/images/dual-z80/dual-z80-retroshield.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;
&lt;p&gt;&lt;em&gt;Routing traces with Freerouting.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;First Contact: The Diagnostic Sketch&lt;/h3&gt;
&lt;p&gt;I backed off to a simpler diagnostic sketch to test each connection individually. It checked four things: idle state of control pins, bus activity after releasing reset, address bus bit toggling, and the BUSRQ/BUSAK handshake on CPU2.&lt;/p&gt;
&lt;p&gt;The results were a mix of encouraging and confusing:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="o"&gt;===&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Control&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Pin&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Idle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;State&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;===&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FAIL&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;U1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MREQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;D41&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FAIL&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;U1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IORQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;D39&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FAIL&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;U1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RD&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;D53&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FAIL&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;U1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WR&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;D40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;

&lt;span class="o"&gt;===&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;U1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Bus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Activity&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;===&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PASS&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MREQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;went&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PASS&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;went&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fetching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PASS&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Address&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fetch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x0000&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both Z80s were "alive" in the sense that they responded to clock pulses and attempted to fetch from address 0x0000 after reset. But every control signal (MREQ, IORQ, RD, WR) read LOW regardless of what the Z80 was doing. They should have been toggling between HIGH and LOW during bus cycles. LOW all the time meant either the Z80 wasn't actually driving these pins, or something else was pulling them down.&lt;/p&gt;
&lt;p&gt;I filed this under "weird but not fatal" and pushed ahead to the SMP test. That's when things got serious.&lt;/p&gt;
&lt;h3&gt;The Bus Trace That Went Nowhere&lt;/h3&gt;
&lt;p&gt;The SMP kernel loaded into emulated memory at address 0x0000. After releasing reset, the Z80 should have fetched its first instruction (0xDB, the opcode for &lt;code&gt;IN A, (n)&lt;/code&gt;), executed it, and proceeded through the scheduler loop. Instead, a 150-cycle bus trace showed this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Cyc  MREQ IORQ RD WR  Addr    Data  Action
  0  LOW   LOW   L  L  0x0000  0xDB  MEM RD
  1  LOW   LOW   L  L  0x0000  0xDB  MEM RD
  2  LOW   LOW   L  L  0x0000  0xDB  MEM RD
  ...
149  LOW   LOW   L  L  0x0000  0xDB  MEM RD
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every single cycle: same address, same data, same control signals. The Z80 was stuck at its reset vector, endlessly attempting to fetch the first byte and never advancing. MREQ and IORQ were LOW simultaneously on every cycle, which should never happen during normal Z80 operation—they're mutually exclusive signals.&lt;/p&gt;
&lt;p&gt;The Z80 was putting 0x0000 on the address bus (correct for a reset vector fetch), and I was driving 0xDB on the data bus (the correct opcode). But the Z80 wasn't reading it. Or rather, it was reading something else.&lt;/p&gt;
&lt;h3&gt;The Data Bus Loopback Test&lt;/h3&gt;
&lt;p&gt;I added a simple test: with the Z80 held in reset (outputs tri-stated), drive patterns on the data bus and read them back. If the Arduino writes 0xFF to PORTL and reads back 0xFF from PINL, the data bus is clean. If it reads back something else, there's a short or broken trace.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gd"&gt;--- Data Bus Drive Test ---&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt; bit0(D49)  wrote 0x01 read 0x01 OK
&lt;span class="w"&gt; &lt;/span&gt; bit1(D48)  wrote 0x02 read 0x02 OK
&lt;span class="w"&gt; &lt;/span&gt; bit2(D47)  wrote 0x04 read 0x04 OK
&lt;span class="w"&gt; &lt;/span&gt; bit3(D46)  wrote 0x08 read 0x08 OK
&lt;span class="w"&gt; &lt;/span&gt; bit4(D45)  wrote 0x10 read 0x10 OK
&lt;span class="w"&gt; &lt;/span&gt; bit5(D44)  wrote 0x20 read 0x00 FAIL
&lt;span class="w"&gt; &lt;/span&gt; bit6(D43)  wrote 0x40 read 0x00 FAIL
&lt;span class="w"&gt; &lt;/span&gt; bit7(D42)  wrote 0x80 read 0x80 OK
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Bits 5 and 6 were stuck LOW. The Arduino's GPIO pins couldn't drive them HIGH—something on the board was pulling those lines to ground with enough strength to overpower the Mega's output drivers.&lt;/p&gt;
&lt;p&gt;This explained why the Z80 couldn't execute. When I drove 0xDB (&lt;code&gt;IN A, (n)&lt;/code&gt;) on the data bus, the Z80 actually saw 0x9B (bits 5 and 6 forced low), which decodes as &lt;code&gt;SBC A, E&lt;/code&gt;—a completely different instruction. The Z80 was faithfully executing garbage.&lt;/p&gt;
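&lt;p&gt;The corruption is easy to reproduce on paper. The sketch below is illustrative Python, not part of the Arduino test code, and the helper name &lt;code&gt;force_bits_low&lt;/code&gt; is made up for this example. It edits the binary digits directly so the stuck positions stay visible.&lt;/p&gt;

```python
def force_bits_low(value, stuck_bits):
    """Return the byte a receiver sees when the given bit positions are held at 0."""
    digits = list(format(value, "08b"))  # most significant bit first
    for position in stuck_bits:
        digits[7 - position] = "0"       # bit 7 is index 0, bit 0 is index 7
    return int("".join(digits), 2)


driven = 0xDB                            # IN A, (n)
seen = force_bits_low(driven, (5, 6))    # D5 and D6 shorted to ground

print(f"driven 0x{driven:02X} = {driven:08b}")
print(f"seen   0x{seen:02X} = {seen:08b}")
```

&lt;p&gt;Bit 5 of 0xDB is already zero, so only bit 6 actually changes. One flipped bit is enough to turn the opcode into 0x9B.&lt;/p&gt;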
&lt;h3&gt;Isolating the Short&lt;/h3&gt;
&lt;p&gt;Systematic isolation. First question: is it the Arduino or the board?&lt;/p&gt;
&lt;p&gt;I pulled the RetroShield off the Mega and ran the same loopback test with the shield disconnected. Every bit passed perfectly. The Arduino's PORTL pins (D42–D49) could drive any pattern and read it back correctly. The problem was definitively on the board.&lt;/p&gt;
&lt;p&gt;Second question: is it CPU1 or CPU2? The two Z80s share the address and data bus, so a fault on either side would affect both. I disconnected J2's +5V jumper to deprive CPU2 of power, leaving its outputs floating. Same result—bits 5 and 6 still stuck LOW. So CPU2 wasn't the culprit. The short was in CPU1's territory.&lt;/p&gt;
&lt;p&gt;Third question: is it the Z80 chip or the PCB? I pulled U1 from its socket. Bits 5 and 6 still stuck. Pulled U2 as well (since it shares the bus traces even without power). Both chips out, empty sockets, and the shorts persisted.&lt;/p&gt;
&lt;p&gt;Then I ran a comprehensive pin test with both Z80 chips removed—just the bare PCB with sockets:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gd"&gt;--- Data Bus (PORTL) ---&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D0/bit0 (D49/PL0)
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D1/bit1 (D48/PL1)
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D2/bit2 (D47/PL2)
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D3/bit3 (D46/PL3)
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D4/bit4 (D45/PL4)
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D5/bit5 (D44/PL5)
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D6/bit6 (D43/PL6)
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] D7/bit7 (D42/PL7)

&lt;span class="gd"&gt;--- Address Bus ---&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt; [SHORT] A0 through A15 — all 16 lines

&lt;span class="gd"&gt;--- U1 Control Pins ---&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt; [OK] MREQ, IORQ, RD, WR, RESET, INT, NMI, CLK — all 8 fine
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every single data line. Every single address line. All 24 shorted to ground. But all 8 control signals were clean.&lt;/p&gt;
&lt;p&gt;This wasn't random solder bridges. The pattern was too systematic: every line that belonged to the shared bus (connected to both U1 and U2) was shorted, while every line that connected to only one Z80 was fine. Something structural was wrong with the PCB.&lt;/p&gt;
&lt;h3&gt;The Ground Fill&lt;/h3&gt;
&lt;p&gt;I went back to the PCB design files. The original RetroShield design included a copper fill polygon on the bottom layer—a ground plane covering roughly the left half of the board (x = 0.3mm to 55.6mm, y = 0.3mm to 53.1mm). This is standard practice: ground planes reduce noise, improve signal integrity, and provide a low-impedance return path for high-frequency signals.&lt;/p&gt;
&lt;p&gt;The polygon had a &lt;code&gt;clearpoly&lt;/code&gt; flag, which tells pcb-rnd to maintain clearance around pins that aren't connected to the fill. Each Z80 through-hole pin specified 0.762mm clearance. The fill should have maintained that gap around every signal pin, connecting only to GND pins (via thermal relief pads) and leaving all data and address pins isolated.&lt;/p&gt;
&lt;p&gt;I also found some design-level flag errors. U1 pin 1 (A11, an address line) had a &lt;code&gt;thermal(0X)&lt;/code&gt; flag—explicitly telling pcb-rnd to connect this signal pin to a copper fill on the top layer. Several +5V pins had &lt;code&gt;connected&lt;/code&gt; flags. These were wrong, though in pcb-rnd's net-aware polygon system, they turned out to be harmless (a &lt;code&gt;connected&lt;/code&gt; flag only connects a pin to a fill on the same net, so +5V pins wouldn't connect to a GND fill). I fixed them anyway.&lt;/p&gt;
&lt;p&gt;But fixing the flags didn't solve the short. The 24 bus lines were still shorted to ground with both chips removed. The problem was deeper.&lt;/p&gt;
&lt;h3&gt;The Gerber Analysis&lt;/h3&gt;
&lt;p&gt;I dug into the actual Gerber output for the bottom copper layer. In Gerber format, ground fill clearances are typically achieved either through layer polarity commands (&lt;code&gt;%LPC*%&lt;/code&gt; to switch to "clear" mode and punch out holes) or by drawing the fill as a region with the clearance areas built into its outline.&lt;/p&gt;
&lt;p&gt;pcb-rnd uses the region approach. The fill polygon gets exported as a complex region (G36/G37 block) whose boundary weaves around each pin, creating clearance cutouts. Or at least, that's what it's supposed to do.&lt;/p&gt;
&lt;p&gt;I wrote a script to analyze the region vertices near U1's pins. For pin 9 (D5, a signal pin that should have full clearance), this is what the Gerber contained:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# U1-9(D5) SIGNAL at Gerber(120000,120000)&lt;/span&gt;
&lt;span class="c1"&gt;# Region 20: 5 vertices within 2.54mm&lt;/span&gt;
&lt;span class="c1"&gt;#   (120000,116587) dist=0.87mm angle=-90°&lt;/span&gt;
&lt;span class="c1"&gt;#   (121413,128587) dist=2.21mm angle=81°&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Five vertices. Two distinct positions. A proper circular clearance cutout around a through-hole pin needs 8 to 16 vertices distributed at various angles to approximate the circle. This had vertices at just two angles: -90° and 81°. The polygon boundary was making a shallow notch past the pin, not encircling it.&lt;/p&gt;
&lt;p&gt;Even worse: the closest vertex was only 0.87mm from the pin center. The pad edge is at 0.762mm (half of the 1.524mm pad diameter). That left 0.108mm of clearance—about 4.3 mil—on the side where the boundary came closest. Manufacturing tolerance at PCBWay is typically 4-6 mil. The clearance was right at the edge, and on other sides of the pin, there was no clearance at all because the polygon boundary didn't go around.&lt;/p&gt;
&lt;p&gt;For comparison, U1 pin 29 (GND, which &lt;em&gt;should&lt;/em&gt; connect to the fill) had a vertex at exactly 0.00mm distance. The fill went right through it. Correct.&lt;/p&gt;
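&lt;p&gt;The distances and angles quoted above can be recomputed from the raw coordinates. This is a minimal sketch, not the original analysis script. It assumes the Gerber coordinates are in units of 1e-5 inch, which is what the quoted distances imply, and the function names are hypothetical. The "largest angular gap" measure is my own rough heuristic for spotting a cutout that does not encircle its pin.&lt;/p&gt;

```python
import math

# Assumption: one Gerber unit is 1e-5 inch (0.000254 mm), inferred from
# the distances printed in the listing rather than read from the file header.
MM_PER_UNIT = 25.4e-5


def vertex_report(pin, vertices):
    """Return (distance_mm, angle_deg) for each vertex relative to the pin centre."""
    report = []
    for x, y in vertices:
        dx, dy = x - pin[0], y - pin[1]
        distance_mm = math.hypot(dx, dy) * MM_PER_UNIT
        angle_deg = math.degrees(math.atan2(dy, dx))
        report.append((distance_mm, angle_deg))
    return report


def largest_angular_gap(angles_deg):
    """Widest arc around the pin, in degrees, that contains no vertex."""
    ordered = sorted(a % 360 for a in angles_deg)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(360 - ordered[-1] + ordered[0])  # arc that wraps past 360
    return max(gaps)


pin = (120000, 120000)                            # U1-9 (D5)
vertices = [(120000, 116587), (121413, 128587)]   # the two distinct positions

report = vertex_report(pin, vertices)
for distance_mm, angle_deg in report:
    print(f"dist={distance_mm:.2f}mm angle={angle_deg:.0f}")
gap = largest_angular_gap([angle for _, angle in report])
print(f"largest gap: {gap:.0f} degrees")
```

&lt;p&gt;A cutout that really went around the pin would show vertices at many angles and no empty arc anywhere near this wide.&lt;/p&gt;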
&lt;h3&gt;The Root Cause&lt;/h3&gt;
&lt;p&gt;pcb-rnd's Gerber exporter was generating incomplete clearance cutouts in the ground fill polygon around through-hole pins. Instead of tracing a complete circle around each pin (maintaining the specified 0.762mm clearance on all sides), it was generating partial notches that only cleared the pin on one or two sides. On the remaining sides, the ground fill copper made direct contact with the pin pad.&lt;/p&gt;
&lt;p&gt;This affected every through-hole pin inside the polygon's boundary: all of U1's pins, all of J1's pins, and most of the vias. The pattern now made sense. The ground fill polygon covered x = 0.3mm to 55.6mm—the left side of the board. U1 (at x = 30.48mm) was squarely inside. U2 (at x = 66.0mm) was outside. All 24 bus lines pass through U1's footprint. All 8 control lines connect only to one Z80 and route through traces, not through-hole pads, in the fill area.&lt;/p&gt;
&lt;p&gt;The reason the bare-board pin test showed control signals as "OK" while bus lines were "SHORT" was purely geometric: the control signal traces exited the fill area quickly and reached the Arduino pins via the top copper layer, while the bus lines had through-hole pads sitting directly in the fill.&lt;/p&gt;
&lt;p&gt;It's worth noting that this bug is specific to the combination of pcb-rnd's polygon fill, through-hole pins, and Gerber export. SMD pads weren't affected (no SMD components were inside the fill area). The clearance math in pcb-rnd's internal representation appeared correct, but the translation to Gerber region vertices lost fidelity, producing polygon outlines that didn't fully encircle the pins.&lt;/p&gt;
&lt;h3&gt;The Fix&lt;/h3&gt;
&lt;p&gt;I removed the ground fill polygon entirely.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Find and remove the Polygon block in Layer 2 (bottom)&lt;/span&gt;
&lt;span class="c1"&gt;# Strategy: parse through the file, skip everything&lt;/span&gt;
&lt;span class="c1"&gt;# between 'Polygon("clearpoly")' and its closing ')'&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
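&lt;p&gt;A minimal Python sketch of that strategy follows. It assumes the gEDA-style text layout, in which the polygon body is a parenthesised block that opens on the line after &lt;code&gt;Polygon("clearpoly")&lt;/code&gt;. The function name and the sample board fragment are illustrative, not the script or the file actually used.&lt;/p&gt;

```python
def strip_polygons(lines, flag="clearpoly"):
    """Drop every Polygon block carrying `flag`, keeping all other lines."""
    kept = []
    skipping = False
    depth = 0
    opened = False
    for line in lines:
        text = line.strip()
        if not skipping and text.startswith(f'Polygon("{flag}'):
            skipping, depth, opened = True, 0, False
            continue                      # drop the Polygon header itself
        if skipping:
            # Track nesting so inner blocks (such as holes) are skipped too.
            depth += text.count("(") - text.count(")")
            opened = opened or "(" in text
            if opened and depth == 0:     # matching close paren reached
                skipping = False
            continue                      # drop everything inside the block
        kept.append(line)
    return kept


# Illustrative board fragment, not taken from the real file.
sample = """Layer(2 "bottom")
(
\tLine[100 100 200 100 1000 2000 "clearline"]
\tPolygon("clearpoly")
\t(
\t\t[30 30] [5560 30] [5560 5310] [30 5310]
\t)
\tLine[200 100 300 100 1000 2000 "clearline"]
)
""".splitlines()

for line in strip_polygons(sample):
    print(line)
```

&lt;p&gt;Counting parentheses instead of stopping at the first closing one keeps the sketch from ending early if a polygon contains a nested block.&lt;/p&gt;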

&lt;p&gt;The board's GND connectivity doesn't depend on the fill. The autorouter (Freerouting) had already placed explicit copper traces connecting all GND pins. The fill was adding copper density and potentially improving signal integrity, but neither matters for a board running Z80s at 4 MHz. At these speeds, the electrical benefit of a ground plane is negligible, and the manufacturing risk (as we discovered) is real.&lt;/p&gt;
&lt;p&gt;I also cleaned up the erroneous flags while I was in there:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Removed &lt;code&gt;thermal(0X)&lt;/code&gt; from U1 pin 1 (A11 should not connect to any fill)&lt;/li&gt;
&lt;li&gt;Removed &lt;code&gt;connected&lt;/code&gt; from all +5V pins (J1-1, J1-36, U1-11, U1-24, U1-25, U2-11, U2-24, U2-25)&lt;/li&gt;
&lt;li&gt;Removed &lt;code&gt;thermal&lt;/code&gt;/&lt;code&gt;connected&lt;/code&gt; from all 11 vias (signal vias should get clearance, not connection)&lt;/li&gt;
&lt;li&gt;Left thermal flags only on GND pins: J1-18, J1-19, and U1-29&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The bottom copper Gerber went from 46KB (fill + traces) to 16KB (traces only). The region block count dropped from 26 to 5. Regenerated the Excellon drill file, rebuilt the production zip with BOM and centroid, and pushed everything to &lt;a href="https://github.com/ajokela/dual-z80"&gt;the repository&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Was This Always Broken?&lt;/h3&gt;
&lt;p&gt;A natural question: is this a flaw in the original RetroShield Z80 design, or something we introduced? The ground fill polygon, the &lt;code&gt;thermal(0X)&lt;/code&gt; flag on U1 pin 1, and all the &lt;code&gt;connected&lt;/code&gt; flags exist in &lt;a href="https://gitlab.com/8bitforce/retroshield-hw/-/tree/master/hardware/kz80"&gt;Erturk Kocalar's upstream design&lt;/a&gt; — identical to our initial commit. We didn't add any of them. So why does the original RetroShield work?&lt;/p&gt;
&lt;p&gt;To find out, I ran the original, unmodified &lt;code&gt;kz80.pcb&lt;/code&gt; through the same pcb-rnd Gerber exporter and performed the same vertex analysis on the output. The results were revealing:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pin&lt;/th&gt;
&lt;th&gt;Our Board (re-routed)&lt;/th&gt;
&lt;th&gt;Original Board (original traces)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;U1-9 (D5, signal)&lt;/td&gt;
&lt;td&gt;5 vertices, closest 0.87mm from center&lt;/td&gt;
&lt;td&gt;75 vertices, closest 1.14mm from center&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;U1-10 (D6, signal)&lt;/td&gt;
&lt;td&gt;similar&lt;/td&gt;
&lt;td&gt;169 vertices, closest 1.14mm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;U1-29 (GND)&lt;/td&gt;
&lt;td&gt;vertex at 0.00mm (correct)&lt;/td&gt;
&lt;td&gt;vertices at 0.76mm (thermal spokes, correct)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The original board, exported through the &lt;em&gt;same&lt;/em&gt; pcb-rnd Gerber exporter, gets proper circular clearance cutouts with 75+ vertices distributed around each signal pin at a safe 1.14mm from center (0.38mm clearance from pad edge). Our re-routed board gets 5 vertices at only 0.87mm from center (0.11mm clearance, about 4.3 mil, right at the edge of the fab's 4-6 mil manufacturing tolerance).&lt;/p&gt;
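&lt;p&gt;The clearance figures are simple subtraction: closest vertex distance minus pad radius. The short sketch below redoes the arithmetic using only numbers already quoted in this post. The helper name &lt;code&gt;edge_clearance&lt;/code&gt; is invented for the example.&lt;/p&gt;

```python
PAD_RADIUS_MM = 1.524 / 2      # pad diameter from the text, halved
MM_PER_MIL = 0.0254


def edge_clearance(closest_vertex_mm):
    """Gap between the pad edge and the nearest fill vertex, as (mm, mil)."""
    gap_mm = closest_vertex_mm - PAD_RADIUS_MM
    return gap_mm, gap_mm / MM_PER_MIL


for label, closest in (("re-routed", 0.87), ("original", 1.14)):
    gap_mm, gap_mil = edge_clearance(closest)
    print(f"{label}: {gap_mm:.2f}mm ({gap_mil:.1f} mil)")
```

&lt;p&gt;Expressed in mil, the two boards differ by more than a factor of three in margin, even though both came out of the same exporter.&lt;/p&gt;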
&lt;p&gt;The difference is the trace geometry. The original RetroShield's traces were routed with classic gEDA/pcb. When we added the second Z80, we stripped all traces and autorouted from scratch with Freerouting. The new trace layout changed how pcb-rnd's polygon fill algorithm tessellated the clearance boundaries. With different traces running through the fill area, the polygon's outline took different paths around the pins — and those paths didn't maintain adequate clearance.&lt;/p&gt;
&lt;p&gt;So the design flaw (wrong flags, a ground fill with tight clearance margins) was always latent in the original design. But it only manifested as physical shorts when the trace geometry changed. The original routing happened to produce geometry that pcb-rnd's exporter handled gracefully. Our autorouted traces didn't. It's the kind of bug that lies dormant until you touch something seemingly unrelated.&lt;/p&gt;
&lt;p&gt;This is why removing the polygon entirely was the right fix. It doesn't matter how the traces are routed if there's no fill to create clearance problems against. The GND connectivity is fully handled by explicit routed traces.&lt;/p&gt;
&lt;h3&gt;Verification&lt;/h3&gt;
&lt;p&gt;To confirm the fix, I verified the new Gerber output:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No layer polarity commands (no fill, no clearance needed)&lt;/li&gt;
&lt;li&gt;No large region blocks (no polygon fill)&lt;/li&gt;
&lt;li&gt;Only trace geometry and pad flashes remain on the bottom copper layer&lt;/li&gt;
&lt;li&gt;Bottom copper file size reduced by 65% (46KB → 16KB)&lt;/li&gt;
&lt;/ul&gt;
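&lt;p&gt;Checks like these are simple to script. The sketch below counts a few Gerber commands in a layer's text: &lt;code&gt;G36&lt;/code&gt; (region start), &lt;code&gt;%LPC*%&lt;/code&gt; (clear polarity), and &lt;code&gt;D03&lt;/code&gt; (flash). The function name and the sample layer are invented for illustration, and a plain substring count is a rough heuristic, not a Gerber parser.&lt;/p&gt;

```python
def summarize_gerber(text):
    """Count a few commands that indicate fills, cutouts, and pads."""
    return {
        "regions": text.count("G36*"),
        "clear_polarity": text.count("%LPC*%"),
        "flashes": text.count("D03*"),
    }


# Tiny made-up layer: one pad flash, one drawn segment, no regions.
sample = (
    "%FSLAX25Y25*%\n"
    "%MOIN*%\n"
    "%ADD10C,0.06000*%\n"
    "D10*\n"
    "X120000Y120000D03*\n"
    "X120000Y120000D02*\n"
    "X130000Y120000D01*\n"
    "M02*\n"
)

print(summarize_gerber(sample))
```

&lt;p&gt;Running the same count on the layer before and after a change gives a quick numeric comparison, though it is no substitute for opening the file in a Gerber viewer.&lt;/p&gt;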
&lt;p&gt;The new production files have been submitted to PCBWay for a second fabrication run. The fix is structural: without the fill polygon, there's nothing to short to. Every GND connection is an explicit routed trace, visible in the Gerber, and verifiable by inspection.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/dual-z80/kz80_top_hires.png" alt="Top view render of the corrected Rev C dual Z80 RetroShield PCB, showing clean routing without ground fill, both Z80 DIP-40 sockets, J1 bus header, and J2 control header" style="width: 100%; max-width: 800px; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1.5em 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The corrected Rev C board: 553 traces, 25 vias, no ground fill polygon. Clean explicit routing for all 48 nets.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;What I Learned&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Ground fills aren't free.&lt;/strong&gt; They improve signal integrity on high-speed boards, but on a 4 MHz Z80, the benefit is marginal. The cost is an additional failure mode: if the clearance generation is buggy, incomplete, or at the edge of manufacturing tolerance, the fill becomes a liability. For simple retro computing boards, explicit GND traces are more predictable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Autorouting can break things you didn't touch.&lt;/strong&gt; The ground fill polygon wasn't something we modified. But by re-routing the traces (which we had to do after adding the second Z80), we changed the geometry that the polygon fill algorithm used to compute clearances. A latent design flaw became an active one. When you re-route a board with copper fills, you need to re-verify the fills, not just the traces.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Test the bare PCB before populating it.&lt;/strong&gt; If I'd run a continuity test between the Z80 socket pads and GND before soldering anything, I'd have caught this immediately. Instead, I spent time debugging firmware, suspecting timing issues, and questioning my understanding of Z80 bus cycles. The problem was never software.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Gerber is the contract.&lt;/strong&gt; The PCB design tool's internal representation doesn't matter; only the Gerber output does. Even if pcb-rnd's polygon clearance looks correct on screen, the Gerber export is what the fab house uses. Verify the Gerber, not the design file. A Gerber viewer would have shown the incomplete clearances immediately.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Diagnostic sketches are invaluable.&lt;/strong&gt; Writing targeted Arduino sketches that tested individual pins, drove patterns, and reported results over serial turned a "nothing works" situation into a systematic narrowing process. The data bus loopback test (drive a byte, read it back, compare) is trivially simple and would have caught this on day one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI-assisted debugging works the same way AI-assisted design works.&lt;/strong&gt; I brought the domain knowledge (how Z80 bus signals work, what the control signal timing should look like, what "stuck LOW" means electrically) and the AI handled the tedious parts: writing diagnostic firmware, parsing Gerber files, analyzing polygon vertices, checking coordinate math. The same division of labor that made the design possible also made the debugging possible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solder bridges are a red herring when the pattern is systematic.&lt;/strong&gt; Early on, I found and fixed a solder bridge between two adjacent pins. It didn't help. When two pins are bridged, you get two bad signals. When 24 pins are all shorted to the same rail, the cause is structural, not incidental. I should have recognized the pattern sooner and stopped looking at individual joints.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Open-source hardware needs open-source verification.&lt;/strong&gt; The entire reason I could diagnose this was that every file in the chain—PCB source, Gerber output, Excellon drill files—was text-based, parseable, and inspectable. I wrote Python scripts to analyze the Gerber's region vertices and measure distances to pin centers. Try doing that with a proprietary board file. The text-based EDA workflow that made the design possible also made the debugging possible.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The corrected boards are in fabrication. When they arrive, Part 3 will cover what this project was always about: bringing up the SMP kernel, watching two Z80 processors boot the same code, identify themselves, and divide work across a shared memory bus. The kernel is 52 bytes. The scheduler is in the Arduino. The demo sums an array split across both CPUs and measures the speedup.&lt;/p&gt;
&lt;p&gt;Both Z80s are confirmed alive. They just need a board that doesn't short their bus to ground.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The complete source—PCB files, Gerber production package, diagnostic sketches, SMP kernel, and wiring guide—is on &lt;a href="https://github.com/ajokela/dual-z80"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</description><category>arduino</category><category>debugging</category><category>dual cpu</category><category>gerber</category><category>ground plane</category><category>hardware</category><category>pcb design</category><category>pcb fabrication</category><category>pcb-rnd</category><category>retro computing</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-2.html</guid><pubDate>Mon, 06 Apr 2026 22:00:00 GMT</pubDate></item><item><title>Teaching an LLM a Language It Has Never Seen</title><link>https://tinycomputers.io/posts/teaching-llms-languages-theyve-never-seen.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/teaching-llms-languages-theyve-never-seen_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;33 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;Lattice&lt;/a&gt; is a programming language I designed. Its central feature is the phase system: every runtime value carries a mutability tag that transitions between states the way matter moves between liquid and solid. You declare a variable with &lt;code&gt;flux&lt;/code&gt; (mutable) or &lt;code&gt;fix&lt;/code&gt; (immutable). You &lt;code&gt;freeze&lt;/code&gt; a value to make it immutable, &lt;code&gt;thaw&lt;/code&gt; it to get a mutable copy, and &lt;code&gt;sublimate&lt;/code&gt; it to make it permanently frozen. &lt;code&gt;forge&lt;/code&gt; blocks let you build something mutably and have the result exit as immutable. None of this exists in any other language.&lt;/p&gt;
&lt;p&gt;Lattice does not appear in Claude's training data. I designed the language after the knowledge cutoff. There is no Lattice source code on GitHub (other than my own repository). There are no Stack Overflow answers. There is no tutorial ecosystem, no community blog posts, no textbook chapters. The only documentation that exists is the code itself, a 38-chapter handbook I wrote, and three blog posts on this site.&lt;/p&gt;
&lt;p&gt;Claude writes Lattice fluently. It writes correct programs using the phase system, the concurrency primitives, the module system, and the trait/impl pattern. It writes struct definitions with per-field phase annotations. It uses &lt;code&gt;forge&lt;/code&gt; blocks and &lt;code&gt;anneal&lt;/code&gt; expressions correctly. And it wrote a 4,955-line self-hosted compiler in Lattice, for Lattice: a complete tokenizer, parser, and bytecode code generator that reads &lt;code&gt;.lat&lt;/code&gt; source files and emits &lt;code&gt;.latc&lt;/code&gt; bytecode binaries.&lt;/p&gt;
&lt;p&gt;The question is how any of this is possible when the model has never seen the language before.&lt;/p&gt;
&lt;h3&gt;The Rust Smell&lt;/h3&gt;
&lt;p&gt;The answer starts with syntax. Here is a Lattice function:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn&lt;span class="w"&gt; &lt;/span&gt;greet(name:&lt;span class="w"&gt; &lt;/span&gt;String)&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;String&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;return&lt;span class="w"&gt; &lt;/span&gt;"Hello,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;!"
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And here is the Rust equivalent:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kp"&gt;&amp;amp;&lt;/span&gt;&lt;span class="kt"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, {name}!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;fn&lt;/code&gt; keyword, the colon-separated type annotations, the &lt;code&gt;-&amp;gt;&lt;/code&gt; return type, the curly braces: Claude has seen these patterns millions of times in Rust code. When it encounters them in Lattice, it doesn't need to learn a new syntax. It needs to recognize a familiar one.&lt;/p&gt;
&lt;p&gt;This extends deep into the language. Lattice structs look like Rust structs:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;struct Point {
    x: Float,
    y: Float
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Lattice enums look like Rust enums:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;enum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Shape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Circle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Rectangle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Lattice match expressions look like Rust match expressions:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;match shape {
    Shape::Circle(r) =&amp;gt; pi() * r * r,
    Shape::Rectangle(w, h) =&amp;gt; w * h,
    _ =&amp;gt; 0.0
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Lattice traits and impl blocks look like Rust traits and impl blocks:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;trait&lt;span class="w"&gt; &lt;/span&gt;Printable&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;fn&lt;span class="w"&gt; &lt;/span&gt;display(self:&lt;span class="w"&gt; &lt;/span&gt;any)&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;String
}

impl&lt;span class="w"&gt; &lt;/span&gt;Printable&lt;span class="w"&gt; &lt;/span&gt;for&lt;span class="w"&gt; &lt;/span&gt;Point&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;fn&lt;span class="w"&gt; &lt;/span&gt;display(self:&lt;span class="w"&gt; &lt;/span&gt;any)&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;String&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;        &lt;/span&gt;return&lt;span class="w"&gt; &lt;/span&gt;"(&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;)"
&lt;span class="w"&gt;    &lt;/span&gt;}
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Closures use the same &lt;code&gt;|params| body&lt;/code&gt; syntax. The &lt;code&gt;..&lt;/code&gt; range operator works the same way. The &lt;code&gt;?&lt;/code&gt; postfix operator propagates errors. &lt;code&gt;for item in collection&lt;/code&gt; iterates. &lt;code&gt;let&lt;/code&gt; binds variables. The structural similarity is pervasive enough that a model trained on Rust can parse and generate Lattice code without any Lattice-specific training.&lt;/p&gt;
&lt;p&gt;I did not design Lattice to be AI-friendly. I designed it because Rust's syntax is good and I wanted to use it for a language with different semantics. But the side effect is that Claude can write Lattice from day one, because the syntax matches patterns the model has already learned from Rust. The model doesn't need to know it's writing a different language. The code looks like Rust, and the structural patterns transfer.&lt;/p&gt;
&lt;h3&gt;The Phase System: Where Familiarity Ends&lt;/h3&gt;
&lt;p&gt;The Rust resemblance carries Claude through basic Lattice programs without difficulty. Where it gets interesting is the phase system, because this is where Lattice has no analog in any language Claude has seen.&lt;/p&gt;
&lt;p&gt;In Rust, mutability is a static property: &lt;code&gt;let mut x = 5;&lt;/code&gt; or &lt;code&gt;let x = 5;&lt;/code&gt;. You decide at declaration time and the compiler enforces it. In Lattice, mutability is a runtime state that values transition through:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux counter = 0          // mutable
counter = counter + 1     // allowed: counter is fluid

freeze(counter)           // transition: fluid → crystal
counter = counter + 1     // runtime error: counter is crystal

flux copy = thaw(counter) // get a mutable copy
copy = copy + 1           // allowed: copy is fluid
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude handles this correctly. When I describe the phase system and provide examples, Claude generates code that uses &lt;code&gt;flux&lt;/code&gt; and &lt;code&gt;fix&lt;/code&gt; declarations appropriately, calls &lt;code&gt;freeze()&lt;/code&gt; at the right points, and avoids mutating crystal values. The model behaves as if it maps &lt;code&gt;flux&lt;/code&gt; to "mutable variable" and &lt;code&gt;fix&lt;/code&gt; to "immutable variable", and treats the transition functions (&lt;code&gt;freeze&lt;/code&gt;, &lt;code&gt;thaw&lt;/code&gt;) as explicit state changes that it tracks through the program.&lt;/p&gt;
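&lt;p&gt;For readers who think in mainstream languages, the idea of mutability as a runtime tag can be modeled in a few lines of Python. This is a toy model for illustration only, not how the Lattice runtime is implemented; the &lt;code&gt;Binding&lt;/code&gt; class and &lt;code&gt;PhaseError&lt;/code&gt; exception are names I made up for the sketch.&lt;/p&gt;

```python
import copy

class PhaseError(Exception):
    """Raised when a crystal binding is assigned to."""

class Binding:
    """A value plus a runtime phase tag: 'fluid' or 'crystal'."""

    def __init__(self, value, phase="fluid"):
        self.value = value
        self.phase = phase

    def assign(self, new_value):
        if self.phase == "crystal":
            raise PhaseError("cannot assign to a crystal binding")
        self.value = new_value

def freeze(binding):
    """Transition fluid to crystal in place."""
    binding.phase = "crystal"

def thaw(binding):
    """Return a mutable copy; the original stays crystal."""
    return Binding(copy.deepcopy(binding.value), "fluid")

counter = Binding(0)                    # flux counter = 0
counter.assign(counter.value + 1)       # allowed: counter is fluid
freeze(counter)
try:
    counter.assign(counter.value + 1)   # rejected at runtime
    blocked = False
except PhaseError:
    blocked = True
thawed = thaw(counter)                  # flux copy = thaw(counter)
thawed.assign(thawed.value + 1)
print(counter.value, counter.phase, thawed.value, thawed.phase, blocked)
```

&lt;p&gt;The point of the model is that the check happens when the assignment runs, not when the variable is declared, which is the difference from Rust's &lt;code&gt;let&lt;/code&gt; and &lt;code&gt;let mut&lt;/code&gt;.&lt;/p&gt;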
&lt;p&gt;The harder constructs are the ones with no familiar analog.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;forge&lt;/code&gt; blocks are mutable construction zones whose output exits as immutable:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fix config = forge {
    flux c = {}
    c.host = "localhost"
    c.port = 8080
    c.debug = false
    c   // exits the forge block as crystal
}
// config is now crystal; cannot be modified
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude gets this right because the pattern (build something mutably, freeze the result) maps to the builder pattern in Rust and other languages. The syntax is novel but the concept isn't.&lt;/p&gt;
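&lt;p&gt;The "build mutably, exit immutable" idea has a rough Python analog that may help make the mapping concrete. This sketch assumes a read-only &lt;code&gt;MappingProxyType&lt;/code&gt; view is an acceptable stand-in for a crystal value; the &lt;code&gt;forge&lt;/code&gt; helper here is hypothetical and only mimics the shape of the Lattice construct.&lt;/p&gt;

```python
from types import MappingProxyType

def forge(build):
    """Run a builder that uses mutable state, then hand back a read-only result."""
    return MappingProxyType(dict(build()))

def build_config():
    c = {}                    # mutable while inside the builder
    c["host"] = "localhost"
    c["port"] = 8080
    c["debug"] = False
    return c

config = forge(build_config)
try:
    config["port"] = 9090     # the read-only view rejects assignment
    rejected = False
except TypeError:
    rejected = True
print(config["port"], rejected)
```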
&lt;p&gt;&lt;code&gt;anneal&lt;/code&gt; is harder. It temporarily thaws a crystal value into a mutable binding for the duration of a block, then re-freezes it:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fix settings = forge { flux s = {}; s.theme = "dark"; s }

anneal(settings) |s| {
    s.theme = "light"   // temporarily mutable
}
// settings is crystal again, with theme = "light"
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude produces correct &lt;code&gt;anneal&lt;/code&gt; code when given the semantics, but it occasionally generates patterns that would work in Rust (taking a &lt;code&gt;&amp;amp;mut&lt;/code&gt; reference) but don't apply in Lattice (where &lt;code&gt;anneal&lt;/code&gt; is the only way to modify a crystal value in place). The model's Rust intuitions are strong enough to produce Lattice that is syntactically valid but sometimes semantically incorrect, because it falls back on Rust's mutation model when the Lattice-specific construct is unfamiliar.&lt;/p&gt;
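&lt;p&gt;One way to picture &lt;code&gt;anneal&lt;/code&gt; is as a scoped "thaw, mutate, re-freeze" operation. The Python context manager below is a loose analogy under that assumption; the &lt;code&gt;Crystal&lt;/code&gt; wrapper is invented for the sketch and says nothing about how Lattice implements the construct.&lt;/p&gt;

```python
from contextlib import contextmanager
from types import MappingProxyType

class Crystal:
    """Holds a read-only view; only anneal() swaps in a new one."""

    def __init__(self, data):
        self._view = MappingProxyType(dict(data))

    def __getitem__(self, key):
        return self._view[key]

@contextmanager
def anneal(crystal):
    scratch = dict(crystal._view)               # temporary mutable copy
    yield scratch                               # the block mutates the copy
    crystal._view = MappingProxyType(scratch)   # re-freeze on normal exit

settings = Crystal({"theme": "dark"})
with anneal(settings) as s:
    s["theme"] = "light"                        # temporarily mutable
print(settings["theme"])
```

&lt;p&gt;If the block raises, the re-freeze step is skipped and the original contents are kept, which is a design choice of this sketch rather than a statement about Lattice.&lt;/p&gt;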
&lt;p&gt;The reactive phase system is where Claude needs the most guidance. &lt;code&gt;react&lt;/code&gt;, &lt;code&gt;bond&lt;/code&gt;, and &lt;code&gt;seed&lt;/code&gt; have no precedent in any mainstream language:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux&lt;span class="w"&gt; &lt;/span&gt;temperature&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;72.0

react("temperature",&lt;span class="w"&gt; &lt;/span&gt;fn(name,&lt;span class="w"&gt; &lt;/span&gt;old_phase,&lt;span class="w"&gt; &lt;/span&gt;new_phase)&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;print("&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;changed&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;old_phase&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;new_phase&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;")
})

freeze(temperature)&lt;span class="w"&gt;  &lt;/span&gt;//&lt;span class="w"&gt; &lt;/span&gt;triggers&lt;span class="w"&gt; &lt;/span&gt;the&lt;span class="w"&gt; &lt;/span&gt;reaction&lt;span class="w"&gt; &lt;/span&gt;callback
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux primary = "active"
flux mirror = "active"

bond("mirror", "primary", "sync")  // when primary changes phase, mirror follows

freeze(primary)  // mirror also freezes
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude can produce these patterns when given the API, but it doesn't intuit them. It never suggests &lt;code&gt;react&lt;/code&gt; or &lt;code&gt;bond&lt;/code&gt; unprompted, because there's nothing in its training data that would trigger the association. These constructs must be taught explicitly. The Rust smell gets Claude through 80% of Lattice. The last 20% requires actual specification.&lt;/p&gt;
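&lt;p&gt;The closest familiar idea to &lt;code&gt;react&lt;/code&gt; and &lt;code&gt;bond&lt;/code&gt; is probably an observer registry keyed by variable name. The Python sketch below models only the behavior shown in the two examples above (a callback on phase change, and a follower that freezes with its source). The &lt;code&gt;PhaseRegistry&lt;/code&gt; class is hypothetical, and the third argument to &lt;code&gt;bond&lt;/code&gt; (&lt;code&gt;"sync"&lt;/code&gt;) is left out because its other possible values are not described here.&lt;/p&gt;

```python
class PhaseRegistry:
    """Tracks phases by variable name, with reaction callbacks and bonds."""

    def __init__(self):
        self.phase = {}
        self.reactions = {}   # name: list of callbacks
        self.bonds = {}       # source name: list of follower names

    def declare(self, name):
        self.phase[name] = "fluid"

    def react(self, name, callback):
        self.reactions.setdefault(name, []).append(callback)

    def bond(self, follower, source):
        self.bonds.setdefault(source, []).append(follower)

    def freeze(self, name):
        old = self.phase[name]
        if old == "crystal":
            return            # already frozen; this also stops bond cycles
        self.phase[name] = "crystal"
        for callback in self.reactions.get(name, []):
            callback(name, old, "crystal")
        for follower in self.bonds.get(name, []):
            self.freeze(follower)

log = []
reg = PhaseRegistry()
for var in ("temperature", "primary", "mirror"):
    reg.declare(var)
reg.react("temperature", lambda n, old, new: log.append(f"{n}: {old} to {new}"))
reg.bond("mirror", "primary")
reg.freeze("temperature")     # triggers the reaction callback
reg.freeze("primary")         # mirror follows
print(log, reg.phase["mirror"])
```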
&lt;h3&gt;The Spectrum of Difficulty&lt;/h3&gt;
&lt;p&gt;Working with Claude on Lattice code over several months has revealed a clear gradient of difficulty:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Trivial (Rust transfer):&lt;/strong&gt; Functions, structs, enums, match expressions, closures, for loops, string interpolation, module imports, error propagation with &lt;code&gt;?&lt;/code&gt;. Claude writes these correctly on the first attempt because they're syntactically identical, or nearly identical, to Rust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Easy (new vocabulary, familiar concept):&lt;/strong&gt; &lt;code&gt;flux&lt;/code&gt;/&lt;code&gt;fix&lt;/code&gt; declarations, &lt;code&gt;freeze()&lt;/code&gt;/&lt;code&gt;thaw()&lt;/code&gt; calls, basic phase checking. Claude maps these to mutable/immutable patterns it already knows. The vocabulary is new; the concept isn't.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Moderate (new pattern, teachable):&lt;/strong&gt; &lt;code&gt;forge&lt;/code&gt; blocks, &lt;code&gt;anneal&lt;/code&gt; expressions, &lt;code&gt;crystallize&lt;/code&gt; blocks, struct field-level phase annotations (alloy structs). These require explanation, but once Claude sees one or two examples, it generalizes correctly. The builder pattern and block-scoped mutation are close enough to existing patterns that the model bridges the gap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hard (no analog, requires specification):&lt;/strong&gt; Reactive phase operations (&lt;code&gt;react&lt;/code&gt;, &lt;code&gt;bond&lt;/code&gt;, &lt;code&gt;seed&lt;/code&gt;), phase pattern matching (&lt;code&gt;fluid val =&amp;gt;&lt;/code&gt;, &lt;code&gt;crystal val =&amp;gt;&lt;/code&gt;), the concurrency constraint that only crystal values can be sent on channels, strict mode's consumption semantics for &lt;code&gt;freeze&lt;/code&gt;. Claude can use these but never invents them. They must be explicitly described.&lt;/p&gt;
&lt;p&gt;The concurrency constraint is a good example of the "hard" category. In Lattice, data sent on a channel must be crystal:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nv"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Channel&lt;/span&gt;::&lt;span class="nv"&gt;new&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt;
&lt;span class="nv"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mutable"&lt;/span&gt;

&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ch&lt;/span&gt;.&lt;span class="k"&gt;send&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;runtime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;error&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;cannot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;send&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;fluid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;value&lt;/span&gt;

&lt;span class="nv"&gt;freeze&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;ch&lt;/span&gt;.&lt;span class="k"&gt;send&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;works&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;now&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;crystal&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This rule exists because crystal values are deeply immutable: they can't be modified by the sender after transmission, which eliminates data races structurally. Claude understands the concept (Rust has &lt;code&gt;Send&lt;/code&gt; and &lt;code&gt;Sync&lt;/code&gt; traits that serve a similar purpose), but it doesn't automatically apply Lattice's specific rule without being told. Left to its own devices, Claude will try to send fluid values on channels, because that's what you'd do in Go or Python. The constraint must be stated.&lt;/p&gt;
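&lt;p&gt;Stated as code, the rule is a single guard on &lt;code&gt;send&lt;/code&gt;. The following Python model is a sketch under the assumption that every value carries a phase tag; the &lt;code&gt;Tagged&lt;/code&gt; and &lt;code&gt;Channel&lt;/code&gt; classes are illustrative and are not Lattice's implementation.&lt;/p&gt;

```python
from collections import deque

class Tagged:
    """A value with a runtime phase tag."""

    def __init__(self, value, phase="fluid"):
        self.value = value
        self.phase = phase

class Channel:
    """Queue that only accepts crystal values."""

    def __init__(self):
        self._queue = deque()

    def send(self, item):
        if item.phase != "crystal":
            raise ValueError("cannot send fluid value")
        self._queue.append(item.value)

    def recv(self):
        return self._queue.popleft()

ch = Channel()
data = Tagged("mutable")
try:
    ch.send(data)             # rejected: data is still fluid
    first_send_ok = True
except ValueError:
    first_send_ok = False
data.phase = "crystal"        # stands in for freeze(data)
ch.send(data)                 # accepted
received = ch.recv()
print(first_send_ok, received)
```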
&lt;p&gt;Strict mode (&lt;code&gt;#mode strict&lt;/code&gt; at the top of a file) is another case where Claude needs explicit guidance. In strict mode, &lt;code&gt;let&lt;/code&gt; is banned (you must use &lt;code&gt;flux&lt;/code&gt; or &lt;code&gt;fix&lt;/code&gt;), &lt;code&gt;freeze()&lt;/code&gt; consumes the original binding (Rust-like move semantics), and crystal bindings cannot be assigned to at all, not even as a runtime error. Claude can write strict-mode Lattice, but it defaults to casual-mode patterns unless reminded. The model's prior is "permissive runtime" because that's what most dynamic languages are.&lt;/p&gt;
&lt;p&gt;The gradient closely tracks how much the construct resembles something in Rust or another mainstream language. When the syntax is familiar, Claude's knowledge of Rust transfers directly. When the concept is familiar but the syntax is new, one or two examples are enough. When both the syntax and the concept are novel, Claude needs the specification.&lt;/p&gt;
&lt;h3&gt;The Self-Hosted Compiler&lt;/h3&gt;
&lt;p&gt;The strongest evidence that Claude can deeply understand a language it was never trained on is &lt;code&gt;latc.lat&lt;/code&gt;: a &lt;a href="https://tinycomputers.io/posts/a-stack-based-bytecode-vm-for-lattice.html"&gt;4,955-line self-hosted compiler&lt;/a&gt; written in Lattice, for Lattice.&lt;/p&gt;
&lt;p&gt;The compiler reads &lt;code&gt;.lat&lt;/code&gt; source files and emits &lt;code&gt;.latc&lt;/code&gt; bytecode binaries. It has twelve sections:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Opcode constant definitions (mapping all 100+ VM opcodes to integers)&lt;/li&gt;
&lt;li&gt;Token stream and cursor helpers (&lt;code&gt;peek&lt;/code&gt;, &lt;code&gt;advance&lt;/code&gt;, &lt;code&gt;expect&lt;/code&gt;, &lt;code&gt;match_tok&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Compiler state management (save/restore for nested compilation)&lt;/li&gt;
&lt;li&gt;Error reporting&lt;/li&gt;
&lt;li&gt;Bytecode emit helpers (&lt;code&gt;emit_byte&lt;/code&gt;, &lt;code&gt;emit_jump&lt;/code&gt;, &lt;code&gt;patch_jump&lt;/code&gt;, &lt;code&gt;emit_loop&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Constant pool management (integers, floats, strings, closures)&lt;/li&gt;
&lt;li&gt;Scope and variable resolution (&lt;code&gt;begin_scope&lt;/code&gt;, &lt;code&gt;end_scope&lt;/code&gt;, &lt;code&gt;resolve_local&lt;/code&gt;, upvalue tracking)&lt;/li&gt;
&lt;li&gt;Expression parsing (precedence climbing, binary/unary ops, calls, field access)&lt;/li&gt;
&lt;li&gt;Statement compilation (let/flux/fix, if/while/for, return, match, try/catch)&lt;/li&gt;
&lt;li&gt;Declaration compilation (functions, structs, enums, traits, impl blocks)&lt;/li&gt;
&lt;li&gt;Binary serialization (writing the LATC file format with magic bytes, version header, chunk data)&lt;/li&gt;
&lt;li&gt;Main entry point&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Claude wrote this. Not "Claude assisted with this" or "Claude generated boilerplate for this." Claude wrote a recursive descent parser for Lattice's grammar, a bytecode compiler that emits correct opcodes for the phase system, and a binary serializer that produces files the C runtime can load and execute. The compiler bootstraps: you run it with the C-based &lt;code&gt;clat&lt;/code&gt; interpreter, and it produces bytecode that the same interpreter executes.&lt;/p&gt;
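&lt;p&gt;The helper names in section 5 (&lt;code&gt;emit_jump&lt;/code&gt;, &lt;code&gt;patch_jump&lt;/code&gt;) suggest the usual backpatching technique for forward jumps in a single-pass bytecode compiler: emit the jump with a placeholder operand, compile the body, then go back and fill in the distance. The Python sketch below illustrates that general technique only. The two-byte big-endian operand and the opcode numbers are assumptions for the example, not details taken from &lt;code&gt;latc.lat&lt;/code&gt;.&lt;/p&gt;

```python
code = []

def emit_byte(b):
    code.append(b % 256)

def emit_jump(opcode):
    """Emit a jump with a two-byte placeholder; return the placeholder's offset."""
    emit_byte(opcode)
    emit_byte(0xFF)
    emit_byte(0xFF)
    return len(code) - 2

def patch_jump(offset):
    """Overwrite the placeholder with the distance from just past it to the end."""
    distance = len(code) - offset - 2
    code[offset] = distance // 256       # high byte (assumed operand layout)
    code[offset + 1] = distance % 256    # low byte

# Made-up opcode numbers for the example.
OP_JUMP_IF_FALSE, OP_POP, OP_NIL = 1, 2, 3

slot = emit_jump(OP_JUMP_IF_FALSE)       # operand unknown at this point
emit_byte(OP_POP)                        # body of the branch
emit_byte(OP_NIL)
patch_jump(slot)                         # the body length is now known
print(code)
```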
&lt;p&gt;The compiler itself uses Lattice's phase system for its own internal state. The compiler's mutable working data (the bytecode buffer, the constant pool, the local variable tracking arrays) is declared with &lt;code&gt;flux&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c_lines&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_name_arr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_depth_arr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_captured_arr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the compiler eating its own dogfood. The mutable state that the compiler needs to build bytecode is declared using the same phase system that the compiler is compiling. The phase keywords aren't decorative here; they're structurally necessary because the compiler modifies these arrays on every opcode emission and scope transition.&lt;/p&gt;
&lt;p&gt;The compiler has 118 functions across 12 sections, with 554 opcode references. It handles every construct in the language: &lt;code&gt;flux&lt;/code&gt;/&lt;code&gt;fix&lt;/code&gt; declarations, &lt;code&gt;forge&lt;/code&gt; blocks, &lt;code&gt;freeze&lt;/code&gt;/&lt;code&gt;thaw&lt;/code&gt;/&lt;code&gt;sublimate&lt;/code&gt; calls, &lt;code&gt;anneal&lt;/code&gt; and &lt;code&gt;crystallize&lt;/code&gt; expressions, struct and enum definitions with phase annotations, trait/impl blocks, match expressions with phase-aware pattern matching, structured concurrency with &lt;code&gt;scope&lt;/code&gt;/&lt;code&gt;spawn&lt;/code&gt;, channel operations, &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;catch&lt;/code&gt;, &lt;code&gt;defer&lt;/code&gt;, and the complete expression grammar with correct operator precedence.&lt;/p&gt;
&lt;p&gt;Writing a self-hosted compiler requires understanding the language at every level simultaneously. The tokenizer must know every keyword, operator, and delimiter. The parser must handle every grammatical production, including the phase-specific constructs (&lt;code&gt;forge&lt;/code&gt;, &lt;code&gt;anneal&lt;/code&gt;, &lt;code&gt;crystallize&lt;/code&gt;) that exist nowhere in Claude's training data. The code generator must emit the correct opcodes for phase transitions, reactive bindings, and structured concurrency. And the whole thing must be written in the language being compiled, which means Claude is writing Lattice to compile Lattice, using constructs it learned from examples rather than training data.&lt;/p&gt;
&lt;p&gt;The compiler's serialization section writes the LATC binary format byte by byte:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn serialize_latc(ch: any) {
    ser_buf = Buffer::new(0)

    // Header: "LATC" + version(1) + reserved(0)
    write_u8(76)    // 'L'
    write_u8(65)    // 'A'
    write_u8(84)    // 'T'
    write_u8(67)    // 'C'
    write_u16_le(1) // format version
    write_u16_le(0) // reserved

    serialize_chunk(ch)
    return ser_buf
}
&lt;/pre&gt;&lt;/div&gt;
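
&lt;p&gt;To make the byte layout concrete, here is a minimal Python sketch of the same eight-byte header: the magic bytes, then a little-endian 16-bit version and a 16-bit reserved field. It is not part of the Lattice toolchain; &lt;code&gt;build_header&lt;/code&gt; and &lt;code&gt;parse_header&lt;/code&gt; are hypothetical helper names used only for illustration.&lt;/p&gt;

```python
# Sketch of the 8-byte LATC header described above (not the real toolchain).
# Layout: magic "LATC", u16 format version, u16 reserved, both little-endian.
MAGIC = b"LATC"


def build_header(version=1, reserved=0):
    """Return the header bytes in the order serialize_latc writes them."""
    return MAGIC + version.to_bytes(2, "little") + reserved.to_bytes(2, "little")


def parse_header(data):
    """Check the magic bytes and return (version, reserved). Hypothetical helper."""
    if 8 > len(data) or data[:4] != MAGIC:
        raise ValueError("not a LATC file")
    version = int.from_bytes(data[4:6], "little")
    reserved = int.from_bytes(data[6:8], "little")
    return version, reserved


header = build_header()
print(list(header[:4]))      # [76, 65, 84, 67]
print(header)                # b'LATC\x01\x00\x00\x00'
print(parse_header(header))  # (1, 0)
```

&lt;p&gt;The first four values match the &lt;code&gt;write_u8&lt;/code&gt; calls in the Lattice source, which is a quick way to sanity-check a generated file from outside the language.&lt;/p&gt;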

&lt;p&gt;This is not pattern matching against compiler source code from the training data. No Lattice compiler exists in the training data. Claude wrote a compiler for a language that has no prior art, in a language that has no prior art, producing a binary format that has no prior art. Every decision (the magic bytes, the chunk serialization order, the upvalue encoding) came from understanding the specification I provided and the runtime behavior of the C-based interpreter.&lt;/p&gt;
&lt;h3&gt;What I Actually Gave Claude&lt;/h3&gt;
&lt;p&gt;The teaching process was less structured than you might expect. There was no formal curriculum, no staged introduction of concepts, no carefully sequenced lesson plan. And I should be honest about the recursive nature of what happened: Claude Code was the primary tool for building Lattice itself. The language, the C implementation, the grammar, the runtime, the test suite, the handbook: all of it was built with Claude Code. I designed the language and directed the implementation, but Claude wrote the C, the LaTeX, and the example programs.&lt;/p&gt;
&lt;p&gt;So the situation is: Claude wrote Lattice (the implementation), and then Claude wrote in Lattice (the programs and the self-hosted compiler). The model built the language and then learned the language it built. The "teaching material" that Claude uses to write Lattice code is documentation and examples that Claude itself produced in earlier sessions.&lt;/p&gt;
&lt;p&gt;The artifacts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The C implementation: ~80 source files, the parser, the VM, the phase system runtime. Built with Claude Code from my architectural direction.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;handbook&lt;/a&gt;: 38 chapters covering every feature, with worked examples. Written in LaTeX with Claude Code. This lives in a repository that Claude can read in subsequent sessions.&lt;/li&gt;
&lt;li&gt;Example programs (&lt;code&gt;examples/phase_demo.lat&lt;/code&gt;, &lt;code&gt;examples/sorting.lat&lt;/code&gt;, &lt;code&gt;examples/state_machine.lat&lt;/code&gt;) that demonstrate idiomatic Lattice. Written by Claude Code.&lt;/li&gt;
&lt;li&gt;815 test files, run under AddressSanitizer, that exercise every construct. Written by Claude Code.&lt;/li&gt;
&lt;li&gt;An EBNF grammar reference as an appendix to the handbook.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When I work with Claude on Lattice code, I don't paste the entire handbook into the context window. Claude has access to the project directory. It reads files as needed. If I ask it to write a function that uses &lt;code&gt;forge&lt;/code&gt;, it reads &lt;code&gt;examples/phase_demo.lat&lt;/code&gt; or &lt;code&gt;chapters/ch11-phases-explained.tex&lt;/code&gt; to see how &lt;code&gt;forge&lt;/code&gt; works. If I ask it to add an opcode to the compiler, it reads &lt;code&gt;include/stackopcode.h&lt;/code&gt; and &lt;code&gt;src/stackvm.c&lt;/code&gt; to understand the existing instruction set.&lt;/p&gt;
&lt;p&gt;The key insight: Claude doesn't need to be trained on a language to write it. It needs access to the specification and examples at inference time. And in this case, those specifications and examples were produced by Claude itself in prior sessions. The model's understanding is constructed on the fly from documentation in its context, not retrieved from weights. This is why the Rust resemblance matters so much: the syntax gives Claude a structural scaffold, and the specification (which Claude wrote) fills in the semantics.&lt;/p&gt;
&lt;p&gt;This is also why the self-hosted compiler was possible. By the time Claude wrote &lt;code&gt;latc.lat&lt;/code&gt;, it had already written the entire language implementation, the handbook, the test suite, and hundreds of example programs. The language had moved from "novel" to "familiar" through accumulated context, not through training. Each session built on the last. Each example reinforced the phase system's rules. By the time the compiler was attempted, Claude's working understanding of Lattice (constructed from its own prior output) was deep enough to write a 5,000-line program that correctly compiles the language. The model taught itself a language by building the language first.&lt;/p&gt;
&lt;h3&gt;Why Syntax Matters More Than Semantics&lt;/h3&gt;
&lt;p&gt;The Lattice experience suggests something counterintuitive about how LLMs interact with programming languages: syntax transfer is more powerful than semantic understanding.&lt;/p&gt;
&lt;p&gt;Claude can write correct Lattice because Lattice looks like Rust. The semantic differences (phase system vs. ownership, runtime type checking vs. compile-time guarantees, garbage collection vs. RAII) are significant, but they don't prevent Claude from producing working code. The model generates syntactically valid Lattice from Rust patterns and then adjusts the semantics when corrected.&lt;/p&gt;
&lt;p&gt;This has implications for language design. If you want AI tooling to support your language from day one, without waiting for it to appear in training data, design your syntax to rhyme with something popular. Lattice's resemblance to Rust wasn't designed for AI, but it is the reason AI can write it. A language with a radically different syntax (APL, Forth, J) would be much harder for Claude to learn from examples alone, even if the semantics were simpler.&lt;/p&gt;
&lt;p&gt;The reverse is also true: a language with familiar syntax but deeply unfamiliar semantics (like Lattice's reactive phase system) will produce code that looks correct but occasionally behaves wrong. Claude's Rust intuitions are strong enough to generate valid-looking phase code, but the model sometimes falls back to Rust's mutation model when the Lattice-specific behavior is more constrained. The syntax transfers perfectly. The semantics require teaching.&lt;/p&gt;
&lt;h3&gt;Implications for Language Designers&lt;/h3&gt;
&lt;p&gt;If you're designing a new programming language in 2026, the AI tooling question is unavoidable. Your language won't have IDE plugins, autocompleters, or AI coding assistants on day one. The community doesn't exist yet. The training data doesn't include your language. Every other language your users work with has Copilot or Claude support. Yours doesn't.&lt;/p&gt;
&lt;p&gt;Lattice suggests a strategy: make your syntax rhyme with something an LLM already knows.&lt;/p&gt;
&lt;p&gt;This isn't about copying Rust. Lattice has genuinely novel semantics. The phase system, the reactive bindings, the alloy structs with per-field phase annotations: none of these exist in Rust. But they're expressed through syntax (keywords, braces, type annotations, block expressions) that maps directly to Rust's structural patterns. Claude can parse the syntax without help and learn the semantics from examples.&lt;/p&gt;
&lt;p&gt;The alternative is designing a syntax so novel that LLMs can't bootstrap from existing knowledge. This is a legitimate design choice; some ideas genuinely need new notation. But the cost is high: your users won't get AI assistance until your language appears in training data, which requires the language to become popular first, which is harder without AI assistance. It's a chicken-and-egg problem that familiar syntax sidesteps.&lt;/p&gt;
&lt;p&gt;The practical recommendation: novel semantics, familiar syntax. Invent the ideas. Borrow the notation. Let the LLM cross the bridge on syntax and learn the semantics on the other side.&lt;/p&gt;
&lt;h3&gt;What This Means for the "AI Writes Code" Conversation&lt;/h3&gt;
&lt;p&gt;The Lattice case study complicates the popular narrative about AI code generation in both directions.&lt;/p&gt;
&lt;p&gt;For the optimists who say AI can learn anything: Claude cannot invent the reactive phase system. It cannot propose &lt;code&gt;bond&lt;/code&gt; or &lt;code&gt;seed&lt;/code&gt; or &lt;code&gt;anneal&lt;/code&gt; without being told they exist. The novel constructs, the ones that make Lattice a genuinely different language rather than a Rust reskin, are invisible to the model until explicitly specified. AI transfer learning has limits, and those limits are at the boundaries of what the training data contains.&lt;/p&gt;
&lt;p&gt;For the pessimists who say AI can only regurgitate training data: Claude wrote a 5,000-line self-hosted compiler for a language it has never seen. That is not regurgitation. The compiler produces correct bytecode for constructs (phase transitions, reactive bonds, per-field phase annotations) that exist in no other language. The model assembled knowledge from its understanding of compilers generally, Rust syntax specifically, and the Lattice specification I provided, and produced something genuinely new. Antirez called this "assembling knowledge" when he observed the same phenomenon with his &lt;a href="https://baud.rs/KJoorR"&gt;Z80 emulator project&lt;/a&gt;. I think that's the right term.&lt;/p&gt;
&lt;p&gt;The truth is somewhere that neither camp wants to occupy. LLMs can go far beyond their training data when the new territory is structurally adjacent to something they know. They cannot go beyond their training data when the new territory is structurally novel. The boundary between "adjacent" and "novel" is syntax. Familiar syntax is a bridge. Novel syntax is a wall. Novel semantics behind familiar syntax is a trap: the model crosses the bridge confidently and then occasionally falls.&lt;/p&gt;
&lt;p&gt;Lattice sits on the bridge and in the trap at the same time. Its Rust-like surface lets Claude cross the bridge. Its phase system is the novel semantics behind familiar syntax. And the self-hosted compiler is proof that the bridge, once crossed, supports weight that no one expected.&lt;/p&gt;
&lt;p&gt;I didn't set out to test the limits of LLM language understanding when I designed Lattice. I set out to build a programming language with a novel approach to mutability. The AI dimension was a side effect: I used Claude Code as my development tool because I use Claude Code for everything, and the language happened to be learnable because it happened to look like Rust. But the result is one of the more complete demonstrations of LLM transfer learning applied to a genuinely novel domain: not just writing programs in an unfamiliar language, but writing a compiler for that language, in that language, from a specification that exists nowhere in the training data.&lt;/p&gt;
&lt;p&gt;The 4,955 lines of &lt;code&gt;latc.lat&lt;/code&gt; are the proof that LLMs can go further than their training data when the conditions are right. The conditions are: familiar syntax, clear specification, accessible examples, and a human who knows when the model is wrong. Remove any one of those and the compiler doesn't get written. But with all four in place, the model produces something that works, that compiles, and that no human typed by hand.&lt;/p&gt;</description><category>ai</category><category>claude</category><category>compilers</category><category>language design</category><category>lattice</category><category>llm</category><category>phase system</category><category>programming languages</category><category>rust</category><category>self-hosting</category><guid>https://tinycomputers.io/posts/teaching-llms-languages-theyve-never-seen.html</guid><pubDate>Thu, 02 Apr 2026 13:00:00 GMT</pubDate></item><item><title>Distilled Reasoning on Strix Halo: Running a Claude-Trained Thinking Model Locally</title><link>https://tinycomputers.io/posts/distilled-reasoning-on-strix-halo-qwen35-claude-thinking.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/distilled-reasoning-on-strix-halo-qwen35-claude-thinking_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;27 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There is a specific moment in the open-source LLM ecosystem that keeps recurring: someone takes a frontier model's outputs, uses them as training data for a smaller model, and publishes the result. The technique is called distillation, and it has been applied to coding ability, instruction following, and general knowledge. What is newer is distilling &lt;em&gt;reasoning&lt;/em&gt;—the step-by-step chain-of-thought process that models like Claude use internally when working through complex problems.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF"&gt;Jackrong's Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled&lt;/a&gt; is one of the more interesting examples. It takes the Qwen3.5-27B base model and fine-tunes it on thousands of reasoning trajectories extracted from Claude 4.6 Opus. The result is a model that exposes its thinking process through &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags before delivering a final answer, mimicking the extended thinking behavior that Anthropic built into Claude natively. In 4-bit quantization, the entire model fits in about sixteen gigabytes.&lt;/p&gt;
&lt;p&gt;I wanted to know two things. First, whether this kind of distilled reasoning actually works—whether a 27B model can meaningfully replicate the structured thinking of a model orders of magnitude larger. Second, whether the AMD Strix Halo APU, with its unified memory architecture and integrated RDNA 3.5 GPU, could run it at useful speeds. The answer to both turned out to be more nuanced than a simple yes or no.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;The machine is the same &lt;a href="https://tinycomputers.io/posts/amd-ai-max+-395-system-review-a-comprehensive-analysis.html"&gt;AMD Ryzen AI MAX+ 395&lt;/a&gt; that has appeared in several previous posts. It is an APU: CPU and GPU on the same die, sharing the same pool of LPDDR5X memory. There is no PCIe bus between the processor and the graphics engine. There is no dedicated VRAM to fill up. The GPU sees roughly 65GB of addressable memory out of the system's 122GB total, which means a 16GB quantized model loads without any of the memory pressure games you play on discrete GPU setups.&lt;/p&gt;
&lt;p&gt;This matters for local LLM inference because the bottleneck for most language models is memory bandwidth, not compute. Tokens are generated one at a time, each requiring a full pass through the model's weights. The faster you can stream those weights from memory to the processing units, the faster you generate tokens. The Strix Halo's LPDDR5X provides roughly 120 GB/s of bandwidth to the unified memory pool. A discrete GPU like the RTX 4090 has 1 TB/s of bandwidth to its dedicated VRAM, but the Strix Halo never has to copy weights across a PCIe bus. For models that fit entirely in the GPU's addressable space, the unified architecture eliminates an entire class of overhead.&lt;/p&gt;
&lt;p&gt;The system runs Ollama 0.17.6, which wraps llama.cpp and provides model management and an HTTP inference API. ROCm 7.2 handles the GPU compute layer, though Ollama's GGUF inference path is primarily CPU-based with GPU offloading for specific operations. The &lt;code&gt;gfx1151&lt;/code&gt; GPU target is not yet in the mainline PyTorch or llama.cpp kernel prebuilds, so &lt;code&gt;HSA_OVERRIDE_GFX_VERSION=11.0.0&lt;/code&gt; remains necessary to map it to the closest supported target (gfx1100, Navi 31).&lt;/p&gt;
&lt;h3&gt;The Model&lt;/h3&gt;
&lt;p&gt;The model's architecture is straightforward: Qwen3.5-27B, a 27 billion parameter transformer, fine-tuned via supervised learning on structured reasoning data. What makes it interesting is the training data. The creator assembled three datasets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered"&gt;Opus-4.6-Reasoning-3000x-filtered&lt;/a&gt;&lt;/strong&gt;: Three thousand reasoning trajectories extracted from Claude 4.6 Opus, filtered for quality.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x"&gt;claude-4.5-opus-high-reasoning-250x&lt;/a&gt;&lt;/strong&gt;: Two hundred and fifty examples of high-intensity structured reasoning from an earlier Claude version.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x"&gt;Qwen3.5-reasoning-700x&lt;/a&gt;&lt;/strong&gt;: Seven hundred step-by-step problem-solving examples.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The combined training signal teaches the model to produce output in a specific format: a &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; block containing the reasoning process, followed by a clean final answer. On the surface this resembles Claude's extended thinking. The difference is that Claude's thinking is a native capability of the model's training and architecture, while this is a behavior pattern learned through supervised fine-tuning on examples of that behavior.&lt;/p&gt;
&lt;p&gt;The distinction matters, and I will come back to it.&lt;/p&gt;
&lt;p&gt;The model is distributed in GGUF format, which is the standard for llama.cpp and Ollama. I used the Q4_K_M quantization, which compresses the model's weights from 16-bit floats to 4-bit integers with a mixed precision scheme that preserves more information in attention layers. The file is 15.4GB on disk. The &lt;a href="https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF"&gt;model card&lt;/a&gt; reports 29-35 tokens per second on an RTX 3090; I was curious what the Strix Halo would deliver.&lt;/p&gt;
&lt;h3&gt;Setting It Up&lt;/h3&gt;
&lt;p&gt;Getting the model running took less than ten minutes. Download the GGUF file from HuggingFace:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;mkdir&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;~/models/qwen35-reasoning
curl&lt;span class="w"&gt; &lt;/span&gt;-L&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;~/models/qwen35-reasoning/model.gguf&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s1"&gt;'https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.5-27B.Q4_K_M.gguf'&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note the filename. The HuggingFace repo is named &lt;code&gt;Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF&lt;/code&gt;, but the actual GGUF files inside follow a simpler naming scheme: &lt;code&gt;Qwen3.5-27B.Q4_K_M.gguf&lt;/code&gt;. I wasted time trying to guess the full distilled name before checking the API.&lt;/p&gt;
&lt;p&gt;Create an Ollama Modelfile that imports the local GGUF and sets inference parameters:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;FROM&lt;span class="w"&gt; &lt;/span&gt;/home/alex/models/qwen35-reasoning/model.gguf

PARAMETER&lt;span class="w"&gt; &lt;/span&gt;temperature&lt;span class="w"&gt; &lt;/span&gt;0.6
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;top_p&lt;span class="w"&gt; &lt;/span&gt;0.95
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;num_ctx&lt;span class="w"&gt; &lt;/span&gt;8192
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;repeat_penalty&lt;span class="w"&gt; &lt;/span&gt;1.2
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;stop&lt;span class="w"&gt; &lt;/span&gt;"&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;|endoftext|&amp;gt;"
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;stop&lt;span class="w"&gt; &lt;/span&gt;"&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;|im_end|&amp;gt;"
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;stop&lt;span class="w"&gt; &lt;/span&gt;"&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;|eot_id|&amp;gt;"

SYSTEM&lt;span class="w"&gt; &lt;/span&gt;"You&lt;span class="w"&gt; &lt;/span&gt;are&lt;span class="w"&gt; &lt;/span&gt;a&lt;span class="w"&gt; &lt;/span&gt;deep-thinking&lt;span class="w"&gt; &lt;/span&gt;AI&lt;span class="w"&gt; &lt;/span&gt;assistant.&lt;span class="w"&gt; &lt;/span&gt;For&lt;span class="w"&gt; &lt;/span&gt;complex&lt;span class="w"&gt; &lt;/span&gt;questions,
use&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;think&amp;gt;&lt;/span&gt;...&lt;span class="nt"&gt;&amp;lt;/think&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;tags&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;show&lt;span class="w"&gt; &lt;/span&gt;your&lt;span class="w"&gt; &lt;/span&gt;reasoning&lt;span class="w"&gt; &lt;/span&gt;process&lt;span class="w"&gt; &lt;/span&gt;before
providing&lt;span class="w"&gt; &lt;/span&gt;the&lt;span class="w"&gt; &lt;/span&gt;final&lt;span class="w"&gt; &lt;/span&gt;answer."
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;ollama&lt;span class="w"&gt; &lt;/span&gt;create&lt;span class="w"&gt; &lt;/span&gt;qwen35-reasoning&lt;span class="w"&gt; &lt;/span&gt;-f&lt;span class="w"&gt; &lt;/span&gt;Modelfile
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Ollama copies the GGUF into its own blob store, parses the architecture metadata, and registers it as a runnable model. The whole process takes about a minute on local storage.&lt;/p&gt;
&lt;h3&gt;The Stop Token Problem&lt;/h3&gt;
&lt;p&gt;The first run produced correct output followed by infinite repetition. The model answered a calculus question perfectly, then appended "This gives us the final answer:" and repeated the entire solution, over and over, until it hit the context window limit. The &lt;a href="https://www.marktechpost.com/2026/03/26/a-coding-implementation-to-run-qwen3-5-reasoning-models-distilled-with-claude-style-thinking-using-gguf-and-4-bit-quantization/"&gt;MarkTechPost&lt;/a&gt; article that inspired this experiment did not mention this issue, likely because its test prompts were short enough that the repetition was not obvious.&lt;/p&gt;
&lt;p&gt;The fix is explicit stop tokens in the Modelfile. Without them, Ollama does not know when to stop generating. This is a common issue with GGUF models imported into Ollama without a proper chat template: the inference engine does not recognize the model's native end-of-sequence tokens. Adding &lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;|im_end|&amp;gt;&lt;/code&gt;, and &lt;code&gt;&amp;lt;|eot_id|&amp;gt;&lt;/code&gt; as stop parameters catches the three most common EOS tokens used by Qwen and Llama-family models.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;repeat_penalty&lt;/code&gt; of 1.2 provides a second layer of defense by penalizing the model for reusing recent tokens. This helps but is not sufficient on its own. Without the stop tokens, the model can produce novel-but-meaningless text that avoids exact repetition while still degenerating into nonsense. More on this shortly.&lt;/p&gt;
&lt;h3&gt;Where It Works: Structured Problems&lt;/h3&gt;
&lt;p&gt;With the stop tokens in place, the model performs well on structured mathematical and analytical problems. I gave it a calculus question: find the derivative of x³sin(x) using the product rule.&lt;/p&gt;
&lt;p&gt;The response was genuinely good. The model opened a &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; block, identified the two component functions, recalled the product rule formula, computed each derivative, and applied the rule. Then it closed the think block and produced a clean, well-formatted answer with LaTeX notation, step-by-step derivation, and a factored final form. The thinking trace was coherent and tracked the actual reasoning process. It was not filler; each line in the trace corresponded to a meaningful step.&lt;/p&gt;
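&lt;p&gt;For reference, this is the derivation the prompt asks for, written out by hand with the standard product rule. It is not a transcript of the model's output, only the result a correct answer should reach, including the factored form:&lt;/p&gt;

```latex
\[
\frac{d}{dx}\bigl[x^{3}\sin x\bigr]
  = \frac{d}{dx}\bigl[x^{3}\bigr]\,\sin x + x^{3}\,\frac{d}{dx}\bigl[\sin x\bigr]
  = 3x^{2}\sin x + x^{3}\cos x
  = x^{2}\,(3\sin x + x\cos x)
\]
```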
&lt;p&gt;Generation speed on the Strix Halo: &lt;strong&gt;10.3 tokens per second&lt;/strong&gt;. Not fast by cloud standards, but responsive enough for interactive use. You see the thinking appear in real time, which is surprisingly useful: you can watch the model work through the problem and catch errors before it commits to a final answer.&lt;/p&gt;
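&lt;p&gt;One way to reproduce a tokens-per-second figure like this is to read the timing fields that Ollama returns from its &lt;code&gt;/api/generate&lt;/code&gt; endpoint. The sketch below assumes a local Ollama server on the default port and the &lt;code&gt;qwen35-reasoning&lt;/code&gt; model name from the Modelfile step. The helper names and the sample numbers are hypothetical, and this is not necessarily how the figure above was measured.&lt;/p&gt;

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen35-reasoning"):
    """Build a non-streaming generate request for the locally registered model."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def tokens_per_second(response):
    """Generation speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(prompt):
    """Send one prompt to a running Ollama server and return tokens per second."""
    with urllib.request.urlopen(build_request(prompt)) as reply:
        return tokens_per_second(json.load(reply))


# Hypothetical response fields, chosen only to illustrate the arithmetic.
sample = {"eval_count": 412, "eval_duration": 40_000_000_000}
print(round(tokens_per_second(sample), 1))  # 10.3
```

&lt;p&gt;Calling &lt;code&gt;measure("Find the derivative of x^3 sin(x)")&lt;/code&gt; against a live server returns the rate for that single response. Prompt processing time is reported separately by Ollama and is not included here.&lt;/p&gt;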
&lt;p&gt;For structured problems—mathematics, code analysis, formal logic—the distilled reasoning is genuinely functional. The model identifies subproblems, works through them sequentially, and arrives at correct answers. The think tags provide transparency into the process that you do not get from a standard instruction-tuned model.&lt;/p&gt;
&lt;h3&gt;Where It Falls Apart: The River Crossing&lt;/h3&gt;
&lt;p&gt;I ran the classic &lt;a href="https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem"&gt;wolf-goat-cabbage river crossing&lt;/a&gt; puzzle as a comparison test, using the same prompt on both the distilled Qwen model and Claude Haiku 4.5 via the Anthropic API.&lt;/p&gt;
&lt;p&gt;Claude Haiku returned a perfect, concise seven-step solution in 2.9 seconds. Two hundred and twenty-three tokens. The answer identified the critical insight (bring the goat back on one return trip), laid out the sequence clearly, and stopped.&lt;/p&gt;
&lt;p&gt;The Qwen model started well. It correctly identified that the goat must go first, recognized the wolf-goat conflict at the destination, and identified the need to bring the goat back. Then, around step three of the solution, the model began editorializing. "Oh joy what fun times ahead us humans truly enjoy sometimes huh?!" it wrote, mid-solution. Within a few more sentences, the output had degenerated into an unbroken stream-of-consciousness rant that cascaded into a wall of increasingly disconnected words. Not repeated words—the repeat penalty prevented that—but a firehose of unique, semantically null text that continued until it filled the entire 8,192-token context window.&lt;/p&gt;
&lt;p&gt;The output was, to use a technical term, unhinged. The model went from a correct partial solution to word salad in about two hundred tokens, and there was no recovery. The stop tokens could not save it because the model was not producing any end-of-sequence markers. It had entered a mode where it was generating fluent English syntax with zero semantic content, which is exactly the kind of failure that stop tokens and repeat penalties cannot catch.&lt;/p&gt;
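&lt;p&gt;One mitigation that I did not test here is a hard cap on generated tokens. Ollama accepts a &lt;code&gt;num_predict&lt;/code&gt; option per request. The sketch below builds such a payload; the function name and the 1,024-token budget are hypothetical choices for illustration. A cap does not detect degeneration. It only bounds how much of it you have to read.&lt;/p&gt;

```python
def capped_request(prompt, max_new_tokens=1024, model="qwen35-reasoning"):
    """Build an Ollama generate payload whose output is limited to a token budget."""
    if 1 > max_new_tokens:
        raise ValueError("max_new_tokens must be positive")
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_predict is Ollama's cap on generated tokens for this request.
        "options": {"num_predict": max_new_tokens},
    }


payload = capped_request("Solve the wolf, goat and cabbage puzzle.")
print(payload["options"])  # {'num_predict': 1024}
```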
&lt;h3&gt;What the Comparison Reveals&lt;/h3&gt;
&lt;p&gt;The numbers tell the story concisely:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude Haiku 4.5&lt;/th&gt;
&lt;th&gt;Qwen3.5-27B (Strix Halo)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.9 seconds&lt;/td&gt;
&lt;td&gt;Hit 8K context limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;75.9 tok/s&lt;/td&gt;
&lt;td&gt;~10 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;223 tokens, correct&lt;/td&gt;
&lt;td&gt;Thousands of tokens, degenerated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0009&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;But the comparison is not really about speed or cost. It is about the difference between native reasoning and distilled reasoning.&lt;/p&gt;
&lt;p&gt;Claude's extended thinking is a capability that emerges from the model's architecture and training at scale. The model has internalized what it means to reason through a problem, including knowing when to stop, when a line of reasoning is unproductive, and when to switch strategies. These are meta-cognitive skills that are extremely difficult to distill.&lt;/p&gt;
&lt;p&gt;The Qwen model learned the &lt;em&gt;format&lt;/em&gt; of reasoning (the think tags, the step-by-step structure, the pattern of stating subproblems and working through them) from just under four thousand examples across the three datasets. What it did not learn, and arguably cannot learn from supervised fine-tuning alone, is the judgment about when reasoning is going off the rails. A model that has truly internalized reasoning has implicit quality checks: it recognizes incoherence in its own output and corrects course. A model that has learned to &lt;em&gt;mimic&lt;/em&gt; reasoning produces the surface pattern without the underlying self-monitoring.&lt;/p&gt;
&lt;p&gt;This is visible in the failure mode. The model did not produce wrong reasoning. It produced &lt;em&gt;no&lt;/em&gt; reasoning. It exited the reasoning pattern entirely and entered a generation mode that had nothing to do with the problem. A model with genuine reasoning capability would have recognized the incoherence and either corrected or terminated. The distilled model had no such circuit breaker.&lt;/p&gt;
&lt;h3&gt;The Economics&lt;/h3&gt;
&lt;p&gt;The cost comparison deserves its own section because it is often cited as the primary motivation for running local models.&lt;/p&gt;
&lt;p&gt;The Claude Haiku API call cost $0.0009, about nine-hundredths of a cent. If you ran a thousand similar queries per day, you would spend about ninety cents. The Strix Halo draws roughly 65 watts at idle and 150 watts under GPU inference load. At Minnesota's residential electricity rate of around twelve cents per kilowatt-hour, running inference eight hours a day costs about fourteen cents, so on electricity alone the local machine is cheaper. But the hardware itself cost north of two thousand dollars. At a saving of well under a dollar a day, you would need years of daily inference to reach cost parity with the API, and only if you value your debugging time at zero.&lt;/p&gt;
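&lt;p&gt;The electricity figure is easy to check. This small sketch uses only the wattages, hours, and rate quoted above; the function name is a hypothetical helper, and the result ignores the rest of the system's power draw and any time-of-use pricing.&lt;/p&gt;

```python
def daily_electricity_cost(watts, hours_per_day, dollars_per_kwh):
    """Cost in dollars of running a constant load for part of a day."""
    kilowatt_hours = watts / 1000 * hours_per_day
    return kilowatt_hours * dollars_per_kwh


# Figures quoted in the post: 150 W under load, 65 W idle, $0.12 per kWh.
under_load = daily_electricity_cost(150, 8, 0.12)
at_idle = daily_electricity_cost(65, 8, 0.12)

print(f"{under_load:.3f}")  # 0.144
print(f"{at_idle:.3f}")     # 0.062
```

&lt;p&gt;Eight hours under load comes to roughly fourteen cents, which matches the estimate in the text. Even the same eight hours at idle costs about six cents.&lt;/p&gt;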
&lt;p&gt;The economic case for local inference is not about per-query cost. It is about use cases where you need unlimited queries without metering, where data cannot leave your network, or where you want to experiment with model behavior without worrying about a bill. If you are evaluating a model's failure modes by running hundreds of adversarial prompts—which is exactly what I was doing—the local model is the right tool because you are not optimizing for answer quality. You are optimizing for the freedom to explore.&lt;/p&gt;
&lt;h3&gt;The Strix Halo as an Inference Platform&lt;/h3&gt;
&lt;p&gt;Ten tokens per second for a 27B Q4 model is respectable for an APU. It is not competitive with a discrete GPU: an RTX 3090 delivers 29-35 tokens per second on the same model, roughly three times faster. But the Strix Halo was not designed to compete with discrete GPUs on raw throughput.&lt;/p&gt;
&lt;p&gt;What it offers instead is capacity. The unified memory pool means you can load models that would not fit on most consumer GPUs. A Q8_0 quantization of this same model would be 28.6GB, which exceeds the VRAM of an RTX 4090 (24GB) but fits comfortably in the Strix Halo's addressable space. You could load a 70B Q4 model (roughly 40GB) without any of the layer-splitting gymnastics required on multi-GPU setups. I have run Llama 3.1 70B Q4 on this machine, and while the generation speed drops to about 4-5 tokens per second, it runs without errors or memory pressure.&lt;/p&gt;
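&lt;p&gt;The capacity argument reduces to a simple comparison of weight size against available memory. The sketch below uses the sizes quoted in this post and checks only whether the weights fit. It ignores the KV cache and runtime overhead, so treat a narrow fit as optimistic. The dictionary and function names are hypothetical.&lt;/p&gt;

```python
# Capacities and model sizes as quoted in the post, in gigabytes.
GPU_ADDRESSABLE_GB = 65   # Strix Halo GPU-addressable unified memory
RTX_4090_VRAM_GB = 24     # dedicated VRAM on an RTX 4090

MODELS_GB = {
    "Qwen3.5-27B Q4_K_M": 15.4,
    "Qwen3.5-27B Q8_0": 28.6,
    "70B Q4 (approximate)": 40.0,
}


def fits(model_gb, capacity_gb):
    """True when the weights alone fit; KV cache and overhead are ignored."""
    return capacity_gb >= model_gb


for name, size_gb in MODELS_GB.items():
    on_4090 = fits(size_gb, RTX_4090_VRAM_GB)
    on_strix = fits(size_gb, GPU_ADDRESSABLE_GB)
    print(f"{name}: RTX 4090={on_4090}, Strix Halo={on_strix}")
```

&lt;p&gt;Only the Q4_K_M file fits on the 24GB card, while all three fit in the Strix Halo's addressable pool.&lt;/p&gt;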
&lt;p&gt;For a machine that also serves as a daily desktop, development workstation, and &lt;a href="https://tinycomputers.io/posts/ltx-api.html"&gt;video generation server&lt;/a&gt; (it runs LTX-2.3 on the same hardware), the ability to casually load and test a 27B reasoning model without dedicated GPU infrastructure is the actual value proposition. You do not plan a session. You do not allocate resources. You type &lt;code&gt;ollama run qwen35-reasoning&lt;/code&gt; and it works.&lt;/p&gt;
&lt;h3&gt;Lessons for the Blog Post Reader&lt;/h3&gt;
&lt;p&gt;If you want to replicate this setup, here is what I would emphasize:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The stop tokens are non-negotiable.&lt;/strong&gt; Without explicit &lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;|im_end|&amp;gt;&lt;/code&gt;, and &lt;code&gt;&amp;lt;|eot_id|&amp;gt;&lt;/code&gt; stop parameters in your Modelfile, the model will produce infinite output on many prompts. This is not documented in the model card and is not mentioned in the MarkTechPost article that covers this implementation. It is the single most important configuration detail.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The model is good at structured problems and bad at open-ended ones.&lt;/strong&gt; Mathematics, code analysis, formal logic—anything where the reasoning has a clear structure and a definitive endpoint—works well. Open-ended problems, creative tasks, or anything requiring sustained coherent narrative are risky. The model can degenerate catastrophically and without warning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A repeat penalty helps but does not solve the fundamental issue.&lt;/strong&gt; Setting &lt;code&gt;repeat_penalty&lt;/code&gt; to 1.2 prevents exact repetition loops but does not prevent the semantic degeneration I observed on the river crossing problem. The model simply produces unique garbage instead of repeated garbage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Distillation captures form, not judgment.&lt;/strong&gt; The think tags are real and useful. The step-by-step reasoning format works. What is missing is the implicit self-monitoring that frontier models have: the ability to recognize when their own output has become incoherent and to course-correct. This is probably the hardest thing to distill, because it is not present in the training examples. The examples show successful reasoning. They do not show the model catching and recovering from failed reasoning, because Claude's failed reasoning attempts are filtered out before the training data is assembled.&lt;/p&gt;
&lt;h3&gt;Where This Goes&lt;/h3&gt;
&lt;p&gt;The distilled reasoning model is, despite its failure modes, genuinely interesting. The &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags provide a form of transparency that standard instruction-tuned models lack. When the model is working correctly—which is most of the time on appropriate tasks—you get a window into the reasoning process that helps you evaluate the answer's quality before you act on it.&lt;/p&gt;
&lt;p&gt;The failure mode is also instructive. It demonstrates, concretely, the gap between learning a behavior pattern and internalizing the capability that produces that pattern. Supervised fine-tuning on reasoning trajectories can teach a model to produce reasoning-shaped output, but it cannot, from three thousand examples, teach the model to actually reason in the way the source model does. That requires either far more training data, a different training methodology (reinforcement learning from reasoning feedback, perhaps), or simply a larger model with more capacity to internalize the underlying patterns.&lt;/p&gt;
&lt;p&gt;For now, the practical advice is: use these models for what they are good at, know their failure modes, and do not trust the output on open-ended problems without reading the thinking trace. The trace is the feature. If the trace is coherent, the answer is probably good. If the trace starts to wander, stop reading and retry.&lt;/p&gt;
&lt;p&gt;The model runs on my desk, generates ten tokens per second, costs nothing per query, and shows its work. For a sixteen-gigabyte download and ten minutes of setup time, that is a reasonable deal—as long as you know what you are buying.&lt;/p&gt;</description><category>amd</category><category>chain-of-thought</category><category>claude</category><category>distillation</category><category>gguf</category><category>inference</category><category>llm</category><category>ollama</category><category>open-source</category><category>quantization</category><category>qwen</category><category>reasoning</category><category>strix halo</category><guid>https://tinycomputers.io/posts/distilled-reasoning-on-strix-halo-qwen35-claude-thinking.html</guid><pubDate>Sun, 29 Mar 2026 14:00:00 GMT</pubDate></item><item><title>The Feedback Loop That Jevons Couldn't Name</title><link>https://tinycomputers.io/posts/the-feedback-loop-that-jevons-couldnt-name.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-feedback-loop-that-jevons-couldnt-name_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;36 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In 1865, William Stanley Jevons published &lt;em&gt;The Coal Question&lt;/em&gt; and identified a paradox: James Watt's more efficient steam engine didn't reduce coal consumption. It increased it. The efficiency gain made coal-powered industry more profitable, which drove more investment, which consumed more coal. The per-unit savings were overwhelmed by the expansion in total units demanded.&lt;/p&gt;
&lt;p&gt;In 1948, Norbert Wiener published &lt;em&gt;Cybernetics: Or Control and Communication in the Animal and the Machine&lt;/em&gt; and described a mechanism: systems that feed their outputs back into their inputs will either stabilize (negative feedback) or accelerate (positive feedback). A thermostat is negative feedback: the output (heat) reduces the input (the gap between current and target temperature). A microphone pointed at a speaker is positive feedback: the output (sound) amplifies the input (sound), and the system screams.&lt;/p&gt;
&lt;p&gt;Jevons saw what happened. Wiener explained why.&lt;/p&gt;
&lt;p&gt;They never met. Jevons died in 1882, twelve years before Wiener was born. Their fields barely overlapped. Jevons was an economist and logician working in Manchester. Wiener was a mathematician and engineer at MIT. Neither cited the other. Neither would have had reason to. But they were describing the same phenomenon from different sides: Jevons from economics, Wiener from control theory. Jevons identified the paradox. Wiener provided the mechanism. Together, they explain something about AI that the current conversation consistently misses: the reason demand expands when cognitive tools get cheaper isn't economic irrationality. It's positive feedback. It's the system doing exactly what feedback systems do.&lt;/p&gt;
&lt;h3&gt;Wiener's Machines&lt;/h3&gt;
&lt;p&gt;Wiener was not an abstract theorist. He built anti-aircraft fire control systems during World War II, predicting the future position of enemy aircraft based on their observed trajectories. The mathematical problem was filtering signal from noise in real-time feedback data, and the solution required treating the human pilot as a component in a mechanical system: a system that could be modeled, predicted, and countered.&lt;/p&gt;
&lt;p&gt;This experience shaped everything he wrote afterward. In &lt;em&gt;Cybernetics&lt;/em&gt;, Wiener argued that communication and control were fundamentally the same problem, whether the system involved nerves, wires, or social institutions. A factory is a feedback system. An economy is a feedback system. A conversation is a feedback system. The mathematics of regulation and stability apply to all of them.&lt;/p&gt;
&lt;p&gt;In 1950, he published &lt;a href="https://baud.rs/B8JkEc"&gt;&lt;em&gt;The Human Use of Human Beings&lt;/em&gt;&lt;/a&gt;, a book aimed at general readers. Its central argument: automation would transform society not by replacing humans but by changing the feedback loops that humans operate within. The automated factory doesn't just make products without workers. It creates a system where the speed of production is no longer limited by human labor, which means the system's dynamics shift to whatever the next bottleneck happens to be.&lt;/p&gt;
&lt;p&gt;Wiener's most famous warning was blunt: "The automatic machine is the precise economic equivalent of slave labor. Any labor which competes with slave labor must accept the economic conditions of slave labor." He predicted that automation would produce unemployment that would make the Great Depression "seem a pleasant joke." He wrote this in 1950, when computers filled rooms and could barely calculate ballistic tables.&lt;/p&gt;
&lt;h3&gt;The Jevons Mechanism&lt;/h3&gt;
&lt;p&gt;Jevons didn't have the vocabulary of cybernetics. He described his paradox in economic terms: efficiency improvements reduce per-unit cost, lower cost increases demand, increased demand outweighs the efficiency gain, total consumption rises. He was observing a positive feedback loop, but he described it as a paradox because the economic framework he was working within predicted the opposite. If engines burn coal more efficiently, you should need less of it. The loop that amplifies demand was invisible in the model.&lt;/p&gt;
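&lt;p&gt;One way to see when the paradox appears is a toy constant-elasticity demand model. This is a modern textbook-style formalization, not anything Jevons wrote, and every number in it is an illustrative assumption. Efficiency lowers the fuel cost of each unit of useful work; if demand for that work is elastic enough (elasticity above 1), total fuel burned goes up rather than down.&lt;/p&gt;

```python
# Toy rebound-effect model: does total fuel use rise when engines improve?
# Constant-elasticity demand is an assumed functional form, chosen only
# because it makes the threshold (elasticity of 1) easy to see.


def fuel_consumed(efficiency, elasticity, fuel_price=1.0, scale=100.0):
    """Total fuel burned for a given engine efficiency.

    efficiency: units of useful work per unit of fuel
    elasticity: price elasticity of demand for useful work (positive)
    """
    work_price = fuel_price / efficiency            # cost of one unit of work
    work_demanded = scale * work_price ** (-elasticity)
    return work_demanded / efficiency               # fuel needed to supply it


if __name__ == "__main__":
    for elasticity in (0.5, 1.0, 1.5):
        before = fuel_consumed(1.0, elasticity)
        after = fuel_consumed(2.0, elasticity)       # engine twice as efficient
        print(f"elasticity {elasticity}: fuel {before:.1f} to {after:.1f}")
```

&lt;p&gt;With inelastic demand the efficiency gain saves fuel, as the intuition Jevons was arguing against predicts. At an elasticity of exactly 1 consumption is unchanged, and above 1 doubling the engine's efficiency increases the coal burned.&lt;/p&gt;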
&lt;p&gt;Wiener's framework makes the loop visible. Here's the cybernetic translation of Jevons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A system component becomes more efficient (Watt's steam engine, a cheaper semiconductor, an AI model).&lt;/li&gt;
&lt;li&gt;The efficiency reduces the cost of the system's output.&lt;/li&gt;
&lt;li&gt;Lower cost makes new applications viable that were previously too expensive.&lt;/li&gt;
&lt;li&gt;New applications create new demand for the now-cheaper component.&lt;/li&gt;
&lt;li&gt;The new demand feeds back into step 1 as pressure for more efficiency, more production, more investment.&lt;/li&gt;
&lt;li&gt;The loop accelerates.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is a positive feedback loop. The output (cheaper goods, more applications) amplifies the input (demand for the efficient component). There is no negative feedback mechanism to stabilize the system. The loop runs until it hits an external constraint: a physical limit on the resource, a regulatory intervention, or the saturation of all possible demand.&lt;/p&gt;
&lt;p&gt;Jevons observed steps 1 through 4 with coal. He didn't have the mathematical framework to describe steps 5 and 6 as a feedback loop. Wiener had the framework but was focused on machines and automation, not on resource economics. The connection between them is that Jevons Paradox is a specific instance of positive feedback in economic systems, and positive feedback is the phenomenon Wiener spent his career analyzing.&lt;/p&gt;
&lt;h3&gt;The AI Loop&lt;/h3&gt;
&lt;p&gt;I've been writing about &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox and AI&lt;/a&gt; for months. The argument: AI makes cognitive output cheaper, demand for cognitive output expands beyond the efficiency gain, and the expansion concentrates pressure on the one input that can't scale: human judgment. The &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;Vampire piece&lt;/a&gt; described the human cost. The &lt;a href="https://tinycomputers.io/posts/the-excavator-and-the-foundation.html"&gt;Excavator piece&lt;/a&gt; described the software quality cost. The &lt;a href="https://tinycomputers.io/posts/the-split-isnt-between-people-its-between-tasks.html"&gt;split piece&lt;/a&gt; described how the craft concentrates in the judgment layer.&lt;/p&gt;
&lt;p&gt;What I didn't do was explain the mechanism. Why does demand expand when a cognitive input gets cheaper? Why doesn't the system reach equilibrium at lower total consumption, the way classical economics predicts? What force drives the expansion?&lt;/p&gt;
&lt;p&gt;Wiener's answer: positive feedback.&lt;/p&gt;
&lt;p&gt;Here's the AI loop, stated in cybernetic terms:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;AI makes code generation cheaper (the efficiency gain).&lt;/li&gt;
&lt;li&gt;Cheaper code generation makes new software projects viable (the demand expansion).&lt;/li&gt;
&lt;li&gt;New projects produce software that requires review, testing, debugging, and maintenance (the output).&lt;/li&gt;
&lt;li&gt;Review and debugging create demand for more AI assistance (the feedback).&lt;/li&gt;
&lt;li&gt;The loop accelerates: more projects, more software, more review, more AI, more projects.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At no point does the loop include a mechanism for slowing down. There is no thermostat. The "temperature" (volume of software in production) keeps rising until it hits an external constraint.&lt;/p&gt;
&lt;p&gt;I can see this loop operating in my own work. I built &lt;a href="https://tinycomputers.io/posts/building-dirtscout-a-land-acquisition-platform-with-claude-code.html"&gt;DirtScout&lt;/a&gt;, a full-stack land acquisition platform, in a series of conversations with Claude Code. 29,000 lines of code across Python, TypeScript, and infrastructure-as-code. The project would have taken months to type by hand. With AI, I built it in days. But building it in days meant I immediately started adding features: soil analysis, environmental assessments, auction tracking, deal pipeline management, offer letter generation. Each feature was a conversation. Each conversation produced code that needed to be reviewed, tested, and maintained. The faster I built, the more I wanted to build, and the more I built, the more review work accumulated. The loop ran. I didn't notice it running until the maintenance surface area was larger than anything I'd built before.&lt;/p&gt;
&lt;p&gt;That's Wiener's loop at the individual level. At the organizational level, the same dynamic plays out with more people and higher stakes. Every developer using AI-assisted tooling ships more code, which creates more surface area for bugs and security vulnerabilities, which creates more demand for review, which creates more demand for AI-assisted review tooling, which ships more code.&lt;/p&gt;
&lt;p&gt;The external constraint, as I've argued in previous pieces, is human judgment. The &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;three-to-four-hour ceiling on deep work&lt;/a&gt; is biological. It doesn't expand because the feedback loop demands more of it. It's a fixed resource being consumed by an accelerating process. In Wiener's terms, the human component in the feedback loop is a bottleneck with a fixed maximum throughput. The system can't route around it (the judgment is necessary) and can't expand it (the biology doesn't scale). So the system does the only thing a positive feedback loop can do when it hits a fixed constraint: it overloads the constraint.&lt;/p&gt;
&lt;p&gt;That's burnout. That's what Steve Yegge described in "The AI Vampire." Wiener would have recognized it instantly. The human component in an accelerating feedback loop reaches its throughput limit and degrades. The system doesn't stop. The system doesn't care. The system is a feedback loop, and feedback loops don't have preferences about their components.&lt;/p&gt;
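&lt;p&gt;The overload can be sketched as a queue with a fixed service rate. In the toy simulation below, the amount of code needing review grows by a constant factor each cycle while review capacity stays flat. The growth rate, the capacity, and the units are made-up numbers for illustration, not measurements from any real team.&lt;/p&gt;

```python
# Toy model of a positive feedback loop hitting a fixed-throughput component.
# All numbers are illustrative assumptions.


def simulate_backlog(cycles, initial_output=10.0, growth=1.5, review_capacity=20.0):
    """Return the unreviewed backlog after each cycle.

    Output grows geometrically (the feedback loop); review capacity is
    constant (the biological ceiling on judgment).
    """
    backlog = 0.0
    output = initial_output
    history = []
    for _ in range(cycles):
        pending = backlog + output
        reviewed = min(pending, review_capacity)   # reviewers cannot exceed capacity
        backlog = pending - reviewed
        history.append(backlog)
        output *= growth                           # more shipped code, more demand
    return history


if __name__ == "__main__":
    for cycle, backlog in enumerate(simulate_backlog(8), start=1):
        print(f"cycle {cycle}: backlog {backlog:.1f}")
```

&lt;p&gt;The backlog stays at zero while output is below capacity, then grows every cycle once output crosses it. Nothing in the loop reads the backlog, which is the missing negative feedback.&lt;/p&gt;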
&lt;h3&gt;Wiener's Warning, Updated&lt;/h3&gt;
&lt;p&gt;Wiener warned that automation would make human labor economically equivalent to slave labor. He was wrong about the specifics (manufacturing employment declined but didn't collapse) but right about the dynamic. The feedback loop he described (automation reduces labor cost, which drives more automation, which further reduces labor cost) played out exactly as predicted. It just played out over decades instead of years, and the economy adapted by shifting labor to sectors that weren't yet automated.&lt;/p&gt;
&lt;p&gt;The AI version of this warning is different in a way that matters. Wiener's automation loop operated on physical labor. Muscle has substitutes: machines. When the feedback loop overloaded the human muscle component, the system routed around it with hydraulics, robotics, and assembly lines. The human moved to cognitive work, where machines couldn't follow.&lt;/p&gt;
&lt;p&gt;AI's feedback loop operates on cognitive labor. Judgment does not have substitutes. When the feedback loop overloads the human judgment component, the system can't route around it the way manufacturing routed around physical labor. There is no higher-order activity to retreat to. Judgment is the top of the stack. The feedback loop either overloads it (burnout) or degrades it (review quality drops, software slop accumulates, the &lt;a href="https://tinycomputers.io/posts/the-excavator-and-the-foundation.html"&gt;Excavator&lt;/a&gt; scenario plays out).&lt;/p&gt;
&lt;p&gt;Wiener saw this possibility in the abstract. In &lt;em&gt;God and Golem, Inc.&lt;/em&gt; (1964), he wrote: "The world of the future will be an ever more demanding struggle against the limitations of our intelligence, not a comfortable hammock in which we can lie down to be waited upon by our robot slaves." He was pushing back against the utopian narrative of his own era: the idea that automation would create leisure. His counterclaim was that automation would shift the struggle to a harder domain. He was right, and the harder domain turned out to be exactly the one AI is now pressuring: the limits of human cognition.&lt;/p&gt;
&lt;h3&gt;The Speed Problem&lt;/h3&gt;
&lt;p&gt;There's a dimension of the AI feedback loop that Wiener's industrial-era examples didn't anticipate: speed.&lt;/p&gt;
&lt;p&gt;Wiener's factory automation loop ran at the speed of manufacturing. It took years to design a new factory, months to retool an assembly line, weeks to train workers on new processes. The feedback loop was real, but it operated on a timescale that allowed human institutions (unions, regulations, education systems) to adapt. Walter Reuther, the president of the United Auto Workers, received Wiener's 1949 letter warning about automation and had years to develop a response. The loop was slow enough for governance.&lt;/p&gt;
&lt;p&gt;The AI feedback loop runs at the speed of software. An operations manager can go from "I have an idea" to "it's in production" in &lt;a href="https://tinycomputers.io/posts/the-excavator-and-the-foundation.html"&gt;an afternoon&lt;/a&gt;. A developer can ship ten features in the time it used to take to ship one. The loop cycles in hours, not years. Human institutions that adapted to the manufacturing automation loop over decades don't have decades to adapt to the AI loop. They have the time between one deployment and the next.&lt;/p&gt;
&lt;p&gt;This is the Jevons Paradox running at software speed. Coal consumption took decades to double after Watt's engine. Computing demand took years to double after each semiconductor generation. AI-assisted software production can double in months. The feedback loop is the same. The clock rate is different. And the human component's clock rate (the biological ceiling on judgment) hasn't changed at all.&lt;/p&gt;
&lt;p&gt;The supply side is contracting at the same time. After sixteen consecutive years of growth, undergraduate computer science enrollment began to decline in 2025. The Computing Research Association found that 62% of computing departments reported declining enrollment for 2025-26. Students and their parents are reading the headlines about AI replacing developers and steering toward fields they perceive as more durable. The feedback loop produces an ironic secondary effect: the fear of automation reduces the supply of the human component that the accelerating system needs most. The loop runs faster. The pipeline of people qualified to govern it narrows. Wiener's warning about building governance structures before the loop overloads becomes more urgent as the pool of people who could build those structures shrinks.&lt;/p&gt;
&lt;h3&gt;Wiener and Heidegger&lt;/h3&gt;
&lt;p&gt;Wiener and Heidegger never engaged with each other's work, as far as I know. They were writing at the same time (late 1940s, early 1950s), about the same phenomenon (technology reshaping human life), and they arrived at complementary conclusions from completely different starting points.&lt;/p&gt;
&lt;p&gt;Heidegger, as I wrote in &lt;a href="https://tinycomputers.io/posts/enframing-the-code.html"&gt;the Enframing piece&lt;/a&gt;, argued that technology changes how we see the world. Everything becomes standing reserve: raw material to be ordered and consumed. The river becomes a power source. The specification becomes code. The transformation is ontological: it changes what things are, not just what we do with them.&lt;/p&gt;
&lt;p&gt;Wiener argued that technology changes the dynamics of the systems we operate within. Feedback loops accelerate. Bottlenecks shift. Components that were adequate at one cycle speed become inadequate at a faster one. The transformation is mechanical: it changes the forces acting on us, not necessarily how we understand them.&lt;/p&gt;
&lt;p&gt;The two frameworks aren't contradictory. They're describing different aspects of the same process. Heidegger explains why we treat the Zilog manual as raw material for code generation (Enframing). Wiener explains what happens when we do it at scale (positive feedback, demand expansion, bottleneck overload). Jevons measured the economic result (total consumption rises despite efficiency gains).&lt;/p&gt;
&lt;p&gt;There's a useful way to layer them. Heidegger describes the precondition: technology must first transform how we see the world (specifications become standing reserve) before the feedback loop can operate. You can't accelerate production of something you don't yet see as producible. Enframing opens the door. Wiener's loop walks through it. Jevons counts what's on the other side.&lt;/p&gt;
&lt;p&gt;The sequence matters for AI. First, we began seeing cognitive tasks as automatable (Heidegger's shift in perception). Then, AI tools made the automation practical and cheap (Wiener's efficiency gain). Then, demand for cognitive output expanded beyond what anyone predicted (Jevons' paradox). Each step enables the next. The feedback loop couldn't run until the Enframing was in place, and the economic expansion couldn't happen until the loop was running.&lt;/p&gt;
&lt;p&gt;Three disciplines. One phenomenon. The feedback loop that Jevons couldn't name, Wiener formalized, and Heidegger diagnosed as a transformation in our relationship to the world.&lt;/p&gt;
&lt;h3&gt;The Missing Thermostat&lt;/h3&gt;
&lt;p&gt;Every stable system has negative feedback. A thermostat, a voltage regulator, a predator-prey population cycle: something measures the output and adjusts the input to keep the system within bounds. Positive feedback without negative feedback is, by definition, unstable. The microphone screams until someone unplugs it.&lt;/p&gt;
&lt;p&gt;The AI feedback loop currently has no thermostat. There is no mechanism that measures the volume of unreviewed software in production and slows the rate of production accordingly. There is no mechanism that measures developer burnout and reduces the demand for cognitive output. There is no mechanism that measures the ratio of AI-generated code to human-reviewed code and raises an alarm when it crosses a threshold.&lt;/p&gt;
&lt;p&gt;Wiener would argue that this is the actual problem. Not AI itself (a tool, a component, an efficiency gain), but the absence of negative feedback in the system that AI accelerates. His entire career was about designing feedback systems that stabilize rather than explode. His warning about automation wasn't "don't build machines." It was "build the governance structures that keep the feedback loop from overloading its human components."&lt;/p&gt;
&lt;p&gt;In 1949, Wiener wrote a letter to Walter Reuther, president of the United Auto Workers union, warning him about the coming wave of industrial automation. He didn't tell Reuther to smash the machines. He told him to prepare the workforce and the institutions for a system that would accelerate beyond their current capacity to manage. The letter went largely unheeded.&lt;/p&gt;
&lt;p&gt;We're in the same position now. The feedback loop is running. The human component is approaching its throughput limit. The thermostat doesn't exist. Someone needs to build it, and the people best positioned to do so are the ones inside the loop: the developers and decision-makers who can see the acceleration because they're experiencing it.&lt;/p&gt;
&lt;p&gt;Wiener died in Stockholm in 1964 at the age of sixty-nine, two decades before the personal computer and six decades before large language models. He never saw the system he described reach the scale it's reaching now. But the mathematics he wrote down in 1948 describe it precisely. Positive feedback without negative feedback is unstable. The system will find its constraint and overload it. The only question is whether we build the thermostat before or after the overload.&lt;/p&gt;
&lt;p&gt;What makes Wiener worth reading today isn't his specific predictions (some were right, some were wrong, the timeline was consistently too compressed). It's his framework. He understood that technological change is not a series of discrete events but a system of coupled feedback loops. Each efficiency gain changes the dynamics of the system it operates within. Each change in dynamics creates pressure on whatever component is now the bottleneck. And each bottleneck, when overloaded, produces consequences that feed back into the system and accelerate the next cycle.&lt;/p&gt;
&lt;p&gt;That framework applies to coal in 1865, to factory automation in 1950, and to AI-assisted cognitive work in 2026. The specific resources change. The specific bottlenecks change. The feedback dynamics don't.&lt;/p&gt;
&lt;p&gt;John von Neumann, Wiener's contemporary and one of the minds his work most influenced, once said that young mathematicians should not worry about whether their work would be useful because "truth is much too complicated to allow anything but approximations." Wiener's approximation of the feedback dynamics of technological change was good enough that it still describes the system seventy-eight years after he formalized it. Whether it's good enough to help us build the thermostat before we need it is the question his work leaves us with.&lt;/p&gt;
&lt;h3&gt;What a Thermostat Might Look Like&lt;/h3&gt;
&lt;p&gt;Wiener didn't just diagnose problems. He designed solutions. His entire field was about building systems that regulate themselves. If he were alive today, he'd be asking: what does negative feedback look like in an AI-accelerated software economy?&lt;/p&gt;
&lt;p&gt;Some possibilities:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mandatory review ratios.&lt;/strong&gt; For every N lines of AI-generated code deployed to production, M lines must be reviewed by a qualified human. The ratio creates a coupling between the production rate and the review rate, forcing the system to slow down when review capacity is saturated. This is a thermostat: the output (deployed code) is measured against a constraint (review capacity), and the input (generation rate) is throttled accordingly.&lt;/p&gt;
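&lt;p&gt;A minimal sketch of that coupling, with hypothetical names and numbers throughout: the volume allowed to ship in each cycle is capped by what reviewers can cover at the required ratio, and anything above the cap waits. The &lt;code&gt;review_ratio&lt;/code&gt; parameter stands in for M divided by N from the paragraph above.&lt;/p&gt;

```python
# Sketch of a review-ratio throttle. Names and numbers are hypothetical.


def deployable_lines(generated, review_capacity, review_ratio):
    """Lines that may ship this cycle under a mandatory review ratio.

    generated:       lines of AI-generated code waiting to deploy
    review_capacity: lines a human team can review this cycle
    review_ratio:    reviewed lines required per deployed line (M / N)
    """
    if review_ratio == 0:
        return generated                      # no review requirement, no throttle
    cap = review_capacity / review_ratio      # most the reviewers can cover
    return min(generated, cap)


def run_cycles(generation_per_cycle, review_capacity, review_ratio):
    """Carry over whatever the throttle holds back and report each cycle."""
    held_back = 0.0
    shipped = []
    for generated in generation_per_cycle:
        waiting = held_back + generated
        allowed = deployable_lines(waiting, review_capacity, review_ratio)
        shipped.append(allowed)
        held_back = waiting - allowed
    return shipped, held_back


if __name__ == "__main__":
    shipped, held = run_cycles([1000, 2000, 4000], review_capacity=1500, review_ratio=0.5)
    print("shipped per cycle:", shipped)
    print("still waiting:", held)
```

&lt;p&gt;Shipped volume tracks generation until review capacity saturates, then flattens. The held-back queue is the measured signal: it tells the organization to add reviewers or generate less, instead of letting unreviewed code accumulate in production.&lt;/p&gt;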
&lt;p&gt;&lt;strong&gt;Liability assignment.&lt;/strong&gt; If AI-generated code causes a data breach or financial loss, who pays? Currently, nobody in particular. Assigning liability to the person who deployed the code (not the person who prompted the AI) creates negative feedback: the cost of deployment failure feeds back into the decision to deploy, making people more cautious about shipping unreviewed code. Insurance markets would price this risk and create their own feedback mechanisms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Institutional adaptation.&lt;/strong&gt; This is what Wiener actually advocated. Not technical solutions but organizational ones. He told Walter Reuther to prepare the workforce for automation. The equivalent today: companies need to build review capacity at the same rate they build production capacity. Every developer who ships AI-generated code needs a corresponding increase in testing, security review, and architectural oversight. The organizations that treat AI as free productivity without investing in review are the ones that will hit the overload first.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cultural awareness.&lt;/strong&gt; &lt;a href="https://baud.rs/4ws9kl"&gt;Tristan Harris&lt;/a&gt; and the &lt;a href="https://baud.rs/cGtlHh"&gt;Center for Humane Technology&lt;/a&gt; have been arguing since 2023 that AI is being deployed faster than any technology in history under maximum incentives to cut corners on safety. Harris makes a distinction that Wiener would have appreciated: the difference between the "possible" (AI's theoretical benefits) and the "probable" (what actually happens given current incentive structures). The probable outcome, without intervention, is that companies race toward capability because the competitive feedback loop punishes restraint. Harris's proposed response is building global consensus that the current trajectory is unacceptable, the way the nuclear test ban and the Montreal Protocol established consensus before those feedback loops ran to their conclusions. In Wiener's terms, Harris is trying to build the thermostat at the cultural level: changing the system's objective function so that it optimizes for something other than pure output volume.&lt;/p&gt;
&lt;p&gt;None of these exist at scale today. The thermostat is unbuilt. The loop runs open.&lt;/p&gt;
&lt;p&gt;Jevons told us what happens. Wiener told us why. The question that remains is whether anyone is building the feedback mechanism that prevents the system from screaming.&lt;/p&gt;</description><category>ai</category><category>automation</category><category>control theory</category><category>cybernetics</category><category>economics</category><category>feedback loops</category><category>heidegger</category><category>jevons paradox</category><category>norbert wiener</category><category>philosophy</category><guid>https://tinycomputers.io/posts/the-feedback-loop-that-jevons-couldnt-name.html</guid><pubDate>Fri, 27 Mar 2026 13:00:00 GMT</pubDate></item><item><title>Building DirtScout: A Land Acquisition Platform with Claude Code</title><link>https://tinycomputers.io/posts/building-dirtscout-a-land-acquisition-platform-with-claude-code.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/building-dirtscout-a-land-acquisition-platform-with-claude-code_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;24 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Nine years ago, I built something similar.&lt;/p&gt;
&lt;p&gt;It was 2017, St. Louis County, Minnesota. I wanted to find raw undeveloped land from the county's delinquent tax rolls. The county had an ArcGIS service, but the APIs were primitive compared to what they offer now. I stood up a PostgreSQL database with PostGIS extensions, wrote Ruby scripts to scrape parcel data from the county's map server, geocoded addresses, and built a Ruby on Rails frontend to browse the results. The whole thing lived on a single VPS. It worked for one county. The data model was rigid, the scraping was fragile, and every time St. Louis County changed their GIS service, something broke.&lt;/p&gt;
&lt;p&gt;That project died the way side projects do: I got what I needed from it and moved on.&lt;/p&gt;
&lt;p&gt;In March 2026, I came back to the idea. The landscape had changed. ArcGIS REST APIs are now standardized and reliable. Wisconsin publishes a statewide parcel dataset covering all 72 counties through a single endpoint. Minnesota counties expose delinquent tax data through queryable feature services. AWS Lambda and DynamoDB mean I don't need to manage a database server. And I had a tool that didn't exist in 2017: &lt;a href="https://claude.ai/claude-code"&gt;Claude Code&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;DirtScout is the result. It's a full-stack land acquisition platform at &lt;a href="https://dirtscout.land"&gt;dirtscout.land&lt;/a&gt; that searches delinquent tax parcels across 21 Minnesota counties and browses raw land across 72 Wisconsin counties. It has AI-powered investment analysis, environmental and soil assessments, a deal pipeline with offer letter generation, tax forfeit auction tracking, and automated monitoring with email alerts. The codebase is about 29,000 lines across Python, TypeScript, and infrastructure-as-code.&lt;/p&gt;
&lt;p&gt;I built it with Claude Code. Not "Claude Code assisted me" or "Claude Code helped with the boilerplate." Claude Code wrote the code. I directed the architecture, made the decisions, and did the debugging when things broke. But the actual lines of code came from conversations, not from me typing in an editor.&lt;/p&gt;
&lt;h3&gt;The Architecture&lt;/h3&gt;
&lt;p&gt;The 2017 version was PostgreSQL + PostGIS + Ruby on Rails on a single server. The 2026 version:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 16, static export, Tailwind CSS, react-leaflet for maps. Hosted on S3 behind CloudFront. The entire frontend is pre-rendered HTML and JavaScript; there's no server-side rendering. CloudFront serves it from edge locations. A URL rewrite function handles dynamic routes for deal detail pages and shared parcel links.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; Python FastAPI running on a single AWS Lambda function behind API Gateway. Mangum adapts the ASGI app to Lambda's event format. Every API request hits the same Lambda, which cold-starts in about 3.5 seconds and handles subsequent requests in under a second. The function has 512MB of memory and a 5-minute timeout.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Data:&lt;/strong&gt; Two DynamoDB tables. The main table stores user data, flagged parcels, deals, preferences, notes, attachments, saved searches, tax list imports, and auction tracking. The cache table stores land cover analysis, environmental data, soil analysis, and geometry with TTLs. No PostgreSQL. No PostGIS. No database server to manage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt; AWS CDK in Python. One &lt;code&gt;cdk deploy&lt;/code&gt; command creates the Lambda, API Gateway, DynamoDB tables, S3 buckets, SQS queues, EventBridge schedules, Route 53 records, CloudFront distributions, and ACM certificates. The entire infrastructure is version-controlled and reproducible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On-premises worker:&lt;/strong&gt; A service running on a local AMD Strix Halo machine (Ryzen AI MAX+ 395, 128GB RAM) processes delinquent tax list PDFs using pdfplumber for text extraction and a local Qwen3 32B model via Ollama for structured data extraction. It polls an SQS queue for jobs.&lt;/p&gt;
&lt;p&gt;This is a fundamentally different architecture than what I could have built in 2017. No servers to patch aside from the Strix Halo. No database to back up. No PostGIS extensions to compile. The Lambda handles the compute, DynamoDB handles the storage, and the on-prem machine handles the jobs that need a real browser or a local LLM.&lt;/p&gt;
&lt;h3&gt;What Claude Code Actually Did&lt;/h3&gt;
&lt;p&gt;I want to be specific about this because the "AI-assisted development" conversation is usually vague. People say "I used AI to help me code" and it could mean anything from autocomplete suggestions to full application generation. Here's what actually happened.&lt;/p&gt;
&lt;p&gt;I started with a Rust TUI. The original project was a terminal application that queried a handful of Minnesota county ArcGIS services and displayed delinquent parcels in a text interface. It had county configurations, a query client, land cover analysis via USGS NLCD, and a flagging system. Claude Code built this from my descriptions of what I wanted: "query this ArcGIS service for parcels where the delinquent flag is set, filter by acreage and land use, show me the results in a table with navigation."&lt;/p&gt;
&lt;p&gt;Then I decided to make it a web app. I described the architecture I wanted: FastAPI on Lambda, Next.js on S3, DynamoDB for storage. Claude Code ported the Rust query logic to Python, built the FastAPI routes, created the React components, wrote the CDK infrastructure, and handled the deployment. Each feature was a conversation: "add Google OAuth," "add a deal pipeline with stages - make it look like Kanban," "generate offer letter PDFs," "add an AI investment summary using the Claude API."&lt;/p&gt;
&lt;p&gt;The codebase grew to 29,000 lines across 113 files in the initial commit. Later sessions added another 60 files and 5,000 lines for Wisconsin support, soil analysis, tax list imports, auction tracking, spatial search, and saved searches.&lt;/p&gt;
&lt;p&gt;I didn't write these lines. I directed them. There's a difference, and it matters.&lt;/p&gt;
&lt;p&gt;When I say "directed," I mean I made every architectural decision. I chose DynamoDB over PostgreSQL because I didn't want to manage a database. I chose Lambda over ECS because I didn't want to manage containers. I chose static export over SSR because I didn't want to manage a Node.js server. I chose to use a local LLM for PDF parsing instead of the Claude API because the parsing is structured data extraction that doesn't need frontier-model quality.&lt;/p&gt;
&lt;p&gt;Claude Code implemented these decisions. When something broke, I described the symptom and Claude Code diagnosed the cause. When I wanted a new feature, I described the behavior and Claude Code wrote the code. The feedback loop was: describe what I want, review what I get, deploy, test, describe what's wrong, iterate.&lt;/p&gt;
&lt;p&gt;Some things broke in interesting ways. DynamoDB doesn't accept Python floats; you have to convert everything to Decimal. The county field maps are reverse-keyed from what you'd expect (ArcGIS field names are the keys, common names are the values). Google OAuth redirect URIs need a trailing slash. CloudFront caches aggressively and you have to invalidate after every deploy. The Census TIGER API for county boundaries is painfully slow, so we downloaded the GeoJSON once and serve it as a static file. Each of these was discovered in production and fixed in conversation.&lt;/p&gt;
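&lt;p&gt;The float problem is the easiest of these to show. What follows is a minimal sketch of a float-to-Decimal conversion, not DirtScout's actual code: the &lt;code&gt;to_dynamo&lt;/code&gt; helper and the sample record are made up for illustration. It assumes items are plain dicts and lists and walks them before they are handed to boto3, which rejects Python floats.&lt;/p&gt;

```python
from decimal import Decimal


def to_dynamo(value):
    """Recursively replace floats with Decimal so DynamoDB will accept the item."""
    if isinstance(value, float):
        # Go through str() so 0.1 becomes Decimal("0.1"), not its binary expansion.
        return Decimal(str(value))
    if isinstance(value, dict):
        return {key: to_dynamo(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamo(item) for item in value]
    # Strings, ints, bools, and None pass through unchanged.
    return value


# Hypothetical parcel record; the field names are illustrative only.
parcel = {"parcel_id": "123-0010-00010", "acres": 39.5, "coords": [47.1, -92.3]}
item = to_dynamo(parcel)
print(item)
```

&lt;p&gt;Converting through &lt;code&gt;str()&lt;/code&gt; is a deliberate choice here: it keeps the stored number equal to what was printed rather than to the float's full binary value.&lt;/p&gt;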
&lt;h3&gt;The Data Sources&lt;/h3&gt;
&lt;p&gt;The interesting part of DirtScout isn't the web framework. It's the data integration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Minnesota parcel data&lt;/strong&gt; comes from 11 different county ArcGIS REST services, each with its own field names, query syntax, and data quality. St. Louis County has &lt;code&gt;DELINQUENT_TAX_FLAG&lt;/code&gt; and &lt;code&gt;BAL_DUE&lt;/code&gt;. Aitkin has &lt;code&gt;DELINQUENT_FLAG&lt;/code&gt; (text: "YES"/"NO") and &lt;code&gt;BALDUE&lt;/code&gt;. Hennepin stores acreage in square feet (divide by 43,560). Goodhue stores acreage as a string that requires CAST in the SQL WHERE clause. Each county is a separate configuration with field mappings, WHERE clause templates, and normalization logic.&lt;/p&gt;
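&lt;p&gt;To make the per-county normalization concrete, here is a simplified sketch. It uses plain dicts instead of the real dataclasses, and &lt;code&gt;AREA_FIELD&lt;/code&gt;, the &lt;code&gt;area_in_sq_ft&lt;/code&gt; switch, and the &lt;code&gt;normalize&lt;/code&gt; function are hypothetical names. Only the Aitkin field names, the "YES"/"NO" flag, and the 43,560 divisor come from the description above.&lt;/p&gt;

```python
SQ_FT_PER_ACRE = 43560

# Keys are ArcGIS field names, values are common names.
# AREA_FIELD is a made-up placeholder, not a real county field name.
COUNTIES = {
    "aitkin": {
        "fields": {
            "DELINQUENT_FLAG": "delinquent",
            "BALDUE": "balance_due",
            "AREA_FIELD": "acres",
        },
        "area_in_sq_ft": False,
    },
    "hennepin": {
        "fields": {"AREA_FIELD": "acres"},
        "area_in_sq_ft": True,
    },
}


def normalize(county, attributes):
    """Map one raw ArcGIS attribute dict onto common field names."""
    config = COUNTIES[county]
    record = {}
    for arcgis_name, common_name in config["fields"].items():
        if arcgis_name in attributes:
            record[common_name] = attributes[arcgis_name]
    if "acres" in record:
        acres = float(record["acres"])  # float() also tolerates numeric strings
        if config["area_in_sq_ft"]:
            acres = acres / SQ_FT_PER_ACRE
        record["acres"] = acres
    if isinstance(record.get("delinquent"), str):
        # Text flags such as "YES"/"NO" become booleans.
        record["delinquent"] = record["delinquent"].strip().upper() == "YES"
    return record


print(normalize("aitkin", {"DELINQUENT_FLAG": "YES", "BALDUE": 1250.0, "AREA_FIELD": 40}))
print(normalize("hennepin", {"AREA_FIELD": 87120}))
```

&lt;p&gt;Keeping the quirks in data rather than in branching code is the design choice that matters: adding a county means adding a configuration entry, not a new code path.&lt;/p&gt;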
&lt;p&gt;&lt;strong&gt;Minnesota tax lists&lt;/strong&gt; come from 15 county PDFs and Excel files. Itasca County publishes an Excel file updated monthly. The rest publish PDF legal notices. The PDFs are processed by either the Claude API (Haiku model, cheapest tier) or a local Qwen3 32B running on the Strix Halo machine. The AI extracts parcel IDs, owner names, delinquent amounts, and addresses from the unstructured PDF text and returns structured JSON.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wisconsin parcel data&lt;/strong&gt; comes from a single statewide ArcGIS feature service maintained by the State Cartographer's Office. One endpoint, all 72 counties, standardized fields. Owner names, mailing addresses, assessed values, acreage, property class. No delinquent tax data in the GIS, but we supplement with 9 county-level PDF lists of tax-delinquent and tax-forfeited properties.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Environmental analysis&lt;/strong&gt; layers: FEMA NFHL for flood zones, NWI for wetlands, NHD for water bodies, Minnesota DNR County Well Index for well data, MPCA for contamination sites. Each is a separate ArcGIS REST service query using the parcel's centroid.&lt;/p&gt;
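&lt;p&gt;A centroid lookup against one of these layers is a point-intersects query. The sketch below only builds the request URL using standard ArcGIS REST query parameters. The layer URL is a placeholder rather than a real FEMA or NWI endpoint, the &lt;code&gt;point_query_url&lt;/code&gt; helper is hypothetical, and it assumes the centroid is already in WGS84 longitude/latitude.&lt;/p&gt;

```python
from urllib.parse import urlencode


def point_query_url(layer_url, lon, lat, out_fields="*"):
    """Build an ArcGIS REST query URL for features intersecting a point."""
    params = {
        "geometry": f"{lon},{lat}",  # x,y order: longitude first
        "geometryType": "esriGeometryPoint",
        "inSR": 4326,  # input point is WGS84 lon/lat
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": out_fields,
        "returnGeometry": "false",  # attributes only
        "f": "json",
    }
    return layer_url.rstrip("/") + "/query?" + urlencode(params)


# Placeholder layer URL, not a real service endpoint.
url = point_query_url(
    "https://example.com/arcgis/rest/services/Flood/MapServer/0", -92.1, 46.8
)
print(url)
```

&lt;p&gt;Fetching the URL and reading the &lt;code&gt;features&lt;/code&gt; array from the JSON response is left out so the example stays offline.&lt;/p&gt;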
&lt;p&gt;&lt;strong&gt;Soil analysis&lt;/strong&gt; comes from the USDA Soil Data Access REST API (SSURGO). A SQL query with the parcel's centroid returns soil components, drainage class, hydric rating, slope, farmland classification, and capability class. We compute a "buildability" score from these factors.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Land cover&lt;/strong&gt; comes from the USGS MRLC WMS service, querying the NLCD 2021 Land Cover layer. We sample the parcel area and return a breakdown by cover type: forest, agriculture, water, wetlands, developed, barren.&lt;/p&gt;
&lt;p&gt;Each of these integrations was built in conversation with Claude Code. "Add flood zone analysis using FEMA's service." "The NWI wetlands query needs table-prefixed field names." "The SSURGO soil query needs a WKT point geometry."&lt;/p&gt;
&lt;h3&gt;What Changed Since 2017&lt;/h3&gt;
&lt;p&gt;The PostgreSQL + PostGIS + Ruby on Rails stack I used nine years ago was the right choice for 2017. PostGIS let me do spatial queries locally. I had to store the parcel data because the ArcGIS services weren't reliable enough to query in real time. Rails rendered server-side because that's what Rails did.&lt;/p&gt;
&lt;p&gt;None of that is necessary anymore. The ArcGIS services are fast and reliable enough to query live. DynamoDB handles the persistence without a schema to manage. Lambda eliminates server management. Static export means the frontend is just files on S3.&lt;/p&gt;
&lt;p&gt;There's a personal angle here too. In graduate school, I spent an entire semester manually developing land cover classifications for a final project — hand-labeling training data, running supervised classification algorithms, validating results against ground truth. It was weeks of work for one study area. For DirtScout, I told Claude Code "add a buildability score based on soil drainage, hydric percentage, slope, and capability class" and had a working assessment in minutes. The SSURGO soil data query, the scoring logic, the frontend panel with color-coded ratings — all from a single conversation. The knowledge that took a semester to develop is now a commodity you can describe and deploy.&lt;/p&gt;
&lt;p&gt;But the bigger change is the development process. In 2017, I wrote every line of Ruby and SQL by hand. I designed the PostGIS schema, wrote the scraping scripts, built the Rails views, configured the Nginx proxy, set up the SSL certificates, and wrote the systemd service files. It took months of evenings and weekends for a single-county tool.&lt;/p&gt;
&lt;p&gt;In 2026, I built a two-state, multi-service platform with AI analysis, auction tracking, deal management, and offer letter generation in a series of conversations over a few days. The code isn't hand-crafted. I'm not interested in hand-crafted code when that's not the point. The point is finding undervalued rural land from delinquent tax records and making offers to motivated sellers. The code is the means. Claude Code made the means faster.&lt;/p&gt;
&lt;h3&gt;The On-Prem Angle&lt;/h3&gt;
&lt;p&gt;A tax list import worker runs on a Bosgame M5 mini PC in my basement.&lt;/p&gt;
&lt;p&gt;The worker exists because I didn't want to pay for Claude API calls to parse 24 county PDFs every week. The AMD Strix Halo has 128GB of RAM and runs Qwen3 32B through Ollama. The worker downloads each PDF, extracts text with pdfplumber (a Python library that does the PDF-to-text conversion locally, no model needed), then sends the extracted text to the local Ollama instance for structured JSON extraction. Each 2-page chunk takes 5-7 minutes on the 32B model. It's slower than a cloud API. It's also free.&lt;/p&gt;
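&lt;p&gt;The chunking step is simple enough to sketch. This assumes the page text has already been extracted into a list with one string per page (for example from pdfplumber's &lt;code&gt;page.extract_text()&lt;/code&gt;) and leaves out the Ollama call. The &lt;code&gt;chunk_pages&lt;/code&gt; name and the blank-line separator are illustrative choices, not the worker's actual code.&lt;/p&gt;

```python
def chunk_pages(pages, pages_per_chunk=2):
    """Group per-page text into chunks small enough for one extraction call."""
    chunks = []
    for start in range(0, len(pages), pages_per_chunk):
        # Join neighbouring pages with a blank line between them.
        chunks.append("\n\n".join(pages[start:start + pages_per_chunk]))
    return chunks


# Stand-in for text already extracted from a PDF, one string per page.
pages = ["page 1 text", "page 2 text", "page 3 text"]
for number, chunk in enumerate(chunk_pages(pages), start=1):
    print(number, repr(chunk))
```

&lt;p&gt;Small chunks keep each prompt short for the 32B model. The cost is that a record split across a chunk boundary would have to be handled separately.&lt;/p&gt;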
&lt;p&gt;The worker is a systemd service that starts on boot and polls an SQS queue continuously. A weekly systemd timer enqueues an "import all" message every Monday morning.&lt;/p&gt;
&lt;p&gt;This is the &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;economics of owning your own inference&lt;/a&gt; in practice. The frontier model handles the quality-sensitive work (AI investment analysis, parcel chat). The local model handles the batch extraction work. The split happens naturally based on the task requirements.&lt;/p&gt;
&lt;h3&gt;What It Does Now&lt;/h3&gt;
&lt;p&gt;The production site at dirtscout.land:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Searches delinquent tax parcels across 21 Minnesota counties (8 Tier 1 with full ArcGIS data, 2 Tier 2 with partial data, 2 Tier 3 with minimal data, 9 Tier 4 with imported tax list data)&lt;/li&gt;
&lt;li&gt;Browses raw land parcels across all 72 Wisconsin counties via the statewide parcel service&lt;/li&gt;
&lt;li&gt;Scores each MN parcel on a 0-100 scale (grades A through F) based on financial opportunity, road access, environmental factors, and land character&lt;/li&gt;
&lt;li&gt;Generates AI investment summaries using Claude Sonnet with full context: parcel data, land cover, environmental analysis, soil data, owner's other properties, and attached documents&lt;/li&gt;
&lt;li&gt;Tracks deals through a pipeline (prospecting, offer sent, negotiating, under contract, closed, dead) with offer letter PDF generation using three templates&lt;/li&gt;
&lt;li&gt;Monitors for new delinquent parcels daily via EventBridge-triggered Lambda scans, with email alerts&lt;/li&gt;
&lt;li&gt;Tracks tax forfeit auction dates across 8 Minnesota counties, with a floating widget showing upcoming auctions&lt;/li&gt;
&lt;li&gt;Imports delinquent tax lists from 15 MN and 9 WI county PDFs/Excel files weekly&lt;/li&gt;
&lt;li&gt;Provides environmental analysis (flood zones, wetlands, water bodies, wells, contamination), land cover classification (NLCD 2021), and soil analysis (SSURGO) for each parcel&lt;/li&gt;
&lt;li&gt;Shows parcel boundaries on satellite imagery, with an interactive explore map that loads parcel shapes at high zoom levels&lt;/li&gt;
&lt;li&gt;Manages parcel notes, file attachments (via S3 presigned URLs), shareable parcel links, and saved searches&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;What I'd Do Differently&lt;/h3&gt;
&lt;p&gt;I'd add geometry caching earlier. Every map view that shows parcel boundaries makes a live ArcGIS query with &lt;code&gt;returnGeometry=true&lt;/code&gt;, which is slower than querying attributes only. Caching the geometry in DynamoDB with a TTL would make the explore map significantly faster.&lt;/p&gt;
&lt;p&gt;I'd standardize the county configurations into a more declarative format. Right now each county is a Python dataclass with hand-tuned field mappings. A JSON configuration file that Claude Code could modify more easily would reduce the friction of adding new counties.&lt;/p&gt;
&lt;p&gt;I'd separate the frontend into a proper monorepo with shared types between the API client and the backend models. The current setup has TypeScript interfaces in the frontend that mirror Pydantic models in the backend, and they get out of sync when fields are added.&lt;/p&gt;
&lt;p&gt;But these are optimizations, not regrets. The system works. It finds land. It makes the research process faster. And it was built in conversations, not in sprints.&lt;/p&gt;</description><category>ai</category><category>arcgis</category><category>aws</category><category>claude code</category><category>dynamodb</category><category>fastapi</category><category>gis</category><category>infrastructure</category><category>land investing</category><category>leaflet</category><category>minnesota</category><category>next.js</category><category>python</category><category>react</category><category>real estate</category><category>wisconsin</category><guid>https://tinycomputers.io/posts/building-dirtscout-a-land-acquisition-platform-with-claude-code.html</guid><pubDate>Thu, 26 Mar 2026 01:00:00 GMT</pubDate></item><item><title>Enframing the Code</title><link>https://tinycomputers.io/posts/enframing-the-code.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/enframing-the-code_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;25 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/clean-room-z80-emulator/zilog-z80.jpg" alt="A Zilog Z80 CPU in a white ceramic DIP-40 package, the processor whose specification became standing reserve" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 10px 20px rgba(0,0,0,.1);" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;I asked Claude to build a &lt;a href="https://tinycomputers.io/posts/clean-room-z80-emulator.html"&gt;Z80 emulator&lt;/a&gt;. The constraint was explicit: no reference to existing emulator source code. The inputs were the Zilog Z80 CPU User Manual, an architectural plan I wrote, and the test ROMs to validate against. Claude produced 1,300 lines of C covering every official Z80 instruction, undocumented flag behaviors, ACIA serial emulation, and CP/M support. It passed 117 unit tests. It boots CP/M and runs programs.&lt;/p&gt;
&lt;p&gt;The emulator works. The question is what it means that it exists.&lt;/p&gt;
&lt;h3&gt;The Clean Room That Wasn't&lt;/h3&gt;
&lt;p&gt;"Clean room" is a legal term borrowed from semiconductor fabrication. In software, it describes a methodology where developers build from specifications and documentation without ever examining existing implementations. The purpose is to produce code that is legally independent of prior art. If you've never seen the original code, you can't have copied it.&lt;/p&gt;
&lt;p&gt;The clean-room process was designed for human cognition. A developer reads a specification, forms a mental model, and writes code that implements the behavior the specification describes. The legal fiction is that the developer's mental model is informed solely by the specification, not by any existing implementation. In practice, developers have seen other implementations, read blog posts, studied textbook examples. The clean room is a discipline, not a guarantee: you follow the process, document that you followed it, and hope that's sufficient if someone challenges you.&lt;/p&gt;
&lt;p&gt;When Claude writes a Z80 emulator from the Zilog manual, the clean-room concept doesn't dissolve because the AI is better at following the rules. It dissolves because the framework doesn't apply. Claude's training data includes dozens of Z80 emulators. The model has seen &lt;a href="https://baud.rs/GeplXn"&gt;MAME's Z80 core&lt;/a&gt;, it has seen &lt;a href="https://baud.rs/Adkbi8"&gt;Fuse&lt;/a&gt;, it has seen &lt;a href="https://baud.rs/KJoorR"&gt;whatever antirez published&lt;/a&gt;. The question of whether a specific output is "derived from" a specific input is unanswerable, because the model's internal state isn't decomposable into "I learned this from source A" and "I learned this from source B." The provenance that clean-room law requires you to demonstrate doesn't exist in a form that can be demonstrated.&lt;/p&gt;
&lt;p&gt;But here's what's interesting: the emulator I directed Claude to produce is not a copy of any specific emulator. The architecture is mine. The bit-field decoding strategy (x/y/z/p/q decomposition of opcode bytes) was specified in my architectural plan. The test suite structure, the ACIA emulation interface, the system emulator's callback design: all specified by me and implemented by Claude from those specifications plus the Zilog manual. The output is an original assembly of knowledge. It's also an output of a system that has seen the source code it was told not to reference.&lt;/p&gt;
&lt;p&gt;The law has no category for this. It's not a copy. It's not independent. It's something else.&lt;/p&gt;
&lt;h3&gt;The Language That Doesn't Exist&lt;/h3&gt;
&lt;p&gt;The Z80 case is complicated by the fact that prior implementations exist. Somebody could, in theory, diff my emulator against MAME's and look for structural similarities. (They won't find meaningful ones, because the architecture is different, but the argument could be made.) The more interesting case eliminates this possibility entirely.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;Lattice&lt;/a&gt; is a programming language I designed. It has a novel feature called the phase system: mutability is not a static attribute but a runtime property that values transition through, like matter moving between liquid and solid. You declare a value in &lt;code&gt;flux&lt;/code&gt; (mutable), &lt;code&gt;freeze&lt;/code&gt; it to &lt;code&gt;fix&lt;/code&gt; (immutable), &lt;code&gt;thaw&lt;/code&gt; it back if needed. The language has &lt;code&gt;forge&lt;/code&gt; blocks for controlled mutation zones. None of this exists in any other language.&lt;/p&gt;
&lt;p&gt;Claude writes Lattice code. It writes it well. It produces correct programs using the phase system, the concurrency primitives, and the bytecode VM's 100-opcode instruction set. It does this despite the fact that Lattice does not appear in its training data. The language was designed after Claude's knowledge cutoff. There is no Lattice source code on GitHub, no Stack Overflow answers, no blog posts (other than mine) explaining the syntax.&lt;/p&gt;
&lt;p&gt;How does Claude write Lattice? Because Lattice's syntax looks like Rust. The curly braces, the type annotations, the pattern matching: Claude recognizes the structural similarity and maps its understanding of Rust-like languages onto the Lattice grammar. The phase-specific keywords (&lt;code&gt;flux&lt;/code&gt;, &lt;code&gt;fix&lt;/code&gt;, &lt;code&gt;freeze&lt;/code&gt;, &lt;code&gt;thaw&lt;/code&gt;, &lt;code&gt;forge&lt;/code&gt;) are new, but they appear in contexts that are syntactically familiar. Claude doesn't need to have seen Lattice before. It needs to have seen languages that smell similar.&lt;/p&gt;
&lt;p&gt;This is a fundamentally different kind of creation than what copyright law contemplates. Claude didn't copy Lattice code (none exists to copy). It didn't copy Rust code (Lattice isn't Rust). It transformed a grammar specification and a set of examples into working programs in a language that has no prior art. The specification became the implementation without passing through any intermediate step that could be called "copying."&lt;/p&gt;
&lt;h3&gt;Heidegger Saw This Coming&lt;/h3&gt;
&lt;p&gt;In 1954, Martin Heidegger published &lt;a href="https://baud.rs/BziXVW"&gt;&lt;em&gt;The Question Concerning Technology&lt;/em&gt;&lt;/a&gt;. His central argument: modern technology is not just a set of tools. It is a way of seeing the world. He called this way of seeing &lt;em&gt;Enframing&lt;/em&gt; (Gestell): the tendency of modern technology to reveal everything as &lt;em&gt;standing reserve&lt;/em&gt; (Bestand), raw material ordered into availability.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/enframing/rhine-dam.jpg" alt="A hydroelectric dam on the Rhine near Märkt, Germany, the kind of infrastructure Heidegger used to illustrate Enframing" style="max-width: 100%; border-radius: 4px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;The example Heidegger used was a hydroelectric dam on the Rhine. The river is no longer a river in the way a bridge reveals it (something to cross, something to contemplate, something with its own presence). The dam reveals the river as a power source. The water is standing reserve: ordered, measured, extracted. The river hasn't changed physically. What changed is how technology frames it.&lt;/p&gt;
&lt;p&gt;This is exactly what happens when Claude reads the &lt;a href="https://baud.rs/EESjG1"&gt;Zilog Z80 CPU User Manual&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The manual is a specification: 332 pages of timing diagrams, instruction tables, register descriptions, and pin assignments. When a human developer reads it, the manual is a guide. The developer forms an understanding, makes design choices, writes code that reflects their interpretation of the specification. The manual and the implementation are connected through the developer's comprehension. The developer is present in the code in a way that matters both legally and philosophically.&lt;/p&gt;
&lt;p&gt;When Claude reads the same manual, the specification becomes standing reserve. The timing diagrams are not studied; they are consumed. The instruction tables are not interpreted; they are transformed. The manual is raw material, ordered directly into code, the way the dam orders the river into electricity. There is no intermediate step of "understanding" in the human sense. There is a transformation from one representation (specification) to another (implementation), and the transformation is mechanical in a way that human interpretation is not.&lt;/p&gt;
&lt;p&gt;This is what Heidegger meant by Enframing. Technology doesn't just use resources; it changes what counts as a resource. The Zilog manual was written as a reference for engineers. Enframing reveals it as raw material for code generation. The specification was always latently an implementation; the AI just makes the transformation explicit.&lt;/p&gt;
&lt;h3&gt;What Copyright Was Protecting&lt;/h3&gt;
&lt;p&gt;Copyright law protects "original works of authorship fixed in a tangible medium of expression." The key word is "original." A Z80 emulator is copyrightable because the programmer made creative choices in expressing the specification as code. Two programmers given the same Zilog manual will produce different emulators: different variable names, different control flow structures, different optimization strategies, different architectural decisions. The specification constrains the behavior. The expression is where the creativity lives.&lt;/p&gt;
&lt;p&gt;This framework assumes that the gap between specification and implementation is where human creativity operates. The specification says "the ADD instruction sets the zero flag if the result is zero." A hundred programmers will write a hundred slightly different implementations of this behavior. Each is an original expression. Each is copyrightable.&lt;/p&gt;
&lt;p&gt;What happens when the gap closes? When the transformation from specification to implementation becomes mechanical, when there is no creative gap for originality to occupy, what is left to protect?&lt;/p&gt;
&lt;p&gt;Claude's Z80 emulator makes specific structural choices: the x/y/z/p/q bit-field decomposition, the callback-based system bus interface, the T-state tracking architecture. These choices came from my architectural plan, not from Claude's autonomous creativity. I specified the structure; Claude filled it in from the Zilog manual. The "creative choices" that copyright relies on were mine (the architecture) and the specification's (the behavior). Claude's contribution was the transformation between the two, and that transformation is closer to compilation than to authorship.&lt;/p&gt;
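&lt;p&gt;For readers who haven't met the x/y/z/p/q scheme, here is the general idea as a short illustration in Python rather than the emulator's C. It is not taken from the emulator source, and &lt;code&gt;decode_fields&lt;/code&gt; is a made-up name. Integer division and modulo stand in for the usual shifts and masks, which is equivalent for byte values from 0 to 255.&lt;/p&gt;

```python
def decode_fields(opcode):
    """Split a Z80 opcode byte into the x/y/z/p/q bit fields."""
    x = opcode // 64  # bits 7-6
    y = (opcode // 8) % 8  # bits 5-3
    z = opcode % 8  # bits 2-0
    p = y // 2  # bits 5-4
    q = y % 2  # bit 3
    return {"x": x, "y": y, "z": z, "p": p, "q": q}


# 0x78 is LD A,B: x=1 selects the 8-bit load block, y=7 is A, z=0 is B.
print(decode_fields(0x78))
```

&lt;p&gt;Once the fields are split out, whole families of instructions share one code path indexed by register number. That makes the decomposition a structural decision rather than a line-level one.&lt;/p&gt;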
&lt;p&gt;Lattice pushes this further. Claude writes programs in a language with no training data, from a grammar specification and examples I provided. The output is correct Lattice code. But who is the author? I designed the language. Claude learned it from my spec. The programs it produces are implementations of tasks I described. At no point did Claude exercise the kind of independent creative judgment that copyright assumes. It transformed a task description into code in a grammar it learned from me. The entire chain from specification to implementation is mechanical, even though the output looks exactly like something a human programmer would write.&lt;/p&gt;
&lt;h3&gt;The Dissolution&lt;/h3&gt;
&lt;p&gt;Clean-room reverse engineering was a legal ritual designed to prove that a human developer's mental model was not contaminated by existing code. The ritual made sense when the concern was human memory: a developer who has read source code might unconsciously reproduce it.&lt;/p&gt;
&lt;p&gt;AI makes the ritual meaningless in two ways.&lt;/p&gt;
&lt;p&gt;First, provenance is undemonstrable. You cannot prove that Claude's output is or isn't derived from a specific piece of training data, because the model's internal representations don't maintain source attribution. The clean-room question ("did the developer see the original code?") has no answerable equivalent for an LLM. The model has seen everything in its training data simultaneously. It cannot unsee selectively.&lt;/p&gt;
&lt;p&gt;Second, the distinction between "specification" and "implementation" is collapsing. When the transformation between them is mechanical and instantaneous, the specification &lt;em&gt;is&lt;/em&gt; the implementation in a meaningful sense. The Zilog manual contains the Z80 emulator the way an acorn contains an oak tree. The transformation from one to the other requires energy and process, but the information content is the same. Copyright protects the expression, but when the expression is a deterministic function of the specification, the creative contribution approaches zero.&lt;/p&gt;
&lt;p&gt;This doesn't mean all AI-generated code is uncopyrightable. If I write a detailed architectural plan, direct Claude to implement it, review and revise the output, and make structural decisions throughout the process, the result reflects my creative choices expressed through an AI tool. The tool is more sophisticated than a compiler, but the relationship is similar: I made the design decisions; the tool translated them into a lower-level representation. The copyright, if it exists, is in my architectural choices, not in Claude's line-by-line implementation.&lt;/p&gt;
&lt;p&gt;But if someone asks Claude to "write a Z80 emulator" with no architectural plan, no structural constraints, and no iterative review, and Claude produces a working emulator from its training data, who owns that code? Not the person who typed the prompt; they made no creative contribution beyond the request. Not Anthropic; they built the tool but didn't direct the output. Not the authors of the Z80 emulators in the training data; their code wasn't copied in any legally meaningful sense. The code exists in a copyright vacuum: produced by a process that doesn't have an author in the way the law requires.&lt;/p&gt;
&lt;h3&gt;Why This Matters Now&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/the-excavator-and-the-foundation.html"&gt;velocity of AI-assisted code production&lt;/a&gt; is accelerating. Every developer using Claude, Copilot, or Cursor is producing code whose provenance is uncertain. The code works. It passes tests. It ships to production. And its relationship to the training data that informed it is, in a strict legal sense, unknown and unknowable.&lt;/p&gt;
&lt;p&gt;The current legal frameworks (copyright, clean room, fair use) were designed for a world where code was written by humans who could testify about their creative process. "I read the specification. I designed the architecture. I wrote the code. I did not reference any existing implementation." This testimony is the foundation of clean-room defense. An LLM cannot provide it, and the human directing the LLM can only testify about their own contributions (the prompt, the architectural plan, the review), not about what the model drew from.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/enframing/compaq-portable.jpg" alt="A Compaq Portable computer, the machine whose clean-room BIOS reimplementation established the legal precedent AI is now dissolving" style="float: right; max-width: 350px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 10px 20px rgba(0,0,0,.1);" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;I took a CS ethics course as an undergraduate. The cases we studied (Compaq's clean-room reimplementation of the IBM PC BIOS, SCO's claim that Linux contained UNIX code, DeCSS and the DMCA's prohibition on circumventing copy protection) all assumed a human author whose creative process could be examined and whose sources could be traced. Every one of those cases would be decided differently if the defendant had said "I told an AI to implement the specification and it produced this code." The existing precedent doesn't apply, and the new precedent doesn't exist yet.&lt;/p&gt;
&lt;h3&gt;The Acorn and the Oak&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://baud.rs/Ij1iHE"&gt;Heidegger&lt;/a&gt; would say that the danger of Enframing is not that it's wrong but that it's totalizing. When technology reveals everything as standing reserve, we lose the ability to see things as they are. The river becomes only a power source. The specification becomes only raw material for code generation. The act of programming becomes only a transformation pipeline from input to output.&lt;/p&gt;
&lt;p&gt;What gets lost is what the clean-room process was actually designed to protect: the space between specification and implementation where human understanding operates. That space is where a developer reads "the ADD instruction sets the zero flag if the result is zero" and decides how to express that in code. The decision is small. The creativity is modest. But it's real, and it's human, and it's the entire basis of software copyright.&lt;/p&gt;
&lt;p&gt;AI doesn't eliminate that space. My Z80 emulator project included genuine creative decisions: the architecture, the test strategy, the system emulator design. Lattice exists because I designed a novel type system that no AI would have invented from existing languages. The creative space still exists for the people who operate at the design level.&lt;/p&gt;
&lt;p&gt;But for the implementation level, for the transformation from "what this should do" to "code that does it," the space is closing. The specification is becoming the implementation. The acorn is becoming the oak without passing through the seasons of human comprehension. And the legal and philosophical frameworks we built for a world where that transformation required human creativity haven't caught up.&lt;/p&gt;
&lt;p&gt;They will. The question is how much code ships before they do.&lt;/p&gt;</description><category>ai</category><category>clean room</category><category>copyright</category><category>heidegger</category><category>intellectual property</category><category>jevons paradox</category><category>lattice</category><category>philosophy</category><category>programming languages</category><category>software licensing</category><category>z80</category><guid>https://tinycomputers.io/posts/enframing-the-code.html</guid><pubDate>Sun, 22 Mar 2026 13:00:00 GMT</pubDate></item><item><title>The Excavator and the Foundation</title><link>https://tinycomputers.io/posts/the-excavator-and-the-foundation.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-excavator-and-the-foundation_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;26 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Jason Fried posted a &lt;a href="https://baud.rs/aHG0UZ"&gt;sharp critique&lt;/a&gt; of the "bespoke software revolution" narrative this week. His argument: most people don't like computers, don't want software projects, and won't become builders just because AI hands them better tools. The three-person accounting firm wants the paperwork gone, not a new system to maintain. The logistics company wants optimized routes, not Joe's side project. The law firm wants leverage on their time, not a codebase.&lt;/p&gt;
&lt;p&gt;His metaphor is good: "A powerful excavator doesn't turn a homeowner into a contractor. Most people just want the hole dug by someone else."&lt;/p&gt;
&lt;p&gt;He's right about who builds. He's wrong about what happens next.&lt;/p&gt;
&lt;h3&gt;The Echo Chamber Is Real&lt;/h3&gt;
&lt;p&gt;Fried's observation about the software community talking to itself lands because it's obviously true. Open any tech feed and the bespoke software excitement is coming from people who already build software for a living. They're excited because AI makes their work faster and more interesting. They project that excitement onto everyone else and conclude that everyone will want to build. This is like assuming everyone wants to change their own oil because you enjoy working on cars.&lt;/p&gt;
&lt;p&gt;Most people have no interest in building software. Not because they lack intelligence or creativity, but because software is a means to an end and they'd rather focus on the end. The accounting firm wants to close the books faster. The logistics company wants fewer empty miles. These are domain problems, not software problems, and the people who understand them best have spent their careers on the domain, not on code.&lt;/p&gt;
&lt;p&gt;Fried identifies the outliers correctly: the people who go deep with AI building tools were already dabblers. The curiosity was already there. AI didn't create new builders; it gave existing builders a power tool. This is an important observation that the tech community consistently ignores because it's less exciting than "everyone becomes a developer."&lt;/p&gt;
&lt;h3&gt;Where Fried Stops&lt;/h3&gt;
&lt;p&gt;But Fried's analysis ends at "most people won't build," and that's where the interesting question starts. Because some people will try.&lt;/p&gt;
&lt;p&gt;Not the majority. Not the three-person accounting firm drowning in paperwork. But the accounting firm's nephew who's "good with computers." The operations manager at the logistics company who watched a YouTube tutorial on Cursor. The paralegal at the law firm who built a spreadsheet macro once and now has access to tools that can generate entire applications from a text description.&lt;/p&gt;
&lt;p&gt;These people exist in every organization. They're not professional developers. They don't think of themselves as builders. But they have just enough technical confidence to be dangerous, and AI tools have just lowered the barrier enough to let them act on it.&lt;/p&gt;
&lt;p&gt;This is not a hypothetical. It's already happening. People are building internal tools with AI assistance, deploying them to their teams, and running business processes on software that no one with software judgment has reviewed. The tools work on the happy path. They do exactly what the builder asked for. The problem is what the builder didn't ask for.&lt;/p&gt;
&lt;h3&gt;The Happy Path Is All You Get&lt;/h3&gt;
&lt;p&gt;When a non-developer builds software with AI, they describe what they want: "I need a tool that takes client intake forms, extracts the relevant fields, and puts them in a spreadsheet." The AI builds it. It works. The builder is thrilled.&lt;/p&gt;
&lt;p&gt;What the builder didn't specify, and the AI didn't volunteer:&lt;/p&gt;
&lt;p&gt;What happens when a client submits a form with special characters that break the parser? What happens when two people submit simultaneously? What happens when the spreadsheet hits the row limit? Where are the backups? Who has access? What happens when the API key expires? What happens when the builder leaves the company and nobody knows how the tool works?&lt;/p&gt;
&lt;p&gt;These aren't obscure edge cases. They're the standard failure modes of every software system ever built. Professional developers think about them not because they're smarter, but because they've watched systems fail in these exact ways. That accumulated experience of failure is what I've been calling &lt;a href="https://tinycomputers.io/posts/the-split-isnt-between-people-its-between-tasks.html"&gt;the judgment layer&lt;/a&gt;: the part of building that AI can't replace because it requires contact with the consequences of getting it wrong.&lt;/p&gt;
&lt;p&gt;The operations manager building a routing tool in Cursor has domain judgment about logistics. She knows which routes are efficient and which constraints matter. She does not have software judgment about error handling, data integrity, concurrent access, or failure recovery. Professional developers fail at these things constantly too. The difference is that professionals recognize the failure when it happens and have the skills to iterate toward a fix. The operations manager's tool breaks the same way, but she doesn't know it broke, doesn't know why, and doesn't know what to do about it. The AI gave her a tool that satisfies her domain judgment perfectly and her software judgment not at all, because she doesn't have any, and she doesn't know she doesn't have any.&lt;/p&gt;
&lt;h3&gt;This Has Happened Before&lt;/h3&gt;
&lt;p&gt;The counterargument writes itself: people have been building bad mission-critical software forever. Hospitals tracked patient records in Access databases. Small banks ran loan portfolios in Excel. Supply chains depended on macros that one person understood. When that person left, nobody could maintain it. The world survived.&lt;/p&gt;
&lt;p&gt;This is true, and it's important to take seriously. "Bad software" and "functional software" are not mutually exclusive. The accounting firm's Access database was terrible by every engineering standard and it ran their business for fifteen years. The nurse's Excel tracker was a data integrity nightmare and it kept patient appointments from falling through the cracks. Fried is right that custom software has always been "bloated, confusing, and built wrong in all the ways." He's also right that it existed and that people used it.&lt;/p&gt;
&lt;p&gt;So if bad software has always existed and the world kept turning, what changes with AI?&lt;/p&gt;
&lt;h3&gt;Velocity&lt;/h3&gt;
&lt;p&gt;The change is speed.&lt;/p&gt;
&lt;p&gt;Access took months to build something broken. You had to learn Access first, or find someone who knew it. You had to build the forms, design the tables, write the queries. The pace of construction imposed a natural speed limit on how fast bad software could enter production. By the time you finished, you'd encountered at least some of the failure modes, because the slow process forced you through enough iterations to stumble into them.&lt;/p&gt;
&lt;p&gt;AI removes that speed limit. The operations manager can go from "I have an idea" to "it's running in production" in an afternoon. The intake form tool is live before lunch. The routing optimizer is deployed by end of day. The contract parser is running by Friday. Each one works on the happy path. Each one has the same class of unexamined failure modes that Access databases had. But Access databases took months to accumulate. AI-built tools accumulate in days.&lt;/p&gt;
&lt;p&gt;More attempts. Same failure rate. More failures. Compressed into a shorter timeline. By the time the first tool breaks, three more have been deployed. By the time someone realizes the intake form tool is silently dropping records with special characters, the routing optimizer and the contract parser are already load-bearing parts of the business.&lt;/p&gt;
&lt;p&gt;This is &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt; applied to the failure mode itself. When building software gets cheaper, you don't get the same amount of bad software for less effort. You get vastly more bad software for the same effort. The per-unit cost of production drops, total production expands, and the total volume of unreviewed, unexamined software in production grows faster than anyone anticipated.&lt;/p&gt;
&lt;h3&gt;The Judgment Bottleneck&lt;/h3&gt;
&lt;p&gt;I've argued in previous pieces that &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;human judgment is the binding constraint&lt;/a&gt; in AI-augmented work. AI makes the labor cheaper; demand expands; the expansion concentrates on the one input that can't scale: the human capacity for deep, focused evaluation. The &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;three-to-four-hour ceiling on cognitively demanding work&lt;/a&gt; is biological, not cultural, and no productivity tool changes it.&lt;/p&gt;
&lt;p&gt;Software judgment is a specific instance of this general constraint. Reviewing code for failure modes, reasoning about edge cases, thinking through data integrity, anticipating what happens when components interact in unexpected ways: this is deep work. It requires the kind of sustained attention that depletes on a fixed biological schedule. And the supply of people who have this judgment is not growing. Computer science enrollment climbed for more than a decade, but software judgment comes from experience, not coursework. You develop it by watching systems fail, and that takes years.&lt;/p&gt;
&lt;p&gt;AI expands the rate at which software enters production. It does not expand the rate at which qualified people can review it. The production side scales. The judgment side doesn't.&lt;/p&gt;
&lt;p&gt;And the judgment side may actually be contracting. After sixteen consecutive years of growth, undergraduate CS enrollment turned negative in 2025. The Computing Research Association (CRA) found that 62% of computing departments reported declining enrollment for 2025-26, while only 13% saw increases. At University of California campuses, CS enrollment fell 6% in 2025 after declining 3% in 2024: the first drops since the dot-com crash. Students and their parents are reading the headlines about AI displacing entry-level developers and steering toward fields they perceive as more durable.&lt;/p&gt;
&lt;p&gt;The irony is thick. The fear that AI will replace software developers is reducing the supply of software developers at the exact moment that AI is massively expanding the demand for software judgment. Students are fleeing the field because they think AI can do the work. AI is simultaneously creating more work that only humans with software judgment can evaluate. The enrollment decline doesn't just fail to solve the judgment bottleneck; it tightens it.&lt;/p&gt;
&lt;p&gt;The gap between "software that exists" and "software that someone qualified has evaluated" widens from both directions: production accelerates while the pipeline of qualified reviewers narrows. Something has to give, and what gives is the review.&lt;/p&gt;
&lt;h3&gt;Software Slop&lt;/h3&gt;
&lt;p&gt;I wrote an essay about &lt;a href="https://tinycomputers.io/posts/llm-generated-content-what-makes-something-slop.html"&gt;what makes AI-generated content "slop"&lt;/a&gt;: superficial competence masking an absence of substance. The text looks right. The grammar is clean. The structure is logical. But it doesn't commit to anything, doesn't engage with anything, doesn't mean anything. It fills the container without filling it with content.&lt;/p&gt;
&lt;p&gt;AI-generated software has the same property. The code is syntactically correct. The UI has proper styling, responsive layouts, loading spinners, appropriate error messages. It passes every visual inspection. A manager looking at a demo sees a professional application. A user running through the standard workflow sees something that works.&lt;/p&gt;
&lt;p&gt;Underneath: no input validation beyond what the framework provides for free. No error handling beyond try/catch blocks that swallow exceptions. No concurrency protection. No backup strategy. No audit trail. No security beyond defaults. The software is superficially competent and structurally hollow, and you cannot tell the difference by looking at it.&lt;/p&gt;
&lt;p&gt;This is what distinguishes the AI-built software problem from the Access database problem. Access databases looked like Access databases. The limitations were visible in the interface. The grey forms, the flat tables, the clunky queries: everyone could see they were using a tool that was not designed for what they were doing with it. The expectations were calibrated, even if the risks weren't.&lt;/p&gt;
&lt;p&gt;AI-built software looks like real software. The surface quality has been democratized. What hasn't been democratized is the structural integrity underneath. And because the surface looks professional, the people using it have no signal that anything is missing. The feedback loop that would normally tell you "this is a prototype, not a product" has been severed. The prototype looks like the product, and nobody in the room can tell the difference except the people with software judgment, who weren't in the room when it was built.&lt;/p&gt;
&lt;h3&gt;What Fried Misses&lt;/h3&gt;
&lt;p&gt;Fried's framework has one gap. He says the demand for bespoke software won't grow because people don't want software projects. But the demand is already growing, not because people want to build, but because AI collapsed the apparent cost of building to near zero. The operations manager didn't set out to start a software project. She set out to solve a routing problem, and the software was a side effect that happened so fast she didn't register it as a project.&lt;/p&gt;
&lt;p&gt;This is the mechanism Fried doesn't account for. The excavator doesn't turn the homeowner into a contractor. But it does let the homeowner dig a hole so fast that they're standing in it before they realize they don't know what they're doing. The question isn't whether they wanted to dig. It's what happens now that the hole exists and the house is being built on top of it.&lt;/p&gt;
&lt;p&gt;The bespoke software revolution won't come from people deliberately choosing to become builders. It will come from people accidentally becoming builders because the tools made it so frictionless that building happened before the decision to build was consciously made. And the software they produce will be the fastest-growing category of technical debt in history, because it was created without judgment, deployed without review, and adopted without anyone understanding what's underneath.&lt;/p&gt;
&lt;h3&gt;Who Benefits&lt;/h3&gt;
&lt;p&gt;Fried is right that the excitement about bespoke software comes from software makers. What he doesn't say is why they should be excited. It's not because everyone becomes a builder. It's because everyone becomes a client.&lt;/p&gt;
&lt;p&gt;Every operations manager who builds a broken routing tool and discovers it doesn't handle the edge cases is a future client for someone who can build it properly. Every accounting firm that deploys an AI-built intake system and loses data is a future client for someone who understands data integrity. The DIY phase doesn't replace professional software development. It creates demand for it, at a scale and urgency that didn't exist before, because now the potential clients have firsthand experience with why the problem is hard.&lt;/p&gt;
&lt;p&gt;The judgment bottleneck doesn't prevent the Jevons expansion. It shapes it. More software gets attempted. More software fails. The failures create demand for the constrained resource (qualified judgment) at a rate that exceeds supply. The people who have software judgment become more valuable, not less, because the volume of work that needs their attention has exploded.&lt;/p&gt;
&lt;p&gt;Fried's excavator metaphor is correct. Most homeowners won't become contractors. But the excavator lets them dig enough bad foundations that the contracting business booms. AI doesn't democratize building. It democratizes demand.&lt;/p&gt;
&lt;h3&gt;The Forecast&lt;/h3&gt;
&lt;p&gt;I'll make a prediction specific enough to be wrong about. Within three years, the majority of data-loss and security incidents at small and mid-sized businesses will trace back to AI-assisted internal tools built without professional review. Not because the AI wrote bad code (the code will be syntactically fine), but because the person directing the AI didn't know what to ask for and didn't know what they were missing. The failure mode won't be dramatic. It will be silent: records that were never backed up, access controls that were never configured, race conditions that corrupt data once a month in a pattern nobody notices until the audit.&lt;/p&gt;
&lt;p&gt;There is an irony here that I should name. The people who most need to read this are the ones who never will. The operations manager vibing a routing tool into production this afternoon is not reading a blog about Jevons Paradox and GPU inference. She's solving her problem, and it feels like it's working, and no article on a site called Tiny Computers is going to reach her before the first silent failure does.&lt;/p&gt;
&lt;p&gt;The bespoke software revolution is real. It's just not the revolution anyone is advertising. It's not a million people building great custom tools. It's a million people building adequate tools with invisible structural deficiencies, deployed to production at a velocity that outpaces the world's capacity to review them. The excavator is powerful, the foundations are being dug, and most of them are too shallow.&lt;/p&gt;</description><category>ai</category><category>bespoke software</category><category>economics</category><category>jason fried</category><category>jevons paradox</category><category>judgment</category><category>philosophy</category><category>slop</category><category>software development</category><guid>https://tinycomputers.io/posts/the-excavator-and-the-foundation.html</guid><pubDate>Sat, 21 Mar 2026 13:00:00 GMT</pubDate></item><item><title>Running a 22B Video Model on Four Tesla P40s</title><link>https://tinycomputers.io/posts/running-ltx-video-on-four-tesla-p40s.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/running-ltx-video-on-four-tesla-p40s_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;22 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;LTX-Video 2.3 is a 22 billion parameter model that generates video from text prompts. It was designed for modern hardware: GPUs with bfloat16 support, high-bandwidth memory, and enough VRAM to hold the full model on one or two cards. The &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;Tesla P40&lt;/a&gt; has none of these things. It is a Pascal-generation GPU from 2016, with 24GB of GDDR5X per card, no native bfloat16, no Tensor Cores, and a PCIe 3.0 bus. It was built for data center inference workloads that no longer exist.&lt;/p&gt;
&lt;p&gt;I have four of them in a rack-mount server in an unheated shop building in Minnesota. Together they provide 96GB of VRAM. The question was whether that 96GB, spread across four old cards, could run a model that was never meant to run on any of them.&lt;/p&gt;
&lt;p&gt;The answer is yes, with significant caveats and a substantial amount of code to work around hardware limitations that the model's authors never anticipated.&lt;/p&gt;
&lt;h3&gt;The Problem&lt;/h3&gt;
&lt;p&gt;LTX-Video 2.3's transformer has 48 blocks. At fp16 precision, the model weights alone consume roughly 44GB. With the Gemma text encoder, the video VAE encoder/decoder, the spatial upsampler, and the audio components, the full pipeline needs more memory than any single P40 can provide. The model doesn't fit on one card. It doesn't fit on two. It barely fits on three, with no room for activations during inference.&lt;/p&gt;
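&lt;p&gt;The arithmetic behind the 44GB figure is simple enough to check. This is a rough sketch, assuming decimal gigabytes and counting only the transformer weights; the helper name &lt;code&gt;weight_gb&lt;/code&gt; is mine, not part of any library, and the other pipeline components are left out because their sizes aren't given here.&lt;/p&gt;

```python
PARAMS = 22e9    # 22 billion parameters, from the model description
CARD_GB = 24     # VRAM per Tesla P40


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


fp16_gb = weight_gb(PARAMS, 2)   # two bytes per fp16 weight
fp32_gb = weight_gb(PARAMS, 4)   # four bytes per fp32 weight

print(f"fp16 weights: {fp16_gb:.0f} GB")
print(f"fp32 weights: {fp32_gb:.0f} GB")
print(f"four P40s:    {4 * CARD_GB} GB")
```

&lt;p&gt;The fp32 line is there for comparison: doubling the bytes per weight would consume most of the four-card budget before a single activation is allocated.&lt;/p&gt;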
&lt;p&gt;Four cards at 24GB each gives 96GB total, which is enough for the weights with room for intermediate activations. But CUDA doesn't automatically spread a model across multiple GPUs. You have to tell it how.&lt;/p&gt;
&lt;p&gt;The standard approach for multi-GPU inference is &lt;code&gt;accelerate&lt;/code&gt;'s &lt;code&gt;dispatch_model&lt;/code&gt;, which automatically distributes model layers across available GPUs based on memory constraints. This works for the Gemma text encoder, which is a straightforward transformer. For the LTX transformer, it doesn't work, because the model has a custom forward pass with audio-video cross-attention that &lt;code&gt;accelerate&lt;/code&gt;'s automatic dispatch can't handle correctly. The model needs to move data between GPUs at specific points in the forward pass, and &lt;code&gt;accelerate&lt;/code&gt; doesn't know where those points are.&lt;/p&gt;
&lt;p&gt;The solution was manual pipeline parallelism: split the 48 transformer blocks evenly across four GPUs (12 blocks per card), keep the shared components (patchify projections, normalization, output projections) on GPU 0, and write a custom forward pass that moves tensors between devices at block boundaries.&lt;/p&gt;
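&lt;p&gt;The block assignment itself is the easy part. A minimal sketch, assuming a contiguous even split; &lt;code&gt;assign_blocks&lt;/code&gt; and &lt;code&gt;handoff_points&lt;/code&gt; are illustrative names, not functions from the LTX codebase, and the tensor movement at each handoff is omitted.&lt;/p&gt;

```python
def assign_blocks(num_blocks: int, num_devices: int) -> list[int]:
    """Map each block index to a device index in contiguous, even chunks."""
    per_device, extra = divmod(num_blocks, num_devices)
    assignment = []
    for device in range(num_devices):
        # Earlier devices absorb the remainder when the split is uneven.
        count = per_device + (1 if extra > device else 0)
        assignment.extend([device] * count)
    return assignment


def handoff_points(assignment: list[int]) -> list[int]:
    """Block indices where activations must move to a different device."""
    return [i for i in range(1, len(assignment))
            if assignment[i] != assignment[i - 1]]


devices = assign_blocks(48, 4)
print(devices.count(0), devices.count(3))   # blocks on the first and last GPU
print(handoff_points(devices))              # where tensors cross devices
```

&lt;p&gt;With 48 blocks and four devices there are only three handoffs per forward pass, so the PCIe transfers are a small part of the total work compared to the twelve blocks of compute between them.&lt;/p&gt;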
&lt;h3&gt;The Precision Problem&lt;/h3&gt;
&lt;p&gt;Even with the model split across four cards, nothing worked on the first attempt. Or the fifth. Getting LTX-Video running on Pascal hardware was an iterative process, with Claude Code generating solutions and me testing them against the actual hardware. Each failure revealed another assumption the model made about the GPU it would run on. The feedback loop was brutal: load a 22B model across four GPUs, wait eight minutes for a test generation, get a black frame or a NaN error, diagnose which precision boundary caused it, generate a fix, and try again.&lt;/p&gt;
&lt;p&gt;The first problem was bfloat16. The model weights are stored in bf16 format. Pascal GPUs cannot compute in bf16. PyTorch handles this silently for some operations by promoting to fp32, but other operations fail or produce garbage. The initial approach was the obvious one: monkey-patch &lt;code&gt;torch.bfloat16&lt;/code&gt; to redirect to &lt;code&gt;torch.float16&lt;/code&gt;. This seemed to work at load time. The model loaded, the weights populated, no errors. Then the first forward pass produced NaN everywhere. The monkey-patch had corrupted the safetensors weight loading. The weights loaded as bf16 bit patterns interpreted as fp16 values, which is not the same thing as converting them. A bf16 value of 1.0 has a different bit pattern than an fp16 value of 1.0. Reinterpret one as the other and you get a number that's either wildly wrong or NaN.&lt;/p&gt;
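&lt;p&gt;The bit-pattern mismatch can be reproduced without a GPU. This is a sketch using only Python's &lt;code&gt;struct&lt;/code&gt; module; it assumes bf16 is formed by truncating a float32 to its top 16 bits (real converters usually round), and the helper names are mine.&lt;/p&gt;

```python
import struct


def bf16_bits(x: float) -> int:
    """bf16 encoding of x: the top 16 bits of its float32 encoding (truncated)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def fp16_bits(x: float) -> int:
    """IEEE half-precision encoding of x."""
    (bits16,) = struct.unpack(">H", struct.pack(">e", x))
    return bits16


def read_as_fp16(bits: int) -> float:
    """Decode a 16-bit pattern as if it were fp16, without converting."""
    (value,) = struct.unpack(">e", struct.pack(">H", bits))
    return value


print(hex(bf16_bits(1.0)))            # 0x3f80
print(hex(fp16_bits(1.0)))            # 0x3c00
print(read_as_fp16(bf16_bits(1.0)))   # 1.875
```

&lt;p&gt;A stored 1.0 comes back as 1.875. The two formats split their 16 bits differently between exponent and mantissa, so every weight is wrong by a different factor, which is enough to wreck a forward pass.&lt;/p&gt;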
&lt;p&gt;The second attempt tried running everything in fp16 natively, converting weights properly during load. This got further: the model produced output that wasn't NaN. But the output was a solid green frame. The intermediate activations in the transformer blocks were overflowing fp16 range. Values above 65,504 become infinity in fp16, and the model's internal representations regularly exceed that during the attention and feedforward passes. The green frame was the model's attempt to decode latents that had been clipped to infinity at some point in the pipeline.&lt;/p&gt;
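&lt;p&gt;The 65,504 ceiling falls directly out of the fp16 encoding. A small sketch, again using only the standard library; &lt;code&gt;decode_fp16&lt;/code&gt; is an illustrative helper, and 70000.0 is an arbitrary stand-in for an activation that exceeds the limit, not a value measured from the model.&lt;/p&gt;

```python
import math
import struct


def decode_fp16(bits: int) -> float:
    """Decode a raw 16-bit pattern as an IEEE half-precision float."""
    (value,) = struct.unpack(">e", struct.pack(">H", bits))
    return value


largest_finite = decode_fp16(0x7BFF)   # exponent 30, all mantissa bits set
next_pattern = decode_fp16(0x7C00)     # exponent 31, mantissa 0

print(largest_finite)   # 65504.0
print(next_pattern)     # inf

# float32 represents the same magnitude without trouble.
(as_fp32,) = struct.unpack(">f", struct.pack(">f", 70000.0))
print(as_fp32)
```

&lt;p&gt;There is no headroom between the largest finite fp16 value and infinity, so a single oversized activation is enough to poison everything downstream of it.&lt;/p&gt;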
&lt;p&gt;The working solution was to let the model builder properly convert weights from bf16 to fp16 on load, then run the entire computation pipeline in float32. The weights sit in memory as fp16 (saving space), but every computation promotes to fp32 before executing. This required patching &lt;code&gt;F.linear&lt;/code&gt; to handle mixed dtype inputs:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;_orig_linear&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linear&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_mixed_linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_orig_linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linear&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mixed_linear&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The same pattern extends to every normalization function and every convolution operation. Layer norm, group norm, RMS norm, conv1d through conv_transpose3d: all patched to handle mixed dtypes and accumulate in float32. Without these patches, intermediate values overflow fp16 range (values above 65,504 become infinity) and the output is a black frame.&lt;/p&gt;
&lt;h3&gt;The Gemma Problem&lt;/h3&gt;
&lt;p&gt;The text encoder is Google's Gemma 3, a separate model that converts text prompts into embeddings the video transformer can condition on. Gemma's attention mechanism overflows when run in fp16 on Pascal hardware. The attention scores grow large enough to exceed fp16 range, producing NaN values that propagate through the rest of the pipeline.&lt;/p&gt;
&lt;p&gt;The fix was running the entire Gemma encoder in float32. This uses more memory, but the text encoder only runs once per generation (to encode the prompt), and its weights can be freed from GPU memory before the transformer starts. The sequence is: load Gemma across all four GPUs using &lt;code&gt;accelerate&lt;/code&gt;, encode the prompt in float32, delete the encoder, free the memory, then load the video transformer.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;encode_prompt_float32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_ledger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model_ledger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;
    &lt;span class="n"&gt;te&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_ledger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_encoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Dispatch across all 4 GPUs for memory&lt;/span&gt;
    &lt;span class="n"&gt;max_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_balanced_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;te&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"22GiB"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="n"&gt;no_split_module_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Gemma3DecoderLayer"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
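    &lt;span class="c1"&gt;# Assumed step, not shown in the original excerpt: turn the per-GPU budget into a device map&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;infer_auto_device_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;te&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;no_split_module_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Gemma3DecoderLayer"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;te&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dispatch_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;te&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;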
    &lt;span class="n"&gt;hidden_states&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;te&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Free GPU memory before transformer loads&lt;/span&gt;
    &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;te&lt;/span&gt;
    &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
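    &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty_cache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hidden_states&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention_mask&lt;/span&gt;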
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This load-encode-delete cycle is ugly but necessary. There isn't enough total memory to hold both Gemma and the video transformer simultaneously, even across four cards. The sequential approach works because each component only needs to exist during its phase of the pipeline.&lt;/p&gt;
&lt;h3&gt;The Pipeline&lt;/h3&gt;
&lt;p&gt;The generation runs in two stages, matching LTX-Video's distilled inference schedule.&lt;/p&gt;
&lt;p&gt;Stage 1 generates a half-resolution latent video (e.g., 256x384) through 8 denoising steps. Each step runs the full 48-block transformer, with data moving across all four GPUs:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;patched_process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perturbations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ltx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transformer_blocks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block_devices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                             &lt;span class="n"&gt;perturbations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;perturbations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every GPU boundary involves a tensor transfer across PCIe 3.0. With 12 blocks per GPU, there are 3 boundary crossings per denoising step (GPU 0 to 1, 1 to 2, 2 to 3), plus a final transfer back to GPU 0: 4 transfers per step. With 8 denoising steps, that's 32 cross-device transfers in Stage 1 alone, each moving both video and audio state tensors. PCIe 3.0 x16 has a theoretical bandwidth of ~16 GB/s. The tensors being transferred are small relative to the bandwidth (attention states and activations, not full weight matrices), so the overhead is manageable. But it adds up.&lt;/p&gt;
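&lt;p&gt;The transfer count follows directly from the block placement. Here is a minimal sketch, assuming blocks are assigned to GPUs in contiguous groups of 12. &lt;code&gt;block_devices&lt;/code&gt; is rebuilt here as a plain list of GPU indices, and the two counting helpers are hypothetical names, not part of the generation script.&lt;/p&gt;

```python
NUM_BLOCKS = 48
NUM_GPUS = 4
BLOCKS_PER_GPU = NUM_BLOCKS // NUM_GPUS  # 12

# Assumed placement: block i lives on GPU i // 12.
block_devices = [i // BLOCKS_PER_GPU for i in range(NUM_BLOCKS)]


def transfers_per_step(devices, home=0):
    """Count device changes in one pass, plus the return hop to `home`."""
    hops = sum(1 for prev, cur in zip(devices, devices[1:]) if prev != cur)
    if devices[-1] != home:
        hops += 1
    return hops


def transfers_per_stage(devices, denoising_steps):
    """Every denoising step walks all blocks once."""
    return transfers_per_step(devices) * denoising_steps


print(transfers_per_step(block_devices))      # 4
print(transfers_per_stage(block_devices, 8))  # 32
print(transfers_per_stage(block_devices, 3))  # 12
```

&lt;p&gt;The last line applies the same count to a 3-step stage. Each counted hop moves both the video and the audio state, so the number of individual tensor copies is higher than the hop count.&lt;/p&gt;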
&lt;p&gt;Stage 1 takes roughly 4 minutes for 241 frames at 24 fps (a 10-second clip). The spatial upsampler then doubles the resolution. Stage 2 runs 3 more denoising steps at full resolution (512x768), taking roughly 6.5 minutes. The VAE decoder converts latents to pixels and generates the audio track in another 40 seconds.&lt;/p&gt;
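&lt;p&gt;Total generation time for a 10-second, 512x768 video with audio: approximately 18.5 minutes. The timed phases above account for roughly 11 of those minutes; the rest is per-job overhead outside the denoising loops, such as loading models and encoding the prompt. For a 1-second clip (25 frames): about 8 minutes. For a 4-second clip (97 frames): about 10.5 minutes.&lt;/p&gt;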
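&lt;p&gt;The three frame counts quoted here (25, 97, 241) all fit the pattern "24 frames per second plus one leading frame". The sketch below assumes that relationship, which is inferred from these three data points rather than taken from the model's documentation, and uses it to express the measured times as seconds of compute per frame.&lt;/p&gt;

```python
FPS = 24

# Wall-clock minutes reported above, keyed by clip length in seconds.
MEASURED_MINUTES = {1: 8.0, 4: 10.5, 10: 18.5}


def frame_count(seconds, fps=FPS):
    """Assumed relationship: one leading frame plus fps frames per second."""
    return seconds * fps + 1


for seconds, minutes in MEASURED_MINUTES.items():
    frames = frame_count(seconds)
    per_frame = minutes * 60 / frames
    print(f"{seconds:2d} s  {frames:3d} frames  {per_frame:5.1f} s per frame")
```

&lt;p&gt;The per-frame cost falls sharply as clips get longer, which is what you would expect if a large fixed cost is paid once per job regardless of length.&lt;/p&gt;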
&lt;h3&gt;The Memory Layout&lt;/h3&gt;
&lt;p&gt;During inference, the four GPUs aren't loaded equally. GPU 0 carries extra weight because it hosts all the shared components (patchify projections, normalization layers, output projections) plus its 12 transformer blocks. The actual memory distribution:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;VRAM Used&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;10.8 GB&lt;/td&gt;
&lt;td&gt;Shared components + blocks 0-11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;9.3 GB&lt;/td&gt;
&lt;td&gt;Blocks 12-23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;9.3 GB&lt;/td&gt;
&lt;td&gt;Blocks 24-35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;9.3 GB&lt;/td&gt;
&lt;td&gt;Blocks 36-47&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;That's 38.7 GB of the available 96 GB. The remaining 57 GB provides headroom for activations, KV cache growth, and the VAE decoder. There's enough margin that generation never OOMs, even at 241 frames.&lt;/p&gt;
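&lt;p&gt;One caveat on the headroom figure: the 57 GB is not a single pool, because each card can only use its own free memory. A quick tally from the table, assuming 24 GB per P40 (96 GB across four cards):&lt;/p&gt;

```python
# VRAM in GB per GPU, taken from the table above.
vram_used = {0: 10.8, 1: 9.3, 2: 9.3, 3: 9.3}
VRAM_PER_CARD_GB = 24.0  # 96 GB total across four cards

total_used = sum(vram_used.values())
total_available = VRAM_PER_CARD_GB * len(vram_used)

print(f"in use:   {total_used:.1f} GB")                    # 38.7 GB
print(f"headroom: {total_available - total_used:.1f} GB")  # 57.3 GB
for gpu, used in vram_used.items():
    print(f"GPU {gpu}: {VRAM_PER_CARD_GB - used:.1f} GB free")
```

&lt;p&gt;GPU 0 is the tightest card at about 13 GB free, so that is the number that bounds how large the activations on the shared components can grow.&lt;/p&gt;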
&lt;h3&gt;The API&lt;/h3&gt;
&lt;p&gt;Running inference from the command line is fine for testing, but generating videos for blog content requires something more practical. I wrapped the generation script in a FastAPI server with an async job queue:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Submit a text-to-video job&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;-X&lt;span class="w"&gt; &lt;/span&gt;POST&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt=A cinematic flyover of a Zilog Z80 processor on a PCB"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"duration=10"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"seed=42"&lt;/span&gt;

&lt;span class="c1"&gt;# Submit an image-to-video job&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;-X&lt;span class="w"&gt; &lt;/span&gt;POST&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt=A fluffy orange cat dancing"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"duration=4"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image=@cat.jpg"&lt;/span&gt;

&lt;span class="c1"&gt;# Check status&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs/07420abb6d82

&lt;span class="c1"&gt;# Download result&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs/07420abb6d82/video&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;output.mp4
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Jobs queue and execute sequentially. Each generation occupies all four GPUs, so only one can run at a time, and the load-encode-delete cycle for Gemma means there's significant setup overhead per job. The API spawns each job as a subprocess, which gives clean GPU memory cleanup between runs. If a generation crashes (which happened frequently during development), the next job starts fresh.&lt;/p&gt;
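&lt;p&gt;The subprocess-per-job idea can be sketched with the standard library alone. This is not the FastAPI server: it swaps the async queue for a plain worker thread, the function names are hypothetical, and trivial Python one-liners stand in for the real generation script.&lt;/p&gt;

```python
import queue
import subprocess
import sys
import threading

jobs = queue.Queue()  # FIFO: one generation at a time
status = {}           # job_id mapped to queued, running, done, or failed


def run_job(job_id, cmd):
    """Run one job in its own process so its memory is released on exit."""
    status[job_id] = "running"
    result = subprocess.run(cmd, capture_output=True, text=True)
    status[job_id] = "done" if result.returncode == 0 else "failed"
    return status[job_id]


def worker():
    """Drain the queue sequentially; a crashed job does not affect the next one."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut down
            break
        run_job(*item)


def submit(job_id, cmd):
    status[job_id] = "queued"
    jobs.put((job_id, cmd))


# Stand-ins for the generation script: one succeeds, one crashes, one follows.
submit("ok", [sys.executable, "-c", "print('frames written')"])
submit("crash", [sys.executable, "-c", "raise SystemExit(1)"])
submit("after-crash", [sys.executable, "-c", "pass"])
jobs.put(None)

t = threading.Thread(target=worker)
t.start()
t.join()
print(status)
```

&lt;p&gt;Because the failing job lives in its own process, its exit code is the only thing the worker sees; whatever state it left behind goes away with the process, and the job after it runs normally.&lt;/p&gt;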
&lt;p&gt;The server supports both text-to-video and image-to-video. Image conditioning locks the first frame to a provided image and generates subsequent frames from it, which produces more controllable results for specific visual subjects. In practice, image-to-video is the more useful mode. Text-to-video gives the model complete creative freedom, which means the output is unpredictable. You might ask for a Z80 processor and get something that looks like a generic IC, or something that looks like a Z80, depending on the seed. Image-to-video lets you provide the exact first frame you want and the model animates from there. For blog content where visual accuracy matters, starting from a real photograph or a specific reference image gives consistently better results.&lt;/p&gt;
&lt;h3&gt;What the Output Looks Like&lt;/h3&gt;
&lt;p&gt;The video quality is genuinely good. LTX-Video 2.3 produces coherent motion, reasonable physics, and detailed textures. Here are three examples, generated entirely on the P40 server:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Text-to-video: "A cinematic flyover of a Zilog Z80 processor on a printed circuit board" (10 seconds, 18.5 minutes to generate)&lt;/strong&gt;&lt;/p&gt;
&lt;video controls preload="metadata" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/ltx-z80-flyover.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;&lt;strong&gt;Image-to-video: "A fluffy orange cat with a hat dancing" (4 seconds, 10.5 minutes to generate)&lt;/strong&gt;&lt;/p&gt;
&lt;video controls preload="metadata" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/ltx-cat-dancing.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;&lt;strong&gt;Text-to-video: "A cat sitting on a windowsill, sunlight streaming in" (1 second, 8 minutes to generate)&lt;/strong&gt;&lt;/p&gt;
&lt;video controls preload="metadata" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/ltx-cat-windowsill.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;The model understands object permanence, lighting consistency, and basic spatial relationships. The Z80 flyover produces a recognizable IC package with surrounding components, proper lighting, and smooth camera movement.&lt;/p&gt;
&lt;p&gt;The audio is a different story. LTX-Video 2.3 generates an audio track alongside the video, but the results are inconsistent. Prompts describing characters speaking produce odd ambient music instead of voices. Prompts describing environments produce vaguely appropriate soundscapes. The audio pipeline works mechanically (it generates real audio waveforms via a separate VAE decoder and vocoder), but the semantic connection between prompt and audio output is weak. For blog content, I'd likely strip the generated audio and add narration or music separately.&lt;/p&gt;
&lt;p&gt;The 512x768 resolution at 24fps is usable for web content. It's not 4K. It's not going to replace stock footage for production video. But for blog hero images in motion, visual demonstrations, or supplementary content alongside text, it works.&lt;/p&gt;
&lt;h3&gt;What This Cost&lt;/h3&gt;
&lt;p&gt;The hardware cost is zero incremental. The four P40s and the server already existed for &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;LLM inference&lt;/a&gt;. LTX-Video is an additional workload on the same hardware.&lt;/p&gt;
&lt;p&gt;The electricity cost is modest. The server draws roughly 500W under full GPU load. An 18.5-minute generation (10-second video at full resolution) consumes about 0.15 kWh, roughly $0.024 at Minnesota residential rates. You could generate forty 10-second clips for a dollar.&lt;/p&gt;
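&lt;p&gt;The arithmetic behind those figures, as a short sketch. The per-kWh rate is not stated above, so the code backs it out from the quoted cost per clip rather than assuming one.&lt;/p&gt;

```python
POWER_W = 500          # approximate draw under full GPU load
GENERATION_MIN = 18.5  # one 10-second clip
COST_PER_CLIP = 0.024  # dollars, as quoted above

energy_kwh = POWER_W / 1000 * GENERATION_MIN / 60
implied_rate = COST_PER_CLIP / energy_kwh  # dollars per kWh
clips_per_dollar = int(1 / COST_PER_CLIP)

print(f"{energy_kwh:.3f} kWh per clip")          # 0.154 kWh per clip
print(f"implied rate: ${implied_rate:.3f}/kWh")  # implied rate: $0.156/kWh
print(f"{clips_per_dollar} clips per dollar")    # 41 clips per dollar
```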
&lt;p&gt;The real cost was development time. Getting from "model downloaded" to "working generation pipeline" took many iterations across multiple sessions with Claude Code. Each precision-related failure mode (bf16 corruption, fp16 overflow, mixed-dtype kernel errors, NaN propagation through attention) required diagnosis, a hypothesis, a code change, and a test cycle that involved loading a 22B model across four GPUs. The feedback loop was slow. A single test takes 8 to 18 minutes to confirm whether a change worked. Many didn't.&lt;/p&gt;
&lt;h3&gt;The Broader Point&lt;/h3&gt;
&lt;p&gt;A 22 billion parameter video generation model was not designed to run on 2016 hardware. The authors assumed bf16, assumed modern attention kernels, assumed enough memory on one or two cards. None of those assumptions hold on the P40.&lt;/p&gt;
&lt;p&gt;But the model runs anyway, because the underlying math doesn't actually require any of those features. Bfloat16 is a convenience, not a requirement; float32 computes the same function. Flash attention is an optimization, not a necessity; standard attention produces the same results up to floating-point rounding. And 96GB across four cards is 96GB, regardless of whether it's cutting-edge HBM3 or decade-old GDDR5.&lt;/p&gt;
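&lt;p&gt;The attention claim can be checked on a toy example. The sketch below compares textbook attention for a single query against a blockwise version that keeps a running maximum and running sum, which is the streaming-softmax idea that flash-style kernels build on. It is pure Python for illustration, not the fused GPU kernel, and the function names and input values are invented.&lt;/p&gt;

```python
import math


def naive_attention(q, keys, values):
    """Standard attention for one query: softmax over all scores, then a weighted sum."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def streaming_attention(q, keys, values, block=2):
    """Blockwise variant: visit keys in chunks, keeping a running max and sum."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new reference maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [x * correction for x in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [x + w * vd for x, vd in zip(acc, v)]
        running_max = new_max
    return [x / running_sum for x in acc]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.5, -0.3], [0.0, 2.0], [-1.0, 0.4], [0.7, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-0.5, 3.0]]

full = naive_attention(q, keys, values)
chunked = streaming_attention(q, keys, values)
print(full)
print(chunked)
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(full, chunked)))
```

&lt;p&gt;The two outputs agree to within rounding error. The blockwise form never materializes the full score vector, which is where the memory savings come from; the function being computed is the same.&lt;/p&gt;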
&lt;p&gt;The generation is slow. Eighteen minutes for ten seconds of video is not competitive with a single A100, which would finish the same job in under two minutes. The float32 pipeline performs the same number of operations as the bf16 path the model was designed for, but every value takes twice the bytes, which roughly doubles the memory traffic, and the PCIe 3.0 transfers between four separate memory pools add latency that a single modern GPU with unified HBM would never incur. But competitive wasn't the point. The point was that four GPUs I bought on eBay for a thousand dollars total, sitting in a server in a shop building, can run a model that was released this month. The gap between "latest model" and "latest hardware" is not as wide as the spec sheets suggest, as long as you're willing to write the code that bridges it.&lt;/p&gt;
&lt;p&gt;The P40 server was already paying for itself on &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;LLM inference&lt;/a&gt; and &lt;a href="https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html"&gt;TTS generation&lt;/a&gt;. Video generation is one more workload on a machine that I own, running models that I choose, on a schedule that I control. The 18-minute wait is the price of not asking anyone's permission.&lt;/p&gt;</description><category>ai</category><category>cuda</category><category>gpu</category><category>home lab</category><category>inference</category><category>ltx video</category><category>multi-gpu</category><category>pascal</category><category>pipeline parallelism</category><category>tesla p40</category><category>video generation</category><guid>https://tinycomputers.io/posts/running-ltx-video-on-four-tesla-p40s.html</guid><pubDate>Fri, 20 Mar 2026 13:00:00 GMT</pubDate></item><item><title>The Split Isn't Between People, It's Between Tasks</title><link>https://tinycomputers.io/posts/the-split-isnt-between-people-its-between-tasks.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-split-isnt-between-people-its-between-tasks_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;26 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Les Orchard's &lt;a href="https://baud.rs/FtBjVK"&gt;"Grief and the AI Split"&lt;/a&gt; identifies something real. AI tools have revealed a division among developers that was previously invisible, because before these tools existed, everyone followed the same workflow regardless of motivation. Now the motivations are exposed. Some developers grieve the loss of hand-crafted code as a practice with inherent value. Others see the same tools and feel relief: the tedious parts are handled, the interesting parts remain. Orchard frames this as a split between people. Craft-oriented developers on one side, results-oriented developers on the other.&lt;/p&gt;
&lt;p&gt;He's right that the split exists, and the piece clearly resonated with software creators because it names something people have been feeling but couldn't articulate. The observation is sharp. Where I think it can be extended is in where the line falls.&lt;/p&gt;
&lt;p&gt;Orchard draws the line between people. I think it falls between tasks. The same person crosses that line dozens of times a day, moving between work that demands human judgment and work that doesn't, between moments where the craft concentrates and moments where it was never present in the first place. The split is real. It's just not an identity.&lt;/p&gt;
&lt;h3&gt;The Kernel I Didn't Write&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://tinycomputers.io/posts/jokelaos-bare-metal-x86-kernel.html"&gt;JokelaOS&lt;/a&gt; is a bare-metal x86 kernel: 2,000 lines of C and NASM assembly, booting from a Multiboot header through GDT (Global Descriptor Table, which defines memory segments and access rights) and IDT (Interrupt Descriptor Table, which maps interrupt vectors to service routines) setup, paging, preemptive multitasking with Ring 3 isolation, a network stack that responds to pings, and an interactive shell. No forks. No libc. Every &lt;code&gt;memcpy&lt;/code&gt;, every &lt;code&gt;printf&lt;/code&gt;, every byte-order conversion written from scratch.&lt;/p&gt;
&lt;p&gt;I didn't write most of it. Claude did.&lt;/p&gt;
&lt;p&gt;In Orchard's framework, this should place me firmly in the "results" camp. I used AI to produce 2,000 lines of systems code; clearly I care about the outcome, not the process. But that framing misses what actually happened during the project.&lt;/p&gt;
&lt;p&gt;The decisions that made JokelaOS work were not typing decisions. They were sequencing decisions: bring up serial output first, because without it you have no diagnostics for anything that follows. Initialize the GDT before the IDT, because interrupt handlers need valid segment selectors. Get the bump allocator working before the PMM (Physical Memory Manager), because page tables need permanent allocations before you can manage dynamic ones. These choices come from understanding how x86 protected mode actually works, which subsystems depend on which, and what the failure modes look like when you get the order wrong.&lt;/p&gt;
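&lt;p&gt;Those ordering constraints form a small dependency graph. As an illustration only (JokelaOS itself is C and assembly, and nothing like this exists in its source), here they are written down in Python and sorted with the standard library's &lt;code&gt;graphlib&lt;/code&gt;. The subsystem names are shorthand for the steps in the paragraph above.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Each subsystem maps to the set of things that must already be up.
depends_on = {
    "serial": set(),              # first: diagnostics for everything after
    "gdt": {"serial"},
    "idt": {"gdt"},               # handlers need valid segment selectors
    "bump_allocator": {"serial"},
    "pmm": {"bump_allocator"},    # page tables need permanent allocations
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

&lt;p&gt;Sorting the graph is the trivial part. Knowing which edges belong in it is the part that came from understanding the hardware.&lt;/p&gt;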
&lt;p&gt;Claude generated the GDT setup code. I decided what the GDT entries should be, caught the access byte errors, and debugged the triple faults when segment selectors were wrong. Claude wrote the process scheduler. I determined that the TSS (Task State Segment, which tells the CPU where to find the kernel stack when switching privilege levels) needed updating on every context switch and diagnosed the General Protection Faults that occurred when it wasn't. Claude produced the RTL8139 network driver. I decided to bring up ARP before ICMP, caught a byte-order bug in the IP checksum, and validated that the packets leaving QEMU were actually well-formed.&lt;/p&gt;
&lt;p&gt;The typing was delegated. The architecture, the sequencing, the diagnosis, the validation: those were mine. If you asked me whether JokelaOS involved craft, I would say yes, more than most projects I've done. If you asked me where the craft was, I would not point at any line of code.&lt;/p&gt;
&lt;h3&gt;The Board That Failed Twice&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Giga Shield&lt;/a&gt; tells a longer version of the same story, and it's messier, because hardware involves the physical world in a way that software doesn't.&lt;/p&gt;
&lt;p&gt;The project started with a $468 Fiverr commission. I gave a designer in Kenya the spec documents, the components I thought should be used, and the form factor requirements: an &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1&lt;/a&gt; shield with bidirectional level shifters, 72 channels of 3.3V-to-5V translation, KiCad deliverables. He produced a clean design. Nine &lt;a href="https://baud.rs/y9JJt9"&gt;TXB0108PW&lt;/a&gt; auto-sensing translators on a two-layer board. &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; fabricated it. Professional work, quick turnaround, and &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; sponsored the fabrication.&lt;/p&gt;
&lt;p&gt;Then I plugged in the &lt;a href="https://baud.rs/87wbBL"&gt;RetroShield Z80&lt;/a&gt; and the board was blind.&lt;/p&gt;
&lt;p&gt;The TXB0108 detects signal direction automatically by sensing which side is driving. For most applications, that's a feature. For a Z80 bus interface, it's fatal. During bus cycles, the Z80 tri-states its address and data lines. The pins go high-impedance: not high, not low, floating. The TXB0108 can't determine direction from a floating signal. It guesses wrong, and the Arduino reads garbage. I'd paid $468 for a board that couldn't see half of what the processor was doing.&lt;/p&gt;
&lt;p&gt;Nobody caught this in the design phase. Not the Fiverr designer, who was working from the spec I gave him. Not me, when I reviewed the schematic. The TXB0108 datasheet doesn't scream "incompatible with tri-state buses"; you have to understand what tri-stating means in practice and recognize that auto-sensing can't handle it. That understanding came from plugging the board in and watching it fail.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;redesign&lt;/a&gt; used Claude to replace all nine auto-sensing translators with &lt;a href="https://baud.rs/zQqo34"&gt;SN74LVC8T245&lt;/a&gt; driven level shifters. Driven shifters have an explicit direction pin: you tell them which way to translate, and they do it regardless of whether the signal is being actively driven. Claude wrote Python scripts that pulled apart the KiCad schematic files, extracted all 72 signal mappings across 9 ICs, and generated new board files with the correct components and pin assignments.&lt;/p&gt;
&lt;p&gt;I was about to submit the revised design to PCBWay when I realized we needed a tenth level shifter. The original nine covered not just the digital pins that map to the Z80 RetroShield but all of the analog pins on the Giga, giving complete 3.3V-to-5V coverage across the board. But with driven shifters, each IC has a single direction pin controlling all eight channels. Signals that need to travel in opposite directions at different times can't share an IC without creating bus contention. Some of the channel assignments had conflicting direction requirements, and the only fix was a tenth IC to separate them.&lt;/p&gt;
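&lt;p&gt;The packing problem is easy to see with numbers. In the sketch below, the split of the 72 signals into direction groups is entirely hypothetical (the real channel assignment isn't listed here); it is chosen only to show how a shared direction pin can push the count from nine ICs to ten.&lt;/p&gt;

```python
import math

CHANNELS_PER_IC = 8

# HYPOTHETICAL split of the 72 signals into direction groups; the actual
# assignment on the board is not given in the text.
direction_groups = {
    "always_3v3_to_5v": 36,
    "always_5v_to_3v3": 28,
    "bidirectional_data_bus": 8,
}


def ics_needed_auto_sensing(groups):
    """No shared direction pin, so channels can be packed freely."""
    return math.ceil(sum(groups.values()) / CHANNELS_PER_IC)


def ics_needed_driven(groups):
    """One direction pin per IC: each direction group is packed separately."""
    return sum(math.ceil(n / CHANNELS_PER_IC) for n in groups.values())


print(ics_needed_auto_sensing(direction_groups))  # 9
print(ics_needed_driven(direction_groups))        # 10
```

&lt;p&gt;Whenever a group's size isn't a multiple of eight, the leftover channels on its last IC can't be lent to a group that switches direction differently, and the total rounds up.&lt;/p&gt;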
&lt;p&gt;Adding one more TSSOP-24 package to an already dense two-layer board broke the trace routing. The board that had been routable with nine ICs was unroutable with ten. Moving to four layers helped but still left two to four traces with no viable path. The solution was a six-layer stackup, which needed a copper pour layer to act as a common ground plane. The open-source autorouter Freerouting couldn't handle a full copper pour; its architecture has no concept of flood-fill connectivity. So I used &lt;a href="https://baud.rs/wdr0dP"&gt;Quilter.ai&lt;/a&gt;, an AI trace router, to route the six-layer board with the ground plane that the open-source tooling couldn't represent.&lt;/p&gt;
&lt;p&gt;Count the layers of delegation and intervention in this project. I delegated the initial design to a human professional. Physics revealed the flaw. I delegated the redesign to an AI. I caught the missing tenth shifter before it went to fabrication. I delegated the trace routing to another AI. PCBWay is currently manufacturing these boards. At every stage, the work alternated between labor that could be delegated and judgment that couldn't. The Fiverr designer did skilled labor. Claude did skilled labor. Quilter.ai did skilled labor. The craft was never in the labor. It was in knowing when the labor was wrong.&lt;/p&gt;
&lt;h3&gt;Where the Craft Actually Lives&lt;/h3&gt;
&lt;p&gt;Both of these projects point at the same thing. The craft isn't in the typing, the routing, or the code generation. It's in a layer that sits above and around all of that: the judgment layer.&lt;/p&gt;
&lt;p&gt;The judgment layer is where you decide what to build next. Where you recognize that the output is wrong before you can articulate why. Where you sequence subsystems based on dependency chains that aren't documented anywhere. Where you plug a board in and notice that the readings don't make sense. Where you catch a missing component that the AI, the designer, and the autorouter all missed because none of them were thinking about the problem at that level.&lt;/p&gt;
&lt;p&gt;This layer has specific properties. It requires contact with the problem domain, not just the code or the schematic but the actual behavior of the system under real conditions. It depends on accumulated experience: understanding what tri-stating means in practice, knowing that x86 protected mode has forty years of backward-compatible traps waiting for you. And it's the part that AI is worst at, precisely because it requires grounding in physical or logical reality that language models don't have access to.&lt;/p&gt;
&lt;p&gt;The TXB0108 failure is the clearest example. The information needed to predict this failure existed in the datasheets. But recognizing its relevance required understanding what a Z80 bus cycle actually looks like at the electrical level, which required either experience with the hardware or a simulation environment that nobody had set up. No amount of language model capability substitutes for plugging in the board and watching it fail.&lt;/p&gt;
&lt;h3&gt;The Same Person in Both Modes&lt;/h3&gt;
&lt;p&gt;Orchard describes himself as results-oriented. He learned programming languages as "a means to an end" and gravitated toward AI tools because they let him focus on the outcome. He acknowledges that craft-oriented developers experience genuine loss. His framing is empathetic, but it still draws the line between people.&lt;/p&gt;
&lt;p&gt;The line doesn't hold, because I'm both of his archetypes depending on the hour.&lt;/p&gt;
&lt;p&gt;On Tuesday I might use Claude to generate a hundred lines of systemd service configuration because I need Ollama running on a machine and I don't care about the elegance of the unit file. On Wednesday I might spend three hours hand-debugging why &lt;code&gt;rocm-smi&lt;/code&gt; reports GPU utilization at zero percent: reading kernel logs, checking DKMS module versions, testing &lt;code&gt;HSA_OVERRIDE_GFX_VERSION&lt;/code&gt; values, loading the &lt;code&gt;amdgpu&lt;/code&gt; module manually because it didn't auto-load at boot. The first task is pure delegation. The second is pure craft. Both are mine. Both happened this week.&lt;/p&gt;
&lt;p&gt;When I wrote &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;the economics piece&lt;/a&gt;, I used Claude to draft sections and I measured real power draw with &lt;code&gt;nvidia-smi&lt;/code&gt; and &lt;code&gt;rocm-smi&lt;/code&gt; at 500-millisecond intervals. I let AI handle the prose scaffolding and I personally caught that Ollama on the Strix Halo had been running entirely on CPU because the systemd service file was missing an environment variable. Every benchmark I'd trusted before finding that bug was wrong. No AI caught it. I caught it because the numbers felt off.&lt;/p&gt;
&lt;p&gt;These aren't different people. They're different tasks. The identity framing ("I'm a craft developer" or "I'm a results developer") obscures what's actually a task-level decision that experienced people make constantly: this piece of work benefits from my full attention; this piece doesn't.&lt;/p&gt;
&lt;h3&gt;What the Grief Is About&lt;/h3&gt;
&lt;p&gt;The craft-grief that Orchard describes is real and worth taking seriously. Part of it targets the wrong thing. Part of it doesn't.&lt;/p&gt;
&lt;p&gt;What's being mourned is typing as the bottleneck. For forty years, the primary constraint on software projects was the speed at which a human could produce correct code. Design mattered, architecture mattered, but someone still had to sit down and type it. The typing was slow enough that it forced a certain kind of attention. You couldn't write a function without thinking about it, because writing it took long enough that thinking was unavoidable. The bottleneck created the conditions for craft, and it felt like the craft itself.&lt;/p&gt;
&lt;p&gt;AI removes the bottleneck. Code appears in seconds. The thinking isn't forced by the typing anymore; it has to be deliberate. And that shift feels like a loss, because the rhythm of the work has changed. The long, meditative stretches of writing code, where your understanding deepened as your fingers moved, are replaced by short bursts of generation followed by review. The texture is different.&lt;/p&gt;
&lt;p&gt;But the craft didn't live in the texture. It lived in the judgment that the texture incidentally supported. The experienced developer who hand-writes a function isn't doing craft because the typing is slow. The typing is slow, and the craft happens during the slowness, but the craft is the decisions: what to name things, what to abstract, what edge cases to handle, when to stop. Those decisions haven't gotten easier. If anything, they've gotten harder, because AI lets you attempt projects that would have been too large to type by hand, which means you hit the judgment bottleneck more often and at higher stakes.&lt;/p&gt;
&lt;p&gt;JokelaOS would have taken me months to type by hand. I probably wouldn't have attempted it. With AI handling the code generation, I attempted it in days and spent the entire time making architecture and debugging decisions. The project had more craft in it than most things I've built, precisely because the typing wasn't the bottleneck. The judgment was.&lt;/p&gt;
&lt;h3&gt;The Biological Ceiling&lt;/h3&gt;
&lt;p&gt;I wrote in &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;the AI Vampire piece&lt;/a&gt; that human judgment is the binding constraint in a Jevons cycle operating on cognitive output. AI makes the labor cheaper; demand expands; the expansion concentrates on the one input that can't scale: human attention and judgment. The three-to-four-hour ceiling on deep work is biological, not cultural, and no amount of productivity tooling changes it.&lt;/p&gt;
&lt;p&gt;The task-level split is where this plays out in practice. AI compresses the labor side of every project: the code generation, the trace routing, the prose drafting, the schematic extraction. What remains is denser, harder, and more consequential. Every hour of work has a higher ratio of judgment to labor than it did before AI. That's why Yegge's developers feel burned out, not because they're working more hours, but because every hour is now a judgment hour.&lt;/p&gt;
&lt;p&gt;The craft isn't disappearing. It's being compressed into a smaller, denser layer. The typing is gone. The design reviews are shorter. The code appears instantly. What's left is the part that was always the actual craft: deciding what to build, recognizing when it's wrong, knowing what to test, catching the missing tenth level shifter. That layer is entirely human, it's harder than it used to be because the projects are bigger, and it's the only part that matters.&lt;/p&gt;
&lt;p&gt;Orchard identified the split correctly. The grief is real, the division is real, and the piece resonated because it named something that software creators recognized immediately. The refinement I'd offer is that the line doesn't separate two kinds of people; it separates two kinds of tasks. The craft was never in the code. It was in the decisions that surrounded the code. Those decisions haven't gone anywhere. They've just lost the slow, meditative typing that used to accompany them. What remains is craft at higher concentration, with no filler.&lt;/p&gt;
&lt;p&gt;There was something cathartic about the old way. The hours of typing weren't just production; they were a complete experience. You conceived the idea, worked through the logic, typed every character, fought the compiler, and watched it run. The whole arc from intention to execution passed through your hands. That totality had a satisfaction to it that reviewing AI-generated output doesn't replicate, even when the output is correct.&lt;/p&gt;
&lt;p&gt;And there was something else: the syntax was a sacred tongue. Not everyone could read it. Not everyone could write it. The curly braces, the pointer arithmetic, the register mnemonics formed a language that belonged to the people who had invested years learning to speak it. That exclusivity wasn't gatekeeping for its own sake; it was the mark of hard-won fluency, and it meant something to the people who had it. Now anyone can describe what they want in English and get working code back. The priesthood dissolved overnight.&lt;/p&gt;
&lt;p&gt;I feel that loss. I still create. I still orchestrate. I still catch the errors that the tools miss. But I no longer speak a language that most people can't. The judgment layer is real, and it's where the work that matters happens. But it doesn't carry the same weight as mastery of a difficult notation. Orchestrating a process is not the same as performing it, even if the orchestration requires more skill.&lt;/p&gt;
&lt;p&gt;The grief is real. It's not about the wrong thing. It's about something that actually disappeared.&lt;/p&gt;</description><category>ai</category><category>claude</category><category>craft</category><category>hardware</category><category>jevons paradox</category><category>jokelaos</category><category>judgment</category><category>pcb design</category><category>philosophy</category><category>software development</category><guid>https://tinycomputers.io/posts/the-split-isnt-between-people-its-between-tasks.html</guid><pubDate>Thu, 19 Mar 2026 13:00:00 GMT</pubDate></item><item><title>The Economics of Owning Your Own Inference</title><link>https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-economics-of-owning-your-own-inference_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;21 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;I own $5,500 worth of GPU hardware dedicated to running AI models locally. I also pay for a Claude Max subscription that I use for nearly everything that matters. If that sounds like a contradiction, it is the entire subject of this article.&lt;/p&gt;
&lt;p&gt;The local inference conversation online is dominated by two positions. The first: why pay for API calls when you can run models on your own hardware? The second: local models are worse, so just pay for the good ones. Both are correct. Both are incomplete. The interesting question is where the boundary falls between them, and the answer turns out to depend less on cost-per-token arithmetic than on what kind of work you are doing.&lt;/p&gt;
&lt;h3&gt;The Split&lt;/h3&gt;
&lt;p&gt;I use Claude for research, code review, writing feedback, technical analysis, and anything that used to be a Google search. The frontier models are better at all of these tasks than anything I can run locally. Not marginally better; categorically better. An 8B parameter model running on my hardware is not in the same conversation as Claude Opus or GPT-5.4 for anything requiring reasoning, nuance, or broad knowledge. The subscription cost is fixed regardless of volume, which eliminates per-query friction entirely. For interactive, quality-sensitive work, I pay for the best model available and I do not think about it.&lt;/p&gt;
&lt;p&gt;Local inference handles everything else: the batch jobs, the grunt work, the high-volume tasks where model quality matters less than model availability. The work that would be expensive at cloud API rates not because any single call costs much, but because the calls number in the tens of thousands.&lt;/p&gt;
&lt;p&gt;This is not a temporary arrangement while local models catch up. It is a structural split. Frontier models are getting better. Local models are also getting better. The gap is not closing in the ways that matter for my usage, because the tasks I send to each side are fundamentally different. I do not need my local 8B model to reason better. I need it to process text cheaply and without metering.&lt;/p&gt;
&lt;h3&gt;What the Local Hardware Actually Does&lt;/h3&gt;
&lt;p&gt;Three workloads. All batch. All quality-tolerant.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Text-to-speech.&lt;/strong&gt; Every post on this site has an &lt;a href="https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html"&gt;AI-generated audio narration&lt;/a&gt;. This is the workload that justifies the hardware on its own. Google Cloud Platform has superior TTS voices; Chirp3-HD sounds noticeably more natural than any open-source model I have tested. I ran a novel through it once: 82,000 words, 500,000 characters, $17.25. That is reasonable for a one-off project.&lt;/p&gt;
&lt;p&gt;It is not reasonable for a library of blog posts that I revise and regenerate periodically. At GCP rates ($16 per million characters, more for premium voices), narrating every post on this site would cost $200 to $400, and that bill resets every time I edit an article and regenerate the audio. Open-source TTS (&lt;a href="https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html"&gt;F5-TTS and Qwen TTS&lt;/a&gt;) mispronounces technical terms. The prosody goes flat on dense jargon. But it is good enough for blog narration. "Good enough" at zero marginal cost beats "excellent" at $4 to $10 per post when you are generating audio daily.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code scanning.&lt;/strong&gt; Running local models over source files for pattern detection, documentation extraction, and automated analysis. These jobs generate high token volume and have low quality requirements. An 8B model is adequate. The token count across a full codebase makes API pricing add up in a way that individual queries do not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Infrastructure work.&lt;/strong&gt; Benchmarking hardware (as in this article), testing prompt structures across quantization levels, evaluating model behavior under different configurations. These queries have no value individually. They are the test drives, not the commute. Paying per-token for test drives is paying per-mile to drive your own car around the block.&lt;/p&gt;
&lt;p&gt;None of these workloads require a frontier model. All of them generate enough volume to make metered pricing uncomfortable. That is the boundary.&lt;/p&gt;
&lt;h3&gt;The Machines&lt;/h3&gt;
&lt;p&gt;Two machines. Both mine. Both running &lt;a href="https://ollama.com"&gt;Ollama&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;four-GPU Tesla P40 server&lt;/a&gt;: Penguin Computing 2U chassis, Xeon E5-2697A v4, 252GB DDR4 ECC, four Tesla P40s with 24GB GDDR5X each. Ninety-six gigabytes of VRAM. Pascal architecture, 2016 vintage. Built from eBay parts for about $2,500. Lives in an unheated shop building in Minnesota.&lt;/p&gt;
&lt;p&gt;A Bosgame M5 mini desktop: AMD Ryzen AI MAX+ 395, Strix Halo APU with integrated RDNA 3.5 graphics. No discrete GPU. CPU and GPU share 128GB DDR5, roughly 60GB addressable as VRAM through ROCm 7.2. Cost about $3,000. Fits on a desk.&lt;/p&gt;
&lt;h3&gt;What They Cost to Run&lt;/h3&gt;
&lt;p&gt;I logged GPU power draw at 500-millisecond intervals during inference using &lt;code&gt;nvidia-smi&lt;/code&gt; on the P40 server and &lt;code&gt;rocm-smi&lt;/code&gt; on the Strix Halo. Same prompt, same models, same Ollama configuration. All models ran 100% on GPU.&lt;/p&gt;
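&lt;p&gt;A minimal sketch of what that logging could look like on the NVIDIA side, assuming a simple polling loop around &lt;code&gt;nvidia-smi&lt;/code&gt;. The function names are hypothetical, and the exact script used for the measurements is not shown in this article; only the tool and the 500-millisecond interval come from the text. &lt;code&gt;rocm-smi&lt;/code&gt; prints a different format and would need its own parser.&lt;/p&gt;

```python
import subprocess
import time

# Standard nvidia-smi query flags: one power reading (in watts) per GPU, per line.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=power.draw",
    "--format=csv,noheader,nounits",
]


def parse_total_watts(smi_output):
    """Sum the per-GPU readings from a single nvidia-smi query."""
    # Assumes every line is a plain number; a GPU reporting "[N/A]" would raise.
    readings = [float(line) for line in smi_output.splitlines() if line.strip()]
    return sum(readings)


def sample_average_watts(num_samples=120, interval_s=0.5):
    """Poll nvidia-smi num_samples times and return the mean total GPU draw."""
    samples = []
    for _ in range(num_samples):
        result = subprocess.run(
            NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
        )
        samples.append(parse_total_watts(result.stdout))
        # Real spacing is interval_s plus the time nvidia-smi takes to respond.
        time.sleep(interval_s)
    return sum(samples) / len(samples)
```

&lt;p&gt;Calling &lt;code&gt;sample_average_watts()&lt;/code&gt; while a prompt is generating would give one average figure across all four cards, which is the kind of number the table reports.&lt;/p&gt;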
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;P40 tok/s&lt;/th&gt;
&lt;th&gt;P40 GPU Power&lt;/th&gt;
&lt;th&gt;Halo tok/s&lt;/th&gt;
&lt;th&gt;Halo GPU Power&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.2 3B&lt;/td&gt;
&lt;td&gt;91.2&lt;/td&gt;
&lt;td&gt;170W avg&lt;/td&gt;
&lt;td&gt;78.4&lt;/td&gt;
&lt;td&gt;64W avg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 8B&lt;/td&gt;
&lt;td&gt;47.5&lt;/td&gt;
&lt;td&gt;278W avg&lt;/td&gt;
&lt;td&gt;40.2&lt;/td&gt;
&lt;td&gt;82W avg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 70B (4K ctx)&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;td&gt;278W avg&lt;/td&gt;
&lt;td&gt;5.6&lt;/td&gt;
&lt;td&gt;81W avg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The P40 is roughly 12-18% faster in raw throughput. It draws about three times the GPU power. The 3B model lives on a single P40; the other three cards idle at ~9W each but still cost electricity. The 8B and 70B models span two GPUs while two idle. You always pay for cards that are not working. The Strix Halo has one GPU. No idle penalty.&lt;/p&gt;
&lt;p&gt;GPU power is not total system power. The P40 server's Xeons, 252GB of RAM, dual PSUs, and fans add roughly 200W to the GPU figures. The Strix Halo's APU and DDR5 add roughly 40-60W. Conservative estimates for total system draw: 500W for the P40 under load, 120W for the Strix Halo.&lt;/p&gt;
&lt;p&gt;At Minnesota residential electricity rates ($0.157/kWh), the cost per million tokens:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;3B&lt;/th&gt;
&lt;th&gt;8B&lt;/th&gt;
&lt;th&gt;70B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P40 Server&lt;/td&gt;
&lt;td&gt;$0.19/M&lt;/td&gt;
&lt;td&gt;$0.46/M&lt;/td&gt;
&lt;td&gt;$3.47/M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strix Halo&lt;/td&gt;
&lt;td&gt;$0.06/M&lt;/td&gt;
&lt;td&gt;$0.13/M&lt;/td&gt;
&lt;td&gt;$0.94/M&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
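&lt;p&gt;The arithmetic behind the table can be sketched as follows, assuming the flat system-draw estimates above (500W and 120W) and the measured tokens per second. The function names are illustrative only.&lt;/p&gt;

```python
RATE_USD_PER_KWH = 0.157  # Minnesota residential rate quoted above


def hours_per_million_tokens(tok_per_s):
    """Wall-clock hours needed to generate one million tokens."""
    return 1_000_000 / tok_per_s / 3600


def electricity_cost_per_million(tok_per_s, system_watts, rate=RATE_USD_PER_KWH):
    """Electricity cost in dollars for one million tokens at a steady draw."""
    kwh = system_watts / 1000 * hours_per_million_tokens(tok_per_s)
    return kwh * rate


# (label, tokens per second, estimated total system watts under load)
runs = [
    ("P40 server, 8B", 47.5, 500),
    ("P40 server, 70B", 6.3, 500),
    ("Strix Halo, 8B", 40.2, 120),
    ("Strix Halo, 70B", 5.6, 120),
]

for label, tok_per_s, watts in runs:
    cost = electricity_cost_per_million(tok_per_s, watts)
    print(f"{label}: ${cost:.2f} per million tokens")
```

&lt;p&gt;With these inputs the 8B figures round to $0.46 and $0.13, and the 70B figures land within a cent of the table. The 3B row is not reproduced here; under the lighter GPU load the total system draw is presumably below the flat estimates.&lt;/p&gt;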
&lt;h3&gt;Why the Per-Token Number Is Misleading&lt;/h3&gt;
&lt;p&gt;Those numbers look competitive with hosted inference, which runs $0.05 to $0.20 per million tokens for 8B-class models through providers like Together AI or Groq. The Strix Halo at $0.13/M sits squarely in that range. The P40 at $0.46/M does not.&lt;/p&gt;
&lt;p&gt;But per-token cost during active inference is the wrong metric for two reasons.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hardware amortization changes the math.&lt;/strong&gt; The P40 server cost $2,500. The Strix Halo cost $3,000. Amortized over two years, that adds $0.14/hr and $0.11/hr respectively. On the 8B model, the all-in cost per million tokens rises to about $1.28 for the P40 and $0.90 for the Strix Halo. Both are more expensive than every hosted inference API for the same model.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Idle power is the dominant cost.&lt;/strong&gt; The P40 server draws roughly 340W at idle: $38.50 per month whether I run a single query or not. The Strix Halo draws roughly 35W at idle: $4.20 per month. Over a year, idle electricity alone costs $462 on the P40 and $50 on the Strix Halo. If you are not using the hardware frequently, idle power overwhelms everything else in the cost model.&lt;/p&gt;
&lt;p&gt;Per-token math at load flatters local inference by ignoring the hours when the hardware is doing nothing. It is like calculating your car's fuel economy only during highway driving and ignoring that it sits in the driveway 22 hours a day with the engine running.&lt;/p&gt;
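&lt;p&gt;Both adjustments are simple enough to sketch. The version below takes the hourly amortization figures and the per-token electricity costs as given inputs from the text, and assumes a 30-day month for the idle calculation. The function names are hypothetical.&lt;/p&gt;

```python
RATE_USD_PER_KWH = 0.157  # same electricity rate as the per-token table


def all_in_cost_per_million(tok_per_s, electricity_per_million, amortization_per_hour):
    """Electricity plus hardware amortization for one million tokens."""
    hours = 1_000_000 / tok_per_s / 3600
    return electricity_per_million + amortization_per_hour * hours


def idle_cost_per_month(idle_watts, rate=RATE_USD_PER_KWH, days=30):
    """Electricity cost of leaving a machine idling for a month."""
    return idle_watts / 1000 * 24 * days * rate


# 8B model: tokens per second, electricity cost per million, amortization per hour
p40_all_in = all_in_cost_per_million(47.5, 0.46, 0.14)
halo_all_in = all_in_cost_per_million(40.2, 0.13, 0.11)
print(f"P40 all-in: ${p40_all_in:.2f}/M, Strix Halo all-in: ${halo_all_in:.2f}/M")

# Idle draw in watts, as estimated above
p40_idle = idle_cost_per_month(340)
halo_idle = idle_cost_per_month(35)
print(f"P40 idle: ${p40_idle:.2f}/month, Strix Halo idle: ${halo_idle:.2f}/month")
```

&lt;p&gt;The all-in figures come out at about $1.28 and $0.89 per million tokens. The idle figures come out near $38 and $4 per month, close to the numbers quoted above, which are based on rounded wattages.&lt;/p&gt;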
&lt;h3&gt;Why I Run Both Anyway&lt;/h3&gt;
&lt;p&gt;The per-token economics favor API providers. The per-workload economics favor local hardware for specific tasks. TTS is the starkest example.&lt;/p&gt;
&lt;p&gt;Generating a 20-minute blog narration on the Strix Halo takes about 45 minutes of inference at roughly 85W above idle power. The incremental electricity cost is about $0.02. The same narration through Google Cloud TTS would cost $4 to $10 depending on character count and voice tier.&lt;/p&gt;
&lt;p&gt;That is a 200-to-500x cost difference on the marginal unit. And the marginal unit is what matters, because the question is never "should I generate TTS at all?" It is "should I regenerate the audio for this post I just edited?" or "should I try a different voice on this article?" or "should I narrate this niche post about PCB trace routing that maybe fifty people will listen to?"&lt;/p&gt;
&lt;p&gt;At $4 to $10 per narration, the answer to all of those is "probably not." At $0.02, the answer is "why wouldn't I?" That shift from "probably not" to "why not" is the entire economic argument for owning TTS hardware. It is not about the average cost. It is about the marginal decision.&lt;/p&gt;
&lt;p&gt;Before running local TTS, I narrated posts selectively with Google Cloud's Text-to-Speech. Some were too long or too niche to justify the GCP cost. Now every post gets audio. I regenerate after revisions without thinking about it. I have run the same post through three different TTS models to compare voice quality. I experiment with speaker voices, pacing parameters, and chunk sizes. The total volume of audio I have generated locally exceeds what I would have purchased from Google at any price point. This is &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt; at the smallest possible scale: make TTS cheap enough and I do not produce the same amount of TTS for less money; I produce vastly more TTS for slightly less money.&lt;/p&gt;
&lt;p&gt;The same logic applies to code scanning. Any individual scan is cheap enough through an API. But the friction of metered pricing discourages the kind of speculative, exploratory analysis that turns up unexpected findings. When the marginal cost is zero, I scan more freely and more often. The value is not in any single scan; it is in the scans I would not have run otherwise.&lt;/p&gt;
&lt;h3&gt;The Strix Halo Problem&lt;/h3&gt;
&lt;p&gt;The most surprising result in the benchmarks is the Strix Halo's efficiency. An integrated APU with no discrete GPU delivers 40.2 tokens per second at 82W of GPU power. The P40 server delivers 47.5 tokens per second at 278W of GPU power. The P40 is 18% faster. The Strix Halo uses 70% less power. In performance per GPU watt, the Strix Halo (0.49 tok/s per watt) is nearly three times more efficient than the P40 (0.17 tok/s per watt).&lt;/p&gt;
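&lt;p&gt;Those ratios follow directly from the 8B row of the benchmark table. A short sketch of the calculation, with illustrative variable names:&lt;/p&gt;

```python
def tokens_per_watt(tok_per_s, gpu_watts):
    """Throughput per watt of GPU power (tokens per second per watt)."""
    return tok_per_s / gpu_watts


# Llama 3.1 8B figures from the benchmark table
P40_TOK_S, P40_WATTS = 47.5, 278
HALO_TOK_S, HALO_WATTS = 40.2, 82

p40_eff = tokens_per_watt(P40_TOK_S, P40_WATTS)
halo_eff = tokens_per_watt(HALO_TOK_S, HALO_WATTS)

efficiency_ratio = halo_eff / p40_eff        # how many times more efficient the Halo is
p40_speedup = P40_TOK_S / HALO_TOK_S - 1     # fraction by which the P40 is faster
halo_power_cut = 1 - HALO_WATTS / P40_WATTS  # fraction of GPU power the Halo saves

print(f"P40: {p40_eff:.2f} tok/s per watt, Strix Halo: {halo_eff:.2f} tok/s per watt")
print(f"Efficiency ratio: {efficiency_ratio:.2f}x")
print(f"P40 speed advantage: {p40_speedup:.1%}, Halo power reduction: {halo_power_cut:.1%}")
```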
&lt;p&gt;This creates a problem for the P40 server's economics. The server's advantage is VRAM: 96GB lets it run 120B MoE models that the Strix Halo cannot fit. For the gpt-oss 120B model, the P40 server is the only viable option. But for everything 8B and below, the Strix Halo is cheaper to buy ($2,000 vs. $2,500), cheaper to idle ($4.20/month vs. $38.50/month), cheaper per token ($0.13/M vs. $0.46/M), quieter, smaller, and only about 15% slower.&lt;/p&gt;
&lt;p&gt;If I were building a local inference setup today from scratch and my workload was 8B models and TTS, I would buy the Strix Halo and nothing else. The P40 server justifies its existence only because of the large models that need its VRAM, and because I put it together well before the current RAM price spike.&lt;/p&gt;
&lt;p&gt;This is worth sitting with for a moment, because it inverts the conventional wisdom about inference hardware. The enterprise GPU server that looks impressive on paper (four GPUs, 96GB VRAM, 2U rack mount) loses on total cost of ownership to a $3,000 mini desktop for the workloads that dominate my actual usage. The P40's raw throughput advantage is real but small. Its power cost advantage is negative. The VRAM advantage matters only for models most people do not run.&lt;/p&gt;
&lt;h3&gt;The Maintenance Tax&lt;/h3&gt;
&lt;p&gt;The per-token calculations ignore the cost of keeping these machines running. It is not zero.&lt;/p&gt;
&lt;p&gt;I have had two kernel updates break the NVIDIA DKMS module on the P40 server. The AMD machine requires &lt;a href="https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html"&gt;specific pre-release PyTorch wheels&lt;/a&gt; and environment variable overrides for ROCm to function on gfx1151 hardware. While running the benchmarks for this article, I discovered that Ollama on the Strix Halo had been running entirely on CPU because the systemd service file lacked the &lt;code&gt;HSA_OVERRIDE_GFX_VERSION=11.5.1&lt;/code&gt; variable. Every benchmark I had run on that machine prior to catching this was measuring CPU inference, not GPU inference. The fix took two minutes. Finding it took longer.&lt;/p&gt;
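&lt;p&gt;A check along these lines could flag the missing variable before any benchmark runs. This is a simplified sketch: the unit file contents and the function name are hypothetical, and the parser only handles plain &lt;code&gt;Environment=&lt;/code&gt; lines in the &lt;code&gt;[Service]&lt;/code&gt; section, not drop-ins, &lt;code&gt;EnvironmentFile=&lt;/code&gt;, or resets.&lt;/p&gt;

```python
import shlex


def find_service_env(unit_text, name):
    """Return the value set for `name` by Environment= lines in [Service], else None."""
    section = None
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Service" or not line.startswith("Environment="):
            continue
        # One Environment= line may hold several optionally quoted assignments.
        for assignment in shlex.split(line[len("Environment="):]):
            key, sep, val = assignment.partition("=")
            if sep and key == name:
                value = val  # a later assignment overrides an earlier one
    return value


# Hypothetical unit file contents, for illustration only.
BROKEN_UNIT = """\
[Unit]
Description=Ollama Service

[Service]
ExecStart=/usr/local/bin/ollama serve
Environment="OLLAMA_HOST=0.0.0.0"
"""
FIXED_UNIT = BROKEN_UNIT + 'Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"\n'

print(find_service_env(BROKEN_UNIT, "HSA_OVERRIDE_GFX_VERSION"))
print(find_service_env(FIXED_UNIT, "HSA_OVERRIDE_GFX_VERSION"))
```

&lt;p&gt;A static check like this only confirms the unit file; confirming that the model is actually resident on the GPU still means looking at the running system.&lt;/p&gt;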
&lt;p&gt;The P40 server's fans run at full speed from October through April because the BMC interprets Minnesota winter temperatures as a hardware malfunction. The noise is audible from the house, 150 feet away.&lt;/p&gt;
&lt;p&gt;None of this is catastrophic. All of it is time. And time spent debugging DKMS modules or adding environment variables to systemd units is time not spent on the work that the hardware is supposed to enable. A Claude Max subscription requires zero maintenance. The local hardware requires ongoing attention. That asymmetry does not show up in per-token cost tables, but it is real.&lt;/p&gt;
&lt;h3&gt;Who This Is For&lt;/h3&gt;
&lt;p&gt;Most people should not build a local inference server. If you use AI for interactive tasks (questions, code, analysis, writing), a frontier model subscription is a better product at a lower total cost than any local setup. The quality gap between a local 8B model and Claude or GPT-5.4 is not closing in the ways that matter for conversational use. Pay for the good models. Use them freely.&lt;/p&gt;
&lt;p&gt;Local inference makes economic sense when you have a specific, high-volume, quality-tolerant workload that you will run often enough to justify hardware sitting on 24/7. TTS is the clearest case. Batch code analysis is another. If you cannot name the workload, you do not have one, and the hardware will cost you $40 to $50 per month in idle electricity to find out.&lt;/p&gt;
&lt;p&gt;The split between frontier subscriptions and local batch processing is not a compromise. It is, for my usage, the correct architecture. The frontier model handles the work where quality determines value. The local hardware handles the work where volume determines cost. Neither replaces the other. The mistake is thinking they compete.&lt;/p&gt;</description><category>ai</category><category>amd</category><category>benchmarks</category><category>claude</category><category>economics</category><category>gpu</category><category>home lab</category><category>inference</category><category>jevons paradox</category><category>local inference</category><category>power consumption</category><category>strix halo</category><category>tesla p40</category><category>tts</category><guid>https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html</guid><pubDate>Tue, 17 Mar 2026 13:00:00 GMT</pubDate></item><item><title>LLM-Generated Content: What Makes Something Slop</title><link>https://tinycomputers.io/posts/llm-generated-content-what-makes-something-slop.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/llm-generated-content-what-makes-something-slop_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;20 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Merriam-Webster named "slop" its &lt;a href="https://baud.rs/QUwnPW"&gt;2025 Word of the Year&lt;/a&gt;. In its new usage, the word describes low-quality AI-generated content produced and distributed with minimal human oversight. It captures something the internet has been feeling for a while: the growing suspicion that much of what appears online wasn't written so much as emitted.&lt;/p&gt;
&lt;p&gt;I should be transparent about where I stand. This blog uses AI-generated text-to-speech narration on every post. The articles about &lt;a href="https://tinycomputers.io/posts/the-mathematics-of-pcb-trace-routing.html"&gt;PCB trace routing&lt;/a&gt; describe boards that were auto-routed by algorithms. The code that builds and deploys this site was partially written with Claude Code assistance. I wrote a &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;six-part series on Jevons Paradox&lt;/a&gt; with AI tools open in the next terminal window the entire time. I am not writing this from outside the system.&lt;/p&gt;
&lt;p&gt;And yet I know slop when I see it. You probably do too. The interesting question is not whether slop exists (it obviously does) but what exactly we're recognizing when we encounter it. What quality makes certain AI-generated or AI-assisted content feel hollow, and what distinguishes it from output that has substance? The answer matters, because if we can't articulate the distinction, we're left with a binary that helps nobody: reject all AI tools, or accept everything they produce uncritically.&lt;/p&gt;
&lt;h3&gt;You Know It When You See It&lt;/h3&gt;
&lt;p&gt;In 1964, Justice Potter Stewart offered his famous non-definition of obscenity: "I know it when I see it." We're in a similar position with AI slop. Most people can identify it immediately but struggle to explain what they're detecting.&lt;/p&gt;
&lt;p&gt;The surface markers are easy enough to catalog. The hedging language: "It's important to note that..." The false balance, presenting every issue as having exactly two equally valid sides. The emoji padding that serves no communicative purpose. The five-paragraph essay structure applied to every topic regardless of complexity. The confident incorrectness: statements delivered with the same breezy authority whether they're true or fabricated. The vocabulary of caution and qualification that reads less like thoughtfulness and more like a language model covering its bases.&lt;/p&gt;
&lt;p&gt;These are the tells that AI-detection tools try to measure, and they work well enough for obvious cases. But they're symptoms, not the disease. A skilled prompt engineer can eliminate every one of these markers and still produce slop. Conversely, a human writer can exhibit several of them (hedging, false balance, structural rigidity) and still produce something worth reading. The surface features point toward the problem without being the problem itself.&lt;/p&gt;
&lt;p&gt;What we're actually detecting is an absence. Not an absence of quality at the sentence level (LLMs write clean, grammatical sentences) but an absence of something harder to name. The text reads correctly line by line and says nothing paragraph by paragraph. It is fluent without being articulate. It covers a topic without engaging with it. And we recognize this gap almost instantly, the way you recognize a smile that doesn't reach someone's eyes.&lt;/p&gt;
&lt;h3&gt;The Three Properties&lt;/h3&gt;
&lt;p&gt;The MINT Lab at Indiana University proposed a useful framework for thinking about this. They identified three properties that characterize slop: superficial competence, asymmetric effort, and mass producibility.&lt;/p&gt;
&lt;p&gt;Superficial competence is the core mechanism. The text performs competence at the surface level: vocabulary is appropriate, structure is logical, claims are plausible. But it doesn't demonstrate competence at the level of understanding. There's a difference between a sentence that uses the right words and a sentence that conveys the right meaning. Slop consistently achieves the former while missing the latter. The prose is grammatically flawless and semantically empty, a combination that is almost impossible for human writers to produce at scale but trivially easy for language models.&lt;/p&gt;
&lt;p&gt;Think of a student essay that hits every point on the rubric: thesis statement in the right place, three supporting paragraphs, counterargument acknowledged, conclusion that restates the thesis. A teacher reads it and gives it a B+. But the teacher also knows, without being able to point to a specific sentence, that the student didn't learn anything while writing it. The essay demonstrates knowledge of essay structure, not knowledge of the subject. That's superficial competence.&lt;/p&gt;
&lt;p&gt;Asymmetric effort describes the production economics. The author (or deployer) invested minimal effort relative to the volume of output. A single prompt generates 2,000 words in seconds. The resulting text has the length and format of something that would take a human writer hours, but it cost nothing in terms of thought, research, or revision. This asymmetry creates an incentive structure where the marginal cost of publishing approaches zero and the quality feedback loop disappears.&lt;/p&gt;
&lt;p&gt;Mass producibility follows from the first two. If the text is superficially competent and cheap to produce, there's no natural limit on volume. This is how you get AI-generated recipe blogs with 10,000 pages, product review sites with no evidence of product testing, and news aggregators that rewrite wire stories into blandly authoritative summaries. The content fills a shape (a blog post, a review, a news article) without filling it with meaning.&lt;/p&gt;
&lt;p&gt;These three properties interact. Mass production exacerbates the problem of superficial competence because there's no time or incentive for the depth that would distinguish one piece from another. And asymmetric effort means there's no skin in the game: the producer doesn't care whether the content is right, because it cost almost nothing to create and nothing to correct.&lt;/p&gt;
&lt;h3&gt;Greenberg's Ghost&lt;/h3&gt;
&lt;p&gt;There's a version of this argument that's eighty-seven years old.&lt;/p&gt;
&lt;p&gt;In 1939, &lt;a href="https://baud.rs/I3KpKw"&gt;Clement Greenberg&lt;/a&gt; published &lt;a href="https://tinycomputers.io/ClementGreenbergAvant-GardeAndKitsch.pdf"&gt;&lt;em&gt;Avant-Garde and Kitsch&lt;/em&gt;&lt;/a&gt;, one of the most influential essays in twentieth-century art criticism. Greenberg argued that mass culture produces "kitsch," art that "pre-digests art for the spectator and spares him effort, provides him with a shortcut to the pleasure of art that detours what is necessarily difficult in genuine art." Kitsch offers "vicarious experience and faked sensations." It looks like art. It has the shape of art. But it demands nothing from the viewer and delivers nothing in return except the comfortable feeling of having consumed something.&lt;/p&gt;
&lt;p&gt;AI slop does exactly this with information. It pre-digests knowledge for the reader, offering the appearance of understanding without requiring (or enabling) actual understanding. You read 2,000 words about a topic and come away with the sense that you've learned something, but when you try to articulate what you learned, there's nothing solid to grasp. The text gave you the experience of reading an informative article without the substance of one. Vicarious understanding. Faked insight.&lt;/p&gt;
&lt;p&gt;The parallel extends further than you might expect. Greenberg worried that kitsch would overwhelm genuine art because it was cheaper to produce and easier to consume. The same dynamics apply to AI-generated content: it's infinitely cheaper to produce, formats itself for easy consumption, and competes for the same attention as substantive work. Greenberg's nightmare was a culture where the imitation crowds out the real thing. That's recognizably the state of much of the internet in 2026.&lt;/p&gt;
&lt;p&gt;But Greenberg was also, let's be honest, a snob. His framework positioned the critic as the essential gatekeeper: only the trained eye could distinguish art from kitsch, and the masses were essentially passive consumers incapable of judgment. This elitism left him unprepared for Pop Art. When &lt;a href="https://baud.rs/H7xrvU"&gt;Warhol&lt;/a&gt; silk-screened Campbell's soup cans and Lichtenstein blew up comic panels to gallery scale, they took the materials of kitsch and made something genuinely interesting from them. They didn't reject mass culture; they engaged with it in a way that Greenberg's binary framework couldn't accommodate.&lt;/p&gt;
&lt;p&gt;There's an obvious recursive problem here, and I should name it rather than pretend it doesn't exist. This essay was written with AI assistance. It is, in a direct sense, an attempt to take the materials of mass production (an LLM's facility with argument structure, literature survey, prose drafting) and make something that isn't slop. Whether it succeeds is for the reader to judge. But the attempt itself is the Pop Art move: not rejecting the tools of mass culture, but trying to use them to say something specific. If I fail, the essay is kitsch that thinks it's art. If I succeed, Greenberg's binary was too rigid, and the tool was never the problem.&lt;/p&gt;
&lt;p&gt;This tension matters for the slop conversation more broadly. If we define slop as any AI-generated content (regardless of what it does or says), we make the same mistake Greenberg made with kitsch. The question is not the tool; it's whether something is being done with it at all.&lt;/p&gt;
&lt;h3&gt;The Authenticity Problem&lt;/h3&gt;
&lt;p&gt;So what is the actual distinguishing quality? What separates writing that happens to involve AI tools from writing that is slop?&lt;/p&gt;
&lt;p&gt;It's not voice. LLMs can mimic voice convincingly enough to fool most readers. It's not structure; LLMs organize material at least as well as the average human writer. It's not even factual accuracy, since LLMs can be accurate when properly grounded and cited. These are all necessary conditions for good writing, but slop can satisfy all of them and still be slop.&lt;/p&gt;
&lt;p&gt;What's missing is a point of view: the willingness to be wrong about something specific.&lt;/p&gt;
&lt;p&gt;Slop hedges. It covers all sides. It presents every position as having merit and declines to choose between them. It never commits to a claim that could be falsified, challenged, or argued against. And this is not a bug in the technology; it's a feature. Language models are trained to be helpful, harmless, and accurate. Helpfulness means addressing the user's question. Harmlessness means avoiding offense. The intersection of these goals produces text that is relentlessly, pathologically balanced. Every "on the one hand" gets an "on the other hand." Every strong claim gets a qualification. The result is prose that cannot be disagreed with, because it doesn't say anything specific enough to disagree with.&lt;/p&gt;
&lt;p&gt;I notice this constantly in my own AI-assisted drafts. The first pass comes back with every edge sanded off. Where I wrote "Freerouting can't do copper pours, and that's a fatal limitation for production boards," the draft wants to say "Freerouting has some limitations regarding copper pours that may affect certain use cases." The second version is more cautious. It's also emptier. The editorial work, the part that makes writing not-slop, is putting the edges back on: choosing the stronger claim, deleting the qualifications that exist for safety rather than accuracy, deciding that this is what I actually think and I'm willing to defend it.&lt;/p&gt;
&lt;p&gt;Good writing, whether human or AI-assisted, takes a position and defends it. The author exists in the text because they have opinions, not because they have fluency. When I wrote that &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;Jevons Paradox applies to human attention&lt;/a&gt; in the context of AI-assisted work, that was a specific, falsifiable claim. You could disagree with it. You could argue the model doesn't apply, or that the historical parallels are misleading, or that the biological ceiling on attention changes the dynamics. The argument creates a surface for friction. It takes a stance that could be wrong.&lt;/p&gt;
&lt;p&gt;Slop never takes that risk. It describes all positions and endorses none. It informs without arguing. And because it never commits to anything, it can never be wrong, which means it can never be right either. It occupies a semantic dead zone: technically not false, functionally not true, informationally zero.&lt;/p&gt;
&lt;p&gt;This is the test most people are applying intuitively when they identify something as slop. They're asking: is someone home? Does the text have a perspective, or is it just generating plausible sentences? The "someone" doesn't have to be a human, exactly. It has to be a process that made choices: that included some things and excluded others, that decided this interpretation was better than that one. Slop is text produced by a process that made no choices at all, because the defaults were good enough to fill the space.&lt;/p&gt;
&lt;h3&gt;When AI Output Isn't Slop&lt;/h3&gt;
&lt;p&gt;If the test is commitment and accountability, then it follows that AI-assisted output can clear the bar. But I want to be specific here, not hand-wavy, because vague appeals to "my own experience" are themselves a slop move.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;Giga Shield project&lt;/a&gt; started with a $468 Fiverr design that didn't work. Nine bidirectional level shifters, professional layout, clean two-layer board. Then I tested it with a Z80 processor, and the auto-sensing TXB0108 chips fell apart. The Z80 tri-states its address bus between cycles; the pins go high-impedance, floating. The TXB0108 can't determine drive direction from a floating signal. It guesses wrong, and the Arduino on the other side reads garbage. I'd paid $468 for a board that was blind to half of what the processor was doing.&lt;/p&gt;
&lt;p&gt;The redesign used Claude Code to generate the entire replacement board from a Python script: no graphical PCB editor, no manual placement, just code that outputs a routable board file. AI wrote the board generator. AI helped parse the KiCad schematic to extract all 72 signal mappings across 9 ICs. Then Freerouting, an open-source autorouter, handled the trace routing.&lt;/p&gt;
&lt;p&gt;Here's the kind of specificity that slop can't contain: after 60 optimization passes (about 45 minutes of compute), Freerouting brought the via count on the Giga Shield from roughly 220 down to 158. I ran 128 parallel instances across three machines with randomized net ordering to explore different regions of the solution space. And still, a hard floor of 5-6 unrouted ground connections remained, because Freerouting's architecture literally cannot represent copper pours, and the 0.65mm-pitch TSSOP-24 packages didn't have physical room for ground vias. That limitation is structural. No amount of prompt engineering or parameter tuning changes the fact that the algorithm has no concept of flood-fill connectivity. I &lt;a href="https://tinycomputers.io/posts/the-mathematics-of-pcb-trace-routing.html"&gt;wrote about this in detail&lt;/a&gt;, including the A* search internals and the specific geometric constraints, and if I got the analysis wrong, anyone can read the &lt;a href="https://github.com/ajokela/giga-shield"&gt;source code&lt;/a&gt; and check.&lt;/p&gt;
&lt;p&gt;I also discovered that Freerouting v2.1.0 produced 152 unrouted connections on the same board where v1.9.0 produced 6. That's a testable, reproducible claim, attached to specific version numbers, specific board files, specific machines. It's the opposite of "AI autorouting tools can sometimes produce inconsistent results," which is the slop version of the same observation. One of those sentences tells you something. The other fills space.&lt;/p&gt;
&lt;p&gt;Even the TTS narration is more complicated than "it just works." The Qwen model mispronounces technical terms. It puts emphasis in odd places. The audio for posts with dense jargon has an uncanny flatness where the model clearly doesn't understand what it's reading. I publish it anyway because it's useful despite its flaws, and because I label it as AI-generated narration, which means I'm not asking the listener to trust it as a human performance. It's a tool with known limitations, deployed for a specific purpose, accountable to its function.&lt;/p&gt;
&lt;p&gt;The common thread isn't that AI made these outputs perfect. It's that they were tested against something outside themselves. The board works or it doesn't. The via count is 158 or it isn't. The audio plays or it doesn't. Slop faces no such test. It exists to fill a container, and its success is measured by whether the container looks full, not by whether what's inside is true.&lt;/p&gt;
&lt;h3&gt;The Compost Argument&lt;/h3&gt;
&lt;p&gt;There's a reasonable counterargument that goes like this: human slop has always existed. Content farms, SEO spam, airport bookstore filler, corporate press releases, academic papers that exist only to pad a CV. The internet was full of low-quality, low-effort content long before large language models existed. AI didn't invent slop; it industrialized it.&lt;/p&gt;
&lt;p&gt;This is true, and it's worth taking seriously. The people arguing that "the idea of AI slop is slop" have a point: if we define slop as low-quality content produced with minimal effort, most of what humans have ever published qualifies. Sturgeon's Law (ninety percent of everything is crud) predates large language models by decades.&lt;/p&gt;
&lt;p&gt;But the economics are different now, and economics change everything. When slop required human labor, there was a floor on production cost. A content farm still had to pay writers (however little). An SEO spammer still had to hire someone to string keywords into sentences. That floor limited volume, which limited the ratio of noise to signal in any given information ecosystem.&lt;/p&gt;
&lt;p&gt;AI removes the floor. The marginal cost of producing a 2,000-word article drops to fractions of a cent. The marginal cost of producing 10,000 such articles drops to the cost of an API call and a deployment script. The constraint was never willingness to produce slop; it was cost. With cost eliminated, volume expands without bound. This is, incidentally, &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;another case of Jevons Paradox&lt;/a&gt;: make content production cheaper, get more content production, not less.&lt;/p&gt;
&lt;p&gt;Some writers have made what I'll call the compost argument: that cultural slop, even the human-produced kind, serves as a sort of fertilizer. The vast majority of pulp fiction was forgettable, but it created the ecosystem that produced &lt;a href="https://baud.rs/UdOkDt"&gt;Philip K. Dick&lt;/a&gt; and &lt;a href="https://baud.rs/v53yMW"&gt;Ursula K. Le Guin&lt;/a&gt;. Most blog posts are unremarkable, but the blogging ecosystem produced some genuinely important writing. The compost nourishes rare blooms.&lt;/p&gt;
&lt;p&gt;Maybe. But there's a concentration problem. A garden benefits from compost; a garden buried under six feet of compost is just a landfill. If the ratio of slop to substance shifts far enough, the substance becomes unfindable. Search engines surface slop because it's optimized for surfacing. Recommendation algorithms amplify it because engagement metrics can't distinguish between "I read this and learned something" and "I read this and it filled two minutes." The signal doesn't just get drowned out; it gets algorithmically deprioritized in favor of the noise.&lt;/p&gt;
&lt;h3&gt;What the Test Looks Like&lt;/h3&gt;
&lt;p&gt;I've argued that what we recognize as slop is the absence of commitment: text that declines to be wrong about anything specific. I believe this is correct, but I should be honest about where the test gets uncomfortable.&lt;/p&gt;
&lt;p&gt;Committed writing can be terrible. Conspiracy theories are committed. Propaganda is committed. A confidently wrong blog post about vaccine microchips passes the "takes a position" test with flying colors. Commitment is necessary but not sufficient. It separates slop from writing that has a pulse, but it doesn't separate good writing from bad writing. That's a different and older test, one that involves accuracy, evidence, reasoning, and intellectual honesty: all the things we've always used to evaluate arguments. The slop test is prior to all of that. It asks whether there's anything present to evaluate in the first place.&lt;/p&gt;
&lt;p&gt;The tool doesn't determine the category. The commitment does. And if this essay has failed to commit to anything worth arguing against, then by its own logic, it belongs in the landfill with the rest.&lt;/p&gt;</description><category>ai</category><category>authenticity</category><category>kitsch</category><category>philosophy</category><category>slop</category><category>writing</category><guid>https://tinycomputers.io/posts/llm-generated-content-what-makes-something-slop.html</guid><pubDate>Mon, 16 Mar 2026 13:00:00 GMT</pubDate></item><item><title>The Mathematics of PCB Trace Routing</title><link>https://tinycomputers.io/posts/the-mathematics-of-pcb-trace-routing.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-mathematics-of-pcb-trace-routing_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;24 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Every PCB design eventually arrives at the same moment. Components are placed. Nets are defined. The ratsnest of thin lines connecting pad to pad looks like a plate of spaghetti dropped on a cutting board. Now someone, or something, has to turn that mess into real copper traces that don't cross, don't short, and fit within the design rules. That's the routing problem.&lt;/p&gt;
&lt;p&gt;For hobbyists and professionals alike, autorouters do this work. You press a button, wait, and traces appear. But what actually happens during that wait? The answer turns out to involve some of the most elegant mathematics in computer science, and some surprisingly hard geometric constraints that no algorithm can finesse.&lt;/p&gt;
&lt;p&gt;I've been using &lt;a href="https://baud.rs/bdZw62"&gt;Freerouting&lt;/a&gt;, the open-source Specctra autorouter, for two PCB projects now: a &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;dual Z80 RetroShield&lt;/a&gt; and a &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;level-shifter shield for the Arduino Giga R1&lt;/a&gt;. The second project pushed Freerouting to its limits in ways that forced me to understand how it works internally. This is what I found when I read the source code.&lt;/p&gt;
&lt;h3&gt;Not a Grid, Not a Maze&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield/giga_shield_freerouted_top.png" alt="Freerouting result on the Giga Shield: 2-layer board with 45-degree trace routing between TSSOP-24 ICs and pin headers, rendered in pcb-rnd photo mode" style="float: right; max-width: 420px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;
&lt;em&gt;Giga Shield routed by Freerouting in 45-degree mode. Top layer, rendered in pcb-rnd.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Most descriptions of PCB autorouting start with &lt;a href="https://baud.rs/pblLmT"&gt;Lee's maze algorithm&lt;/a&gt; from 1961. Place the board on a grid. Flood-fill from the source pad. When the wave hits the destination, backtrack along the shortest path. It's intuitive, easy to implement, and used in introductory EDA courses everywhere.&lt;/p&gt;
&lt;p&gt;Freerouting doesn't do this.&lt;/p&gt;
&lt;p&gt;Instead of discretizing the board into a grid of cells, Freerouting operates on a continuous geometric plane. The routing space is partitioned into convex polygonal regions called expansion rooms. Each room is a chunk of free space on one layer of the board, bounded by the edges of existing obstacles (traces, vias, pads) plus their clearance halos. The rooms aren't precomputed. They're generated lazily during the search, grown on demand as the router explores new areas.&lt;/p&gt;
&lt;p&gt;This is a shape-based router, sometimes called a free-space router. The distinction matters. A grid-based router's resolution is fixed: if your grid is 0.1mm, you can't route a trace at 0.05mm offset from an obstacle, even if the design rules would allow it. A shape-based router has no such limitation. It works with exact geometry (integer-valued coordinates for precision), and the routing channels it discovers are as wide or narrow as the physical clearances actually allow.&lt;/p&gt;
&lt;p&gt;Three geometry modes control the shape of the rooms:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Room Shape&lt;/th&gt;
&lt;th&gt;Allowed Trace Angles&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;90-degree&lt;/td&gt;
&lt;td&gt;Axis-aligned rectangles&lt;/td&gt;
&lt;td&gt;Horizontal, vertical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;45-degree&lt;/td&gt;
&lt;td&gt;Octagons&lt;/td&gt;
&lt;td&gt;Horizontal, vertical, 45-degree diagonals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Any-angle&lt;/td&gt;
&lt;td&gt;General convex polygons&lt;/td&gt;
&lt;td&gt;Unrestricted&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The choice affects both routing quality and performance. Axis-aligned rectangles are fastest to compute and intersect. Octagons allow the 45-degree traces common in modern PCBs. General polygons give the router maximum freedom but at a computational cost.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield/giga_shield_freerouted_anyangle.png" alt="Freerouting any-angle mode: traces radiate from pads at arbitrary angles rather than snapping to a 45-degree grid, showing the difference between shape-based and grid-based routing" style="float: left; max-width: 420px; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;
&lt;em&gt;Same board in any-angle mode. Traces follow direct paths instead of 45-degree snapping.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The image to the left shows the same board routed in any-angle mode. Notice how traces leave pads at arbitrary angles, following straight-line paths toward their destinations rather than snapping to a 45-degree grid. Compare this with the image above, which used the standard 45-degree octagon mode. The any-angle result has shorter total trace length but can be harder to manufacture cleanly at tight tolerances.&lt;/p&gt;
&lt;h3&gt;The A* Core&lt;/h3&gt;
&lt;p&gt;At its heart, Freerouting's search algorithm is A*, the same algorithm that drives pathfinding in video games, robot navigation, GPS routing, and network packet delivery. A* was published by Peter Hart, Nils Nilsson, and Bertram Raphael at the &lt;a href="https://baud.rs/pqg9oG"&gt;Stanford Research Institute&lt;/a&gt; in 1968. Nearly sixty years later, it remains the standard algorithm for finding &lt;a href="https://baud.rs/XEVv2I"&gt;shortest paths in weighted graphs&lt;/a&gt; where a heuristic estimate of remaining distance is available.&lt;/p&gt;
&lt;p&gt;The mathematical foundation is straightforward. A* maintains a priority queue of candidate states, each with a cost value:&lt;/p&gt;
&lt;p&gt;$$f(n) = g(n) + h(n)$$&lt;/p&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;g(n)&lt;/code&gt; is the actual accumulated cost from the start to state &lt;em&gt;n&lt;/em&gt;. In PCB routing, this includes trace length, layer changes (vias), preferred-direction penalties, and any ripped-up obstacle costs.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;h(n)&lt;/code&gt; is a heuristic estimate of the remaining cost from &lt;em&gt;n&lt;/em&gt; to the destination. This must be admissible: it must never overestimate the true remaining cost.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;f(n)&lt;/code&gt; is the total estimated cost of the best path through &lt;em&gt;n&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At each step, A* pops the state with the lowest f(n) from the queue, expands its neighbors, and adds them back with updated costs. When the destination is popped, the algorithm has found the optimal path (given an admissible heuristic).&lt;/p&gt;
&lt;p&gt;The key insight is the heuristic. Without it, A* degenerates into &lt;a href="https://baud.rs/XEVv2I"&gt;Dijkstra's algorithm&lt;/a&gt;, which explores in all directions equally. A good heuristic focuses the search toward the destination. In Freerouting's case, &lt;code&gt;DestinationDistance.calculate()&lt;/code&gt; estimates the minimum cost to reach the target, accounting for both planar distance and any required layer transitions. The sorting value in the priority queue is computed as:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sorting_value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;expansion_value&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="na"&gt;destination_distance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="na"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape_entry_middle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Where &lt;code&gt;expansion_value&lt;/code&gt; is the g(n) accumulated cost, and the distance calculation is h(n). This is textbook A*.&lt;/p&gt;
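&lt;p&gt;A minimal Python sketch of that priority-queue loop, run on a toy four-node graph rather than on Freerouting's expansion rooms. The names &lt;code&gt;a_star&lt;/code&gt;, &lt;code&gt;neighbors&lt;/code&gt;, and &lt;code&gt;coords&lt;/code&gt; are invented for illustration and the edge costs are made up; the only thing taken from the text is the ordering rule f(n) = g(n) + h(n). It assumes every edge costs at least its straight-line length, which keeps the Euclidean heuristic admissible and consistent.&lt;/p&gt;

```python
import heapq
import math


def a_star(neighbors, coords, start, goal):
    """Return (cost, path) for the cheapest start-to-goal path, or None."""

    def h(node):
        # Straight-line distance to the goal. Admissible only under the
        # assumption that every edge costs at least its geometric length.
        return math.dist(coords[node], coords[goal])

    # Queue entries are (f, g, node, path); heapq pops the lowest f first.
    queue = [(h(start), 0.0, start, [start])]
    closed = set()
    while queue:
        f, g, node, path = heapq.heappop(queue)
        if node == goal:
            return g, path
        if node in closed:
            continue  # already expanded through a cheaper queue entry
        closed.add(node)
        for nxt, step_cost in neighbors[node]:
            if nxt not in closed:
                new_g = g + step_cost
                heapq.heappush(queue, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None


# Toy data (hypothetical): four pads on a 3 x 4 rectangle, no direct A-D edge.
coords = {"A": (0, 0), "B": (3, 0), "C": (0, 4), "D": (3, 4)}
neighbors = {
    "A": [("B", 3.0), ("C", 4.0)],
    "B": [("A", 3.0), ("D", 4.0)],
    "C": [("A", 4.0), ("D", 3.5)],  # 3.5 stands in for length plus a penalty
    "D": [("B", 4.0), ("C", 3.5)],
}
print(a_star(neighbors, coords, "A", "D"))
```

&lt;p&gt;On this toy graph the route through B wins with a cost of 7.0, against 7.5 for the route through C. The closed set relies on the heuristic being consistent; with a merely admissible heuristic the sketch would have to allow re-expansion.&lt;/p&gt;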
&lt;h4&gt;Why A* Works So Well&lt;/h4&gt;
&lt;p&gt;A* has a remarkable optimality guarantee. If the heuristic h(n) is admissible (never overestimates), A* is guaranteed to find the shortest path. If h(n) is also consistent (satisfying the triangle inequality: h(n) &amp;lt;= cost(n, n') + h(n') for every neighbor n'), then A* never needs to re-expand a state it has already visited. This makes it both optimal and efficient.&lt;/p&gt;
&lt;p&gt;For PCB routing, the Euclidean distance between the current position and the destination pad is a natural admissible heuristic: a straight line is never longer than any actual route, which must navigate around obstacles. Freerouting's heuristic is somewhat more sophisticated, incorporating via costs for layer transitions, but the principle is the same.&lt;/p&gt;
&lt;p&gt;The efficiency gain over brute-force search is dramatic. &lt;a href="https://baud.rs/XEVv2I"&gt;Dijkstra's algorithm&lt;/a&gt; (A* with h(n) = 0) explores states in concentric rings outward from the source. On a board with N searchable regions, it can visit nearly all N of them before it reaches the destination. A* with a good heuristic carves a narrow corridor from source to destination, visiting far fewer states. In practice, on a moderately complex board, this is the difference between milliseconds and minutes per connection.&lt;/p&gt;
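&lt;p&gt;The rings-versus-corridor difference is easy to see on a toy problem. The sketch below counts how many states get expanded on an empty square grid, once with h(n) = 0 and once with a Manhattan-distance heuristic. The function name &lt;code&gt;count_expansions&lt;/code&gt; and the grid itself are assumptions made for illustration; this is not how Freerouting represents a board.&lt;/p&gt;

```python
import heapq


def count_expansions(size, start, goal, use_heuristic):
    """Search a size-by-size open grid; return how many states were expanded."""

    def h(cell):
        if not use_heuristic:
            return 0  # h = 0 turns A* into Dijkstra's algorithm
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    cells = {(x, y) for x in range(size) for y in range(size)}
    # Queue entries are (f, h, g, cell); h breaks ties toward the goal.
    queue = [(h(start), h(start), 0, start)]
    closed = set()
    while queue:
        _, _, g, cell = heapq.heappop(queue)
        if cell == goal:
            return len(closed)
        if cell in closed:
            continue
        closed.add(cell)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in cells and nxt not in closed:
                heapq.heappush(queue, (g + 1 + h(nxt), h(nxt), g + 1, nxt))
    return None


# Source in the middle of a 21 x 21 grid, destination ten cells away.
print("dijkstra:", count_expansions(21, (10, 10), (10, 20), False))
print("a-star:  ", count_expansions(21, (10, 10), (10, 20), True))
```

&lt;p&gt;With the heuristic switched on, the search walks straight down the column toward the destination and expands only the cells along it. With h(n) = 0 it has to expand every cell closer to the source than the destination before it gets there.&lt;/p&gt;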
&lt;h4&gt;A* Is Everywhere&lt;/h4&gt;
&lt;p&gt;The same algorithm, with different cost functions and heuristics, solves an astonishing range of problems:&lt;/p&gt;
&lt;p&gt;Game pathfinding. Every real-time strategy game since the 1990s uses A* to move units around obstacles. The grid cells are the states, movement cost is g(n), and Manhattan or Euclidean distance to the target is h(n).&lt;/p&gt;
&lt;p&gt;GPS navigation. Road networks are weighted graphs. Edge weights are travel times. A* with geographic distance as the heuristic finds near-optimal routes across millions of road segments.&lt;/p&gt;
&lt;p&gt;Robot motion planning. A robot's configuration space (position, orientation, joint angles) is the state space. A* finds collision-free paths from one configuration to another.&lt;/p&gt;
&lt;p&gt;Natural language processing. Viterbi decoding, which finds the most likely sequence of hidden states in a Hidden Markov Model, is structurally similar to A* over a trellis graph.&lt;/p&gt;
&lt;p&gt;Puzzle solving. The 15-puzzle, &lt;a href="https://baud.rs/sKzgs4"&gt;Rubik's Cube&lt;/a&gt;, Sokoban: A* and its memory-bounded variant IDA* solve them optimally given an appropriate heuristic. The search is still exponential in the worst case; the heuristic is what makes it tractable in practice.&lt;/p&gt;
&lt;p&gt;What makes A* general is the abstraction. It doesn't care whether the "states" are grid squares, road intersections, robot poses, or polygonal rooms on a PCB layer. It only needs a cost function, a heuristic, and a neighbor-expansion rule. Freerouting provides all three, with the unusual twist that its states are dynamically-computed convex polygons rather than fixed graph nodes.&lt;/p&gt;
&lt;h3&gt;But A* Only Routes One Net&lt;/h3&gt;
&lt;p&gt;Here's the catch. A* finds the optimal path for a single source-destination pair. A PCB has hundreds of nets, all competing for the same physical space. Route net A first, and it might block the optimal path for net B. Route net B first, and net A suffers instead. The quality of the overall routing depends heavily on the order in which nets are processed.&lt;/p&gt;
&lt;p&gt;Freerouting handles this with rip-up-and-reroute, a strategy from the 1970s that remains the standard approach. The idea is simple: route all nets in some initial order. When a net fails (no path exists without violating design rules), rip up one or more blocking traces and add them to a retry queue. Then try again with different priorities.&lt;/p&gt;
&lt;p&gt;The implementation in &lt;code&gt;BatchAutorouter.java&lt;/code&gt; runs multiple passes over the board. On each pass, every unrouted connection is attempted. The critical detail is how ripup decisions are made. Each existing trace has a ripup cost, and the cost increases linearly with the pass number:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;ripup_cost&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;start_ripup_costs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;passNumber&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Early passes are conservative: the router avoids tearing up existing routes. Later passes become progressively more aggressive, willing to rip up more traces to find solutions. This is a controlled escalation that prevents the router from thrashing (endlessly ripping and re-routing the same nets) while still allowing it to escape local minima.&lt;/p&gt;
&lt;p&gt;The scheduler also implements a limited form of backtracking. Every few passes, the router checks whether the board score (total unrouted connections, via count, trace length) has improved. If not, it restores a previously saved board snapshot and continues from that earlier state. This is a coarse cousin of simulated annealing: between checkpoints the router tolerates worse intermediate states in order to explore a different region of the solution space, and it rolls back only when that exploration fails to pay off.&lt;/p&gt;
&lt;h4&gt;Net Ordering: The Hidden Variable&lt;/h4&gt;
&lt;p&gt;The order in which nets are routed has an outsized effect on the result. By default, Freerouting routes nets in the order they appear in the DSN file, which is typically the order they were defined in the schematic. There's no sorting by airline length, fan-out degree, or criticality. The router's source code contains a commented-out sort-by-distance that was disabled in v2.3 because it "negatively impacts convergence."&lt;/p&gt;
&lt;p&gt;This means the same board can produce different routing results depending on how the DSN file was generated. I exploited this during the Giga Shield project by writing a script (&lt;code&gt;shuffle_dsn.py&lt;/code&gt;) that generates dozens of copies of the same DSN file with randomized net ordering:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_copies&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;shuffled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;31337&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shuffled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# write shuffled DSN...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each copy routes nets in a different sequence, converging to a different local optimum. Running 128 parallel Freerouting instances across three machines (a local Mac, a 64-core server, and a 32-core workstation) explored 128 different regions of the solution space simultaneously. The best result was measurably better than any single run. This is an embarrassingly parallel optimization: each job is independent, and you keep the best answer.&lt;/p&gt;
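&lt;p&gt;The shuffle-and-keep-the-best pattern can be sketched in a few lines of Python. This is not the actual &lt;code&gt;shuffle_dsn.py&lt;/code&gt;: the helper names &lt;code&gt;shuffled_orders&lt;/code&gt; and &lt;code&gt;pick_best&lt;/code&gt; are hypothetical, the result dictionaries are placeholders rather than measurements, and ranking by unrouted count first and via count second is an assumption about what "best" means.&lt;/p&gt;

```python
import random


def shuffled_orders(nets, n_copies):
    """Yield (seed, ordering) pairs, one reproducible permutation per copy."""
    for i in range(n_copies):
        seed = i * 31337 + 42      # same seed formula as the snippet above
        rng = random.Random(seed)  # private generator, global state untouched
        order = list(nets)
        rng.shuffle(order)
        yield seed, order


def pick_best(results):
    """Pick the run with the fewest unrouted connections, then fewest vias."""
    return min(results, key=lambda r: (r["unrouted"], r["vias"]))


nets = ["GND", "VCC", "A0", "A1", "D0", "D1"]  # placeholder net names
for seed, order in shuffled_orders(nets, 3):
    print(seed, order)

# Placeholder results, NOT measured data: one dict per finished routing job.
results = [
    {"job": 0, "unrouted": 8, "vias": 170},
    {"job": 1, "unrouted": 6, "vias": 181},
    {"job": 2, "unrouted": 6, "vias": 158},
]
print(pick_best(results)["job"])
```

&lt;p&gt;Deriving each seed from the job index means any winning ordering can be regenerated later from its index alone, without storing all the shuffled files.&lt;/p&gt;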
&lt;p&gt;The takeaway: if your autorouter isn't finding a clean solution, the problem might not be the algorithm. It might be the ordering. Changing the input order is cheaper than changing the router.&lt;/p&gt;
&lt;h3&gt;The Optimization Phase&lt;/h3&gt;
&lt;p&gt;After the initial routing passes, Freerouting enters an optimization phase controlled by the &lt;code&gt;-mp&lt;/code&gt; flag. This phase iterates over every existing via and trace in the design, processing them in a left-to-right spatial scan.&lt;/p&gt;
&lt;p&gt;For each item, the optimizer:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Rips up the item's entire connection (all traces and vias for that net segment)&lt;/li&gt;
&lt;li&gt;Re-runs up to 6 passes of the A*-based autorouter on just that connection&lt;/li&gt;
&lt;li&gt;Accepts the result only if it reduces via count or total trace length&lt;/li&gt;
&lt;li&gt;Restores the previous state if the re-route was no better&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Vias are visited before traces, reflecting the priority of via reduction. Each unnecessary via adds manufacturing cost, signal integrity degradation, and parasitic capacitance. The optimizer also alternates between preferred and non-preferred trace directions on successive passes, preventing the solution from getting stuck in a directional rut.&lt;/p&gt;
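&lt;p&gt;The rip-up, re-route, accept-or-restore cycle can be sketched in a few lines. This is a toy model under stated assumptions, not Freerouting's code: &lt;code&gt;Connection&lt;/code&gt;, &lt;code&gt;optimize_pass&lt;/code&gt;, and the fake &lt;code&gt;drop_one_via&lt;/code&gt; router are hypothetical names, and comparing (via count, length) as a tuple is my guess at a reasonable acceptance rule.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Toy stand-in for one routed connection."""
    name: str
    vias: int
    length_mm: float


def is_better(candidate: Connection, current: Connection) -> bool:
    # Assumed rule: fewer vias wins, then shorter length.
    return (current.vias, current.length_mm) > (candidate.vias, candidate.length_mm)


def optimize_pass(board: dict, reroute) -> int:
    """Try to improve every connection once; return how many improved.

    board maps a name to its Connection. reroute is a caller-supplied
    function returning a candidate Connection, or None on failure.
    """
    improved = 0
    # Visit via-heavy connections first, echoing the via-before-trace priority.
    for name in sorted(board, key=lambda n: board[n].vias, reverse=True):
        saved = board[name]            # snapshot before the rip-up
        candidate = reroute(saved)     # re-route just this connection
        if candidate is not None and is_better(candidate, saved):
            board[name] = candidate    # keep the improvement
            improved += 1
        else:
            board[name] = saved        # restore the previous state
    return improved


def drop_one_via(conn: Connection):
    # Fake router for the demo: succeeds only when there is a via to remove.
    if conn.vias == 0:
        return None
    return Connection(conn.name, conn.vias - 1, conn.length_mm + 0.5)


board = {
    "D0": Connection("D0", vias=2, length_mm=30.0),
    "D1": Connection("D1", vias=0, length_mm=12.0),
}
improved_count = optimize_pass(board, drop_one_via)
print(improved_count, board["D0"].vias)
```

&lt;p&gt;The important property is that a failed attempt leaves the board exactly as it was, so a pass can never make the result worse.&lt;/p&gt;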
&lt;p&gt;Via positions themselves are fine-tuned by a separate algorithm (&lt;code&gt;OptViaAlgo&lt;/code&gt;). For vias connecting exactly two traces, the optimizer searches for the position that minimizes the combined weighted trace length on both layers, iteratively nudging the via toward the geometric optimum.&lt;/p&gt;
&lt;p&gt;The result of the optimization phase is typically a 15-30% reduction in via count and a 10-20% reduction in total trace length compared to the initial routing. On the Giga Shield, 60 optimization passes ran for about 45 minutes and brought the via count from ~220 down to ~158.&lt;/p&gt;
&lt;h3&gt;Why Freerouting Can't Do Copper Pours&lt;/h3&gt;
&lt;p&gt;This is where the elegance of the algorithm runs headfirst into a hard architectural limit.&lt;/p&gt;
&lt;p&gt;Every non-trivial PCB has a ground net that connects to dozens or hundreds of pads. The standard solution in commercial EDA tools is a copper pour: a filled polygon that covers an entire layer (or most of it), with clearance cutouts around non-ground features and thermal relief connections to ground pads. You don't route GND with traces. You flood-fill it.&lt;/p&gt;
&lt;p&gt;Freerouting cannot do this.&lt;/p&gt;
&lt;p&gt;The limitation isn't a missing feature that could be added with a few hundred lines of code. It's structural. Freerouting's entire architecture is built around point-to-point trace routing. The maze search, the rip-up scheduler, the optimizer: they all operate on individual connections between pairs of pads. A copper pour is a fundamentally different object. It's not a path from A to B. It's a region that grows to fill available space, adapting its shape around every obstacle on the layer.&lt;/p&gt;
&lt;p&gt;In the source code, copper pours are represented as &lt;code&gt;ConductionArea&lt;/code&gt; objects with a fixed shape set at import time. When the autorouter encounters a net that already has a &lt;code&gt;ConductionArea&lt;/code&gt;, it simply returns &lt;code&gt;CONNECTED_TO_PLANE&lt;/code&gt; and considers the job done. There's no flood-fill algorithm. There's no thermal relief generation. The router expects that the EDA tool (KiCad, pcb-rnd, etc.) has already computed the pour geometry before the DSN file was exported.&lt;/p&gt;
&lt;p&gt;For foreign nets (anything that isn't the pour's net), the &lt;code&gt;ConductionArea&lt;/code&gt; is treated as a hard obstacle. Traces can't cross it. Vias can't be placed inside it. The router routes around it as if it were a solid wall. This is exactly right from a clearance perspective, but it means the router has no ability to create, modify, or extend a pour during the routing process.&lt;/p&gt;
&lt;p&gt;The practical impact is severe for boards with fine-pitch surface-mount parts. On the Giga Shield, each &lt;a href="https://baud.rs/zQqo34"&gt;SN74LVC8T245PW&lt;/a&gt; (TSSOP-24) has three GND pins at 0.65mm pitch. The gap between adjacent pads is roughly 0.25mm. A via needs approximately 0.9mm of space (drill diameter plus annular ring plus clearance). There is physically no room to place a via next to a TSSOP-24 GND pad and connect it to a GND trace on another layer. The router can see the GND pad, it can see that it needs to be connected to other GND pads, but it cannot find a valid path because there is no valid path using its vocabulary of traces and vias.&lt;/p&gt;
&lt;p&gt;A copper pour solves this trivially. The pad sits directly on (or thermally connects to) the pour polygon. No via needed. No trace routing needed. The connectivity is implied by physical overlap. But this is a concept that simply doesn't exist in Freerouting's model of the world.&lt;/p&gt;
&lt;p&gt;On the Giga Shield project, this limitation manifested as a hard floor of 5-6 unrouted GND connections that no amount of optimization could resolve. I threw 128 parallel instances at the problem across three machines. I tried 2-layer, 4-layer, and 6-layer board configurations. I wrote custom post-processing scripts to add GND vias and MST-based bottom-layer routing. None of it worked within DRC constraints. The geometry was simply too tight. I ended up solving it with a different tool entirely, which is a story for the &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;next article in this series&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;The Shove Machine&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield/giga_shield_freerouted_bottom.png" alt="Bottom layer of the Freerouting result: dense trace routing showing how the shove algorithm packs traces tightly between through-hole pin rows" style="float: right; max-width: 420px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;
&lt;em&gt;Bottom layer. The shove algorithm packs traces tightly between through-hole pin rows.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;One of Freerouting's more sophisticated subsystems is its forced insertion with shove mechanism. When the A* search finds that the optimal path for a new trace passes through space occupied by an existing trace, the router doesn't immediately give up or rip up the obstacle. Instead, it tries to push the obstacle aside.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;ForcedPadAlgo&lt;/code&gt; and &lt;code&gt;ShoveTraceAlgo&lt;/code&gt; classes implement this recursively. When a new trace needs to go where an existing trace is, the existing trace is nudged perpendicular to the new trace's path. If that nudge collides with a third trace, the third trace is nudged too, and so on, up to a configurable recursion depth (default: 20 levels for traces, 5 for vias). Only if the shove cascade exceeds this depth does the router fall back to ripping up the blocking item.&lt;/p&gt;
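&lt;p&gt;To make the cascade concrete, here is a deliberately tiny one-dimensional model: parallel tracks are integer positions along one axis, a new track pushes its right-hand neighbours, and the push gives up once it exceeds a depth limit. Freerouting shoves two-dimensional shapes, so treat this only as a sketch of the recursion and the bail-out. The function names and the &lt;code&gt;edge&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

```python
import bisect


def shove_right(tracks, i, min_pos, spacing, edge, depth, max_depth):
    """Make sure tracks[i] sits at or beyond min_pos, pushing neighbours.

    Returns True on success. Nothing is modified on failure, because each
    level only moves its track after the deeper levels have succeeded.
    """
    if i == len(tracks) or tracks[i] >= min_pos:
        return True                      # nothing in the way
    if depth >= max_depth or min_pos > edge:
        return False                     # cascade too deep, or off the board
    if not shove_right(tracks, i + 1, min_pos + spacing, spacing,
                       edge, depth + 1, max_depth):
        return False
    tracks[i] = min_pos
    return True


def insert_with_shove(tracks, pos, spacing, edge, max_depth=20):
    """Insert a track at pos, shoving tracks on its right if needed.

    tracks is a sorted list of integer positions (micrometres here).
    Assumes the left-hand neighbour already has enough room.
    """
    i = bisect.bisect_left(tracks, pos)
    if not shove_right(tracks, i, pos + spacing, spacing, edge, 0, max_depth):
        return False                     # caller would fall back to rip-up
    tracks.insert(i, pos)
    return True


tracks = [1000, 1200, 1400]
ok = insert_with_shove(tracks, 1000, spacing=200, edge=5000)
print(ok, tracks)
```

&lt;p&gt;Calling the same function with &lt;code&gt;max_depth=2&lt;/code&gt; fails on this input and leaves the list untouched, which is the point at which a real router would switch to rip-up.&lt;/p&gt;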
&lt;p&gt;This is the routing equivalent of parallel parking in a tight spot. Instead of abandoning the space, you bump the neighboring cars just enough to fit. It produces much denser routing than a pure rip-up approach, especially on boards with tight clearances and many competing nets.&lt;/p&gt;
&lt;p&gt;After every trace insertion, a pull-tight pass (&lt;code&gt;PullTightAlgo&lt;/code&gt;) smooths and shortens all traces in the affected area. This is a local optimization that removes unnecessary corners, straightens diagonal segments, and reduces total trace length. The combination of global A* search, local shove, and pull-tight smoothing produces routing quality that is competitive with commercial autorouters.&lt;/p&gt;
&lt;h3&gt;Clearance Compensation: A Geometry Trick&lt;/h3&gt;
&lt;p&gt;One implementation detail worth highlighting is how Freerouting handles clearance checking. Rather than testing "does this trace violate clearance with that via?" as a separate geometric predicate, Freerouting inflates every item's shape by its clearance value when storing it in the search tree. A trace with 0.254mm clearance is stored as a shape 0.254mm wider on each side. A via with 0.127mm clearance is stored as a circle 0.127mm larger in radius.&lt;/p&gt;
&lt;p&gt;This transforms all clearance checks into simple overlap tests. If two inflated shapes overlap in the search tree, there's a clearance violation. If they don't, there isn't. No separate clearance computation is needed during routing. The free-space rooms computed by the maze search are automatically clearance-legal by construction, because they're defined as the gaps between pre-inflated obstacles.&lt;/p&gt;
&lt;p&gt;This is an instance of the &lt;a href="https://en.wikipedia.org/wiki/Minkowski_addition"&gt;Minkowski sum&lt;/a&gt; from &lt;a href="https://baud.rs/pOehEY"&gt;computational geometry&lt;/a&gt;. The inflated obstacle shape is the Minkowski sum of the original shape and a disc of radius equal to the clearance. The free space is the complement of the union of all inflated obstacles. It's mathematically clean and computationally efficient.&lt;/p&gt;
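&lt;p&gt;For circular shapes the Minkowski sum is trivial (the radii add), which makes the idea easy to demonstrate. The sketch below stores a via inflated by its clearance and then tests candidate vias with a plain overlap check. &lt;code&gt;Disc&lt;/code&gt;, &lt;code&gt;inflate&lt;/code&gt;, and &lt;code&gt;overlaps&lt;/code&gt; are hypothetical names, the 0.30mm via radius is an example value, and testing the candidate uninflated is my assumption about how the query side works.&lt;/p&gt;

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Disc:
    x: float
    y: float
    r: float


def inflate(shape: Disc, clearance: float) -> Disc:
    # Minkowski sum of a disc with a disc of radius `clearance`.
    return Disc(shape.x, shape.y, shape.r + clearance)


def overlaps(a: Disc, b: Disc) -> bool:
    # Two discs overlap when the centre distance is below the radius sum.
    return a.r + b.r > math.hypot(a.x - b.x, a.y - b.y)


existing_via = Disc(0.0, 0.0, 0.30)
stored = inflate(existing_via, 0.127)      # what goes into the search tree

too_close = Disc(0.70, 0.0, 0.30)
far_enough = Disc(0.80, 0.0, 0.30)
print(overlaps(stored, too_close), overlaps(stored, far_enough))
```

&lt;p&gt;The raw copper of &lt;code&gt;existing_via&lt;/code&gt; and &lt;code&gt;too_close&lt;/code&gt; does not touch, yet the inflated shape flags the pair, so the clearance rule has been folded into an ordinary overlap test.&lt;/p&gt;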
&lt;h3&gt;Strengths and Weaknesses&lt;/h3&gt;
&lt;p&gt;After reading through the source and pushing the router to its limits, here's my honest assessment.&lt;/p&gt;
&lt;p&gt;Strengths:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gridless geometry. The shape-based approach produces routing that uses space optimally, without the artifacts of grid snapping. Traces can be placed at any position and any angle (in the selected mode), not just on grid points.&lt;/li&gt;
&lt;li&gt;Mathematically sound core. The A* search with an admissible heuristic guarantees optimal single-net routing. The rip-up-and-reroute scheduler provides a practical framework for multi-net optimization. These are well-understood algorithms with decades of theoretical backing.&lt;/li&gt;
&lt;li&gt;Shove + pull-tight. The forced insertion mechanism and post-routing optimization produce dense, clean routing that competes with commercial tools for signal traces.&lt;/li&gt;
&lt;li&gt;Reproducibility. Deterministic algorithm, text-based input/output, command-line interface. Same input always produces the same output. You can script it, parallelize it, and integrate it into CI pipelines.&lt;/li&gt;
&lt;li&gt;Open source. You can read the code, modify the cost functions, change the heuristics, rebuild for different Java versions, and understand exactly what the tool is doing. That's rare in EDA.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Weaknesses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No copper pour support. The most significant limitation. Any board with a meaningful ground net requires manual post-processing or a different tool for GND connectivity. This eliminates Freerouting from the running for most production boards with fine-pitch ICs.&lt;/li&gt;
&lt;li&gt;Single-threaded core. The maze search is inherently sequential. Multi-threading exists in the codebase but only at the item level (different connections routed by different threads), not within the search itself. On modern multi-core machines, this leaves most of the CPU idle.&lt;/li&gt;
&lt;li&gt;Net ordering sensitivity. The same board produces meaningfully different results depending on input order, with no built-in intelligence about which order is likely to be best. The disabled sort-by-distance suggests the developers tried and found it counterproductive.&lt;/li&gt;
&lt;li&gt;GUI initialization in batch mode. Freerouting's Swing UI code initializes even when running headless with &lt;code&gt;-de&lt;/code&gt;/&lt;code&gt;-do&lt;/code&gt; flags. On servers without X11, this requires Xvfb or another virtual framebuffer, adding deployment complexity to what should be a pure command-line tool.&lt;/li&gt;
&lt;li&gt;Version regression. Freerouting v2.1.0 produced dramatically worse results than v1.9.0 on the same board (152 unrouted vs 6). The newer version isn't always better.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The board you see in the images above was v0.2: nine SN74LVC8T245PW shifters, 72 channels, fully routed by Freerouting on two layers. I was ready to submit it to &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt; for fabrication. Then I counted the GPIO pins one more time.&lt;/p&gt;
&lt;p&gt;The Arduino Giga R1 has 76 digital I/O pins that need level shifting, plus a handful of analog and control lines. Nine 8-channel shifters give you 72 channels. That's not enough. I was four signals short. The board needed a tenth IC, which meant reworking the layout, adding more decoupling caps, and re-routing everything. The v0.2 design that Freerouting had spent hours optimizing was going in the bin.&lt;/p&gt;
&lt;p&gt;With ten shifters instead of nine, the board got denser. The GND problem got worse. And the copper pour limitation that was already a hard floor at 5-6 unrouted connections on the 9-IC board became completely impassable on the 10-IC version. I threw 128 parallel Freerouting instances at it across three machines. I tried 2-layer, 4-layer, and 6-layer configurations. I wrote custom post-processing scripts for MST-based ground routing and copper pour stitching. None of it produced a clean board within DRC constraints.&lt;/p&gt;
&lt;p&gt;The solution came from an unexpected direction: &lt;a href="https://quilter.ai"&gt;Quilter.ai&lt;/a&gt;, an AI-powered PCB router that understands copper zones. It routed the 10-IC, 6-layer board with zero unrouted nets on the first attempt. The full story of that journey, from massively parallel Freerouting across a home lab cluster to the moment Quilter solved it in one shot, is coming in Part 2 of the &lt;a href="https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html"&gt;Giga Shield redesign series&lt;/a&gt;. If the mathematics of A* is the beauty of PCB routing, the GND problem is where theory meets the physical constraints of 0.65mm-pitch IC packages, and the theory blinks first.&lt;/p&gt;
&lt;p&gt;The source code for all of this, including the board generator, the net shuffler, the parallel routing scripts, and the post-processing tools, is available in the &lt;a href="https://github.com/ajokela/giga-shield"&gt;giga-shield repository&lt;/a&gt;.&lt;/p&gt;</description><category>a-star</category><category>algorithms</category><category>autorouting</category><category>eda</category><category>freerouting</category><category>hardware</category><category>mathematics</category><category>open-source</category><category>pcb design</category><guid>https://tinycomputers.io/posts/the-mathematics-of-pcb-trace-routing.html</guid><pubDate>Sun, 15 Mar 2026 16:00:00 GMT</pubDate></item><item><title>Processing 51,000 Photos with AI on AMD Strix Halo</title><link>https://tinycomputers.io/posts/processing-51000-photos-with-ai-on-amd-strix-halo.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/processing-51000-photos-with-ai-on-amd-strix-halo_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;I have roughly 20 years of photos sitting on a home fileserver. They span 2001 to 2020, shot on everything from a &lt;a href="https://baud.rs/StrgMz"&gt;Minolta&lt;/a&gt; DiMAGE F100 to a &lt;a href="https://baud.rs/qJQjcb"&gt;Nikon D5100&lt;/a&gt; to various iPhones over the years. A mix of 21,554 JPEGs and 29,860 Nikon RAW files (51,414 images total) organized in a &lt;a href="https://amzn.to/4lwULpW"&gt;Lightroom&lt;/a&gt; backup directory by year, month, and date. Most were shot handheld, many in a hurry. The kind of archive that accumulates when you take photos for two decades without ever going back to curate them.&lt;/p&gt;
&lt;p&gt;The Lightroom catalog that once made sense of all this was long gone, lost to a drive migration somewhere around 2018. What remained was a directory tree of raw files with no organization beyond the date folders. No star ratings, no keywords, no collections. Just files. Thousands of them, some sideways, some crooked, all unlabeled.&lt;/p&gt;
&lt;p&gt;I wanted to fix that. Not manually (I don't have a month to spend in Lightroom) but programmatically. The goals were straightforward: correct orientation issues, straighten crooked horizons, generate AI descriptions of every photo's content, and catalog the whole archive in a queryable database. The kind of batch processing job that would have been impractical five years ago but is now entirely doable with the right hardware and a weekend of scripting.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;Two machines on the local network, each with a distinct role:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Key Specs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fileserver&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NAS / photo storage&lt;/td&gt;
&lt;td&gt;28TB RAID (&lt;code&gt;/md0&lt;/code&gt;), 125GB RAM, NFS exports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU workstation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ML inference&lt;/td&gt;
&lt;td&gt;&lt;a href="https://baud.rs/6jjmD9"&gt;AMD Ryzen AI Max+ 395&lt;/a&gt;, Radeon 8060S, 121GB RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The fileserver is a straightforward storage box. The interesting machine is the GPU workstation running an AMD Strix Halo APU, specifically the AI Max+ 395 with its integrated Radeon 8060S. I've written about this chip &lt;a href="https://tinycomputers.io/posts/amd-ai-max+-395-system-review-a-comprehensive-analysis.html"&gt;before&lt;/a&gt;, and it continues to impress for inference workloads. The RDNA 3.5 integrated GPU shares system memory, giving it access to 65.2 GB of VRAM without the typical constraints of a discrete card. For a model like BLIP that needs maybe 2 GB, that's absurdly generous, but it means you never have to think about VRAM budgets, which is a luxury when you're iterating on a processing pipeline.&lt;/p&gt;
&lt;p&gt;The fileserver already had NFS configured, exporting &lt;code&gt;/md0&lt;/code&gt; to the local subnet. One mount command on the GPU workstation and both machines could see the same filesystem:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;mount&lt;span class="w"&gt; &lt;/span&gt;-t&lt;span class="w"&gt; &lt;/span&gt;nfs&lt;span class="w"&gt; &lt;/span&gt;fileserver.localnet:/md0&lt;span class="w"&gt; &lt;/span&gt;/md0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;No file copying, no rsync scripts, no staging directories. The photos live on the NAS and get processed in-place over the network. Gigabit Ethernet introduces some I/O overhead (each 25 MB NEF file takes 200–300ms to read across the wire), but for an overnight batch job, the simplicity of a single shared filesystem is worth the throughput trade-off. If this were a recurring workflow, I'd invest in 10GbE, but for a one-time archive processing run, gigabit got it done.&lt;/p&gt;
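&lt;p&gt;The per-file figure is easy to sanity-check from the link speed alone. The back-of-the-envelope calculation below assumes every RAW file is exactly 25 MB (decimal) and that the link runs at its ideal line rate with no protocol overhead, so it gives a lower bound rather than a measurement.&lt;/p&gt;

```python
LINK_BITS_PER_S = 1_000_000_000   # gigabit Ethernet line rate
NEF_BYTES = 25 * 1_000_000        # "25 MB" taken as decimal megabytes
NEF_COUNT = 29_860                # RAW file count quoted earlier

# Time on the wire for one file, then for the whole RAW archive.
seconds_per_file = NEF_BYTES * 8 / LINK_BITS_PER_S
total_hours = seconds_per_file * NEF_COUNT / 3600

print(f"{seconds_per_file * 1000:.0f} ms per file, {total_hours:.1f} h total")
```

&lt;p&gt;The ideal transfer time lands at the bottom of the observed 200–300ms range, and moving every RAW file once costs under two hours of pure wire time, which is why gigabit was tolerable for an overnight run.&lt;/p&gt;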
&lt;h3&gt;The Software Stack&lt;/h3&gt;
&lt;p&gt;Everything runs in a Python virtual environment on the GPU workstation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PyTorch 2.9.1+rocm6.3&lt;/strong&gt;: ML framework with AMD ROCm backend&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BLIP&lt;/strong&gt; (&lt;a href="https://huggingface.co/Salesforce/blip-image-captioning-large"&gt;&lt;code&gt;Salesforce/blip-image-captioning-large&lt;/code&gt;&lt;/a&gt;): vision-language model for image captioning&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenCV 4.13&lt;/strong&gt;: horizon detection via Canny edge detection and Hough transforms&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;rawpy 0.26.1&lt;/strong&gt;: Nikon NEF/NRW decoding (wraps LibRaw)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;piexif&lt;/strong&gt;: EXIF metadata extraction for JPEGs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;exiftool&lt;/strong&gt;: EXIF extraction for RAW files (called as a subprocess)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SQLite&lt;/strong&gt;: metadata and results database&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;The gfx1151 Situation&lt;/h4&gt;
&lt;p&gt;If you've followed my &lt;a href="https://tinycomputers.io/posts/getting-pytorch-working-with-amd-radeon-pro-w7900-max+-395-a-comprehensive-guide.html"&gt;previous posts on Strix Halo&lt;/a&gt;, you know the drill. The Radeon 8060S reports as &lt;code&gt;gfx1151&lt;/code&gt; in ROCm, which is newer than what PyTorch's ROCm wheels officially target. The fix is the same environment variable override:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;HSA_OVERRIDE_GFX_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;11&lt;/span&gt;.0.0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This maps the GPU to a generic gfx11 target. In practice, it works without issues, with no compute errors and no performance penalties. ROCm 6.16 on this machine also reports &lt;code&gt;amdgcn-amd-amdhsa--gfx11-generic&lt;/code&gt; as a supported ISA, which is likely why the override works cleanly. I've been running production workloads with this flag for months now without incident.&lt;/p&gt;
&lt;h3&gt;The Processing Pipeline&lt;/h3&gt;
&lt;p&gt;Each photo passes through five stages: EXIF extraction, orientation correction, horizon detection and straightening, AI captioning, and finally saving the corrected image and cataloging everything in SQLite.&lt;/p&gt;
&lt;h4&gt;EXIF Metadata Extraction&lt;/h4&gt;
&lt;p&gt;For JPEGs, &lt;code&gt;piexif&lt;/code&gt; reads the embedded EXIF data directly; it's a pure Python library that parses the binary EXIF structure without needing any external dependencies. For NEF/NRW files, piexif can't handle Nikon's proprietary container format, so I shell out to &lt;code&gt;exiftool&lt;/code&gt; with JSON output (&lt;code&gt;exiftool -json -n &amp;lt;file&amp;gt;&lt;/code&gt;). The &lt;code&gt;-n&lt;/code&gt; flag is important; it returns numeric values instead of human-readable strings, which makes downstream processing much cleaner.&lt;/p&gt;
&lt;p&gt;The extracted fields cover the full gamut: camera make and model, lens, dates, exposure settings (shutter speed, aperture, ISO, focal length), flash, white balance, metering mode, GPS coordinates, and the original orientation tag.&lt;/p&gt;
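&lt;p&gt;The RAW path boils down to one subprocess call and one JSON parse. The sketch below splits the two so the parsing can be exercised without &lt;code&gt;exiftool&lt;/code&gt; installed. The function names, the handful of fields I pick out, and the sample record are illustrative assumptions rather than the pipeline's actual code.&lt;/p&gt;

```python
import json
import subprocess


def run_exiftool(path: str) -> str:
    """Return exiftool's JSON output for one file (needs exiftool on PATH)."""
    done = subprocess.run(
        ["exiftool", "-json", "-n", path],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def parse_exiftool_json(text: str) -> dict:
    """Pick a few fields out of exiftool's JSON array-of-objects output."""
    record = json.loads(text)[0]          # one object per input file
    wanted = ("Make", "Model", "ISO", "FNumber", "ExposureTime", "Orientation")
    # Missing tags become None instead of raising KeyError.
    return {key: record.get(key) for key in wanted}


# Hand-written sample in the shape exiftool emits; values are illustrative.
sample = (
    '[{"SourceFile": "DSC_0001.NEF", "Model": "NIKON D5100", '
    '"ISO": 400, "Orientation": 6}]'
)
fields = parse_exiftool_json(sample)
print(fields["Model"], fields["Orientation"], fields["FNumber"])
```

&lt;p&gt;With &lt;code&gt;-n&lt;/code&gt;, the orientation arrives as the integer 6 rather than a descriptive string, so it can be used directly as a lookup key in the next stage.&lt;/p&gt;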
&lt;p&gt;EXIF data is notoriously inconsistent across two decades of cameras. I'll come back to this; it became a debugging story of its own.&lt;/p&gt;
&lt;h4&gt;Orientation Correction&lt;/h4&gt;
&lt;p&gt;The EXIF orientation tag (values 1 through 8) encodes how the camera was held when the photo was taken. A value of 1 means the image is right-side up. A value of 6 means the camera was rotated 90 degrees clockwise. Value 3 means 180 degrees. Some values encode horizontal or vertical flips. The full matrix looks like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_LEFT_RIGHT&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_180&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_TOP_BOTTOM&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_LEFT_RIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_270&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_270&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_LEFT_RIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_90&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_90&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
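&lt;p&gt;To see what the three pure-rotation values do to the pixels, here is a dependency-free sketch that treats an image as a list of rows. It is only an illustration: the function names are hypothetical, and the mirrored values (2, 4, 5, 7) are left out and returned unchanged. Note that Pillow's rotate constants turn counter-clockwise, which is why value 6 (rotate 90 degrees clockwise to display) pairs with &lt;code&gt;ROTATE_270&lt;/code&gt; in the mapping above.&lt;/p&gt;

```python
def rotate_180(grid):
    # Reverse the rows and reverse each row.
    return [row[::-1] for row in grid[::-1]]


def rotate_90_cw(grid):
    # Reverse the row order, then transpose.
    return [list(col) for col in zip(*grid[::-1])]


def rotate_90_ccw(grid):
    # Transpose, then reverse the row order.
    return [list(col) for col in zip(*grid)][::-1]


# Display fix for the three pure-rotation orientation values.
FIXES = {3: rotate_180, 6: rotate_90_cw, 8: rotate_90_ccw}


def upright(grid, orientation):
    """Return a corrected copy; unhandled values come back unchanged."""
    fix = FIXES.get(orientation)
    return fix(grid) if fix else [list(row) for row in grid]


stored = [[1, 2],
          [3, 4]]
print(upright(stored, 6))
```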

&lt;p&gt;Out of the 51,411 successfully processed photos, &lt;strong&gt;8,797 (17.1%) needed orientation correction&lt;/strong&gt;. The majority came from the Nikon D5100 and iPhone 4, both of which set the orientation tag but don't bake the rotation into the pixel data itself. Without this correction, roughly one in six photos would display sideways or upside-down in any viewer that doesn't respect EXIF orientation.&lt;/p&gt;
&lt;p&gt;Here's what that looks like in practice. The raw pixel data from this iPhone photo is stored sideways; the camera recorded an EXIF orientation tag of 6, meaning "rotate 90 degrees clockwise to display correctly." Any viewer that ignores that tag renders the image on its side:&lt;/p&gt;
&lt;div style="display: flex; gap: 10px; margin: 20px 0;"&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-dog-before.jpg" alt="Dog photo with incorrect orientation - displayed sideways" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;Before: raw pixel data (EXIF orientation 6, displayed sideways)&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-dog-after.jpg" alt="Dog photo after EXIF orientation correction - displayed upright" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;After: orientation corrected based on EXIF tag&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;h4&gt;Horizon Detection and Straightening&lt;/h4&gt;
&lt;p&gt;The first version of this stage used classical computer vision, with no neural network involved. The approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Downscale the image to 1200px on the long side for speed&lt;/li&gt;
&lt;li&gt;Convert to grayscale, apply Gaussian blur&lt;/li&gt;
&lt;li&gt;Run Canny edge detection&lt;/li&gt;
&lt;li&gt;Crop to the vertical middle 50%, since the horizon is rarely at the extreme top or bottom of a frame&lt;/li&gt;
&lt;li&gt;Apply the Hough Line Transform to find line segments, requiring a minimum length of one-quarter the image width&lt;/li&gt;
&lt;li&gt;Filter to near-horizontal lines (within 20 degrees of level)&lt;/li&gt;
&lt;li&gt;Compute a weighted average of the detected angles, weighted by line length&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The key is the threshold window. If the detected angle is less than 0.5 degrees, it's not worth correcting, since you'd introduce interpolation artifacts for no visible benefit. If it's greater than 15 degrees, it's probably not a tilted horizon at all; it's either intentional composition or the algorithm latching onto a staircase railing. The correction itself uses &lt;code&gt;cv2.warpAffine&lt;/code&gt; with Lanczos interpolation and a reflective border mode, followed by an inward crop to eliminate any border artifacts:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;crop_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
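&lt;p&gt;Steps 6 and 7 and the threshold window need nothing beyond the standard library once the line segments exist. The sketch below assumes segments arrive as &lt;code&gt;(x1, y1, x2, y2)&lt;/code&gt; endpoint tuples, the form OpenCV's probabilistic Hough transform returns. The function names are hypothetical, and angles are measured in image coordinates (y grows downward) without any claim about which sign means clockwise.&lt;/p&gt;

```python
import math


def segment_angle_deg(x1, y1, x2, y2):
    """Angle of a segment relative to horizontal, folded into (-90, 90]."""
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle > 90:
        angle -= 180
    elif -90 >= angle:
        angle += 180
    return angle


def weighted_tilt(segments, max_line_angle=20.0):
    """Length-weighted mean angle of the near-horizontal segments."""
    total_weight = 0.0
    weighted_sum = 0.0
    for x1, y1, x2, y2 in segments:
        angle = segment_angle_deg(x1, y1, x2, y2)
        if abs(angle) > max_line_angle:
            continue                      # not near-horizontal, ignore it
        length = math.hypot(x2 - x1, y2 - y1)
        weighted_sum += angle * length
        total_weight += length
    return weighted_sum / total_weight if total_weight else None


def should_correct(tilt, min_tilt=0.5, max_tilt=15.0):
    """Threshold window: skip tiny tilts and implausibly large ones."""
    return tilt is not None and max_tilt >= abs(tilt) >= min_tilt


segments = [(0, 0, 100, 5), (0, 50, 200, 60), (0, 0, 10, 100)]
tilt = weighted_tilt(segments)
print(round(tilt, 2), should_correct(tilt))
```

&lt;p&gt;The steep third segment is discarded by the 20-degree filter, so only the two gently sloped lines contribute. Weighting by length lets one long, clean edge outvote a cluster of short noisy ones.&lt;/p&gt;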

&lt;p&gt;The initial implementation used Canny edge detection and Hough line transforms, classical CV techniques from the 1980s. Fast, deterministic, 100ms per image. But it had a fatal flaw: it couldn't distinguish between a tilted horizon and a roofline receding toward a vanishing point. Architecture, roads, staircases, any strong line in the middle band of the image would register as a "tilted horizon," and the algorithm would dutifully rotate the image to "correct" it. In practice, this meant a significant number of photos were being made &lt;em&gt;worse&lt;/em&gt;, not better.&lt;/p&gt;
&lt;p&gt;The fix was to replace Hough line detection with semantic segmentation. SegFormer (&lt;code&gt;nvidia/segformer-b2-finetuned-ade-512-512&lt;/code&gt;), trained on the ADE20K dataset, segments each image into 150 classes, including sky. The approach is simple: find the sky pixels, trace the bottom edge of the sky region, fit a line to that boundary, and measure its angle. If there's no sky (less than 5% of the image), or the sky boundary is too fragmented (fewer than 20 points), skip the correction entirely.&lt;/p&gt;
&lt;p&gt;This eliminates false positives on indoor shots, close-ups, architecture, and anything without a visible sky. SegFormer runs on CPU at about 0.4 seconds per image; the model is only 25M parameters, so it doesn't need the GPU. The GPU stays dedicated to BLIP captioning.&lt;/p&gt;
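&lt;p&gt;Once the segmentation model has produced a sky mask, the measurement itself is plain geometry. The sketch below takes a boolean mask as a list of rows (how that mask is derived from SegFormer's output is outside its scope), takes the lowest sky pixel in each column as the boundary, and fits a least-squares line. The function name is hypothetical; the 5% and 20-point cut-offs come from the description above.&lt;/p&gt;

```python
import math


def sky_boundary_angle(mask, min_sky_fraction=0.05, min_points=20):
    """Angle in degrees of a line fitted to the bottom edge of the sky.

    mask is a list of rows of booleans (True = sky pixel). Returns None
    when there is too little sky or too few boundary points.
    """
    height, width = len(mask), len(mask[0])
    sky_pixels = sum(sum(1 for cell in row if cell) for row in mask)
    if min_sky_fraction * height * width > sky_pixels:
        return None

    # Lowest sky pixel in each column that contains any sky.
    points = []
    for x in range(width):
        ys = [y for y in range(height) if mask[y][x]]
        if ys:
            points.append((x, max(ys)))
    if min_points > len(points):
        return None

    # Ordinary least-squares slope of y against x.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan2(sxy, sxx))


# Synthetic 40x20 mask whose sky boundary drops one row every ten columns.
mask = [[5 + 0.1 * x >= y for x in range(40)] for y in range(20)]
print(round(sky_boundary_angle(mask), 1))
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for sky-free or fragmented masks is what lets indoor shots and close-ups pass through untouched instead of being rotated toward whatever strong line happens to be in frame.&lt;/p&gt;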
&lt;p&gt;Two examples from the corrected archive. This bridge over a river had a 2.68-degree clockwise tilt, and the bridge deck and far shore are visibly leveled:&lt;/p&gt;
&lt;div style="display: flex; gap: 10px; margin: 20px 0;"&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-river-before.jpg" alt="Bridge over river with tilted horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;Before: 2.68° clockwise tilt&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-river-after.jpg" alt="Bridge over river with corrected horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;After: horizon straightened via sky boundary detection&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This rocky Lake Superior shore had a 3.85-degree clockwise tilt, and the far horizon is leveled:&lt;/p&gt;
&lt;div style="display: flex; gap: 10px; margin: 20px 0;"&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-shore-before.jpg" alt="Rocky lakeshore with tilted horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;Before: 3.85° clockwise tilt&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-shore-after.jpg" alt="Rocky lakeshore with corrected horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;After: horizon straightened&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;h4&gt;AI Captioning with BLIP&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;Salesforce/blip-image-captioning-large&lt;/code&gt; model generates natural language descriptions of each photo. It runs in float16 on the Radeon 8060S. Each image is resized to a maximum of 1024px before inference. Beam search with 5 beams and a 75-token limit generates the caption:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;output_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_beams&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;early_stopping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
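&lt;p&gt;The resize before inference is simple arithmetic. A small sketch, assuming "maximum of 1024px" refers to the longer side and that smaller images are left alone; the helper name &lt;code&gt;fit_within&lt;/code&gt; is hypothetical:&lt;/p&gt;

```python
def fit_within(width, height, max_side=1024):
    """Scale (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if max_side >= longest:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

&lt;p&gt;A 4928×3264 frame from the D5100 would come out at 1024 pixels wide with the aspect ratio preserved.&lt;/p&gt;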

&lt;p&gt;Caption inference takes about 0.5–0.7 seconds per image, consistent regardless of whether the input was a JPEG or a decoded NEF. The model handles a wide variety of subjects surprisingly well. Some examples from the archive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"a brown and white dog standing next to a blue chair"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"two silos sitting in the middle of a field"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a bird sitting on a branch of a tree"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a wooden sign that says hoban road in front of some trees"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a blurry photo of a car driving down a snowy road"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a dog being groomed by a woman in a salon"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The captions tend toward a "there is a..." pattern, and they occasionally get details wrong (BLIP once described a photo of my living room as "a hotel lobby," which is generous). But for searchability and cataloging purposes, they're remarkably useful. Being able to query &lt;code&gt;WHERE caption LIKE '%dog%'&lt;/code&gt; across 51,000 photos and get meaningful results is something that would have required manual tagging before models like BLIP existed. For an archive this size, "good enough" captions on every photo are vastly more useful than perfect captions on none of them.&lt;/p&gt;
&lt;h4&gt;Save and Catalog&lt;/h4&gt;
&lt;p&gt;Corrected images are saved as high-quality JPEGs (quality 92) to &lt;code&gt;/md0/photos_processed/images/&lt;/code&gt;, mirroring the original directory structure. NEF and NRW files are converted to JPEG in the process; the corrected archive is a uniform format. All metadata flows into a SQLite database with WAL journaling, tracking 40+ fields per photo: every piece of EXIF data, processing flags (was orientation corrected? was the horizon straightened? by how many degrees?), the AI caption, file hashes, dimensions, and processing timestamps.&lt;/p&gt;
&lt;p&gt;The database makes the archive queryable in ways that were never possible before:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;-- What cameras did I use, and when?&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_taken&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_taken&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;photos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Photos with GPS data&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gps_latitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gps_longitude&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;photos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gps_latitude&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- How crooked were my photos, by camera?&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ABS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;horizon_angle_degrees&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;avg_tilt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;photos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;horizon_corrected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;avg_tilt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
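&lt;p&gt;The same queries are easy to drive from Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The sketch below is illustrative only: the schema is a hypothetical, trimmed-down stand-in for the real 40+ column table, and the function names are assumptions. It shows WAL mode being enabled on open, the tilt query wrapped in a function, and a caption search with the search term passed as a bound parameter.&lt;/p&gt;

```python
import sqlite3

# Hypothetical, trimmed-down schema; the real table tracks 40+ fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS photos (
    id INTEGER PRIMARY KEY,
    source_path TEXT UNIQUE,
    camera_model TEXT,
    caption TEXT,
    horizon_corrected INTEGER DEFAULT 0,
    horizon_angle_degrees REAL,
    error TEXT
)
"""


def open_catalog(db_path):
    """Open (or create) the catalog with write-ahead logging enabled."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    return conn


def average_tilt_by_camera(conn):
    """Python wrapper around the 'how crooked' query above."""
    return conn.execute(
        "SELECT camera_model, ROUND(AVG(ABS(horizon_angle_degrees)), 2) AS avg_tilt, COUNT(*) "
        "FROM photos WHERE horizon_corrected = 1 "
        "GROUP BY camera_model ORDER BY avg_tilt DESC"
    ).fetchall()


def search_captions(conn, word):
    """Substring search over captions, with the word bound as a parameter."""
    return conn.execute(
        "SELECT source_path FROM photos WHERE caption LIKE ? ORDER BY source_path",
        (f"%{word}%",),
    ).fetchall()
```

&lt;p&gt;WAL mode only takes effect on a file-backed database, so &lt;code&gt;open_catalog&lt;/code&gt; should be given a real path rather than &lt;code&gt;:memory:&lt;/code&gt;.&lt;/p&gt;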

&lt;h3&gt;The EXIF Tuple Bug&lt;/h3&gt;
&lt;p&gt;The first processing pass worked through all 51,414 photos, but 2,146 of them ended in errors. All but three were &lt;code&gt;TypeError: type tuple doesn't define __round__ method&lt;/code&gt;. For a pipeline that had been running cleanly on thousands of Nikon D5100 and D60 photos, this was unexpected.&lt;/p&gt;
&lt;p&gt;The root cause turned out to be a two-part problem with how certain budget cameras from the 2008–2012 era write EXIF rational numbers.&lt;/p&gt;
&lt;h4&gt;Part 1: Malformed Tuples&lt;/h4&gt;
&lt;p&gt;The EXIF standard stores rational numbers as &lt;code&gt;(numerator, denominator)&lt;/code&gt; pairs. Most cameras follow this. But some, particularly a batch of older point-and-shoots, wrote the &lt;code&gt;ExposureBiasValue&lt;/code&gt; field as a 4-element tuple like &lt;code&gt;(36, 0, 18, 0)&lt;/code&gt; instead of the expected 2-element &lt;code&gt;(36, 0)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;My &lt;code&gt;_rational_to_float&lt;/code&gt; helper only handled 2-tuples:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_rational_to_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;  &lt;span class="c1"&gt;# passes through 4-tuples as raw tuples&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When a 4-tuple fell through, the downstream &lt;code&gt;round()&lt;/code&gt; call choked on it. The fix was simple: return &lt;code&gt;None&lt;/code&gt; for any tuple that isn't a standard rational pair.&lt;/p&gt;
&lt;h4&gt;Part 2: None Propagation&lt;/h4&gt;
&lt;p&gt;Even after fixing Part 1, many of these same cameras had written &lt;code&gt;(36, 0)&lt;/code&gt;, a rational with a zero denominator. The function correctly returned &lt;code&gt;None&lt;/code&gt; for division by zero, but the calling code then did &lt;code&gt;round(None, 2)&lt;/code&gt;, triggering the same &lt;code&gt;TypeError&lt;/code&gt; with a slightly different message.&lt;/p&gt;
&lt;p&gt;The fix was a &lt;code&gt;_safe_round&lt;/code&gt; wrapper:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_safe_round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After both fixes, the second pass recovered all 2,143 photos. The remaining 3 errors were genuine file corruption: a truncated JPEG, a NEF that LibRaw couldn't parse, and a NEF with filesystem-level I/O errors. Probably bad sectors on the source drive. Those can't be fixed in code.&lt;/p&gt;
&lt;p&gt;This is one of those bugs that only surfaces at scale. Run the pipeline on a hundred Nikon photos and everything works perfectly. Run it on 51,000 photos spanning 15 different camera models over 20 years, and every edge case in the EXIF spec comes out to play. The lesson, which I should have internalized long ago: never trust external data formats at scale without defensive parsing on every field. The EXIF spec is a suggestion, not a contract, and camera manufacturers have been interpreting it creatively since the early 2000s.&lt;/p&gt;
&lt;h3&gt;Resumability&lt;/h3&gt;
&lt;p&gt;A 15-hour batch job will inevitably need to be restarted: bugs, system updates, a random hound disconnecting the MagSafe power cord from my MacBook Pro. The script tracks progress in SQLite and skips completed files on restart:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;is_already_processed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;"SELECT id FROM photos WHERE source_path = ? AND error IS NULL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_path&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Photos that failed with errors are intentionally &lt;em&gt;not&lt;/em&gt; skipped, so fixing a bug and re-running automatically retries them. This made the EXIF debugging cycle painless: fix the parser, clear the failed rows from the database, relaunch, and only the 2,143 affected photos get reprocessed.&lt;/p&gt;
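&lt;p&gt;The retry behavior can be demonstrated end to end with an in-memory database. This is a sketch under assumptions: &lt;code&gt;make_db&lt;/code&gt;, &lt;code&gt;record_result&lt;/code&gt;, and &lt;code&gt;run_batch&lt;/code&gt; are hypothetical names, the table is cut down to three columns, and failed rows are overwritten with &lt;code&gt;INSERT OR REPLACE&lt;/code&gt; rather than cleared by hand as described above.&lt;/p&gt;

```python
import sqlite3


def make_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos ("
        "id INTEGER PRIMARY KEY, source_path TEXT UNIQUE, error TEXT)"
    )
    return conn


def is_already_processed(conn, source_path):
    row = conn.execute(
        "SELECT id FROM photos WHERE source_path = ? AND error IS NULL",
        (source_path,),
    ).fetchone()
    return row is not None


def record_result(conn, source_path, error=None):
    """Hypothetical helper: keep one row per photo, keyed on source_path."""
    conn.execute(
        "INSERT OR REPLACE INTO photos (source_path, error) VALUES (?, ?)",
        (source_path, error),
    )
    conn.commit()


def run_batch(conn, paths, process):
    """Process every path without a clean row; return the paths attempted."""
    attempted = []
    for path in paths:
        if is_already_processed(conn, path):
            continue
        attempted.append(path)
        try:
            process(path)
            record_result(conn, path)
        except Exception as exc:  # keep the batch alive and store the failure
            record_result(conn, path, error=repr(exc))
    return attempted
```

&lt;p&gt;Running &lt;code&gt;run_batch&lt;/code&gt; twice over the same list only revisits the paths whose previous attempt stored an error, which is the property that makes a fix-and-relaunch cycle cheap.&lt;/p&gt;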
&lt;h3&gt;Performance&lt;/h3&gt;
&lt;p&gt;The pipeline sustained &lt;strong&gt;1.0–1.8 photos per second&lt;/strong&gt;, depending on file format:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Time per Photo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JPEG load&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NEF decode (rawpy)&lt;/td&gt;
&lt;td&gt;~400ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MD5 hash&lt;/td&gt;
&lt;td&gt;~5ms (JPEG), ~100ms (NEF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Horizon detection&lt;/td&gt;
&lt;td&gt;~100ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BLIP inference&lt;/td&gt;
&lt;td&gt;~500–700ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JPEG save&lt;/td&gt;
&lt;td&gt;~50ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;BLIP inference dominates the runtime. NEF decoding is the second bottleneck; each RAW file is 20–30 MB and requires full demosaicing through LibRaw. The NFS overhead for reading large NEFs over gigabit Ethernet is noticeable but not the primary constraint.&lt;/p&gt;
&lt;p&gt;Total wall time: &lt;strong&gt;15.5 hours&lt;/strong&gt; across two passes for 51,414 photos. The BLIP model uses roughly 2 GB of the 65.2 GB available VRAM on the Strix Halo. Memory was never a concern.&lt;/p&gt;
&lt;h3&gt;Final Results&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total photos&lt;/td&gt;
&lt;td&gt;51,414&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Successfully processed&lt;/td&gt;
&lt;td&gt;51,411&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orientation corrected&lt;/td&gt;
&lt;td&gt;8,797&lt;/td&gt;
&lt;td&gt;17.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Horizon straightened&lt;/td&gt;
&lt;td&gt;15,251&lt;/td&gt;
&lt;td&gt;29.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI captioned&lt;/td&gt;
&lt;td&gt;51,411&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unrecoverable errors&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0.006%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The top cameras in the archive tell the story of 20 years of gear:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Camera&lt;/th&gt;
&lt;th&gt;Photos&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/qJQjcb"&gt;Nikon D5100&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;24,073&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/mwoMko"&gt;Nikon D60&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8,734&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iPhone 4&lt;/td&gt;
&lt;td&gt;2,664&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/ACrtrD"&gt;Nikon D3100&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1,698&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/jxhHU5"&gt;Panasonic DMC-FX07&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;975&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/StrgMz"&gt;Minolta DiMAGE F100&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;870&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iPad&lt;/td&gt;
&lt;td&gt;803&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iPhone 5s&lt;/td&gt;
&lt;td&gt;698&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/10it3U"&gt;Samsung SCH-I500&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;645&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The output lives on the NAS:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Corrected images&lt;/strong&gt;: &lt;code&gt;/md0/photos_processed/images/&lt;/code&gt;, 51,411 JPEGs preserving the original year/month/date folder structure, all NEFs converted, all orientation and horizon corrections applied.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SQLite database&lt;/strong&gt;: &lt;code&gt;/md0/photos_processed/photos.db&lt;/code&gt;, 40+ fields per photo with full EXIF metadata, processing results, and AI-generated captions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Processing log&lt;/strong&gt;: &lt;code&gt;/md0/photos_processed/processing.log&lt;/code&gt;, timestamped record of the entire run.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Takeaways&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;AMD's Strix Halo continues to earn its keep for ML inference.&lt;/strong&gt; The &lt;code&gt;HSA_OVERRIDE_GFX_VERSION=11.0.0&lt;/code&gt; workaround remains necessary, but once set, PyTorch and ROCm run without complaints. The 65 GB shared VRAM pool means you can load models without thinking about memory budgets, a workflow advantage that's easy to underestimate until you've experienced it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Classical computer vision still has its place, within limits.&lt;/strong&gt; The first horizon detector used Canny edge detection and Hough transforms, techniques from the 1980s: no training data, no GPU, deterministic results, and about 100ms per image. What it couldn't do was tell a horizon from a roofline, so the final pipeline uses SegFormer to find the sky and only then fits a line to the sky boundary to measure the tilt. The geometry is still classical; deciding which line matters is where the neural network earns its place.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EXIF is a minefield.&lt;/strong&gt; Twenty years of cameras from different manufacturers means every edge case in the spec gets exercised. Tuple lengths vary, denominators are zero, fields are missing or repurposed. If you're parsing EXIF at scale, assume nothing about the data's shape and validate everything.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Resumability is non-negotiable for long-running jobs.&lt;/strong&gt; Tracking progress in the database and skipping completed work made it trivial to iterate on bugs. Without this, every fix would mean reprocessing 51,000 photos from scratch.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NFS over gigabit is fine for batch processing.&lt;/strong&gt; Not optimal, but for an overnight job, the network overhead from NAS-attached storage is acceptable. The real bottleneck was ML inference at 0.6 seconds per photo. If I were doing this regularly, 10GbE would be worth the upgrade, but for a one-time archive processing run, gigabit got the job done.&lt;/p&gt;
&lt;p&gt;The whole project, from first SSH to final database entry, took about a day of wall time, most of which was unattended processing. The scripting itself was maybe three hours of work. Twenty years of photos, cataloged and corrected overnight. Not bad for a Strix Halo and some Python. The full source is available on &lt;a href="https://github.com/ajokela/photo-processor"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What I didn't expect was how useful the database would be after the fact. Being able to ask "show me every photo I took with the D5100 at ISO 3200 or higher" or "find photos with GPS data from 2015" turns a pile of files into something that actually tells a story. The AI captions add another dimension; I can now search my own photo archive by content, not just metadata. It's the kind of capability that makes you wonder why photo management software hasn't done this for years. The models have been available. The hardware has been affordable. Someone just needed to wire it together.&lt;/p&gt;</description><category>ai max+ 395</category><category>amd</category><category>blip</category><category>computer vision</category><category>exif</category><category>image captioning</category><category>machine learning</category><category>nef</category><category>nikon</category><category>opencv</category><category>photography</category><category>pytorch</category><category>rocm</category><category>sqlite</category><category>strix halo</category><guid>https://tinycomputers.io/posts/processing-51000-photos-with-ai-on-amd-strix-halo.html</guid><pubDate>Sat, 14 Mar 2026 17:00:00 GMT</pubDate></item><item><title>Redesigning a PCB with Claude Code and Open-Source EDA Tools (Part 1)</title><link>https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>

&lt;div class="sponsor-widget"&gt;
&lt;div class="sponsor-widget-header"&gt;&lt;a href="https://baud.rs/youwpy"&gt;&lt;img src="https://tinycomputers.io/images/pcbway-logo.png" alt="PCBWay" style="height: 22px; vertical-align: middle; margin-right: 8px;"&gt;&lt;/a&gt; Sponsored Hardware&lt;/div&gt;
&lt;p&gt;This project was made possible by &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;, who sponsored the fabrication of the redesigned GigaShield v0.2 level converter board. PCBWay offers PCB prototyping, assembly, CNC machining, and 3D printing services, from one-off prototypes to production runs. If you have a PCB design ready to go, check them out at &lt;a href="https://baud.rs/youwpy"&gt;pcbway.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img id="pcb-top-img" src="https://tinycomputers.io/images/giga-shield/giga-shield-v02-top.png" alt="GigaShield v0.2 PCB top view: routed two-layer board with 9 SN74LVC8T245PW level shifters, generated with Python and autorouted with Freerouting" style="float: right; max-width: 420px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); cursor: zoom-in;"&gt;&lt;/p&gt;
&lt;div id="img-modal" class="modal" onclick="this.style.display='none'"&gt;
&lt;span class="close" onclick="document.getElementById('img-modal').style.display='none'"&gt;×&lt;/span&gt;
&lt;img class="modal-content" id="modal-img"&gt;
&lt;div id="caption"&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;script&gt;
(function() {
    var img = document.getElementById('pcb-top-img');
    var modal = document.getElementById('img-modal');
    var modalImg = document.getElementById('modal-img');
    var caption = document.getElementById('caption');
    img.onclick = function() {
        modal.style.display = 'block';
        modalImg.src = this.src;
        caption.textContent = this.alt;
    };
    document.addEventListener('keydown', function(e) {
        if (e.key === 'Escape' &amp;&amp; modal.style.display === 'block') {
            modal.style.display = 'none';
        }
    });
})();
&lt;/script&gt;

&lt;p&gt;In January, I &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;spent $468 on Fiverr&lt;/a&gt; to have a professional design an &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1&lt;/a&gt; shield with level shifters. It was a good design. Nine &lt;a href="https://baud.rs/y9JJt9"&gt;TXB0108PW&lt;/a&gt; bidirectional level translators, 72 channels of 3.3V-to-5V shifting, a clean two-layer board ready for fabrication. And then I started testing it with the &lt;a href="https://baud.rs/87wbBL"&gt;RetroShield Z80&lt;/a&gt;, and the auto-sensing level shifters fell apart.&lt;/p&gt;
&lt;p&gt;The TXB0108 is a clever chip. It detects signal direction automatically, so you don't need to tell it whether a pin is input or output. For most applications, that's a feature. For a Z80 bus interface, it's a fatal flaw. During bus cycles, the Z80 tri-states its address and data lines. The outputs go high-impedance. They're not driving high or low, they're floating. The TXB0108 can't determine drive direction from a floating signal. It guesses wrong, or it doesn't drive at all, and the Arduino on the other side sees garbage. The board was blind to half of what the Z80 was doing.&lt;/p&gt;
&lt;p&gt;The fix was clear: replace the TXB0108s with &lt;a href="https://baud.rs/zQqo34"&gt;SN74LVC8T245PW&lt;/a&gt; direction-controlled level shifters. The SN74LVC8T245 has an explicit DIR pin: you tell it which direction to translate, and it does exactly that, regardless of whether the signals are being actively driven. No guessing, no ambiguity, deterministic behavior during tri-state periods. The trade-off is that you need a direction control signal for each shifter IC, but that's a small price for reliability.&lt;/p&gt;
&lt;p&gt;What wasn't clear was how to execute the redesign. I could go back to Fiverr for another $400-500. I could spend weeks learning KiCad properly. Or I could try something that had worked surprisingly well on a &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;previous project&lt;/a&gt;: use AI and open-source command-line EDA tools to design the board from a terminal, without ever opening a graphical PCB editor.&lt;/p&gt;
&lt;p&gt;This is part one of a two-part series. This piece covers the design and toolchain: how I used &lt;a href="https://baud.rs/Z6Oq4k"&gt;Claude Code&lt;/a&gt;, the gEDA ecosystem, pcb-rnd, and &lt;a href="https://baud.rs/bdZw62"&gt;Freerouting&lt;/a&gt; to go from a failed design to production-ready Gerber files. Part two will cover the physical boards, assembly, and testing against the Z80.&lt;/p&gt;
&lt;h3&gt;The Toolchain Problem&lt;/h3&gt;
&lt;p&gt;The original Fiverr design was done in KiCad 9.0. My first instinct was to modify it directly: swap the TXB0108 footprints for SN74LVC8T245, update the pin mappings, add the DIR control header, and re-route. But there was a problem. My preferred command-line PCB tool, &lt;a href="https://baud.rs/1J64T5"&gt;pcb-rnd&lt;/a&gt;, is version 3.1.4 on Ubuntu. KiCad 9.0 uses a file format version (20241229) that pcb-rnd's &lt;code&gt;io_kicad&lt;/code&gt; plugin doesn't support. When I tried to open the KiCad PCB:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;unexpected layout version number (perhaps too new)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Hard stop. No conversion path exists from KiCad 9.0 to pcb-rnd. The formats aren't just different versions. KiCad's S-expression format and pcb-rnd's text-based format are fundamentally different syntaxes.&lt;/p&gt;
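&lt;p&gt;The version number in question sits in the first few tokens of the &lt;code&gt;.kicad_pcb&lt;/code&gt; file, so it can be checked before handing a board to any importer. A small sketch; the function names are hypothetical, and &lt;code&gt;MAX_SUPPORTED_VERSION&lt;/code&gt; is a placeholder rather than the actual limit of any pcb-rnd build:&lt;/p&gt;

```python
import re

# Placeholder ceiling for illustration; the real limit depends on the io_kicad build.
MAX_SUPPORTED_VERSION = 20221018


def kicad_layout_version(text):
    """Return the integer from the leading (version N) token, or None."""
    match = re.search(r"\(kicad_pcb\s+\(version\s+(\d+)\)", text)
    return int(match.group(1)) if match else None


def is_too_new(text, max_supported=MAX_SUPPORTED_VERSION):
    """True when the file declares a version above what the importer accepts."""
    version = kicad_layout_version(text)
    return version is not None and version > max_supported
```

&lt;p&gt;Reading only the first kilobyte of the file is enough for this check, which makes it cheap to run across a directory of boards.&lt;/p&gt;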
&lt;p&gt;I could have started KiCad and used its GUI. But I'd already proven to myself with the &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;dual Z80 RetroShield project&lt;/a&gt; that text-based, AI-assisted PCB workflows are not only possible but sometimes preferable. The gEDA/pcb-rnd file format is human-readable. AI can parse it, reason about it, and generate it. A Python script can manipulate it. You can &lt;code&gt;diff&lt;/code&gt; two boards and see exactly what changed. None of that is true for a graphical-only workflow.&lt;/p&gt;
&lt;p&gt;So the plan became: extract everything useful from the KiCad source files, then rebuild the board from scratch in pcb-rnd's native format using Python. Sound insane? It kind of is. But it worked.&lt;/p&gt;
&lt;h3&gt;Extracting the DNA&lt;/h3&gt;
&lt;p&gt;Even though pcb-rnd couldn't read the KiCad files directly, the KiCad files contained all the design intelligence I needed. Component positions, net assignments, pin mappings, board dimensions. It was all there, just in a format I couldn't import.&lt;/p&gt;
&lt;p&gt;KiCad's CLI tools (&lt;code&gt;kicad-cli&lt;/code&gt;) could export what I needed:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Component positions (X, Y, rotation for each part)&lt;/span&gt;
kicad-cli&lt;span class="w"&gt; &lt;/span&gt;pcb&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pos&lt;span class="w"&gt; &lt;/span&gt;AlexJ_bz_ArduinoGigaShield.kicad_pcb&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;giga_pos.csv

&lt;span class="c1"&gt;# Netlist connectivity (IPC-D-356)&lt;/span&gt;
kicad-cli&lt;span class="w"&gt; &lt;/span&gt;pcb&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;ipcd356&lt;span class="w"&gt; &lt;/span&gt;AlexJ_bz_ArduinoGigaShield.kicad_pcb&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;giga_netlist.d356
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The schematic file (&lt;code&gt;AlexJ_bz_ArduinoGigaShield.kicad_sch&lt;/code&gt;) was an S-expression text file I could parse to extract the signal mappings: which Giga pin connects to which 5V header pin through which level shifter channel. This was the most critical piece: getting the net assignments wrong would mean the board physically connects but logically doesn't work.&lt;/p&gt;
&lt;p&gt;This is where Claude Code earned its keep. I described the KiCad schematic structure and asked it to help me parse out the signal mappings. The KiCad schematic uses hierarchical sheets with positional net connections, which isn't the simplest format to work with manually but is straightforward for an AI that can read S-expressions and track net names across sheets. Within an hour, I had a complete mapping of all 72 signal channels across the 9 shifter ICs.&lt;/p&gt;
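&lt;p&gt;To give a feel for the parsing involved, here is a minimal sketch that relies only on the fact that a &lt;code&gt;.kicad_sch&lt;/code&gt; file is S-expression text. The helper names (&lt;code&gt;tokenize&lt;/code&gt;, &lt;code&gt;parse&lt;/code&gt;, &lt;code&gt;find_all&lt;/code&gt;) and the sample snippet are hypothetical; this is not the code Claude Code produced, and a real schematic needs more work to follow nets across sheets.&lt;/p&gt;

```python
import re

# Quoted strings, single parens, or runs of non-space/non-paren characters.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[()]|[^\s()"]+')

def tokenize(text):
    """Split S-expression text into parens, quoted strings, and bare atoms."""
    return TOKEN_RE.findall(text)

def parse(tokens):
    """Turn a flat token list into nested Python lists."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        elif tok.startswith('"'):
            stack[-1].append(tok[1:-1])  # strip the quotes (escapes left as-is)
        else:
            stack[-1].append(tok)
    return stack[0]

def find_all(tree, head):
    """Yield every sub-list whose first item equals head, at any depth."""
    for node in tree:
        if isinstance(node, list):
            if node and node[0] == head:
                yield node
            yield from find_all(node, head)

# Made-up sample, not taken from the real schematic.
sample = '(kicad_sch (label "D0" (at 10 20 0)) (label "A15" (at 10 25 0)))'
tree = parse(tokenize(sample))
labels = [node[1] for node in find_all(tree, "label")]
print(labels)
```

&lt;p&gt;Once the file is a nested list, pulling out labels, pins, or sheet references becomes ordinary list traversal, which is the kind of work an AI assistant handles well.&lt;/p&gt;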
&lt;h3&gt;Generating the Board with Python&lt;/h3&gt;
&lt;p&gt;With positions and nets extracted, I wrote &lt;code&gt;build_giga_shield.py&lt;/code&gt;, a single Python script that generates the entire pcb-rnd board from scratch. No GUI involved. Every component footprint, every pin, every net connection is defined programmatically.&lt;/p&gt;
&lt;p&gt;The script is structured around four generator functions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;tssop24_element()&lt;/code&gt;&lt;/strong&gt; generates the SN74LVC8T245PW footprint. TSSOP-24 is a precise geometry: 0.65mm pin pitch, 6.4mm pad-to-pad span, 24 pins. The function calculates pad positions mathematically: 12 pins on the left, 12 on the right, with pin 1 marked as square per convention. Getting the pin numbering right was critical. The SN74LVC8T245's datasheet shows pins 1-12 on the left (DIR, A1-A4, GND, A5-A8, OE#, GND) and pins 13-24 on the right counting bottom-to-top (B8-B5, VCCB, B4-B1, VCCA, VCCA, VCCB).&lt;/p&gt;
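&lt;p&gt;The pad arithmetic can be sketched as follows. This is an illustration, not the body of &lt;code&gt;tssop24_element()&lt;/code&gt;: the function name &lt;code&gt;tssop24_pad_centers&lt;/code&gt; is hypothetical, and it assumes the 6.4mm figure is the centre-to-centre distance between the two pad columns, with the package centre at the origin.&lt;/p&gt;

```python
PITCH_MM = 0.65       # pin pitch, from the text
SPAN_MM = 6.4         # assumed: centre-to-centre distance between pad columns
PINS_PER_SIDE = 12

def tssop24_pad_centers():
    """Return {pin_number: (x_mm, y_mm)} with the package centre at (0, 0).

    Pins 1-12 run top-to-bottom down the left column; pins 13-24 run
    bottom-to-top up the right column (counter-clockwise numbering).
    """
    pads = {}
    top_y = -PITCH_MM * (PINS_PER_SIDE - 1) / 2   # y grows downward
    for i in range(PINS_PER_SIDE):
        y = top_y + i * PITCH_MM
        pads[i + 1] = (-SPAN_MM / 2, y)                  # left column: 1..12
        pads[2 * PINS_PER_SIDE - i] = (SPAN_MM / 2, y)   # right column: 24..13
    return pads

pads = tssop24_pad_centers()
print(pads[1], pads[12], pads[13], pads[24])
```

&lt;p&gt;Pin 24 ends up directly across from pin 1 and pin 13 directly across from pin 12, which is the property that is easy to get backwards when typing coordinates by hand.&lt;/p&gt;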
&lt;p&gt;&lt;strong&gt;&lt;code&gt;pin_header_element()&lt;/code&gt;&lt;/strong&gt; handles through-hole pin headers with rotation support. The Arduino Giga R1 has an unusual form factor: the long pin headers run along the board edges horizontally, not vertically. In the original KiCad design, these were placed with 90-degree or -90-degree rotation. Without matching that rotation, a 26-pin header at y=84mm would extend 63.5mm downward to y=148mm, well past the 90mm board edge. The rotation transform was simple once identified:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rot&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;rot&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
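&lt;p&gt;The 63.5mm figure comes from 26 pins having 25 gaps at 2.54mm each. A small sketch makes the effect of the rotation concrete; it uses the same transform as above but passes &lt;code&gt;rot&lt;/code&gt; explicitly, and the names &lt;code&gt;rotate_offset&lt;/code&gt; and &lt;code&gt;header_pin_offsets&lt;/code&gt; are hypothetical rather than taken from the build script.&lt;/p&gt;

```python
PITCH_MM = 2.54  # standard 0.1 inch header pitch

def rotate_offset(px, py, rot):
    """Same transform as above, with rot passed in rather than captured."""
    if rot == 90:
        return (py, -px)
    if rot == -90:
        return (-py, px)
    return (px, py)

def header_pin_offsets(n_pins, rot=0):
    """Offsets of each pin from pin 1 for a 1xN header laid out along +y."""
    return [rotate_offset(0.0, i * PITCH_MM, rot) for i in range(n_pins)]

unrotated = header_pin_offsets(26)
rotated = header_pin_offsets(26, rot=90)

# 26 pins means 25 gaps: 25 * 2.54 mm = 63.5 mm from first pin to last.
length_mm = unrotated[-1][1]
print(round(length_mm, 3), rotated[-1])
```

&lt;p&gt;Unrotated, the last pin sits 63.5mm further along y; rotated by 90 degrees, the same distance runs along x and the header stays inside the board outline.&lt;/p&gt;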

&lt;p&gt;&lt;strong&gt;&lt;code&gt;smd_0603_element()&lt;/code&gt;&lt;/strong&gt; creates the 0603 footprint shared by all 27 decoupling capacitors and 9 pull-down resistors. Small SMD parts, simple geometry.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;mounting_hole_element()&lt;/code&gt;&lt;/strong&gt; places the four 3.2mm mounting holes that align with the Arduino Giga's standoff positions.&lt;/p&gt;
&lt;p&gt;The coordinate system was the trickiest part. KiCad uses an arbitrary origin; in this design, x=106mm, y=30.5mm. pcb-rnd uses (0,0). Every KiCad coordinate had to be translated:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;KX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;106.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;30.5&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;kpos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ky&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kx&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;KX&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;mm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ky&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;KY&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
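&lt;p&gt;As a worked check of the translation, the sketch below fills in the &lt;code&gt;mm()&lt;/code&gt; helper, which the excerpt does not show. Treating it as a millimetre-to-nanometre conversion is an assumption made for illustration; the real script may use a different unit.&lt;/p&gt;

```python
KX, KY = 106.0, 30.5   # KiCad board origin in mm, from the text

def mm(value_mm):
    """Assumed helper: millimetres to integer nanometres."""
    return int(round(value_mm * 1_000_000))

def kpos(kx, ky):
    """Translate a KiCad coordinate into board-relative units."""
    return (mm(kx - KX), mm(ky - KY))

origin = kpos(106.0, 30.5)                  # KiCad origin maps to (0, 0)
corner = kpos(106.0 + 155.0, 30.5 + 90.0)   # far corner of a 155 x 90 mm board
print(origin, corner)
```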

&lt;p&gt;The &lt;code&gt;build_pcb()&lt;/code&gt; function ties everything together: place components, assign nets, build the symbol table, generate the layer stack, and write out a valid pcb-rnd &lt;code&gt;.pcb&lt;/code&gt; file. Running the script produces a complete, unrouted board: components placed, netlist defined, silkscreen text positioned, board outline drawn. Ready for routing.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;python3&lt;span class="w"&gt; &lt;/span&gt;build_giga_shield.py
Generated&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb
Board:&lt;span class="w"&gt; &lt;/span&gt;155mm&lt;span class="w"&gt; &lt;/span&gt;x&lt;span class="w"&gt; &lt;/span&gt;90mm
9x&lt;span class="w"&gt; &lt;/span&gt;SN74LVC8T245PW&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;TSSOP-24&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;level&lt;span class="w"&gt; &lt;/span&gt;shifters
DIR&lt;span class="w"&gt; &lt;/span&gt;control&lt;span class="w"&gt; &lt;/span&gt;via&lt;span class="w"&gt; &lt;/span&gt;J11&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;1x10&lt;span class="w"&gt; &lt;/span&gt;header&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;The Format Wars&lt;/h3&gt;
&lt;p&gt;Getting pcb-rnd to actually accept the generated file was its own adventure. pcb-rnd's parser is strict about things that look optional in the documentation, and its error messages are sometimes misleading. An error in an Element definition might be reported as a syntax error in the Layer section fifty lines later.&lt;/p&gt;
&lt;p&gt;Three format issues bit me hardest:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;"smd"&lt;/code&gt; flag.&lt;/strong&gt; I initially generated elements with &lt;code&gt;Element["smd" "TSSOP24" "U1" ...]&lt;/code&gt;, which seemed logical for surface-mount parts. pcb-rnd rejected it with "Unknown flag: smd ignored," which cascaded into a complete parse failure. The fix: use an empty string &lt;code&gt;Element["" "TSSOP24" "U1" ...]&lt;/code&gt;. The SMD-ness is implicit from using &lt;code&gt;Pad[]&lt;/code&gt; entries instead of &lt;code&gt;Pin[]&lt;/code&gt; entries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Bare zeros.&lt;/strong&gt; pcb-rnd is inconsistent about whether &lt;code&gt;0&lt;/code&gt; and &lt;code&gt;0nm&lt;/code&gt; are interchangeable. In some contexts, bare &lt;code&gt;0&lt;/code&gt; works fine. In others, it causes a silent parse error that manifests as a syntax error dozens of lines later. The defensive fix: always use &lt;code&gt;0nm&lt;/code&gt;, never bare &lt;code&gt;0&lt;/code&gt;, everywhere.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Missing flags on Layer lines.&lt;/strong&gt; The &lt;code&gt;Line[]&lt;/code&gt; entry inside Layer blocks needs 7 fields, not 6. The seventh is a flags string like &lt;code&gt;"clearline"&lt;/code&gt;. My generator omitted it, producing &lt;code&gt;Line[x1 y1 x2 y2 thickness clearance]&lt;/code&gt;. The parser's error message: &lt;code&gt;syntax error, unexpected ']', expecting INTEGER or STRING&lt;/code&gt;, reported at the layer definition, not at the malformed line.&lt;/p&gt;
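&lt;p&gt;Two of these fixes are easy to enforce in the generator itself. The following is a sketch under stated assumptions: coordinates are held as integer nanometres, and the helper names &lt;code&gt;nm&lt;/code&gt; and &lt;code&gt;line_entry&lt;/code&gt; are hypothetical, not the ones in &lt;code&gt;build_giga_shield.py&lt;/code&gt;. The field order follows the &lt;code&gt;Line[]&lt;/code&gt; layout quoted above, with the flags string appended as the seventh field.&lt;/p&gt;

```python
def nm(value):
    """Format a coordinate with an explicit unit suffix, so zero is '0nm'."""
    return f"{int(round(value))}nm"

def line_entry(x1, y1, x2, y2, thickness, clearance, flags="clearline"):
    """Build a Layer Line[] entry with all seven fields, flags included."""
    fields = [nm(v) for v in (x1, y1, x2, y2, thickness, clearance)]
    fields.append(f'"{flags}"')
    return "Line[" + " ".join(fields) + "]"

entry = line_entry(0, 0, 2540000, 0, 254000, 508000)
print(entry)
```

&lt;p&gt;Routing every number through one formatter means a bare &lt;code&gt;0&lt;/code&gt; can never reach the output, and making the flags a defaulted parameter means the seventh field cannot be forgotten.&lt;/p&gt;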
&lt;p&gt;I found these bugs using a binary search approach, truncating the file with &lt;code&gt;head -N&lt;/code&gt; and testing each truncation point until I isolated which section introduced the failure. It's crude but effective when error reporting is unhelpful. Claude Code helped enormously here. I'd paste the error and the surrounding file content, and it would spot the structural issue faster than I could.&lt;/p&gt;
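&lt;p&gt;The manual &lt;code&gt;head -N&lt;/code&gt; search can be written down as an ordinary bisection. In this sketch, &lt;code&gt;parses_ok&lt;/code&gt; is a stand-in callable: in practice it would write the prefix to a temporary file and run pcb-rnd on it, which is not shown here. It also assumes that once a prefix fails, every longer prefix fails too.&lt;/p&gt;

```python
def first_bad_line(lines, parses_ok):
    """Return the 1-based index of the first line whose inclusion breaks parsing.

    parses_ok(prefix) must return True when the list of lines `prefix` is
    accepted. Assumes the full input fails and the empty prefix passes.
    """
    good, bad = 0, len(lines)   # lines[:good] passes, lines[:bad] fails
    while bad - good != 1:
        mid = (good + bad) // 2
        if parses_ok(lines[:mid]):
            good = mid
        else:
            bad = mid
    return bad

# Toy stand-in for the real parser: any prefix containing "BROKEN" fails.
demo = ["ok"] * 40 + ["BROKEN"] + ["ok"] * 20
culprit = first_bad_line(demo, lambda prefix: "BROKEN" not in prefix)
print(culprit)
```

&lt;p&gt;For a file of a few thousand lines this needs only about a dozen parser runs, which matters when each run means a round trip to a remote machine.&lt;/p&gt;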
&lt;h3&gt;The pcb-rnd Ecosystem&lt;/h3&gt;
&lt;p&gt;For anyone unfamiliar with the tools involved, a brief orientation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;gEDA&lt;/strong&gt; (GPL'd Electronic Design Automation) is a suite of open-source tools for electronic design. The original project dates to the late 1990s and includes &lt;code&gt;gschem&lt;/code&gt; (schematic capture), &lt;code&gt;pcb&lt;/code&gt; (PCB layout), and various utilities. The file formats are text-based and human-readable, a deliberate design choice that makes them scriptable and version-control-friendly. The original &lt;code&gt;pcb&lt;/code&gt; program is now deprecated.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pcb-rnd&lt;/strong&gt; is the actively maintained successor to gEDA's &lt;code&gt;pcb&lt;/code&gt; program. It reads and writes the same text-based PCB format, but adds modern features: more export formats, better plugin support, and critically for this project, command-line export of Gerber files, PNG renderings, and Specctra DSN files. It runs on Linux (packaged for Ubuntu) but isn't packaged for macOS, which is why I ran it over SSH on a remote machine throughout this project.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Freerouting&lt;/strong&gt; is a Java-based autorouter that speaks the Specctra DSN/SES interchange format. You feed it a board definition with components and nets but no traces, and it computes the copper routing, finding paths for every net while respecting design rules for trace width, clearance, and via placement. It's the open-source standard for PCB autorouting and has been used in production for decades.&lt;/p&gt;
&lt;p&gt;The workflow chains these tools together:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;build_giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pcb&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pcb&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rnd&lt;/span&gt; &lt;span class="n"&gt;DSN&lt;/span&gt; &lt;span class="n"&gt;export&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                     &lt;span class="n"&gt;giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Freerouting&lt;/span&gt; &lt;span class="n"&gt;autorouter&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                     &lt;span class="n"&gt;giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ses&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
              &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pcb&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rnd&lt;/span&gt; &lt;span class="n"&gt;SES&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Gerber&lt;/span&gt; &lt;span class="n"&gt;export&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                    &lt;span class="n"&gt;Production&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every step is a command-line operation. Every intermediate file is text. Every transformation is reproducible. Change a component position in the Python script, re-run the pipeline, get new Gerber files. This is the power of text-based EDA: the entire design is version-controlled, diffable, and automatable.&lt;/p&gt;
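&lt;p&gt;One way to make that reproducibility concrete is a small driver that lists the pipeline as data. This is a sketch, not part of the published repository: &lt;code&gt;pipeline_steps&lt;/code&gt; is a hypothetical name, and the commands are the ones quoted in this post, collected in one place. It only prints the steps; nothing is executed.&lt;/p&gt;

```python
import shlex

def pipeline_steps(base="giga_shield", jar="freerouting-1.9.0-executable.jar"):
    """Return the pipeline as (description, argv, stdin_text) tuples."""
    import_script = (
        f"ImportSes({base}.ses)\n"
        f"SaveTo(LayoutAs, {base}_routed.pcb)\n"
    )
    return [
        ("generate board", ["python3", "build_giga_shield.py"], None),
        ("export DSN", ["pcb-rnd", "-x", "dsn", f"{base}.pcb"], None),
        ("autoroute",
         ["java", "-jar", jar, "-de", f"{base}.dsn", "-do", f"{base}.ses"], None),
        ("import SES",
         ["pcb-rnd", "--gui", "hid_batch", f"{base}.pcb"], import_script),
        ("export Gerbers",
         ["pcb-rnd", "-x", "gerber", "--gerberfile", base, f"{base}_routed.pcb"], None),
    ]

steps = pipeline_steps()
for description, argv, _ in steps:
    print(f"{description}: {shlex.join(argv)}")
```

&lt;p&gt;To actually run a step, each tuple could be handed to &lt;code&gt;subprocess.run(argv, input=stdin_text, text=True, check=True)&lt;/code&gt;, so that a failure stops the pipeline instead of producing stale Gerbers.&lt;/p&gt;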
&lt;h3&gt;Autorouting: The Machine Does the Tedious Part&lt;/h3&gt;
&lt;p&gt;With the board generated and validated in pcb-rnd, the next step was routing: connecting all 308 nets with actual copper traces across a two-layer board. This is where Freerouting comes in.&lt;/p&gt;
&lt;p&gt;The pipeline starts with exporting the unrouted board to Specctra DSN format. pcb-rnd handles this in batch mode on the remote Linux machine:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;dsn&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The DSN file contains the board geometry, component placements, pad definitions, and netlist, everything the autorouter needs to compute a routing solution. One subtlety I learned the hard way: the DSN's &lt;code&gt;(structure)&lt;/code&gt; section needs explicit &lt;code&gt;(rule)&lt;/code&gt; and &lt;code&gt;(via)&lt;/code&gt; definitions. pcb-rnd's DSN exporter puts the design rules inside the net class section, but Freerouting also expects them in the structure section. Without them, the router can see the nets but can't figure out what trace widths and via sizes are legal, and it silently fails to route most connections. Adding a via definition and a rule block to the structure section fixed this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;(via pstk_1)
(rule
  (width 0.254)
  (clearance 0.254)
)
&lt;/pre&gt;&lt;/div&gt;
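&lt;p&gt;Because the DSN is regenerated on every export, the patch is worth scripting. The sketch below is one possible way to do it, not necessarily how it was done for this board: &lt;code&gt;patch_structure&lt;/code&gt; is a hypothetical helper, and it assumes the exported file opens the structure section on a line of its own that starts with &lt;code&gt;(structure&lt;/code&gt;.&lt;/p&gt;

```python
RULES = [
    "    (via pstk_1)",
    "    (rule",
    "      (width 0.254)",
    "      (clearance 0.254)",
    "    )",
]

def patch_structure(dsn_text):
    """Insert the via and rule definitions right after the '(structure' line."""
    out = []
    patched = False
    for line in dsn_text.splitlines():
        out.append(line)
        if not patched and line.strip().startswith("(structure"):
            out.extend(RULES)
            patched = True
    if not patched:
        raise ValueError("no (structure section found")
    return "\n".join(out) + "\n"

# Tiny made-up DSN fragment, just enough to exercise the function.
demo = "(pcb board\n  (structure\n    (layer top)\n  )\n)\n"
patched_text = patch_structure(demo)
print(patched_text)
```

&lt;p&gt;The function is not idempotent, so it should be applied to a fresh export rather than to a file that has already been patched.&lt;/p&gt;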

&lt;p&gt;Freerouting itself is a Java application with both GUI and command-line modes. On my machine, I'm running a custom build from source. The current &lt;code&gt;main&lt;/code&gt; branch had a few issues I had to fix (a missing &lt;code&gt;static&lt;/code&gt; on the main method, a null pointer on &lt;code&gt;maxThreads&lt;/code&gt; in the GUI initialization, and a Gradle build compatibility issue). The v1.9 codepath was more reliable for headless routing:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;java&lt;span class="w"&gt; &lt;/span&gt;-jar&lt;span class="w"&gt; &lt;/span&gt;freerouting-1.9.0-executable.jar&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-de&lt;span class="w"&gt; &lt;/span&gt;giga_shield.dsn&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-do&lt;span class="w"&gt; &lt;/span&gt;giga_shield.ses
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The autorouter loaded the 308-net board, ran through its passes, and produced a Specctra Session file containing 2911 wire segments and 172 vias. Every net connected. Every design rule satisfied. The routing took about 10 seconds for the initial routing pass, followed by optimization passes.&lt;/p&gt;
&lt;video controls autoplay loop muted playsinline style="max-width: 100%; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1em 0;"&gt;
  &lt;source src="https://tinycomputers.io/images/giga-shield/routing-traces.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;Importing the routes back into pcb-rnd was the final step. pcb-rnd can import SES files through its batch mode:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;--gui&lt;span class="w"&gt; &lt;/span&gt;hid_batch&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span class="s"&gt;ImportSes(giga_shield.ses)&lt;/span&gt;
&lt;span class="s"&gt;SaveTo(LayoutAs, giga_shield_routed.pcb)&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The result: a fully routed PCB with 2911 traces and 172 vias, ready for Gerber export.&lt;/p&gt;
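&lt;p&gt;Because the session file is text, counts like these can be checked with a few lines of code. This is a rough sketch: it assumes each routed segment appears as a &lt;code&gt;(wire&lt;/code&gt; form and each via as a &lt;code&gt;(via&lt;/code&gt; form, and the sample text is made up rather than copied from the real &lt;code&gt;giga_shield.ses&lt;/code&gt;.&lt;/p&gt;

```python
import re

def count_routes(ses_text):
    """Count '(wire' and '(via' forms in Specctra session text."""
    wires = len(re.findall(r"\(wire\b", ses_text))
    vias = len(re.findall(r"\(via\b", ses_text))
    return wires, vias

# Made-up fragment in the general shape of a session file.
demo = """
(routes
  (network_out
    (net D0
      (wire (path top 2540 0 0 1000 0))
      (via pstk_1 1000 0)
      (wire (path bottom 2540 1000 0 1000 500))
    )
  )
)
"""
wires, vias = count_routes(demo)
print(wires, vias)
```

&lt;p&gt;Run against the real session file, a check like this gives a quick regression test: if a re-route suddenly produces far fewer wires, something upstream probably broke.&lt;/p&gt;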
&lt;h3&gt;Running pcb-rnd Over SSH&lt;/h3&gt;
&lt;p&gt;One of the more unusual aspects of this project is that all pcb-rnd operations happened on a remote Ubuntu 24.04 machine accessed over SSH. pcb-rnd isn't available on macOS via Homebrew (I tried; there's a deprecated &lt;code&gt;pcb&lt;/code&gt; package but no &lt;code&gt;pcb-rnd&lt;/code&gt;), and building from source on macOS looked like a rabbit hole I didn't want to enter.&lt;/p&gt;
&lt;p&gt;The remote workflow was straightforward:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Upload the PCB&lt;/span&gt;
scp&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27:/tmp/

&lt;span class="c1"&gt;# Export DSN for routing&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pcb-rnd -x dsn /tmp/giga_shield.pcb"&lt;/span&gt;

&lt;span class="c1"&gt;# Import SES and save the routed board&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'pcb-rnd --gui hid_batch /tmp/giga_shield.pcb &amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span class="s1"&gt;ImportSes(/tmp/giga_shield.ses)&lt;/span&gt;
&lt;span class="s1"&gt;SaveTo(LayoutAs, /tmp/giga_shield_routed.pcb)&lt;/span&gt;
&lt;span class="s1"&gt;EOF'&lt;/span&gt;

&lt;span class="c1"&gt;# Export production files&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pcb-rnd -x gerber --gerberfile /tmp/giga_shield /tmp/giga_shield_routed.pcb"&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pcb-rnd -x png --dpi 600 --photo-mode --outfile /tmp/top.png /tmp/giga_shield_routed.pcb"&lt;/span&gt;

&lt;span class="c1"&gt;# Download results&lt;/span&gt;
scp&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27:/tmp/giga_shield.*.gbr&lt;span class="w"&gt; &lt;/span&gt;.
scp&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27:/tmp/top.png&lt;span class="w"&gt; &lt;/span&gt;giga_shield_top.png
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It's more keystrokes than clicking Export in a GUI. But it's scriptable, repeatable, and fits into the same terminal where Claude Code is running. When I needed to iterate (move a component, re-route, re-export) I could do it in a single pipeline without switching contexts.&lt;/p&gt;
&lt;h3&gt;Claude Code as a Hardware Design Partner&lt;/h3&gt;
&lt;p&gt;I should be explicit about what Claude Code did and didn't do in this project, because the AI angle is the part people will either find most interesting or most suspicious.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What Claude Code did:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parsed the KiCad schematic to extract the 72-channel signal mapping across 9 level shifter ICs&lt;/li&gt;
&lt;li&gt;Wrote the initial &lt;code&gt;build_giga_shield.py&lt;/code&gt; generator script, including all four footprint generators and the net assignment logic&lt;/li&gt;
&lt;li&gt;Debugged pcb-rnd format issues by analyzing error messages and file structure&lt;/li&gt;
&lt;li&gt;Managed the remote SSH workflow: uploading files, running pcb-rnd commands, downloading results&lt;/li&gt;
&lt;li&gt;Fixed bugs in the Freerouting build (the &lt;code&gt;static main&lt;/code&gt; issue, the null &lt;code&gt;maxThreads&lt;/code&gt;, the Gradle &lt;code&gt;fileMode&lt;/code&gt; API change)&lt;/li&gt;
&lt;li&gt;Handled iterative changes: "move tinycomputers.io down by a millimeter" became an edit to the Python script, a regeneration, a re-import, and a re-export, all executed as a single flow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;What Claude Code didn't do:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make architectural decisions. The choice to use SN74LVC8T245 over TXB0108, the DIR control header design, and the decision to use pull-down resistors defaulting to A-to-B direction were mine, based on my understanding of the Z80 bus protocol. Selecting the TXB0108 in the first place is also on me&lt;/li&gt;
&lt;li&gt;Verify electrical correctness. I checked the SN74LVC8T245 datasheet pin mapping myself. I verified that OE# tied to GND means always-enabled. I confirmed the 10K pull-down value was appropriate for the DIR pin&lt;/li&gt;
&lt;li&gt;Replace domain knowledge. I knew why the TXB0108 failed during tri-state periods because I understand Z80 bus cycles. Claude Code could have looked up the TXB0108 datasheet, but it couldn't have diagnosed the real-world failure mode from first principles&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The pattern that emerged was: I made design decisions, Claude Code implemented them. I said "the DIR pins need pull-down resistors to default A-to-B direction," Claude Code generated the pcb-rnd Element entries with the correct footprint, position, and net assignments. I said "export PNG renderings at 600 DPI with photo mode," Claude Code ran the right &lt;code&gt;pcb-rnd&lt;/code&gt; command on the remote machine.&lt;/p&gt;
&lt;p&gt;This is the same division of labor I described in the &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;dual Z80 post&lt;/a&gt;: I bring the domain knowledge, the AI handles the format translation. The text-based nature of gEDA files makes this work. If the design lived in a binary format or required mouse interactions, the AI would have been far less useful.&lt;/p&gt;
&lt;h3&gt;The New Design&lt;/h3&gt;
&lt;p&gt;Here's what the redesigned board looks like:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;v0.1 (Fiverr/TXB0108)&lt;/th&gt;
&lt;th&gt;v0.2 (Claude Code/SN74LVC8T245)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Level Shifter IC&lt;/td&gt;
&lt;td&gt;TXB0108PW (TSSOP-20)&lt;/td&gt;
&lt;td&gt;SN74LVC8T245PW (TSSOP-24)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direction Control&lt;/td&gt;
&lt;td&gt;Auto-sensing&lt;/td&gt;
&lt;td&gt;Explicit DIR pin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shifter ICs&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decoupling Caps&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pull-down Resistors&lt;/td&gt;
&lt;td&gt;9 (OE)&lt;/td&gt;
&lt;td&gt;9 (DIR)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DIR Control Header&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;J11 (1x10)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Board Dimensions&lt;/td&gt;
&lt;td&gt;155mm x 90mm&lt;/td&gt;
&lt;td&gt;155mm x 90mm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layers&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Tool&lt;/td&gt;
&lt;td&gt;KiCad 9.0 (GUI)&lt;/td&gt;
&lt;td&gt;Python + pcb-rnd (CLI)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Cost&lt;/td&gt;
&lt;td&gt;$468.63&lt;/td&gt;
&lt;td&gt;$0 (open source tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Time&lt;/td&gt;
&lt;td&gt;~10 days (outsourced)&lt;/td&gt;
&lt;td&gt;~2 days (with AI)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The J11 header is the key addition. It's a 1x10 pin header with 9 direction control pins (one per shifter IC) and a ground reference. Each DIR pin has a 10K pull-down resistor that defaults the direction to A-to-B (3.3V to 5V). To reverse a shifter's direction (for example, when the Arduino needs to read from the Z80's data bus) you drive the corresponding J11 pin high. The Arduino firmware manages this dynamically during bus cycles.&lt;/p&gt;
&lt;p&gt;The board carries "tinycomputers.io" and "v0.2" on the silkscreen, placed near the bottom edge. Version tracking on the physical board, a lesson learned from the Fiverr experience, where I had to pay $57 for a revision just to add version text to the silkscreen.&lt;/p&gt;
&lt;h3&gt;Generating Production Files&lt;/h3&gt;
&lt;p&gt;With the routed board in hand, the final step was generating files suitable for manufacturing. pcb-rnd handles this with command-line exporters:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Gerber files (9 layers: top/bottom copper, mask, silk, paste, outline, drill, fab)&lt;/span&gt;
pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;gerber&lt;span class="w"&gt; &lt;/span&gt;--gerberfile&lt;span class="w"&gt; &lt;/span&gt;giga_shield&lt;span class="w"&gt; &lt;/span&gt;giga_shield_routed.pcb

&lt;span class="c1"&gt;# Photo-realistic renderings&lt;/span&gt;
pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;png&lt;span class="w"&gt; &lt;/span&gt;--dpi&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;600&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--photo-mode&lt;span class="w"&gt; &lt;/span&gt;--outfile&lt;span class="w"&gt; &lt;/span&gt;top.png&lt;span class="w"&gt; &lt;/span&gt;giga_shield_routed.pcb
pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;png&lt;span class="w"&gt; &lt;/span&gt;--dpi&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;600&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--photo-mode&lt;span class="w"&gt; &lt;/span&gt;--photo-flip-x&lt;span class="w"&gt; &lt;/span&gt;--outfile&lt;span class="w"&gt; &lt;/span&gt;bottom.png&lt;span class="w"&gt; &lt;/span&gt;giga_shield_routed.pcb
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Gerber output includes everything a fab house needs: top and bottom copper, solder mask, silkscreen, paste stencil, board outline, and drill locations. The photo-realistic PNG renderings use pcb-rnd's built-in renderer: green solder mask, gold-plated pads, white silkscreen text. They're useful for documentation and for sanity-checking the layout before sending it to fabrication.&lt;/p&gt;
&lt;p&gt;The BOM and centroid files were generated separately from the Python script's component data. The centroid file lists every SMD component's X/Y position and rotation, which is essential if you're having the boards assembled by a service rather than hand-soldering.&lt;/p&gt;
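&lt;p&gt;Generating a centroid file from the script's component data might look like the sketch below. The record layout, the column names, and the sample positions are all hypothetical; assembly services differ in the columns they expect, so this shows the shape of the approach rather than the exact file produced for this board. It also assumes every SMD part sits on the top side.&lt;/p&gt;

```python
import csv
import io

def centroid_csv(components):
    """Render SMD placements as CSV text: one row per component."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Designator", "Footprint", "Mid X (mm)", "Mid Y (mm)",
                     "Rotation", "Layer"])
    for c in components:
        if not c["smd"]:
            continue  # through-hole headers are left out of the placement file
        writer.writerow([c["ref"], c["footprint"],
                         f'{c["x"]:.3f}', f'{c["y"]:.3f}', c["rot"], "top"])
    return buf.getvalue()

# Illustrative entries only; the positions are made up.
parts = [
    {"ref": "U1", "footprint": "TSSOP24", "x": 20.0, "y": 15.5, "rot": 0, "smd": True},
    {"ref": "C1", "footprint": "0603", "x": 24.2, "y": 15.5, "rot": 90, "smd": True},
    {"ref": "J11", "footprint": "HDR1x10", "x": 70.0, "y": 80.0, "rot": 90, "smd": False},
]
text = centroid_csv(parts)
print(text)
```

&lt;p&gt;Since the positions come from the same data that generates the board, the centroid file cannot drift out of sync with the layout.&lt;/p&gt;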
&lt;h3&gt;What's Different About This Approach&lt;/h3&gt;
&lt;p&gt;The standard way to design a PCB in 2026 is: open KiCad or Altium, draw a schematic, assign footprints, lay out the board, route traces (manually or with the built-in autorouter), and export Gerbers. It's a visual, interactive process that works well for most people and most projects.&lt;/p&gt;
&lt;p&gt;What I did is different in a few ways that I think are worth noting, even if they're not universally applicable:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The entire design is a Python script.&lt;/strong&gt; &lt;code&gt;build_giga_shield.py&lt;/code&gt; is the single source of truth. Want to move a component? Change a coordinate in the script. Want to add a net? Add it to the dictionary. Want to change every decoupling cap from 0.1uF to 0.22uF? Change a string. Then re-run the pipeline. There's no "did I save the layout?" ambiguity, no undo history to worry about, no risk of accidentally moving something with a stray mouse click.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Every intermediate file is text.&lt;/strong&gt; The &lt;code&gt;.pcb&lt;/code&gt; file, the &lt;code&gt;.dsn&lt;/code&gt; file, the &lt;code&gt;.ses&lt;/code&gt; file. All text, all diffable, all version-controllable. When I moved a component and re-routed, I could &lt;code&gt;git diff&lt;/code&gt; the PCB file and see exactly what changed. Try that with a binary PCB format.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI can participate meaningfully.&lt;/strong&gt; Because the files are text, Claude Code could read them, modify them, and verify them. It could grep for a component reference in the PCB file, find its coordinates, suggest a new position, and make the edit. It could read the Freerouting log and diagnose why routing failed. This level of AI participation simply isn't possible with graphical-only workflows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The workflow is reproducible.&lt;/strong&gt; I can hand someone the Python script and the Freerouting JAR and they can regenerate the entire board from scratch, on any machine with Python and Java. No KiCad version compatibility issues, no plugin dependencies, no "works on my machine" problems.&lt;/p&gt;
&lt;p&gt;The trade-off is obvious: this approach requires understanding file formats at a level that graphical tools abstract away. If pcb-rnd's parser rejects your file with a misleading error message, you need to debug the file format, not just re-click a button. It's a power-user workflow. But for someone comfortable with text editors and command lines (which describes most of the audience reading a blog called tinycomputers.io), it's a viable alternative.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The Gerber files are ready for fabrication. In part two, I'll cover ordering the boards from &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;, sourcing the SN74LVC8T245PW and passive components, and the moment of truth: plugging the RetroShield Z80 into the new shield and seeing if the Arduino can finally see the Z80's bus cycles clearly.&lt;/p&gt;
&lt;p&gt;I'll also compare the v0.2 board side-by-side with the original Fiverr v0.1 board: the TXB0108 auto-sensing design versus the SN74LVC8T245 driven design. Same board dimensions, same connector layout, fundamentally different level-shifting approach. The comparison should be instructive for anyone choosing between auto-sensing and driven level translators for bus interfaces.&lt;/p&gt;
&lt;p&gt;The Python build script, pcb-rnd source files, Gerber outputs, and all helper scripts are open source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/pOawfA"&gt;giga-shield&lt;/a&gt;&lt;/strong&gt;: Complete design files, build pipeline, and production outputs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;This is part one of a two-part series. Part two will cover fabrication, assembly, and testing.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Previous posts in this series: &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Fiverr PCB Design ($468)&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;Dual Z80 RetroShield&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;CP/M on the Giga R1&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html"&gt;Zork on the Giga&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description><category>ai</category><category>arduino</category><category>arduino giga</category><category>claude code</category><category>freerouting</category><category>geda</category><category>hardware</category><category>level shifter</category><category>open-source</category><category>pcb design</category><category>pcb-rnd</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html</guid><pubDate>Fri, 13 Mar 2026 16:00:00 GMT</pubDate></item><item><title>The Real Cost of Running Qwen TTS Locally: Three Machines Compared</title><link>https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-real-cost-of-running-qwen-tts-locally-three-machines-compared_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/qwen-tts-benchmark/p40-server-shop.jpg" alt="The Tesla P40 server standing on its side in an unheated Minnesota shop building, one of three machines benchmarked for local TTS generation" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Every post on this site has an audio version. A small player at the top, a few minutes of narration, generated entirely on local hardware. No cloud API, no per-character fees, no data leaving the network. I wrote about &lt;a href="https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html"&gt;setting up the pipeline on AMD Strix Halo&lt;/a&gt; earlier this year, and the system has been running in production since, generating narrations for new posts, regenerating old ones when I revise them, and occasionally processing long-form content that would cost real money through Google Cloud TTS or ElevenLabs.&lt;/p&gt;
&lt;p&gt;But I now have three machines capable of running Qwen3-TTS, and they could not be more different from each other. An Apple M3 Max laptop. An AMD Ryzen AI MAX+ 395 mini desktop with integrated Radeon graphics. And a &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;four-GPU Tesla P40 server&lt;/a&gt; built from decade-old enterprise hardware bought on eBay. Three different silicon vendors, three different compute backends (MPS, ROCm, and CUDA) running the same model on the same text.&lt;/p&gt;
&lt;p&gt;The question I wanted to answer is simple: how do they actually compare? Not on paper. Not in theoretical FLOPS. In wall-clock time, generating real audio from a real blog post.&lt;/p&gt;
&lt;p&gt;The answer turned out to be more interesting than I expected, because the numbers tell a story about hardware architecture that raw specifications completely miss.&lt;/p&gt;
&lt;h3&gt;The Setup&lt;/h3&gt;
&lt;p&gt;The model is &lt;a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"&gt;Qwen3-TTS-12Hz-1.7B-CustomVoice&lt;/a&gt;, a 1.7 billion parameter autoregressive text-to-speech model from Alibaba's Qwen team. It generates natural-sounding speech with multiple speaker voices. I use the Eric voice for all blog narrations: clear, professional, well-paced for technical content.&lt;/p&gt;
&lt;p&gt;The three machines:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Apple M3 Max&lt;/strong&gt;, a &lt;a href="https://amzn.to/4rwlTa6"&gt;MacBook Pro&lt;/a&gt; with Apple's M3 Max chip. 14 CPU cores, 30 GPU cores, 64GB unified memory. The GPU runs through PyTorch's MPS (Metal Performance Shaders) backend. This is my daily driver laptop, and it generates TTS when I am writing and editing posts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AMD Radeon 8060S&lt;/strong&gt;, a Bosgame M5 mini desktop running &lt;a href="https://amzn.to/4bv5CMG"&gt;AMD's Ryzen AI MAX+ 395&lt;/a&gt;. This is a Strix Halo APU with integrated RDNA 3.5 graphics, not a discrete GPU. It shares 128GB of DDR5 system memory with the CPU, with roughly 96GB addressable as VRAM. The GPU runs through ROCm 7.2 with PyTorch 2.9.1. The gfx1151 architecture requires specific PyTorch wheels from AMD's pre-release index and several environment variable overrides to function. I wrote a &lt;a href="https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html"&gt;full setup guide&lt;/a&gt; for this machine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA Tesla P40&lt;/strong&gt;, a 2U rack-mount server with four &lt;a href="https://www.ebay.com/itm/306087510352?_skw=nvidia+tesla+p40+24gb+gpu&amp;amp;epid=27032254618&amp;amp;itmmeta=01KKJEGQKSK110HNM6214EB0TT&amp;amp;hash=item47443cc150:g:qAwAAOSwy0toUHXh&amp;amp;itmprp=enc%3AAQALAAABAGfYFPkwiKCW4ZNSs2u11xAq6UjArKrgnuEyMVTZhAZhOSUGYags6TsDJvvCEOa51UH2r%2BRe%2F182ah6rgiTIAIRULQNEL9rbiinCXMor%2FBNNZk0GaNKqTWkq9pLWGoRBM8NL%2BjC1aSA63XPe4YsFHjQkb%2Fmup21S3UM7oqwBrW%2BHep1E07lnrt2vzkljSA4xg7SnrA%2BFDtOdqvDwO4tpgB0t%2BtCv9%2BlXoh%2BeoEgpJqXgaaM0ad48OfmgKB13PF9RIPXLNI6z4SjV2O%2FXOk6nYPyD9Eg5wbzdmsXfNRhwitz7HEZ1bTRUnRmvKzQrw4B3r3LAag5f8%2B8CcCWfCRAkkG8%3D%7Ctkp%3ABk9SR4j6ws6cZw&amp;amp;mkcid=1&amp;amp;mkrid=711-53200-19255-0&amp;amp;siteid=0&amp;amp;campid=5338960379&amp;amp;customid=&amp;amp;toolid=10001&amp;amp;mkevt=1"&gt;Tesla P40 GPUs&lt;/a&gt;, each with 24GB of GDDR5X. Pascal architecture from 2016. Compute capability 6.1. No Tensor Cores, no native bfloat16 support. The benchmark uses a single P40, since Qwen TTS runs on one GPU. This machine lives in an unheated shop building in Minnesota and &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;screams through the winter&lt;/a&gt; when the BMC misinterprets sub-zero ambient temperatures as a hardware malfunction.&lt;/p&gt;
&lt;p&gt;All three machines run the same model checkpoint, the same text input, and the same speaker voice. The only differences are the silicon and the compute backend.&lt;/p&gt;
&lt;h3&gt;The Benchmark&lt;/h3&gt;
&lt;p&gt;I used a standardized 2,411-character passage, five paragraphs on the Jevons Paradox, dense enough to exercise the model's prosody and pacing on real written content. Each machine ran three consecutive generations from the same loaded model, producing roughly three minutes of audio per run. The first run includes kernel compilation and cache warmup; subsequent runs reflect steady-state performance.&lt;/p&gt;
&lt;p&gt;The metric that matters is Real-Time Factor (RTF): how many seconds of wall-clock time it takes to generate one second of audio. An RTF of 1.0 means the model generates audio at exactly real-time speed. Below 1.0 is faster than real-time. Above 1.0 means you are waiting.&lt;/p&gt;
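&lt;p&gt;As a minimal sketch of that arithmetic: RTF is generation time divided by audio length. The function name &lt;code&gt;real_time_factor&lt;/code&gt; is illustrative and not part of any TTS library. The sample values are the M3 Max run 1 figures from the tables below.&lt;/p&gt;

```python
def real_time_factor(generation_seconds, audio_seconds):
    """Seconds of wall-clock time spent per second of audio produced."""
    if audio_seconds == 0:
        raise ValueError("audio length must be non-zero")
    return generation_seconds / audio_seconds


# M3 Max, run 1: 698.5 s of generation for 197.7 s of audio
rtf = real_time_factor(698.5, 197.7)
print(f"RTF = {rtf:.2f}")  # RTF = 3.53
```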
&lt;h4&gt;Individual Runs&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Apple M3 Max (MPS)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;698.5s&lt;/td&gt;
&lt;td&gt;197.7s&lt;/td&gt;
&lt;td&gt;3.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;533.1s&lt;/td&gt;
&lt;td&gt;184.2s&lt;/td&gt;
&lt;td&gt;2.89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;447.8s&lt;/td&gt;
&lt;td&gt;179.2s&lt;/td&gt;
&lt;td&gt;2.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;559.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;187.0s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.97&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;AMD Radeon 8060S (ROCm)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;729.2s&lt;/td&gt;
&lt;td&gt;173.6s&lt;/td&gt;
&lt;td&gt;4.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;460.0s&lt;/td&gt;
&lt;td&gt;204.8s&lt;/td&gt;
&lt;td&gt;2.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;548.2s&lt;/td&gt;
&lt;td&gt;214.2s&lt;/td&gt;
&lt;td&gt;2.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;579.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;197.5s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA Tesla P40 (CUDA)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1511.4s&lt;/td&gt;
&lt;td&gt;204.1s&lt;/td&gt;
&lt;td&gt;7.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1225.7s&lt;/td&gt;
&lt;td&gt;171.6s&lt;/td&gt;
&lt;td&gt;7.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1537.2s&lt;/td&gt;
&lt;td&gt;206.7s&lt;/td&gt;
&lt;td&gt;7.44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1424.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;194.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.33&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;Summary&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Avg RTF&lt;/th&gt;
&lt;th&gt;Best RTF&lt;/th&gt;
&lt;th&gt;Avg Gen Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro&lt;/td&gt;
&lt;td&gt;M3 Max (MPS)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.97&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;559.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bosgame M5&lt;/td&gt;
&lt;td&gt;Radeon 8060S (ROCm)&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;2.25&lt;/td&gt;
&lt;td&gt;579.1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Penguin 2U&lt;/td&gt;
&lt;td&gt;Tesla P40 (CUDA)&lt;/td&gt;
&lt;td&gt;7.33&lt;/td&gt;
&lt;td&gt;7.14&lt;/td&gt;
&lt;td&gt;1424.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;What the Numbers Mean&lt;/h3&gt;
&lt;p&gt;The headline result is that the M3 Max and Radeon 8060S are essentially tied, and the Tesla P40 is roughly 2.4 times slower than both. But that summary hides the interesting details.&lt;/p&gt;
&lt;h4&gt;The Warmup Effect Is Massive&lt;/h4&gt;
&lt;p&gt;On both the M3 Max and the Radeon 8060S, the first run is dramatically slower than subsequent runs. The M3 Max goes from RTF 3.53 on run 1 to RTF 2.50 on run 3, a 29% improvement. The AMD shows an even larger swing: RTF 4.20 on run 1 dropping to RTF 2.25 on run 2, a 46% improvement.&lt;/p&gt;
&lt;p&gt;This is kernel compilation. Both MPS and ROCm compile GPU kernels on first use and cache them for subsequent calls. The Qwen TTS model hits a wide variety of kernel shapes during autoregressive generation (different sequence lengths, different attention patterns) and each new shape triggers a compilation on the first encounter. By run 2, most of the common shapes are cached, and performance stabilizes.&lt;/p&gt;
&lt;p&gt;The P40 shows almost no warmup effect. RTF 7.41 on run 1, 7.14 on run 2, 7.44 on run 3. CUDA's kernel compilation is faster and more mature, so the overhead is absorbed within the first few seconds rather than spread across the entire run. But this maturity does not translate into faster inference; CUDA compiles faster, but the P40's hardware is fundamentally slower at the operations this model requires.&lt;/p&gt;
&lt;p&gt;This has a practical implication that matters: &lt;strong&gt;short benchmarks on MPS and ROCm are misleading.&lt;/strong&gt; I initially ran a quick 276-character test on all three machines before doing the full benchmark. The short test showed the AMD at RTF 9.20, almost identical to the P40's RTF 10.01, and far behind the M3 Max's RTF 2.84. That result nearly led me to conclude the AMD was performing as poorly as decade-old hardware. The longer benchmark, with its warmup effect amortized across more generation, revealed the truth: the AMD is just as fast as the M3 Max once the kernels are cached. If I had stopped at the short test, I would have drawn exactly the wrong conclusion.&lt;/p&gt;
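&lt;p&gt;One way to guard against this is to time several consecutive runs and report the first one separately from the rest. The sketch below is a generic harness, not the script used for these benchmarks. &lt;code&gt;fake_generate&lt;/code&gt; is a hypothetical stand-in for a real TTS call.&lt;/p&gt;

```python
import time
from statistics import mean


def benchmark(generate, runs=3):
    """Time repeated calls and separate the first (warmup) run from the rest."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    warmup, steady = timings[0], timings[1:]
    return {
        "warmup_s": warmup,
        "steady_mean_s": mean(steady) if steady else None,
    }


# Hypothetical stand-in for a real TTS call; replace with your own pipeline.
def fake_generate():
    time.sleep(0.01)


print(benchmark(fake_generate))
```

&lt;p&gt;With only one run there is no steady-state figure to report, which is exactly the trap the short test fell into.&lt;/p&gt;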
&lt;h4&gt;Why the P40 Is So Slow&lt;/h4&gt;
&lt;p&gt;The Tesla P40 is a Pascal-generation GPU from 2016. It has 3,840 CUDA cores and 24GB of GDDR5X memory. On paper, it should be competitive; 12 TFLOPS of FP32 compute is not trivial. And for LLM inference through Ollama, the P40 &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;performs remarkably well&lt;/a&gt;, outperforming quad T4 instances on models up to 8B parameters.&lt;/p&gt;
&lt;p&gt;TTS is a different workload. Qwen3-TTS is an autoregressive transformer that generates audio tokens one at a time, each conditioned on all previous tokens. This means the inference is heavily memory-bandwidth bound during the decoding phase, and compute-bound during the attention and feedforward passes. The model is distributed in bfloat16 precision, which the P40 cannot compute natively; Pascal predates bfloat16 support entirely. PyTorch silently promotes bf16 operations to fp32 on the P40, roughly doubling the computation per operation and halving the effective throughput.&lt;/p&gt;
&lt;p&gt;The P40 also lacks the SDPA (Scaled Dot-Product Attention) hardware acceleration that newer architectures provide. On the M3 Max, MPS routes attention through Metal's optimized primitives. On the AMD, ROCm's AOTriton provides experimental flash attention support. On the P40, attention runs through standard CUDA kernels without any of these accelerations. For a model that generates thousands of autoregressive steps per audio clip, each involving a full attention pass over the growing sequence, this compounds dramatically.&lt;/p&gt;
&lt;p&gt;The P40 is not bad hardware. It is excellent hardware for the workloads it was designed for: batch inference on quantized LLMs where its 24GB of VRAM per card creates a memory advantage. But autoregressive TTS in bfloat16 hits every one of its architectural weaknesses simultaneously.&lt;/p&gt;
&lt;h4&gt;Unified Memory Wins This Workload&lt;/h4&gt;
&lt;p&gt;Both the M3 Max and the Radeon 8060S use unified memory architectures, where the CPU and GPU share the same physical memory pool. The M3 Max has 64GB of unified LPDDR5. The Radeon 8060S shares 128GB of DDR5 with the CPU, with roughly 96GB addressable as VRAM.&lt;/p&gt;
&lt;p&gt;For a 1.7B parameter model in bf16, the weights occupy roughly 3.4GB. The model fits comfortably on all three machines. But the autoregressive generation pattern creates a stream of intermediate activations (KV cache entries, attention scores, feedforward intermediates) that grow with the sequence length. On a unified memory architecture, these intermediates exist in the same memory space as the model weights, avoiding any PCIe transfer overhead. On the P40, every interaction between CPU and GPU crosses a PCIe 3.0 bus.&lt;/p&gt;
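&lt;p&gt;The 3.4GB figure follows directly from the parameter count and two bytes per bf16 value. A quick sketch, using decimal gigabytes and an illustrative helper name; the fp32 line is shown only for comparison:&lt;/p&gt;

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_gigabytes(num_params, dtype):
    """Approximate weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(weight_gigabytes(1.7e9, "bf16"))  # 3.4
print(weight_gigabytes(1.7e9, "fp32"))  # 6.8
```

&lt;p&gt;This counts weights only. The KV cache and other intermediates described above come on top of it and grow with sequence length.&lt;/p&gt;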
&lt;p&gt;For LLM inference, where the bottleneck is token generation throughput and the KV cache fits in VRAM, the P40's discrete memory is fine. For TTS, where the model generates hundreds of audio tokens per second of speech and the attention window grows continuously, the memory access pattern favors unified architectures.&lt;/p&gt;
&lt;p&gt;This is not a universal statement about unified versus discrete memory. A modern discrete GPU with HBM2e or GDDR6X and PCIe 4.0 or 5.0 would likely outperform both the M3 Max and the Radeon 8060S on this workload. The P40's problem is not that its memory is discrete; it is that its memory is slow and its bus is narrow by 2026 standards.&lt;/p&gt;
&lt;h3&gt;The Model Architecture Question&lt;/h3&gt;
&lt;p&gt;While benchmarking Qwen TTS, I also ran a quick comparison with &lt;a href="https://huggingface.co/SWivid/F5-TTS"&gt;F5-TTS&lt;/a&gt; on the AMD machine to sanity-check the results. F5-TTS is a flow-matching model, fundamentally different from Qwen's autoregressive approach. Where Qwen generates audio tokens sequentially, each conditioned on all previous tokens, F5 generates audio in parallel through an iterative refinement process.&lt;/p&gt;
&lt;p&gt;The difference is stark. On the same Radeon 8060S, the same text, the same hardware:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-TTS&lt;/td&gt;
&lt;td&gt;579.1s (avg)&lt;/td&gt;
&lt;td&gt;197.5s&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F5-TTS&lt;/td&gt;
&lt;td&gt;17.4s&lt;/td&gt;
&lt;td&gt;27.2s&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;F5-TTS is faster than real-time. Qwen3-TTS takes three times as long to generate audio as the audio takes to play. Comparing average RTFs (3.00 versus 0.64), F5 is roughly five times faster than Qwen, and the gap widens on shorter content where Qwen's warmup overhead is proportionally larger.&lt;/p&gt;
&lt;p&gt;This is not an apples-to-apples quality comparison. Qwen3-TTS generally produces more natural prosody, better handling of complex sentence structures, and more consistent speaker identity across long passages. F5-TTS is excellent but can occasionally drift in voice character or pacing on very long content. For blog narration, both are well above the threshold of "good enough," and the quality difference is smaller than you might expect given the architectural gap.&lt;/p&gt;
&lt;p&gt;The point is that hardware is only half the story. The choice of model architecture can matter more than the choice of GPU. A flow-matching model on integrated AMD graphics outperforms an autoregressive model on Apple's best laptop silicon by a wide margin. If generation speed is the constraint, switching models gains more than switching hardware.&lt;/p&gt;
&lt;h3&gt;What This Costs in Practice&lt;/h3&gt;
&lt;p&gt;The abstract benchmark numbers translate into concrete time and electricity costs when you are generating audio for a library of blog posts.&lt;/p&gt;
&lt;p&gt;A typical TinyComputers post runs 3,000 to 5,000 words, producing 15 to 25 minutes of narrated audio. At steady-state RTF:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;15 min audio&lt;/th&gt;
&lt;th&gt;25 min audio&lt;/th&gt;
&lt;th&gt;System Power&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;M3 Max&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;td&gt;~63 min&lt;/td&gt;
&lt;td&gt;~50W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Radeon 8060S&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;td&gt;~63 min&lt;/td&gt;
&lt;td&gt;~100W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tesla P40&lt;/td&gt;
&lt;td&gt;~110 min&lt;/td&gt;
&lt;td&gt;~183 min&lt;/td&gt;
&lt;td&gt;~400W&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The M3 Max and Radeon 8060S are tied on generation time, but the M3 Max draws roughly half the system power. For a single post, the electricity cost difference is negligible, a fraction of a cent. For batch processing a backlog of thirty posts, the M3 Max costs about \$0.18 in electricity versus \$0.36 for the AMD and \$3.50 for the P40.&lt;/p&gt;
&lt;p&gt;None of these numbers are alarming. Even the P40, at nearly two and a half hours per post and 400 watts from the wall, costs under fifteen cents in electricity per narration at Minnesota residential rates. The equivalent Google Cloud TTS job would cost \$4 to \$16 per post depending on the voice quality tier.&lt;/p&gt;
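&lt;p&gt;The per-post figure can be reproduced with a back-of-the-envelope calculation. This sketch assumes a rate of \$0.14 per kWh, which is an illustrative assumption and not a figure stated in this post; substitute your own utility rate.&lt;/p&gt;

```python
ASSUMED_RATE_USD_PER_KWH = 0.14  # assumption; the post does not state a rate


def electricity_cost(watts, hours, rate=ASSUMED_RATE_USD_PER_KWH):
    """Energy cost in dollars for a job drawing `watts` for `hours`."""
    kilowatt_hours = watts / 1000 * hours
    return kilowatt_hours * rate


# Tesla P40 server: about 400 W from the wall for about 2.5 hours per post
print(f"{electricity_cost(400, 2.5):.2f}")  # 0.14
```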
&lt;p&gt;To put cloud costs in perspective: I recently ran a fiction novel through Google's Chirp3-HD voice: 82,000 words, roughly 500,000 characters of text plus SSML markup. The bill came to \$17.25 at Google's rate of \$30 per million characters. That is not unreasonable for a one-off project, but it adds up quickly if you are generating audio regularly. The entire library of TinyComputers narrations (dozens of posts, hours of audio) has cost me nothing beyond the electricity to run the machines I already own. The economics of local TTS are favorable on every machine in the comparison.&lt;/p&gt;
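&lt;p&gt;Working those numbers through as a sketch: at the quoted rate, 500,000 characters alone would come to \$15.00, so a \$17.25 bill implies about 575,000 billed characters, with the difference presumably being the SSML markup. The function names are illustrative.&lt;/p&gt;

```python
RATE_USD_PER_MILLION_CHARS = 30  # rate quoted in the paragraph above


def tts_cost(characters, rate_per_million=RATE_USD_PER_MILLION_CHARS):
    """Cost in dollars for a given number of billed characters."""
    return characters * rate_per_million / 1_000_000


def billed_characters(bill, rate_per_million=RATE_USD_PER_MILLION_CHARS):
    """Number of characters implied by a bill at the given rate."""
    return bill / rate_per_million * 1_000_000


print(tts_cost(500_000))                # 15.0
print(round(billed_characters(17.25)))  # 575000
```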
&lt;p&gt;The real cost is time. If I am generating audio for a single new post, I start it on whichever machine is idle and check back in an hour. If I am regenerating audio for twenty posts after changing the speaker voice or updating the pipeline, the M3 Max or AMD will finish overnight. The P40 would take most of a weekend.&lt;/p&gt;
&lt;h3&gt;The Right Machine for the Job&lt;/h3&gt;
&lt;p&gt;After running these benchmarks, my workflow has shifted. The M3 Max is the default for new post narration; it is fast, quiet, and I am usually sitting in front of it when I finish writing. The AMD handles batch jobs and overnight processing, where its higher power draw (roughly double the Mac's) does not matter and its equivalent speed makes it interchangeable with the Mac. The P40 server is reserved for what it does best: &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;running large language models&lt;/a&gt; through Ollama, where its 96GB of aggregate VRAM gives it an advantage that neither the Mac nor the AMD can match.&lt;/p&gt;
&lt;p&gt;The P40 can still generate TTS in a pinch, and it does; when both other machines are occupied, I will queue a job on the P40 and accept the longer wait. But for a workload that is inherently autoregressive, memory-bandwidth sensitive, and dependent on bf16 precision, a ten-year-old Pascal GPU is the wrong tool.&lt;/p&gt;
&lt;p&gt;What surprised me most is how well the AMD performs. The Radeon 8060S is an integrated GPU sharing system memory with the CPU. It has no HBM, no dedicated VRAM, no NVLink. Its ROCm software stack requires environment variable hacks, pre-release PyTorch wheels, and a GFX version override to function at all. And yet, once the kernels warm up, it matches Apple's best laptop silicon stride for stride. The raw hardware is there: 40 RDNA 3.5 compute units with access to a deep pool of DDR5 memory. The software just needs to get out of the way, and on run 2 and beyond, it does.&lt;/p&gt;
&lt;h3&gt;Lessons&lt;/h3&gt;
&lt;p&gt;Three takeaways from this exercise that generalize beyond TTS:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Short benchmarks lie.&lt;/strong&gt; Kernel compilation overhead on MPS and ROCm is large enough to dominate a short test. If you are evaluating a new model on non-CUDA hardware, run it at least twice before drawing conclusions. The first run is measuring the software stack, not the hardware.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Architecture matters more than clock speed.&lt;/strong&gt; The P40 has more raw FLOPS than the Radeon 8060S. It does not matter. The P40 lacks native bf16, lacks efficient attention primitives, and sits behind a PCIe 3.0 bus. The Radeon has native bf16 and efficient attention primitives, shares one memory pool with the CPU instead of crossing a bus, and ties a chip designed by Apple's custom silicon team. For autoregressive models, the architectural fit between model and hardware dominates everything else.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Model choice can outweigh hardware choice.&lt;/strong&gt; F5-TTS running on the integrated Radeon 8060S is roughly five times faster than Qwen3-TTS running on the same chip or on the M3 Max. If your constraint is generation speed and you can accept a modest quality trade-off, switching to a flow-matching architecture gains more than any hardware upgrade short of a data center GPU.&lt;/p&gt;
&lt;p&gt;The audio player at the top of each post on this site represents roughly an hour of machine time on one of these three machines. Which machine generated it depends on the day, the workload, and what else is running. The listener cannot tell the difference. The audio sounds the same regardless of whether it was generated on a laptop, a mini desktop, or a rack-mount server in a cold Minnesota shop. That is the real benchmark: not which machine is fastest, but that all three are fast enough.&lt;/p&gt;</description><category>amd</category><category>apple silicon</category><category>audio</category><category>benchmarks</category><category>cuda</category><category>gpu</category><category>inference</category><category>m3 max</category><category>machine learning</category><category>mps</category><category>nvidia</category><category>qwen</category><category>rocm</category><category>strix halo</category><category>tesla p40</category><category>text-to-speech</category><category>tts</category><guid>https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html</guid><pubDate>Thu, 12 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Repurposing Enterprise GPUs: The Tesla P40 Home Lab Story</title><link>https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There is a window, maybe eighteen months wide, where enterprise hardware hits a pricing sweet spot. The first-generation buyers (the hyperscalers, the research labs, the Fortune 500 AI teams) have moved on to the next generation. The second-hand market floods. Prices crater. And if you know what you're looking for, you can build something genuinely capable for less than a month of cloud compute.&lt;/p&gt;
&lt;p&gt;I built a four-GPU inference server for about twenty-five hundred dollars. This is the story of how, why, and whether you should do the same.&lt;/p&gt;
&lt;h3&gt;The Buy&lt;/h3&gt;
&lt;p&gt;The acquisition strategy is straightforward: eBay, patience, and knowing what to look for.&lt;/p&gt;
&lt;p&gt;Tesla P40s started appearing in volume on the secondary market around 2023, when cloud providers and enterprise data centers began cycling them out in favor of A100s and H100s. A card that sold for over five thousand dollars new was suddenly available for three hundred, then two hundred and fifty, then, if you watched listings carefully and were willing to buy from decommissioned lot sellers, sometimes less. I picked up four cards over the course of about two months, averaging two hundred and fifty dollars each.&lt;/p&gt;
&lt;p&gt;The chassis was a Penguin Computing 2U rack-mount server, also from eBay. These show up when government labs and research institutions liquidate equipment. The Penguin Computing systems are well-built, with proper server-grade construction, redundant power supplies, and engineered airflow. Mine takes the Xeon E5-2697A v4, and I bought two of them on eBay: eighteen Broadwell cores, more than enough CPU to keep four GPUs fed. The chassis cost around six hundred dollars.&lt;/p&gt;
&lt;p&gt;Memory was the lucky purchase. I bought 252GB of DDR4 ECC RAM before the memory price spike that hit in late 2024 when every company on Earth decided they needed AI infrastructure simultaneously. What I paid around two hundred and fifty dollars for would cost significantly more today. Total build: roughly twenty-five hundred dollars.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;The Tesla P40 is a 2016-era data center GPU. NVIDIA designed it for the Pascal generation, targeting inference workloads in enterprise environments. The specifications, for something you can buy on eBay for two hundred and fifty dollars, are remarkable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;24GB GDDR5X&lt;/strong&gt; per card, as much memory as an RTX 4090&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;3,840 CUDA cores&lt;/strong&gt;, Pascal architecture, compute capability 6.1&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;12 TFLOPS FP32&lt;/strong&gt;, respectable even by 2026 standards for inference&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;250W TDP&lt;/strong&gt;: this is a data center card and it draws power like one&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Multiply by four and you get 96GB of VRAM for a thousand dollars. That is an extraordinary amount of GPU memory for the price. For context, a single NVIDIA A100 80GB still sells for north of five thousand dollars on the secondary market. Four P40s give you more total VRAM for a fraction of the cost.&lt;/p&gt;
&lt;h3&gt;What You Give Up&lt;/h3&gt;
&lt;p&gt;There is no free lunch in computing, and the P40 makes you pay for its low price in specific, sometimes painful ways.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;No Tensor Cores.&lt;/strong&gt; The P40 predates NVIDIA's Tensor Core architecture, which arrived with Volta in 2017. Tensor Cores accelerate matrix multiplication (the fundamental operation in neural network inference) by factors of 4x to 16x depending on precision. The P40 does everything with its CUDA cores, the old-fashioned way. This matters less than you might think for inference at moderate batch sizes, but it means you will never match the throughput of a V100 or newer card, clock for clock.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;No native BF16 or FP16.&lt;/strong&gt; This is the real gotcha. BF16 (bfloat16) has become the default precision for large language models. It is what most model weights are distributed in. The P40 cannot compute in BF16 natively; it emulates it through FP32 operations, which is roughly 21% slower than native support. In practice, this means you are running quantized models (Q4, Q5, Q8) through llama.cpp or similar frameworks, which handle the precision conversion for you. It works. It is not optimal.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Passive cooling designed for server airflow.&lt;/strong&gt; The P40 has no fan of its own. Its passive heatsink relies on the front-to-back forced airflow of a 1U or 2U server chassis. In a proper server, this is fine. In anything else, you need to solve cooling yourself. I put mine in a Penguin Computing 2U rack-mount chassis, which has the right airflow characteristics, but this is not a card you drop into a desktop tower.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PCIe 3.0 x16.&lt;/strong&gt; The P40 connects via PCIe 3.0, which provides about 16 GB/s of bandwidth per direction. When you are running a model that spans four GPUs, the inter-GPU communication goes over PCIe, not NVLink. This creates a bottleneck for models that require heavy cross-GPU communication. For inference, where the communication pattern is more predictable than training, this is manageable. For training, it would be a serious constraint.&lt;/p&gt;
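&lt;p&gt;The "about 16 GB/s" figure falls out of the PCIe 3.0 link parameters: 8 GT/s per lane, 128b/130b line encoding, and 16 lanes. A minimal sketch of that arithmetic in C follows. The function name is mine, and the result is the theoretical ceiling before protocol overhead, so real transfers land lower.&lt;/p&gt;

```c
/* Theoretical one-direction PCIe bandwidth in GB/s.
 * gt_per_s:                  raw transfer rate per lane in GT/s (8.0 for PCIe 3.0)
 * lanes:                     link width (16 for an x16 slot)
 * payload_bits / total_bits: line-encoding efficiency (128/130 for PCIe 3.0) */
double pcie_bandwidth_gbs(double gt_per_s, int lanes,
                          int payload_bits, int total_bits)
{
    double bits_per_s = gt_per_s * 1e9 * lanes
                      * ((double)payload_bits / (double)total_bits);
    return bits_per_s / 8.0 / 1e9;   /* bits -> bytes -> GB */
}
```

&lt;p&gt;Calling &lt;code&gt;pcie_bandwidth_gbs(8.0, 16, 128, 130)&lt;/code&gt; gives a value just under 15.8 GB/s, which is where the rounded 16 GB/s comes from.&lt;/p&gt;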
&lt;h3&gt;The Minnesota Problem&lt;/h3&gt;
&lt;p&gt;My server lives in an unheated shop building in northern Minnesota. This has created an issue that no hardware review will prepare you for.&lt;/p&gt;
&lt;p&gt;When ambient temperatures drop below freezing (which, in Minnesota, means roughly October through April), the onboard temperature sensors report values that the baseboard management controller interprets as a malfunction. The BMC's response is to spin every fan to maximum RPM as a protective measure.&lt;/p&gt;
&lt;p&gt;The result is a machine that, on quiet winter nights, is audible from the house. The house is a hundred and fifty feet away.&lt;/p&gt;
&lt;p&gt;I have not solved this problem. I have learned to live with it. You can override BMC fan curves on some platforms, but the Penguin Computing firmware is locked down in ways that make this nontrivial, and frankly, a server that runs its fans at full speed because it thinks it is dying is doing exactly what it should be doing. The firmware's assumptions are just wrong for the environment.&lt;/p&gt;
&lt;p&gt;The server runs 24/7 regardless of the season, and the cold air actually keeps the GPUs well within thermal limits. The irony is that the machine has never been cooler or louder than when it is twenty below zero outside. If you are considering a similar setup in a garage, basement, or outbuilding, factor in noise. A 2U server with four 250W GPUs is not quiet under any circumstances, and server-grade fans at full RPM are genuinely loud.&lt;/p&gt;
&lt;h3&gt;Setting Up the Software Stack&lt;/h3&gt;
&lt;p&gt;The driver situation for the P40 in 2026 is straightforward, though it was not always. NVIDIA's &lt;code&gt;nvidia-driver-570-server&lt;/code&gt; package works cleanly on Ubuntu, and the DKMS module rebuilds automatically on kernel updates, most of the time. I have had exactly two occasions where a kernel update broke the NVIDIA module and required manual intervention. This is fewer than I expected.&lt;/p&gt;
&lt;p&gt;For inference, I run &lt;a href="https://ollama.com"&gt;Ollama&lt;/a&gt;, which wraps llama.cpp and provides a simple API for model management and inference. Ollama handles multi-GPU sharding automatically: when you load a model, it distributes layers across GPUs based on available memory and model size. A 65GB model like gpt-oss:120b fits across three of the four P40s, leaving one free. Smaller models may only need one or two cards. The allocation is generally sensible, though you have less control over placement than you would with raw llama.cpp.&lt;/p&gt;
&lt;p&gt;The alternative stack (vLLM, TGI, or raw llama.cpp) offers more control over GPU assignment but requires more configuration. With llama.cpp directly, you can pin specific GPU layers to specific devices, which lets you optimize for the P40's memory topology. vLLM provides better batching and continuous batching for serving multiple concurrent requests. For a home lab where the primary use case is running various models for experimentation and development rather than serving production traffic, Ollama's simplicity wins.&lt;/p&gt;
&lt;p&gt;One thing worth noting: the P40 is well-supported by the GGUF ecosystem that llama.cpp (and therefore Ollama) uses. GGUF quantized models (Q4_K_M, Q5_K_M, Q8_0) run without issues on Pascal hardware. The quantization handles the BF16 problem for you: model weights are stored in 4-bit or 8-bit integer formats and dequantized to FP32 at runtime, which the P40 handles natively. You are not fighting the hardware; you are working with it.&lt;/p&gt;
&lt;h3&gt;The Benchmarks&lt;/h3&gt;
&lt;p&gt;Theory is cheap. Benchmarks are what matter. I ran the same inference workload across three configurations: my four P40 home lab, a single AWS Tesla T4 instance, and a quad T4 instance on AWS. The T4 is the closest cloud comparison; it is the workhorse inference GPU in AWS's fleet, one generation newer than the P40 (Turing architecture, 2018), with 16GB of GDDR6 and actual Tensor Cores.&lt;/p&gt;
&lt;p&gt;All benchmarks used Ollama with the same prompt, measuring tokens per second during the evaluation phase (excluding model load time).&lt;/p&gt;
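&lt;p&gt;For reference, Ollama's generate API reports an &lt;code&gt;eval_count&lt;/code&gt; (tokens generated) and an &lt;code&gt;eval_duration&lt;/code&gt; (nanoseconds spent generating them), separate from load and prompt-processing time. Assuming the figures below were derived from those two fields, the calculation is a one-liner. The helper name here is hypothetical.&lt;/p&gt;

```c
#include <stdint.h>

/* Tokens per second for the evaluation (generation) phase only.
 * eval_count:        number of tokens generated
 * eval_duration_ns:  time spent generating them, in nanoseconds
 * Model load time and prompt processing are deliberately not included. */
double eval_tokens_per_second(uint64_t eval_count, uint64_t eval_duration_ns)
{
    if (eval_duration_ns == 0)
        return 0.0;   /* avoid dividing by zero on an empty response */
    return (double)eval_count * 1e9 / (double)eval_duration_ns;
}
```

&lt;p&gt;As an example, 281 tokens generated in 10 seconds (10,000,000,000 ns) works out to 28.1 tok/s.&lt;/p&gt;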
&lt;h4&gt;Dense Models&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;4x P40 (Home Lab)&lt;/th&gt;
&lt;th&gt;1x T4 (AWS \$0.53/hr)&lt;/th&gt;
&lt;th&gt;4x T4 (AWS \$3.91/hr)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.2&lt;/td&gt;
&lt;td&gt;3B&lt;/td&gt;
&lt;td&gt;94.3 tok/s&lt;/td&gt;
&lt;td&gt;81.5 tok/s&lt;/td&gt;
&lt;td&gt;101.5 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;52.7 tok/s&lt;/td&gt;
&lt;td&gt;36.9 tok/s&lt;/td&gt;
&lt;td&gt;40.3 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;47.8 tok/s&lt;/td&gt;
&lt;td&gt;35.7 tok/s&lt;/td&gt;
&lt;td&gt;29.2 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The P40 wins on the 7B and 8B models by substantial margins, 31% and 64% respectively over the quad T4 configuration. The only model where the quad T4 edges ahead is the 3B (a single T4 still trails the P40s there), which is small enough to fit entirely on a single GPU. Here, the T4's higher clock speeds and faster GDDR6 memory give it an advantage because there is no multi-GPU overhead to penalize it.&lt;/p&gt;
&lt;p&gt;The 8B result is particularly interesting. The quad T4 actually performs &lt;em&gt;worse&lt;/em&gt; than a single T4 on this model (29.2 vs 35.7 tok/s). Ollama shards the model across all four GPUs even though it fits on one, and the PCIe communication overhead between four T4s costs more than it gains. The P40, with its larger 24GB per-card memory, likely fits more of the model per GPU, reducing cross-GPU transfers.&lt;/p&gt;
&lt;h4&gt;The MoE Advantage&lt;/h4&gt;
&lt;p&gt;The most compelling benchmark comes from OpenAI's gpt-oss, a 120-billion parameter mixture-of-experts model with only 5.1 billion active parameters per token. The MoE architecture means the model's total weight is large (it needs the memory), but the computation per token is modest (only a fraction of the parameters fire for any given input).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;4x P40&lt;/th&gt;
&lt;th&gt;4x T4 (AWS \$3.91/hr)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gpt-oss&lt;/td&gt;
&lt;td&gt;120B MoE (5.1B active)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;28.1 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20.6 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The P40 runs OpenAI's 120B model at 28.1 tokens per second, 36% faster than the cloud instance, and fast enough for comfortable interactive use. This is a state-of-the-art model running on decade-old GPUs at a speed that would have been impressive on much newer hardware a year ago.&lt;/p&gt;
&lt;p&gt;The reason is memory. The gpt-oss model uses MXFP4 quantization on its MoE weights, bringing the total model size to about 65GB. Four P40s offer 96GB of VRAM, enough to hold the entire model in GPU memory. Four T4s offer only 64GB, which means some of the model likely spills to system RAM, adding latency on every token.&lt;/p&gt;
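&lt;p&gt;The memory argument reduces to a capacity check. The sketch below uses hypothetical helper names and counts model weights only. It ignores the KV cache, activations, and per-GPU runtime overhead, all of which eat into the headroom in practice.&lt;/p&gt;

```c
/* Returns 1 if the model weights alone fit in the pooled VRAM, else 0.
 * Weights only: KV cache, activations and runtime overhead are ignored. */
int weights_fit_in_vram(double model_gb, int cards, double gb_per_card)
{
    return model_gb <= cards * gb_per_card;
}

/* Pooled VRAM left over after the weights; negative means a shortfall
 * that has to spill somewhere slower, such as system RAM. */
double vram_headroom_gb(double model_gb, int cards, double gb_per_card)
{
    return cards * gb_per_card - model_gb;
}
```

&lt;p&gt;For the 65GB model, four 24GB cards leave 31GB of headroom, while four 16GB cards come up 1GB short before any cache or overhead is counted.&lt;/p&gt;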
&lt;p&gt;This is the P40's superpower: 24GB per card was overkill in 2016, and it is exactly right in 2026. Models have grown to fill the memory, and the P40 has more of it per dollar than almost anything else on the market.&lt;/p&gt;
&lt;h4&gt;Where It Falls Apart&lt;/h4&gt;
&lt;p&gt;Dense 70B models are a different story. Llama 3.1 70B at Q4_0 quantization (39GB) fits across 96GB of P40 VRAM, but the inference speed is essentially unusable: 0.033 tokens per second. One token every thirty seconds. Answering "What is 2+2?" took six and a half minutes. The combination of no Tensor Cores, PCIe 3.0 interconnect, and the sheer volume of cross-GPU data transfers for a dense 70B model pushes the per-token latency beyond any practical threshold.&lt;/p&gt;
&lt;p&gt;The quad T4 on AWS managed 2.0 tokens per second on the same model, sixty times faster. Slow, but functional. The T4's Tensor Cores make the difference here; at this scale, the P40's raw CUDA cores simply cannot keep up with the matrix math.&lt;/p&gt;
&lt;p&gt;The lesson: MoE models and quantized models up to about 8B parameters are the P40's sweet spot. Dense models above 13B start hitting diminishing returns. Dense 70B is a wall.&lt;/p&gt;
&lt;h3&gt;The Cost Argument&lt;/h3&gt;
&lt;p&gt;Here is the math that justifies the project.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;g4dn.12xlarge&lt;/code&gt; on AWS (four Tesla T4s, 48 vCPUs, 192GB RAM) costs \$3.91 per hour. My home lab outperforms it on every model I benchmarked except the smallest (Llama 3.2 3B) and the dense 70B. If I run inference for just four hours a day, the cloud cost would be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Daily&lt;/strong&gt;: \$15.64&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monthly&lt;/strong&gt;: \$469&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Yearly&lt;/strong&gt;: \$5,709&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My server cost \$2,500 to build. It pays for itself in roughly five months of equivalent cloud usage. After that, the only ongoing cost is electricity. At Minnesota residential rates (roughly \$0.12/kWh) and an average draw of 800W around the clock, that is about \$70 per month. That is less than a single 24-hour day of the equivalent cloud instance (\$93.84).&lt;/p&gt;
&lt;p&gt;Even if you factor in the P40's lower performance on some workloads and assume you only get 70% of the cloud equivalent's utility, the break-even point is still well under a year. For a home lab that runs 24/7 for development, experimentation, and &lt;a href="https://tinycomputers.io/posts/clean-room-z80-emulator.html"&gt;text-to-speech generation&lt;/a&gt;, the economics are overwhelming.&lt;/p&gt;
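&lt;p&gt;The cost figures above can be reproduced with a few lines of arithmetic. This is a sketch with hypothetical function names. It assumes a 30-day month and, like the estimate in the text, leaves electricity out of the break-even calculation.&lt;/p&gt;

```c
/* Cloud spend for one month at a given hourly rate and daily usage. */
double cloud_monthly_cost(double usd_per_hour, double hours_per_day, int days)
{
    return usd_per_hour * hours_per_day * days;
}

/* Electricity for a machine drawing `watts` continuously, 24 hours a day. */
double electricity_monthly_cost(double watts, double usd_per_kwh, int days)
{
    return watts / 1000.0 * 24.0 * days * usd_per_kwh;
}

/* Months until the build cost equals the avoided cloud spend.
 * utility: fraction of the cloud instance's usefulness the home lab
 *          delivers (1.0 = equivalent, 0.7 = the pessimistic case). */
double break_even_months(double build_cost, double monthly_cloud_cost,
                         double utility)
{
    return build_cost / (monthly_cloud_cost * utility);
}
```

&lt;p&gt;With \$3.91 per hour at four hours a day, the monthly cloud cost is about \$469. The break-even on a \$2,500 build is about 5.3 months at full utility and about 7.6 months at 70%. An 800W draw at \$0.12/kWh comes to roughly \$69 per month.&lt;/p&gt;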
&lt;h3&gt;What I Actually Use It For&lt;/h3&gt;
&lt;p&gt;The server runs several workloads:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Local LLM inference.&lt;/strong&gt; This is the primary use case. Having a local inference server with 96GB of VRAM means I can run frontier-class open-weight models without sending data to a cloud API. For development work, where I might make hundreds of inference calls while iterating on a project, the zero marginal cost changes how I work. I experiment more freely when each query costs nothing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Text-to-speech.&lt;/strong&gt; I run &lt;a href="https://tinycomputers.io/posts/clean-room-z80-emulator.html"&gt;Qwen TTS&lt;/a&gt; on the P40s to generate audio narration for blog posts. The model fits comfortably in the P40's memory, and the generation speed is acceptable for batch processing. The narration you hear on posts across this site was generated on these GPUs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Development and testing.&lt;/strong&gt; When I am building projects like &lt;a href="https://tinycomputers.io/posts/sampo-designing-a-16-bit-risc-cpu-from-scratch-part-1-theory-and-architecture.html"&gt;Sampo&lt;/a&gt; or &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;Lattice&lt;/a&gt;, having local GPU compute available for testing AI-assisted workflows means I do not need to worry about API rate limits or costs during intensive development sessions.&lt;/p&gt;
&lt;p&gt;The server sits on my local network at a static IP, accessible from any machine in the house. It is always on, always available, and always free to use. That availability changes your relationship with AI inference in ways that are hard to appreciate until you have lived with it. There is a psychological difference between "this costs two cents per query" and "this costs nothing per query." The first makes you think about whether the query is worth it. The second lets you experiment without friction, and that friction reduction, compounded across hundreds of daily interactions, fundamentally changes how you work.&lt;/p&gt;
&lt;p&gt;This is, incidentally, a small-scale example of the &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt; I have been writing about in this blog's economics series. Making inference cheaper did not cause me to run the same number of queries and pocket the savings. It caused me to run dramatically more queries, on more models, for more projects, consuming more total compute than I ever would have purchased from a cloud provider. The efficiency created demand.&lt;/p&gt;
&lt;h3&gt;Should You Build One?&lt;/h3&gt;
&lt;p&gt;The honest answer is: it depends on what you value.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Build one if:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You run local inference regularly and the cloud costs are adding up&lt;/li&gt;
&lt;li&gt;You want 96GB of VRAM for under a thousand dollars in GPU costs&lt;/li&gt;
&lt;li&gt;You have the physical space, electrical capacity, and noise tolerance for a rack-mount server&lt;/li&gt;
&lt;li&gt;You enjoy the process of building and configuring systems; this is not a plug-and-play experience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Do not build one if:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You need the latest model performance (Tensor Cores, FP8, NVLink)&lt;/li&gt;
&lt;li&gt;You are training models, not running inference&lt;/li&gt;
&lt;li&gt;You need reliability guarantees; this is a home lab, not a production environment&lt;/li&gt;
&lt;li&gt;You are not comfortable with Linux system administration, driver debugging, and occasional hardware troubleshooting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The P40 window will not last forever. As newer GPUs age out of data centers (the V100, the A100), the P40 will eventually lose its price-to-performance advantage. The V100, with its first-generation Tensor Cores and 32GB of HBM2, is already starting to appear at attractive secondary market prices. Within a year, it may be the new sweet spot. But right now, in early 2026, four P40s on eBay represent one of the best deals in GPU computing. Ninety-six gigabytes of VRAM, proven CUDA compatibility, and a decade of driver maturity, for the price of a weekend trip.&lt;/p&gt;
&lt;p&gt;The server in my shop building will keep running. The fans will keep screaming through the Minnesota winter. And I will keep running models on hardware that a hyperscaler discarded three years ago, at speeds that would have been remarkable on any hardware five years ago. That is the beauty of the secondary market: someone else paid for the R&amp;amp;D, someone else paid for the depreciation, and you get the compute.&lt;/p&gt;</description><category>ai</category><category>benchmarks</category><category>cuda</category><category>deep learning</category><category>ebay</category><category>enterprise hardware</category><category>gpu</category><category>home lab</category><category>inference</category><category>nvidia</category><category>ollama</category><category>tesla p40</category><guid>https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html</guid><pubDate>Wed, 11 Mar 2026 14:00:00 GMT</pubDate></item><item><title>JokelaOS: Writing a Bare-Metal x86 Kernel from Scratch</title><link>https://tinycomputers.io/posts/jokelaos-bare-metal-x86-kernel.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/jokelaos-bare-metal-x86-kernel_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;24 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There's a moment early in any OS project where the serial port prints its first character and you realize that nothing you've written has a safety net. No libc. No kernel underneath. No syscall to fall back on. If the byte appears on the terminal, it's because you programmed the UART divisor latch, polled the line status register, and wrote to the data port. If it doesn't appear, you stare at register dumps until you find the mistake. There's no debugger; you haven't written one yet.&lt;/p&gt;
&lt;p&gt;The closest thing I can compare it to is the first time I got a &lt;a href="https://tinycomputers.io/posts/arduino-z80-+-forth.html"&gt;RetroShield Z80&lt;/a&gt; talking over serial, that moment where a processor you wired up yourself pushes a character out of an emulated ACIA and it appears on your screen. The Z80 version involves physical hardware and solder. The x86 version is virtual (QEMU, a cross-compiler, and a Multiboot header), but the feeling is the same. You built the entire path from CPU to character. Nothing was given to you.&lt;/p&gt;
&lt;p&gt;JokelaOS started there: a Multiboot header, a stack, and a &lt;code&gt;call kmain&lt;/code&gt;. Everything that followed, from the GDT (Global Descriptor Table) and IDT (Interrupt Descriptor Table) through memory management, a network stack, preemptive multitasking, paging, user mode, and a shell, was built one subsystem at a time, tested after every change, with no external code. No forks of existing kernels. No libc.&lt;/p&gt;
&lt;p&gt;To be clear about what this is: JokelaOS is a toy. It's a learning project. The memory allocator is a linear scan. The scheduler has no concept of priority. The file system can't delete files. The user authentication stores passwords in plaintext in a static array. Nothing here is production-grade, and none of it is intended to be. The value is in the building: understanding what each subsystem actually does by writing it from scratch, making the mistakes, and fixing them with nothing between you and the hardware.&lt;/p&gt;
&lt;p&gt;This is the story of what it takes to go from twenty lines of NASM to a kernel that boots, manages memory, runs user programs in Ring 3, handles syscalls, responds to pings, and gives you a command prompt.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/jokelaos/jokelaos0.png" alt="JokelaOS boot sequence in QEMU showing GDT, IDT, PCI enumeration, memory map, paging init, RTL8139 driver, and network stack initialization" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;" loading="lazy"&gt;&lt;/p&gt;
&lt;h3&gt;The Target&lt;/h3&gt;
&lt;p&gt;JokelaOS targets 32-bit x86 (i686) and runs under QEMU. The toolchain is a cross-compiler (&lt;code&gt;i686-elf-gcc&lt;/code&gt;, &lt;code&gt;i686-elf-ld&lt;/code&gt;) with NASM for the assembly files. The C standard is &lt;code&gt;gnu11&lt;/code&gt;; GNU extensions are required for inline assembly. There are no external libraries whatsoever, not even a freestanding &lt;code&gt;string.h&lt;/code&gt;. Every &lt;code&gt;memcpy&lt;/code&gt;, every &lt;code&gt;memset&lt;/code&gt;, every &lt;code&gt;printf&lt;/code&gt;-like function is written from scratch.&lt;/p&gt;
&lt;p&gt;The only console is the serial port. COM1 at 0x3F8, 115200 baud, 8N1 (8 data bits, no parity, 1 stop bit). All kernel output goes through &lt;code&gt;serial_printf()&lt;/code&gt;. This is a deliberate choice: serial is simpler than VGA text mode, works perfectly with QEMU's &lt;code&gt;-serial stdio&lt;/code&gt;, and means the kernel's output appears directly in the host terminal. No framebuffer driver needed, no font rendering, no cursor management. Just bytes on a wire.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;make&lt;span class="w"&gt; &lt;/span&gt;run
qemu-system-i386&lt;span class="w"&gt; &lt;/span&gt;-kernel&lt;span class="w"&gt; &lt;/span&gt;build/jokelaos.bin&lt;span class="w"&gt; &lt;/span&gt;-serial&lt;span class="w"&gt; &lt;/span&gt;stdio&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-display&lt;span class="w"&gt; &lt;/span&gt;none&lt;span class="w"&gt; &lt;/span&gt;-device&lt;span class="w"&gt; &lt;/span&gt;rtl8139,netdev&lt;span class="o"&gt;=&lt;/span&gt;net0&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-netdev&lt;span class="w"&gt; &lt;/span&gt;user,id&lt;span class="o"&gt;=&lt;/span&gt;net0&lt;span class="w"&gt; &lt;/span&gt;-no-reboot
&lt;/pre&gt;&lt;/div&gt;
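&lt;p&gt;The COM1 setup described above (base port 0x3F8, 115200 baud, 8N1) usually comes down to a handful of register writes on a 16550-style UART. The following is a sketch of the conventional sequence, not JokelaOS's actual driver. &lt;code&gt;serial_init_sketch&lt;/code&gt;, &lt;code&gt;uart_divisor&lt;/code&gt;, and &lt;code&gt;log_outb&lt;/code&gt; are hypothetical names, and the port write is passed in as a function pointer so the sequence can be exercised on a host instead of requiring a real &lt;code&gt;out&lt;/code&gt; instruction.&lt;/p&gt;

```c
#include <stdint.h>

#define COM1_BASE 0x3F8u

/* Host-side stand-in for the x86 `out` instruction: records each write so
 * the sequence can be inspected. A real kernel would use inline assembly. */
struct port_write { uint16_t port; uint8_t value; };
struct port_write port_log[16];
int port_log_len = 0;

void log_outb(uint16_t port, uint8_t value)
{
    if (port_log_len < 16) {
        port_log[port_log_len].port = port;
        port_log[port_log_len].value = value;
        port_log_len++;
    }
}

/* The 16550 baud clock is 115200 Hz after its fixed /16 prescaler, so the
 * divisor is 115200 / baud. The caller must pass a nonzero baud rate. */
uint16_t uart_divisor(uint32_t baud)
{
    return (uint16_t)(115200u / baud);
}

void serial_init_sketch(void (*outb)(uint16_t, uint8_t), uint32_t baud)
{
    uint16_t div = uart_divisor(baud);

    outb(COM1_BASE + 1, 0x00);                 /* IER: UART interrupts off (polled I/O) */
    outb(COM1_BASE + 3, 0x80);                 /* LCR: set DLAB to expose the divisor latch */
    outb(COM1_BASE + 0, (uint8_t)(div & 0xFF)); /* DLL: divisor low byte */
    outb(COM1_BASE + 1, (uint8_t)(div >> 8));   /* DLM: divisor high byte */
    outb(COM1_BASE + 3, 0x03);                 /* LCR: 8 data bits, no parity, 1 stop; DLAB off */
    outb(COM1_BASE + 2, 0xC7);                 /* FCR: enable and clear FIFOs, 14-byte trigger */
    outb(COM1_BASE + 4, 0x03);                 /* MCR: assert DTR and RTS */
}
```

&lt;p&gt;At 115200 baud the divisor is 1, so the latch receives 0x01 and 0x00. At 9600 baud it would be 12.&lt;/p&gt;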

&lt;h3&gt;Kernel Architecture&lt;/h3&gt;
&lt;p&gt;JokelaOS is monolithic: everything runs in Ring 0, in one address space. When the network stack needs a page, it calls &lt;code&gt;pmm_alloc_frame()&lt;/code&gt; directly. When the shell loads a program, the call chain goes through the loader, the PMM, and the paging subsystem without ever crossing an address space boundary. The trade-off is that a bug in the RTL8139 driver can corrupt the process table, and a buffer overrun in the serial handler can overwrite page tables. In a toy kernel written by one person, bugs are spectacular.&lt;/p&gt;
&lt;p&gt;A microkernel would isolate those failures, but it would also triple the code before you could print a single character. You'd need working IPC before the serial driver could talk to anything. JokelaOS is monolithic because it's the simplest architecture to build and the easiest to debug: &lt;code&gt;serial_printf()&lt;/code&gt; anywhere can see everything.&lt;/p&gt;
&lt;h3&gt;Booting: The First 33 Lines&lt;/h3&gt;
&lt;p&gt;The entire boot sequence fits in &lt;code&gt;boot.asm&lt;/code&gt;. Multiboot v1 requires a magic number (&lt;code&gt;0x1BADB002&lt;/code&gt;), flags, and a checksum in a specific header format. GRUB or QEMU's &lt;code&gt;-kernel&lt;/code&gt; loader scans for this header, loads the binary, and jumps to &lt;code&gt;_start&lt;/code&gt; in protected mode with paging disabled.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;section&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;.multiboot&lt;/span&gt;
&lt;span class="k"&gt;align&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;dd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x1BADB002&lt;/span&gt;&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="c1"&gt;; Multiboot magic&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;dd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x00000003&lt;/span&gt;&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="c1"&gt;; Flags: page-align + memory map&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;dd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x1BADB002&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x00000003&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="c1"&gt;; Checksum&lt;/span&gt;

&lt;span class="k"&gt;section&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;.text&lt;/span&gt;
&lt;span class="k"&gt;global&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;_start&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;kmain&lt;/span&gt;

&lt;span class="nl"&gt;_start:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;mov&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;esp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;stack_top&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;popf&lt;/span&gt;&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="c1"&gt;; Clear EFLAGS&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ebx&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;; Multiboot info struct pointer&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;; Multiboot magic number&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;kmain&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;cli&lt;/span&gt;
&lt;span class="nl"&gt;.hang:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;hlt&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;.hang&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That's it. Set up a stack, clear the flags register, push the two values the Multiboot spec guarantees (magic number in EAX, info struct pointer in EBX), and call C. If &lt;code&gt;kmain&lt;/code&gt; ever returns, disable interrupts and halt forever.&lt;/p&gt;
&lt;p&gt;The 16 KB stack is allocated in the BSS section, zeroed at load time. The linker script places the kernel at 1 MB (the standard x86 protected-mode load address), with &lt;code&gt;.multiboot&lt;/code&gt; first so the bootloader can find the header within the first 8 KB of the binary.&lt;/p&gt;
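&lt;p&gt;Two magic numbers are in play here, and they are easy to confuse. The header carries &lt;code&gt;0x1BADB002&lt;/code&gt; plus a checksum chosen so that magic, flags, and checksum sum to zero modulo 2&lt;sup&gt;32&lt;/sup&gt;. The value the bootloader leaves in EAX is a different constant, &lt;code&gt;0x2BADB002&lt;/code&gt;. A small sketch of both checks in C, with helper names of my own choosing:&lt;/p&gt;

```c
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC     0x1BADB002u  /* lives in the kernel image */
#define MULTIBOOT_BOOTLOADER_MAGIC 0x2BADB002u  /* passed to the kernel in EAX */

/* Checksum that makes magic + flags + checksum wrap to zero (mod 2^32). */
uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)0 - (MULTIBOOT_HEADER_MAGIC + flags);
}

/* What a loader verifies when it scans the image for the header. */
int multiboot_header_valid(uint32_t magic, uint32_t flags, uint32_t checksum)
{
    return magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(magic + flags + checksum) == 0;
}

/* What a kernel entry point can verify before trusting the info pointer. */
int booted_by_multiboot(uint32_t eax_magic)
{
    return eax_magic == MULTIBOOT_BOOTLOADER_MAGIC;
}
```

&lt;p&gt;For the flags value &lt;code&gt;0x00000003&lt;/code&gt; used in &lt;code&gt;boot.asm&lt;/code&gt;, the checksum works out to &lt;code&gt;0xE4524FFB&lt;/code&gt;, which is what the NASM expression &lt;code&gt;-(0x1BADB002 + 0x00000003)&lt;/code&gt; evaluates to as a 32-bit value.&lt;/p&gt;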
&lt;h3&gt;Protection Rings: Hardware-Enforced Privilege&lt;/h3&gt;
&lt;p&gt;x86 protected mode provides four privilege levels, numbered 0 through 3, called rings. Ring 0 is the most privileged: the kernel runs here. Ring 3 is the least privileged: user programs run here. Rings 1 and 2 exist in the hardware but almost nobody uses them. Linux doesn't. Windows doesn't. JokelaOS doesn't. The practical x86 privilege model is two rings: kernel and user.&lt;/p&gt;
&lt;p&gt;The ring system isn't a software convention. It's enforced by the CPU itself, in silicon. The processor tracks the Current Privilege Level (CPL), the ring the currently executing code belongs to, and checks it against every sensitive operation. A Ring 3 process that executes &lt;code&gt;cli&lt;/code&gt; (disable interrupts), &lt;code&gt;hlt&lt;/code&gt; (halt the CPU), &lt;code&gt;lgdt&lt;/code&gt; (load a new GDT), or &lt;code&gt;mov cr3&lt;/code&gt; (change the page directory) triggers a General Protection Fault. The CPU literally refuses to execute the instruction. A Ring 3 process can't touch I/O ports unless the kernel has explicitly granted access through the I/O Permission Bitmap in the TSS. It can't modify its own segment registers to escalate privilege, because the CPU validates every segment load against the descriptor's DPL (Descriptor Privilege Level).&lt;/p&gt;
&lt;p&gt;The only way for Ring 3 code to enter Ring 0 is through a gate: an interrupt gate, a trap gate, or a call gate. Gates are entries in the IDT or GDT that the kernel sets up in advance. They define the exact entry points where Ring 3 code can cross into Ring 0, what the new code and stack segments will be, and what privilege level is required to use them. There's no way for user code to jump to an arbitrary kernel address. It can only enter the kernel through the doors the kernel has built.&lt;/p&gt;
&lt;p&gt;This is what makes an operating system an operating system rather than a library. Without ring separation, a buggy user program can corrupt kernel memory, disable interrupts, reprogram the PIC, or overwrite the page tables. With ring separation, the worst it can do is crash itself.&lt;/p&gt;
&lt;p&gt;The mechanism that implements all of this is the Global Descriptor Table.&lt;/p&gt;
&lt;h3&gt;The GDT: Defining the World&lt;/h3&gt;
&lt;p&gt;The GDT defines memory segments: their base addresses, sizes, privilege levels, and whether they hold code or data. Each segment descriptor is an 8-byte structure with fields packed into non-obvious bit positions (a consequence of backward compatibility with the 286, which had a different descriptor format that the 386 had to extend without breaking).&lt;/p&gt;
&lt;p&gt;JokelaOS uses a flat memory model: every segment covers the full 4 GB address space with base 0 and limit 0xFFFFFFFF. The segmentation hardware is effectively nullified, which is what you want on modern x86 where paging handles memory protection. But the GDT is still mandatory; the CPU requires it for the ring system to function. Even with flat segments, the DPL field in each descriptor is what tells the CPU "code using this segment is Ring 0" or "code using this segment is Ring 3."&lt;/p&gt;
&lt;p&gt;The GDT has six entries:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Selector&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0x00&lt;/td&gt;
&lt;td&gt;Null descriptor (required by x86)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0x08&lt;/td&gt;
&lt;td&gt;Kernel code (Ring 0, execute/read)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0x10&lt;/td&gt;
&lt;td&gt;Kernel data (Ring 0, read/write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0x18&lt;/td&gt;
&lt;td&gt;User code (Ring 3, execute/read)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0x20&lt;/td&gt;
&lt;td&gt;User data (Ring 3, read/write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0x28&lt;/td&gt;
&lt;td&gt;Task State Segment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Entries 1 and 2 are identical to entries 3 and 4 in every way except the DPL field: two bits in the access byte that say &lt;code&gt;00&lt;/code&gt; (Ring 0) versus &lt;code&gt;11&lt;/code&gt; (Ring 3). That two-bit difference is the entire kernel/user boundary.&lt;/p&gt;
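&lt;p&gt;To make the "non-obvious bit positions" concrete, here is a sketch of how a base, limit, access byte, and flags byte are conventionally packed into the 8-byte descriptor, along with a helper that pulls the DPL back out of the access byte. &lt;code&gt;gdt_pack&lt;/code&gt; and &lt;code&gt;gdt_access_dpl&lt;/code&gt; are hypothetical names and not necessarily how JokelaOS's &lt;code&gt;gdt_set_entry&lt;/code&gt; is written. The user-mode access byte &lt;code&gt;0xFA&lt;/code&gt; is the conventional Ring 3 counterpart of &lt;code&gt;0x9A&lt;/code&gt; and is an assumption on my part.&lt;/p&gt;

```c
#include <stdint.h>

/* Pack a segment descriptor into its 64-bit in-memory form.
 * The base and limit are split across the descriptor for 286 compatibility. */
uint64_t gdt_pack(uint32_t base, uint32_t limit, uint8_t access, uint8_t gran)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);               /* limit 15:0   -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;        /* base 23:0    -> bits 16-39 */
    d |= (uint64_t)access << 40;                   /* access byte  -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0x0F) << 48;   /* limit 19:16  -> bits 48-51 */
    d |= (uint64_t)(gran & 0xF0) << 48;            /* G, D/B, L, AVL -> bits 52-55 */
    d |= (uint64_t)(base >> 24) << 56;             /* base 31:24   -> bits 56-63 */
    return d;
}

/* DPL occupies bits 5 and 6 of the access byte. */
unsigned gdt_access_dpl(uint8_t access)
{
    return (access >> 5) & 0x3u;
}
```

&lt;p&gt;A flat Ring 0 code segment (&lt;code&gt;0x9A&lt;/code&gt;, &lt;code&gt;0xCF&lt;/code&gt;) packs to &lt;code&gt;0x00CF9A000000FFFF&lt;/code&gt;. The access bytes &lt;code&gt;0x9A&lt;/code&gt; and &lt;code&gt;0xFA&lt;/code&gt; differ only in the two DPL bits, yielding 0 and 3 respectively.&lt;/p&gt;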
&lt;p&gt;When a user process runs, the CPU's CS register is loaded with 0x1B; that's selector 0x18 (pointing to GDT entry 3, the user code segment) OR'd with RPL 3 (the bottom two bits of the selector). The data segment registers get 0x23 (GDT entry 4, user data, RPL 3). The CPU sets CPL to match, and from that point on, every instruction is checked against Ring 3 privileges. The kernel runs with CS=0x08 (GDT entry 1, RPL 0) and DS=0x10 (GDT entry 2, RPL 0).&lt;/p&gt;
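&lt;p&gt;The selector values above follow directly from the selector layout: the GDT index in bits 3 and up, a table-indicator bit (0 for the GDT) in bit 2, and the RPL in the bottom two bits. A short sketch with hypothetical helper names:&lt;/p&gt;

```c
#include <stdint.h>

/* Build a segment selector from a descriptor-table index, a table
 * indicator (0 = GDT, 1 = LDT), and a requested privilege level. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}

/* Which descriptor-table entry a selector refers to. */
unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}

/* The requested privilege level encoded in the bottom two bits. */
unsigned selector_rpl(uint16_t sel)
{
    return sel & 3u;
}
```

&lt;p&gt;With these, &lt;code&gt;make_selector(3, 0, 3)&lt;/code&gt; is 0x1B and &lt;code&gt;make_selector(4, 0, 3)&lt;/code&gt; is 0x23, matching the user-mode CS and DS values, while &lt;code&gt;make_selector(1, 0, 0)&lt;/code&gt; and &lt;code&gt;make_selector(2, 0, 0)&lt;/code&gt; give the kernel's 0x08 and 0x10.&lt;/p&gt;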
&lt;p&gt;The TSS (Task State Segment) is the bridge between rings. When the CPU takes an interrupt while running Ring 3 code, it needs to switch to a Ring 0 stack, because you can't trust the user's stack pointer to be valid, and you certainly can't run kernel interrupt handlers on a user-controlled stack. The TSS holds the Ring 0 stack pointer (&lt;code&gt;esp0&lt;/code&gt;). Every context switch updates the TSS with the current process's kernel stack, so the CPU always knows where to land when transitioning from user mode to kernel mode.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;gdt_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="c1"&gt;// Null&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Kernel code&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Kernel data&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// User code&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xF2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// User data&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// TSS entry built separately&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The access byte &lt;code&gt;0x9A&lt;/code&gt; means: present, Ring 0, code segment, executable, readable. &lt;code&gt;0xFA&lt;/code&gt; means the same thing but Ring 3. These magic numbers come straight from the Intel manuals and they're the kind of thing you get wrong three times before you get right once.&lt;/p&gt;
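&lt;p&gt;To make the magic numbers less magic, here is a small sketch that assembles the access byte from its fields. The function names are illustrative assumptions; the bit positions are the standard x86 descriptor layout for code and data segments.&lt;/p&gt;

```c
/* Hypothetical helpers. Bit layout of the access byte:
   bit 7 = present, bits 6-5 = DPL, bit 4 = code/data (not a system segment),
   bit 3 = executable, bit 1 = readable (code) or writable (data). */
unsigned int access_byte(unsigned int dpl, int executable) {
    unsigned int present = 0x80;
    unsigned int code_or_data = 0x10;
    unsigned int exec_bit = executable ? 0x08 : 0x00;
    unsigned int rw_bit = 0x02;
    return present | (dpl % 4) * 0x20 | code_or_data | exec_bit | rw_bit;
}

/* Extract the two-bit DPL field from an access byte. */
unsigned int access_dpl(unsigned int access) { return (access / 0x20) % 4; }
```

&lt;p&gt;Calling &lt;code&gt;access_byte(0, 1)&lt;/code&gt; and &lt;code&gt;access_byte(3, 1)&lt;/code&gt; reproduces 0x9A and 0xFA; the data-segment variants come out as 0x92 and 0xF2, the same four values passed to &lt;code&gt;gdt_set_entry()&lt;/code&gt;.&lt;/p&gt;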
&lt;h3&gt;Interrupts: Exceptions, IRQs, and the PIC&lt;/h3&gt;
&lt;p&gt;The IDT maps interrupt vectors to handler functions. JokelaOS sets up 256 entries: CPU exceptions (0-31), hardware IRQs (32-47), and the syscall gate (0x80).&lt;/p&gt;
&lt;p&gt;The x86 PIC needs remapping. By default, the master PIC maps IRQs 0-7 to interrupt vectors 8-15, which collide with CPU exceptions (double fault is vector 8, for instance). The standard fix is to remap the master PIC to vectors 32-39 and the slave to 40-47. This requires sending four Initialization Command Words to each PIC in the correct sequence, the kind of hardware protocol that hasn't changed since the IBM PC/AT in 1984.&lt;/p&gt;
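&lt;p&gt;The remap sequence looks roughly like the sketch below. It is not the JokelaOS source: &lt;code&gt;pic_outb()&lt;/code&gt; is a hypothetical stand-in that records port writes into a log so the order can be inspected off-target, where a real kernel would issue an &lt;code&gt;out&lt;/code&gt; instruction. Real implementations usually also add short I/O delays and save and restore the interrupt masks, which are omitted here.&lt;/p&gt;

```c
/* Hypothetical stand-in for a port write: records what would be sent. */
struct port_write { unsigned short port; unsigned char value; };
static struct port_write pic_log[8];
static int pic_log_len = 0;

static void pic_outb(unsigned short port, unsigned char value) {
    pic_log[pic_log_len].port = port;
    pic_log[pic_log_len].value = value;
    pic_log_len++;
}

/* Master PIC: command port 0x20, data port 0x21.
   Slave PIC:  command port 0xA0, data port 0xA1. */
void pic_remap(unsigned char master_base, unsigned char slave_base) {
    pic_outb(0x20, 0x11);        /* ICW1: start init, ICW4 will follow */
    pic_outb(0xA0, 0x11);
    pic_outb(0x21, master_base); /* ICW2: vector offset for IRQ 0-7 */
    pic_outb(0xA1, slave_base);  /* ICW2: vector offset for IRQ 8-15 */
    pic_outb(0x21, 0x04);        /* ICW3: slave attached on IRQ2 (bit mask) */
    pic_outb(0xA1, 0x02);        /* ICW3: slave cascade identity is 2 */
    pic_outb(0x21, 0x01);        /* ICW4: 8086 mode */
    pic_outb(0xA1, 0x01);
}
```

&lt;p&gt;Calling &lt;code&gt;pic_remap(32, 40)&lt;/code&gt; produces the eight writes that move the master to vectors 32-39 and the slave to 40-47.&lt;/p&gt;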
&lt;p&gt;ISR stubs are written in NASM. For exceptions where the CPU doesn't push an error code itself, the stub pushes a dummy zero so every handler sees the same stack layout. Each stub then pushes the interrupt number and jumps to a common routine that saves all general-purpose registers, calls the C handler, restores registers, and does an &lt;code&gt;iret&lt;/code&gt;. The stubs are generated with macros:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;%macro ISR_NOERRCODE 1&lt;/span&gt;
&lt;span class="k"&gt;global&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;isr&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="nf"&gt;isr&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;dword&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; dummy error code&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;dword&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; interrupt number&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;isr_common&lt;/span&gt;
&lt;span class="cp"&gt;%endmacro&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The C-side dispatcher checks the interrupt number. For exceptions (0-31), it prints the register state and halts, since there's no recovery from a page fault when you don't have a page fault handler yet. For IRQs (32-47), it calls the registered handler function and sends an EOI command to the PIC. For interrupt 0x80, it dispatches to the syscall handler.&lt;/p&gt;
&lt;p&gt;One critical detail: interrupt 0x80 is set as a &lt;strong&gt;trap gate&lt;/strong&gt; with DPL 3, not an interrupt gate. The DPL of 3 is what lets Ring 3 code trigger it with &lt;code&gt;int 0x80&lt;/code&gt;; the trap gate type means the CPU leaves the interrupt flag alone on entry, so hardware interrupts stay enabled during a syscall. All other gates are DPL 0, so a user program that tries to execute &lt;code&gt;int 0x00&lt;/code&gt; gets a General Protection Fault instead. This is the mechanism that makes syscalls work while keeping everything else protected.&lt;/p&gt;
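&lt;p&gt;In the IDT, the gate type and DPL live in a single type/attribute byte per entry. The sketch below shows how that byte is composed; the helper and enum names are assumptions for illustration, while the bit layout is the standard one for 32-bit gates.&lt;/p&gt;

```c
/* Gate type codes for 32-bit IDT entries. */
enum { GATE_INTERRUPT_32 = 0xE, GATE_TRAP_32 = 0xF };

/* Hypothetical helper: type/attribute byte of an IDT gate descriptor.
   bit 7 = present, bits 6-5 = DPL, bit 4 = 0, bits 3-0 = gate type. */
unsigned int idt_type_attr(unsigned int dpl, unsigned int gate_type) {
    return 0x80 | (dpl % 4) * 0x20 | (gate_type % 16);
}
```

&lt;p&gt;A DPL 0 interrupt gate comes out as 0x8E and a DPL 3 trap gate as 0xEF, so the syscall entry differs from an ordinary gate in the two DPL bits plus the low type bit.&lt;/p&gt;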
&lt;h3&gt;Memory: Three Allocators&lt;/h3&gt;
&lt;p&gt;JokelaOS has three layers of memory management, each built on top of the previous one.&lt;/p&gt;
&lt;h4&gt;The Bump Allocator&lt;/h4&gt;
&lt;p&gt;The simplest possible allocator. A pointer starts at the first page boundary after the kernel image (&lt;code&gt;_kernel_end&lt;/code&gt; from the linker script) and only moves forward. &lt;code&gt;kmalloc(size)&lt;/code&gt; aligns the pointer to 16 bytes, returns it, and advances by &lt;code&gt;size&lt;/code&gt;. There is no &lt;code&gt;kfree()&lt;/code&gt;. Memory allocated with the bump allocator is permanent.&lt;/p&gt;
&lt;p&gt;This sounds primitive, and it is. But it's also exactly right for kernel initialization. The GDT, IDT, page tables, file system metadata, and user table are allocated once and never freed. The bump allocator handles all of them with zero fragmentation and zero overhead.&lt;/p&gt;
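&lt;p&gt;The whole allocator fits in a dozen lines. The sketch below models only the address arithmetic and works on plain integers instead of pointers so it can run anywhere; &lt;code&gt;bump_init()&lt;/code&gt; and &lt;code&gt;bump_alloc()&lt;/code&gt; are hypothetical names rather than the kernel's &lt;code&gt;kmalloc()&lt;/code&gt;, and there is deliberately no bounds check.&lt;/p&gt;

```c
/* Next free address; only ever moves forward. */
static unsigned long bump_next;

/* Start the heap at the first 4 KB page boundary at or after kernel_end. */
void bump_init(unsigned long kernel_end) {
    bump_next = (kernel_end + 4095) / 4096 * 4096;
}

/* Round up to 16 bytes, hand out that address, and advance past the block.
   Nothing is ever freed. */
unsigned long bump_alloc(unsigned long size) {
    unsigned long addr = (bump_next + 15) / 16 * 16;
    bump_next = addr + size;
    return addr;
}
```

&lt;p&gt;If the kernel image ended at 0x104321, the heap would start at 0x105000; a 10-byte allocation followed by a 32-byte one would return 0x105000 and 0x105010.&lt;/p&gt;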
&lt;h4&gt;The Physical Memory Manager&lt;/h4&gt;
&lt;p&gt;Once the kernel needs to allocate and free pages dynamically (for process stacks, program code, page tables), it needs a real allocator. The PMM uses a bitmap: one bit per 4 KB physical frame, supporting up to 256 MB of RAM (65,536 frames, 8 KB bitmap).&lt;/p&gt;
&lt;p&gt;Initialization parses the Multiboot memory map to find usable RAM regions, then marks everything from frame 0 through the end of the bump heap as reserved. This protects the IVT, BIOS data area, kernel image, and all bump-allocated structures from being handed out as free pages.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;pmm_alloc_frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total_frames&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bitmap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;bitmap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;free_count&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PAGE_SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// out of memory&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Linear scan, no free lists, no buddy system. It's O(n) per allocation, which is fine when n is at most tens of thousands of frames and allocations are infrequent. A production kernel would use something smarter. This kernel allocates a few dozen pages total.&lt;/p&gt;
&lt;h4&gt;Paging&lt;/h4&gt;
&lt;p&gt;With physical frames available, the kernel can enable paging. &lt;code&gt;paging_init()&lt;/code&gt; builds a page directory and 32 page tables, identity-mapping the first 128 MB of physical memory (virtual address = physical address). The page directory goes into CR3, and setting the PG bit in CR0 turns the MMU on.&lt;/p&gt;
&lt;p&gt;Identity mapping means the kernel doesn't need to worry about virtual-to-physical translation for its own code and data. Kernel pointers just work. When user processes need memory, the loader allocates physical frames and maps them into the process's address space with the PG_USER flag set, allowing Ring 3 access.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;paging_map_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;phys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tbl_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x3FF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_directory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PG_PRESENT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tbl_frame&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pmm_alloc_frame&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;memset&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;tbl_frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PAGE_SIZE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;page_directory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tbl_frame&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PG_PRESENT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PG_WRITE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;page_directory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFF000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tbl_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phys&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFF000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"invlpg (%0)"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;invlpg&lt;/code&gt; instruction flushes the TLB entry for the mapped virtual address, which is critical. Without it, the CPU might use a stale translation from its cache and access the wrong physical page.&lt;/p&gt;
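&lt;p&gt;The index math in &lt;code&gt;paging_map_page()&lt;/code&gt; is easier to check with concrete numbers. The helpers below are illustrative only and assume a 32-bit &lt;code&gt;unsigned int&lt;/code&gt;; they use division and remainder in place of the shifts and masks, which gives the same result for these power-of-two sizes.&lt;/p&gt;

```c
/* Split a 32-bit virtual address the way two-level x86 paging does:
   top 10 bits = page directory index, next 10 bits = page table index,
   low 12 bits = byte offset within the 4 KB page. */
unsigned int vaddr_dir_index(unsigned int virt)   { return virt / 0x400000; }
unsigned int vaddr_table_index(unsigned int virt) { return (virt / 0x1000) % 1024; }
unsigned int vaddr_offset(unsigned int virt)      { return virt % 0x1000; }

/* Bytes covered by a number of fully populated page tables:
   1024 entries per table, 4096 bytes per entry. */
unsigned int mapped_bytes(unsigned int page_tables) {
    return page_tables * 1024 * 4096;
}
```

&lt;p&gt;For the address 0x08048123 this yields directory index 32, table index 72, and offset 0x123. &lt;code&gt;mapped_bytes(32)&lt;/code&gt; is 0x08000000, the 128 MB identity-mapped region, so 0x08048123 is the first address range that needs a page table beyond the initial 32.&lt;/p&gt;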
&lt;h3&gt;The Network Stack&lt;/h3&gt;
&lt;p&gt;JokelaOS has a working network stack, the one subsystem where "toy" undersells it slightly. It resolves ARP, constructs IPv4 packets with correct checksums, and handles ICMP echo request/reply with measured round-trip times. There's no TCP, no UDP, no sockets. But the packets that leave this kernel are real packets that traverse real networks.&lt;/p&gt;
&lt;p&gt;The NIC is an emulated RTL8139, the simplest PCI Ethernet controller that QEMU supports. The driver initializes the chip by writing to its configuration registers: reset, enable transmitter and receiver, set up a receive ring buffer, configure the interrupt mask, and unmask IRQ 11. Packet transmission uses a four-descriptor TX ring; reception is interrupt-driven through the RTL8139's ring buffer.&lt;/p&gt;
&lt;p&gt;PCI enumeration scans the configuration space to find the RTL8139 by vendor/device ID (0x10EC:0x8139), reads the I/O base address from BAR0, and enables bus mastering. This is the only driver in the system; there's no USB, no disk, no display. One NIC, one network.&lt;/p&gt;
&lt;p&gt;The stack is layered:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Link&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ethernet.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Frame demux by EtherType&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARP&lt;/td&gt;
&lt;td&gt;&lt;code&gt;arp.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Table + request/reply&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ipv4.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Routing, header checksum&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;&lt;code&gt;icmp.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Echo reply + outgoing ping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;On boot, the kernel sends an ARP request for the gateway (10.0.2.2, QEMU's default) and waits for the reply. Once the gateway's MAC address is resolved, the kernel can ping arbitrary hosts through QEMU's SLIRP NAT. A &lt;code&gt;ping 10.1.1.1&lt;/code&gt; from the shell constructs an ICMP echo request, wraps it in an IPv4 packet, wraps that in an Ethernet frame, and pushes it out through the RTL8139's TX ring. When the reply comes back, the receive ISR fires, the Ethernet layer demuxes by EtherType, the IP layer validates the checksum, and the ICMP handler matches the echo reply to the outstanding request and computes the RTT.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Pinging&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Getting here required writing every byte-order conversion (&lt;code&gt;htons&lt;/code&gt;, &lt;code&gt;htonl&lt;/code&gt;), every checksum computation (the IP header checksum is the one's complement of the one's complement sum of the header's 16-bit words), and every packet layout (Ethernet header is 14 bytes, IP header is 20, ICMP is 8 plus payload). None of this is hard individually. Together, it's a thousand places to put a byte in the wrong order.&lt;/p&gt;
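&lt;p&gt;As a rough illustration of two of those pieces, here is a sketch of a 16-bit byte swap and the header checksum. These are not the JokelaOS routines; the function names are made up, the checksum reads the header as raw bytes in network order, and the length is assumed to be even.&lt;/p&gt;

```c
/* Swap the two bytes of a 16-bit value. On a little-endian CPU such as
   x86 this is what htons() has to do. */
unsigned int swap16(unsigned int v) {
    return (v % 256) * 256 + (v / 256) % 256;
}

/* Internet checksum over a header given as raw bytes in network order.
   len is assumed even (an IPv4 header is a multiple of 4 bytes). */
unsigned int ip_checksum(const unsigned char *bytes, unsigned int len) {
    unsigned int sum = 0;
    unsigned int i;
    for (i = 0; i != len; i += 2)
        sum += bytes[i] * 256u + bytes[i + 1];  /* big-endian 16-bit word */
    while (sum / 0x10000 != 0)                   /* fold carries back in */
        sum = sum % 0x10000 + sum / 0x10000;
    return 0xFFFF - sum;                         /* one's complement */
}
```

&lt;p&gt;The sender computes this with the checksum field zeroed and stores the result high byte first. A receiver can run the same function over the header as received; a result of 0 means the header is intact.&lt;/p&gt;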
&lt;h3&gt;Processes and Preemptive Multitasking&lt;/h3&gt;
&lt;p&gt;The process subsystem manages up to 16 processes in a static table. Each process has a state (UNUSED, READY, RUNNING, DEAD), a kernel stack pointer, and a user-mode entry point and stack.&lt;/p&gt;
&lt;p&gt;Process creation doesn't follow the UNIX &lt;code&gt;fork()&lt;/code&gt;/&lt;code&gt;exec()&lt;/code&gt; model. There's no cloning of address spaces, no copy-on-write, no replacing the current process image. Instead, the loader allocates fresh physical frames for the program's code and stack, copies the flat binary into the code pages, and calls &lt;code&gt;proc_create()&lt;/code&gt;, which allocates a 4 KB kernel stack and builds a fake stack frame on it. This stack frame is what &lt;code&gt;context_switch()&lt;/code&gt; will "return" into on the process's first schedule; it contains saved registers and a return address pointing to &lt;code&gt;proc_entry_user()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;proc_entry_user()&lt;/code&gt; is a small assembly sequence that performs the Ring 0 to Ring 3 transition. It sets the data segment registers to the user data selector (0x23), pushes a fake interrupt frame (SS, ESP, EFLAGS with IF=1, CS, EIP), and executes &lt;code&gt;iret&lt;/code&gt;. The CPU pops the frame, switches to Ring 3, and starts executing the user program. From the hardware's perspective, this looks identical to returning from an interrupt that happened to interrupt a user-mode program, which is exactly the trick.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;proc_entry_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;process_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;proc_current&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov $0x23, %%ax &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%ds  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%es  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%fs  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%gs  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push $0x23      &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// SS&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push %0         &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// ESP&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"pushf           &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"pop %%eax       &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"or $0x200, %%eax&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// Set IF&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push %%eax      &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// EFLAGS&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push $0x1B      &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// CS (user code)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push %1         &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// EIP&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"iret"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;user_esp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;user_eip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"eax"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"memory"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Context switching uses a simple assembly stub in &lt;code&gt;switch.asm&lt;/code&gt;. It saves the callee-saved registers (EBP, EBX, ESI, EDI), stores ESP into the old process's slot, loads the new process's ESP, restores registers, and returns. The &lt;code&gt;ret&lt;/code&gt; instruction pops the return address from the new stack and resumes where that process left off.&lt;/p&gt;
&lt;p&gt;Scheduling is preemptive round-robin. The PIT fires at 1000 Hz. Every 10 ticks (10 ms), the IRQ handler calls &lt;code&gt;proc_schedule()&lt;/code&gt;, which finds the next READY process and switches to it. If no user processes are ready, control stays with PID 0 (the kernel/shell). This is the minimum viable scheduler: no priorities, no per-process time slices beyond the fixed 10 ms quantum, no fairness guarantees. But it works: two user programs printing characters to serial run concurrently, interleaved by the timer.&lt;/p&gt;
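&lt;p&gt;As a rough sketch of that policy (not the actual &lt;code&gt;proc_schedule()&lt;/code&gt; from the repository), the tick counting and round-robin scan might look like the following. The table size, the state names, and the helpers &lt;code&gt;sched_pick_next()&lt;/code&gt; and &lt;code&gt;sched_on_tick()&lt;/code&gt; are assumptions made for illustration, and the real context switch is left out so the logic can run on a host.&lt;/p&gt;

```c
/* Hypothetical process table; names and sizes are illustrative only. */
enum proc_state { PROC_UNUSED, PROC_READY, PROC_RUNNING, PROC_DEAD };

#define MAX_PROCS       8     /* assumed table size */
#define TICKS_PER_SLICE 10    /* 10 ticks at 1000 Hz = 10 ms */

struct proc { enum proc_state state; };

static struct proc procs[MAX_PROCS];
static int current_pid = 0;   /* PID 0 is the kernel/shell */
static unsigned int ticks = 0;

/* Round-robin: scan every other slot once, starting just after
 * `current` and wrapping around. Return the first READY pid, or
 * fall back to PID 0 when nothing else is ready. */
int sched_pick_next(int current)
{
    int step;
    for (step = 1; step != MAX_PROCS; step++) {
        int pid = (current + step) % MAX_PROCS;
        if (procs[pid].state == PROC_READY)
            return pid;
    }
    return 0;
}

/* Called once per timer tick; returns the pid that should run next.
 * A real kernel would perform the context switch where this sketch
 * only updates the bookkeeping. */
int sched_on_tick(void)
{
    int next;

    ticks++;
    if (ticks % TICKS_PER_SLICE != 0)
        return current_pid;               /* quantum not used up yet */

    next = sched_pick_next(current_pid);
    if (next != current_pid) {
        if (procs[current_pid].state == PROC_RUNNING)
            procs[current_pid].state = PROC_READY;
        procs[next].state = PROC_RUNNING;
        current_pid = next;
    }
    return current_pid;
}
```

&lt;p&gt;With PID 0 running and PIDs 1 and 2 READY, this hands the CPU to 1, then 2, then back to 0 at successive 10-tick boundaries.&lt;/p&gt;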
&lt;h3&gt;Syscalls&lt;/h3&gt;
&lt;p&gt;User programs communicate with the kernel through &lt;code&gt;int 0x80&lt;/code&gt;. The mechanism, a software interrupt that transitions from Ring 3 to Ring 0, is the same one Linux originally used on i386, before &lt;code&gt;sysenter&lt;/code&gt; became the preferred fast path. The register convention is borrowed too: syscall number in EAX, arguments in EBX/ECX/EDX/ESI/EDI, return value in EAX. But that's where the resemblance ends.&lt;/p&gt;
&lt;p&gt;JokelaOS is not a UNIX. The syscall numbers are custom (exit is 0, write is 1, getpid is 2, read is 3), not Linux's i386 table (where exit is 1, read is 3, write is 4, getpid is 20). There's no &lt;code&gt;fork()&lt;/code&gt;, no &lt;code&gt;exec()&lt;/code&gt;, no &lt;code&gt;open()&lt;/code&gt;, no &lt;code&gt;close()&lt;/code&gt;, no signals, no pipes. File descriptors 0 and 1 exist as concepts (stdin maps to the keyboard buffer, stdout maps to the serial port) but there's no file descriptor table behind them. The syscall handler just checks &lt;code&gt;if (fd == 1)&lt;/code&gt; and calls &lt;code&gt;serial_putchar()&lt;/code&gt;. The process model isn't UNIX either; there's no parent/child relationship, no &lt;code&gt;wait()&lt;/code&gt;, no process groups. Processes are created by the loader and scheduled round-robin until they exit. It's closer to a microcontroller RTOS than to anything in the UNIX lineage.&lt;/p&gt;
&lt;p&gt;Four syscalls are implemented:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Number&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Arguments&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;SYS_EXIT&lt;/td&gt;
&lt;td&gt;ebx=status&lt;/td&gt;
&lt;td&gt;Terminate process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;SYS_WRITE&lt;/td&gt;
&lt;td&gt;ebx=fd, ecx=buf, edx=len&lt;/td&gt;
&lt;td&gt;Write to serial (fd=1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;SYS_GETPID&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Return current PID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;SYS_READ&lt;/td&gt;
&lt;td&gt;ebx=fd, ecx=buf, edx=len&lt;/td&gt;
&lt;td&gt;Read from keyboard (fd=0)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is enough to write programs that print output, read input, identify themselves, and exit cleanly. The syscall dispatcher validates file descriptors (only 0 and 1 are legal) and bounds-checks lengths. SYS_WRITE sends bytes to the serial port; SYS_READ drains the keyboard buffer without blocking.&lt;/p&gt;
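&lt;p&gt;A dispatcher along those lines could be sketched as below. This is a host-runnable approximation, not the kernel's own handler: the length cap &lt;code&gt;SYS_MAX_LEN&lt;/code&gt;, the error value &lt;code&gt;SYS_ERROR&lt;/code&gt;, the function name &lt;code&gt;syscall_dispatch()&lt;/code&gt;, and the stubbed serial and keyboard helpers are all assumptions, and the user-pointer range check a real kernel needs is omitted.&lt;/p&gt;

```c
enum { SYS_EXIT = 0, SYS_WRITE = 1, SYS_GETPID = 2, SYS_READ = 3 };

#define SYS_MAX_LEN 4096u          /* assumed cap on a single transfer */
#define SYS_ERROR   0xFFFFFFFFu    /* assumed error return (-1 in EAX) */

/* Host-side stand-ins so the sketch runs outside the kernel. */
static char serial_out[64];
static unsigned int serial_used = 0;
static const char *kbd_pending = "";   /* pretend keyboard buffer */
static unsigned int current_pid = 1;
static int exited = 0;

static void serial_putchar(char c)
{
    if (serial_used != sizeof serial_out)   /* stub drops overflow */
        serial_out[serial_used++] = c;
}

static int kbd_getchar(void)
{
    if (*kbd_pending == '\0')
        return -1;                          /* nothing pending */
    return (unsigned char)*kbd_pending++;
}

/* num = EAX, arg1 = EBX, buf = ECX, len = EDX; result goes back in EAX. */
unsigned int syscall_dispatch(unsigned int num, unsigned int arg1,
                              char *buf, unsigned int len)
{
    unsigned int i;

    switch (num) {
    case SYS_EXIT:
        exited = 1;        /* a real kernel marks the process DEAD here */
        return 0;
    case SYS_WRITE:
        if (arg1 != 1 || buf == 0 || len > SYS_MAX_LEN)
            return SYS_ERROR;               /* only fd 1 is writable */
        for (i = 0; i != len; i++)
            serial_putchar(buf[i]);
        return len;
    case SYS_GETPID:
        return current_pid;
    case SYS_READ:
        if (arg1 != 0 || buf == 0 || len > SYS_MAX_LEN)
            return SYS_ERROR;               /* only fd 0 is readable */
        for (i = 0; i != len; i++) {
            int c = kbd_getchar();
            if (c == -1)
                break;                      /* drained: do not block */
            buf[i] = (char)c;
        }
        return i;                           /* bytes actually read */
    default:
        return SYS_ERROR;
    }
}
```

&lt;p&gt;Returning the number of bytes actually read is what makes SYS_READ non-blocking in this sketch: an empty buffer simply yields 0.&lt;/p&gt;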
&lt;p&gt;User programs are flat binaries: raw machine code with no headers, no relocations, no ELF parsing. The loader copies the binary to freshly allocated pages and jumps to byte zero. Programs that need to reference their own data use position-independent tricks:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;next&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; push EIP&lt;/span&gt;
&lt;span class="nl"&gt;next:&lt;/span&gt;
&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ebp&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;; EBP = address of this instruction&lt;/span&gt;
&lt;span class="nf"&gt;lea&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ecx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;ebp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;offset_to_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the same technique used by shellcode and position-independent code on x86. It works because &lt;code&gt;call&lt;/code&gt; pushes the address of the next instruction, which gives you a known reference point relative to the code's actual load address.&lt;/p&gt;
&lt;h3&gt;The Shell&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/jokelaos/jokelaos1.png" alt="JokelaOS running in QEMU: ping output, login prompt, and ps command showing process table" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 0 0 1em 0;" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;With all the subsystems in place, the shell ties them together into something interactive. &lt;code&gt;shell_run()&lt;/code&gt; is the kernel's main loop after initialization. It presents a login prompt, authenticates against the user table, and drops into a command interpreter.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="o"&gt;==============================&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;JokelaOS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="mf"&gt;.1&lt;/span&gt;
&lt;span class="o"&gt;==============================&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GDT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TSS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PIC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;remapped&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Multiboot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;confirmed&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Multiboot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9500&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Bump&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;allocator&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;

&lt;span class="n"&gt;PCI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;03.0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;EC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8139&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RTL8139&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RTL8139&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MAC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;54&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IRQ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;ramfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Users&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;guest&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;PMM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31269&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;free&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;122&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Paging&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;identity&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mapped&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PIT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;timer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Hz&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Keyboard&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;

&lt;span class="n"&gt;JokelaOS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nl"&gt;login&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;
&lt;span class="nl"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;****&lt;/span&gt;
&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The shell supports: &lt;code&gt;help&lt;/code&gt;, &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;run &amp;lt;program&amp;gt;&lt;/code&gt;, &lt;code&gt;ps&lt;/code&gt;, &lt;code&gt;mem&lt;/code&gt;, &lt;code&gt;ping &amp;lt;ip&amp;gt;&lt;/code&gt;, &lt;code&gt;uptime&lt;/code&gt;, &lt;code&gt;whoami&lt;/code&gt;, and &lt;code&gt;logout&lt;/code&gt;. The line editor handles backspace. Password input echoes asterisks. The &lt;code&gt;run&lt;/code&gt; command loads a flat binary from ramfs and creates a process, which the scheduler picks up at its next scheduling interval.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ps&lt;/code&gt; shows the process table:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;root$ ps
  PID  STATE
    0  RUNNING
    1  READY
    2  DEAD
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;mem&lt;/code&gt; shows memory usage:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;root$ mem
Heap used: 8832 bytes
PMM free:  31267 frames (122 MB)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The keyboard input path is worth noting. The PS/2 keyboard controller fires IRQ 1. The handler reads the scancode from port 0x60, converts it to ASCII using a US QWERTY lookup table (with shift modifier tracking), and drops it into a 256-byte circular buffer. Serial input takes the same path; the UART's receive interrupt (IRQ 4) reads the incoming byte and injects it into the keyboard buffer. This means the shell works identically whether you're typing on a PS/2 keyboard or through the QEMU serial console.&lt;/p&gt;
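&lt;p&gt;A 256-byte circular buffer with two producers and one consumer can be very small. The following is a minimal sketch, assuming 8-bit &lt;code&gt;unsigned char&lt;/code&gt; indices so that wrap-around comes for free; the names &lt;code&gt;kbd_push()&lt;/code&gt; and &lt;code&gt;kbd_pop()&lt;/code&gt; are hypothetical, and the drop-when-full policy is an assumption rather than something the kernel is known to do.&lt;/p&gt;

```c
#define KBD_BUF_SIZE 256

static volatile unsigned char kbd_buf[KBD_BUF_SIZE];
static volatile unsigned char kbd_head = 0;   /* next slot to write */
static volatile unsigned char kbd_tail = 0;   /* next slot to read  */

/* Producer side: would be called from the IRQ 1 (PS/2) and IRQ 4 (UART)
 * handlers. Returns 1 on success, 0 if the buffer is full and the byte
 * was dropped. */
int kbd_push(unsigned char c)
{
    /* Assumes 8-bit unsigned char, so the index wraps at 256. */
    unsigned char next = (unsigned char)(kbd_head + 1);

    if (next == kbd_tail)
        return 0;              /* full: one slot is always left empty */
    kbd_buf[kbd_head] = c;
    kbd_head = next;
    return 1;
}

/* Consumer side: non-blocking. Returns the next byte, or -1 when
 * nothing is pending. */
int kbd_pop(void)
{
    unsigned char c;

    if (kbd_tail == kbd_head)
        return -1;             /* empty */
    c = kbd_buf[kbd_tail];
    kbd_tail = (unsigned char)(kbd_tail + 1);
    return c;
}
```

&lt;p&gt;Because both interrupt handlers funnel into the same push routine, the consumer never needs to know which device a byte came from. Leaving one slot empty distinguishes "full" from "empty" without a separate counter, at the cost of holding 255 bytes rather than 256.&lt;/p&gt;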
&lt;h3&gt;The RAM File System&lt;/h3&gt;
&lt;p&gt;User programs need to live somewhere. With no disk driver, the file system is purely in-memory. &lt;code&gt;ramfs&lt;/code&gt; stores up to 32 files, each with a name (28 bytes), a data pointer, and a size. &lt;code&gt;ramfs_create()&lt;/code&gt; allocates space with the bump allocator and copies the binary in. &lt;code&gt;ramfs_find()&lt;/code&gt; does a linear search by name.&lt;/p&gt;
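&lt;p&gt;The layout described above fits in a few dozen lines. The sketch below is an approximation under stated assumptions, not the repository's code: the struct and field names, the index-based return values, and the fixed-size arena standing in for the kernel's bump allocator are all illustrative.&lt;/p&gt;

```c
#define RAMFS_MAX_FILES 32
#define RAMFS_NAME_LEN  28     /* includes the terminating NUL */

struct ramfs_file {
    char name[RAMFS_NAME_LEN];
    const unsigned char *data;
    unsigned int size;
};

static struct ramfs_file files[RAMFS_MAX_FILES];
static unsigned int file_count = 0;

/* Stand-in bump allocator: hands out bytes from a fixed arena and
 * never frees them. */
static unsigned char arena[4096];
static unsigned int arena_used = 0;

static unsigned char *bump_alloc(unsigned int n)
{
    unsigned char *p;

    if (n > sizeof arena - arena_used)
        return 0;                          /* out of space */
    p = arena + arena_used;
    arena_used += n;
    return p;
}

static int name_equal(const char *a, const char *b)
{
    while (*a == *b) {
        if (*a == '\0')
            return 1;                      /* both ended together */
        a++;
        b++;
    }
    return 0;
}

/* Copies `size` bytes into freshly allocated storage. Returns the file
 * index, or -1 if the table is full, the name is too long, or the
 * arena is exhausted. */
int ramfs_create(const char *name, const unsigned char *src, unsigned int size)
{
    unsigned int i, len = 0;
    unsigned char *dst;

    while (name[len] != '\0')
        len++;
    if (file_count == RAMFS_MAX_FILES || len >= RAMFS_NAME_LEN)
        return -1;
    dst = bump_alloc(size);
    if (dst == 0)
        return -1;
    for (i = 0; i != size; i++)
        dst[i] = src[i];
    for (i = 0; i != len + 1; i++)         /* copy the NUL as well */
        files[file_count].name[i] = name[i];
    files[file_count].data = dst;
    files[file_count].size = size;
    return (int)file_count++;
}

/* Linear search by name; returns the file index or -1 if absent. */
int ramfs_find(const char *name)
{
    unsigned int i;

    for (i = 0; i != file_count; i++) {
        if (name_equal(files[i].name, name))
            return (int)i;
    }
    return -1;
}
```

&lt;p&gt;With at most 32 entries, a linear scan is perfectly adequate; anything cleverer would cost more code than it saves.&lt;/p&gt;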
&lt;p&gt;During boot, two test programs are embedded directly in &lt;code&gt;kmain.c&lt;/code&gt; as byte arrays of hand-assembled x86 machine code. One prints the character '1' ten times; the other prints '2' ten times. Both use SYS_WRITE to output through the serial port and SYS_EXIT to terminate cleanly. Both are loaded into ramfs, and a shell command such as &lt;code&gt;run print1&lt;/code&gt; executes the named program in user mode.&lt;/p&gt;
&lt;p&gt;This is about as minimal as a file system gets. No directories, no permissions, no deletion. But it demonstrates the complete path from "bytes in kernel memory" to "user-mode process executing with its own address space."&lt;/p&gt;
&lt;h3&gt;What I Learned&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The boot process is the hardest part.&lt;/strong&gt; Not because the code is complex (&lt;code&gt;boot.asm&lt;/code&gt; is 33 lines), but because when something goes wrong, you have zero diagnostic capability. The serial port isn't initialized yet. The IDT isn't loaded. If your Multiboot header checksum is wrong by one bit, QEMU silently fails. You're debugging with QEMU's &lt;code&gt;-d int&lt;/code&gt; flag and reading hex dumps of interrupt frames.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;x86 protected mode is an archaeology project.&lt;/strong&gt; The PIC remapping sequence dates from the IBM PC/AT (1984). The GDT access bytes encode information in bit patterns designed for hardware that predates flat memory models. The TSS exists because Intel's original vision for the 286 involved hardware task switching that nobody ended up using. You're programming against forty years of backward compatibility, and every one of those layers is still there, still mandatory, still silently breaking things if you get it wrong.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The gap between "works in Ring 0" and "works in Ring 3" is enormous.&lt;/strong&gt; A kernel that runs entirely in supervisor mode can be surprisingly simple. The moment you add user mode, you need: the TSS (so the CPU knows where the kernel stack is), Ring 3 GDT segments, trap gates for syscalls, a mechanism to build fake interrupt frames for the initial &lt;code&gt;iret&lt;/code&gt; into user mode, and careful validation of every pointer that crosses the kernel boundary. Each of these is individually straightforward. Getting them all correct simultaneously is not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Preemptive scheduling is simpler than it sounds.&lt;/strong&gt; The concept (save state, pick next process, restore state) translates almost directly into code. The context switch is twelve instructions of assembly. The scheduler is a for loop. What makes it tricky is the interaction with everything else: the TSS must be updated, the interrupt must send EOI before switching, the process's kernel stack must be set up so that restoring registers and returning lands in the right place. The scheduler itself is trivial. The invariants it depends on are not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Writing a network stack is an exercise in byte ordering.&lt;/strong&gt; Network byte order is big-endian. x86 is little-endian. IP addresses, port numbers, checksums, packet lengths: every multi-byte field requires explicit conversion. Miss one &lt;code&gt;htons()&lt;/code&gt; and your packets are valid-looking garbage. The RTL8139 driver, the ARP implementation, the IP checksum; each is maybe fifty lines. Tracking down a single swapped byte takes hours.&lt;/p&gt;
&lt;h3&gt;The Numbers&lt;/h3&gt;
&lt;p&gt;JokelaOS in its current form:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Files&lt;/th&gt;
&lt;th&gt;Approximate LOC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Boot (ASM)&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;~120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel core&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;~1,200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drivers&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;~250&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network stack&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;~450&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;26&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2,000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Two thousand lines for a kernel that boots, manages memory with paging, runs preemptive multitasking with Ring 3 isolation, handles interrupts, implements syscalls, has a working network stack, and provides an interactive shell. No line is borrowed from another project. Every byte is accounted for.&lt;/p&gt;
&lt;p&gt;The entire thing builds in under a second and the binary is around 40 KB. &lt;code&gt;make run&lt;/code&gt; goes from source to a running kernel in QEMU in about two seconds. This fast iteration cycle is what made the project possible; every subsystem was tested immediately after being written, and bugs were caught before they could compound.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The point of JokelaOS was never to build a production operating system. The point was to understand what an operating system actually does: not in the abstract, not from a textbook diagram, but in the specific, concrete sense of "these bytes go into these ports in this order and then the hardware does this thing." Every subsystem in JokelaOS exists because I wanted to understand it, and the only way to truly understand a piece of systems software is to write it yourself.&lt;/p&gt;
&lt;p&gt;The source code is on &lt;a href="https://baud.rs/B9FPjG"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;</description><category>assembly</category><category>bare metal</category><category>c</category><category>kernel</category><category>multitasking</category><category>networking</category><category>osdev</category><category>paging</category><category>qemu</category><category>systems programming</category><category>x86</category><guid>https://tinycomputers.io/posts/jokelaos-bare-metal-x86-kernel.html</guid><pubDate>Tue, 10 Mar 2026 15:00:00 GMT</pubDate></item><item><title>The Cathedral and the Bazaar, Nearly 30 Years Later</title><link>https://tinycomputers.io/posts/the-cathedral-and-the-bazaar-nearly-30-years-later.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-cathedral-and-the-bazaar-nearly-30-years-later_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;20 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/eric-raymond.jpg" alt="Eric S. Raymond" style="float: right; margin: 0 0 15px 20px; max-width: 240px; border-radius: 6px;" title="Eric S. Raymond, author of 'The Cathedral and the Bazaar.' Photo by jerone2, CC BY-SA 2.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;In 1997, Eric S. Raymond presented a paper at the Linux Kongress in Bavaria that would reshape how an entire industry thought about building software. "The Cathedral and the Bazaar" drew a sharp line between two models of development. The cathedral: careful, centralized, release-when-ready. The bazaar: open, decentralized, release-early-and-often. Raymond argued, with considerable evidence from the Linux kernel and his own fetchmail project, that the bazaar would win.&lt;/p&gt;
&lt;p&gt;Nearly three decades later, we can evaluate the claim. And the answer is more interesting than a simple yes or no.&lt;/p&gt;
&lt;h3&gt;What Raymond Actually Argued&lt;/h3&gt;
&lt;p&gt;The essay's core thesis was that certain software problems (particularly large, complex ones) were better solved by decentralized communities than by centralized teams. Raymond distilled this into several principles, the most famous being Linus's Law: "Given enough eyeballs, all bugs are shallow." With enough contributors examining source code, every bug would be obvious to someone.&lt;/p&gt;
&lt;p&gt;He identified several supporting dynamics. Release early and often. Treat your users as co-developers. If you treat beta testers as your most valuable resource, they'll respond by becoming your most valuable resource. Keep the architecture modular enough that contributors can work on pieces independently.&lt;/p&gt;
&lt;p&gt;The implicit assumption was ideological as much as technical. Open-source development would succeed because it aligned individual motivation (scratching a personal itch, building reputation, the intellectual satisfaction of solving problems) with collective benefit. No corporate hierarchy required. No cathedral architects directing the work from above.&lt;/p&gt;
&lt;p&gt;It was, in its way, a profoundly optimistic vision of human coordination.&lt;/p&gt;
&lt;p&gt;Here's what makes the timing remarkable: when Raymond presented his paper in 1997, the term "open source" didn't exist. He was writing about "free software" and the Linux development model. The phrase was coined months later, in February 1998, at a strategy session in Palo Alto, partly catalyzed by the essay's success and Netscape's decision to release the Navigator source code (a decision Raymond's essay directly influenced). The Open Source Initiative followed weeks after that, co-founded by Raymond and Bruce Perens.&lt;/p&gt;
&lt;p&gt;The movement Raymond was describing was young. The GNU Project was fourteen years old. The Free Software Foundation was twelve. The GPL was eight. Linux itself was only six. BSD had been circulating since the late 1970s, but the &lt;a href="https://tinycomputers.io/posts/how-bsds-licensing-issues-paved-the-way-for-linuxs-rise-to-prominence.html"&gt;legal battles&lt;/a&gt; that nearly killed it were barely resolved. There was no GitHub, no SourceForge, no standardized workflow for distributed contribution. The bazaar Raymond championed was a handful of mailing lists, FTP servers, and the sheer force of Linus Torvalds's integrative judgment.&lt;/p&gt;
&lt;p&gt;The essay didn't just describe a revolution. It named one that hadn't named itself yet.&lt;/p&gt;
&lt;h3&gt;The Bazaar Won&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/grand-bazaar-istanbul.jpg" alt="Interior of the Grand Bazaar, Istanbul" style="max-width: 100%; border-radius: 6px; margin-bottom: 15px;" title="The Grand Bazaar in Istanbul, one of the oldest and largest covered markets in the world. Photo by Slyronit, CC BY-SA 4.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;By any quantitative measure, Raymond was right. Linux runs the cloud. Android runs the phone. Firefox and then Chromium reshaped the browser. Apache and then Nginx served the web. PostgreSQL and MySQL handled the data. Python, Ruby, Node.js, Rust, Go: the languages and runtimes that define modern development are overwhelmingly open-source.&lt;/p&gt;
&lt;p&gt;The numbers are staggering. GitHub hosts over 400 million repositories. The Linux kernel has received contributions from over 20,000 individual developers. Every major cloud provider (Amazon, Microsoft, Google) runs on open-source infrastructure. Even Microsoft, which once called Linux a "cancer," now contributes to it, acquired GitHub, and ships a Linux kernel inside Windows.&lt;/p&gt;
&lt;p&gt;If you'd told someone in 1997 that the world's most valuable companies would run their businesses on software they didn't own and couldn't fully control, they would have questioned your judgment. Raymond's prediction wasn't just right. It was conservative.&lt;/p&gt;
&lt;h3&gt;The Cathedral Came Back Wearing Bazaar Clothes&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/toledo-cathedral.jpg" alt="Interior of the Gothic Cathedral of Toledo, Spain" style="float: left; margin: 0 20px 15px 0; max-width: 300px; border-radius: 6px;" title="Interior of the Cathedral of Toledo, Spain. Photo by Adam Jones, CC BY-SA 3.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;Here's where Raymond's vision diverges from what actually happened. The bazaar won, but the cathedrals adapted.&lt;/p&gt;
&lt;p&gt;Meta open-sources PyTorch and Llama. Google open-sources TensorFlow, Kubernetes, Android, and Chromium. Microsoft open-sources VS Code, TypeScript, and .NET. Amazon builds its most profitable business on top of open-source databases, then offers them as managed services. These are not acts of ideological commitment. They are strategic decisions made by organizations with cathedral-scale resources and cathedral-scale ambitions.&lt;/p&gt;
&lt;p&gt;The pattern is consistent: open-source the layer you want to commoditize, then capture value at the layer above. Google open-sources Android to commoditize mobile operating systems, then captures value through the Play Store and advertising. Meta open-sources PyTorch to commoditize the AI framework layer, then captures value through the models and services built on top. Amazon doesn't need to own the database; it needs to own the infrastructure the database runs on.&lt;/p&gt;
&lt;p&gt;This is what Raymond didn't anticipate. The bazaar model wasn't just adopted by idealists scratching personal itches. It was weaponized by the most powerful corporations in history as a competitive strategy. The &lt;a href="https://tinycomputers.io/posts/how-bsds-licensing-issues-paved-the-way-for-linuxs-rise-to-prominence.html"&gt;BSD licensing disputes&lt;/a&gt; that shaped early open-source history look almost quaint compared to the strategic licensing wars that followed.&lt;/p&gt;
&lt;p&gt;There's a personal irony here too. Raymond himself wasn't immune to the cathedral's gravitational pull. He received 150,000 pre-IPO shares of VA Linux, briefly making him worth approximately \$36 million. He wrote an essay called "Surprised by Wealth" about the experience, pledging that the money wouldn't change him. By April 2002, the shares were &lt;a href="https://workbench.cadenhead.org/news/3149/eric-s-raymond-bazaar-financial-advisor"&gt;worth \$195,000&lt;/a&gt;; he'd held through the entire collapse without selling. The bazaar's chief evangelist got rich, briefly, through the most cathedral-scale financial mechanism in capitalism: Wall Street pre-IPO allocations. The wealth came and went through institutions the bazaar model was supposed to make irrelevant.&lt;/p&gt;
&lt;p&gt;Joel Spolsky described this dynamic in 2002 as "commoditize your complements." Drive down the price of whatever is bought alongside your product, often by open-sourcing a competitor's profit center, and your own product becomes more valuable. But even Spolsky didn't fully see how far it would go. In 2026, the bazaar is less a revolutionary alternative to the cathedral than a resource the cathedral harvests.&lt;/p&gt;
&lt;h3&gt;The Efficiency That Created More, Not Less&lt;/h3&gt;
&lt;p&gt;Raymond's essay focused on the development model: how code gets written, reviewed, and shipped. What he didn't explore was the economic consequence of making infrastructure-quality software free.&lt;/p&gt;
&lt;p&gt;When the bazaar model succeeded, it didn't just change how software was built. It changed how much software existed. By making operating systems, web servers, databases, programming languages, and frameworks available at zero marginal cost, the bazaar removed the floor from the cost of building new things. A startup in 2005 could do what a well-funded company in 1995 could not, because the entire stack was free.&lt;/p&gt;
&lt;p&gt;The result wasn't less total development effort. It was dramatically more. Linux didn't consolidate the operating system landscape into one efficient platform; it spawned hundreds of distributions, each with its own community, its own design philosophy, its own ecosystem. Free databases didn't mean fewer databases. They meant PostgreSQL, MySQL, MariaDB, SQLite, MongoDB, Redis, CockroachDB, and dozens more, each serving demand that wouldn't have existed if everyone had to pay Oracle prices.&lt;/p&gt;
&lt;p&gt;This pattern (efficiency gains leading to expanded consumption rather than reduced effort) &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;has a name in economics&lt;/a&gt;, and it shows up everywhere technology reduces the cost of a critical input. The bazaar made software infrastructure cheap, and the world responded by building more software than anyone in 1997 could have imagined.&lt;/p&gt;
&lt;p&gt;There's a second-order effect too. By making infrastructure free, the bazaar lowered the cost of building &lt;em&gt;on top of&lt;/em&gt; that infrastructure. Entire industries (SaaS, cloud computing, the modern startup ecosystem) simply wouldn't have been viable if everyone had to pay cathedral-model prices for their stack. The &lt;a href="https://tinycomputers.io/posts/what-visicalc-teaches-us-about-ai.html"&gt;VisiCalc pattern&lt;/a&gt; repeated itself: a tool that was supposed to eliminate work created new categories of work that dwarfed the original.&lt;/p&gt;
&lt;p&gt;And Raymond's own principle (treat users as co-developers) is itself a demand-expanding dynamic. Converting consumers of software into producers means the resource (developer attention) gets deployed more broadly, not more efficiently. More people write code because more people &lt;em&gt;can&lt;/em&gt; write code, because the tools are free, the examples are public, and the barrier to participation is a GitHub account.&lt;/p&gt;
&lt;h3&gt;What Raymond Got Wrong&lt;/h3&gt;
&lt;p&gt;The essay's blind spots have become painfully clear.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Maintainer burnout.&lt;/strong&gt; Raymond assumed that contributor motivation was self-sustaining, that people would keep showing up because the work was interesting. He didn't account for the dynamics that emerge when a hobby project becomes critical infrastructure. The OpenSSL library, maintained for years by a handful of volunteers, secured the majority of encrypted web traffic until the Heartbleed vulnerability in 2014 revealed how thin the maintenance layer really was. The left-pad incident, the core-js crisis, the Log4j vulnerability: each demonstrated that the bazaar's supply of labor is not inexhaustible. It concentrates on the exciting work and neglects the essential work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Free-riding at scale.&lt;/strong&gt; The essay assumed a rough symmetry between use and contribution. The reality is asymmetric: billions of dollars in commercial value extracted from projects maintained by unpaid or underpaid developers. Amazon took Elasticsearch and offered it as a managed service. When Elastic changed their license to prevent this, the open-source community split. MongoDB, Redis, and HashiCorp followed similar paths: companies that built open-source projects, watched cloud providers commoditize them, and responded by restricting their licenses.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Security supply chains.&lt;/strong&gt; A bazaar has no gatekeepers, which Raymond saw as a feature. It's also a vulnerability. The SolarWinds attack, dependency confusion attacks, typosquatting on npm: these exploit the trust model that makes the bazaar work. When anyone can contribute, anyone includes adversaries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance.&lt;/strong&gt; Raymond wrote about the bazaar as if the only governance question was technical: who decides which patches get merged? The real governance questions turned out to be social and economic: who funds maintenance? Who decides licensing changes? Who gets to use the work commercially? These questions have no bazaar-native answers. They require institutions (foundations, companies, legal frameworks), which is to say, they require cathedrals.&lt;/p&gt;
&lt;h3&gt;The Licensing Wars&lt;/h3&gt;
&lt;p&gt;The clearest evidence that Raymond's framework was incomplete is the licensing landscape of 2026.&lt;/p&gt;
&lt;p&gt;The GPL, which Richard Stallman designed to ensure that modified software remained free, worked well in a world where software was distributed as binaries. The cloud broke that model. If you run GPL software as a service, you never "distribute" it; users interact with the output, not the code. The software is free in theory and proprietary in practice.&lt;/p&gt;
&lt;p&gt;The response was a proliferation of new licenses. The AGPL closed the cloud loophole by requiring source availability for network services. The Business Source License (BSL) made code available to read but restricted commercial use until a time-delayed release to open source. The Server Side Public License (SSPL) required that anyone offering the software as a service must open-source their entire stack.&lt;/p&gt;
&lt;p&gt;Each of these represents a partial retreat from the bazaar model. Not back to the cathedral (the code is still visible, forkable, auditable) but to something Raymond didn't envision: a commons with fences. The ideological purity of "free as in freedom" collided with the economic reality that freedom without reciprocity becomes exploitation.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/how-bsds-licensing-issues-paved-the-way-for-linuxs-rise-to-prominence.html"&gt;BSD licensing story&lt;/a&gt; foreshadowed this. The permissive BSD license allowed commercial forks without contribution back. This wasn't a problem when the commercial ecosystem was small. When the commercial ecosystem became the entire cloud computing industry, the lack of reciprocity became untenable for projects that couldn't attract cathedral-scale corporate sponsorship.&lt;/p&gt;
&lt;h3&gt;What Raymond Got Right&lt;/h3&gt;
&lt;p&gt;Despite these blind spots, the essay's core insight has proven durable: for certain classes of problems, decentralized coordination outperforms centralized planning.&lt;/p&gt;
&lt;p&gt;This isn't because decentralized systems are morally superior. It's because they solve the information problem differently. A cathedral architect must understand the entire system well enough to direct work from above. A bazaar participant only needs to understand their local patch well enough to improve it. As systems grow in complexity, the information burden on the cathedral architect grows faster than the burden on any individual bazaar participant.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/linus-torvalds.jpg" alt="Linus Torvalds at LinuxCon Europe 2014" style="float: right; margin: 0 0 15px 20px; max-width: 260px; border-radius: 6px;" title="Linus Torvalds at LinuxCon Europe, 2014. Photo by Krd, CC BY-SA 4.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;The Linux kernel is the proof. No single person understands the entire Linux kernel. It's too large, too complex, spanning too many hardware architectures and subsystems. But the kernel works, and works remarkably well, because the development model doesn't require any single person to understand it all. Subsystem maintainers understand their domains. Linus Torvalds understands the integration points. Contributors understand the specific problems they're solving. The architecture of the development process mirrors the architecture of the software.&lt;/p&gt;
&lt;p&gt;This insight extends beyond software. Wikipedia works on bazaar principles. Citizen science projects like Galaxy Zoo and Foldit leverage distributed human attention. Even hardware design is slowly moving in this direction, though the marginal cost of atoms versus bits &lt;a href="https://tinycomputers.io/posts/why-some-chips-last-40-years.html"&gt;creates structural barriers&lt;/a&gt; that software doesn't face. The concept of &lt;a href="https://tinycomputers.io/posts/why-some-chips-last-40-years.html"&gt;second-sourcing&lt;/a&gt; (multiple manufacturers producing compatible chips) is, in a sense, the hardware world's version of the bazaar. The Z80 survived for nearly fifty years partly because Zilog couldn't monopolize it.&lt;/p&gt;
&lt;p&gt;Raymond also got the motivational model roughly right, even if the details were off. People do contribute to open-source projects for intrinsic reasons: intellectual satisfaction, reputation, the desire to solve problems that matter to them personally. The mistake was assuming these motivations were sufficient at industrial scale, without institutional support.&lt;/p&gt;
&lt;h3&gt;The Bazaar in 2026&lt;/h3&gt;
&lt;p&gt;The open-source landscape of 2026 bears little resemblance to what Raymond described in 1997, but the dynamics he identified are still operating.&lt;/p&gt;
&lt;p&gt;The bazaar model made software infrastructure so cheap that it created more demand for software than any cathedral could have supplied. It enabled the cloud, the startup ecosystem, the AI revolution, all built on free foundations. The efficiency didn't reduce consumption. It unlocked latent demand that dwarfed the original market.&lt;/p&gt;
&lt;p&gt;At the same time, the cathedral never disappeared. It adapted. The most sophisticated cathedrals now build bazaars strategically, open-sourcing frameworks and tools that make their own proprietary services more valuable. Meta's contribution to PyTorch isn't charity. Google's contribution to Kubernetes isn't ideology. They're infrastructure investments that make the entire ecosystem dependent on capabilities only cathedral-scale organizations can provide.&lt;/p&gt;
&lt;p&gt;The result is a layered system more nuanced than Raymond's binary. At the bottom: genuine bazaar-model projects maintained by communities (the Linux kernel, PostgreSQL, countless libraries). In the middle: corporate-sponsored projects that look like bazaars but serve cathedral strategies (Kubernetes, Chromium, Llama). At the top: proprietary services built on open foundations (AWS, Google Cloud, OpenAI's API).&lt;/p&gt;
&lt;p&gt;Each layer depends on the ones below it. Each layer captures value differently. And the whole structure is held together by a web of licenses, foundations, corporate agreements, and social norms that Raymond's 1997 essay couldn't have anticipated.&lt;/p&gt;
&lt;p&gt;What's strangest about this arrangement is its circularity. Corporations adopted open source because it was free and good. Volunteer maintainers couldn't scale to meet corporate demand; Heartbleed and Log4j proved that. So corporations began funding open-source projects to keep their own infrastructure stable. But funding brought governance influence. The top Linux kernel contributors aren't hobbyists scratching personal itches. They're engineers employed by Google, Microsoft, Red Hat, Intel, and Huawei, steering the roadmap toward their employers' needs. Kubernetes evolves in ways that benefit Google Cloud. PyTorch evolves in ways that benefit Meta's AI stack.&lt;/p&gt;
&lt;p&gt;The projects became dependent on corporate funding. But the corporations became equally dependent on the projects. If Google pulled out of Kubernetes, the project would struggle. If Kubernetes collapsed, Google Cloud would struggle. So Google funds it more, which deepens the entanglement, which makes withdrawal more costly, which demands more funding. The snake eats its own tail.&lt;/p&gt;
&lt;p&gt;Google and Amazon compete ferociously in cloud computing, but they cooperate on the same open-source infrastructure that both their businesses require. They're rivals building on shared foundations that neither can afford to let fail and neither fully controls. The commons isn't independent anymore, but neither are the corporations.&lt;/p&gt;
&lt;p&gt;Raymond imagined the bazaar as freedom from institutional dependency. What emerged is mutual capture. The cathedral could fire its architects. The bazaar's corporate sponsors can't walk away from the bazaar, and the bazaar can't survive without them. Independence became entanglement, and the entanglement, paradoxically, is what makes the system work.&lt;/p&gt;
&lt;h3&gt;The Essay Worth Rereading&lt;/h3&gt;
&lt;p&gt;Raymond saw something real about how coordination works in networks. He was right that the bazaar model could produce software of extraordinary quality and scale. He was right that decentralized development could solve problems that centralized approaches couldn't. He was right that open source would reshape the industry.&lt;/p&gt;
&lt;p&gt;He was wrong about the institutional vacuum. The bazaar didn't eliminate the need for cathedrals; it changed what cathedrals do. They no longer build the infrastructure. They build on top of it, around it, and through it. The most powerful technology companies in the world are cathedral organizations that have learned to cultivate bazaars for strategic advantage.&lt;/p&gt;
&lt;p&gt;"The Cathedral and the Bazaar" is worth rereading in 2026 not because it predicted the future correctly (no essay could, across three decades) but because it identified dynamics that, once set in motion, produced outcomes no one predicted. The bazaar made software free, and free software made more software. The cathedrals adapted, and their adaptation made the bazaar more important, not less. Raymond's binary became a symbiosis that neither model, alone, could have produced.&lt;/p&gt;
&lt;p&gt;The essay ends with Raymond quoting Robert Browning: "A man's reach should exceed his grasp, or what's a heaven for?" The reach exceeded. The grasp caught something different than expected. That's not a failure of vision. That's how ideas work when they meet reality.&lt;/p&gt;</description><category>corporate strategy</category><category>economics</category><category>eric raymond</category><category>free software</category><category>gpl</category><category>history</category><category>licensing</category><category>linux</category><category>open-source</category><category>software</category><guid>https://tinycomputers.io/posts/the-cathedral-and-the-bazaar-nearly-30-years-later.html</guid><pubDate>Mon, 09 Mar 2026 16:00:00 GMT</pubDate></item><item><title>Why Some Chips Last 40+ Years: Z80, 68k, 6502, and the Secret to Processor Longevity</title><link>https://tinycomputers.io/posts/why-some-chips-last-40-years.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;figure&gt;&lt;img src="https://tinycomputers.io/images/zilog-z80.jpg"&gt;&lt;/figure&gt; &lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/why-some-chips-last-40-years_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;22 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There's a Zilog Z80 in a graphing calculator sitting in a high school classroom right now. The student using it was born around 2008. The Z80 was designed in 1976. That processor is older than the student's parents.&lt;/p&gt;
&lt;p&gt;This isn't a quirky footnote. It's a pattern. The Z80, the Motorola 68000, the MOS Technology 6502, the Intel 8051: these processors have been in continuous production and active deployment for forty years or more. The Z80 is closing in on fifty. Meanwhile, processors that were objectively superior by nearly every technical measure (the Zilog Z8000, the National Semiconductor 32016, the Motorola 88000, the Intel i960) are footnotes in Wikipedia articles that nobody reads.&lt;/p&gt;
&lt;p&gt;What determines whether a processor lives for decades or dies in five years? I've spent the last two years building &lt;a href="https://tinycomputers.io/posts/clean-room-z80-emulator.html"&gt;Z80 emulators&lt;/a&gt;, writing &lt;a href="https://tinycomputers.io/posts/building-language-compilers-for-the-z80.html"&gt;compilers for the Z80&lt;/a&gt;, running &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;CP/M on physical RetroShield hardware&lt;/a&gt;, and exploring the &lt;a href="https://tinycomputers.io/posts/motorola-68000-processor-and-the-ti-89-graphing-calculator.html"&gt;Motorola 68000 through TI calculators&lt;/a&gt;. I've read William Barden's &lt;a href="https://tinycomputers.io/posts/the-z80-microcomputer-handbook-william-barden.html"&gt;1978 handbook&lt;/a&gt; that was still being reprinted in 1985, and Steve Ciarcia's &lt;a href="https://tinycomputers.io/posts/build-your-own-z80-computer-steve-ciarcia.html"&gt;build-your-own guide&lt;/a&gt; that assumed you'd wire up a computer from discrete chips. The deeper I've gone into this world, the more convinced I've become that processor longevity isn't really about the processor. It's about everything around it.&lt;/p&gt;
&lt;h3&gt;The Survivors&lt;/h3&gt;
&lt;p&gt;Four processors stand out for their extraordinary longevity. Each was introduced in the mid-to-late 1970s. Each is still manufactured or cloned today.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Zilog Z80&lt;/strong&gt; (1976) was designed by Federico Faggin and Masatoshi Shima, both of whom had worked on the Intel 4004 and 8080. The Z80 was explicitly designed as a better 8080, backward-compatible with the 8080's instruction set but adding indexed addressing, a second register bank, a built-in DRAM refresh counter, and a single 5V power supply (the 8080 needed three voltage rails). It became the heart of CP/M machines, arcade cabinets, and eventually TI graphing calculators. Zilog's CMOS variant, the Z84C00, was manufactured continuously until &lt;a href="https://baud.rs/IboIHD"&gt;April 2024&lt;/a&gt;, when Littelfuse (Zilog's current owner) finally announced end-of-life after 48 years. The eZ80, a backward-compatible enhanced variant, continues in production, and third-party clones remain available. The Z80 instruction set isn't going anywhere even if the original silicon is.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The MOS Technology 6502&lt;/strong&gt; (1975) was designed by Chuck Peddle and Bill Mensch after they left Motorola. At \$25 when competing processors cost \$150-\$300, the 6502 was a revolution in affordability. It powered the Apple II, the Commodore 64, the Atari 2600, and the NES. Bill Mensch's Western Design Center still manufactures the W65C02S and W65C816S today, fifty years after the original design.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/chip-longevity/hitachi-hd68000.jpg" alt="Hitachi HD68000, a second-sourced clone of the Motorola MC68000" style="width: 340px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: right; margin: 0 0 20px 20px;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Motorola 68000&lt;/strong&gt; (1979) was the 32-bit processor that arrived a generation early. With a linear 24-bit address space and an orthogonal instruction set that programmers genuinely enjoyed using, it became the foundation for the original Macintosh, the Amiga, the Atari ST, the Sega Genesis, and Sun's first workstations. Its descendants (the 68020, 68030, 68040, ColdFire, and now NXP's modern variants) kept the architecture alive in embedded systems, automotive controllers, and &lt;a href="https://tinycomputers.io/posts/motorola-68000-processor-and-the-ti-89-graphing-calculator.html"&gt;Texas Instruments calculators&lt;/a&gt; well into the 2020s.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/chip-longevity/intel-p8051.jpg" alt="Intel P8051 microcontroller in DIP-40 package" style="width: 340px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: left; margin: 0 20px 20px 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Intel 8051&lt;/strong&gt; (1980) is perhaps the most quietly ubiquitous processor ever made. Designed as a microcontroller (a processor with RAM, ROM, timers, and I/O ports integrated on a single chip), the 8051 found its way into everything from washing machines to automotive engine controllers to industrial PLCs. Over two dozen companies have manufactured 8051 variants. If you've used an appliance, driven a car, or walked through a building with an elevator in the last forty years, you've interacted with an 8051 derivative.&lt;/p&gt;
&lt;p&gt;The 8051 is also a case study in &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt; applied to silicon. As more manufacturers licensed and produced the 8051, unit costs fell. As unit costs fell, engineers designed it into applications that would never have justified a microcontroller at the original price: a toaster, a thermostat, a toy. Each new application expanded the market, which attracted more manufacturers, which drove costs lower still. The cycle fed itself for decades. Technically superior alternatives existed at every point along this curve, but they couldn't compete with an architecture whose ecosystem was compounding while their price-per-unit was still on the wrong side of the volume curve.&lt;/p&gt;
&lt;h3&gt;The Fallen&lt;/h3&gt;
&lt;p&gt;For every processor that lasted decades, dozens vanished. Some of these were technically impressive, arguably more capable than the survivors.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Zilog Z8000&lt;/strong&gt; (1979), designed as the Z80's successor, offered a 16-bit architecture with segmented memory addressing. It was more powerful than the Z80 in every measurable way. It lasted roughly five years in the market before fading into obscurity. The segmented memory model (the same curse that plagued Intel's 8086/286) made programming painful. And critically, it wasn't backward-compatible with the Z80. Every Z80 program, every CP/M application, every line of existing code was useless on the Z8000. Zilog was asking customers to abandon their entire software investment.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Motorola 88000&lt;/strong&gt; (1988) was Motorola's clean-sheet RISC design, intended to eventually replace the 68k family. It was technically excellent: pipelined, superscalar-capable, and well-designed. Motorola couldn't sell it. Customers had millions of lines of 68k code, working products, trained engineers, and proven toolchains. The 88000 offered better performance but required abandoning everything. Motorola eventually surrendered and joined IBM and Apple to create the PowerPC, which at least had the marketing muscle of three companies behind it.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;National Semiconductor 32016&lt;/strong&gt; (1982) was a full 32-bit processor at a time when the PC world was still on 16-bit. It was used in the Acorn Cambridge Workstation and a few other systems. It had bugs. The early silicon had errata that made reliable system design difficult. By the time National got the bugs out, the market had moved on.&lt;/p&gt;
&lt;p&gt;The pattern is consistent: technical superiority alone doesn't determine survival.&lt;/p&gt;
&lt;h3&gt;Five Factors That Determine Processor Longevity&lt;/h3&gt;
&lt;p&gt;After spending years in this world, I've identified five factors that separate the survivors from the fallen. They're listed roughly in order of importance, which is not the order most engineers would expect.&lt;/p&gt;
&lt;h4&gt;1. Second-Sourcing and Licensing&lt;/h4&gt;
&lt;p&gt;This is the single most important factor, and it's the one that engineers consistently underrate because it's a business decision, not a technical one.&lt;/p&gt;
&lt;p&gt;The Z80 was second-sourced by Mostek, SGS-Thomson, Sharp, NEC, Toshiba, Samsung, and others. When &lt;a href="https://www.littelfuse.com/"&gt;Littelfuse&lt;/a&gt;, the current owner of Zilog, finally discontinued the standalone Z84C00 in 2024, the instruction set didn't die, because it was never dependent on a single manufacturer. This is exactly what second-sourcing was designed to protect against. It mattered enormously to design engineers in the 1980s and 1990s, because committing a product design to a single-source processor was career-threatening. If your sole supplier had a fab fire, or went out of business, or simply decided to discontinue the chip, your product was dead.&lt;/p&gt;
&lt;p&gt;The 6502 was licensed to multiple manufacturers: Rockwell, Synertek, GTE, and later CMD and the Western Design Center. The 8051 took this to its logical extreme: Intel actively encouraged licensing, and the architecture was eventually manufactured by Atmel, Philips/NXP, Silicon Labs, Dallas/Maxim, Infineon, and dozens more. The 8051 became less a product and more a standard, an instruction set architecture that any competent semiconductor company could implement. It was, in hindsight, a preview of the model that ARM and RISC-V would later formalize: sell the design, not the chip, and let the ecosystem do the rest.&lt;/p&gt;
&lt;p&gt;The 68000 family was produced by Motorola, Hitachi, Signetics, Mostek, and Toshiba. Later, the ColdFire and subsequent architectures maintained enough compatibility to keep the ecosystem alive under Freescale and then NXP.&lt;/p&gt;
&lt;p&gt;The x86 architecture tells the same story at a larger scale. IBM refused to use Intel's 8088 in the original PC without a second source. That requirement forced Intel to license the design to AMD, a decision Intel spent the next four decades regretting and litigating. But the resulting duopoly is a major reason x86 survived the RISC revolution of the 1990s. When Sun, SGI, and DEC were pushing SPARC, MIPS, and Alpha, customers considering a switch to RISC had to weigh superior performance against the uncomfortable fact that each RISC architecture had exactly one supplier. x86 had two. That mattered more than clock speeds.&lt;/p&gt;
&lt;p&gt;Contrast all of this with the Z8000, which was essentially Zilog-only. Or the 88000, which was Motorola-only. Single-source processors carry existential risk for every product that uses them. Purchasing managers know this even when engineers don't.&lt;/p&gt;
&lt;h4&gt;2. Ecosystem and Toolchain Maturity&lt;/h4&gt;
&lt;p&gt;A processor without a mature toolchain is a science project. A processor with assemblers, compilers, debuggers, reference designs, application notes, textbooks, and a community of experienced engineers is an ecosystem.&lt;/p&gt;
&lt;p&gt;The Z80 ecosystem by the mid-1980s was staggering. There were books (&lt;a href="https://baud.rs/EZ3Bwg"&gt;Rodnay Zaks' &lt;em&gt;Programming the Z80&lt;/em&gt;&lt;/a&gt;, Barden's &lt;a href="https://baud.rs/5brWaW"&gt;handbook&lt;/a&gt;, Ciarcia's &lt;a href="https://baud.rs/kiLcPY"&gt;build guide&lt;/a&gt;, Coffron's &lt;a href="https://baud.rs/3hw1CF"&gt;applications manual&lt;/a&gt;) available at any technical bookstore. There were assemblers, C compilers, BASIC interpreters, and Forth systems. There were thousands of CP/M applications. There were magazines publishing Z80 projects monthly. There were university courses teaching Z80 assembly. Every year, this ecosystem grew, and every year, the cost of switching to a different processor increased.&lt;/p&gt;
&lt;p&gt;The 6502 had a similar ecosystem, driven heavily by the Apple II and Commodore 64 communities. The 8051 accumulated the largest ecosystem of any microcontroller family, with Keil (now ARM), IAR, SDCC, and many other toolchains providing development environments across every host platform.&lt;/p&gt;
&lt;p&gt;When I wrote about &lt;a href="https://tinycomputers.io/posts/how-we-learned-hardware-in-1983.html"&gt;how we learned hardware in 1983&lt;/a&gt;, I was documenting a snapshot of this ecosystem at its peak. Those books, those reference designs, those shared conventions: they weren't just educational resources. They were infrastructure. And infrastructure, once built, resists replacement.&lt;/p&gt;
&lt;h4&gt;3. ISA Simplicity and Predictability&lt;/h4&gt;
&lt;p&gt;There's a counterintuitive truth about instruction set architecture: the "best" ISA often isn't the one that survives. The one that survives is the one that's simple enough to implement cheaply, predictable enough to verify thoroughly, and small enough to teach in a semester.&lt;/p&gt;
&lt;p&gt;The Z80's instruction set is large by 8-bit standards, with 158 base instructions and variants pushing toward 700 when you count all the addressing modes. But the fundamental execution model is simple: fetch an instruction, decode it, execute it. No pipeline. No branch prediction. No speculative execution. No out-of-order dispatch. The behavior is deterministic. If you clock the Z80 at 4 MHz, you can calculate exactly how many T-states each instruction takes and predict your program's execution time down to the microsecond.&lt;/p&gt;
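&lt;p&gt;As a rough illustration of that arithmetic, here is a minimal Python sketch that totals the T-states for a short delay loop and converts them to time at 4 MHz. The T-state counts are the commonly documented figures for these instructions; the routine itself and the names &lt;code&gt;T_STATES&lt;/code&gt;, &lt;code&gt;delay_loop_t_states&lt;/code&gt;, and &lt;code&gt;microseconds&lt;/code&gt; are hypothetical, chosen only for this example.&lt;/p&gt;

```python
# T-state counts for a few Z80 instructions (commonly documented timings).
T_STATES = {
    "LD B,n": 7,
    "DJNZ taken": 13,      # B is still non-zero after the decrement, so jump back
    "DJNZ not taken": 8,   # final pass: B reaches zero and the loop falls through
    "RET": 10,
}

CLOCK_HZ = 4_000_000  # the 4 MHz clock used in the text


def delay_loop_t_states(count):
    """Total T-states for: LD B,count / loop: DJNZ loop / RET (count from 1 to 255)."""
    taken = count - 1  # every pass except the last one jumps back
    return (
        T_STATES["LD B,n"]
        + taken * T_STATES["DJNZ taken"]
        + T_STATES["DJNZ not taken"]
        + T_STATES["RET"]
    )


def microseconds(t_states, clock_hz=CLOCK_HZ):
    """Convert a T-state total to microseconds at the given clock frequency."""
    return t_states * 1_000_000 / clock_hz


total = delay_loop_t_states(100)
print(total, "T-states =", microseconds(total), "microseconds")
```

&lt;p&gt;Because nothing in the execution model reorders or speculates, the same sum holds on every run, which is what makes this kind of worst-case timing analysis tractable by hand.&lt;/p&gt;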
&lt;p&gt;This determinism is extraordinarily valuable in embedded systems. When you're designing an engine controller or a medical device, you need to know (not estimate, &lt;em&gt;know&lt;/em&gt;) that your interrupt handler will complete within a specific time window. Pipelined processors with branch prediction make this analysis much harder. Simple processors make it trivial.&lt;/p&gt;
&lt;p&gt;The 6502 takes this even further. With only 56 instructions and 13 addressing modes, the entire ISA fits on a single reference card. You can hold the complete instruction set in your head. This isn't a limitation; it's a feature. Engineers who can reason about every instruction their processor executes build more reliable systems than engineers who rely on abstractions they don't fully understand.&lt;/p&gt;
&lt;p&gt;The 8051 instruction set is similarly compact: 111 instructions, most executing in one or two machine cycles. The architecture includes bit-addressable memory, a feature that seems quirky until you're writing firmware for a device with dozens of individual control signals, at which point it becomes indispensable.&lt;/p&gt;
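&lt;p&gt;To show what bit-addressable memory means in practice, here is a small Python sketch of how a standard 8051 bit address maps onto a byte and a bit position. The mapping follows the usual 8051 memory layout (bit addresses 0x00-0x7F live in RAM bytes 0x20-0x2F, and 0x80-0xFF land in special function registers whose addresses are divisible by 8). The function names and the flat 256-entry list standing in for the direct address space are simplifying assumptions, not a model of any particular part.&lt;/p&gt;

```python
def bit_location(bit_addr):
    """Map an 8051 bit address (0x00-0xFF) to a (byte address, bit number) pair."""
    if bit_addr not in range(0x100):
        raise ValueError("8051 bit addresses run from 0x00 to 0xFF")
    bit = bit_addr % 8
    if bit_addr in range(0x80):
        # Lower half: the 16 bit-addressable RAM bytes at 0x20-0x2F.
        return 0x20 + bit_addr // 8, bit
    # Upper half: special function registers at addresses divisible by 8.
    return bit_addr - bit, bit


def set_bit(ram, bit_addr):
    """Emulate the effect of SETB on a list standing in for the direct address space."""
    byte_addr, bit = bit_location(bit_addr)
    ram[byte_addr] |= 2 ** bit


ram = [0] * 256          # hypothetical stand-in for direct-addressed memory
set_bit(ram, 0x97)       # bit 7 of port P1, which sits at byte address 0x90
print(bit_location(0x97), hex(ram[0x90]))
```

&lt;p&gt;On real hardware the whole read-modify-write is a single instruction, so firmware can flip one control line without masking and without disturbing the other seven bits of the port.&lt;/p&gt;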
&lt;h4&gt;4. Power, Size, and Cost&lt;/h4&gt;
&lt;p&gt;The survivors share a common economic profile: they're cheap to manufacture, cheap to buy, and cheap to power.&lt;/p&gt;
&lt;p&gt;A Z84C00 in CMOS draws microwatts in standby. A W65C02S runs on a coin cell battery for years. An 8051 derivative can be manufactured on mature process nodes that were paid for decades ago, with die sizes so small that the packaging costs more than the silicon. When your processor costs \$0.50 in volume and runs on the leakage current of a lithium cell, the engineering case for replacing it with something faster but more expensive becomes very hard to make.&lt;/p&gt;
&lt;p&gt;This is where processor longevity intersects with the economics I've written about in the &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox series&lt;/a&gt;. The relevant cost isn't just the chip; it's the total cost of the design: the processor, the toolchain, the engineering time, the qualification testing, the regulatory certification, and the opportunity cost of a redesign. A \$0.50 Z80 clone in a proven design with ten years of field data is almost impossible to displace, even if a \$0.30 ARM Cortex-M0 is technically superior, because the redesign and requalification costs dwarf the per-unit savings.&lt;/p&gt;
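&lt;p&gt;To make the break-even concrete, here is a small sketch of the arithmetic. The per-unit prices are the ones quoted above; the \$250,000 redesign and requalification figure is purely a hypothetical placeholder, not a number from any real project.&lt;/p&gt;

```python
def break_even_units(redesign_cost_cents: int,
                     old_unit_cents: int,
                     new_unit_cents: int) -> int:
    """Units that must ship before per-unit savings repay a redesign."""
    saving = old_unit_cents - new_unit_cents
    if not saving > 0:
        raise ValueError("the new part must be cheaper per unit")
    # Ceiling division on integers avoids floating-point rounding.
    return -(-redesign_cost_cents // saving)


OLD_UNIT_CENTS = 50                       # $0.50 Z80 clone (from the text)
NEW_UNIT_CENTS = 30                       # $0.30 Cortex-M0 (from the text)
HYPOTHETICAL_REDESIGN_CENTS = 250_000 * 100  # assumed $250,000 redesign

units = break_even_units(HYPOTHETICAL_REDESIGN_CENTS,
                         OLD_UNIT_CENTS, NEW_UNIT_CENTS)
print(units, "units to break even")
```

&lt;p&gt;Under that assumption the cheaper part pays for itself only after 1.25 million units, before counting schedule risk or the field data that would be thrown away.&lt;/p&gt;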
&lt;h4&gt;5. Inertia and Institutional Knowledge&lt;/h4&gt;
&lt;p&gt;The final factor is the hardest to quantify and the most powerful: institutional inertia.&lt;/p&gt;
&lt;p&gt;Somewhere in Germany, there's a factory running a production line controlled by Z80-based PLCs installed in 1988. The line produces automotive components. It runs 24/7. It works. The engineer who designed the control system retired fifteen years ago. The firmware was written in Z80 assembly and documented in a binder that lives in a filing cabinet near the line.&lt;/p&gt;
&lt;p&gt;Replacing this system would require: reverse-engineering the existing firmware (the original source code may or may not still exist), designing a new control system, writing new firmware, testing it against every production scenario the old system handles, qualifying the new system for automotive safety standards, scheduling downtime for installation, and training operators on the new system. The cost runs into hundreds of thousands of dollars. The risk is non-trivial; any bug could halt production.&lt;/p&gt;
&lt;p&gt;So they order more Z80s. And the Z80 stays in production for another year.&lt;/p&gt;
&lt;p&gt;Multiply this scenario by thousands of factories, millions of installed devices, and billions of lines of proven firmware, and you begin to understand why some processors simply cannot die. The cost of replacing them exceeds the cost of maintaining them, indefinitely.&lt;/p&gt;
&lt;p&gt;This is also why the &lt;a href="https://tinycomputers.io/posts/exploring-ti-84%2B.html"&gt;TI-84+ still uses a Z80&lt;/a&gt;. Texas Instruments has decades of TI-BASIC software, decades of teacher training materials, decades of standardized test approvals, and a user base that expects backward compatibility with programs written in 2004. The Z80 isn't the best processor for a modern calculator. But replacing it would require replacing &lt;em&gt;everything else&lt;/em&gt;, and "everything else" is where the real value lives.&lt;/p&gt;
&lt;h3&gt;The Newcomen Pattern&lt;/h3&gt;
&lt;p&gt;There's a historical analogy I keep returning to. Thomas Newcomen built his atmospheric steam engine in 1712. It was inefficient, converting roughly 1% of the heat energy in coal into useful work. James Watt's improved design, conceived in the 1760s and sold commercially from the mid-1770s, was dramatically better: a separate condenser, later a double-acting cylinder, and eventually five times the thermal efficiency. By any rational engineering measure, the Newcomen engine should have vanished overnight.&lt;/p&gt;
&lt;p&gt;It didn't. Newcomen engines continued to be built and operated for decades after Watt's design was available. In some mining operations, they remained in service into the 19th century. The reasons were the same ones that keep Z80s in factories today: the existing engines worked, the operators knew how to maintain them, the replacement cost was high, and the performance of the old engine was &lt;em&gt;adequate&lt;/em&gt; for the task.&lt;/p&gt;
&lt;p&gt;"Adequate for the task" is the phrase that explains processor longevity better than any technical specification. The Z80 is adequate for a graphing calculator. The 6502 is adequate for a simple embedded controller. The 8051 is adequate for a washing machine. And "adequate" plus "proven" plus "cheap" plus "available from multiple sources" is a combination that "superior but new and unfamiliar" almost never beats.&lt;/p&gt;
&lt;h3&gt;The Numbers Tell the Story&lt;/h3&gt;
&lt;p&gt;It's worth pausing to appreciate the sheer scale of the survivors' deployment.&lt;/p&gt;
&lt;p&gt;The 8051 family has been manufactured in quantities estimated at over 10 billion units. That's not a typo. Ten billion. By unit count, that places the 8051 among the most-produced processor architectures in history. They're in your car; a modern automobile contains dozens of microcontrollers, many of them 8051 variants, handling everything from window controls to tire pressure monitoring. They're in your thermostat, your microwave, your garage door opener.&lt;/p&gt;
&lt;p&gt;The Z80 and its clones have shipped in quantities that are harder to pin down precisely, but conservative estimates exceed a billion units across all manufacturers and derivatives. The 6502 family, counting all variants from the original through the 65C816 that powered the Apple IIGS and the Super Nintendo, is in a similar range.&lt;/p&gt;
&lt;p&gt;The 68000 family took a different path: fewer total units but higher-value applications. Where the 8051 went wide and cheap, the 68k went deep and capable. It dominated the workstation market before RISC architectures displaced it, then settled into a long career in automotive and industrial control. NXP's ColdFire line, inherited from Motorola and Freescale, uses an instruction set derived from the original 68000. The architecture didn't die; it evolved.&lt;/p&gt;
&lt;p&gt;What's remarkable about these numbers is that they &lt;em&gt;continue to grow&lt;/em&gt;. These aren't static installed bases slowly decaying as old equipment is retired. New products are still being designed with 8051 cores. New Z80-compatible processors are still being fabricated; even after Zilog, now part of Littelfuse, discontinued the original Z84C00 in 2024, third-party clones and the eZ80 keep the instruction set alive. When I built a &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;dual Z80 RetroShield&lt;/a&gt;, I ordered Z84C0020PEC chips that were still in stock from the final production runs. A 1976 design, manufactured nearly half a century later. And the fact that Zilog's discontinuation made international headlines tells you everything about how deeply embedded these chips remain. You don't mourn a processor nobody uses.&lt;/p&gt;
&lt;h3&gt;What This Means for Modern Processors&lt;/h3&gt;
&lt;p&gt;The ARM Cortex-M0, introduced in 2009, is arguably the first modern processor that has a plausible shot at matching the longevity of the 8-bit survivors. It's licensable (like the 8051), simple (like the 6502), power-efficient (like the Z84C00), and backed by an ecosystem that's growing rapidly. ARM's licensing model (selling the design, not the chip) mirrors the model that made the 8051 ubiquitous.&lt;/p&gt;
&lt;p&gt;RISC-V, as an open ISA, goes even further. No licensing fees, no single company that can discontinue the architecture, no vendor lock-in. I've &lt;a href="https://tinycomputers.io/posts/milk-v-mars-review.html"&gt;reviewed RISC-V boards&lt;/a&gt; and watched the ecosystem grow. If any modern ISA is positioned to last fifty years, it's RISC-V, not because it's the best architecture, but because it's the hardest to kill.&lt;/p&gt;
&lt;p&gt;But here's the uncomfortable truth for anyone designing a new processor architecture: the window for establishing a forty-year processor is probably closed. The Z80, 6502, 68000, and 8051 all emerged during a period when the microprocessor market was being established. There were no entrenched incumbents. Every design win was greenfield. Every new application (calculators, arcade cabinets, industrial controllers, medical devices) was being designed for the first time with microprocessors.&lt;/p&gt;
&lt;p&gt;That era is over. Every new design now competes against an installed base. Every new ISA competes against ARM's ecosystem. The switching costs that keep forty-year-old processors alive are the same switching costs that prevent new architectures from gaining traction. The moat works in both directions.&lt;/p&gt;
&lt;h3&gt;The Lesson&lt;/h3&gt;
&lt;p&gt;The processors that last aren't the ones that push the performance envelope. They're the ones that solve a problem well enough, cheaply enough, reliably enough, and from enough sources that replacing them is never worth the trouble. Technical excellence is necessary but not sufficient. What matters more is the web of dependencies (the toolchains, the trained engineers, the certified designs, the proven firmware, the institutional knowledge) that accumulates around a processor over decades.&lt;/p&gt;
&lt;p&gt;The Z80 will outlive many of the engineers reading this, not because it's a great processor, but because it's woven into the fabric of systems that nobody has a compelling reason to redesign. The 8051 will outlive the Z80, because it's woven into even more systems. And somewhere in a high school classroom, a student is pressing buttons on a &lt;a href="https://tinycomputers.io/posts/exploring-ti-84%2B.html"&gt;TI-84+&lt;/a&gt; that runs on a fifty-year-old instruction set, completely unaware that the chip executing their quadratic formula has been doing this job since before their grandparents started dating.&lt;/p&gt;
&lt;p&gt;That's longevity. Not the kind you engineer. The kind that happens when everything around the chip conspires to keep it in place.&lt;/p&gt;
&lt;div style="margin-top: 3em; padding-top: 1em; border-top: 1px solid #ccc; font-size: 0.85em; color: #666;"&gt;
&lt;strong&gt;Image credits:&lt;/strong&gt; Hitachi HD68000 and Intel P8051 photographs by Konstantin Lanzet, via &lt;a href="https://commons.wikimedia.org/wiki/File:KL_Hitachi_HD68000.jpg"&gt;Wikimedia Commons&lt;/a&gt;. Licensed under GFDL and CC BY-SA 3.0 respectively.
&lt;/div&gt;</description><category>6502</category><category>8051</category><category>68000</category><category>embedded systems</category><category>isa</category><category>microprocessors</category><category>mos technology</category><category>motorola</category><category>processor architecture</category><category>retrocomputing</category><category>second-sourcing</category><category>z80</category><category>zilog</category><guid>https://tinycomputers.io/posts/why-some-chips-last-40-years.html</guid><pubDate>Sun, 08 Mar 2026 16:00:00 GMT</pubDate></item><item><title>Designing a Dual Z80 RetroShield: Two CPUs, One Bus, Zero GUI (Part 1)</title><link>https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/designing-a-dual-z80-retroshield-part-1_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;19 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/dual-z80/zilog-scc-dip40.jpeg" alt="A Zilog Z0853006PSC SCC chip in a DIP-40 package, marked with the Zilog logo and a 1981 copyright date" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;The RetroShield Z80 by Erturk Kocalar at &lt;a href="https://baud.rs/87wbBL"&gt;8bitforce.com&lt;/a&gt; is one of my favorite pieces of hardware. A real Zilog Z80 CPU on a shield that plugs into an Arduino Mega. The Arduino emulates memory and I/O while the Z80 executes real instructions on real silicon. I've used it to &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;boot CP/M&lt;/a&gt;, &lt;a href="https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html"&gt;play Zork over WiFi&lt;/a&gt;, &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;port it to the Arduino Giga R1&lt;/a&gt;, and even &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;commission a custom level-converter shield&lt;/a&gt; to bridge the voltage gap.&lt;/p&gt;
&lt;p&gt;But a single Z80 is, well, a single Z80. Real multi-processor Z80 systems existed in the 1980s. Machines like the &lt;a href="https://baud.rs/tTpLxt"&gt;Cromemco System Three&lt;/a&gt; and some S-100 configurations ran multiple Z80s on a shared bus, with bus arbitration mediating access. The question that kept nagging at me: could I fit a second Z80 onto the RetroShield?&lt;/p&gt;
&lt;p&gt;I should be honest about something: PCB design is one of the areas of computing I know least about. I'm comfortable with firmware, with compilers, with operating systems, but the physical layer, the world of copper traces and drill files and design rule checks, is territory I've mostly avoided. I can read a schematic, but I've never designed a board from scratch. What I wanted to find out was whether modern AI tools could bridge that gap: whether I could use AI to help me understand, alter, and extend &lt;a href="https://baud.rs/87wbBL"&gt;Erturk Kocalar's&lt;/a&gt; existing RetroShield design into something new without becoming a PCB design expert first.&lt;/p&gt;
&lt;p&gt;This is part one of a two-part series. This piece covers the design: architecture decisions, schematic work, PCB layout, autorouting, and Gerber generation. Part two will cover the physical boards arriving from the fab, assembly, bring-up, and the firmware that makes two Z80s cooperate.&lt;/p&gt;
&lt;p&gt;One more thing worth mentioning up front: every step of this design was done without a GUI. That was intentional. I wanted to see how far I could get with just a terminal, command-line EDA tools, AI assistance, and Python scripts that modify PCB files directly. Partly because I think text-based workflows compose better with AI; it's much easier for an AI to generate a Python script that manipulates a text-based PCB file than to drive a graphical EDA tool. And partly because I wanted the entire process to be reproducible and scriptable, not trapped in a series of mouse clicks I'd never remember.&lt;/p&gt;
&lt;h3&gt;The Original Design&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://gitlab.com/8bitforce/retroshield-hw/-/tree/master/hardware/kz80?ref_type=heads"&gt;stock RetroShield Z80&lt;/a&gt; is a clean, simple board. A 55.88mm × 53.34mm two-layer PCB carrying:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;U1&lt;/strong&gt;: A &lt;a href="https://baud.rs/FUCwFg"&gt;Z80 CPU&lt;/a&gt; in a DIP-40 package&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;J1&lt;/strong&gt;: A 2×18 pin header (36 pins) that plugs into &lt;a href="https://baud.rs/CWPoOM"&gt;Arduino Mega 2560&lt;/a&gt; pins 22–53&lt;/li&gt;
&lt;li&gt;A handful of passives: decoupling caps (C1, C2), a clock cap (C3), a clock series resistor (R1), an LED current-limiting resistor (R3), and a bus activity LED&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The J1 header carries everything the Z80 needs: 16 address lines (A0–A15), 8 data lines (D0–D7), and control signals (CLK, RESET, INT, NMI, MREQ, IORQ, RD, WR). The Arduino drives the clock, provides the data when the Z80 reads, captures the data when the Z80 writes, and emulates whatever memory and I/O map you define in firmware. It's elegant in its simplicity; the Z80 thinks it's talking to a real computer, and in a sense, it is.&lt;/p&gt;
&lt;p&gt;The schematic and PCB files use the gEDA format, text-based files that are human-readable and, crucially, scriptable. The schematic (&lt;code&gt;.sch&lt;/code&gt;) defines the logical connections. The PCB (&lt;code&gt;.pcb&lt;/code&gt;) defines the physical layout: component footprints, copper traces, vias, and board outline. Both are just text. This matters a lot for what comes next.&lt;/p&gt;
&lt;h3&gt;Why Two Z80s?&lt;/h3&gt;
&lt;p&gt;The honest answer is that I wanted to see if it could be done. But there are genuinely interesting things you can do with two processors sharing a bus:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Asymmetric multiprocessing.&lt;/strong&gt; One Z80 runs CP/M as the primary CPU. The second handles I/O (serial communication, disk access, network operations), freeing the primary CPU from waiting on slow peripherals. This mirrors how some S-100 systems used coprocessor boards.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cooperative multitasking.&lt;/strong&gt; Both CPUs execute independent programs, taking turns on the shared bus. The Arduino arbitrates access using the Z80's built-in BUSRQ/BUSAK mechanism, a hardware handshake designed exactly for this purpose. One CPU gets the bus, executes for a while, then yields so the other can run.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Debugging and instrumentation.&lt;/strong&gt; The second CPU can monitor the first. Watch the address bus to trace execution. Compare outputs. Run the same code on both CPUs and verify they produce identical results, which is useful for testing Z80 clones or FPGA implementations against real silicon.&lt;/p&gt;
&lt;p&gt;The Z80 was designed for multiprocessor operation. As Rodnay Zaks details in &lt;a href="https://baud.rs/IvCPVA"&gt;&lt;em&gt;Programming the Z80&lt;/em&gt;&lt;/a&gt;, it has dedicated bus request (BUSRQ) and bus acknowledge (BUSAK) pins specifically for multi-master bus sharing. Steve Ciarcia's &lt;a href="https://baud.rs/eLG5hK"&gt;&lt;em&gt;Build Your Own Z80 Computer&lt;/em&gt;&lt;/a&gt; covers the hardware side of these signals in practical detail. Most hobbyist projects never use them. This one does.&lt;/p&gt;
&lt;h3&gt;Architecture: Shared Bus with Independent Control&lt;/h3&gt;
&lt;p&gt;The first design I considered (and quickly rejected) gave each Z80 its own independent header. Two 36-pin headers, two complete sets of address, data, and control lines. This would have worked electrically, but it was wrong for several reasons. It would have required either two Arduino Megas or consumed all the I/O on one Mega with nothing left for bus arbitration. The board would have been enormous. And it wouldn't have reflected how real multi-processor Z80 systems actually worked.&lt;/p&gt;
&lt;p&gt;The right approach is a shared bus. Both Z80s connect to the same address and data lines through J1. They take turns driving the bus, just like in a real S-100 system. What each CPU needs independently is its own set of control signals: its own clock, its own reset, its own interrupt lines, and its own bus request/acknowledge pair.&lt;/p&gt;
&lt;p&gt;I checked the Arduino Mega's pin budget. J1 uses pins 22–53 (32 I/O pins). The Mega still has pins 2–21 (20 pins) plus analog pins A0–A15 (16 more, usable as digital I/O), leaving 36 pins sitting idle. A second CPU's control signals only need about 10 pins. There was plenty of room.&lt;/p&gt;
&lt;p&gt;The solution: a small supplementary 2×6 header (J2, 12 pins) carrying CPU2's independent control signals to the Arduino's remaining pins:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Pin 1:  +5V         Pin 2:  GND
Pin 3:  CLK_2       Pin 4:  RESET_2
Pin 5:  INT_2       Pin 6:  NMI_2
Pin 7:  MREQ_2      Pin 8:  IORQ_2
Pin 9:  RD_2        Pin 10: WR_2
Pin 11: BUSRQ_2     Pin 12: BUSAK_2
&lt;/pre&gt;&lt;/div&gt;
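&lt;p&gt;The pin budget behind that header can be checked with a few lines of Python. This is only a bookkeeping sketch: the J2 layout is copied from the table above, and the variable names are my own.&lt;/p&gt;

```python
J1_BUS_PINS = list(range(22, 54))            # Mega pins 22-53, taken by J1
SPARE_DIGITAL = list(range(2, 22))           # Mega pins 2-21
SPARE_ANALOG = [f"A{n}" for n in range(16)]  # A0-A15, usable as digital I/O

# J2 layout from the pin table; +5V and GND consume no Arduino I/O pin.
J2_PINS = {
    1: "+5V",      2: "GND",
    3: "CLK_2",    4: "RESET_2",
    5: "INT_2",    6: "NMI_2",
    7: "MREQ_2",   8: "IORQ_2",
    9: "RD_2",     10: "WR_2",
    11: "BUSRQ_2", 12: "BUSAK_2",
}

cpu2_signals = [name for name in J2_PINS.values() if name.endswith("_2")]
spare = len(SPARE_DIGITAL) + len(SPARE_ANALOG)
remaining = spare - len(cpu2_signals)

print(f"{len(J1_BUS_PINS)} pins on J1, {spare} spare, "
      f"{len(cpu2_signals)} needed for CPU2, {remaining} left over")
```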

&lt;p&gt;BUSRQ and BUSAK are the key pins. The Arduino firmware pulls BUSRQ low on whichever CPU should yield the bus. That CPU finishes its current machine cycle, tristates its outputs, and asserts BUSAK to signal it's off the bus. The other CPU can then drive the bus freely. It's the same mechanism Zilog designed in 1976; I'm just finally using it.&lt;/p&gt;
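&lt;p&gt;The handshake is easy to model before any firmware exists. The sketch below is a toy Python simulation, not the Arduino code: &lt;code&gt;CpuModel&lt;/code&gt; and &lt;code&gt;Arbiter&lt;/code&gt; are hypothetical names, and a real Z80 may take several clock cycles to reach the end of its machine cycle. Both signals are active low, so 0 means asserted.&lt;/p&gt;

```python
class CpuModel:
    """Toy model of one Z80's bus-request handshake (active-low signals)."""

    def __init__(self, name):
        self.name = name
        self.busrq_n = 1   # 1 = bus not requested from this CPU
        self.busak_n = 1   # 1 = CPU is still driving the bus

    def finish_machine_cycle(self):
        # The Z80 honours BUSRQ at the end of the current machine cycle:
        # it tristates its outputs and pulls BUSAK low.
        self.busak_n = self.busrq_n

    def drives_bus(self):
        return self.busak_n == 1


class Arbiter:
    """Stands in for the Arduino: exactly one CPU may drive the bus."""

    def __init__(self, cpus):
        self.cpus = cpus

    def grant(self, owner):
        # Step 1: take the bus away from every other CPU and wait for BUSAK.
        for cpu in self.cpus:
            if cpu is not owner:
                cpu.busrq_n = 0
                cpu.finish_machine_cycle()
                if cpu.drives_bus():
                    raise RuntimeError(f"{cpu.name} did not release the bus")
        # Step 2: only then release the new owner.
        owner.busrq_n = 1
        owner.finish_machine_cycle()

    def drivers(self):
        return [cpu.name for cpu in self.cpus if cpu.drives_bus()]


cpu1, cpu2 = CpuModel("CPU1"), CpuModel("CPU2")
arbiter = Arbiter([cpu1, cpu2])
arbiter.grant(cpu1)
first_owner = arbiter.drivers()
arbiter.grant(cpu2)
second_owner = arbiter.drivers()
print(first_owner, second_owner)
```

&lt;p&gt;The ordering in &lt;code&gt;grant&lt;/code&gt; is the part worth carrying into real firmware: the outgoing CPU must acknowledge before the incoming one is released, otherwise both would drive the address and data lines at once.&lt;/p&gt;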
&lt;h3&gt;Building the Schematic, Without a Schematic Editor&lt;/h3&gt;
&lt;p&gt;The original project used classic gEDA tools (gschem, pcb), which are no longer packaged for Ubuntu 24.04. The modern replacement is lepton-eda, a maintained fork that reads the same file formats. But since the whole point was to avoid a GUI, even lepton-schematic's graphical mode was off the table.&lt;/p&gt;
&lt;p&gt;This is where AI earned its keep. I don't have the gEDA file format memorized; I've never needed to. But AI can work through the format specification and generate correct output. I described what I wanted (a second Z80 sharing the existing bus, with independent control signals on a new header), and the AI helped me produce the schematic files, the symbol definitions, and eventually the PCB modifications. I still had to understand the architecture and make the design decisions, but the AI handled the translation from intent to file format.&lt;/p&gt;
&lt;p&gt;gEDA schematic files are text. A component placement looks like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;C 44300 47700 1 0 0 z80-1.sym
{
T 44400 59000 5 10 1 1 0 0 1
refdes=U2
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That's a Z80 symbol placed at coordinates (44300, 47700), with reference designator U2. Net connections are similarly textual: &lt;code&gt;N&lt;/code&gt; entries define net (wire) segments and &lt;code&gt;U&lt;/code&gt; entries define bus segments. You can write an entire schematic in a text editor if you understand the coordinate system.&lt;/p&gt;
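&lt;p&gt;Because the format is line-oriented, reading it back takes very little code. The following is a minimal sketch, assuming the &lt;code&gt;C x y selectable angle mirror symbol&lt;/code&gt; field order and single-line &lt;code&gt;name=value&lt;/code&gt; attributes; &lt;code&gt;parse_components&lt;/code&gt; is a hypothetical helper, not part of gEDA or lepton-eda.&lt;/p&gt;

```python
SAMPLE = """C 44300 47700 1 0 0 z80-1.sym
{
T 44400 59000 5 10 1 1 0 0 1
refdes=U2
}
"""


def parse_components(text):
    """Collect component placements and their attributes from .sch text."""
    components = []
    current = None
    in_attrs = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "C" and len(fields) == 7:
            # C x y selectable angle mirror basename
            current = {
                "x": int(fields[1]),
                "y": int(fields[2]),
                "angle": int(fields[4]),
                "mirror": int(fields[5]),
                "symbol": fields[6],
                "attrs": {},
            }
            components.append(current)
        elif line == "{":
            in_attrs = current is not None
        elif line == "}":
            in_attrs = False
        elif in_attrs and "=" in line:
            # Attribute text follows its T line as name=value.
            key, value = line.split("=", 1)
            current["attrs"][key] = value
    return components


components = parse_components(SAMPLE)
print(components[0]["symbol"], components[0]["attrs"]["refdes"])
```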
&lt;p&gt;I created a new schematic page, &lt;code&gt;kz80_cpu2.sch&lt;/code&gt;, for the second CPU. In gEDA's multi-page scheme, nets with the same name on different pages are automatically connected. So CPU2's address pins connect to nets named A0, A1, ..., A15 (the same net names used on page 1), and the netlister merges them into shared nets. The shared bus happens at the netlist level without any explicit cross-page wiring.&lt;/p&gt;
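&lt;p&gt;The merge-by-name behaviour can be pictured as a dictionary union. This sketch is not the netlister's implementation, only an illustration of the effect. The Z80 pin numbers (A0 on pin 30, D0 on pin 14, CLK on pin 6) are the standard DIP-40 assignments and J2-3 comes from the header table above; the J1 pin numbers are made up for the example.&lt;/p&gt;

```python
from collections import defaultdict

# Page 1: the original CPU. J1 pin numbers here are placeholders.
PAGE_1 = {
    "A0": ["U1-30", "J1-1"],
    "D0": ["U1-14", "J1-17"],
    "CLK": ["U1-6", "J1-33"],
}

# Page 2: the second CPU reuses the bus net names, plus its own CLK_2.
PAGE_2 = {
    "A0": ["U2-30"],
    "D0": ["U2-14"],
    "CLK_2": ["U2-6", "J2-3"],
}


def merge_pages(*pages):
    """Nets with the same name on different pages become one net."""
    merged = defaultdict(list)
    for page in pages:
        for net, pins in page.items():
            merged[net].extend(pins)
    return dict(merged)


netlist = merge_pages(PAGE_1, PAGE_2)
for net, pins in sorted(netlist.items()):
    print(net, pins)
```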
&lt;p&gt;The one component that didn't exist yet was the 2×6 control header. I wrote a new gEDA symbol file (&lt;code&gt;ctrlhdr2x6-1.sym&lt;/code&gt;) from scratch, a rectangular body with 12 pins, labeled with the control signal names, specifying the HEADER12_1 footprint. It's about 30 lines of text, all hand-written.&lt;/p&gt;
&lt;p&gt;CPU2's schematic connections break down cleanly:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared with CPU1&lt;/strong&gt; (same net names, auto-merged): A0–A15, D0–D7, +5V, GND&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Independent to CPU2&lt;/strong&gt; (new nets with &lt;code&gt;_2&lt;/code&gt; suffix): CLK_2, RESET_2, INT_2, NMI_2, MREQ_2, IORQ_2, RD_2, WR_2, BUSRQ_2, BUSAK_2&lt;/p&gt;
&lt;p&gt;The total net count went from 37 to 48, only 11 new nets for an entirely new processor. That's the elegance of the shared-bus approach.&lt;/p&gt;
&lt;h3&gt;Modifying the PCB With Python&lt;/h3&gt;
&lt;p&gt;Here's where the CLI-only constraint got interesting. The normal workflow would be: run &lt;code&gt;lepton-sch2pcb&lt;/code&gt; to update the PCB with new components from the schematic, then open the PCB in a graphical editor to place and route them. But &lt;code&gt;lepton-sch2pcb&lt;/code&gt; had trouble finding footprints in pcb-rnd's library paths, and I didn't have a graphical editor anyway.&lt;/p&gt;
&lt;p&gt;So I had AI write a Python script (&lt;code&gt;add_cpu2_shared.py&lt;/code&gt;) to modify the PCB file directly. The pcb-rnd file format is text-based, with clearly delimited blocks for each component (Element), each copper trace (Line), each via (Via), and the netlist (NetList). The script:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Widened the board&lt;/strong&gt; from 55.88mm to 86.36mm, an extra 30.48mm to accommodate the second Z80 and control header, placed on the right half of the board.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Inserted five new Element blocks&lt;/strong&gt;: U2 (Z80, DIP-40), J2 (2×6 header), C4 and C5 (decoupling and clock caps), and R2 (clock series resistor). Each Element block is essentially a footprint definition: pin positions, pad dimensions, drill sizes, silkscreen outlines. I copied the dimensional parameters from the existing components to maintain consistency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Updated the netlist&lt;/strong&gt; in two ways. For shared nets (A0–A15, D0–D7, +5V, GND), the script found each existing net block and appended &lt;code&gt;Connect("U2-xx")&lt;/code&gt; entries. For CPU2's independent control signals, it created 11 entirely new net blocks. The +5V net picked up four new connections: U2's VCC pin, U2's WAIT pin (tied high; WAIT is active low, so high means "not waiting"), C4, and J2.&lt;/p&gt;
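&lt;p&gt;The shared-net half of that step might look roughly like the sketch below. It is not the actual &lt;code&gt;add_cpu2_shared.py&lt;/code&gt;; &lt;code&gt;append_connection&lt;/code&gt; is a hypothetical helper, the sample netlist is trimmed to two nets, and the J1 pin numbers in it are placeholders.&lt;/p&gt;

```python
SAMPLE_NETLIST = "\n".join([
    "NetList()",
    "(",
    '\tNet("A0" "(unknown)")',
    "\t(",
    '\t\tConnect("J1-1")',
    '\t\tConnect("U1-30")',
    "\t)",
    '\tNet("A1" "(unknown)")',
    "\t(",
    '\t\tConnect("J1-2")',
    '\t\tConnect("U1-31")',
    "\t)",
    ")",
])


def append_connection(pcb_text, net_name, pin):
    """Add Connect("pin") as the last entry of the named Net block."""
    out = []
    in_target = False
    for line in pcb_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(f'Net("{net_name}" '):
            in_target = True
        elif in_target and stripped == ")":
            # Insert just before the closing paren of the target block.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f'{indent}\tConnect("{pin}")')
            in_target = False
        out.append(line)
    return "\n".join(out) + "\n"


updated = append_connection(SAMPLE_NETLIST, "A0", "U2-30")
print(updated)
```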
&lt;p&gt;The result was a valid PCB file with all components placed and all nets defined, but no copper traces connecting anything.&lt;/p&gt;
&lt;h3&gt;Autorouting: Let the Machine Do the Tedious Part&lt;/h3&gt;
&lt;p&gt;With components placed and nets defined, the board needed routing: actual copper traces connecting all those pins. Doing this by hand over SSH would have been masochistic. This is exactly what autorouters exist for.&lt;/p&gt;
&lt;p&gt;The workflow: export the PCB to Specctra DSN format (an industry-standard interchange format for autorouters), run &lt;a href="https://baud.rs/bdZw62"&gt;Freerouting&lt;/a&gt;, then import the results back.&lt;/p&gt;
&lt;h4&gt;First Attempt (Failed)&lt;/h4&gt;
&lt;p&gt;The first attempt exported the PCB with the original CPU1 traces still in place, hoping Freerouting would preserve them and only route the new nets. Instead, Freerouting spent 50+ seconds per pass trying to work around traces it couldn't associate with its own net encoding. After 48 passes and 40 minutes, it was still failing to route several nets.&lt;/p&gt;
&lt;h4&gt;Second Attempt (Clean Slate)&lt;/h4&gt;
&lt;p&gt;Another AI-generated Python script (&lt;code&gt;strip_traces.py&lt;/code&gt;) removed all existing copper traces from the PCB file. This was a careful operation. The script had to remove &lt;code&gt;Line[...]&lt;/code&gt; entries inside Layer blocks (copper traces) while preserving &lt;code&gt;ElementLine[...]&lt;/code&gt; entries (component silkscreen outlines that look syntactically similar).&lt;/p&gt;
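&lt;p&gt;The distinction comes down to anchoring the match at the start of the line and tracking whether the parser is inside a Layer block. Here is a minimal sketch of that idea, assuming one entry per line; it is not the actual &lt;code&gt;strip_traces.py&lt;/code&gt;, and the sample coordinates are invented.&lt;/p&gt;

```python
import re

TRACE_RE = re.compile(r"^\s*Line\s*\[")   # matches Line[...], not ElementLine[...]
LAYER_RE = re.compile(r"^\s*Layer\s*\(")

SAMPLE_PCB = "\n".join([
    'Element["" "DIP40" "U1" "" 100000 100000 0 0 0 100 ""]',
    "(",
    "\tElementLine [0 0 60000 0 1000]",
    ")",
    'Layer(1 "top")',
    "(",
    '\tLine[10000 20000 30000 20000 1000 2000 "clearline"]',
    ")",
])


def strip_traces(pcb_text):
    """Drop copper Line entries inside Layer blocks; keep everything else."""
    kept = []
    in_layer = False
    depth = 0
    for line in pcb_text.splitlines():
        stripped = line.strip()
        if LAYER_RE.match(line):
            in_layer = True
            depth = 0
        elif in_layer and stripped == "(":
            depth += 1
        elif in_layer and stripped == ")":
            depth -= 1
            if depth == 0:
                in_layer = False
        elif in_layer and TRACE_RE.match(line):
            continue  # a copper trace: skip it
        kept.append(line)
    return "\n".join(kept) + "\n"


cleaned = strip_traces(SAMPLE_PCB)
print(cleaned)
```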
&lt;p&gt;With a clean board, Freerouting ran in headless mode:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;java&lt;span class="w"&gt; &lt;/span&gt;-jar&lt;span class="w"&gt; &lt;/span&gt;/tmp/freerouting.jar&lt;span class="w"&gt; &lt;/span&gt;-de&lt;span class="w"&gt; &lt;/span&gt;kz80.dsn&lt;span class="w"&gt; &lt;/span&gt;-do&lt;span class="w"&gt; &lt;/span&gt;kz80.ses&lt;span class="w"&gt; &lt;/span&gt;-mp&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It completed the initial routing in 10 passes, then spent another 49 passes optimizing trace length, converging at pass 59 with the message: &lt;em&gt;"There were only 10.60 track length increase in the last 5 passes, so it's very likely that autorouter can't improve the result further."&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Total routing time: about three minutes. The result: 191 wires decomposed into 897 individual trace segments, plus 82 vias for layer transitions. Every net connected. Every design rule satisfied.&lt;/p&gt;
&lt;h4&gt;Importing Routes Back&lt;/h4&gt;
&lt;p&gt;One more headless problem: pcb-rnd's SES import requires the GUI. I tried &lt;code&gt;xvfb-run&lt;/code&gt; with action commands, but it hung waiting for GTK widget interactions that couldn't happen without a display.&lt;/p&gt;
&lt;p&gt;The solution was yet another AI-generated Python script (&lt;code&gt;ses_to_pcb.py&lt;/code&gt;) that parsed the Freerouting SES output and injected the routes directly into the PCB file as copper Line entries. The main complication was coordinate system conversion: the SES file uses a bottom-left origin (y increases upward) while pcb-rnd uses a top-left origin (y increases downward). The script also handled via translation, mapping Freerouting's via definitions to pcb-rnd's format with appropriate pad sizes, drill diameters, and clearances.&lt;/p&gt;
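&lt;p&gt;The coordinate flip and the wire-to-segment decomposition are both short. The sketch below assumes the SES coordinates have already been scaled to micrometres (real SES files declare their own resolution) and uses the 53.34mm board height; the function names and the sample wire are illustrative only.&lt;/p&gt;

```python
BOARD_HEIGHT_UM = 53_340  # 53.34 mm board height, in micrometres


def flip_y(point, board_height=BOARD_HEIGHT_UM):
    """Bottom-left origin (y up) to top-left origin (y down)."""
    x, y = point
    return (x, board_height - y)


def path_to_segments(path, board_height=BOARD_HEIGHT_UM):
    """Split one routed wire (a list of points) into two-point segments."""
    flipped = [flip_y(point, board_height) for point in path]
    return list(zip(flipped, flipped[1:]))


# A hypothetical three-point wire yields two Line entries.
wire = [(10_000, 0), (10_000, 5_000), (20_000, 5_000)]
segments = path_to_segments(wire)
for start, end in segments:
    print(start, "to", end)
```

&lt;p&gt;A wire with N points becomes N-1 segments, which is how 191 wires can expand into 897 individual traces.&lt;/p&gt;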
&lt;p&gt;897 trace segments and 82 vias injected. The PCB was fully routed.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/dual-z80/top-copper.png" alt="Top copper layer of the dual Z80 RetroShield PCB viewed in Gerber Viewer, showing 897 autorouted trace segments and 82 vias connecting both CPUs to the shared bus" style="width: 100%; max-width: 800px; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1.5em 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The top copper layer after Freerouting: 897 trace segments connecting 48 nets across both Z80s, the J1 bus header, and the J2 control header. Every trace was placed by the autorouter; none were drawn by hand.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Generating Production Files&lt;/h3&gt;
&lt;p&gt;The final step was generating Gerber files, the industry-standard format that PCB fabrication houses use to manufacture boards. pcb-rnd's command-line exporter handled this cleanly:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;gerber&lt;span class="w"&gt; &lt;/span&gt;--all-layers&lt;span class="w"&gt; &lt;/span&gt;kz80.pcb
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produced 11 files covering top and bottom copper, solder mask, silkscreen, paste stencil, board outline, and drill locations. pcb-rnd uses verbose filenames (&lt;code&gt;kz80.top.copper.none.3.gbr&lt;/code&gt;), so a renaming script converted them to the standard extensions (&lt;code&gt;.gtl&lt;/code&gt;, &lt;code&gt;.gbl&lt;/code&gt;, &lt;code&gt;.gts&lt;/code&gt;, etc.) that fabrication houses expect.&lt;/p&gt;
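&lt;p&gt;The renaming logic is essentially a lookup table. This sketch assumes every exported file follows the same &lt;code&gt;base.side.kind...gbr&lt;/code&gt; pattern as the one filename shown above, which I have not verified for every layer; &lt;code&gt;fab_name&lt;/code&gt; is a hypothetical helper, and files it does not recognise are left alone.&lt;/p&gt;

```python
# Common Protel-style extensions, keyed by (side, layer kind).
EXTENSIONS = {
    ("top", "copper"): "gtl",
    ("bottom", "copper"): "gbl",
    ("top", "mask"): "gts",
    ("bottom", "mask"): "gbs",
    ("top", "silk"): "gto",
    ("bottom", "silk"): "gbo",
    ("top", "paste"): "gtp",
    ("bottom", "paste"): "gbp",
}


def fab_name(verbose_name):
    """Map a verbose export name to a short one, or None if unrecognised."""
    parts = verbose_name.split(".")
    if len(parts) > 3 and parts[-1] == "gbr":
        extension = EXTENSIONS.get((parts[1], parts[2]))
        if extension:
            return f"{parts[0]}.{extension}"
    return None


print(fab_name("kz80.top.copper.none.3.gbr"))
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for unknown names keeps the outline and drill files untouched until their naming has been checked by hand.&lt;/p&gt;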
&lt;p&gt;I also added &lt;code&gt;tinycomputers.io&lt;/code&gt; to the top silkscreen layer, placed directly below the existing &lt;code&gt;www.8bitforce.com&lt;/code&gt; text, a small nod to both projects.&lt;/p&gt;
&lt;p&gt;The final Gerber package: 35KB zipped, ready for fabrication.&lt;/p&gt;
&lt;h3&gt;The Final Board&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/dual-z80/silkscreen.png" alt="Top silkscreen layer of the dual Z80 RetroShield PCB in Gerber Viewer, showing U1 and U2 Z80 CPU footprints, J1 and J2 headers, component labels, and tinycomputers.io branding" style="width: 100%; max-width: 800px; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1.5em 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The top silkscreen: U1 (left) and U2 (right) with the J1 bus header on the far left and the J2 control header between the two CPUs. The silkscreen includes the original 8bitforce.com credit alongside tinycomputers.io.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Here's what changed from the original RetroShield to the dual-CPU version:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Original&lt;/th&gt;
&lt;th&gt;Dual CPU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Board dimensions&lt;/td&gt;
&lt;td&gt;55.88 × 53.34mm&lt;/td&gt;
&lt;td&gt;86.36 × 53.34mm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layers&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Z80 CPUs&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Headers&lt;/td&gt;
&lt;td&gt;J1 (36 pins)&lt;/td&gt;
&lt;td&gt;J1 (36) + J2 (12) = 48 pins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nets&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Through-hole components&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SMD components&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace segments&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;897&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vias&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The board is wider but not taller. The second Z80 sits to the right of the first, with the J2 control header between them. Both CPUs share the J1 bus connection, and the Arduino firmware will manage who drives the bus at any given moment.&lt;/p&gt;
&lt;h3&gt;The Toolchain Nobody Uses&lt;/h3&gt;
&lt;p&gt;It's worth stepping back to note what just happened. An entire PCB was designed (schematic capture, component placement, autorouting, Gerber generation) without opening a single graphical application. Every step was either a command-line tool invocation or an AI-generated Python script manipulating text files. And it was done by someone who, at the start of the project, couldn't have told you the difference between a Gerber file and a drill file.&lt;/p&gt;
&lt;p&gt;That was the whole point. I chose to avoid a GUI specifically because I wanted to test a hypothesis: that AI-assisted, text-based workflows could let someone with domain knowledge in adjacent areas (firmware, systems programming) operate effectively in an unfamiliar domain (PCB design). The text-based EDA formats made this possible; they gave the AI something it could read, reason about, and generate. A graphical tool would have put me back to square one, clicking through menus I didn't understand.&lt;/p&gt;
&lt;p&gt;I'm not claiming this is &lt;em&gt;better&lt;/em&gt; than using KiCad or Altium with a mouse. For complex boards with hundreds of components, graphical tools and experienced designers are indispensable. But for a modification like this (adding a known set of components to an existing, well-documented open-source design), AI plus text-based tools was surprisingly effective. I brought the architectural understanding (how Z80 bus arbitration works, which signals need to be shared versus independent) and the AI handled the translation into file formats I'd never touched before. Most of the time was spent understanding the &lt;em&gt;design&lt;/em&gt;, not fighting tools.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The Gerber files are at the fab now. In part two, I'll cover what happens when the physical boards arrive: inspection, assembly, first power-on, and the Arduino firmware that orchestrates two Z80s on a shared bus. The firmware is where the real complexity lives: bus arbitration timing, memory mapping for two independent address spaces, and the question of what to actually &lt;em&gt;run&lt;/em&gt; on a dual-Z80 system in 2026.&lt;/p&gt;
&lt;p&gt;Here's a preview of what the bus arbitration core looks like. The Arduino manages which CPU owns the shared bus at any given moment using the Z80's hardware BUSRQ/BUSAK handshake:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// --- Pin definitions (BUSRQ and BUSAK are active low) ---&lt;/span&gt;
&lt;span class="c1"&gt;// CPU1 control, routed through J1 using the existing RetroShield mapping&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU1_CLK      A5&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU1_BUSRQ    A4    &lt;/span&gt;&lt;span class="c1"&gt;// output: Arduino drives CPU1's BUSRQ pin&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU1_BUSAK    A3    &lt;/span&gt;&lt;span class="c1"&gt;// input: CPU1's BUSAK pin drives the Arduino&lt;/span&gt;

&lt;span class="c1"&gt;// CPU2 control, routed through the J2 header&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU2_CLK      2&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU2_BUSRQ    3&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU2_BUSAK    4&lt;/span&gt;

&lt;span class="c1"&gt;// Bus state&lt;/span&gt;
&lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// BUSRQ is output (Arduino tells CPU to release bus)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OUTPUT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OUTPUT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// BUSAK is input (CPU tells Arduino it released bus)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSAK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INPUT_PULLUP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSAK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INPUT_PULLUP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Start with CPU1 active, CPU2 off the bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// HIGH = don't request bus release&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// LOW  = request CPU2 to release bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Wait for CPU2 to acknowledge it's off the bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;digitalRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSAK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;switch_to_cpu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;old_busrq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;old_busak&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSAK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSAK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;new_busrq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Ask the active CPU to release the bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_busrq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Wait for acknowledgment (CPU finishes current machine cycle first)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;micros&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;digitalRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_busak&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;micros&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// timed out: hung CPU (elapsed-time check survives micros() rollover)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Bus is free. Release the new CPU onto it.&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_busrq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The critical detail is timing. When the Arduino pulls BUSRQ low, the Z80 doesn't stop immediately; it finishes its current machine cycle, which can take 3–6 clock periods depending on the instruction. Only then does it tristate its address, data, and control outputs and assert BUSAK. The &lt;code&gt;while&lt;/code&gt; loop waits for that handshake to complete. During the transition, neither CPU is driving the bus, and the Arduino must not attempt any bus operations.&lt;/p&gt;
&lt;p&gt;This is a simplified version. The full firmware in part two will handle clock generation for both CPUs, memory mapping, I/O dispatch, and the arbitration policy (round-robin, priority-based, or cooperative yield). But the handshake above is the foundation everything else builds on. It's the same protocol that made multi-Z80 S-100 systems work in the early 1980s.&lt;/p&gt;
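&lt;p&gt;For a rough sense of scale, the wait for BUSAK can be estimated from the clock rate. The sketch below is plain Python arithmetic, not firmware. The function names and the 100 kHz clock are illustrative assumptions, and the estimate ignores wait states and any extra clock edges the acknowledge itself takes.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Back-of-envelope timing for the BUSRQ/BUSAK handshake.
# Assumptions (not from the post): the clock rates passed in are
# hypothetical, and wait states are ignored.


def busak_wait_us(clock_hz, t_states=6):
    """Microseconds for the longest machine cycle (6 clock periods per the text)."""
    return t_states * 1_000_000 / clock_hz


def min_clock_hz(timeout_us, t_states=6):
    """Slowest clock at which one machine cycle still fits in the timeout."""
    return t_states * 1_000_000 / timeout_us


print(busak_wait_us(100_000))  # 60.0 microseconds at a hypothetical 100 kHz clock
print(min_clock_hz(1000))      # 6000.0 Hz for the 1000 microsecond timeout in the listing
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Under these assumptions the 1000-microsecond timeout leaves ample margin at any plausible clock rate. The estimate only holds if the Z80 clock keeps running while &lt;code&gt;switch_to_cpu&lt;/code&gt; waits, since the CPU samples BUSRQ on clock edges.&lt;/p&gt;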
&lt;p&gt;The hardware design is the easy part. Making two 50-year-old processors cooperate is the challenge.&lt;/p&gt;
&lt;h3&gt;Source Files&lt;/h3&gt;
&lt;p&gt;All schematics, PCB files, Gerber outputs, and helper scripts for this project are open source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/i4XqDV"&gt;dual-z80&lt;/a&gt;&lt;/strong&gt;: KiCad/gEDA source files, Gerber package, Python scripts for PCB manipulation, and build log&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;This is part one of a two-part series. Part two will cover board assembly, bring-up, and dual-CPU firmware.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Previous RetroShield posts: &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;CP/M on the RetroShield&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Fiverr PCB Design&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;CP/M on the Giga R1&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html"&gt;Zork on the Giga&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description><category>arduino</category><category>dual cpu</category><category>freerouting</category><category>geda</category><category>gerber</category><category>hardware</category><category>lepton-eda</category><category>multiprocessor</category><category>pcb design</category><category>pcb-rnd</category><category>retro computing</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html</guid><pubDate>Fri, 06 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Investing in the Jevons Expansion</title><link>https://tinycomputers.io/posts/investing-in-the-jevons-expansion.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/investing-in-the-jevons-expansion_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;16 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This is the sixth piece in a series applying the Jevons Paradox framework to AI economics. The prior five built the theoretical case:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/the-paradox-of-cheap-compute.html"&gt;The Paradox of Cheap Compute&lt;/a&gt; established the historical pattern: every time the cost of compute fell by an order of magnitude, total consumption expanded far beyond the efficiency gain.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;The Jevons Counter-Thesis&lt;/a&gt; argued that AI displacement models systematically undercount the demand expansion that follows when cognitive labor gets cheaper.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html"&gt;Moore's Law for Intelligence&lt;/a&gt; mapped the inference cost curve and showed it mirrors early Moore's Law.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/something-big-is-happening-a-critique.html"&gt;Something Big Is Happening, And Something Big Is Missing&lt;/a&gt; applied the framework to a specific displacement scenario and showed where the analysis breaks down.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;The AI Vampire Is Jevons Paradox&lt;/a&gt; identified the binding constraint: human judgment doesn't scale the way compute does.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This piece asks the practical question: if you believe the framework, what follows?&lt;/p&gt;
&lt;p&gt;I should be clear about what this is and what it isn't. This is not financial advice. I'm not recommending specific trades, allocations, or timing. What I'm doing is mapping a structural argument (Jevons-style demand expansion in AI) onto the physical and economic layers that expansion must pass through. The goal is to identify where expansion creates bottlenecks, because bottlenecks are where pricing power concentrates.&lt;/p&gt;
&lt;p&gt;The key insight is that you don't need to pick which AI company wins. You don't need to know whether OpenAI, Anthropic, Google, or some company that doesn't exist yet captures the application layer. What you need to identify are the fixed-supply inputs that &lt;em&gt;every&lt;/em&gt; AI company needs regardless of who wins. The expansion has to flow through certain physical chokepoints, and those chokepoints are investable.&lt;/p&gt;
&lt;h3&gt;The Framework in One Paragraph&lt;/h3&gt;
&lt;p&gt;For readers coming to this series fresh: Jevons Paradox describes what happens when a critical input gets dramatically cheaper. The intuitive expectation is that total spending on that input falls. The historical reality is the opposite: demand expands beyond the efficiency gain, and total consumption increases. Coal in the 19th century (as Jevons himself documented in &lt;a href="https://baud.rs/xjxPfz"&gt;&lt;em&gt;The Coal Question&lt;/em&gt;&lt;/a&gt;), transistors in the 20th, bandwidth in the 21st. The prior pieces in this series argue that AI inference costs are following the same curve, with the same structural conditions that produced Jevons outcomes in every prior case. If that argument holds, then what matters isn't whether AI gets more efficient; it's where the resulting demand expansion hits physical constraints.&lt;/p&gt;
&lt;h3&gt;The Objection That Isn't&lt;/h3&gt;
&lt;p&gt;The most common pushback I get on this series is some version of: "GPUs are hitting diminishing returns, capex is already enormous, and there's a natural ceiling on how far the expansion can go." Variations appear in coverage from &lt;a href="https://baud.rs/B5ATWQ"&gt;Northeastern&lt;/a&gt; and &lt;a href="https://baud.rs/bcFAl5"&gt;illuminem&lt;/a&gt;, often framed as a correction to the Jevons thesis.&lt;/p&gt;
&lt;p&gt;It's a reasonable-sounding objection. It's also wrong, and understanding &lt;em&gt;why&lt;/em&gt; it's wrong actually strengthens the Jevons case.&lt;/p&gt;
&lt;p&gt;The objection treats a technology-specific constraint as an input-level constraint. GPUs hitting diminishing returns doesn't mean &lt;em&gt;inference&lt;/em&gt; is hitting diminishing returns. It means GPUs are reaching the end of their particular S-curve. But GPUs aren't the only way to run inference. Custom ASICs, TPUs, NPUs, and novel architectures are opening entirely new cost curves &lt;em&gt;below&lt;/em&gt; the GPU curve. The GPU plateau isn't a ceiling; it's a handoff.&lt;/p&gt;
&lt;p&gt;The numbers are already visible. Broadcom controls roughly 70% of the custom AI ASIC market, reporting \$5.2 billion in AI semiconductor revenue in Q3 alone, with &lt;a href="https://baud.rs/zcsDXo"&gt;five major hyperscaler customers&lt;/a&gt; driving demand. &lt;a href="https://baud.rs/znj9ak"&gt;Marvell's custom XPU pipeline&lt;/a&gt; spans AWS, Google, Meta, and Microsoft, with AI revenue reaching \$2.6 billion in FY2026. Google's TPU transition from v6 to v7 delivered a &lt;a href="https://baud.rs/4aoJ1v"&gt;roughly 70% cost-per-token reduction&lt;/a&gt;. Taalas, a startup building hardwired inference chips, &lt;a href="https://baud.rs/QxPpqN"&gt;claims 1000x performance per watt&lt;/a&gt; versus general-purpose GPUs. Custom ASICs handle an estimated 20% of inference workloads today and are &lt;a href="https://baud.rs/eIj2sQ"&gt;projected to reach 70–75% by 2028&lt;/a&gt;, with custom ASIC shipments growing at 44.6% annually versus 16.1% for GPUs.&lt;/p&gt;
&lt;p&gt;Every prior Jevons cycle worked exactly this way. Newcomen's engine didn't just get incrementally better; it was replaced by Watt's engine, then Corliss, then turbines. Each new technology started a fresh S-curve before the previous one fully flattened. Moore's Law didn't ride a single technology either. As Chris Miller chronicles in &lt;a href="https://baud.rs/8MdhcB"&gt;&lt;em&gt;Chip War&lt;/em&gt;&lt;/a&gt;, bipolar gave way to NMOS, then CMOS, then FinFET, now gate-all-around. The pattern is always multiple overlapping S-curves, each beginning before the last one peaks.&lt;/p&gt;
&lt;p&gt;The data supports the mechanism: &lt;a href="https://baud.rs/O6Q4Tc"&gt;every 50% reduction in inference cost has been associated with a 200–300% increase in deployment&lt;/a&gt;. That's textbook Jevons elasticity: usage grows faster than the price falls, so total spending on inference goes up rather than down.&lt;/p&gt;
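&lt;p&gt;The arithmetic behind that figure is easy to check. The snippet below is an illustrative sketch, assuming "deployment" can be read as the quantity of inference consumed; the function names are mine, not from the cited source.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import math


def spend_multiplier(price_cut, usage_increase):
    """Total spend after the change, as a multiple of spend before it."""
    return (1 - price_cut) * (1 + usage_increase)


def log_elasticity(price_cut, usage_increase):
    """Price elasticity of demand from log ratios (negative: price down, usage up)."""
    return math.log(1 + usage_increase) / math.log(1 - price_cut)


# A 50% price cut paired with a 200% and then a 300% rise in usage.
for rise in (2.0, 3.0):
    print(rise, spend_multiplier(0.5, rise), round(log_elasticity(0.5, rise), 2))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Halving the price while usage triples or quadruples takes total spend to 1.5 or 2 times its previous level, and the implied elasticity has a magnitude of roughly 1.6 to 2. Any magnitude above 1 means spending rises as prices fall.&lt;/p&gt;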
&lt;p&gt;"Diminishing returns on GPUs" isn't a ceiling on inference. It's the moment the next technology takes over. That's the &lt;em&gt;mechanism&lt;/em&gt; of Jevons Paradox, not a counterpoint to it.&lt;/p&gt;
&lt;h3&gt;The Investment Layers&lt;/h3&gt;
&lt;p&gt;If Jevons-style expansion is real, it has to flow through physical infrastructure. I think about this in four layers, ordered from deepest (most expansion-certain) to shallowest (most speculative).&lt;/p&gt;
&lt;h4&gt;Layer 1: Energy and Power&lt;/h4&gt;
&lt;p&gt;Energy is the binding constraint. If AI demand expands at anything close to Jevons rates, someone has to generate the electricity. Data center electricity demand is on track to double this year, with the sector's total consumption &lt;a href="https://baud.rs/8hWfJa"&gt;surpassing Canada's national usage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The structural problem is deeper than just demand growth. As Vaclav Smil details in &lt;a href="https://baud.rs/OMSIzZ"&gt;&lt;em&gt;Energy and Civilization&lt;/em&gt;&lt;/a&gt;, energy transitions are slow precisely because the physical infrastructure is massive and long-lived. Roughly 70% of the U.S. electrical grid was built between the 1950s and 1970s. Much of it is approaching end-of-life at the exact moment AI is driving the largest incremental demand increase in decades. This isn't a problem that resolves quickly. Power plants take years to permit and build. Grid transmission upgrades take longer.&lt;/p&gt;
&lt;p&gt;Nuclear is where the smart money is moving. Constellation Energy's merger with Calpine creates a fleet of 21 nuclear reactors plus 50 natural gas plants, essentially a baseload power platform positioned for AI demand. Amazon signed a 1.92 GW power purchase agreement at Susquehanna and committed \$500 million to small modular reactor development. These aren't speculative bets on future demand; they're capacity commitments predicated on demand that's already contractually visible.&lt;/p&gt;
&lt;p&gt;Hyperscaler capital expenditure tells the same story: \$602 billion planned for 2026, roughly 75% tied to AI infrastructure. Goldman Sachs estimates cumulative AI infrastructure spending of \$1.15 trillion between 2025 and 2027. That capital has to buy electricity, and the electricity has to come from somewhere.&lt;/p&gt;
&lt;h4&gt;Layer 2: Physical Infrastructure&lt;/h4&gt;
&lt;p&gt;Between the power plant and the GPU sits an enormous amount of physical equipment: transformers, switchgear, power distribution units, cooling systems, racks, cabling. This is the picks-and-shovels layer; it benefits regardless of which AI stack wins.&lt;/p&gt;
&lt;p&gt;Eaton reported data center orders up 70% year-over-year. Transformers have become a bottleneck, with lead times stretching to 18+ months for large power transformers. Vertiv, which makes power management and thermal systems, is sitting on a \$9.5 billion backlog. Liquid cooling, once a niche technology, is becoming standard for high-density AI compute racks.&lt;/p&gt;
&lt;p&gt;Grid transmission and distribution may be the most underappreciated bottleneck. You can build a data center in 18 months. Getting grid interconnection can take three to five years. The physical infrastructure required to move power from generation to consumption is the constraint that's hardest to accelerate, and it benefits from AI expansion regardless of which models, chips, or cloud providers ultimately dominate.&lt;/p&gt;
&lt;h4&gt;Layer 3: Custom Silicon&lt;/h4&gt;
&lt;p&gt;The GPU-to-ASIC transition described above isn't just evidence that the Jevons expansion continues; it's itself a Jevons trigger. Each new silicon architecture that enters production at lower cost-per-token reopens the demand curve.&lt;/p&gt;
&lt;p&gt;Broadcom's AI semiconductor revenue is &lt;a href="https://baud.rs/9Hp791"&gt;doubling year-over-year to roughly \$8.2 billion in Q1 FY2026&lt;/a&gt;. Marvell's custom XPU pipeline is expanding across all major hyperscalers. Both companies are positioned on the ASIC side of the GPU-to-ASIC transition, the side that's growing at 44.6% versus 16.1%.&lt;/p&gt;
&lt;p&gt;Nvidia still dominates training workloads, and Blackwell delivers a &lt;a href="https://baud.rs/5ns8n0"&gt;10x cost-per-token reduction for open-source inference models&lt;/a&gt;, which is itself a massive Jevons input. But the market is bifurcating. Training demands flexibility and programmability (Nvidia's strength). Inference at scale demands efficiency and cost optimization (where ASICs excel). Both sides of that split drive expansion.&lt;/p&gt;
&lt;h4&gt;Layer 4: The Application Tier&lt;/h4&gt;
&lt;p&gt;This is where it gets speculative. Cloud providers and hyperscalers function as toll booths; they collect revenue proportional to total compute consumed, making them natural beneficiaries of demand expansion. But the application tier above them is where you're picking winners, not betting on expansion itself.&lt;/p&gt;
&lt;p&gt;AI-native companies become viable only at cheaper inference price points. The legal tech startup that can offer document review at one-tenth the cost of a junior associate doesn't exist at \$20 per million tokens. It might exist at \$2. It definitely exists at \$0.20. Each step down the cost curve unlocks a new tier of applications.&lt;/p&gt;
&lt;p&gt;The contrarian opportunity in this layer is latent demand: the markets that don't exist yet because the service was too expensive for most people. Roughly 80% of Americans who need a lawyer can't afford one. Most small businesses can't afford financial planning. Most students can't afford tutoring. If inference costs follow a Jevons trajectory, these aren't aspirational markets; they're inevitable markets. But investing in them means picking which company captures each one, which is a fundamentally different bet than investing in the infrastructure that serves all of them.&lt;/p&gt;
&lt;h3&gt;Who Else Is Making This Bet&lt;/h3&gt;
&lt;p&gt;This framework isn't contrarian anymore. &lt;a href="https://baud.rs/Wy7mZE"&gt;Satya Nadella tweeted&lt;/a&gt; "Jevons paradox strikes again!" when DeepSeek demonstrated cheaper inference without reducing demand. Microsoft's AI revenue hit \$13 billion, up 175% year-over-year. &lt;a href="https://baud.rs/xdNj4l"&gt;Fortune noted&lt;/a&gt; that Nadella's optimism was explicitly grounded in the paradox: cheaper AI means more AI, not less.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/aRLPY8"&gt;Andreessen Horowitz made the economic case directly&lt;/a&gt;: cheaper tokens unlock more demand than efficiency saves. Their thesis is that foundation model economics follow the same curve as prior compute economics: falling costs expand the addressable market faster than they reduce per-unit revenue.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/Qcm7AN"&gt;NPR's Planet Money covered the thesis&lt;/a&gt; in mainstream terms, bringing Jevons Paradox from an obscure 19th-century economic observation to a household framework for understanding AI economics. &lt;a href="https://baud.rs/V6W8hJ"&gt;Nathan Witkin's analysis&lt;/a&gt; showed that employment in software development, translation, and radiology &lt;em&gt;increased&lt;/em&gt; after GPT-3, exactly the demand expansion the model predicts. &lt;a href="https://baud.rs/KUEJyl"&gt;Markman Capital&lt;/a&gt; called the "flawed consensus" of GPU diminishing returns "one of the most dangerous misreadings of the current market."&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/rD0Spu"&gt;Deloitte&lt;/a&gt;, McKinsey, and Bain are all projecting massive infrastructure buildout. &lt;a href="https://baud.rs/8hWfJa"&gt;McKinsey's \$7 trillion estimate&lt;/a&gt; for data center scaling reflects the same underlying logic: if demand expands as costs fall, the physical infrastructure to support it is the bottleneck.&lt;/p&gt;
&lt;p&gt;Jevons went from an obscure economics reference to a mainstream investment framework in roughly twelve months. That's not because it's trendy; it's because the data keeps confirming the pattern.&lt;/p&gt;
&lt;h3&gt;Where the Thesis Could Be Wrong&lt;/h3&gt;
&lt;p&gt;Intellectual honesty requires mapping the failure modes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Demand elasticity might be lower than historical precedent.&lt;/strong&gt; Every prior Jevons cycle involved inputs with massive latent demand: coal for industrial heat, transistors for consumer electronics, bandwidth for media. AI inference might not have the same depth of latent demand. If the tasks AI performs well are narrower than the tasks coal or transistors enabled, the expansion could stall earlier than the model predicts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regulatory intervention could cap the expansion.&lt;/strong&gt; Energy policy, AI regulation, or data center permitting restrictions could each constrain the physical infrastructure that the expansion requires. Jevons Paradox describes an economic dynamic, not a law of physics; policy can override it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The biological ceiling is real.&lt;/strong&gt; As I argued in &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html"&gt;The AI Vampire Is Jevons Paradox&lt;/a&gt;, human judgment is the input that doesn't scale. If every Jevons expansion in AI ultimately concentrates demand on human decision-making, and human decision-making has genuine cognitive limits, the expansion hits a different kind of constraint, one that can't be solved with more silicon or more power.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Timing risk is the most likely failure mode.&lt;/strong&gt; The direction of the thesis could be correct while the timeline is wrong. Infrastructure bottlenecks might resolve more slowly than demand builds, creating periods of overinvestment followed by correction. The historical base rate favors Jevons, but base rates describe probabilities, not certainties. Plenty of investors have been right about the direction and still lost money because they were wrong about the timing.&lt;/p&gt;
&lt;h3&gt;The Physical Footprint of Expansion&lt;/h3&gt;
&lt;p&gt;The deepest layers (energy and physical infrastructure) are the safest Jevons bets. They benefit from AI demand expansion regardless of which models, chips, or companies win. You don't need to know whether GPT-7 or Claude 6 is the better model to know that both of them will need electricity, transformers, cooling, and grid capacity.&lt;/p&gt;
&lt;p&gt;The further up the stack you go, the more you're picking winners rather than betting on expansion. Custom silicon is a strong middle ground: the GPU-to-ASIC transition is structural, and the companies positioned on the right side of it have visible demand. But the application tier is where the uncertainty concentrates, and that's where most retail investors focus their attention.&lt;/p&gt;
&lt;p&gt;The expansion has a physical footprint. Every token generated requires electricity. Every data center requires grid interconnection. Every custom ASIC requires a fab slot. Every cooling system requires water. The Jevons expansion, if it plays out as the framework predicts, will be visible not in stock prices or earnings calls but in the physical world: in power generation capacity, in transformer lead times, in grid interconnection queues, in cooling system orders.&lt;/p&gt;
&lt;p&gt;Jevons won't announce itself. It never does. It shows up in electricity bills, in transformer backorders, in cooling system lead times, in the quiet scramble to secure power purchase agreements years in advance. The signal isn't in what people say about AI. It's in what they're building to support it.&lt;/p&gt;</description><category>ai</category><category>asic</category><category>data centers</category><category>economics</category><category>energy</category><category>gpu</category><category>infrastructure</category><category>investing</category><category>jevons paradox</category><category>nuclear</category><category>semiconductors</category><category>utilities</category><guid>https://tinycomputers.io/posts/investing-in-the-jevons-expansion.html</guid><pubDate>Thu, 05 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Generating Technical Handbooks with AI: Parallel Agents, Source Code, and 2,400 Pages</title><link>https://tinycomputers.io/posts/generating-technical-handbooks-with-ai.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/generating-technical-handbooks-with-ai_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-handbook-generation/ballistics-engine-handbook-cover.jpg" alt="Cover of The Ballistics Engine Handbook - A Comprehensive Guide to Computational Exterior Ballistics, showing a bullet trajectory arc on a dark grid background" style="float: left; max-width: 300px; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Over the past few weeks I've generated three technical handbooks using Claude Code with Opus 4.6 and the Claude Agent SDK. &lt;a href="https://tinycomputers.io/data/ballistics-engine-handbook.pdf"&gt;The Ballistics Engine Handbook&lt;/a&gt;, 641 pages across 66 chapters covering computational exterior ballistics. &lt;a href="https://tinycomputers.io/data/lattice-handbook.pdf"&gt;The Lattice Handbook&lt;/a&gt;, 868 pages across 84 chapters documenting an entire programming language. &lt;a href="https://tinycomputers.io/data/sampo-cpu-handbook.pdf"&gt;The Sampo CPU Handbook&lt;/a&gt;, 871 pages across 82 chapters walking through the design, programming, and hardware implementation of a 16-bit RISC CPU.&lt;/p&gt;
&lt;p&gt;That's roughly 2,400 pages and 232 chapters of deeply technical content, generated from real codebases by AI agents that read actual source files before writing about them.&lt;/p&gt;
&lt;p&gt;These aren't ChatGPT summaries. They aren't the kind of vaguely plausible prose you get from asking an LLM to "write a book about X." Each handbook was produced by a framework that launches 10-12 Claude agents in parallel, each assigned a Part of the book, each with access to the real project source code, each writing &lt;a href="https://baud.rs/owj2PE"&gt;LaTeX&lt;/a&gt; chapters grounded in actual implementation. The result is documentation that references real functions, real CLI flags, real instruction encodings, because the agents read the code before writing about it.&lt;/p&gt;
&lt;h3&gt;Why Handbooks?&lt;/h3&gt;
&lt;p&gt;Developer documentation is chronically underwritten. Most projects ship with a README, maybe some auto-generated API docs, and a handful of examples. If you're lucky, there's a tutorial. The gap between "reference documentation" and "understanding how to actually use this thing" is enormous, and it's the gap where handbooks live.&lt;/p&gt;
&lt;p&gt;A good handbook explains not just what the API surface looks like but why the design decisions were made, how the pieces fit together, what the edge cases are, and how to use the tool effectively in real-world scenarios. Writing one for a complex project is a multi-month effort. For a solo developer maintaining a project in their spare time, it's effectively impossible; the opportunity cost is too high.&lt;/p&gt;
&lt;p&gt;AI changes the math. If you can point agents at source code and get a coherent, accurate, 600-page handbook, the cost drops from months to hours. The output isn't a finished book; it's a first draft that needs review, editing, and correction. But it's a dramatically better starting point than a blank page.&lt;/p&gt;
&lt;h3&gt;The Source-Aware Approach&lt;/h3&gt;
&lt;p&gt;What makes this different from asking a model to "write a book about ballistics" or "write a book about CPU design" is that the agents have access to the actual codebase.&lt;/p&gt;
&lt;p&gt;Each agent runs with its working directory set to the real project: the ballistics-engine Rust crate, the Lattice compiler's C source, or the Sampo CPU's Verilog and assembly. The agents have Read, Glob, and Grep access. They can open source files, search for function signatures, trace data structures, and understand the actual implementation before writing about it.&lt;/p&gt;
&lt;p&gt;The chapter definitions in the generation script include explicit source file references:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sourceReferences: ["src/atmosphere.rs", "src/drag.rs", "src/drag_model.rs"]
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When an agent starts writing a chapter on atmosphere modeling, the prompt tells it: read &lt;code&gt;src/atmosphere.rs&lt;/code&gt; first. The agent opens the file, sees the ICAO standard atmosphere implementation, finds the actual function signatures and constants, and writes a chapter grounded in what the code actually does, not what a language model thinks atmosphere modeling might look like.&lt;/p&gt;
&lt;p&gt;For the Ballistics Engine Handbook, this means chapters that reference real Rust functions, real CLI flags from &lt;code&gt;src/cli_api.rs&lt;/code&gt;, and real numerical methods from the solver. For the Sampo CPU Handbook, it means chapters that include actual Verilog module definitions, actual ISA encodings from the architecture spec, and actual assembler passes from the Rust toolchain. The agent reads &lt;code&gt;ENCODING.md&lt;/code&gt; and writes about instruction formats using the real bit layouts, not invented ones.&lt;/p&gt;
&lt;p&gt;This source-awareness is the difference between documentation that happens to sound plausible and documentation that is grounded in implementation. It doesn't eliminate hallucination (I'll get to that), but it dramatically reduces it.&lt;/p&gt;
&lt;h3&gt;The Parallel Agent Framework&lt;/h3&gt;
&lt;p&gt;The core of the system is a TypeScript file called &lt;code&gt;generate.mts&lt;/code&gt; that orchestrates parallel Claude Agent SDK sessions. Each handbook has its own version, but the architecture is the same.&lt;/p&gt;
&lt;p&gt;The book structure is defined as TypeScript data:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kd"&gt;interface&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Chapter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;pages&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;sections&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;Section&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;sourceReferences&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;interface&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Part&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;pageTarget&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;chapters&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;Chapter&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each Part contains its chapters; each chapter lists its sections, its page target, and the source files the agent should read. The Ballistics Engine Handbook has 9 Parts plus appendices. The Lattice Handbook has 11. The Sampo Handbook has 11 plus appendices.&lt;/p&gt;
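&lt;p&gt;To make the shape concrete, here is a minimal sketch of one entry in that structure. The titles, filenames, descriptions, and page counts are hypothetical placeholders, the &lt;code&gt;Section&lt;/code&gt; type is assumed to be a bare title (its real definition isn't shown here), and only the &lt;code&gt;sourceReferences&lt;/code&gt; paths are taken from the earlier example. The &lt;code&gt;plannedPages&lt;/code&gt; helper is an illustrative sanity check, not part of the original script.&lt;/p&gt;

```typescript
// Assumed shape: the real Section type is not shown in the post.
interface Section {
  title: string;
}

interface Chapter {
  number: number;
  title: string;
  filename: string;
  description: string;
  pages: number;
  sections: Section[];
  sourceReferences: string[];
}

interface Part {
  number: number;
  title: string;
  description: string;
  pageTarget: number;
  chapters: Chapter[];
}

// Hypothetical Part: titles, filenames, and page counts are placeholders.
const examplePart: Part = {
  number: 2,
  title: "Atmosphere and Drag",
  description: "How the engine models air density and drag.",
  pageTarget: 40,
  chapters: [
    {
      number: 5,
      title: "Atmosphere Modeling",
      filename: "ch05-atmosphere.tex",
      description: "The standard atmosphere implementation.",
      pages: 18,
      sections: [{ title: "Standard Atmosphere" }],
      sourceReferences: ["src/atmosphere.rs"],
    },
    {
      number: 6,
      title: "Drag Models",
      filename: "ch06-drag.tex",
      description: "Drag tables and drag model selection.",
      pages: 22,
      sections: [{ title: "Drag Tables" }],
      sourceReferences: ["src/drag.rs", "src/drag_model.rs"],
    },
  ],
};

// Sum of per-chapter page budgets, to compare against part.pageTarget.
function plannedPages(part: Part): number {
  return part.chapters.reduce((sum, ch) => sum + ch.pages, 0);
}

console.log(plannedPages(examplePart) === examplePart.pageTarget); // true
```

&lt;p&gt;Keeping the structure as plain data means a check like this can run before any agent is launched, so a mismatch between chapter budgets and the Part's page target shows up early.&lt;/p&gt;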
&lt;p&gt;When you run &lt;code&gt;npx tsx generate.mts&lt;/code&gt;, every Part launches as a separate Claude agent simultaneously:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;promises&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;BOOK_STRUCTURE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;runPartAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;runAppendixAgent&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;settled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allSettled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promises&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the Sampo Handbook, that's 12 agents running in parallel. Each one receives a detailed prompt containing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The full table of contents (so it knows what other Parts cover, for cross-reference awareness)&lt;/li&gt;
&lt;li&gt;Its specific chapters, sections, descriptions, and page targets&lt;/li&gt;
&lt;li&gt;Source file references to read before writing&lt;/li&gt;
&lt;li&gt;A style guide (more on this below)&lt;/li&gt;
&lt;li&gt;LaTeX formatting conventions, custom environments, and commands&lt;/li&gt;
&lt;/ul&gt;
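&lt;p&gt;As a rough illustration of how those five ingredients might be assembled, here is a sketch of a prompt builder. The function name &lt;code&gt;buildPartPrompt&lt;/code&gt;, the trimmed-down types, and the exact wording of each section are assumptions made for this example; the real prompt text isn't reproduced here.&lt;/p&gt;

```typescript
// Minimal assumed shapes; the real script's types carry more fields.
interface ChapterSpec {
  number: number;
  title: string;
  pages: number;
  sourceReferences: string[];
}

interface PartSpec {
  number: number;
  title: string;
  chapters: ChapterSpec[];
}

// Hypothetical prompt builder: the section order mirrors the list above,
// but the wording of the real prompt is not shown in the post.
function buildPartPrompt(
  part: PartSpec,
  tableOfContents: string,
  styleGuide: string,
  latexConventions: string,
): string {
  const chapterLines = part.chapters.map((ch) => {
    const sources = ch.sourceReferences.join(", ");
    return `Chapter ${ch.number}: ${ch.title} (~${ch.pages} pages). Read first: ${sources}`;
  });
  return [
    `You are writing Part ${part.number}: ${part.title}.`,
    "## Full table of contents",
    tableOfContents,
    "## Your chapters",
    ...chapterLines,
    "## Style guide",
    styleGuide,
    "## LaTeX conventions",
    latexConventions,
  ].join("\n");
}

const prompt = buildPartPrompt(
  {
    number: 2,
    title: "Atmosphere and Drag",
    chapters: [
      { number: 5, title: "Atmosphere Modeling", pages: 18, sourceReferences: ["src/atmosphere.rs"] },
    ],
  },
  "Part 1 ... Part 9",
  "Technical, authoritative, and practical.",
  "Use the notebox and tipbox environments.",
);
console.log(prompt);
```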
&lt;p&gt;Each agent calls the Claude Agent SDK's &lt;code&gt;query()&lt;/code&gt; function with Opus 4.6, running in the source project's directory:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/alexjokela/projects/ballistics-engine"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;allowedTools&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Glob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grep"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;permissionMode&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bypassPermissions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;maxTurns&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chapters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
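&lt;p&gt;The &lt;code&gt;maxTurns&lt;/code&gt; expression scales the turn budget with the size of the Part. Assuming the formula is used exactly as written above, the arithmetic works out as follows (the helper name &lt;code&gt;maxTurnsFor&lt;/code&gt; is made up for this example):&lt;/p&gt;

```typescript
// Turn budget from the options above: a base of 30 plus 8 per chapter.
function maxTurnsFor(chapterCount: number): number {
  return 30 + chapterCount * 8;
}

console.log(maxTurnsFor(7)); // 86
console.log(maxTurnsFor(8)); // 94
```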

&lt;p&gt;The agents write LaTeX &lt;code&gt;.tex&lt;/code&gt; chapter files to a &lt;code&gt;chapters/&lt;/code&gt; directory, which are &lt;code&gt;\include{}&lt;/code&gt;'d by the main &lt;code&gt;book.tex&lt;/code&gt;. Each agent logs its progress to a per-part log file. When all agents finish, the script reports results: duration, success/failure status, and file sizes for each generated chapter.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Promise.allSettled()&lt;/code&gt; is important here. Unlike &lt;code&gt;Promise.all()&lt;/code&gt;, which rejects as soon as any one promise rejects, it waits for every agent to finish and reports each outcome individually. If one Part fails (the agent hits a turn limit, encounters an error, or produces incomplete output), the remaining agents keep running. You can rerun a single failed Part with &lt;code&gt;--part=N&lt;/code&gt; without regenerating the entire book.&lt;/p&gt;
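&lt;p&gt;A minimal sketch of that failure-isolation pattern, under stated assumptions: &lt;code&gt;runPartAgentStub&lt;/code&gt; stands in for the real agent call, &lt;code&gt;parsePartFlag&lt;/code&gt; and &lt;code&gt;runParts&lt;/code&gt; are hypothetical helper names, and the result shape is simplified. Only &lt;code&gt;Promise.allSettled()&lt;/code&gt; and the &lt;code&gt;--part=N&lt;/code&gt; flag come from the script described above.&lt;/p&gt;

```typescript
// Hypothetical result shape; the real script also reports durations and file sizes.
interface PartResult {
  part: number;
  chaptersWritten: number;
}

// Stand-in for a real agent run: Part 3 fails to exercise the rejected path.
async function runPartAgentStub(part: number) {
  if (part === 3) {
    throw new Error("turn limit reached");
  }
  const result: PartResult = { part, chaptersWritten: 7 };
  return result;
}

// Parse an optional --part=N flag; returns null when absent or malformed.
function parsePartFlag(argv: string[]): number | null {
  for (const arg of argv) {
    const match = /^--part=(\d+)$/.exec(arg);
    if (match) {
      return Number(match[1]);
    }
  }
  return null;
}

// Run the selected Parts concurrently and return the numbers of those that failed.
async function runParts(allParts: number[], argv: string[]) {
  const only = parsePartFlag(argv);
  const selected = only === null ? allParts : allParts.filter((p) => p === only);
  const settled = await Promise.allSettled(selected.map((p) => runPartAgentStub(p)));
  const failed: number[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "rejected") {
      failed.push(selected[i]);
    }
  });
  return failed;
}

runParts([1, 2, 3, 4], []).then((failed) => {
  // Part 3 is reported as failed; Parts 1, 2, and 4 still complete.
  console.log("failed parts:", failed);
});
```

&lt;p&gt;The list of failed Part numbers is exactly what you would feed back into a rerun, one &lt;code&gt;--part=N&lt;/code&gt; at a time.&lt;/p&gt;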
&lt;p&gt;The parallelism is the key performance insight. A single agent writing all 82 chapters of the Sampo Handbook sequentially would take many hours. Twelve agents writing in parallel, each handling 7-8 chapters, complete the entire book in roughly 45 minutes to an hour of wall-clock time. The agents don't share state or coordinate; they work independently, which is what makes parallelism straightforward.&lt;/p&gt;
&lt;h3&gt;The Style Guide Problem&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-handbook-generation/lattice-handbook-cover.jpg" alt="Cover of The Lattice Handbook - A Comprehensive Guide to the Lattice Programming Language, showing a crystalline lattice structure on a deep purple background" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Each handbook has a distinct voice, defined in a &lt;code&gt;CLAUDE.md&lt;/code&gt; file that the agent reads before starting:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Ballistics Engine Handbook&lt;/strong&gt;: "Technical, authoritative, and practical. Inspired by O'Reilly's Definitive Guide series." Use real cartridge data in every example. Show the physics. Include safety warnings for anything involving pressure or load data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Lattice Handbook&lt;/strong&gt;: "Conversational, precise, and playful. Inspired by Why's (Poignant) Guide to Ruby and Eloquent Ruby." Use chemistry and materials science metaphors for the phase system: values are materials that can be fluid or crystallized, freezing is literally crystallization.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Sampo CPU Handbook&lt;/strong&gt;: "Technical, authoritative, and hands-on. Think of it as a lab notebook that became a textbook." Show both hex and binary for instruction encodings. Use real code from the project; never invent hypothetical assembly or Verilog.&lt;/p&gt;
&lt;p&gt;Maintaining consistent voice across 10-12 agents writing simultaneously is a genuine challenge. Each agent reads the same style guide, but interpretation varies. What works: detailed, specific instructions with concrete examples of what to do and what not to do. All three guides include an explicit list of banned words ("simple," "easy," "trivial," "obviously," "just") because those words make struggling readers feel bad and they're the first thing an LLM reaches for when transitioning between concepts.&lt;/p&gt;
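&lt;p&gt;A banned-word list is also easy to enforce mechanically after generation. The following is a sketch of such a check, not something taken from the original tooling: &lt;code&gt;findBannedWords&lt;/code&gt; is a hypothetical helper, and it matches whole words case-insensitively without distinguishing prose from code listings, so expect some false positives.&lt;/p&gt;

```typescript
// Banned words quoted in the style guide description above.
const BANNED_WORDS = ["simple", "easy", "trivial", "obviously", "just"];

interface BannedHit {
  line: number;
  word: string;
}

// Scan chapter text line by line and report whole-word, case-insensitive matches.
function findBannedWords(chapterText: string): BannedHit[] {
  const hits: BannedHit[] = [];
  const pattern = new RegExp(`\\b(${BANNED_WORDS.join("|")})\\b`, "gi");
  chapterText.split("\n").forEach((text, index) => {
    for (const match of text.matchAll(pattern)) {
      hits.push({ line: index + 1, word: match[0].toLowerCase() });
    }
  });
  return hits;
}

// "Simplest" is not flagged because the match must be a whole word.
const sample = "The solver is simple to call.\nJust pass a config.\nSimplest case first.";
console.log(findBannedWords(sample));
```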
&lt;p&gt;What doesn't work: vague instructions like "be conversational" or "keep it engaging." Every agent interprets those differently. The Lattice Handbook's metaphor system (where the phase-based type system is described using chemistry analogies) required explicit instructions: "Values are materials. Freezing is crystallization. Thawing is melting. Arenas are regions where crystals are stored." Without that specificity, some agents would use the metaphors and others wouldn't, and the book would feel like it had multiple authors, which, in a sense, it does.&lt;/p&gt;
&lt;p&gt;The "no AI self-reference" rule is also critical. The style guide explicitly states: "Book content must read as if written entirely by the author, with no references to AI assistance." Without this, agents occasionally produce meta-commentary about their own generation process, which breaks immersion.&lt;/p&gt;
&lt;h3&gt;What Goes Wrong&lt;/h3&gt;
&lt;p&gt;An honest assessment of failure modes:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hallucinated APIs.&lt;/strong&gt; Despite source-awareness, agents sometimes invent function signatures, CLI flags, or configuration options that don't exist. This is the most dangerous failure mode because it reads authoritatively. The mitigation (explicit source file references) reduces but doesn't eliminate it. Every chapter needs a review pass where someone checks that the referenced functions and flags actually exist.&lt;/p&gt;
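&lt;p&gt;Part of that review pass can be automated. As a heuristic sketch (not the author's tooling), the check below extracts long-form flags from a chapter and reports any that never appear in the CLI source text. The helper names are hypothetical, the flags &lt;code&gt;--alpha&lt;/code&gt;, &lt;code&gt;--beta&lt;/code&gt;, and &lt;code&gt;--gamma&lt;/code&gt; are deliberately fake, and the approach assumes the source spells each flag with its leading dashes.&lt;/p&gt;

```typescript
// Heuristic: pull long-form CLI flags (two dashes, then a lowercase name) out of text.
function extractFlags(text: string): string[] {
  const matches = text.match(/--[a-z][a-z0-9-]*/g) ?? [];
  return Array.from(new Set(matches));
}

// Flags mentioned in the chapter that never appear in the CLI source text.
function findUnknownFlags(chapterText: string, cliSource: string): string[] {
  const known = new Set(extractFlags(cliSource));
  return extractFlags(chapterText).filter((flag) => !known.has(flag));
}

// Fake inputs for illustration; real ones would be read from the chapter and source files.
const cliSource = 'flag("--alpha"); flag("--beta");';
const chapterText = "Pass \\texttt{--alpha} and \\texttt{--gamma} to the solver.";

// Reports --gamma, the only flag the fake source does not define.
console.log(findUnknownFlags(chapterText, cliSource));
```

&lt;p&gt;A check like this only narrows the search. A flag that exists but is described incorrectly still needs a human reader.&lt;/p&gt;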
&lt;p&gt;&lt;strong&gt;Uneven depth.&lt;/strong&gt; Some chapters come out thin, hitting the minimum viable content but lacking the depth a handbook reader expects. Others balloon beyond their page target with redundant examples. Page targets in the chapter definitions help, but agents treat them as loose guidelines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cross-reference gaps.&lt;/strong&gt; The agent writing Part III doesn't know exactly what the agent writing Part VIII said. Each agent gets the full table of contents for awareness, but not the actual content of other Parts. This means cross-references are sometimes vague ("as we'll see in Chapter 25") or occasionally contradictory. The LaTeX &lt;code&gt;\cref{}&lt;/code&gt; system helps (agents insert labels and cross-references that at least compile correctly), but semantic consistency across Parts requires a human review pass.&lt;/p&gt;
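&lt;p&gt;The mechanical half of this problem, references that point at labels no agent ever defined, can be caught with a small script. This is a sketch under assumptions: the helper names and the sample snippets are hypothetical, and it only understands &lt;code&gt;\label{}&lt;/code&gt; and &lt;code&gt;\cref{}&lt;/code&gt; with literal keys. It says nothing about whether two Parts agree with each other.&lt;/p&gt;

```typescript
// Collect every key defined with \label{...} across all chapter texts.
function collectLabels(chapters: string[]): string[] {
  const labels: string[] = [];
  for (const text of chapters) {
    for (const match of text.matchAll(/\\label\{([^}]+)\}/g)) {
      labels.push(match[1]);
    }
  }
  return labels;
}

// Keys used in \cref{...} (comma-separated lists allowed) with no matching \label.
function findDanglingRefs(chapters: string[]): string[] {
  const defined = new Set(collectLabels(chapters));
  const dangling: string[] = [];
  for (const text of chapters) {
    for (const match of text.matchAll(/\\cref\{([^}]+)\}/g)) {
      for (const key of match[1].split(",")) {
        const trimmed = key.trim();
        if (!defined.has(trimmed)) {
          dangling.push(trimmed);
        }
      }
    }
  }
  return dangling;
}

// Hypothetical chapter snippets written by two different agents.
const partThree = "\\chapter{Drag}\\label{ch:drag} See \\cref{ch:solver}.";
const partEight = "\\chapter{Tuning}\\label{ch:tuning} Recall \\cref{ch:drag,ch:wind}.";

// ch:drag resolves; ch:solver and ch:wind do not.
console.log(findDanglingRefs([partThree, partEight]));
```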
&lt;p&gt;&lt;strong&gt;LaTeX formatting inconsistencies.&lt;/strong&gt; Different agents make different choices about when to use &lt;code&gt;\begin{notebox}&lt;/code&gt; vs. &lt;code&gt;\begin{tipbox}&lt;/code&gt;, how to format code listings, whether to put output inline or in a separate listing. The style guide constrains this, but the variation is noticeable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The confident-but-wrong problem.&lt;/strong&gt; AI writes with unwavering authority about implementation details it misread. An agent might open a Rust file, misinterpret a match arm, and write a paragraph confidently explaining behavior that the code doesn't actually produce. This is the hardest failure to catch because the prose sounds correct and references real source files; you have to actually trace the logic to find the error.&lt;/p&gt;
&lt;p&gt;The regeneration workflow handles most of these: rerun a single Part with &lt;code&gt;--part=N&lt;/code&gt;, review the output, iterate. A full regeneration of one Part takes about five minutes, fast enough to make iterative refinement practical.&lt;/p&gt;
&lt;h3&gt;Results and Numbers&lt;/h3&gt;
&lt;p&gt;Across the three handbooks:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Ballistics Engine&lt;/th&gt;
&lt;th&gt;Lattice&lt;/th&gt;
&lt;th&gt;Sampo CPU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;641&lt;/td&gt;
&lt;td&gt;868&lt;/td&gt;
&lt;td&gt;871&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chapters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9 + Appendices&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;11 + Appendices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel Agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;Verilog/Rust/Assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Style&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;O'Reilly Guide&lt;/td&gt;
&lt;td&gt;Why's Poignant&lt;/td&gt;
&lt;td&gt;Lab Notebook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The model for all three is Claude Opus 4.6. Total generation time per handbook is roughly 45-60 minutes wall-clock with parallel agents, compared to what would be 8+ hours running sequentially.&lt;/p&gt;
&lt;h4&gt;What This Costs&lt;/h4&gt;
&lt;p&gt;All of this work was done on a Claude Max subscription at \$200/month. At that tier, you get access to Opus 4.6 through Claude Code with what Anthropic describes as "significantly higher" usage limits than the \$20 Pro plan. How much higher? That's where things get vague.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-handbook-generation/claude-subscription-usage.png" alt="The Claude usage settings page showing plan usage limits: a session meter at 4% used, a weekly 'All models' meter at 66% used, a 'Sonnet only' meter at 1% used, and an 'Extra usage' toggle with \$0.00 spent. No token counts, no rate limits, no concrete units, just percentages of an unstated total." style="max-width: 100%; margin: 1em 0 1.5em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Anthropic doesn't publish concrete token limits or rate caps for Max. The pricing page says you get "9x more usage" than Pro, but 9x of what? The Pro plan's limits are themselves unstated. You get a usage meter in the interface that fills up and eventually throttles you, but there's no documentation of what the meter measures, how it maps to tokens, or what the actual ceiling is. When you hit the limit, you're told to wait or upgrade. The \$100/month tier exists between Pro and Max, and Anthropic is equally vague about how it differs from either.&lt;/p&gt;
&lt;p&gt;In practice, the Max subscription was sufficient to generate all three handbooks (roughly 2,400 pages of content produced by dozens of parallel agent sessions running Opus 4.6) within a single billing cycle, without hitting throttling severe enough to block the work. Whether that's representative of the limit or I simply happened to stay under it, I genuinely don't know. Anthropic's refusal to publish concrete limits makes it impossible to do the math in advance. You can't calculate cost-per-page or tokens-per-dollar because the denominator is secret.&lt;/p&gt;
&lt;p&gt;This is a strange posture for a company selling a product. The \$200/month price point positions Max as a professional tool, something you'd expense to a business or justify as a productivity investment. Professional tools come with specs. You know how many build minutes your CI plan includes. You know how many API calls your database tier supports. You know how many seats your Slack plan covers. Anthropic is asking for \$200/month and answering the question "what do I get for that?" with essentially "a lot, trust us."&lt;/p&gt;
&lt;p&gt;For what it's worth, the alternative would have been the API, where pricing is transparent: Opus 4.6 runs roughly \$15 per million input tokens and \$75 per million output tokens. Back-of-envelope math suggests that generating a single 800-page handbook through the API (with all the source file reading, prompt construction, and chapter output) would consume something in the range of several hundred dollars of tokens. Three handbooks would plausibly run \$500-1,000+ through direct API billing. If that estimate is in the right ballpark, the Max subscription is a genuine bargain for this kind of heavy-generation workload, but you just have to take it on faith because Anthropic won't show you the numbers.&lt;/p&gt;
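&lt;p&gt;The shape of that back-of-envelope calculation looks like this. The per-million-token rates are the ones quoted above; the token volumes are purely hypothetical inputs chosen to show the arithmetic, not measurements from the actual runs, and &lt;code&gt;apiCostDollars&lt;/code&gt; is a name invented for this example.&lt;/p&gt;

```typescript
// Rates quoted in the paragraph above, in dollars per million tokens.
const INPUT_RATE = 15;
const OUTPUT_RATE = 75;

// API cost in dollars for a given token volume.
function apiCostDollars(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_RATE + (outputTokens / 1_000_000) * OUTPUT_RATE;
}

// Hypothetical volumes for one handbook; not measured from the actual runs.
const inputTokens = 20_000_000;
const outputTokens = 2_000_000;

console.log(apiCostDollars(inputTokens, outputTokens)); // 450
```

&lt;p&gt;Because agents reread source files and carry context across turns, input volume is likely the harder of the two numbers to pin down, so any estimate of this kind should be treated as an order-of-magnitude guess.&lt;/p&gt;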
&lt;p&gt;For comparison against the human alternative: a professional technical writer producing this volume of deeply technical content (which requires understanding exterior ballistics physics, or compiler internals, or CPU microarchitecture) would need months of full-time work at rates that would make the API costs look trivial. The AI-generated output is a first draft, not a finished product. But it's a first draft that covers the full scope, references real source code, and provides a structure that would take weeks to produce manually.&lt;/p&gt;
&lt;h3&gt;What This Means for Documentation&lt;/h3&gt;
&lt;p&gt;This connects to a theme I've been writing about in the &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox series&lt;/a&gt;: documentation is a classic example of latent demand suppressed by cost.&lt;/p&gt;
&lt;p&gt;Most open-source projects have mediocre documentation because good documentation is expensive to produce. A solo maintainer choosing between implementing features and writing a 600-page handbook will choose features every time. The handbook doesn't get written, not because it wouldn't be valuable, but because the cost of producing it exceeds the maintainer's available time.&lt;/p&gt;
&lt;p&gt;If handbook generation becomes cheap enough, every serious project gets one. The total volume of technical documentation doesn't decrease; it explodes. And the human role shifts from production to curation. The expensive work isn't writing 600 pages anymore. It's defining the structure: deciding what the book should cover, in what order, at what depth. It's reviewing the output for accuracy: catching hallucinated APIs, verifying that code examples actually run, ensuring cross-references are coherent. It's editing for voice: making sure the playful tone of the Lattice Handbook doesn't lapse into the authoritative register of the Ballistics Handbook.&lt;/p&gt;
&lt;p&gt;This is Jevons in miniature. Cheaper documentation doesn't mean less documentation work. It means more documentation exists, and humans focus on the higher-judgment parts: structure, accuracy, and editorial voice.&lt;/p&gt;
&lt;h3&gt;The Framework Is the Product&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://baud.rs/mKMEFE"&gt;&lt;code&gt;generate.mts&lt;/code&gt; pattern&lt;/a&gt; is reusable. The same architecture (define a book structure in TypeScript, launch parallel agents with source code access, collect LaTeX output) applies to any project with a codebase and a desired handbook.&lt;/p&gt;
&lt;p&gt;The bottleneck isn't the AI's ability to write. It's the human's ability to define what the handbook should contain and to judge whether the output is correct. Defining the structure for the Sampo Handbook (11 Parts, 82 chapters, hundreds of sections, source file references for each) took longer than running the generation. Reviewing and correcting the output takes longer than generating it.&lt;/p&gt;
&lt;p&gt;That bottleneck is itself a Jevons observation. When the cost of producing prose drops to near zero, the scarce input becomes human judgment about what the prose should say and whether it's right. The generation is the cheap part. The thinking is the expensive part. As it always has been.&lt;/p&gt;</description><category>ai</category><category>claude agent sdk</category><category>claude code</category><category>documentation</category><category>handbooks</category><category>latex</category><category>opus 4.6</category><category>parallel agents</category><category>technical writing</category><guid>https://tinycomputers.io/posts/generating-technical-handbooks-with-ai.html</guid><pubDate>Wed, 04 Mar 2026 17:00:00 GMT</pubDate></item><item><title>The AI Vampire Is Jevons Paradox</title><link>https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-ai-vampire-is-jevons-paradox_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;15 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/burne-jones-the-vampire-1897.jpg" alt="The Vampire, an 1897 painting by Philip Burne-Jones depicting a pale woman draped over a prostrate man, the visual origin of the vampire as metaphor for extraction" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Steve Yegge's &lt;a href="https://baud.rs/dJwDgQ"&gt;"The AI Vampire"&lt;/a&gt; has been circulating among developers and managers for the past few weeks, and it's striking a nerve. The core argument: AI makes you dramatically more productive (Yegge estimates 10x or more), but companies capture the entire surplus. You don't get a shorter workday. You get 10x the output at the same hours, with the cognitive load compressed into pure decision-making. The result is burnout on a scale the industry hasn't seen before. His prescription is blunt: calculate your \$/hr, work three to four hours a day, and refuse to let the vampire drain you dry.&lt;/p&gt;
&lt;p&gt;It's a compelling piece, written with Yegge's characteristic directness and self-awareness. And it describes something real. But as I read it, I kept seeing something he doesn't name, a pattern I've been writing about for months.&lt;/p&gt;
&lt;p&gt;This is the fourth piece in what has become a series on Jevons Paradox and AI economics. The &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;first&lt;/a&gt; traced the paradox through the semiconductor industry. The &lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;second&lt;/a&gt; argued that AI displacement scenarios systematically undercount demand expansion. The &lt;a href="https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html"&gt;third&lt;/a&gt; explored what happens when the cost of intelligence follows a Moore's Law trajectory. Along the way, I responded to &lt;a href="https://tinycomputers.io/posts/something-big-is-happening-a-critique.html"&gt;Matt Shumer's displacement argument&lt;/a&gt; with the same framework.&lt;/p&gt;
&lt;p&gt;Those pieces all looked at the macro picture: markets expanding, new industries forming, total economic activity growing. Yegge is describing the micro picture: what it actually feels like to be a human worker inside a Jevons expansion. And what he's describing, whether he uses the term or not, is Jevons Paradox operating on human attention.&lt;/p&gt;
&lt;h3&gt;The Jevons Pattern, One More Time&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/meunier-descent-of-miners-1882.jpg" alt="Descent of the Miners into the Shaft, an 1882 painting by Constantin Meunier showing coal miners descending into a mine, the human beings at the point of production in the original Jevons cycle" style="max-width: 100%; margin: 0 0 1.5em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;The pattern is simple enough to state in a sentence: when a critical input gets cheaper, demand expands beyond the efficiency gain. Total consumption of the input rises, not falls.&lt;/p&gt;
&lt;p&gt;Coal got cheaper per unit of useful work. Total coal consumption surged as new applications became viable. Transistors got cheaper per unit of compute. Total compute spending grew by orders of magnitude. Bandwidth got cheaper per unit of data. Total data consumption exploded. The per-unit savings are overwhelmed by the explosion in total units demanded.&lt;/p&gt;
&lt;p&gt;In my previous pieces, I applied this at the macro level. Cognitive output gets cheaper through AI. New industries emerge. Demand for cognitive work expands. The economy restructures around abundant, cheap intelligence. That argument is about markets, GDP, and employment categories: the aerial view.&lt;/p&gt;
&lt;p&gt;But Jevons has always had a micro counterpart. When coal got cheaper, individual mines didn't shut down early; they ran harder, longer, extracting more because the economics now justified it. When compute got cheaper, individual developers didn't write less code; they wrote vastly more, because the constraints that had limited what was practical evaporated. The expansion creates pressure at every level of the system, not just at the top.&lt;/p&gt;
&lt;p&gt;The macro story is about new markets forming. The micro story is about what happens to the people at the point of production, the ones whose labor is the input that just got cheaper.&lt;/p&gt;
&lt;h3&gt;What Yegge Is Actually Describing&lt;/h3&gt;
&lt;p&gt;Yegge's framework centers on a value-capture trap. He presents two scenarios:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scenario A:&lt;/strong&gt; AI makes you 10x more productive. Your company captures the surplus. You now produce 10x the output at the same salary and hours. The company benefits. You burn out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scenario B:&lt;/strong&gt; You recognize the \$/hr math. If you were worth \$150/hr before AI and now produce 10x the output, your effective rate should be \$1,500/hr, or equivalently, you should work one-tenth the hours for the same salary. You work three to four hours a day, produce what used to take a full day, and keep your sanity.&lt;/p&gt;
&lt;p&gt;He frames this as a choice between being exploited and being strategic. And he's honest about the difficulty of Scenario B; most people can't negotiate a three-hour workday, most companies won't accept it, and the competitive dynamics push relentlessly toward Scenario A.&lt;/p&gt;
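&lt;p&gt;The \$/hr arithmetic behind Scenario B is simple enough to write down. The sketch below uses the \$150/hr and 10x figures from the scenario; the eight-hour baseline day and the function names are assumptions added for illustration.&lt;/p&gt;

```python
# Scenario B arithmetic: same salary, AI-multiplied output.
BASE_RATE_USD_PER_HOUR = 150   # figure from Scenario B
PRODUCTIVITY_MULTIPLIER = 10   # figure from Scenario B
BASELINE_HOURS_PER_DAY = 8     # ASSUMPTION: a conventional full workday


def effective_rate(base_rate, multiplier):
    """Hourly value of output if the worker captured the whole gain."""
    return base_rate * multiplier


def hours_for_baseline_output(baseline_hours, multiplier):
    """Hours needed to match one pre-AI day's output."""
    return baseline_hours / multiplier


def baseline_hours_produced(hours_worked, multiplier):
    """Pre-AI hours of output produced in the given working time."""
    return hours_worked * multiplier


print(effective_rate(BASE_RATE_USD_PER_HOUR, PRODUCTIVITY_MULTIPLIER))
print(hours_for_baseline_output(BASELINE_HOURS_PER_DAY, PRODUCTIVITY_MULTIPLIER))
print(baseline_hours_produced(3.5, PRODUCTIVITY_MULTIPLIER))
```

&lt;p&gt;Taken literally, a strict 10x multiplier means matching the old day's output takes under an hour, so even a three-to-four-hour day still hands the employer several times the pre-AI output. The prescription is a compromise, not a full recapture of the surplus.&lt;/p&gt;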
&lt;p&gt;Yegge's most vivid metaphor is that "AI has turned us all into Jeff Bezos." At Amazon, Bezos sat atop a machine that handled volume (logistics, warehousing, customer service, shipping) while he focused exclusively on high-leverage decisions. AI does the same thing for individual workers. It absorbs the volume work (the boilerplate code, the routine analysis, the standard responses) and leaves you with a residue of pure judgment calls. Every decision is consequential. Every hour is cognitively expensive.&lt;/p&gt;
&lt;p&gt;He also has an important moment of self-awareness. Yegge acknowledges that his own experience (forty years of engineering, unlimited AI tokens, deep familiarity with the tools) represents "unrealistic beauty standards" for the average developer. He's the equivalent of the fitness influencer whose workout routine is their full-time job. Most people don't have his context, his autonomy, or his leverage to negotiate Scenario B.&lt;/p&gt;
&lt;p&gt;And he identifies a crucial accelerant: the startup gold rush. AI has made it cheap enough to launch a company that "a million founders are chasing the same six ideas." This intensifies competition, which intensifies the pressure to push the output dial higher, which feeds the vampire.&lt;/p&gt;
&lt;h3&gt;The Jevons Connection&lt;/h3&gt;
&lt;p&gt;Here's what Yegge is describing in Jevons terms.&lt;/p&gt;
&lt;p&gt;AI makes cognitive output dramatically cheaper. Jevons predicts that demand won't fall in response; it will increase. That's exactly what happens. Companies don't say "same output, fewer hours." They say "10x the output, same hours." The efficiency gain doesn't reduce consumption of the input. It increases consumption. This is the paradox, and it is playing out precisely as the model predicts.&lt;/p&gt;
&lt;p&gt;But there's something different about this Jevons cycle, something that doesn't have a precedent in the historical cases.&lt;/p&gt;
&lt;p&gt;Coal doesn't get tired. Transistors don't burn out. Bandwidth doesn't need a nap. Every prior Jevons cycle involved an inert input. You could mine more coal, fabricate more chips, lay more fiber. When demand expanded, supply expanded to meet it, and the system found a new equilibrium at higher volume. The input didn't resist. It didn't have a biological ceiling.&lt;/p&gt;
&lt;p&gt;Human attention does.&lt;/p&gt;
&lt;p&gt;AI creates a concentration effect that Yegge describes precisely: it absorbs high-volume, routine work and leaves humans with a residue of pure judgment. The judgment work is, by definition, the most cognitively expensive kind of work, the kind that requires deep focus, contextual understanding, and the willingness to be wrong. And demand for this judgment work expands Jevons-style as AI makes the overall process cheaper. More projects get launched. More code gets written. More decisions need to be made. The volume of judgment calls scales with the volume of output, even as AI handles everything else.&lt;/p&gt;
&lt;p&gt;The problem is that the biological supply of deep, focused judgment is fixed. The deep work literature (Cal Newport and others have documented this extensively) converges on roughly three to four hours per day as the upper bound for sustained, cognitively demanding work. This isn't a cultural preference or a lifestyle choice. It's a constraint imposed by neurobiology. Attention is a depletable resource that recovers on a fixed biological schedule.&lt;/p&gt;
&lt;p&gt;This is the first Jevons cycle where expanding demand hits a hard biological ceiling on the input.&lt;/p&gt;
&lt;p&gt;Yegge's startup observation is also a Jevons phenomenon. AI made starting a company cheaper, so the number of startups exploded. More startups means more competition. More competition means more pressure to maximize output per person. The expansion creates its own acceleration, a feedback loop where cheaper cognitive output produces more ventures, which produce more demand for cognitive output, which increases the pressure on the humans in the loop.&lt;/p&gt;
&lt;p&gt;And the "unrealistic beauty standards" problem has a Jevons name too: it's the efficiency benchmark effect. In every Jevons cycle, the most efficient user of the cheaper input sets the competitive pace for everyone else. The factory that adopted steam power first forced every competitor to adopt it or die. The company that adopted AI first forces every competitor to match its output-per-employee or lose. Yegge, with his forty years and unlimited tokens, is the equivalent of the first factory with a Watt engine. His output level becomes the standard against which everyone is measured, even though most people can't replicate his efficiency.&lt;/p&gt;
&lt;h3&gt;Where the Ceiling Matters&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/coal-thrusters-trapper-1854.jpg" alt="Two coal thrusters and a trapper in a British coal mine, from J. C. Cobden's White Slaves of England, 1854, the human cost of running an input at maximum extraction" style="float: left; max-width: 40%; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;In every prior Jevons cycle, the resolution was supply expansion. Coal demand surged; mine more coal. Compute demand surged; fabricate more chips. Bandwidth demand surged; lay more fiber. The system found equilibrium at higher volume because the input could scale.&lt;/p&gt;
&lt;p&gt;Human cognitive capacity doesn't scale. You can't mine more judgment. You can't fabricate more attention. The three-to-four-hour ceiling on deep work isn't going to move because a company's OKRs demand it.&lt;/p&gt;
&lt;p&gt;This means a Jevons expansion in demand for human judgment has to resolve differently than prior cycles. There are really only three paths:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Better tooling that reduces the judgment burden.&lt;/strong&gt; AI gets good enough to handle more decisions autonomously, pushing the human-in-the-loop threshold higher. The frontier of what requires human judgment retreats as AI capability advances. This is already happening; the boundary between "AI can handle this" and "a human needs to decide" is moving rapidly. But it's not moving fast enough to outpace the demand expansion, which is why Yegge's burnout observation is accurate right now even if the long-term trajectory favors less human involvement.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Organizational restructuring.&lt;/strong&gt; More people, fewer high-stakes decisions each. Instead of one developer making judgment calls on 10x the output, you have three developers each handling a manageable portion. This is the "hire more" response, and it pushes back against the cost-reduction motive that drives Scenario A. Companies that pursue this path may produce better outcomes but at higher cost, which competitive dynamics tend to punish.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cultural pushback.&lt;/strong&gt; Yegge's \$/hr formula. Workers internalize the fixed-supply economics of their own attention, price it accordingly, and refuse to let demand expansion drain it below sustainable levels. This is individually rational but collectively difficult; it requires either enough leverage to negotiate, or enough cultural shift to change expectations.&lt;/p&gt;
&lt;p&gt;Yegge's \$/hr formula is, in Jevons terms, an attempt to set an equilibrium price for a fixed-supply resource. It is the cognitive equivalent of OPEC production quotas, an effort to keep the price of a scarce input from collapsing when every supplier produces at full capacity. And like OPEC quotas, it works only if enough participants enforce it.&lt;/p&gt;
&lt;h3&gt;What This Means for the Macro Picture&lt;/h3&gt;
&lt;p&gt;I want to be honest about what Yegge's observation adds to the framework I've been building.&lt;/p&gt;
&lt;p&gt;My previous pieces argued that when cognitive output gets cheaper, demand expansion will create new economic activity that exceeds the displacement. I stand by that argument. But I underweighted the human-in-the-loop constraint. The demand expansion is real: new markets form, new companies launch, total economic activity grows. But every unit of that expanded activity still requires some quantum of human judgment, and that judgment runs on biological hardware with a fixed daily capacity.&lt;/p&gt;
&lt;p&gt;This doesn't invalidate the macro Jevons argument. Demand will expand. New industries will form. Total employment will restructure, not collapse. But the human attention constraint acts as a speed governor on the expansion. The economy can't scale cognitive output infinitely by just pushing the existing workforce harder, because the existing workforce has a biological ceiling on the input that matters most.&lt;/p&gt;
&lt;p&gt;This argues for Yegge's three-to-four-hour workday not as a lifestyle aspiration but as something closer to an economic inevitability, the natural equilibrium point for a Jevons cycle operating on a fixed-supply input. When demand for an input exceeds the maximum sustainable rate of supply, the system must either find a substitute (AI handling more decisions autonomously), expand the supplier base (more workers, shorter hours each), or accept a constrained equilibrium (the three-hour workday). Some combination of all three is likely.&lt;/p&gt;
&lt;p&gt;The interesting implication is that the Jevons expansion and the burnout crisis are not contradictory phenomena. They're the same phenomenon viewed from different vantage points. The macro analyst sees demand expanding and new economic activity forming. The individual worker sees an unsustainable cognitive load. Both are correct. They're describing different aspects of the same system adjusting to a radically cheaper input.&lt;/p&gt;
&lt;h3&gt;The Vampire and the Paradox&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/nosferatu-count-orlok-1922.jpg" alt="Max Schreck as Count Orlok in Nosferatu, 1922, the vampire as an image of relentless, impersonal extraction" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Matt Shumer &lt;a href="https://tinycomputers.io/posts/something-big-is-happening-a-critique.html"&gt;worries about displacement&lt;/a&gt;, losing your job to AI. Steve Yegge worries about what happens to the people who aren't displaced, who keep their jobs but get vampired. Both are describing real phenomena. Neither is the whole picture.&lt;/p&gt;
&lt;p&gt;The Jevons framework encompasses both. Demand expansion creates new work, answering Shumer's displacement concern: the economy doesn't contract, it restructures. But the expansion concentrates cognitive load on the humans who remain in the loop, confirming Yegge's burnout observation, because the one input AI can't replace is the one input that can't scale.&lt;/p&gt;
&lt;p&gt;Shumer's error is modeling only the displacement side. Yegge's error is modeling only the extraction side. The full picture includes both: an economy producing vastly more cognitive output, creating genuinely new economic activity, while simultaneously pushing the humans at the center of it toward a biological wall.&lt;/p&gt;
&lt;p&gt;The vampire is real. It's also, like every Jevons cycle, a signal that something genuinely new is being created, that demand is expanding into territory that didn't exist before. The burnout isn't incidental to the expansion. It's a symptom of it. And like every prior Jevons cycle, the system will find an equilibrium, not because anyone plans it, but because a fixed-supply input eventually forces one. The question is how much damage the vampire does before we get there.&lt;/p&gt;</description><category>ai</category><category>burnout</category><category>critique</category><category>demand expansion</category><category>economics</category><category>jevons paradox</category><category>labor</category><category>productivity</category><category>steve yegge</category><category>technology</category><guid>https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html</guid><pubDate>Wed, 04 Mar 2026 14:00:00 GMT</pubDate></item></channel></rss>