<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io</title><link>https://tinycomputers.io/</link><description>Hands-on hardware projects and deep dives into embedded systems, Z80 retro computing, FPGAs, Rust on microcontrollers, PCB design, 3D printing, and AI on AMD GPUs.</description><atom:link href="https://tinycomputers.io/rss.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
&lt;!-- div style="width: 100%" --&gt;
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /&gt; Creative Commons Attribution-ShareAlike&lt;/a&gt;&amp;nbsp;|&amp;nbsp;
&lt;!-- /div --&gt;
</copyright><lastBuildDate>Fri, 13 Mar 2026 19:27:46 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Redesigning a PCB with Claude Code and Open-Source EDA Tools (Part 1)</title><link>https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;20 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;div class="sponsor-widget"&gt;
&lt;div class="sponsor-widget-header"&gt;&lt;a href="https://baud.rs/youwpy"&gt;&lt;img src="https://tinycomputers.io/images/pcbway-logo.png" alt="PCBWay" style="height: 22px; vertical-align: middle; margin-right: 8px;"&gt;&lt;/a&gt; Sponsored Hardware&lt;/div&gt;
&lt;p&gt;This project was made possible by &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;, who sponsored the fabrication of the redesigned GigaShield v0.2 level converter board. PCBWay offers PCB prototyping, assembly, CNC machining, and 3D printing services, from one-off prototypes to production runs. If you have a PCB design ready to go, check them out at &lt;a href="https://baud.rs/youwpy"&gt;pcbway.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img id="pcb-top-img" src="https://tinycomputers.io/images/giga-shield/giga-shield-v02-top.png" alt="GigaShield v0.2 PCB top view: routed two-layer board with 9 SN74LVC8T245PW level shifters, generated with Python and autorouted with Freerouting" style="float: right; max-width: 420px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); cursor: zoom-in;"&gt;&lt;/p&gt;
&lt;div id="img-modal" class="modal" onclick="this.style.display='none'"&gt;
&lt;span class="close" onclick="document.getElementById('img-modal').style.display='none'"&gt;×&lt;/span&gt;
&lt;img class="modal-content" id="modal-img"&gt;
&lt;div id="caption"&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;script&gt;
(function() {
    var img = document.getElementById('pcb-top-img');
    var modal = document.getElementById('img-modal');
    var modalImg = document.getElementById('modal-img');
    var caption = document.getElementById('caption');
    img.onclick = function() {
        modal.style.display = 'block';
        modalImg.src = this.src;
        caption.textContent = this.alt;
    };
    document.addEventListener('keydown', function(e) {
        if (e.key === 'Escape' &amp;&amp; modal.style.display === 'block') {
            modal.style.display = 'none';
        }
    });
})();
&lt;/script&gt;

&lt;p&gt;In January, I &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;spent $468 on Fiverr&lt;/a&gt; to have a professional design an &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1&lt;/a&gt; shield with level shifters. It was a good design. Nine &lt;a href="https://baud.rs/y9JJt9"&gt;TXB0108PW&lt;/a&gt; bidirectional level translators, 72 channels of 3.3V-to-5V shifting, a clean two-layer board ready for fabrication. And then I started testing it with the &lt;a href="https://baud.rs/87wbBL"&gt;RetroShield Z80&lt;/a&gt;, and the auto-sensing level shifters fell apart.&lt;/p&gt;
&lt;p&gt;The TXB0108 is a clever chip. It detects signal direction automatically, so you don't need to tell it whether a pin is input or output. For most applications, that's a feature. For a Z80 bus interface, it's a fatal flaw. During bus cycles, the Z80 tri-states its address and data lines. The outputs go high-impedance. They're not driving high or low, they're floating. The TXB0108 can't determine drive direction from a floating signal. It guesses wrong, or it doesn't drive at all, and the Arduino on the other side sees garbage. The board was blind to half of what the Z80 was doing.&lt;/p&gt;
&lt;p&gt;The fix was clear: replace the TXB0108s with &lt;a href="https://baud.rs/zQqo34"&gt;SN74LVC8T245PW&lt;/a&gt; driven level shifters. The SN74LVC8T245 has an explicit DIR pin: you tell it which direction to translate, and it does exactly that, regardless of whether the signals are being actively driven. No guessing, no ambiguity, deterministic behavior during tri-state periods. The trade-off is that you need a direction control signal for each shifter IC, but that's a small price for reliability.&lt;/p&gt;
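&lt;p&gt;The difference is easy to see in a toy model. This is not how the silicon actually works (the TXB0108 uses edge-detecting one-shot drivers), but it captures why a floating input defeats direction guessing while an explicit DIR pin stays deterministic:&lt;/p&gt;

```python
# Toy model of the two level-shifter families. A tri-stated (floating)
# line has no defined logic level; model it as None.

def auto_sensing(a_side, b_side):
    """TXB0108-style: guess which side is driving and repeat it."""
    if a_side is not None:
        return ("a_to_b", a_side)
    if b_side is not None:
        return ("b_to_a", b_side)
    return ("unknown", None)      # both sides float: no basis for a guess

def dir_controlled(a_side, b_side, dir_a_to_b):
    """SN74LVC8T245-style: DIR picks the source side unconditionally."""
    if dir_a_to_b:
        return ("a_to_b", a_side)
    return ("b_to_a", b_side)

# Z80 tri-states its bus between cycles: both sides float.
print(auto_sensing(None, None))          # ('unknown', None): direction lost
print(dir_controlled(None, None, True))  # ('a_to_b', None): direction known
```

&lt;p&gt;The point isn't the output value; it's that the DIR-controlled part behaves the same way on every bus cycle, driven or floating.&lt;/p&gt;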
&lt;p&gt;What wasn't clear was how to execute the redesign. I could go back to Fiverr for another $400-500. I could spend weeks learning KiCad properly. Or I could try something that had worked surprisingly well on a &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;previous project&lt;/a&gt;: use AI and open-source command-line EDA tools to design the board from a terminal, without ever opening a graphical PCB editor.&lt;/p&gt;
&lt;p&gt;This is part one of a two-part series. This piece covers the design and toolchain: how I used &lt;a href="https://baud.rs/Z6Oq4k"&gt;Claude Code&lt;/a&gt;, the gEDA ecosystem, pcb-rnd, and &lt;a href="https://baud.rs/bdZw62"&gt;Freerouting&lt;/a&gt; to go from a failed design to production-ready Gerber files. Part two will cover the physical boards, assembly, and testing against the Z80.&lt;/p&gt;
&lt;h3&gt;The Toolchain Problem&lt;/h3&gt;
&lt;p&gt;The original Fiverr design was done in KiCad 9.0. My first instinct was to modify it directly: swap the TXB0108 footprints for SN74LVC8T245, update the pin mappings, add the DIR control header, and re-route. But there was a problem. My preferred command-line PCB tool, &lt;a href="https://baud.rs/1J64T5"&gt;pcb-rnd&lt;/a&gt;, is version 3.1.4 on Ubuntu. KiCad 9.0 uses a file format version (20241229) that pcb-rnd's &lt;code&gt;io_kicad&lt;/code&gt; plugin doesn't support. When I tried to open the KiCad PCB:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;unexpected layout version number (perhaps too new)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Hard stop. No conversion path exists from KiCad 9.0 to pcb-rnd. The formats aren't just different versions. KiCad's S-expression format and pcb-rnd's text-based format are fundamentally different syntaxes.&lt;/p&gt;
&lt;p&gt;I could have started KiCad and used its GUI. But I'd already proven to myself with the &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;dual Z80 RetroShield project&lt;/a&gt; that text-based, AI-assisted PCB workflows are not only possible but sometimes preferable. The gEDA/pcb-rnd file format is human-readable. AI can parse it, reason about it, and generate it. A Python script can manipulate it. You can &lt;code&gt;diff&lt;/code&gt; two boards and see exactly what changed. None of that is true for a graphical-only workflow.&lt;/p&gt;
&lt;p&gt;So the plan became: extract everything useful from the KiCad source files, then rebuild the board from scratch in pcb-rnd's native format using Python. Sound insane? It kind of is. But it worked.&lt;/p&gt;
&lt;h3&gt;Extracting the DNA&lt;/h3&gt;
&lt;p&gt;Even though pcb-rnd couldn't read the KiCad files directly, the KiCad files contained all the design intelligence I needed. Component positions, net assignments, pin mappings, board dimensions. It was all there, just in a format I couldn't import.&lt;/p&gt;
&lt;p&gt;KiCad's CLI tools (&lt;code&gt;kicad-cli&lt;/code&gt;) could export what I needed:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Component positions (X, Y, rotation for each part)&lt;/span&gt;
kicad-cli&lt;span class="w"&gt; &lt;/span&gt;pcb&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pos&lt;span class="w"&gt; &lt;/span&gt;AlexJ_bz_ArduinoGigaShield.kicad_pcb&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;giga_pos.csv

&lt;span class="c1"&gt;# Netlist connectivity&lt;/span&gt;
kicad-cli&lt;span class="w"&gt; &lt;/span&gt;pcb&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;ipc2581&lt;span class="w"&gt; &lt;/span&gt;AlexJ_bz_ArduinoGigaShield.kicad_pcb&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;giga_netlist.xml
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The schematic file (&lt;code&gt;AlexJ_bz_ArduinoGigaShield.kicad_sch&lt;/code&gt;) was an S-expression text file I could parse to extract the signal mappings: which Giga pin connects to which 5V header pin through which level shifter channel. This was the most critical piece: getting the net assignments wrong would mean the board physically connects but logically doesn't work.&lt;/p&gt;
&lt;p&gt;This is where Claude Code earned its keep. I described the KiCad schematic structure and asked it to help me parse out the signal mappings. The KiCad schematic uses hierarchical sheets with positional net connections, which isn't the simplest format to work with manually, but straightforward for an AI that can read S-expressions and track net names across sheets. Within an hour, I had a complete mapping of all 72 signal channels across the 9 shifter ICs.&lt;/p&gt;
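&lt;p&gt;The core of that extraction is just walking S-expressions. A minimal sketch of the idea (illustrative only: the tokenizer below ignores quoting edge cases, UUIDs, and the sheet hierarchy the real pass had to handle):&lt;/p&gt;

```python
# Minimal S-expression reader, enough to pull net labels out of a
# KiCad-style schematic fragment.

def parse_sexp(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    def build(pos):
        node = []
        while pos[0] != len(tokens):
            tok = tokens[pos[0]]
            pos[0] += 1
            if tok == "(":
                node.append(build(pos))
            elif tok == ")":
                return node
            else:
                node.append(tok.strip('"'))
        return node
    pos = [0]
    pos[0] += 1          # skip the opening paren of the top-level form
    return build(pos)

def find_labels(node, out):
    """Collect (label "NAME" ...) forms anywhere in the tree."""
    if isinstance(node, list):
        if node and node[0] == "label":
            out.append(node[1])
        for child in node:
            find_labels(child, out)

fragment = '(kicad_sch (label "D22" (at 10 20)) (label "LVL_A1" (at 30 40)))'
labels = []
find_labels(parse_sexp(fragment), labels)
print(labels)   # ['D22', 'LVL_A1']
```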
&lt;h3&gt;Generating the Board with Python&lt;/h3&gt;
&lt;p&gt;With positions and nets extracted, I wrote &lt;code&gt;build_giga_shield.py&lt;/code&gt;, a single Python script that generates the entire pcb-rnd board from scratch. No GUI involved. Every component footprint, every pin, every net connection is defined programmatically.&lt;/p&gt;
&lt;p&gt;The script is structured around four generator functions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;tssop24_element()&lt;/code&gt;&lt;/strong&gt; generates the SN74LVC8T245PW footprint. TSSOP-24 is a precise geometry: 0.65mm pin pitch, 6.4mm pad-to-pad span, 24 pins. The function calculates pad positions mathematically: 12 pins on the left, 12 on the right, with pin 1 marked as square per convention. Getting the pin numbering right was critical. The SN74LVC8T245's datasheet shows pins 1-12 on the left (DIR, A1-A4, GND, A5-A8, OE#, GND) and pins 13-24 on the right counting bottom-to-top (B8-B5, VCCB, B4-B1, VCCA, VCCA, VCCB).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;pin_header_element()&lt;/code&gt;&lt;/strong&gt; handles through-hole pin headers with rotation support. The Arduino Giga R1 has an unusual form factor: the long pin headers run along the board edges horizontally, not vertically. In the original KiCad design, these were placed with 90-degree or -90-degree rotation. Without matching that rotation, a 26-pin header at y=84mm would extend 63.5mm downward to y=147.5mm, well past the 90mm board edge. The rotation transform was simple once identified:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rot&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;rot&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;smd_0603_element()&lt;/code&gt;&lt;/strong&gt; creates the 0603 footprint shared by all 27 decoupling capacitors and 9 pull-down resistors. Small SMD parts, simple geometry.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;mounting_hole_element()&lt;/code&gt;&lt;/strong&gt; places the four 3.2mm mounting holes that align with the Arduino Giga's standoff positions.&lt;/p&gt;
&lt;p&gt;The coordinate system was the trickiest part. KiCad uses an arbitrary origin; in this design, x=106mm, y=30.5mm. pcb-rnd uses (0,0). Every KiCad coordinate had to be translated:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;KX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;106.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;30.5&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;kpos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ky&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kx&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;KX&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;mm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ky&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;KY&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
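&lt;p&gt;For completeness, &lt;code&gt;mm()&lt;/code&gt; just renders a value with an explicit unit suffix for the pcb-rnd file. Something along these lines (the exact formatting in &lt;code&gt;build_giga_shield.py&lt;/code&gt; differs in detail):&lt;/p&gt;

```python
# Render coordinates with an explicit "mm" suffix; emitting units everywhere
# also sidesteps pcb-rnd's bare-zero parser pitfall.
def mm(value):
    # enough precision for 0.65mm pitch math; strip trailing zeros
    return ("%.4f" % value).rstrip("0").rstrip(".") + "mm"

KX, KY = 106.0, 30.5   # KiCad origin of this design

def kpos(kx, ky):
    """Translate a KiCad coordinate into pcb-rnd's (0,0)-origin space."""
    return (mm(kx - KX), mm(ky - KY))

print(kpos(120.5, 45.0))   # ('14.5mm', '14.5mm')
```

&lt;p&gt;Note that &lt;code&gt;mm(0)&lt;/code&gt; comes out as &lt;code&gt;0mm&lt;/code&gt;, never a bare &lt;code&gt;0&lt;/code&gt;, which matters once the file hits pcb-rnd's parser.&lt;/p&gt;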

&lt;p&gt;The &lt;code&gt;build_pcb()&lt;/code&gt; function ties everything together: place components, assign nets, build the symbol table, generate the layer stack, and write out a valid pcb-rnd &lt;code&gt;.pcb&lt;/code&gt; file. Running the script produces a complete, unrouted board: components placed, netlist defined, silkscreen text positioned, board outline drawn. Ready for routing.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;python3&lt;span class="w"&gt; &lt;/span&gt;build_giga_shield.py
Generated&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb
Board:&lt;span class="w"&gt; &lt;/span&gt;155mm&lt;span class="w"&gt; &lt;/span&gt;x&lt;span class="w"&gt; &lt;/span&gt;90mm
9x&lt;span class="w"&gt; &lt;/span&gt;SN74LVC8T245PW&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;TSSOP-24&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;level&lt;span class="w"&gt; &lt;/span&gt;shifters
DIR&lt;span class="w"&gt; &lt;/span&gt;control&lt;span class="w"&gt; &lt;/span&gt;via&lt;span class="w"&gt; &lt;/span&gt;J11&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;1x10&lt;span class="w"&gt; &lt;/span&gt;header&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;The Format Wars&lt;/h3&gt;
&lt;p&gt;Getting pcb-rnd to actually accept the generated file was its own adventure. pcb-rnd's parser is strict about things that look optional in the documentation, and its error messages are sometimes misleading. An error in an Element definition might be reported as a syntax error in the Layer section fifty lines later.&lt;/p&gt;
&lt;p&gt;Three format issues bit me hardest:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;"smd"&lt;/code&gt; flag.&lt;/strong&gt; I initially generated elements with &lt;code&gt;Element["smd" "TSSOP24" "U1" ...]&lt;/code&gt;, which seemed logical for surface-mount parts. pcb-rnd rejected it with "Unknown flag: smd ignored," which cascaded into a complete parse failure. The fix: use an empty string &lt;code&gt;Element["" "TSSOP24" "U1" ...]&lt;/code&gt;. The SMD-ness is implicit from using &lt;code&gt;Pad[]&lt;/code&gt; entries instead of &lt;code&gt;Pin[]&lt;/code&gt; entries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Bare zeros.&lt;/strong&gt; pcb-rnd is inconsistent about whether &lt;code&gt;0&lt;/code&gt; and &lt;code&gt;0nm&lt;/code&gt; are interchangeable. In some contexts, bare &lt;code&gt;0&lt;/code&gt; works fine. In others, it causes a silent parse error that manifests as a syntax error dozens of lines later. The defensive fix: always use &lt;code&gt;0nm&lt;/code&gt;, never bare &lt;code&gt;0&lt;/code&gt;, everywhere.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Missing flags on Layer lines.&lt;/strong&gt; The &lt;code&gt;Line[]&lt;/code&gt; entry inside Layer blocks needs 7 fields, not 6. The seventh is a flags string like &lt;code&gt;"clearline"&lt;/code&gt;. My generator omitted it, producing &lt;code&gt;Line[x1 y1 x2 y2 thickness clearance]&lt;/code&gt;. The parser's error message: &lt;code&gt;syntax error, unexpected ']', expecting INTEGER or STRING&lt;/code&gt;, reported at the layer definition, not at the malformed line.&lt;/p&gt;
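&lt;p&gt;All three rules ended up encoded in small emitter helpers so they couldn't regress. A sketch (the field layout follows the gEDA/pcb-rnd &lt;code&gt;Element[]&lt;/code&gt; and &lt;code&gt;Line[]&lt;/code&gt; syntax; the real generator emits more fields and real flag values):&lt;/p&gt;

```python
# Defensive pcb-rnd emitters encoding the three hard-won format rules:
# empty flag string on Element, explicit "0nm" instead of bare 0, and the
# mandatory 7th flags field on Line[] entries.

def unit(nm):
    """Always emit an explicit unit, even for zero."""
    return "%dnm" % nm

def element_header(footprint, refdes, x_nm, y_nm):
    # Rule 1: the first field is an empty flag string, not "smd";
    # SMD-ness comes from using Pad[] instead of Pin[] inside the element.
    return 'Element["" "%s" "%s" "" %s %s 0nm 0nm 0 100 ""]' % (
        footprint, refdes, unit(x_nm), unit(y_nm))

def layer_line(x1, y1, x2, y2, thickness, clearance):
    # Rule 3: seven fields; the trailing "clearline" flags string is required.
    return 'Line[%s %s %s %s %s %s "clearline"]' % (
        unit(x1), unit(y1), unit(x2), unit(y2),
        unit(thickness), unit(clearance))

print(element_header("TSSOP24", "U1", 25_000_000, 40_000_000))
print(layer_line(0, 0, 10_000_000, 0, 254_000, 254_000))
```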
&lt;p&gt;I found these bugs using a binary search approach, truncating the file with &lt;code&gt;head -N&lt;/code&gt; and testing each truncation point until I isolated which section introduced the failure. It's crude but effective when error reporting is unhelpful. Claude Code helped enormously here. I'd paste the error and the surrounding file content, and it would spot the structural issue faster than I could.&lt;/p&gt;
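&lt;p&gt;The bisection itself is mechanical enough to script. A generic sketch of the idea; on the real file, the &lt;code&gt;parses_ok&lt;/code&gt; predicate would shell out to pcb-rnd on a &lt;code&gt;head -N&lt;/code&gt; truncation instead of the stand-in used here:&lt;/p&gt;

```python
# Binary search for the first line that breaks the parser: test ever-larger
# prefixes of the file, the scripted equivalent of truncating with head -N.

def first_bad_line(lines, parses_ok):
    """Smallest prefix length whose last line introduces the failure."""
    lo, hi = 0, len(lines)       # invariant: prefix lo parses, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if parses_ok(lines[:mid]):
            lo = mid
        else:
            hi = mid
    return hi

# Stand-in predicate: pretend line 42 (0-indexed 41) is the malformed Line[].
fake_file = ["good"] * 100
fake_file[41] = 'Line[0nm 0nm 1mm 1mm 0.25mm 0.25mm]'   # missing flags field
def parses_ok(prefix):
    return 'Line[0nm 0nm 1mm 1mm 0.25mm 0.25mm]' not in prefix

print(first_bad_line(fake_file, parses_ok))   # 42
```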
&lt;h3&gt;The pcb-rnd Ecosystem&lt;/h3&gt;
&lt;p&gt;For anyone unfamiliar with the tools involved, a brief orientation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;gEDA&lt;/strong&gt; (GNU Electronic Design Automation) is a suite of open-source tools for electronic design. The original project dates to the late 1990s and includes &lt;code&gt;gschem&lt;/code&gt; (schematic capture), &lt;code&gt;pcb&lt;/code&gt; (PCB layout), and various utilities. The file formats are text-based and human-readable, a deliberate design choice that makes them scriptable and version-control-friendly. The original &lt;code&gt;pcb&lt;/code&gt; program is now deprecated.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pcb-rnd&lt;/strong&gt; is the actively maintained successor to gEDA's &lt;code&gt;pcb&lt;/code&gt; program. It reads and writes the same text-based PCB format, but adds modern features: more export formats, better plugin support, and critically for this project, command-line export of Gerber files, PNG renderings, and Specctra DSN files. It's packaged for Linux (Ubuntu included) but not readily available on macOS, which is why I ran it over SSH on a remote machine throughout this project.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Freerouting&lt;/strong&gt; is a Java-based autorouter that speaks the Specctra DSN/SES interchange format. You feed it a board definition with components and nets but no traces, and it computes the copper routing, finding paths for every net while respecting design rules for trace width, clearance, and via placement. It's the open-source standard for PCB autorouting and has been used in production for decades.&lt;/p&gt;
&lt;p&gt;The workflow chains these tools together:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;build_giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pcb&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pcb&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rnd&lt;/span&gt; &lt;span class="n"&gt;DSN&lt;/span&gt; &lt;span class="n"&gt;export&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                     &lt;span class="n"&gt;giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Freerouting&lt;/span&gt; &lt;span class="n"&gt;autorouter&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                     &lt;span class="n"&gt;giga_shield&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ses&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
              &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pcb&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rnd&lt;/span&gt; &lt;span class="n"&gt;SES&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Gerber&lt;/span&gt; &lt;span class="n"&gt;export&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                            &lt;span class="err"&gt;↓&lt;/span&gt;
                    &lt;span class="n"&gt;Production&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every step is a command-line operation. Every intermediate file is text. Every transformation is reproducible. Change a component position in the Python script, re-run the pipeline, get new Gerber files. This is the power of text-based EDA: the entire design is version-controlled, diffable, and automatable.&lt;/p&gt;
&lt;h3&gt;Autorouting: The Machine Does the Tedious Part&lt;/h3&gt;
&lt;p&gt;With the board generated and validated in pcb-rnd, the next step was routing: connecting all 308 nets with actual copper traces across a two-layer board. This is where Freerouting comes in.&lt;/p&gt;
&lt;p&gt;The pipeline starts with exporting the unrouted board to Specctra DSN format. pcb-rnd handles this in batch mode on the remote Linux machine:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;dsn&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The DSN file contains the board geometry, component placements, pad definitions, and netlist, everything the autorouter needs to compute a routing solution. One subtlety I learned the hard way: the DSN's &lt;code&gt;(structure)&lt;/code&gt; section needs explicit &lt;code&gt;(rule)&lt;/code&gt; and &lt;code&gt;(via)&lt;/code&gt; definitions. pcb-rnd's DSN exporter puts the design rules inside the net class section, but Freerouting also expects them in the structure section. Without them, the router can see the nets but can't figure out what trace widths and via sizes are legal, and it silently fails to route most connections. A small addition to the structure section fixed this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;(via pstk_1)
(rule
  (width 0.254)
  (clearance 0.254)
)
&lt;/pre&gt;&lt;/div&gt;
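&lt;p&gt;Rather than hand-editing the DSN after every export, a patch step like this can live in the pipeline so regeneration never loses the fix (string-level patching is safe here because the exporter emits a single &lt;code&gt;(structure&lt;/code&gt; opener; the fragment matches the rule block above):&lt;/p&gt;

```python
# Insert the (via)/(rule) definitions into the (structure ...) section of a
# freshly exported DSN so Freerouting can see the design rules.

RULE_BLOCK = """  (via pstk_1)
  (rule
    (width 0.254)
    (clearance 0.254)
  )
"""

def patch_dsn(text):
    # Idempotent: skip files that already carry a structure-level rule.
    if "(via pstk_1)" in text:
        return text
    head, sep, tail = text.partition("(structure")
    assert sep, "no (structure) section found"
    return head + sep + "\n" + RULE_BLOCK + tail

sample = "(pcb giga_shield\n  (structure\n    (layer top)\n  )\n)\n"
patched = patch_dsn(sample)
assert patched.count("(rule") == 1
assert patch_dsn(patched) == patched   # running it twice changes nothing
```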

&lt;p&gt;Freerouting itself is a Java application with both GUI and command-line modes. On my machine, I'm running a custom build from source. The current &lt;code&gt;main&lt;/code&gt; branch had a few issues I had to fix (a missing &lt;code&gt;static&lt;/code&gt; on the main method, a null pointer on &lt;code&gt;maxThreads&lt;/code&gt; in the GUI initialization, and a Gradle build compatibility issue). The v1.9 codepath was more reliable for headless routing:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;java&lt;span class="w"&gt; &lt;/span&gt;-jar&lt;span class="w"&gt; &lt;/span&gt;freerouting-1.9.0-executable.jar&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-de&lt;span class="w"&gt; &lt;/span&gt;giga_shield.dsn&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-do&lt;span class="w"&gt; &lt;/span&gt;giga_shield.ses
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The autorouter loaded the 308-net board, ran through its passes, and produced a Specctra Session file containing 2911 wire segments and 172 vias. Every net connected. Every design rule satisfied. The initial routing pass took about 10 seconds, followed by optimization passes.&lt;/p&gt;
&lt;video controls autoplay loop muted playsinline style="max-width: 100%; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1em 0;"&gt;
  &lt;source src="https://tinycomputers.io/images/giga-shield/routing-traces.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;Importing the routes back into pcb-rnd was the final step. pcb-rnd can import SES files through its batch mode:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;--gui&lt;span class="w"&gt; &lt;/span&gt;hid_batch&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span class="s"&gt;ImportSes(giga_shield.ses)&lt;/span&gt;
&lt;span class="s"&gt;SaveTo(LayoutAs, giga_shield_routed.pcb)&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The result: a fully routed PCB with 2911 traces and 172 vias, ready for Gerber export.&lt;/p&gt;
&lt;h3&gt;Running pcb-rnd Over SSH&lt;/h3&gt;
&lt;p&gt;One of the more unusual aspects of this project is that all pcb-rnd operations happened on a remote Ubuntu 24.04 machine accessed over SSH. pcb-rnd isn't available on macOS via Homebrew (I tried; there's a deprecated &lt;code&gt;pcb&lt;/code&gt; package but no &lt;code&gt;pcb-rnd&lt;/code&gt;), and building from source on macOS looked like a rabbit hole I didn't want to enter.&lt;/p&gt;
&lt;p&gt;The remote workflow was straightforward:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Upload the PCB&lt;/span&gt;
scp&lt;span class="w"&gt; &lt;/span&gt;giga_shield.pcb&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27:/tmp/

&lt;span class="c1"&gt;# Export DSN for routing&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pcb-rnd -x dsn /tmp/giga_shield.pcb"&lt;/span&gt;

&lt;span class="c1"&gt;# Import SES and export gerbers&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'pcb-rnd --gui hid_batch /tmp/giga_shield.pcb &amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span class="s1"&gt;ImportSes(/tmp/giga_shield.ses)&lt;/span&gt;
&lt;span class="s1"&gt;SaveTo(LayoutAs, /tmp/giga_shield_routed.pcb)&lt;/span&gt;
&lt;span class="s1"&gt;EOF'&lt;/span&gt;

&lt;span class="c1"&gt;# Export production files&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pcb-rnd -x gerber --gerberfile /tmp/giga_shield /tmp/giga_shield_routed.pcb"&lt;/span&gt;
ssh&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pcb-rnd -x png --dpi 600 --photo-mode --outfile /tmp/top.png /tmp/giga_shield_routed.pcb"&lt;/span&gt;

&lt;span class="c1"&gt;# Download results&lt;/span&gt;
scp&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27:/tmp/giga_shield.*.gbr&lt;span class="w"&gt; &lt;/span&gt;.
scp&lt;span class="w"&gt; &lt;/span&gt;alex@10.1.1.27:/tmp/top.png&lt;span class="w"&gt; &lt;/span&gt;giga_shield_top.png
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It's more keystrokes than clicking Export in a GUI. But it's scriptable, repeatable, and fits into the same terminal where Claude Code is running. When I needed to iterate (move a component, re-route, re-export) I could do it in a single pipeline without switching contexts.&lt;/p&gt;
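&lt;p&gt;That single pipeline is worth making concrete. A sketch of the whole round-trip as one driver script (host, filenames, and jar path mirror the commands above; building the command list as data keeps it easy to log or dry-run):&lt;/p&gt;

```python
# One-shot remote pipeline: upload, export DSN, route locally with
# Freerouting, import the SES, export Gerbers, download.
import subprocess

HOST = "alex@10.1.1.27"   # remote Ubuntu box running pcb-rnd

def pipeline_commands(board="giga_shield"):
    """The full round-trip as a list of (argv, stdin) pairs."""
    batch = ("ImportSes(/tmp/%s.ses)\n"
             "SaveTo(LayoutAs, /tmp/%s_routed.pcb)\n" % (board, board))
    return [
        (["scp", board + ".pcb", HOST + ":/tmp/"], None),
        (["ssh", HOST, "pcb-rnd -x dsn /tmp/%s.pcb" % board], None),
        (["scp", "%s:/tmp/%s.dsn" % (HOST, board), "."], None),
        (["java", "-jar", "freerouting-1.9.0-executable.jar",
          "-de", board + ".dsn", "-do", board + ".ses"], None),
        (["scp", board + ".ses", HOST + ":/tmp/"], None),
        # batch-mode actions go in on stdin, replacing the shell heredoc
        (["ssh", HOST, "pcb-rnd --gui hid_batch /tmp/%s.pcb" % board], batch),
        (["ssh", HOST, "pcb-rnd -x gerber --gerberfile /tmp/%s "
          "/tmp/%s_routed.pcb" % (board, board)], None),
        (["scp", "%s:/tmp/%s.*.gbr" % (HOST, board), "."], None),
    ]

def run_pipeline(board="giga_shield"):
    for argv, stdin in pipeline_commands(board):
        subprocess.run(argv, input=stdin, text=True, check=True)
```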
&lt;h3&gt;Claude Code as a Hardware Design Partner&lt;/h3&gt;
&lt;p&gt;I should be explicit about what Claude Code did and didn't do in this project, because the AI angle is the part people will either find most interesting or most suspicious.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What Claude Code did:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parsed the KiCad schematic to extract the 72-channel signal mapping across 9 level shifter ICs&lt;/li&gt;
&lt;li&gt;Wrote the initial &lt;code&gt;build_giga_shield.py&lt;/code&gt; generator script, including all four footprint generators and the net assignment logic&lt;/li&gt;
&lt;li&gt;Debugged pcb-rnd format issues by analyzing error messages and file structure&lt;/li&gt;
&lt;li&gt;Managed the remote SSH workflow: uploading files, running pcb-rnd commands, downloading results&lt;/li&gt;
&lt;li&gt;Fixed bugs in the Freerouting build (the &lt;code&gt;static main&lt;/code&gt; issue, the null &lt;code&gt;maxThreads&lt;/code&gt;, the Gradle &lt;code&gt;fileMode&lt;/code&gt; API change)&lt;/li&gt;
&lt;li&gt;Handled iterative changes: "move tinycomputers.io down by a millimeter" became an edit to the Python script, a regeneration, a re-import, and a re-export, all executed as a single flow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;What Claude Code didn't do:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make architectural decisions. The choice of the SN74LVC8T245 over the TXB0108, the DIR control header design, and the decision to default the shifters to A-to-B via pull-down resistors were all mine, grounded in how the Z80 bus works. (Selecting the TXB0108 in the first place is also on me.)&lt;/li&gt;
&lt;li&gt;Verify electrical correctness. I checked the SN74LVC8T245 datasheet pin mapping myself. I verified that OE# tied to GND means always-enabled. I confirmed the 10K pull-down value was appropriate for the DIR pin&lt;/li&gt;
&lt;li&gt;Replace domain knowledge. I knew why the TXB0108 failed during tri-state periods because I understand Z80 bus cycles. Claude Code could have looked up the TXB0108 datasheet, but it couldn't have diagnosed the real-world failure mode from first principles&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The pattern that emerged was: I made design decisions, Claude Code implemented them. I said "the DIR pins need pull-down resistors to default A-to-B direction," Claude Code generated the pcb-rnd Element entries with the correct footprint, position, and net assignments. I said "export gerbers at 600 DPI with photo mode," Claude Code ran the right &lt;code&gt;pcb-rnd&lt;/code&gt; command on the remote machine.&lt;/p&gt;
&lt;p&gt;This is the same division of labor I described in the &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;dual Z80 post&lt;/a&gt;: I bring the domain knowledge, the AI handles the format translation. The text-based nature of gEDA files makes this work. If the design lived in a binary format or required mouse interactions, the AI would have been far less useful.&lt;/p&gt;
&lt;h3&gt;The New Design&lt;/h3&gt;
&lt;p&gt;Here's what the redesigned board looks like:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;v0.1 (Fiverr/TXB0108)&lt;/th&gt;
&lt;th&gt;v0.2 (Claude Code/SN74LVC8T245)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Level Shifter IC&lt;/td&gt;
&lt;td&gt;TXB0108PW (TSSOP-20)&lt;/td&gt;
&lt;td&gt;SN74LVC8T245PW (TSSOP-24)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direction Control&lt;/td&gt;
&lt;td&gt;Auto-sensing&lt;/td&gt;
&lt;td&gt;Explicit DIR pin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shifter ICs&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decoupling Caps&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pull-down Resistors&lt;/td&gt;
&lt;td&gt;9 (OE)&lt;/td&gt;
&lt;td&gt;9 (DIR)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DIR Control Header&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;J11 (1x10)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Board Dimensions&lt;/td&gt;
&lt;td&gt;155mm x 90mm&lt;/td&gt;
&lt;td&gt;155mm x 90mm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layers&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Tool&lt;/td&gt;
&lt;td&gt;KiCad 9.0 (GUI)&lt;/td&gt;
&lt;td&gt;Python + pcb-rnd (CLI)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Cost&lt;/td&gt;
&lt;td&gt;$468.63&lt;/td&gt;
&lt;td&gt;$0 (open source tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Time&lt;/td&gt;
&lt;td&gt;~10 days (outsourced)&lt;/td&gt;
&lt;td&gt;~2 days (with AI)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The J11 header is the key addition. It's a 1x10 pin header with 9 direction control pins (one per shifter IC) and a ground reference. Each DIR pin has a 10K pull-down resistor that defaults the direction to A-to-B (3.3V to 5V). To reverse a shifter's direction (for example, when the Arduino needs to read from the Z80's data bus) you drive the corresponding J11 pin high. The Arduino firmware manages this dynamically during bus cycles.&lt;/p&gt;
&lt;p&gt;The board carries "tinycomputers.io" and "v0.2" on the silkscreen, placed near the bottom edge. Version tracking on the physical board, a lesson learned from the Fiverr experience, where I had to pay $57 for a revision just to add version text to the silkscreen.&lt;/p&gt;
&lt;h3&gt;Generating Production Files&lt;/h3&gt;
&lt;p&gt;With the routed board in hand, the final step was generating files suitable for manufacturing. pcb-rnd handles this with command-line exporters:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Gerber files (9 layers: top/bottom copper, mask, silk, paste, outline, drill, fab)&lt;/span&gt;
pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;gerber&lt;span class="w"&gt; &lt;/span&gt;--gerberfile&lt;span class="w"&gt; &lt;/span&gt;giga_shield&lt;span class="w"&gt; &lt;/span&gt;giga_shield_routed.pcb

&lt;span class="c1"&gt;# Photo-realistic renderings&lt;/span&gt;
pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;png&lt;span class="w"&gt; &lt;/span&gt;--dpi&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;600&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--photo-mode&lt;span class="w"&gt; &lt;/span&gt;--outfile&lt;span class="w"&gt; &lt;/span&gt;top.png&lt;span class="w"&gt; &lt;/span&gt;giga_shield_routed.pcb
pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;png&lt;span class="w"&gt; &lt;/span&gt;--dpi&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;600&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--photo-mode&lt;span class="w"&gt; &lt;/span&gt;--photo-flip-x&lt;span class="w"&gt; &lt;/span&gt;--outfile&lt;span class="w"&gt; &lt;/span&gt;bottom.png&lt;span class="w"&gt; &lt;/span&gt;giga_shield_routed.pcb
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Gerber output includes everything a fab house needs: top and bottom copper, solder mask, silkscreen, paste stencil, board outline, and drill locations. The photo-realistic PNG renderings use pcb-rnd's built-in renderer: green solder mask, gold-plated pads, white silkscreen text. They're useful for documentation and for sanity-checking the layout before sending it to fabrication.&lt;/p&gt;
&lt;p&gt;The BOM and centroid files were generated separately from the Python script's component data. The centroid file lists every SMD component's X/Y position and rotation, which is essential if you're having the boards assembled by a service rather than hand-soldering.&lt;/p&gt;
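&lt;p&gt;As a sketch of what that step looks like, a centroid CSV falls out of the component data in a few lines. The dictionary layout and field names here are illustrative, not the actual &lt;code&gt;build_giga_shield.py&lt;/code&gt; data:&lt;/p&gt;

```python
# Hypothetical sketch: emitting a centroid (pick-and-place) CSV from a
# component dictionary like the one the build script maintains.
import csv
import io

components = {
    "U1": {"x_mm": 20.0, "y_mm": 15.0, "rotation": 0,   "side": "top", "value": "SN74LVC8T245PW"},
    "C1": {"x_mm": 23.5, "y_mm": 12.0, "rotation": 90,  "side": "top", "value": "0.1uF"},
    "R1": {"x_mm": 25.0, "y_mm": 10.5, "rotation": 180, "side": "top", "value": "10K"},
}

def write_centroid(parts):
    # One row per SMD part: reference, value, position, rotation, board side
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["RefDes", "Value", "X(mm)", "Y(mm)", "Rotation", "Side"])
    for refdes in sorted(parts):
        p = parts[refdes]
        writer.writerow([refdes, p["value"], p["x_mm"], p["y_mm"], p["rotation"], p["side"]])
    return buf.getvalue()

print(write_centroid(components))
```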
&lt;h3&gt;What's Different About This Approach&lt;/h3&gt;
&lt;p&gt;The standard way to design a PCB in 2026 is: open KiCad or Altium, draw a schematic, assign footprints, lay out the board, route traces (manually or with the built-in autorouter), and export Gerbers. It's a visual, interactive process that works well for most people and most projects.&lt;/p&gt;
&lt;p&gt;What I did is different in a few ways that I think are worth noting, even if they're not universally applicable:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The entire design is a Python script.&lt;/strong&gt; &lt;code&gt;build_giga_shield.py&lt;/code&gt; is the single source of truth. Want to move a component? Change a coordinate in the script. Want to add a net? Add it to the dictionary. Want to change every decoupling cap from 0.1uF to 0.22uF? Change a string. Then re-run the pipeline. There's no "did I save the layout?" ambiguity, no undo history to worry about, no risk of accidentally moving something with a stray mouse click.&lt;/p&gt;
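&lt;p&gt;A toy version of that idea, with made-up names rather than the real script's structures, shows why a one-string change can respin every capacitor:&lt;/p&gt;

```python
# Illustrative only: a miniature of the design-as-data approach described
# above. The real build_giga_shield.py is more involved; these names are
# invented for the example.
DECOUPLING_VALUE = "0.1uF"   # change this one string to respin every cap

components = [
    {"refdes": f"C{i}", "value": DECOUPLING_VALUE, "footprint": "0603"}
    for i in range(1, 28)    # 27 decoupling caps, three per shifter IC
]

def bump_decoupling(parts, new_value):
    """Return a copy of the part list with every decoupling cap revalued."""
    return [
        dict(p, value=new_value) if p["refdes"].startswith("C") else p
        for p in parts
    ]

updated = bump_decoupling(components, "0.22uF")
print(sum(1 for p in updated if p["value"] == "0.22uF"))  # prints 27
```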
&lt;p&gt;&lt;strong&gt;Every intermediate file is text.&lt;/strong&gt; The &lt;code&gt;.pcb&lt;/code&gt; file, the &lt;code&gt;.dsn&lt;/code&gt; file, the &lt;code&gt;.ses&lt;/code&gt; file. All text, all diffable, all version-controllable. When I moved a component and re-routed, I could &lt;code&gt;git diff&lt;/code&gt; the PCB file and see exactly what changed. Try that with a binary PCB format.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI can participate meaningfully.&lt;/strong&gt; Because the files are text, Claude Code could read them, modify them, and verify them. It could grep for a component reference in the PCB file, find its coordinates, suggest a new position, and make the edit. It could read the Freerouting log and diagnose why routing failed. This level of AI participation simply isn't possible with graphical-only workflows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The workflow is reproducible.&lt;/strong&gt; I can hand someone the Python script and the Freerouting JAR and they can regenerate the entire board from scratch, on any machine with Python and Java. No KiCad version compatibility issues, no plugin dependencies, no "works on my machine" problems.&lt;/p&gt;
&lt;p&gt;The trade-off is obvious: this approach requires understanding file formats at a level that graphical tools abstract away. If pcb-rnd's parser rejects your file with a misleading error message, you need to debug the file format, not just re-click a button. It's a power-user workflow. But for someone comfortable with text editors and command lines (which describes most of the audience reading a blog called tinycomputers.io), it's a viable alternative.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The Gerber files are ready for fabrication. In part two, I'll cover ordering the boards from &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;, sourcing the SN74LVC8T245PW and passive components, and the moment of truth: plugging the RetroShield Z80 into the new shield and seeing if the Arduino can finally see the Z80's bus cycles clearly.&lt;/p&gt;
&lt;p&gt;I'll also compare the v0.2 board side-by-side with the original Fiverr v0.1 board: the TXB0108 auto-sensing design versus the SN74LVC8T245 driven design. Same board dimensions, same connector layout, fundamentally different level-shifting approach. The comparison should be instructive for anyone choosing between auto-sensing and driven level translators for bus interfaces.&lt;/p&gt;
&lt;p&gt;The Python build script, pcb-rnd source files, Gerber outputs, and all helper scripts are open source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/pOawfA"&gt;giga-shield&lt;/a&gt;&lt;/strong&gt; — Complete design files, build pipeline, and production outputs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;This is part one of a two-part series. Part two will cover fabrication, assembly, and testing.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Previous posts in this series: &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Fiverr PCB Design ($468)&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;Dual Z80 RetroShield&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;CP/M on the Giga R1&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html"&gt;Zork on the Giga&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description><category>ai</category><category>arduino</category><category>arduino giga</category><category>claude code</category><category>freerouting</category><category>geda</category><category>hardware</category><category>level shifter</category><category>open-source</category><category>pcb design</category><category>pcb-rnd</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/redesigning-a-pcb-with-claude-code-and-open-source-eda-part-1.html</guid><pubDate>Fri, 13 Mar 2026 16:00:00 GMT</pubDate></item><item><title>The Real Cost of Running Qwen TTS Locally — Three Machines Compared</title><link>https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-real-cost-of-running-qwen-tts-locally-three-machines-compared_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/qwen-tts-benchmark/p40-server-shop.jpg" alt="The Tesla P40 server standing on its side in an unheated Minnesota shop building — one of three machines benchmarked for local TTS generation" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Every post on this site has an audio version. A small player at the top, a few minutes of narration, generated entirely on local hardware. No cloud API, no per-character fees, no data leaving the network. I wrote about &lt;a href="https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html"&gt;setting up the pipeline on AMD Strix Halo&lt;/a&gt; earlier this year, and the system has been running in production since — generating narrations for new posts, regenerating old ones when I revise them, and occasionally processing long-form content that would cost real money through Google Cloud TTS or ElevenLabs.&lt;/p&gt;
&lt;p&gt;But I now have three machines capable of running Qwen3-TTS, and they could not be more different from each other. An Apple M3 Max laptop. An AMD Ryzen AI MAX+ 395 mini desktop with integrated Radeon graphics. And a &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;four-GPU Tesla P40 server&lt;/a&gt; built from decade-old enterprise hardware bought on eBay. Three different silicon vendors, three different compute backends — MPS, ROCm, and CUDA — running the same model on the same text.&lt;/p&gt;
&lt;p&gt;The question I wanted to answer is simple: how do they actually compare? Not on paper. Not in theoretical FLOPS. In wall-clock time, generating real audio from a real blog post.&lt;/p&gt;
&lt;p&gt;The answer turned out to be more interesting than I expected, because the numbers tell a story about hardware architecture that raw specifications completely miss.&lt;/p&gt;
&lt;h3&gt;The Setup&lt;/h3&gt;
&lt;p&gt;The model is &lt;a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"&gt;Qwen3-TTS-12Hz-1.7B-CustomVoice&lt;/a&gt;, a 1.7 billion parameter autoregressive text-to-speech model from Alibaba's Qwen team. It generates natural-sounding speech with multiple speaker voices. I use the Eric voice for all blog narrations — clear, professional, well-paced for technical content.&lt;/p&gt;
&lt;p&gt;The three machines:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Apple M3 Max&lt;/strong&gt; — a &lt;a href="https://amzn.to/4rwlTa6"&gt;MacBook Pro&lt;/a&gt; with Apple's M3 Max chip. 14 CPU cores, 30 GPU cores, 64GB unified memory. The GPU runs through PyTorch's MPS (Metal Performance Shaders) backend. This is my daily driver laptop, and it generates TTS when I am writing and editing posts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AMD Radeon 8060S&lt;/strong&gt; — a Bosgame M5 mini desktop running &lt;a href="https://amzn.to/4bv5CMG"&gt;AMD's Ryzen AI MAX+ 395&lt;/a&gt;. This is a Strix Halo APU with integrated RDNA 3.5 graphics — not a discrete GPU. It shares 128GB of DDR5 system memory with the CPU, with roughly 96GB addressable as VRAM. The GPU runs through ROCm 7.2 with PyTorch 2.9.1. The gfx1151 architecture requires specific PyTorch wheels from AMD's pre-release index and several environment variable overrides to function. I wrote a &lt;a href="https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html"&gt;full setup guide&lt;/a&gt; for this machine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA Tesla P40&lt;/strong&gt; — a 2U rack-mount server with four &lt;a href="https://www.ebay.com/itm/306087510352?_skw=nvidia+tesla+p40+24gb+gpu&amp;amp;epid=27032254618&amp;amp;itmmeta=01KKJEGQKSK110HNM6214EB0TT&amp;amp;hash=item47443cc150:g:qAwAAOSwy0toUHXh&amp;amp;itmprp=enc%3AAQALAAABAGfYFPkwiKCW4ZNSs2u11xAq6UjArKrgnuEyMVTZhAZhOSUGYags6TsDJvvCEOa51UH2r%2BRe%2F182ah6rgiTIAIRULQNEL9rbiinCXMor%2FBNNZk0GaNKqTWkq9pLWGoRBM8NL%2BjC1aSA63XPe4YsFHjQkb%2Fmup21S3UM7oqwBrW%2BHep1E07lnrt2vzkljSA4xg7SnrA%2BFDtOdqvDwO4tpgB0t%2BtCv9%2BlXoh%2BeoEgpJqXgaaM0ad48OfmgKB13PF9RIPXLNI6z4SjV2O%2FXOk6nYPyD9Eg5wbzdmsXfNRhwitz7HEZ1bTRUnRmvKzQrw4B3r3LAag5f8%2B8CcCWfCRAkkG8%3D%7Ctkp%3ABk9SR4j6ws6cZw&amp;amp;mkcid=1&amp;amp;mkrid=711-53200-19255-0&amp;amp;siteid=0&amp;amp;campid=5338960379&amp;amp;customid=&amp;amp;toolid=10001&amp;amp;mkevt=1"&gt;Tesla P40 GPUs&lt;/a&gt;, each with 24GB of GDDR5X. Pascal architecture from 2016. Compute capability 6.1. No Tensor Cores, no native bfloat16 support. The benchmark uses a single P40, since Qwen TTS runs on one GPU. This machine lives in an unheated shop building in Minnesota and &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;screams through the winter&lt;/a&gt; when the BMC misinterprets sub-zero ambient temperatures as a hardware malfunction.&lt;/p&gt;
&lt;p&gt;All three machines run the same model checkpoint, the same text input, and the same speaker voice. The only differences are the silicon and the compute backend.&lt;/p&gt;
&lt;h3&gt;The Benchmark&lt;/h3&gt;
&lt;p&gt;I used a standardized 2,411-character passage — five paragraphs on the Jevons Paradox, dense enough to exercise the model's prosody and pacing on real written content. Each machine ran three consecutive generations from the same loaded model, producing roughly three minutes of audio per run. The first run includes kernel compilation and cache warmup; subsequent runs reflect steady-state performance.&lt;/p&gt;
&lt;p&gt;The metric that matters is Real-Time Factor (RTF): how many seconds of wall-clock time it takes to generate one second of audio. An RTF of 1.0 means the model generates audio at exactly real-time speed. Below 1.0 is faster than real-time. Above 1.0 means you are waiting.&lt;/p&gt;
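&lt;p&gt;The calculation is trivial, but worth pinning down since every table in this post uses it:&lt;/p&gt;

```python
# Real-Time Factor: wall-clock generation time divided by the duration of
# the audio produced. Figures taken from the benchmark tables.
def rtf(generation_s, audio_s):
    return round(generation_s / audio_s, 2)

print(rtf(447.8, 179.2))   # M3 Max, run 3: 2.5
print(rtf(1511.4, 204.1))  # Tesla P40, run 1: 7.41
```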
&lt;h4&gt;Individual Runs&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Apple M3 Max (MPS)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;698.5s&lt;/td&gt;
&lt;td&gt;197.7s&lt;/td&gt;
&lt;td&gt;3.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;533.1s&lt;/td&gt;
&lt;td&gt;184.2s&lt;/td&gt;
&lt;td&gt;2.89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;447.8s&lt;/td&gt;
&lt;td&gt;179.2s&lt;/td&gt;
&lt;td&gt;2.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;559.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;187.0s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.97&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;AMD Radeon 8060S (ROCm)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;729.2s&lt;/td&gt;
&lt;td&gt;173.6s&lt;/td&gt;
&lt;td&gt;4.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;460.0s&lt;/td&gt;
&lt;td&gt;204.8s&lt;/td&gt;
&lt;td&gt;2.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;548.2s&lt;/td&gt;
&lt;td&gt;214.2s&lt;/td&gt;
&lt;td&gt;2.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;579.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;197.5s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA Tesla P40 (CUDA)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1511.4s&lt;/td&gt;
&lt;td&gt;204.1s&lt;/td&gt;
&lt;td&gt;7.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1225.7s&lt;/td&gt;
&lt;td&gt;171.6s&lt;/td&gt;
&lt;td&gt;7.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1537.2s&lt;/td&gt;
&lt;td&gt;206.7s&lt;/td&gt;
&lt;td&gt;7.44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1424.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;194.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.33&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;Summary&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Avg RTF&lt;/th&gt;
&lt;th&gt;Best RTF&lt;/th&gt;
&lt;th&gt;Avg Gen Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro&lt;/td&gt;
&lt;td&gt;M3 Max (MPS)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.97&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;559.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bosgame M5&lt;/td&gt;
&lt;td&gt;Radeon 8060S (ROCm)&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;2.25&lt;/td&gt;
&lt;td&gt;579.1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Penguin 2U&lt;/td&gt;
&lt;td&gt;Tesla P40 (CUDA)&lt;/td&gt;
&lt;td&gt;7.33&lt;/td&gt;
&lt;td&gt;7.14&lt;/td&gt;
&lt;td&gt;1424.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;What the Numbers Mean&lt;/h3&gt;
&lt;p&gt;The headline result is that the M3 Max and Radeon 8060S are essentially tied, and the Tesla P40 is roughly 2.4 times slower than both. But that summary hides the interesting details.&lt;/p&gt;
&lt;h4&gt;The Warmup Effect Is Massive&lt;/h4&gt;
&lt;p&gt;On both the M3 Max and the Radeon 8060S, the first run is dramatically slower than subsequent runs. The M3 Max goes from RTF 3.53 on run 1 to RTF 2.50 on run 3 — a 29% improvement. The AMD shows an even larger swing: RTF 4.20 on run 1 dropping to RTF 2.25 on run 2, a 46% improvement.&lt;/p&gt;
&lt;p&gt;This is kernel compilation. Both MPS and ROCm compile GPU kernels on first use and cache them for subsequent calls. The Qwen TTS model hits a wide variety of kernel shapes during autoregressive generation — different sequence lengths, different attention patterns — and each new shape triggers a compilation on the first encounter. By run 2, most of the common shapes are cached, and performance stabilizes.&lt;/p&gt;
&lt;p&gt;The P40 shows almost no warmup effect. RTF 7.41 on run 1, 7.14 on run 2, 7.44 on run 3. CUDA's kernel compilation is faster and more mature, so the overhead is absorbed within the first few seconds rather than spread across the entire run. But this maturity does not translate into faster inference — CUDA compiles faster, but the P40's hardware is fundamentally slower at the operations this model requires.&lt;/p&gt;
&lt;p&gt;This has a practical implication that matters: &lt;strong&gt;short benchmarks on MPS and ROCm are misleading.&lt;/strong&gt; I initially ran a quick 276-character test on all three machines before doing the full benchmark. The short test showed the AMD at RTF 9.20 — almost identical to the P40's RTF 10.01, and far behind the M3 Max's RTF 2.84. That result nearly led me to conclude the AMD was performing as poorly as decade-old hardware. The longer benchmark, with its warmup effect amortized across more generation, revealed the truth: the AMD is just as fast as the M3 Max once the kernels are cached. If I had stopped at the short test, I would have drawn exactly the wrong conclusion.&lt;/p&gt;
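&lt;p&gt;The fix is cheap: time several consecutive runs from the same loaded model and report the first one separately. A minimal harness in that spirit, with a toy &lt;code&gt;fake_generate()&lt;/code&gt; standing in for the actual TTS call:&lt;/p&gt;

```python
# Warmup-aware benchmark loop: the first run measures the software stack
# (kernel compilation), so report it apart from the steady-state average.
import time

def benchmark(generate, runs=3):
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    warmup, steady = timings[0], timings[1:]
    return warmup, sum(steady) / len(steady)

# Toy stand-in: the first call "compiles kernels" and sleeps longer.
state = {"warmed": False}
def fake_generate():
    time.sleep(0.05 if not state["warmed"] else 0.01)
    state["warmed"] = True

warmup, steady_avg = benchmark(fake_generate)
print(warmup, steady_avg)  # the warmup run is noticeably slower
```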
&lt;h4&gt;Why the P40 Is So Slow&lt;/h4&gt;
&lt;p&gt;The Tesla P40 is a Pascal-generation GPU from 2016. It has 3,840 CUDA cores and 24GB of GDDR5X memory. On paper, it should be competitive — 12 TFLOPS of FP32 compute is not trivial. And for LLM inference through Ollama, the P40 &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;performs remarkably well&lt;/a&gt;, outperforming quad T4 instances on models up to 8B parameters.&lt;/p&gt;
&lt;p&gt;TTS is a different workload. Qwen3-TTS is an autoregressive transformer that generates audio tokens one at a time, each conditioned on all previous tokens. This means the inference is heavily memory-bandwidth bound during the decoding phase, and compute-bound during the attention and feedforward passes. The model is distributed in bfloat16 precision, which the P40 cannot compute natively — Pascal predates bfloat16 support entirely. PyTorch silently promotes bf16 operations to fp32 on the P40, roughly doubling the computation per operation and halving the effective throughput.&lt;/p&gt;
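&lt;p&gt;PyTorch exposes this capability directly through &lt;code&gt;torch.cuda.is_bf16_supported()&lt;/code&gt;. The stand-in below just encodes the compute-capability cutoff: native bfloat16 arithmetic arrived with Ampere (8.0), while Pascal sits at 6.1:&lt;/p&gt;

```python
# Pure-Python sketch of the capability gate at issue. On cards below
# compute capability 8.0, frameworks fall back to fp32 for bf16 operations,
# which is the silent promotion described above.
def supports_native_bf16(major, minor):
    return (major, minor) >= (8, 0)

print(supports_native_bf16(6, 1))   # Tesla P40 (Pascal): False
print(supports_native_bf16(8, 6))   # an Ampere-class card: True
```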
&lt;p&gt;The P40 also lacks the SDPA (Scaled Dot-Product Attention) hardware acceleration that newer architectures provide. On the M3 Max, MPS routes attention through Metal's optimized primitives. On the AMD, ROCm's AOTriton provides experimental flash attention support. On the P40, attention runs through standard CUDA kernels without any of these accelerations. For a model that generates thousands of autoregressive steps per audio clip, each involving a full attention pass over the growing sequence, this compounds dramatically.&lt;/p&gt;
&lt;p&gt;The P40 is not bad hardware. It is excellent hardware for the workloads it was designed for — batch inference on quantized LLMs where its 24GB of VRAM per card creates a memory advantage. But autoregressive TTS in bfloat16 hits every one of its architectural weaknesses simultaneously.&lt;/p&gt;
&lt;h4&gt;Unified Memory Wins This Workload&lt;/h4&gt;
&lt;p&gt;Both the M3 Max and the Radeon 8060S use unified memory architectures — the CPU and GPU share the same physical memory pool. The M3 Max has 64GB of unified LPDDR5. The Radeon 8060S shares 128GB of DDR5 with the CPU, with roughly 96GB addressable as VRAM.&lt;/p&gt;
&lt;p&gt;For a 1.7B parameter model in bf16, the weights occupy roughly 3.4GB. The model fits comfortably on all three machines. But the autoregressive generation pattern creates a stream of intermediate activations — KV cache entries, attention scores, feedforward intermediates — that grow with the sequence length. On a unified memory architecture, these intermediates exist in the same memory space as the model weights, avoiding any PCIe transfer overhead. On the P40, every interaction between CPU and GPU crosses a PCIe 3.0 bus.&lt;/p&gt;
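&lt;p&gt;To get a feel for how those intermediates scale, here is a back-of-envelope KV-cache estimate. The layer count, head dimensions, and dtype size are assumptions for a generic model around this size, not published Qwen3-TTS internals:&lt;/p&gt;

```python
# Back-of-envelope KV-cache growth for an autoregressive decoder.
# All dimensions below are assumed, illustrative values.
def kv_cache_bytes(seq_len, layers=28, kv_heads=8, head_dim=128, dtype_bytes=2):
    # 2x for keys and values, per layer, per cached token
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

for tokens in (1000, 4000):
    mib = kv_cache_bytes(tokens) / (1024 * 1024)
    print(f"{tokens} tokens: {mib:.0f} MiB of KV cache")
```

&lt;p&gt;The point is the linear growth: every additional audio token adds another slab of cache that the attention pass must re-read, which is exactly where memory bandwidth and bus width start to dominate.&lt;/p&gt;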
&lt;p&gt;For LLM inference, where the bottleneck is token generation throughput and the KV cache fits in VRAM, the P40's discrete memory is fine. For TTS, where the model generates hundreds of audio tokens per second of speech and the attention window grows continuously, the memory access pattern favors unified architectures.&lt;/p&gt;
&lt;p&gt;This is not a universal statement about unified versus discrete memory. A modern discrete GPU with HBM2e or GDDR6X and PCIe 4.0 or 5.0 would likely outperform both the M3 Max and the Radeon 8060S on this workload. The P40's problem is not that its memory is discrete — it is that its memory is slow and its bus is narrow by 2026 standards.&lt;/p&gt;
&lt;h3&gt;The Model Architecture Question&lt;/h3&gt;
&lt;p&gt;While benchmarking Qwen TTS, I also ran a quick comparison with &lt;a href="https://huggingface.co/SWivid/F5-TTS"&gt;F5-TTS&lt;/a&gt; on the AMD machine to sanity-check the results. F5-TTS is a flow-matching model — fundamentally different from Qwen's autoregressive approach. Where Qwen generates audio tokens sequentially, each conditioned on all previous tokens, F5 generates audio in parallel through an iterative refinement process.&lt;/p&gt;
&lt;p&gt;The difference is stark. On the same Radeon 8060S, the same text, the same hardware:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-TTS&lt;/td&gt;
&lt;td&gt;579.1s (avg)&lt;/td&gt;
&lt;td&gt;197.5s&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F5-TTS&lt;/td&gt;
&lt;td&gt;17.4s&lt;/td&gt;
&lt;td&gt;27.2s&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;F5-TTS is faster than real-time. Qwen3-TTS takes three times longer than the audio it produces. In normalized terms, F5 is roughly five times faster than Qwen at steady-state — and the gap widens on shorter content where Qwen's warmup overhead is proportionally larger.&lt;/p&gt;
&lt;p&gt;This is not an apples-to-apples quality comparison. Qwen3-TTS generally produces more natural prosody, better handling of complex sentence structures, and more consistent speaker identity across long passages. F5-TTS is excellent but can occasionally drift in voice character or pacing on very long content. For blog narration, both are well above the threshold of "good enough," and the quality difference is smaller than you might expect given the architectural gap.&lt;/p&gt;
&lt;p&gt;The point is that hardware is only half the story. The choice of model architecture can matter more than the choice of GPU. A flow-matching model on integrated AMD graphics outperforms an autoregressive model on Apple's best laptop silicon by a wide margin. If generation speed is the constraint, switching models gains more than switching hardware.&lt;/p&gt;
&lt;h3&gt;What This Costs in Practice&lt;/h3&gt;
&lt;p&gt;The abstract benchmark numbers translate into concrete time and electricity costs when you are generating audio for a library of blog posts.&lt;/p&gt;
&lt;p&gt;A typical TinyComputers post runs 3,000 to 5,000 words, producing 15 to 25 minutes of narrated audio. At steady-state RTF:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;15 min audio&lt;/th&gt;
&lt;th&gt;25 min audio&lt;/th&gt;
&lt;th&gt;System Power&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;M3 Max&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;td&gt;~63 min&lt;/td&gt;
&lt;td&gt;~50W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Radeon 8060S&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;td&gt;~63 min&lt;/td&gt;
&lt;td&gt;~100W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tesla P40&lt;/td&gt;
&lt;td&gt;~110 min&lt;/td&gt;
&lt;td&gt;~183 min&lt;/td&gt;
&lt;td&gt;~400W&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The M3 Max and Radeon 8060S are tied on generation time, but the M3 Max draws roughly half the system power. For a single post, the electricity cost difference is negligible — a fraction of a cent. For batch processing a backlog of thirty posts, the M3 Max costs about $0.18 in electricity versus $0.36 for the AMD and $3.50 for the P40.&lt;/p&gt;
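&lt;p&gt;Those figures come straight from the RTF and power numbers. The arithmetic, with an assumed residential rate of $0.14/kWh rather than an exact utility-bill figure:&lt;/p&gt;

```python
# Reproducing the table arithmetic: generation time from RTF, electricity
# cost from system power. The $0.14/kWh rate is an assumed figure.
RATE_USD_PER_KWH = 0.14

def generation_minutes(audio_minutes, rtf):
    return audio_minutes * rtf

def batch_cost_usd(posts, audio_minutes_each, rtf, watts):
    hours = posts * generation_minutes(audio_minutes_each, rtf) / 60
    return hours * (watts / 1000) * RATE_USD_PER_KWH

print(round(generation_minutes(15, 2.5)))          # M3 Max, 15 min of audio: 38
print(round(batch_cost_usd(30, 20, 2.5, 50), 2))   # thirty-post backlog on the M3 Max
```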
&lt;p&gt;None of these numbers are alarming. Even the P40, at nearly two and a half hours per post and 400 watts from the wall, costs under fifteen cents in electricity per narration at Minnesota residential rates. The equivalent Google Cloud TTS job would cost $4 to $16 per post depending on the voice quality tier.&lt;/p&gt;
&lt;p&gt;To put cloud costs in perspective: I recently ran a fiction novel through Google's Chirp3-HD voice — 82,000 words, roughly 500,000 characters of text plus SSML markup. The bill came to $17.25 at Google's rate of $30 per million characters. That is not unreasonable for a one-off project, but it adds up quickly if you are generating audio regularly. The entire library of TinyComputers narrations — dozens of posts, hours of audio — has cost me nothing beyond the electricity to run the machines I already own. The economics of local TTS are favorable on every machine in the comparison.&lt;/p&gt;
&lt;p&gt;The real cost is time. If I am generating audio for a single new post, I start it on whichever machine is idle and check back in an hour. If I am regenerating audio for twenty posts after changing the speaker voice or updating the pipeline, the M3 Max or AMD will finish overnight. The P40 would take most of a weekend.&lt;/p&gt;
&lt;h3&gt;The Right Machine for the Job&lt;/h3&gt;
&lt;p&gt;After running these benchmarks, my workflow has shifted. The M3 Max is the default for new post narration — it is fast, quiet, and I am usually sitting in front of it when I finish writing. The AMD handles batch jobs and overnight processing, where its slightly higher power draw does not matter and its equivalent speed makes it interchangeable with the Mac. The P40 server is reserved for what it does best: &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;running large language models&lt;/a&gt; through Ollama, where its 96GB of aggregate VRAM gives it an advantage that neither the Mac nor the AMD can match.&lt;/p&gt;
&lt;p&gt;The P40 can still generate TTS in a pinch, and it does — when both other machines are occupied, I will queue a job on the P40 and accept the longer wait. But for a workload that is inherently autoregressive, memory-bandwidth sensitive, and dependent on bf16 precision, a ten-year-old Pascal GPU is the wrong tool.&lt;/p&gt;
&lt;p&gt;What surprised me most is how well the AMD performs. The Radeon 8060S is an integrated GPU sharing system memory with the CPU. It has no HBM, no dedicated VRAM, no NVLink. Its ROCm software stack requires environment variable hacks, pre-release PyTorch wheels, and a GFX version override to function at all. And yet, once the kernels warm up, it matches Apple's best laptop silicon stride for stride. The raw hardware is there — 40 RDNA 3.5 compute units with access to a deep pool of DDR5 memory. The software just needs to get out of the way, and on run 2 and beyond, it does.&lt;/p&gt;
&lt;h3&gt;Lessons&lt;/h3&gt;
&lt;p&gt;Three takeaways from this exercise that generalize beyond TTS:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Short benchmarks lie.&lt;/strong&gt; Kernel compilation overhead on MPS and ROCm is large enough to dominate a short test. If you are evaluating a new model on non-CUDA hardware, run it at least twice before drawing conclusions. The first run is measuring the software stack, not the hardware.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Architectural fit matters more than raw FLOPS.&lt;/strong&gt; The P40 has more raw FLOPS than the Radeon 8060S. It does not matter. The P40 lacks native bf16, lacks efficient attention primitives, and sits behind a PCIe 3.0 bus. The Radeon has all three — and ties a chip designed by Apple's custom silicon team. For autoregressive models, the architectural fit between model and hardware dominates everything else.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Model choice can outweigh hardware choice.&lt;/strong&gt; F5-TTS running on the weakest GPU in this comparison is five times faster than Qwen3-TTS running on the strongest. If your constraint is generation speed and you can accept a modest quality trade-off, switching to a flow-matching architecture gains more than any hardware upgrade short of a data center GPU.&lt;/p&gt;
&lt;p&gt;The audio player at the top of each post on this site represents a few minutes of machine time on one of these three machines. Which machine generated it depends on the day, the workload, and what else is running. The listener cannot tell the difference. The audio sounds the same regardless of whether it was generated on a laptop, a mini desktop, or a rack-mount server in a cold Minnesota shop. That is the real benchmark — not which machine is fastest, but that all three are fast enough.&lt;/p&gt;</description><category>amd</category><category>apple silicon</category><category>audio</category><category>benchmarks</category><category>cuda</category><category>gpu</category><category>inference</category><category>m3 max</category><category>machine learning</category><category>mps</category><category>nvidia</category><category>qwen</category><category>rocm</category><category>strix halo</category><category>tesla p40</category><category>text-to-speech</category><category>tts</category><guid>https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html</guid><pubDate>Thu, 12 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Repurposing Enterprise GPUs: The Tesla P40 Home Lab Story</title><link>https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There is a window, maybe eighteen months wide, where enterprise hardware hits a pricing sweet spot. The first-generation buyers — the hyperscalers, the research labs, the Fortune 500 AI teams — have moved on to the next generation. The second-hand market floods. Prices crater. And if you know what you're looking for, you can build something genuinely capable for less than a month of cloud compute.&lt;/p&gt;
&lt;p&gt;I built a four-GPU inference server for about twenty-five hundred dollars. This is the story of how, why, and whether you should do the same.&lt;/p&gt;
&lt;h3&gt;The Buy&lt;/h3&gt;
&lt;p&gt;The acquisition strategy is straightforward: eBay, patience, and knowing what to look for.&lt;/p&gt;
&lt;p&gt;Tesla P40s started appearing in volume on the secondary market around 2023, when cloud providers and enterprise data centers began cycling them out in favor of A100s and H100s. A card that sold for over five thousand dollars new was suddenly available for three hundred, then two hundred and fifty, then — if you watched listings carefully and were willing to buy from decommissioned lot sellers — sometimes less. I picked up four cards over the course of about two months, averaging two hundred and fifty dollars each.&lt;/p&gt;
&lt;p&gt;The chassis was a Penguin Computing 2U rack-mount server, also from eBay. These show up when government labs and research institutions liquidate equipment. The Penguin Computing systems are well-built — proper server-grade construction with redundant power supplies and engineered airflow. Mine takes a pair of Xeon E5-2697A v4 processors, also purchased from eBay: thirty-two Broadwell cores between them, more than enough CPU to keep four GPUs fed. The chassis cost around six hundred dollars.&lt;/p&gt;
&lt;p&gt;Memory was the lucky purchase. I bought 252GB of DDR4 ECC RAM before the memory price spike that hit in late 2024 when every company on Earth decided they needed AI infrastructure simultaneously. What I paid around two hundred and fifty dollars for would cost significantly more today. Total build: roughly twenty-five hundred dollars.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;The Tesla P40 is a 2016-era data center GPU. NVIDIA designed it for the Pascal generation, targeting inference workloads in enterprise environments. The specifications, for something you can buy on eBay for two hundred and fifty dollars, are remarkable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;24GB GDDR5X&lt;/strong&gt; per card — more memory than an RTX 4090&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;3,840 CUDA cores&lt;/strong&gt; — Pascal architecture, compute capability 6.1&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;12 TFLOPS FP32&lt;/strong&gt; — respectable even by 2026 standards for inference&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;250W TDP&lt;/strong&gt; — this is a data center card and it draws power like one&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Multiply by four and you get 96GB of VRAM for a thousand dollars. That is an extraordinary amount of GPU memory for the price. For context, a single NVIDIA A100 80GB still sells for north of five thousand dollars on the secondary market. Four P40s give you more total VRAM for a fraction of the cost.&lt;/p&gt;
&lt;h3&gt;What You Give Up&lt;/h3&gt;
&lt;p&gt;There is no free lunch in computing, and the P40 makes you pay for its low price in specific, sometimes painful ways.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;No Tensor Cores.&lt;/strong&gt; The P40 predates NVIDIA's Tensor Core architecture, which arrived with Volta in 2017. Tensor Cores accelerate matrix multiplication — the fundamental operation in neural network inference — by factors of 4x to 16x depending on precision. The P40 does everything with its CUDA cores, the old-fashioned way. This matters less than you might think for inference at moderate batch sizes, but it means you will never match the throughput of a V100 or newer card, clock for clock.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;No native BF16 or FP16.&lt;/strong&gt; This is the real gotcha. BF16 (bfloat16) has become the default precision for large language models. It is what most model weights are distributed in. The P40 cannot compute in BF16 natively — it emulates it through FP32 operations, which is roughly 21% slower than native support. In practice, this means you are running quantized models (Q4, Q5, Q8) through llama.cpp or similar frameworks, which handle the precision conversion for you. It works. It is not optimal.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Passive cooling designed for server airflow.&lt;/strong&gt; The P40 is a blower-style card designed for 1U and 2U server chassis with front-to-back forced airflow. In a proper server, this is fine. In anything else, you need to solve cooling yourself. I put mine in a Penguin Computing 2U rack-mount chassis, which has the right airflow characteristics, but this is not a card you drop into a desktop tower.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PCIe 3.0 x16.&lt;/strong&gt; The P40 connects via PCIe 3.0, which provides about 16 GB/s of bandwidth per direction. When you are running a model that spans four GPUs, the inter-GPU communication goes over PCIe, not NVLink. This creates a bottleneck for models that require heavy cross-GPU communication. For inference, where the communication pattern is more predictable than training, this is manageable. For training, it would be a serious constraint.&lt;/p&gt;
&lt;h3&gt;The Minnesota Problem&lt;/h3&gt;
&lt;p&gt;My server lives in an unheated shop building in northern Minnesota. This has created an issue that no hardware review will prepare you for.&lt;/p&gt;
&lt;p&gt;When ambient temperatures drop below freezing — which, in Minnesota, means roughly October through April — the onboard temperature sensors report values that the baseboard management controller interprets as a malfunction. The BMC's response is to spin every fan to maximum RPM as a protective measure.&lt;/p&gt;
&lt;p&gt;The result is a machine that, on quiet winter nights, is audible from the house. The house is a hundred and fifty feet away.&lt;/p&gt;
&lt;p&gt;I have not solved this problem. I have learned to live with it. You can override BMC fan curves on some platforms, but the Penguin Computing firmware is locked down in ways that make this nontrivial, and frankly, a server that runs its fans at full speed because it thinks it is dying is doing exactly what it should be doing. The firmware's assumptions are just wrong for the environment.&lt;/p&gt;
&lt;p&gt;The server runs 24/7 regardless of the season, and the cold air actually keeps the GPUs well within thermal limits — the irony is that the machine has never been cooler or louder than when it is twenty below zero outside. If you are considering a similar setup in a garage, basement, or outbuilding, factor in noise. A 2U server with four 250W GPUs is not quiet under any circumstances, and server-grade fans at full RPM are genuinely loud.&lt;/p&gt;
&lt;h3&gt;Setting Up the Software Stack&lt;/h3&gt;
&lt;p&gt;The driver situation for the P40 in 2026 is straightforward, though it was not always. NVIDIA's &lt;code&gt;nvidia-driver-570-server&lt;/code&gt; package works cleanly on Ubuntu, and the DKMS module rebuilds automatically on kernel updates — most of the time. I have had exactly two occasions where a kernel update broke the NVIDIA module and required manual intervention. This is fewer than I expected.&lt;/p&gt;
&lt;p&gt;For inference, I run &lt;a href="https://ollama.com"&gt;Ollama&lt;/a&gt;, which wraps llama.cpp and provides a simple API for model management and inference. Ollama handles multi-GPU sharding automatically — when you load a model, it distributes layers across GPUs based on available memory and model size. A 65GB model like gpt-oss:120b fits across three of the four P40s, leaving one free. Smaller models may only need one or two cards. The allocation is generally sensible, though you have less control over placement than you would with raw llama.cpp.&lt;/p&gt;
&lt;p&gt;The alternative stack — vLLM, TGI, or raw llama.cpp — offers more control over GPU assignment but requires more configuration. With llama.cpp directly, you can pin specific GPU layers to specific devices, which lets you optimize for the P40's memory topology. vLLM provides better batching and continuous batching for serving multiple concurrent requests. For a home lab where the primary use case is running various models for experimentation and development rather than serving production traffic, Ollama's simplicity wins.&lt;/p&gt;
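&lt;p&gt;As a concrete sketch of the llama.cpp route (the model path and split ratios here are placeholders, not my actual configuration), the &lt;code&gt;--tensor-split&lt;/code&gt; and &lt;code&gt;--n-gpu-layers&lt;/code&gt; flags control how layers land on each card:&lt;/p&gt;

```shell
# Sketch: serve a GGUF model across four P40s with llama.cpp's llama-server.
# --tensor-split gives the relative share of layers per GPU; equal values
# spread the model evenly. The model filename is a placeholder.
llama-server -m ./models/example-q4_k_m.gguf \
    --n-gpu-layers 99 \
    --tensor-split 1,1,1,1 \
    --host 0.0.0.0 --port 8080
```

&lt;p&gt;Uneven ratios let you favor a card that also holds the KV cache, which is the kind of placement control Ollama does not expose.&lt;/p&gt;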
&lt;p&gt;One thing worth noting: the P40 is well-supported by the GGUF ecosystem that llama.cpp (and therefore Ollama) uses. GGUF quantized models — Q4_K_M, Q5_K_M, Q8_0 — run without issues on Pascal hardware. The quantization handles the BF16 problem for you: model weights are stored in 4-bit or 8-bit integer formats and dequantized to FP32 at runtime, which the P40 handles natively. You are not fighting the hardware; you are working with it.&lt;/p&gt;
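&lt;p&gt;To make that concrete, here is an illustrative sketch (not llama.cpp's actual code) of how a Q8_0 block, 32 signed 8-bit weights sharing one half-precision scale, becomes FP32 at runtime:&lt;/p&gt;

```python
import numpy as np

# Illustrative Q8_0 dequantization. A Q8_0 block stores 32 signed
# 8-bit weights plus one fp16 scale; recovering an FP32 weight is a
# single multiply per element: w = d * q.
def dequantize_q8_0_block(d, qs):
    return np.float32(d) * qs.astype(np.float32)

scale = np.float16(0.5)
quants = np.arange(-16, 16, dtype=np.int8)   # stand-in block of 32 weights
weights = dequantize_q8_0_block(scale, quants)
print(weights[0], weights[31])               # prints -8.0 7.5
```

&lt;p&gt;The multiply happens in FP32, which the P40 supports natively; this is how quantized formats sidestep the missing BF16 path.&lt;/p&gt;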
&lt;h3&gt;The Benchmarks&lt;/h3&gt;
&lt;p&gt;Theory is cheap. Benchmarks are what matter. I ran the same inference workload across three configurations: my four P40 home lab, a single AWS Tesla T4 instance, and a quad T4 instance on AWS. The T4 is the closest cloud comparison — it is the workhorse inference GPU in AWS's fleet, one generation newer than the P40 (Turing architecture, 2018), with 16GB of GDDR6 and actual Tensor Cores.&lt;/p&gt;
&lt;p&gt;All benchmarks used Ollama with the same prompt, measuring tokens per second during the evaluation phase (excluding model load time).&lt;/p&gt;
&lt;h4&gt;Dense Models&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;4x P40 (Home Lab)&lt;/th&gt;
&lt;th&gt;1x T4 (AWS $0.53/hr)&lt;/th&gt;
&lt;th&gt;4x T4 (AWS $3.91/hr)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.2&lt;/td&gt;
&lt;td&gt;3B&lt;/td&gt;
&lt;td&gt;94.3 tok/s&lt;/td&gt;
&lt;td&gt;81.5 tok/s&lt;/td&gt;
&lt;td&gt;101.5 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;52.7 tok/s&lt;/td&gt;
&lt;td&gt;36.9 tok/s&lt;/td&gt;
&lt;td&gt;40.3 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;47.8 tok/s&lt;/td&gt;
&lt;td&gt;35.7 tok/s&lt;/td&gt;
&lt;td&gt;29.2 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The P40 wins on the 7B and 8B models by substantial margins — 31% and 64% respectively over the quad T4 configuration. The only model where the T4 edges ahead is the 3B, which is small enough to fit entirely on a single GPU. Here, the T4's higher clock speeds and faster GDDR6 memory give it an advantage because there is no multi-GPU overhead to penalize it.&lt;/p&gt;
&lt;p&gt;The 8B result is particularly interesting. The quad T4 actually performs &lt;em&gt;worse&lt;/em&gt; than a single T4 on this model (29.2 vs 35.7 tok/s). Ollama shards the model across all four GPUs even though it fits on one, and the PCIe communication overhead between four T4s costs more than it gains. The P40, with its larger 24GB per-card memory, likely fits more of the model per GPU, reducing cross-GPU transfers.&lt;/p&gt;
&lt;h4&gt;The MoE Advantage&lt;/h4&gt;
&lt;p&gt;The most compelling benchmark comes from OpenAI's gpt-oss — a 120-billion parameter mixture-of-experts model with only 5.1 billion active parameters per token. The MoE architecture means the model's total weight is large (it needs the memory), but the computation per token is modest (only a fraction of the parameters fire for any given input).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;4x P40&lt;/th&gt;
&lt;th&gt;4x T4 (AWS $3.91/hr)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gpt-oss&lt;/td&gt;
&lt;td&gt;120B MoE (5.1B active)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;28.1 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20.6 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The P40 runs OpenAI's 120B model at 28.1 tokens per second — 36% faster than the cloud instance, and fast enough for comfortable interactive use. This is a state-of-the-art model running on decade-old GPUs at a speed that would have been impressive on much newer hardware a year ago.&lt;/p&gt;
&lt;p&gt;The reason is memory. The gpt-oss model uses MXFP4 quantization on its MoE weights, bringing the total model size to about 65GB. Four P40s offer 96GB of VRAM — enough to hold the entire model in GPU memory. Four T4s offer only 64GB, which means some of the model likely spills to system RAM, adding latency on every token.&lt;/p&gt;
&lt;p&gt;This is the P40's superpower: 24GB per card was overkill in 2016, and it is exactly right in 2026. Models have grown to fill the memory, and the P40 has more of it per dollar than almost anything else on the market.&lt;/p&gt;
&lt;h4&gt;Where It Falls Apart&lt;/h4&gt;
&lt;p&gt;Dense 70B models are a different story. Llama 3.1 70B at Q4_0 quantization (39GB) fits across 96GB of P40 VRAM, but the inference speed is essentially unusable: 0.033 tokens per second. One token every thirty seconds. Answering "What is 2+2?" took six and a half minutes. The combination of no Tensor Cores, PCIe 3.0 interconnect, and the sheer volume of cross-GPU data transfers for a dense 70B model pushes the per-token latency beyond any practical threshold.&lt;/p&gt;
&lt;p&gt;The quad T4 on AWS managed 2.0 tokens per second on the same model — sixty times faster. Slow, but functional. The T4's Tensor Cores make the difference here — at this scale, the P40's raw CUDA cores simply cannot keep up with the matrix math.&lt;/p&gt;
&lt;p&gt;The lesson: MoE models and quantized models up to about 8B parameters are the P40's sweet spot. Dense models above 13B start hitting diminishing returns. Dense 70B is a wall.&lt;/p&gt;
&lt;h3&gt;The Cost Argument&lt;/h3&gt;
&lt;p&gt;Here is the math that justifies the project.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;g4dn.12xlarge&lt;/code&gt; on AWS — four Tesla T4s, 48 vCPUs, 192GB RAM — costs $3.91 per hour. My home lab outperforms it on every model except the smallest. If I run inference for just four hours a day, the cloud cost would be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Daily&lt;/strong&gt;: $15.64&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monthly&lt;/strong&gt;: $469&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Yearly&lt;/strong&gt;: $5,694&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My server cost $2,500 to build. It pays for itself in roughly five months of equivalent cloud usage. After that, the only ongoing cost is electricity. At Minnesota residential rates (roughly $0.12/kWh) and an average draw of 800W under load, that is about $70 per month. Less than a single day of the equivalent cloud instance.&lt;/p&gt;
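&lt;p&gt;The break-even arithmetic is short enough to check directly, using the figures quoted above:&lt;/p&gt;

```python
# Cloud-vs-home-lab math from this post, as a quick sanity check.
cloud_hourly = 3.91                       # g4dn.12xlarge, USD per hour
daily = cloud_hourly * 4                  # four hours of inference per day
monthly = daily * 30
build_cost = 2500                         # home lab build, USD
breakeven_months = build_cost / monthly
electric_monthly = 0.8 * 24 * 30 * 0.12   # 800 W average draw at $0.12/kWh
print(round(daily, 2), round(monthly), round(breakeven_months, 1),
      round(electric_monthly))            # prints 15.64 469 5.3 69
```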
&lt;p&gt;Even if you factor in the P40's lower performance on some workloads and assume you only get 70% of the cloud equivalent's utility, the break-even point is still well under a year. For a home lab that runs 24/7 for development, experimentation, and &lt;a href="https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html"&gt;text-to-speech generation&lt;/a&gt;, the economics are overwhelming.&lt;/p&gt;
&lt;h3&gt;What I Actually Use It For&lt;/h3&gt;
&lt;p&gt;The server runs several workloads:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Local LLM inference.&lt;/strong&gt; This is the primary use case. Having a local inference server with 96GB of VRAM means I can run frontier-class open-weight models without sending data to a cloud API. For development work — where I might make hundreds of inference calls while iterating on a project — the zero marginal cost changes how I work. I experiment more freely when each query costs nothing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Text-to-speech.&lt;/strong&gt; I run &lt;a href="https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html"&gt;Qwen TTS&lt;/a&gt; on the P40s to generate audio narration for blog posts. The model fits comfortably in the P40's memory, and the generation speed is acceptable for batch processing. The narration you hear on posts across this site was generated on these GPUs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Development and testing.&lt;/strong&gt; When I am building projects like &lt;a href="https://tinycomputers.io/posts/sampo-designing-a-16-bit-risc-cpu-from-scratch-part-1-theory-and-architecture.html"&gt;Sampo&lt;/a&gt; or &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;Lattice&lt;/a&gt;, having local GPU compute available for testing AI-assisted workflows means I do not need to worry about API rate limits or costs during intensive development sessions.&lt;/p&gt;
&lt;p&gt;The server sits on my local network at a static IP, accessible from any machine in the house. It is always on, always available, and always free to use. That availability changes your relationship with AI inference in ways that are hard to appreciate until you have lived with it. There is a psychological difference between "this costs two cents per query" and "this costs nothing per query." The first makes you think about whether the query is worth it. The second lets you experiment without friction — and that friction reduction, compounded across hundreds of daily interactions, fundamentally changes how you work.&lt;/p&gt;
&lt;p&gt;This is, incidentally, a small-scale example of the &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt; I have been writing about in this blog's economics series. Making inference cheaper did not cause me to run the same number of queries and pocket the savings. It caused me to run dramatically more queries, on more models, for more projects, consuming more total compute than I ever would have purchased from a cloud provider. The efficiency created demand.&lt;/p&gt;
&lt;h3&gt;Should You Build One?&lt;/h3&gt;
&lt;p&gt;The honest answer is: it depends on what you value.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Build one if:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You run local inference regularly and the cloud costs are adding up&lt;/li&gt;
&lt;li&gt;You want 96GB of VRAM for under a thousand dollars in GPU costs&lt;/li&gt;
&lt;li&gt;You have the physical space, electrical capacity, and noise tolerance for a rack-mount server&lt;/li&gt;
&lt;li&gt;You enjoy the process of building and configuring systems — this is not a plug-and-play experience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Do not build one if:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You need the latest model performance (Tensor Cores, FP8, NVLink)&lt;/li&gt;
&lt;li&gt;You are training models, not running inference&lt;/li&gt;
&lt;li&gt;You need reliability guarantees — this is a home lab, not a production environment&lt;/li&gt;
&lt;li&gt;You are not comfortable with Linux system administration, driver debugging, and occasional hardware troubleshooting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The P40 window will not last forever. As newer GPUs age out of data centers — the V100, the A100 — the P40 will eventually lose its price-to-performance advantage. The V100, with its first-generation Tensor Cores and 32GB of HBM2, is already starting to appear at attractive secondary market prices. Within a year, it may be the new sweet spot. But right now, in early 2026, four P40s on eBay represent one of the best deals in GPU computing. Ninety-six gigabytes of VRAM, proven CUDA compatibility, and a decade of driver maturity, for the price of a weekend trip.&lt;/p&gt;
&lt;p&gt;The server in my shop building will keep running. The fans will keep screaming through the Minnesota winter. And I will keep running models on hardware that a hyperscaler discarded three years ago, at speeds that would have been remarkable on any hardware five years ago. That is the beauty of the secondary market — someone else paid for the R&amp;amp;D, someone else paid for the depreciation, and you get the compute.&lt;/p&gt;</description><category>ai</category><category>benchmarks</category><category>cuda</category><category>deep learning</category><category>ebay</category><category>enterprise hardware</category><category>gpu</category><category>home lab</category><category>inference</category><category>nvidia</category><category>ollama</category><category>tesla p40</category><guid>https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html</guid><pubDate>Wed, 11 Mar 2026 14:00:00 GMT</pubDate></item><item><title>JokelaOS: Writing a Bare-Metal x86 Kernel from Scratch</title><link>https://tinycomputers.io/posts/jokelaos-bare-metal-x86-kernel.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/jokelaos-bare-metal-x86-kernel_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;30 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There's a moment early in any OS project where the serial port prints its first character and you realize that nothing you've written has a safety net. No libc. No kernel underneath. No syscall to fall back on. If the byte appears on the terminal, it's because you programmed the UART (Universal Asynchronous Receiver-Transmitter) divisor latch, polled the line status register, and wrote to the data port. If it doesn't appear, you stare at register dumps until you find the mistake. There's no debugger — you haven't written one yet.&lt;/p&gt;
&lt;p&gt;The closest thing I can compare it to is the first time I got a &lt;a href="https://tinycomputers.io/posts/arduino-z80-+-forth.html"&gt;RetroShield Z80&lt;/a&gt; talking over serial — that moment where a processor you wired up yourself pushes a character out of an emulated ACIA and it appears on your screen. The Z80 version involves physical hardware and solder. The x86 version is virtual — QEMU, a cross-compiler, and a Multiboot header — but the feeling is the same. You built the entire path from CPU to character. Nothing was given to you.&lt;/p&gt;
&lt;p&gt;JokelaOS started there: a Multiboot header, a stack, and a &lt;code&gt;call kmain&lt;/code&gt;. Everything that followed — GDT (Global Descriptor Table), IDT (Interrupt Descriptor Table), memory management, a network stack, preemptive multitasking, paging, user mode, a shell — was built one subsystem at a time, tested after every change, with no external code. No forks of existing kernels. No libc. No shortcuts.&lt;/p&gt;
&lt;p&gt;To be clear about what this is: JokelaOS is a toy. It's a learning project. The memory allocator is a linear scan. The scheduler has no concept of priority. The file system can't delete files. The user authentication stores passwords in plaintext in a static array. Nothing here is production-grade, and none of it is intended to be. The value is in the building — understanding what each subsystem actually does by writing it from scratch, making the mistakes, and fixing them with nothing between you and the hardware.&lt;/p&gt;
&lt;p&gt;This is the story of what it takes to go from twenty lines of NASM to a kernel that boots, manages memory, runs user programs in Ring 3, handles syscalls, responds to pings, and gives you a command prompt.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/jokelaos/jokelaos0.png" alt="JokelaOS boot sequence in QEMU — GDT, IDT, PCI enumeration, memory map, paging init, RTL8139 driver, and network stack initialization" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;" loading="lazy"&gt;&lt;/p&gt;
&lt;h3&gt;The Target&lt;/h3&gt;
&lt;p&gt;JokelaOS targets 32-bit x86 (i686) and runs under QEMU. The toolchain is a cross-compiler (&lt;code&gt;i686-elf-gcc&lt;/code&gt;, &lt;code&gt;i686-elf-ld&lt;/code&gt;) with NASM (Netwide Assembler) for the assembly files. The C standard is &lt;code&gt;gnu11&lt;/code&gt; — GNU extensions are required for inline assembly. There are no external libraries whatsoever, not even a freestanding &lt;code&gt;string.h&lt;/code&gt;. Every &lt;code&gt;memcpy&lt;/code&gt;, every &lt;code&gt;memset&lt;/code&gt;, every &lt;code&gt;printf&lt;/code&gt;-like function is written from scratch.&lt;/p&gt;
&lt;p&gt;The only console is the serial port. COM1 at 0x3F8, 115200 baud, 8N1 (8 data bits, no parity, 1 stop bit). All kernel output goes through &lt;code&gt;serial_printf()&lt;/code&gt;. This is a deliberate choice: serial is simpler than VGA text mode, works perfectly with QEMU's &lt;code&gt;-serial stdio&lt;/code&gt;, and means the kernel's output appears directly in the host terminal. No framebuffer driver needed, no font rendering, no cursor management. Just bytes on a wire.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;make&lt;span class="w"&gt; &lt;/span&gt;run
qemu-system-i386&lt;span class="w"&gt; &lt;/span&gt;-kernel&lt;span class="w"&gt; &lt;/span&gt;build/jokelaos.bin&lt;span class="w"&gt; &lt;/span&gt;-serial&lt;span class="w"&gt; &lt;/span&gt;stdio&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-display&lt;span class="w"&gt; &lt;/span&gt;none&lt;span class="w"&gt; &lt;/span&gt;-device&lt;span class="w"&gt; &lt;/span&gt;rtl8139,netdev&lt;span class="o"&gt;=&lt;/span&gt;net0&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-netdev&lt;span class="w"&gt; &lt;/span&gt;user,id&lt;span class="o"&gt;=&lt;/span&gt;net0&lt;span class="w"&gt; &lt;/span&gt;-no-reboot
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Kernel Architecture: Where JokelaOS Fits&lt;/h3&gt;
&lt;p&gt;Before getting into implementation, it's worth understanding the design space. Not all kernels are structured the same way, and the choice of architecture has consequences that ripple through every subsystem.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;monolithic kernel&lt;/strong&gt; puts everything — memory management, scheduling, file systems, device drivers, the network stack — into a single binary running in Ring 0. All kernel code shares one address space. A function call from the scheduler into the memory manager is just that: a function call. No context switches, no message serialization, no copying buffers between address spaces. Linux is monolithic. So are the BSDs. So is JokelaOS.&lt;/p&gt;
&lt;p&gt;The advantage is performance and simplicity. When the network stack needs to allocate a page, it calls &lt;code&gt;pmm_alloc_frame()&lt;/code&gt; directly. When the shell wants to load a program, it calls the loader, which calls the PMM (Physical Memory Manager), which calls the paging subsystem — all in the same address space, all with the same privilege level, all at the cost of a function call. There's no overhead beyond what the work itself requires.&lt;/p&gt;
&lt;p&gt;The disadvantage is that every driver, every subsystem, every line of kernel code has full access to every other line of kernel code's memory. A bug in the RTL8139 driver can corrupt the process table. A buffer overrun in the serial port handler can overwrite page tables. In a production monolithic kernel like Linux, this is mitigated by code review, testing, and an enormous community of contributors. In a toy kernel written by one person, it means bugs are spectacular.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;microkernel&lt;/strong&gt; takes the opposite approach. Only the absolute minimum runs in Ring 0: the scheduler, IPC (Inter-Process Communication) message passing, and basic memory management. Everything else — file systems, device drivers, the network stack — runs as separate user-space processes (called "servers") that communicate through message passing. Mach, developed at Carnegie Mellon in the 1980s, is the canonical example. MINIX 3 is a modern realization of the idea, designed by Andrew Tanenbaum specifically to demonstrate microkernel reliability. L4 and its descendants (seL4, which has a formal mathematical proof of correctness) represent the performance-optimized end of the microkernel spectrum.&lt;/p&gt;
&lt;p&gt;The advantage is isolation. If the network driver crashes, it crashes in its own address space. The kernel restarts it. The file system never noticed. This matters enormously for reliability and security — seL4 is used in military and aviation systems where "the driver crashed and took the kernel with it" is not acceptable.&lt;/p&gt;
&lt;p&gt;The disadvantage is IPC overhead. Every interaction between subsystems that would be a function call in a monolithic kernel becomes a message: marshal the arguments, trap into the kernel, copy the message to the destination server's address space, schedule that server, let it process the request, marshal the reply, trap back. Mach's original implementation was notoriously slow — sometimes 50-70% slower than monolithic equivalents for system-call-heavy workloads. L4 demonstrated that much of this overhead was implementation quality rather than an inherent property of the architecture, but the fundamental cost of crossing address space boundaries doesn't disappear.&lt;/p&gt;
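&lt;p&gt;The round trip is easier to appreciate in code. Below is a toy C sketch of one synchronous IPC call (all names are invented for illustration, not any real microkernel's API): the request is copied into a kernel-owned message, the server handles it in "its" space, and the reply is copied back, with a counter standing in for the mode switches.&lt;/p&gt;

```c
#include <assert.h>
#include <string.h>

/* Toy synchronous IPC sketch (hypothetical names, not from any real
 * kernel). A monolithic kernel would just call server_handle() directly;
 * here the same work costs two message copies plus two mode switches. */

typedef struct {
    int op;        /* requested operation */
    int arg;       /* argument */
    int result;    /* filled in by the server */
} message_t;

static int mode_switches;   /* counts simulated user/kernel transitions */

/* The "server" process: handles one request in its own address space. */
static void server_handle(message_t *m) {
    if (m->op == 1) m->result = m->arg * 2;
}

/* One synchronous round trip: marshal, trap, copy in, dispatch,
 * copy out, return. */
static int ipc_call(int op, int arg) {
    message_t user_msg = { op, arg, 0 };
    message_t kernel_copy;

    mode_switches++;                                      /* trap into kernel */
    memcpy(&kernel_copy, &user_msg, sizeof kernel_copy);  /* copy #1 */
    server_handle(&kernel_copy);                          /* run the server */
    memcpy(&user_msg, &kernel_copy, sizeof user_msg);     /* copy #2 */
    mode_switches++;                                      /* back to caller */

    return user_msg.result;
}
```

&lt;p&gt;Even this stripped-down model makes the cost visible: work that a monolithic kernel gets from one direct function call here pays two copies and two transitions before anything useful happens.&lt;/p&gt;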
&lt;p&gt;A &lt;strong&gt;hybrid kernel&lt;/strong&gt; tries to split the difference. Windows NT is the most commercially successful example: it has a microkernel-like separation of concerns in its architecture, but runs most of the subsystems that a pure microkernel would put in user space (the window manager, parts of the file system, device drivers) in kernel mode for performance. macOS runs XNU, which is a Mach microkernel fused with a BSD monolithic kernel — Mach handles the low-level primitives (memory management, IPC, scheduling), while the BSD layer provides the POSIX API, the file system, and networking, all running in Ring 0. It's a microkernel by lineage but monolithic in practice.&lt;/p&gt;
&lt;p&gt;There are more exotic designs. &lt;strong&gt;Exokernels&lt;/strong&gt;, researched at MIT in the 1990s, eliminate almost all kernel abstractions and let applications manage hardware resources directly, with the kernel only enforcing protection. &lt;strong&gt;Unikernels&lt;/strong&gt; (MirageOS, IncludeOS) compile the application and a minimal OS library into a single binary that runs directly on the hypervisor — no ring separation at all, because there's only one program and it's trusted by definition.&lt;/p&gt;
&lt;p&gt;JokelaOS is monolithic, and deliberately so. It's the simplest architecture to implement, it's the easiest to debug (everything is in one address space, so a &lt;code&gt;serial_printf()&lt;/code&gt; anywhere can see anything), and it's what you build when you're trying to understand how each subsystem works in isolation before worrying about how to decouple them. A microkernel JokelaOS would be a more interesting engineering artifact, but it would also be three times as much code before you could print a single character — you'd need working IPC before the serial driver could talk to anything.&lt;/p&gt;
&lt;h3&gt;Booting: The First 33 Lines&lt;/h3&gt;
&lt;p&gt;The entire boot sequence fits in &lt;code&gt;boot.asm&lt;/code&gt;. Multiboot v1 requires a magic number (&lt;code&gt;0x1BADB002&lt;/code&gt;), flags, and a checksum in a specific header format. GRUB or QEMU's &lt;code&gt;-kernel&lt;/code&gt; loader scans for this header, loads the binary, and jumps to &lt;code&gt;_start&lt;/code&gt; in protected mode with paging disabled.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;section&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;.multiboot&lt;/span&gt;
&lt;span class="k"&gt;align&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;dd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x1BADB002&lt;/span&gt;&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="c1"&gt;; Multiboot magic&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;dd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x00000003&lt;/span&gt;&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="c1"&gt;; Flags: page-align + memory map&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;dd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x1BADB002&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x00000003&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="c1"&gt;; Checksum&lt;/span&gt;

&lt;span class="k"&gt;section&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;.text&lt;/span&gt;
&lt;span class="k"&gt;global&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;_start&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;kmain&lt;/span&gt;

&lt;span class="nl"&gt;_start:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;mov&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;esp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;stack_top&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;popf&lt;/span&gt;&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="c1"&gt;; Clear EFLAGS&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ebx&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;; Multiboot info struct pointer&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;; Multiboot magic number&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;kmain&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;cli&lt;/span&gt;
&lt;span class="nl"&gt;.hang:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;hlt&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;.hang&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That's it. Set up a stack, clear the flags register, push the two values the Multiboot spec guarantees (magic number in EAX, info struct pointer in EBX), and call C. If &lt;code&gt;kmain&lt;/code&gt; ever returns, disable interrupts and halt forever.&lt;/p&gt;
&lt;p&gt;The 16 KB stack is allocated in the BSS (Block Started by Symbol) section — the region where uninitialized global data lives, zeroed at load time. The linker script places the kernel at 1 MB (the conventional load address for protected-mode kernels, just above the memory reserved for real-mode BIOS structures), with &lt;code&gt;.multiboot&lt;/code&gt; first so the bootloader can find the header within the first 8 KB of the binary.&lt;/p&gt;
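&lt;p&gt;The article doesn't reproduce the linker script, but a minimal GNU ld script for this layout might look like the following sketch (only &lt;code&gt;.multiboot&lt;/code&gt;, &lt;code&gt;_start&lt;/code&gt;, and &lt;code&gt;_kernel_end&lt;/code&gt; are names from the text; the rest is the standard pattern):&lt;/p&gt;

```ld
ENTRY(_start)
SECTIONS
{
    . = 1M;                           /* load the kernel at the 1 MB mark */

    .multiboot : { *(.multiboot) }    /* header first, inside the first 8 KB */
    .text      : { *(.text) }
    .rodata    : { *(.rodata) }
    .data      : { *(.data) }
    .bss       : { *(.bss) }          /* zeroed at load; holds the 16 KB stack */

    _kernel_end = .;                  /* first free byte after the kernel image */
}
```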
&lt;h3&gt;Protection Rings: Hardware-Enforced Privilege&lt;/h3&gt;
&lt;p&gt;x86 protected mode provides four privilege levels, numbered 0 through 3, called rings. Ring 0 is the most privileged — the kernel runs here. Ring 3 is the least privileged — user programs run here. Rings 1 and 2 exist in the hardware but almost nobody uses them. Linux doesn't. Windows doesn't. JokelaOS doesn't. The practical x86 privilege model is two rings: kernel and user.&lt;/p&gt;
&lt;p&gt;The ring system isn't a software convention. It's enforced by the CPU itself, in silicon. The processor tracks the Current Privilege Level (CPL) — the ring the currently executing code belongs to — and checks it against every sensitive operation. A Ring 3 process that executes &lt;code&gt;cli&lt;/code&gt; (disable interrupts), &lt;code&gt;hlt&lt;/code&gt; (halt the CPU), &lt;code&gt;lgdt&lt;/code&gt; (load a new GDT), or &lt;code&gt;mov cr3&lt;/code&gt; (change the page directory) triggers a General Protection Fault. The CPU literally refuses to execute the instruction. A Ring 3 process can't touch I/O ports unless the kernel has explicitly granted access through the I/O Permission Bitmap in the TSS. It can't modify its own segment registers to escalate privilege, because the CPU validates every segment load against the descriptor's DPL (Descriptor Privilege Level).&lt;/p&gt;
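&lt;p&gt;This enforcement is easy to see from user space on any commodity OS. The C program below (an illustration assuming a Linux/x86 host, not JokelaOS code) forks a child that executes the privileged &lt;code&gt;hlt&lt;/code&gt; instruction from Ring 3; the CPU raises a General Protection Fault and the kernel kills the child with SIGSEGV.&lt;/p&gt;

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if executing `hlt` in user mode (Ring 3) got the child
 * killed by SIGSEGV, which is how Linux reports the CPU's #GP fault.
 * On non-x86 builds there is no hlt to try, so report success trivially. */
static int hlt_in_ring3_is_fatal(void) {
#if defined(__i386__) || defined(__x86_64__)
    pid_t pid = fork();
    if (pid == 0) {
        __asm__ volatile("hlt");  /* privileged: CPL 3 raises #GP */
        _exit(0);                 /* never reached */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
#else
    return 1;
#endif
}
```

&lt;p&gt;The child never reaches &lt;code&gt;_exit&lt;/code&gt;: the instruction is refused in silicon, exactly as described above.&lt;/p&gt;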
&lt;p&gt;The only way for Ring 3 code to enter Ring 0 is through a gate — an interrupt gate, a trap gate, or a call gate. Gates are entries in the IDT or GDT that the kernel sets up in advance. They define the exact entry points where Ring 3 code can cross into Ring 0, what the new code and stack segments will be, and what privilege level is required to use them. There's no way for user code to jump to an arbitrary kernel address. It can only enter the kernel through the doors the kernel has built.&lt;/p&gt;
&lt;p&gt;This is what makes an operating system an operating system rather than a library. Without ring separation, a buggy user program can corrupt kernel memory, disable interrupts, reprogram the PIC, or overwrite the page tables. With ring separation, the worst it can do is crash itself.&lt;/p&gt;
&lt;p&gt;The mechanism that implements all of this is the Global Descriptor Table.&lt;/p&gt;
&lt;h3&gt;The GDT: Defining the World&lt;/h3&gt;
&lt;p&gt;The GDT defines memory segments — their base addresses, sizes, privilege levels, and whether they hold code or data. Each segment descriptor is an 8-byte structure with fields packed into non-obvious bit positions (a consequence of backward compatibility with the 286, which had a different descriptor format that the 386 had to extend without breaking).&lt;/p&gt;
&lt;p&gt;JokelaOS uses a flat memory model: every segment covers the full 4 GB address space with base 0 and limit 0xFFFFFFFF. The segmentation hardware is effectively nullified, which is what you want on modern x86 where paging handles memory protection. But the GDT is still mandatory — the CPU requires it for the ring system to function. Even with flat segments, the DPL field in each descriptor is what tells the CPU "code using this segment is Ring 0" or "code using this segment is Ring 3."&lt;/p&gt;
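&lt;p&gt;The packing itself is worth seeing once. This is a plausible reconstruction of &lt;code&gt;gdt_set_entry()&lt;/code&gt; (the field layout follows the Intel manuals; the body is a sketch, not JokelaOS's exact source):&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte GDT descriptor, fields in their historical (286-compatible)
 * positions. A plausible reconstruction, not JokelaOS's exact source. */
struct gdt_entry {
    uint16_t limit_low;    /* limit bits 0-15 */
    uint16_t base_low;     /* base bits 0-15 */
    uint8_t  base_mid;     /* base bits 16-23 */
    uint8_t  access;       /* present, DPL, type */
    uint8_t  granularity;  /* limit bits 16-19 + flags (4K gran, 32-bit) */
    uint8_t  base_high;    /* base bits 24-31 */
} __attribute__((packed));

static struct gdt_entry gdt[6];

static void gdt_set_entry(int i, uint32_t base, uint32_t limit,
                          uint8_t access, uint8_t gran) {
    gdt[i].limit_low   = limit & 0xFFFF;
    gdt[i].base_low    = base & 0xFFFF;
    gdt[i].base_mid    = (base >> 16) & 0xFF;
    gdt[i].access      = access;
    gdt[i].granularity = ((limit >> 16) & 0x0F) | (gran & 0xF0);
    gdt[i].base_high   = (base >> 24) & 0xFF;
}
```

&lt;p&gt;Note how the limit is split across two fields and the base across three — a direct consequence of extending the 286 descriptor format in place.&lt;/p&gt;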
&lt;p&gt;The GDT has six entries:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Selector&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0x00&lt;/td&gt;
&lt;td&gt;Null descriptor (required by x86)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0x08&lt;/td&gt;
&lt;td&gt;Kernel code (Ring 0, execute/read)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0x10&lt;/td&gt;
&lt;td&gt;Kernel data (Ring 0, read/write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0x18&lt;/td&gt;
&lt;td&gt;User code (Ring 3, execute/read)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0x20&lt;/td&gt;
&lt;td&gt;User data (Ring 3, read/write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0x28&lt;/td&gt;
&lt;td&gt;Task State Segment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Entries 1 and 2 are identical to entries 3 and 4 in every way except the DPL field — two bits in the access byte that say &lt;code&gt;00&lt;/code&gt; (Ring 0) versus &lt;code&gt;11&lt;/code&gt; (Ring 3). That two-bit difference is the entire kernel/user boundary.&lt;/p&gt;
&lt;p&gt;When a user process runs, the CPU's CS register is loaded with 0x1B — that's selector 0x18 (pointing to GDT entry 3, the user code segment) OR'd with RPL 3 (Requested Privilege Level, the bottom two bits of the selector). The data segment registers get 0x23 (GDT entry 4, user data, RPL 3). The CPU sets CPL to match, and from that point on, every instruction is checked against Ring 3 privileges. The kernel runs with CS=0x08 (GDT entry 1, RPL 0) and DS=0x10 (GDT entry 2, RPL 0).&lt;/p&gt;
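&lt;p&gt;The arithmetic is compact enough to check directly: a selector is the descriptor's byte offset into the GDT with the RPL in its bottom two bits, and the DPL sits in bits 5 and 6 of the access byte (illustrative C, just verifying the constants above):&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

/* Selector = (GDT index * 8) | RPL. RPL lives in the bottom two bits. */
static uint16_t selector(int index, int rpl) {
    return (uint16_t)((index << 3) | rpl);
}

/* DPL occupies bits 5-6 of a descriptor's access byte. */
static uint8_t with_dpl(uint8_t access, int dpl) {
    return (uint8_t)((access & ~0x60) | (dpl << 5));
}
```

&lt;p&gt;&lt;code&gt;selector(3, 3)&lt;/code&gt; is 0x1B (user code) and &lt;code&gt;selector(4, 3)&lt;/code&gt; is 0x23 (user data); flipping the kernel-code access byte 0x9A to DPL 3 yields exactly 0xFA, the user-code byte.&lt;/p&gt;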
&lt;p&gt;The TSS (Task State Segment) is the bridge between rings. When the CPU takes an interrupt while running Ring 3 code, it needs to switch to a Ring 0 stack — you can't trust the user's stack pointer to be valid, and you certainly can't run kernel interrupt handlers on a user-controlled stack. The TSS holds the Ring 0 stack pointer (&lt;code&gt;esp0&lt;/code&gt;). Every context switch updates the TSS with the current process's kernel stack, so the CPU always knows where to land when transitioning from user mode to kernel mode.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;gdt_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="c1"&gt;// Null&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Kernel code&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Kernel data&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// User code&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;gdt_set_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xF2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xCF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// User data&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// TSS entry built separately&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The access byte &lt;code&gt;0x9A&lt;/code&gt; means: present, Ring 0, code segment, executable, readable. &lt;code&gt;0xFA&lt;/code&gt; means the same thing but Ring 3. These magic numbers come straight from the Intel manuals and they're the kind of thing you get wrong three times before you get right once.&lt;/p&gt;
&lt;h3&gt;Interrupts: Exceptions, IRQs, and the PIC&lt;/h3&gt;
&lt;p&gt;The IDT (Interrupt Descriptor Table) maps interrupt vectors to handler functions. JokelaOS sets up 256 entries: CPU exceptions (0-31), hardware IRQs (32-47), and the syscall gate (0x80).&lt;/p&gt;
&lt;p&gt;The x86 PIC (Programmable Interrupt Controller) needs remapping. By default, the master PIC maps IRQs 0-7 to interrupt vectors 8-15, which collide with CPU exceptions (double fault is vector 8, for instance). The standard fix is to remap the master PIC to vectors 32-39 and the slave to 40-47. This requires sending four Initialization Command Words to each PIC in the correct sequence — the kind of hardware protocol that hasn't changed since the IBM PC/AT in 1984.&lt;/p&gt;
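&lt;p&gt;The remap sequence is short enough to sketch. The port numbers and command words below are the standard 8259A protocol; the &lt;code&gt;outb&lt;/code&gt; here records writes into an array instead of touching hardware, purely so the sequence can be shown and checked outside a kernel:&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

/* Standard 8259A ports. */
#define PIC1_CMD  0x20   /* master command */
#define PIC1_DATA 0x21   /* master data */
#define PIC2_CMD  0xA0   /* slave command */
#define PIC2_DATA 0xA1   /* slave data */

/* In a kernel this would be a real port write; here we record the
 * (port, value) pairs so the initialization sequence can be inspected. */
static uint16_t log_port[16];
static uint8_t  log_val[16];
static int      log_n;

static void outb(uint16_t port, uint8_t val) {
    log_port[log_n] = port;
    log_val[log_n]  = val;
    log_n++;
}

/* Remap the master PIC to vectors 32-39 and the slave to 40-47 by
 * sending the four Initialization Command Words in order. */
static void pic_remap(void) {
    outb(PIC1_CMD, 0x11);   /* ICW1: start init, expect ICW4 */
    outb(PIC2_CMD, 0x11);
    outb(PIC1_DATA, 32);    /* ICW2: master vector offset */
    outb(PIC2_DATA, 40);    /* ICW2: slave vector offset */
    outb(PIC1_DATA, 0x04);  /* ICW3: slave attached on IRQ line 2 */
    outb(PIC2_DATA, 0x02);  /* ICW3: slave's cascade identity */
    outb(PIC1_DATA, 0x01);  /* ICW4: 8086 mode */
    outb(PIC2_DATA, 0x01);
}
```

&lt;p&gt;A real implementation would also save the current IRQ mask registers before this sequence and restore them afterward.&lt;/p&gt;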
&lt;p&gt;ISR (Interrupt Service Routine) stubs are written in NASM. The CPU pushes an error code for some exceptions but not for others, so each stub for a "no error code" exception first pushes a dummy zero, giving every handler a uniform stack frame. The stub then pushes the interrupt number, saves all general-purpose registers, calls the C handler, restores the registers, and does an &lt;code&gt;iret&lt;/code&gt;. The stubs are generated with macros:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;%macro ISR_NOERRCODE 1&lt;/span&gt;
&lt;span class="k"&gt;global&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;isr&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="nf"&gt;isr&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;dword&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; dummy error code&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;dword&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; interrupt number&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;isr_common&lt;/span&gt;
&lt;span class="cp"&gt;%endmacro&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The C-side dispatcher checks the interrupt number. For exceptions (0-31), it prints the register state and halts — there's no recovery from a page fault when you don't have a page fault handler yet. For IRQs (32-47), it calls the registered handler function and sends an EOI (End of Interrupt) command to the PIC. For interrupt 0x80, it dispatches to the syscall handler.&lt;/p&gt;
&lt;p&gt;One critical detail: interrupt 0x80 is installed as a &lt;strong&gt;trap gate&lt;/strong&gt; with DPL 3, while every other gate is DPL 0. The DPL is what controls access: a software &lt;code&gt;int&lt;/code&gt; instruction is allowed only when the gate's DPL is at least the caller's privilege level, so Ring 3 code can trigger &lt;code&gt;int 0x80&lt;/code&gt;, but a user program that tries to execute &lt;code&gt;int 0x00&lt;/code&gt; gets a General Protection Fault instead. (The trap-gate type, as opposed to an interrupt gate, only means the CPU leaves interrupts enabled on entry.) This is the mechanism that makes syscalls work while keeping everything else protected.&lt;/p&gt;
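&lt;p&gt;The whole distinction lives in one flags byte per IDT entry: bit 7 is the present bit, bits 5 and 6 hold the DPL, and the low four bits give the gate type (0xE for a 32-bit interrupt gate, 0xF for a 32-bit trap gate). A quick check of the two values involved (illustrative C, not kernel source):&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

#define GATE_INTERRUPT 0xE   /* 32-bit interrupt gate (clears IF on entry) */
#define GATE_TRAP      0xF   /* 32-bit trap gate (leaves IF unchanged) */

/* Build an IDT entry's flags byte: present bit, DPL in bits 5-6, type. */
static uint8_t idt_flags(int dpl, uint8_t type) {
    return (uint8_t)(0x80 | (dpl << 5) | type);
}
```

&lt;p&gt;&lt;code&gt;idt_flags(0, GATE_INTERRUPT)&lt;/code&gt; gives 0x8E, the byte for every exception and IRQ gate; &lt;code&gt;idt_flags(3, GATE_TRAP)&lt;/code&gt; gives 0xEF, the one entry a Ring 3 &lt;code&gt;int 0x80&lt;/code&gt; is allowed to use.&lt;/p&gt;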
&lt;h3&gt;Memory: Three Allocators&lt;/h3&gt;
&lt;p&gt;JokelaOS has three layers of memory management, each built on top of the previous one.&lt;/p&gt;
&lt;h4&gt;The Bump Allocator&lt;/h4&gt;
&lt;p&gt;The simplest possible allocator. A pointer starts at the first page boundary after the kernel image (&lt;code&gt;_kernel_end&lt;/code&gt; from the linker script) and only moves forward. &lt;code&gt;kmalloc(size)&lt;/code&gt; aligns the pointer to 16 bytes, returns it, and advances by &lt;code&gt;size&lt;/code&gt;. There is no &lt;code&gt;kfree()&lt;/code&gt;. Memory allocated with the bump allocator is permanent.&lt;/p&gt;
&lt;p&gt;This sounds primitive, and it is. But it's also exactly right for kernel initialization. The GDT, IDT, page tables, file system metadata, user table — these are allocated once and never freed. The bump allocator handles all of them with zero fragmentation and zero overhead.&lt;/p&gt;
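&lt;p&gt;The whole allocator fits in a few lines. A self-contained sketch (a static buffer stands in for the region after &lt;code&gt;_kernel_end&lt;/code&gt;, and the bounds check is an assumption, not necessarily in the kernel's version):&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bump allocator sketch. In the kernel the heap starts at the first
 * page boundary after _kernel_end; here a static buffer stands in. */
static uint8_t  heap[64 * 1024];
static uint8_t *next = heap;

static void *kmalloc(size_t size) {
    uintptr_t p = ((uintptr_t)next + 15) & ~(uintptr_t)15;  /* 16-byte align */
    if (p + size > (uintptr_t)heap + sizeof heap)
        return NULL;                                        /* heap exhausted */
    next = (uint8_t *)(p + size);                           /* only moves forward */
    return (void *)p;
    /* There is no kfree(): bump allocations are permanent. */
}
```

&lt;p&gt;Every returned pointer is 16-byte aligned and the cursor only ever advances, which is the entire contract.&lt;/p&gt;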
&lt;h4&gt;The Physical Memory Manager&lt;/h4&gt;
&lt;p&gt;Once the kernel needs to allocate and free pages dynamically (for process stacks, program code, page tables), it needs a real allocator. The PMM uses a bitmap: one bit per 4 KB physical frame, supporting up to 256 MB of RAM (65,536 frames, 8 KB bitmap).&lt;/p&gt;
&lt;p&gt;Initialization parses the Multiboot memory map to find usable RAM regions, then marks everything from frame 0 through the end of the bump heap as reserved. This protects the IVT (Interrupt Vector Table), BIOS data area, kernel image, and all bump-allocated structures from being handed out as free pages.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;pmm_alloc_frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total_frames&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bitmap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;bitmap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;free_count&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PAGE_SIZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// out of memory&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Linear scan, no free lists, no buddy system. It's O(n) per allocation, which is fine when n is measured in thousands and allocations are infrequent. A production kernel would use something smarter. This kernel allocates a few dozen pages total.&lt;/p&gt;
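&lt;p&gt;For completeness, freeing a frame is the mirror image: clear the bit and bump the free counter. A plausible counterpart to the code above (same bitmap convention; the body is a sketch, including the double-free guard):&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Same convention as pmm_alloc_frame in the article: one bit per 4 KB
 * frame, set = in use. Sizes here are illustrative. */
static uint8_t  bitmap[8192];          /* 65,536 frames = 256 MB */
static uint32_t free_count = 65536;

static void pmm_free_frame(uint32_t addr) {
    uint32_t i = addr / PAGE_SIZE;
    if (bitmap[i / 8] & (1u << (i % 8))) {   /* ignore double frees */
        bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
        free_count++;
    }
}
```

&lt;p&gt;The guard makes a double free a no-op rather than a corrupted counter, which is cheap insurance in a kernel with no memory debugger.&lt;/p&gt;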
&lt;h4&gt;Paging&lt;/h4&gt;
&lt;p&gt;With physical frames available, the kernel can enable paging. &lt;code&gt;paging_init()&lt;/code&gt; builds a page directory and 32 page tables, identity-mapping the first 128 MB of physical memory (virtual address = physical address). The page directory goes into CR3, and setting the PG bit in CR0 turns the MMU (Memory Management Unit) on.&lt;/p&gt;
&lt;p&gt;Identity mapping means the kernel doesn't need to worry about virtual-to-physical translation for its own code and data. Kernel pointers just work. When user processes need memory, the loader allocates physical frames and maps them into the process's address space with the PG_USER flag set, allowing Ring 3 access.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;paging_map_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;phys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tbl_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x3FF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_directory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PG_PRESENT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tbl_frame&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pmm_alloc_frame&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;memset&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;tbl_frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PAGE_SIZE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;page_directory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tbl_frame&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PG_PRESENT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PG_WRITE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;page_directory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dir_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFF000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tbl_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phys&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFFFFF000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"invlpg (%0)"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;virt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;invlpg&lt;/code&gt; instruction flushes the TLB (Translation Lookaside Buffer) entry for the mapped virtual address, which is critical — without it, the CPU might use a stale translation from its cache and access the wrong physical page.&lt;/p&gt;
&lt;h3&gt;The Network Stack&lt;/h3&gt;
&lt;p&gt;JokelaOS has a working network stack — the one subsystem where "toy" undersells it slightly. It performs ARP (Address Resolution Protocol) resolution, constructs IPv4 (Internet Protocol version 4) packets with correct checksums, and handles ICMP (Internet Control Message Protocol) echo request/reply with measured round-trip times. There's no TCP (Transmission Control Protocol), no UDP (User Datagram Protocol), no sockets. But the packets that leave this kernel are real packets that traverse real networks.&lt;/p&gt;
&lt;p&gt;The NIC (Network Interface Controller) is an emulated RTL8139, the simplest PCI (Peripheral Component Interconnect) Ethernet controller that QEMU supports. The driver initializes the chip by writing to its configuration registers: reset, enable transmitter and receiver, set up a receive ring buffer, configure the interrupt mask, and unmask IRQ 11. Packet transmission uses a four-descriptor TX ring; reception is interrupt-driven through the RTL8139's ring buffer.&lt;/p&gt;
&lt;p&gt;PCI enumeration scans the configuration space to find the RTL8139 by vendor/device ID (0x10EC:0x8139), reads the I/O base address from BAR0 (Base Address Register 0), and enables bus mastering. This is the only driver in the system — there's no USB, no disk, no display. One NIC, one network.&lt;/p&gt;
&lt;p&gt;The stack is layered:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Link&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ethernet.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Frame demux by EtherType&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARP&lt;/td&gt;
&lt;td&gt;&lt;code&gt;arp.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Table + request/reply&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ipv4.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Routing, header checksum&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;&lt;code&gt;icmp.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Echo reply + outgoing ping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;On boot, the kernel sends an ARP request for the gateway (10.0.2.2, QEMU's default) and waits for the reply. Once the gateway's MAC (Media Access Control) address is resolved, the kernel can ping arbitrary hosts through QEMU's SLIRP user-mode networking, which provides NAT (Network Address Translation). A &lt;code&gt;ping 10.1.1.1&lt;/code&gt; from the shell constructs an ICMP echo request, wraps it in an IPv4 packet, wraps that in an Ethernet frame, and pushes it out through the RTL8139's TX ring. When the reply comes back, the receive ISR fires, the Ethernet layer demuxes by EtherType, the IP layer validates the checksum, and the ICMP handler matches the echo reply to the outstanding request and computes the RTT (round-trip time).&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Pinging&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.1.1.1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Getting here required writing every byte-order conversion (&lt;code&gt;htons&lt;/code&gt;, &lt;code&gt;htonl&lt;/code&gt;), every checksum computation (the IP header checksum is a one's complement sum of 16-bit words), every packet layout (Ethernet header is 14 bytes, IP header is 20, ICMP is 8 plus payload). None of this is hard individually. Together, it's a thousand places to put a byte in the wrong order.&lt;/p&gt;
&lt;h3&gt;Processes and Preemptive Multitasking&lt;/h3&gt;
&lt;p&gt;The process subsystem manages up to 16 processes in a static table. Each process has a state (UNUSED, READY, RUNNING, DEAD), a kernel stack pointer, and a user-mode entry point and stack.&lt;/p&gt;
&lt;p&gt;Process creation doesn't follow the UNIX &lt;code&gt;fork()&lt;/code&gt;/&lt;code&gt;exec()&lt;/code&gt; model. There's no cloning of address spaces, no copy-on-write, no replacing the current process image. Instead, the loader allocates fresh physical frames for the program's code and stack, copies the flat binary into the code pages, and calls &lt;code&gt;proc_create()&lt;/code&gt;, which allocates a 4 KB kernel stack and builds a fake stack frame on it. This stack frame is what &lt;code&gt;context_switch()&lt;/code&gt; will "return" into on the process's first schedule — it contains saved registers and a return address pointing to &lt;code&gt;proc_entry_user()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;proc_entry_user()&lt;/code&gt; is a small assembly sequence that performs the Ring 0 to Ring 3 transition. It sets the data segment registers to the user data selector (0x23), pushes a fake interrupt frame (SS, ESP, EFLAGS with IF=1, CS, EIP), and executes &lt;code&gt;iret&lt;/code&gt;. The CPU pops the frame, switches to Ring 3, and starts executing the user program. From the hardware's perspective, this looks identical to returning from an interrupt that happened to interrupt a user-mode program — which is exactly the trick.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;proc_entry_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;process_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;proc_current&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov $0x23, %%ax &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%ds  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%es  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%fs  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"mov %%ax, %%gs  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push $0x23      &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// SS&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push %0         &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// ESP&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"pushf           &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"pop %%eax       &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"or $0x200, %%eax&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// Set IF&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push %%eax      &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// EFLAGS&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push $0x1B      &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// CS (user code)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"push %1         &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// EIP&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="s"&gt;"iret"&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;user_esp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;user_eip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"eax"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"memory"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Context switching uses a simple assembly stub in &lt;code&gt;switch.asm&lt;/code&gt;. It saves the callee-saved registers (EBP, EBX, ESI, EDI), stores ESP into the old process's slot, loads the new process's ESP, restores registers, and returns. The &lt;code&gt;ret&lt;/code&gt; instruction pops the return address from the new stack and resumes where that process left off.&lt;/p&gt;
&lt;p&gt;Scheduling is preemptive round-robin. The PIT (Programmable Interval Timer) fires at 1000 Hz. Every 10 ticks (10 ms), the IRQ handler calls &lt;code&gt;proc_schedule()&lt;/code&gt;, which finds the next READY process and switches to it. If no user processes are ready, control stays with PID 0 (the kernel/shell). This is the minimum viable scheduler — no priorities, no time slices beyond the fixed 10 ms quantum, no fairness guarantees. But it works: two user programs printing characters to serial run concurrently, interleaved by the timer.&lt;/p&gt;
&lt;h3&gt;Syscalls&lt;/h3&gt;
&lt;p&gt;User programs communicate with the kernel through &lt;code&gt;int 0x80&lt;/code&gt;. The mechanism — a software interrupt that transitions from Ring 3 to Ring 0 — is the same one Linux used on i386 before &lt;code&gt;sysenter&lt;/code&gt; replaced it. The register convention is borrowed too: syscall number in EAX, arguments in EBX/ECX/EDX/ESI/EDI, return value in EAX. But that's where the resemblance ends.&lt;/p&gt;
&lt;p&gt;JokelaOS is not a UNIX. The syscall numbers are custom — exit is 0, write is 1, getpid is 2, read is 3 — not Linux's i386 table (where exit is 1, read is 3, write is 4, getpid is 20). There's no &lt;code&gt;fork()&lt;/code&gt;, no &lt;code&gt;exec()&lt;/code&gt;, no &lt;code&gt;open()&lt;/code&gt;, no &lt;code&gt;close()&lt;/code&gt;, no signals, no pipes. File descriptors 0 and 1 exist as concepts (stdin maps to the keyboard buffer, stdout maps to the serial port) but there's no file descriptor table behind them. The syscall handler just checks &lt;code&gt;if (fd == 1)&lt;/code&gt; and calls &lt;code&gt;serial_putchar()&lt;/code&gt;. The process model isn't UNIX either — there's no parent/child relationship, no &lt;code&gt;wait()&lt;/code&gt;, no process groups. Processes are created by the loader and scheduled round-robin until they exit. It's closer to a microcontroller RTOS (Real-Time Operating System) than to anything in the UNIX lineage.&lt;/p&gt;
&lt;p&gt;Four syscalls are implemented:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Number&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Arguments&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;SYS_EXIT&lt;/td&gt;
&lt;td&gt;ebx=status&lt;/td&gt;
&lt;td&gt;Terminate process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;SYS_WRITE&lt;/td&gt;
&lt;td&gt;ebx=fd, ecx=buf, edx=len&lt;/td&gt;
&lt;td&gt;Write to serial (fd=1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;SYS_GETPID&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Return current PID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;SYS_READ&lt;/td&gt;
&lt;td&gt;ebx=fd, ecx=buf, edx=len&lt;/td&gt;
&lt;td&gt;Read from keyboard (fd=0)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is enough to write programs that print output, read input, identify themselves, and exit cleanly. The syscall dispatcher validates file descriptors (only 0 and 1 are legal) and bounds-checks lengths. SYS_WRITE sends bytes to the serial port; SYS_READ drains the keyboard buffer without blocking.&lt;/p&gt;
&lt;p&gt;User programs are flat binaries — raw machine code with no headers, no relocations, no ELF (Executable and Linkable Format) parsing. The loader copies the binary to freshly allocated pages and jumps to byte zero. Programs that need to reference their own data use position-independent tricks:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;next&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; push EIP&lt;/span&gt;
&lt;span class="nl"&gt;next:&lt;/span&gt;
&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ebp&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;; EBP = address of this instruction&lt;/span&gt;
&lt;span class="nf"&gt;lea&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ecx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;ebp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;offset_to_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the same technique used by shellcode and position-independent code on x86. It works because &lt;code&gt;call&lt;/code&gt; pushes the address of the next instruction, which gives you a known reference point relative to the code's actual load address.&lt;/p&gt;
&lt;h3&gt;The Shell&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/jokelaos/jokelaos1.png" alt="JokelaOS running in QEMU — ping output, login prompt, and ps command showing process table" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 0 0 1em 0;" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;With all the subsystems in place, the shell ties them together into something interactive. &lt;code&gt;shell_run()&lt;/code&gt; is the kernel's main loop after initialization — it presents a login prompt, authenticates against the user table, and drops into a command interpreter.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="o"&gt;==============================&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;JokelaOS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="mf"&gt;.1&lt;/span&gt;
&lt;span class="o"&gt;==============================&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GDT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TSS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IDT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PIC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;remapped&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Multiboot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;confirmed&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Multiboot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9500&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Bump&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;allocator&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;

&lt;span class="n"&gt;PCI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;03.0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;EC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8139&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RTL8139&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RTL8139&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MAC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;54&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IRQ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;ramfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;Users&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;guest&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;PMM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31269&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;free&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;122&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Paging&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;identity&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mapped&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PIT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;timer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Hz&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Keyboard&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;

&lt;span class="n"&gt;JokelaOS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nl"&gt;login&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;
&lt;span class="nl"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;****&lt;/span&gt;
&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The shell supports: &lt;code&gt;help&lt;/code&gt;, &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;run &amp;lt;program&amp;gt;&lt;/code&gt;, &lt;code&gt;ps&lt;/code&gt;, &lt;code&gt;mem&lt;/code&gt;, &lt;code&gt;ping &amp;lt;ip&amp;gt;&lt;/code&gt;, &lt;code&gt;uptime&lt;/code&gt;, &lt;code&gt;whoami&lt;/code&gt;, and &lt;code&gt;logout&lt;/code&gt;. The line editor handles backspace. Password input echoes asterisks. The &lt;code&gt;run&lt;/code&gt; command loads a flat binary from ramfs, creates a process, and the scheduler picks it up on the next timer tick.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ps&lt;/code&gt; shows the process table:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;root$ ps
  PID  STATE
    0  RUNNING
    1  READY
    2  DEAD
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;mem&lt;/code&gt; shows memory usage:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;root$ mem
Heap used: 8832 bytes
PMM free:  31267 frames (122 MB)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The keyboard input path is worth noting. The PS/2 keyboard controller fires IRQ 1. The handler reads the scancode from port 0x60, converts it to ASCII using a US QWERTY lookup table (with shift modifier tracking), and drops it into a 256-byte circular buffer. Serial input takes the same path — the UART's receive interrupt (IRQ 4) reads the incoming byte and injects it into the keyboard buffer. This means the shell works identically whether you're typing on a PS/2 keyboard or through the QEMU serial console.&lt;/p&gt;
&lt;h3&gt;The RAM File System&lt;/h3&gt;
&lt;p&gt;User programs need to live somewhere. With no disk driver, the file system is purely in-memory. &lt;code&gt;ramfs&lt;/code&gt; stores up to 32 files, each with a name (28 bytes), a data pointer, and a size. &lt;code&gt;ramfs_create()&lt;/code&gt; allocates space with the bump allocator and copies the binary in. &lt;code&gt;ramfs_find()&lt;/code&gt; does a linear search by name.&lt;/p&gt;
&lt;p&gt;During boot, two test programs are embedded directly in &lt;code&gt;kmain.c&lt;/code&gt; as byte arrays of hand-assembled x86 machine code. One prints the character '1' ten times; the other prints '2' ten times. Both use SYS_WRITE to output through the serial port and SYS_EXIT to terminate cleanly. They're loaded into ramfs, and &lt;code&gt;run print1&lt;/code&gt; from the shell executes them in user mode.&lt;/p&gt;
&lt;p&gt;This is about as minimal as a file system gets. No directories, no permissions, no deletion. But it demonstrates the complete path from "bytes in kernel memory" to "user-mode process executing with its own address space."&lt;/p&gt;
&lt;h3&gt;What I Learned&lt;/h3&gt;
&lt;p&gt;Writing a kernel from scratch teaches you things that no amount of reading about kernels will teach you. Some of these are technical. Most are about the nature of systems programming itself.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The boot process is the hardest part.&lt;/strong&gt; Not because the code is complex — &lt;code&gt;boot.asm&lt;/code&gt; is 33 lines — but because when something goes wrong, you have zero diagnostic capability. The serial port isn't initialized yet. The IDT isn't loaded. If your Multiboot header checksum is wrong by one bit, QEMU silently fails. You're debugging with QEMU's &lt;code&gt;-d int&lt;/code&gt; flag and reading hex dumps of interrupt frames.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;x86 protected mode is an archaeology project.&lt;/strong&gt; The PIC remapping sequence dates from the IBM PC/AT (1984). The GDT access bytes encode information in bit patterns designed for hardware that predates flat memory models. The TSS exists because Intel's original vision for the 286 involved hardware task switching that nobody ended up using. You're programming against forty years of backward compatibility, and every one of those layers is still there, still mandatory, still silently breaking things if you get it wrong.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The gap between "works in Ring 0" and "works in Ring 3" is enormous.&lt;/strong&gt; A kernel that runs entirely in supervisor mode can be surprisingly simple. The moment you add user mode, you need: the TSS (so the CPU knows where the kernel stack is), Ring 3 GDT segments, trap gates for syscalls, a mechanism to build fake interrupt frames for the initial &lt;code&gt;iret&lt;/code&gt; into user mode, and careful validation of every pointer that crosses the kernel boundary. Each of these is individually straightforward. Getting them all correct simultaneously is where the real difficulty lies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Preemptive scheduling is simpler than it sounds.&lt;/strong&gt; The concept — save state, pick next process, restore state — translates almost directly into code. The context switch is twelve instructions of assembly. The scheduler is a for loop. What makes it tricky is the interaction with everything else: the TSS must be updated, the interrupt must send EOI before switching, the process's kernel stack must be set up so that restoring registers and returning lands in the right place. The scheduler itself is trivial. The invariants it depends on are not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Writing a network stack is an exercise in byte ordering.&lt;/strong&gt; Ethernet is big-endian. x86 is little-endian. IP addresses, port numbers, checksums, packet lengths — every multi-byte field requires explicit conversion. Miss one &lt;code&gt;htons()&lt;/code&gt; and your packets are valid-looking garbage. The RTL8139 driver, the ARP implementation, the IP checksum — each is maybe fifty lines. Debugging a single swapped byte can take hours.&lt;/p&gt;
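&lt;p&gt;As a sketch of what that conversion looks like, here is an illustrative 16-bit byte swap. The name &lt;code&gt;my_htons&lt;/code&gt; is mine, not the kernel's; a freestanding kernel has to provide its own rather than pull one from libc:&lt;/p&gt;

```c
/* Illustrative 16-bit host-to-network conversion for little-endian
 * x86. Division and modulo by 256 pick out the high and low bytes;
 * the compiler reduces this to a plain byte swap. */
unsigned short my_htons(unsigned short x)
{
    unsigned short hi = x / 256;             /* high byte of x */
    unsigned short lo = x % 256;             /* low byte of x  */
    return (unsigned short)(lo * 256 + hi);  /* bytes swapped  */
}
```

&lt;p&gt;Note this version swaps unconditionally, which is only correct on a little-endian target; a portable implementation would compile to a no-op on big-endian machines. For an x86-only kernel the unconditional swap is enough.&lt;/p&gt;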
&lt;h3&gt;The Numbers&lt;/h3&gt;
&lt;p&gt;JokelaOS in its current form:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Files&lt;/th&gt;
&lt;th&gt;Approximate LOC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Boot (ASM)&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;~120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel core&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;~1,200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drivers&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;~250&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network stack&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;~450&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;26&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2,000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Two thousand lines for a kernel that boots, manages memory with paging, runs preemptive multitasking with Ring 3 isolation, handles interrupts, implements syscalls, has a working network stack, and provides an interactive shell. No line is borrowed from another project. Every byte is accounted for.&lt;/p&gt;
&lt;p&gt;The entire thing builds in under a second and the binary is around 40 KB. &lt;code&gt;make run&lt;/code&gt; goes from source to a running kernel in QEMU in about two seconds. This fast iteration cycle is what made the project possible — every subsystem was tested immediately after being written, and bugs were caught before they could compound.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;JokelaOS is a foundation, not a finished product. The obvious next steps are a proper virtual memory manager (per-process page directories instead of a shared identity map), a real file system (even simple FAT12, the 12-bit File Allocation Table format, would be a significant step up from ramfs), and ELF binary loading. Beyond that: a disk driver would unlock persistence, TCP would make the network stack actually useful, and a proper &lt;code&gt;fork()&lt;/code&gt;/&lt;code&gt;exec()&lt;/code&gt; would make the process model complete.&lt;/p&gt;
&lt;p&gt;But the point of JokelaOS was never to build a production operating system. The point was to understand what an operating system actually does — not in the abstract, not from a textbook diagram, but in the specific, concrete sense of "these bytes go into these ports in this order and then the hardware does this thing." Every subsystem in JokelaOS exists because I wanted to understand it, and the only way to truly understand a piece of systems software is to write it yourself.&lt;/p&gt;
&lt;p&gt;The source code is on &lt;a href="https://baud.rs/B9FPjG"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;</description><category>assembly</category><category>bare metal</category><category>c</category><category>kernel</category><category>multitasking</category><category>networking</category><category>osdev</category><category>paging</category><category>qemu</category><category>systems programming</category><category>x86</category><guid>https://tinycomputers.io/posts/jokelaos-bare-metal-x86-kernel.html</guid><pubDate>Tue, 10 Mar 2026 15:00:00 GMT</pubDate></item><item><title>The Cathedral and the Bazaar, Nearly 30 Years Later</title><link>https://tinycomputers.io/posts/the-cathedral-and-the-bazaar-nearly-30-years-later.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-cathedral-and-the-bazaar-nearly-30-years-later_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;20 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/eric-raymond.jpg" alt="Eric S. Raymond" style="float: right; margin: 0 0 15px 20px; max-width: 240px; border-radius: 6px;" title="Eric S. Raymond, author of 'The Cathedral and the Bazaar.' Photo by jerone2, CC BY-SA 2.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;In 1997, Eric S. Raymond presented a paper at the Linux Kongress in Bavaria that would reshape how an entire industry thought about building software. "The Cathedral and the Bazaar" drew a sharp line between two models of development. The cathedral: careful, centralized, release-when-ready. The bazaar: open, decentralized, release-early-and-often. Raymond argued, with considerable evidence from the Linux kernel and his own fetchmail project, that the bazaar would win.&lt;/p&gt;
&lt;p&gt;Nearly three decades later, we can evaluate the claim. And the answer is more interesting than a simple yes or no.&lt;/p&gt;
&lt;h3&gt;What Raymond Actually Argued&lt;/h3&gt;
&lt;p&gt;The essay's core thesis was that certain software problems — particularly large, complex ones — were better solved by decentralized communities than by centralized teams. Raymond distilled this into several principles, the most famous being Linus's Law: "Given enough eyeballs, all bugs are shallow." With enough contributors examining source code, every bug would be obvious to someone.&lt;/p&gt;
&lt;p&gt;He identified several supporting dynamics. Release early and often. Treat your users as co-developers. If you treat beta testers as your most valuable resource, they'll respond by becoming your most valuable resource. Keep the architecture modular enough that contributors can work on pieces independently.&lt;/p&gt;
&lt;p&gt;The implicit assumption was ideological as much as technical. Open-source development would succeed because it aligned individual motivation (scratching a personal itch, building reputation, the intellectual satisfaction of solving problems) with collective benefit. No corporate hierarchy required. No cathedral architects directing the work from above.&lt;/p&gt;
&lt;p&gt;It was, in its way, a profoundly optimistic vision of human coordination.&lt;/p&gt;
&lt;p&gt;Here's what makes the timing remarkable: when Raymond presented his paper in 1997, the term "open source" didn't exist. He was writing about "free software" and the Linux development model. The phrase was coined months later, in February 1998, at a strategy session in Palo Alto — partly catalyzed by the essay's success and Netscape's decision to release the Navigator source code (a decision Raymond's essay directly influenced). The Open Source Initiative followed weeks after that, co-founded by Raymond and Bruce Perens.&lt;/p&gt;
&lt;p&gt;The movement Raymond was describing was young. The GNU Project was fourteen years old. The Free Software Foundation was twelve. The GPL was eight. Linux itself was only six. BSD had been circulating since the late 1970s, but the &lt;a href="https://tinycomputers.io/posts/how-bsds-licensing-issues-paved-the-way-for-linuxs-rise-to-prominence.html"&gt;legal battles&lt;/a&gt; that nearly killed it were barely resolved. There was no GitHub, no SourceForge, no standardized workflow for distributed contribution. The bazaar Raymond championed was a handful of mailing lists, FTP servers, and the sheer force of Linus Torvalds's integrative judgment.&lt;/p&gt;
&lt;p&gt;The essay didn't just describe a revolution. It named one that hadn't named itself yet.&lt;/p&gt;
&lt;h3&gt;The Bazaar Won&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/grand-bazaar-istanbul.jpg" alt="Interior of the Grand Bazaar, Istanbul" style="max-width: 100%; border-radius: 6px; margin-bottom: 15px;" title="The Grand Bazaar in Istanbul — one of the oldest and largest covered markets in the world. Photo by Slyronit, CC BY-SA 4.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;By any quantitative measure, Raymond was right. Linux runs the cloud. Android runs the phone. Firefox and then Chromium reshaped the browser. Apache and then Nginx served the web. PostgreSQL and MySQL handled the data. Python, Ruby, Node.js, Rust, Go — the languages that define modern development are overwhelmingly open-source.&lt;/p&gt;
&lt;p&gt;The numbers are staggering. GitHub hosts over 400 million repositories. The Linux kernel has received contributions from over 20,000 individual developers. Every major cloud provider — Amazon, Microsoft, Google — runs on open-source infrastructure. Even Microsoft, which once called Linux a "cancer," now contributes to it, acquired GitHub, and ships a Linux kernel inside Windows.&lt;/p&gt;
&lt;p&gt;If you'd told someone in 1997 that the world's most valuable companies would run their businesses on software they didn't own and couldn't fully control, they would have questioned your judgment. Raymond's prediction wasn't just right. It was conservative.&lt;/p&gt;
&lt;h3&gt;The Cathedral Came Back Wearing Bazaar Clothes&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/toledo-cathedral.jpg" alt="Interior of the Gothic Cathedral of Toledo, Spain" style="float: left; margin: 0 20px 15px 0; max-width: 300px; border-radius: 6px;" title="Interior of the Cathedral of Toledo, Spain. Photo by Adam Jones, CC BY-SA 3.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;Here's where Raymond's vision diverges from what actually happened. The bazaar won, but the cathedrals adapted.&lt;/p&gt;
&lt;p&gt;Meta open-sources PyTorch and Llama. Google open-sources TensorFlow, Kubernetes, Android, and Chromium. Microsoft open-sources VS Code, TypeScript, and .NET. Amazon builds its most profitable business on top of open-source databases, then offers them as managed services. These are not acts of ideological commitment. They are strategic decisions made by organizations with cathedral-scale resources and cathedral-scale ambitions.&lt;/p&gt;
&lt;p&gt;The pattern is consistent: open-source the layer you want to commoditize, then capture value at the layer above. Google open-sources Android to commoditize mobile operating systems, then captures value through the Play Store and advertising. Meta open-sources PyTorch to commoditize the AI framework layer, then captures value through the models and services built on top. Amazon doesn't need to own the database — it needs to own the infrastructure the database runs on.&lt;/p&gt;
&lt;p&gt;This is what Raymond didn't anticipate. The bazaar model wasn't just adopted by idealists scratching personal itches. It was weaponized by the most powerful corporations in history as a competitive strategy. The &lt;a href="https://tinycomputers.io/posts/how-bsds-licensing-issues-paved-the-way-for-linuxs-rise-to-prominence.html"&gt;BSD licensing disputes&lt;/a&gt; that shaped early open-source history look almost quaint compared to the strategic licensing wars that followed.&lt;/p&gt;
&lt;p&gt;There's a personal irony here too. Raymond himself wasn't immune to the cathedral's gravitational pull. He received 150,000 pre-IPO shares of VA Linux, briefly making him worth approximately $36 million. He wrote an essay called "Surprised by Wealth" about the experience, pledging that the money wouldn't change him. By April 2002, the shares were &lt;a href="https://workbench.cadenhead.org/news/3149/eric-s-raymond-bazaar-financial-advisor"&gt;worth $195,000&lt;/a&gt; — he'd held through the entire collapse without selling. The bazaar's chief evangelist got rich, briefly, through the most cathedral-scale financial mechanism in capitalism: Wall Street pre-IPO allocations. The wealth came and went through institutions the bazaar model was supposed to make irrelevant.&lt;/p&gt;
&lt;p&gt;Joel Spolsky described this dynamic in 2002 as "commoditize your complements." Open-source your competitors' profit center, and your own products become more valuable. But even Spolsky didn't fully see how far it would go. In 2026, the bazaar is less a revolutionary alternative to the cathedral than a resource the cathedral harvests.&lt;/p&gt;
&lt;h3&gt;The Efficiency That Created More, Not Less&lt;/h3&gt;
&lt;p&gt;Raymond's essay focused on the development model — how code gets written, reviewed, and shipped. What he didn't explore was the economic consequence of making infrastructure-quality software free.&lt;/p&gt;
&lt;p&gt;When the bazaar model succeeded, it didn't just change how software was built. It changed how much software existed. By making operating systems, web servers, databases, programming languages, and frameworks available at zero marginal cost, the bazaar removed the floor from the cost of building new things. A startup in 2005 could do what a well-funded company in 1995 could not, because the entire stack was free.&lt;/p&gt;
&lt;p&gt;The result wasn't less total development effort. It was dramatically more. Linux didn't consolidate the operating system landscape into one efficient platform — it spawned hundreds of distributions, each with its own community, its own design philosophy, its own ecosystem. Free databases didn't mean fewer databases. It meant PostgreSQL, MySQL, MariaDB, SQLite, MongoDB, Redis, CockroachDB, and dozens more, each serving demand that wouldn't have existed if everyone had to pay Oracle prices.&lt;/p&gt;
&lt;p&gt;This pattern — efficiency gains leading to expanded consumption rather than reduced effort — &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;has a name in economics&lt;/a&gt;, and it shows up everywhere technology reduces the cost of a critical input. The bazaar made software infrastructure cheap, and the world responded by building more software than anyone in 1997 could have imagined.&lt;/p&gt;
&lt;p&gt;There's a second-order effect too. By making infrastructure free, the bazaar lowered the cost of building &lt;em&gt;on top of&lt;/em&gt; that infrastructure. Entire industries — SaaS, cloud computing, the modern startup ecosystem — simply wouldn't have been viable if everyone had to pay cathedral-model prices for their stack. The &lt;a href="https://tinycomputers.io/posts/what-visicalc-teaches-us-about-ai.html"&gt;VisiCalc pattern&lt;/a&gt; repeated itself: a tool that was supposed to eliminate work created new categories of work that dwarfed the original.&lt;/p&gt;
&lt;p&gt;And Raymond's own principle — treat users as co-developers — is itself a demand-expanding dynamic. Converting consumers of software into producers means the resource (developer attention) gets deployed more broadly, not more efficiently. More people write code because more people &lt;em&gt;can&lt;/em&gt; write code, because the tools are free, the examples are public, and the barrier to participation is a GitHub account.&lt;/p&gt;
&lt;h3&gt;What Raymond Got Wrong&lt;/h3&gt;
&lt;p&gt;The essay's blind spots have become painfully clear.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Maintainer burnout.&lt;/strong&gt; Raymond assumed that contributor motivation was self-sustaining — people would keep showing up because the work was interesting. He didn't account for the dynamics that emerge when a hobby project becomes critical infrastructure. The OpenSSL library, maintained for years by a handful of volunteers, secured the majority of encrypted web traffic until the Heartbleed vulnerability in 2014 revealed how thin the maintenance layer really was. The left-pad incident, the core-js crisis, the Log4j vulnerability — each demonstrated that the bazaar's supply of labor is not inexhaustible. It concentrates on the exciting work and neglects the essential work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Free-riding at scale.&lt;/strong&gt; The essay assumed a rough symmetry between use and contribution. The reality is asymmetric: billions of dollars in commercial value extracted from projects maintained by unpaid or underpaid developers. Amazon took Elasticsearch and offered it as a managed service. When Elastic changed their license to prevent this, the open-source community split. MongoDB, Redis, and HashiCorp followed similar paths — companies that built open-source projects, watched cloud providers commoditize them, and responded by restricting their licenses.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Security supply chains.&lt;/strong&gt; A bazaar has no gatekeepers, which Raymond saw as a feature. It's also a vulnerability. The SolarWinds attack, dependency confusion attacks, typosquatting on npm — these exploit the trust model that makes the bazaar work. When anyone can contribute, anyone includes adversaries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance.&lt;/strong&gt; Raymond wrote about the bazaar as if the only governance question was technical: who decides which patches get merged? The real governance questions turned out to be social and economic: who funds maintenance? Who decides licensing changes? Who gets to use the work commercially? These questions have no bazaar-native answers. They require institutions — foundations, companies, legal frameworks — which is to say, they require cathedrals.&lt;/p&gt;
&lt;h3&gt;The Licensing Wars&lt;/h3&gt;
&lt;p&gt;The clearest evidence that Raymond's framework was incomplete is the licensing landscape of 2026.&lt;/p&gt;
&lt;p&gt;The GPL, which Richard Stallman designed to ensure that modified software remained free, worked well in a world where software was distributed as binaries. The cloud broke that model. If you run GPL software as a service, you never "distribute" it — users interact with the output, not the code. The software is free in theory and proprietary in practice.&lt;/p&gt;
&lt;p&gt;The response was a proliferation of new licenses. The AGPL closed the cloud loophole by requiring source availability for network services. The Business Source License (BSL) made code available to read but restricted commercial use until a time-delayed release to open source. The Server Side Public License (SSPL) required that anyone offering the software as a service must open-source their entire stack.&lt;/p&gt;
&lt;p&gt;Each of these represents a partial retreat from the bazaar model. Not back to the cathedral — the code is still visible, forkable, auditable — but to something Raymond didn't envision: a commons with fences. The ideological purity of "free as in freedom" collided with the economic reality that freedom without reciprocity becomes exploitation.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/how-bsds-licensing-issues-paved-the-way-for-linuxs-rise-to-prominence.html"&gt;BSD licensing story&lt;/a&gt; foreshadowed this. The permissive BSD license allowed commercial forks without contribution back. This wasn't a problem when the commercial ecosystem was small. When the commercial ecosystem became the entire cloud computing industry, the lack of reciprocity became untenable for projects that couldn't attract cathedral-scale corporate sponsorship.&lt;/p&gt;
&lt;h3&gt;What Raymond Got Right&lt;/h3&gt;
&lt;p&gt;Despite these blind spots, the essay's core insight has proven durable: for certain classes of problems, decentralized coordination outperforms centralized planning.&lt;/p&gt;
&lt;p&gt;This isn't because decentralized systems are morally superior. It's because they solve the information problem differently. A cathedral architect must understand the entire system well enough to direct work from above. A bazaar participant only needs to understand their local patch well enough to improve it. As systems grow in complexity, the information burden on the cathedral architect grows faster than the burden on any individual bazaar participant.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/cathedral-bazaar/linus-torvalds.jpg" alt="Linus Torvalds at LinuxCon Europe 2014" style="float: right; margin: 0 0 15px 20px; max-width: 260px; border-radius: 6px;" title="Linus Torvalds at LinuxCon Europe, 2014. Photo by Krd, CC BY-SA 4.0, via Wikimedia Commons."&gt;&lt;/p&gt;
&lt;p&gt;The Linux kernel is the proof. No single person understands the entire Linux kernel. It's too large, too complex, spanning too many hardware architectures and subsystems. But the kernel works — and works remarkably well — because the development model doesn't require any single person to understand it all. Subsystem maintainers understand their domains. Linus Torvalds understands the integration points. Contributors understand the specific problems they're solving. The architecture of the development process mirrors the architecture of the software.&lt;/p&gt;
&lt;p&gt;This insight extends beyond software. Wikipedia works on bazaar principles. Citizen science projects like Galaxy Zoo and Foldit leverage distributed human attention. Even hardware design is slowly moving in this direction, though the marginal cost of atoms versus bits &lt;a href="https://tinycomputers.io/posts/why-some-chips-last-40-years.html"&gt;creates structural barriers&lt;/a&gt; that software doesn't face. The concept of &lt;a href="https://tinycomputers.io/posts/why-some-chips-last-40-years.html"&gt;second-sourcing&lt;/a&gt; — multiple manufacturers producing compatible chips — is, in a sense, the hardware world's version of the bazaar. The Z80 survived for nearly fifty years partly because Zilog couldn't monopolize it.&lt;/p&gt;
&lt;p&gt;Raymond also got the motivational model roughly right, even if the details were off. People do contribute to open-source projects for intrinsic reasons — intellectual satisfaction, reputation, the desire to solve problems that matter to them personally. The mistake was assuming these motivations were sufficient at industrial scale, without institutional support.&lt;/p&gt;
&lt;h3&gt;The Bazaar in 2026&lt;/h3&gt;
&lt;p&gt;The open-source landscape of 2026 bears little resemblance to what Raymond described in 1997, but the dynamics he identified are still operating.&lt;/p&gt;
&lt;p&gt;The bazaar model made software infrastructure so cheap that it created more demand for software than any cathedral could have supplied. It enabled the cloud, the startup ecosystem, the AI revolution — all built on free foundations. The efficiency didn't reduce consumption. It unlocked latent demand that dwarfed the original market.&lt;/p&gt;
&lt;p&gt;At the same time, the cathedral never disappeared. It adapted. The most sophisticated cathedrals now build bazaars strategically — open-sourcing frameworks and tools that make their own proprietary services more valuable. Meta's contribution to PyTorch isn't charity. Google's contribution to Kubernetes isn't ideology. They're infrastructure investments that make the entire ecosystem dependent on capabilities only cathedral-scale organizations can provide.&lt;/p&gt;
&lt;p&gt;The result is a layered system more nuanced than Raymond's binary. At the bottom: genuine bazaar-model projects maintained by communities (the Linux kernel, PostgreSQL, countless libraries). In the middle: corporate-sponsored projects that look like bazaars but serve cathedral strategies (Kubernetes, Chromium, Llama). At the top: proprietary services built on open foundations (AWS, Google Cloud, OpenAI's API).&lt;/p&gt;
&lt;p&gt;Each layer depends on the ones below it. Each layer captures value differently. And the whole structure is held together by a web of licenses, foundations, corporate agreements, and social norms that Raymond's 1997 essay couldn't have anticipated.&lt;/p&gt;
&lt;p&gt;What's strangest about this arrangement is its circularity. Corporations adopted open source because it was free and good. Volunteer maintainers couldn't scale to meet corporate demand — Heartbleed and Log4j proved that. So corporations began funding open-source projects to keep their own infrastructure stable. But funding brought governance influence. The top Linux kernel contributors aren't hobbyists scratching personal itches. They're engineers employed by Google, Microsoft, Red Hat, Intel, and Huawei, steering the roadmap toward their employers' needs. Kubernetes evolves in ways that benefit Google Cloud. PyTorch evolves in ways that benefit Meta's AI stack.&lt;/p&gt;
&lt;p&gt;The projects became dependent on corporate funding. But the corporations became equally dependent on the projects. If Google pulled out of Kubernetes, the project would struggle. If Kubernetes collapsed, Google Cloud would struggle. So Google funds it more, which deepens the entanglement, which makes withdrawal more costly, which demands more funding. The snake eats its own tail.&lt;/p&gt;
&lt;p&gt;Google and Amazon compete ferociously in cloud computing, but they cooperate on the same open-source infrastructure that both their businesses require. They're rivals building on shared foundations that neither can afford to let fail and neither fully controls. The commons isn't independent anymore — but neither are the corporations.&lt;/p&gt;
&lt;p&gt;Raymond imagined the bazaar as freedom from institutional dependency. What emerged is mutual capture. The cathedral could fire its architects. The bazaar's corporate sponsors can't walk away from the bazaar, and the bazaar can't survive without them. Independence became entanglement — and the entanglement, paradoxically, is what makes the system work.&lt;/p&gt;
&lt;h3&gt;The Essay Worth Rereading&lt;/h3&gt;
&lt;p&gt;Raymond saw something real about how coordination works in networks. He was right that the bazaar model could produce software of extraordinary quality and scale. He was right that decentralized development could solve problems that centralized approaches couldn't. He was right that open-source would reshape the industry.&lt;/p&gt;
&lt;p&gt;He was wrong about the institutional vacuum. The bazaar didn't eliminate the need for cathedrals — it changed what cathedrals do. They no longer build the infrastructure. They build on top of it, around it, and through it. The most powerful technology companies in the world are cathedral organizations that have learned to cultivate bazaars for strategic advantage.&lt;/p&gt;
&lt;p&gt;"The Cathedral and the Bazaar" is worth rereading in 2026 not because it predicted the future correctly — no essay could, across three decades — but because it identified dynamics that, once set in motion, produced outcomes no one predicted. The bazaar made software free, and free software made more software. The cathedrals adapted, and their adaptation made the bazaar more important, not less. Raymond's binary became a symbiosis that neither model, alone, could have produced.&lt;/p&gt;
&lt;p&gt;The essay ends with Raymond quoting Robert Browning: "A man's reach should exceed his grasp, or what's a heaven for?" The reach exceeded. The grasp caught something different than expected. That's not a failure of vision. That's how ideas work when they meet reality.&lt;/p&gt;</description><category>corporate strategy</category><category>economics</category><category>eric raymond</category><category>free software</category><category>gpl</category><category>history</category><category>licensing</category><category>linux</category><category>open-source</category><category>software</category><guid>https://tinycomputers.io/posts/the-cathedral-and-the-bazaar-nearly-30-years-later.html</guid><pubDate>Mon, 09 Mar 2026 16:00:00 GMT</pubDate></item><item><title>Why Some Chips Last 40+ Years: Z80, 68k, 6502, and the Secret to Processor Longevity</title><link>https://tinycomputers.io/posts/why-some-chips-last-40-years.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;figure&gt;&lt;img src="https://tinycomputers.io/images/zilog-z80.jpg"&gt;&lt;/figure&gt; &lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/why-some-chips-last-40-years_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;22 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There's a Zilog Z80 in a graphing calculator sitting in a high school classroom right now. The student using it was born around 2008. The Z80 was designed in 1976. That processor is older than the student's parents.&lt;/p&gt;
&lt;p&gt;This isn't a quirky footnote. It's a pattern. The Z80, the Motorola 68000, the MOS Technology 6502, the Intel 8051 — these processors have been in continuous production and active deployment for forty years or more. The Z80 is closing in on fifty. Meanwhile, processors that were objectively superior by nearly every technical measure — the Zilog Z8000, the National Semiconductor 32016, the Motorola 88000, the Intel i960 — are footnotes in Wikipedia articles that nobody reads.&lt;/p&gt;
&lt;p&gt;What determines whether a processor lives for decades or dies in five years? I've spent the last two years building &lt;a href="https://tinycomputers.io/posts/clean-room-z80-emulator.html"&gt;Z80 emulators&lt;/a&gt;, writing &lt;a href="https://tinycomputers.io/posts/building-language-compilers-for-the-z80.html"&gt;compilers for the Z80&lt;/a&gt;, running &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;CP/M on physical RetroShield hardware&lt;/a&gt;, and exploring the &lt;a href="https://tinycomputers.io/posts/motorola-68000-processor-and-the-ti-89-graphing-calculator.html"&gt;Motorola 68000 through TI calculators&lt;/a&gt;. I've read William Barden's &lt;a href="https://tinycomputers.io/posts/the-z80-microcomputer-handbook-william-barden.html"&gt;1978 handbook&lt;/a&gt; that was still being reprinted in 1985, and Steve Ciarcia's &lt;a href="https://tinycomputers.io/posts/build-your-own-z80-computer-steve-ciarcia.html"&gt;build-your-own guide&lt;/a&gt; that assumed you'd wire up a computer from discrete chips. The deeper I've gone into this world, the more convinced I've become that processor longevity isn't really about the processor. It's about everything around it.&lt;/p&gt;
&lt;h3&gt;The Survivors&lt;/h3&gt;
&lt;p&gt;Four processors stand out for their extraordinary longevity. Each was introduced in the mid-to-late 1970s. Each is still manufactured or cloned today.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Zilog Z80&lt;/strong&gt; (1976) was designed by Federico Faggin and Masatoshi Shima, both of whom had worked on the Intel 4004 and 8080. The Z80 was explicitly designed as a better 8080 — backward-compatible with the 8080's instruction set but adding indexed addressing, a second register bank, a built-in DRAM refresh counter, and a single 5V power supply (the 8080 needed three voltage rails). It became the heart of CP/M machines, arcade cabinets, and eventually TI graphing calculators. Zilog's CMOS variant, the Z84C00, was manufactured continuously until &lt;a href="https://baud.rs/IboIHD"&gt;April 2024&lt;/a&gt;, when Littelfuse — Zilog's current owner — finally announced end-of-life after 48 years. The eZ80, a backward-compatible enhanced variant, continues in production, and third-party clones remain available. The Z80 instruction set isn't going anywhere even if the original silicon is.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The MOS Technology 6502&lt;/strong&gt; (1975) was designed by Chuck Peddle and Bill Mensch after they left Motorola. At $25 when competing processors cost $150–$300, the 6502 was a revolution in affordability. It powered the Apple II, the Commodore 64, the Atari 2600, and the NES. Bill Mensch's Western Design Center still manufactures the W65C02S and W65C816S today — fifty years after the original design.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/chip-longevity/hitachi-hd68000.jpg" alt="Hitachi HD68000 — a second-sourced clone of the Motorola MC68000" style="width: 340px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: right; margin: 0 0 20px 20px;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Motorola 68000&lt;/strong&gt; (1979) was the 32-bit processor that arrived a generation early. With a linear 24-bit address space and an orthogonal instruction set that programmers genuinely enjoyed using, it became the foundation for the original Macintosh, the Amiga, the Atari ST, the Sega Genesis, and Sun's first workstations. Its descendants — the 68020, 68030, 68040, ColdFire, and now NXP's modern variants — kept the architecture alive in embedded systems, automotive controllers, and &lt;a href="https://tinycomputers.io/posts/motorola-68000-processor-and-the-ti-89-graphing-calculator.html"&gt;Texas Instruments calculators&lt;/a&gt; well into the 2020s.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/chip-longevity/intel-p8051.jpg" alt="Intel P8051 microcontroller in DIP-40 package" style="width: 340px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: left; margin: 0 20px 20px 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Intel 8051&lt;/strong&gt; (1980) is perhaps the most quietly ubiquitous processor ever made. Designed as a microcontroller — a processor with RAM, ROM, timers, and I/O ports integrated on a single chip — the 8051 found its way into everything from washing machines to automotive engine controllers to industrial PLCs. Over two dozen companies have manufactured 8051 variants. If you've used an appliance, driven a car, or walked through a building with an elevator in the last forty years, you've interacted with an 8051 derivative.&lt;/p&gt;
&lt;p&gt;The 8051 is also a case study in &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt; applied to silicon. As more manufacturers licensed and produced the 8051, unit costs fell. As unit costs fell, engineers designed it into applications that would never have justified a microcontroller at the original price — a toaster, a thermostat, a toy. Each new application expanded the market, which attracted more manufacturers, which drove costs lower still. The cycle fed itself for decades. Technically superior alternatives existed at every point along this curve, but they couldn't compete with an architecture whose ecosystem was compounding while their price-per-unit was still on the wrong side of the volume curve.&lt;/p&gt;
&lt;h3&gt;The Fallen&lt;/h3&gt;
&lt;p&gt;For every processor that lasted decades, dozens vanished. Some of these were technically impressive — arguably more capable than the survivors.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Zilog Z8000&lt;/strong&gt; (1979), designed as the Z80's successor, offered a 16-bit architecture with segmented memory addressing. It was more powerful than the Z80 in every measurable way. It lasted roughly five years in the market before fading into obscurity. The segmented memory model — the same curse that plagued Intel's 8086/286 — made programming painful. And critically, it wasn't backward-compatible with the Z80. Every Z80 program, every CP/M application, every line of existing code was useless on the Z8000. Zilog was asking customers to abandon their entire software investment.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Motorola 88000&lt;/strong&gt; (1988) was Motorola's clean-sheet RISC design, intended to eventually replace the 68k family. It was technically excellent — pipelined, superscalar-capable, and well-designed. Motorola couldn't sell it. Customers had millions of lines of 68k code, working products, trained engineers, and proven toolchains. The 88000 offered better performance but required abandoning everything. Motorola eventually surrendered and joined IBM and Apple to create the PowerPC, which at least had the marketing muscle of three companies behind it.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;National Semiconductor 32016&lt;/strong&gt; (1982) was a full 32-bit processor at a time when the PC world was still on 16-bit. It was used in the Acorn Cambridge Workstation and a few other systems. It had bugs. The early silicon had errata that made reliable system design difficult. By the time National got the bugs out, the market had moved on.&lt;/p&gt;
&lt;p&gt;The pattern is consistent: technical superiority alone doesn't determine survival.&lt;/p&gt;
&lt;h3&gt;Five Factors That Determine Processor Longevity&lt;/h3&gt;
&lt;p&gt;After spending years in this world, I've identified five factors that separate the survivors from the fallen. They're listed roughly in order of importance — which is not the order most engineers would expect.&lt;/p&gt;
&lt;h4&gt;1. Second-Sourcing and Licensing&lt;/h4&gt;
&lt;p&gt;This is the single most important factor, and it's the one that engineers consistently underrate because it's a business decision, not a technical one.&lt;/p&gt;
&lt;p&gt;The Z80 was second-sourced by Mostek, SGS-Thomson, Sharp, NEC, Toshiba, Samsung, and others. When &lt;a href="https://www.littelfuse.com/"&gt;Littelfuse&lt;/a&gt;, the current owner of Zilog, finally discontinued the standalone Z84C00 in 2024, the instruction set didn't die, because it had never depended on a single manufacturer. That is precisely the failure mode second-sourcing exists to prevent. It mattered enormously to design engineers in the 1980s and 1990s, because committing a product design to a single-source processor was career-threatening. If your sole supplier had a fab fire, or went out of business, or simply decided to discontinue the chip, your product was dead.&lt;/p&gt;
&lt;p&gt;The 6502 was licensed to multiple manufacturers — Rockwell, Synertek, GTE, and later CMD and the Western Design Center. The 8051 took this to its logical extreme: Intel actively encouraged licensing, and the architecture was eventually manufactured by Atmel, Philips/NXP, Silicon Labs, Dallas/Maxim, Infineon, and dozens more. The 8051 became less a product and more a standard — an instruction set architecture that any competent semiconductor company could implement. It was, in hindsight, a preview of the model that ARM and RISC-V would later formalize: sell the design, not the chip, and let the ecosystem do the rest.&lt;/p&gt;
&lt;p&gt;The 68000 family was produced by Motorola, Hitachi, Signetics, Mostek, and Toshiba. Later, the ColdFire and subsequent architectures maintained enough compatibility to keep the ecosystem alive under Freescale and then NXP.&lt;/p&gt;
&lt;p&gt;The x86 architecture tells the same story at a larger scale. IBM refused to use Intel's 8088 in the original PC without a second source. That requirement forced Intel to license the design to AMD — a decision Intel spent the next four decades regretting and litigating. But the resulting duopoly is a major reason x86 survived the RISC revolution of the 1990s. When Sun, SGI, and DEC were pushing SPARC, MIPS, and Alpha, customers considering a switch to RISC had to weigh superior performance against the uncomfortable fact that each RISC architecture had exactly one supplier. x86 had two. That mattered more than clock speeds.&lt;/p&gt;
&lt;p&gt;Contrast all of this with the Z8000, which was essentially Zilog-only. Or the 88000, which was Motorola-only. Single-source processors carry existential risk for every product that uses them. Purchasing managers know this even when engineers don't.&lt;/p&gt;
&lt;h4&gt;2. Ecosystem and Toolchain Maturity&lt;/h4&gt;
&lt;p&gt;A processor without a mature toolchain is a science project. A processor with assemblers, compilers, debuggers, reference designs, application notes, textbooks, and a community of experienced engineers is an ecosystem.&lt;/p&gt;
&lt;p&gt;The Z80 ecosystem by the mid-1980s was staggering. There were books — &lt;a href="https://baud.rs/EZ3Bwg"&gt;Rodnay Zaks' &lt;em&gt;Programming the Z80&lt;/em&gt;&lt;/a&gt;, Barden's &lt;a href="https://baud.rs/5brWaW"&gt;handbook&lt;/a&gt;, Ciarcia's &lt;a href="https://baud.rs/kiLcPY"&gt;build guide&lt;/a&gt;, Coffron's &lt;a href="https://baud.rs/3hw1CF"&gt;applications manual&lt;/a&gt; — available at any technical bookstore. There were assemblers, C compilers, BASIC interpreters, and Forth systems. There were thousands of CP/M applications. There were magazines publishing Z80 projects monthly. There were university courses teaching Z80 assembly. Every year, this ecosystem grew, and every year, the cost of switching to a different processor increased.&lt;/p&gt;
&lt;p&gt;The 6502 had a similar ecosystem, driven heavily by the Apple II and Commodore 64 communities. The 8051 accumulated the largest ecosystem of any microcontroller family, with Keil (now ARM), IAR, SDCC, and many other toolchains providing development environments across every host platform.&lt;/p&gt;
&lt;p&gt;When I wrote about &lt;a href="https://tinycomputers.io/posts/how-we-learned-hardware-in-1983.html"&gt;how we learned hardware in 1983&lt;/a&gt;, I was documenting a snapshot of this ecosystem at its peak. Those books, those reference designs, those shared conventions — they weren't just educational resources. They were infrastructure. And infrastructure, once built, resists replacement.&lt;/p&gt;
&lt;h4&gt;3. ISA Simplicity and Predictability&lt;/h4&gt;
&lt;p&gt;There's a counterintuitive truth about instruction set architecture: the "best" ISA often isn't the one that survives. The one that survives is the one that's simple enough to implement cheaply, predictable enough to verify thoroughly, and small enough to teach in a semester.&lt;/p&gt;
&lt;p&gt;The Z80's instruction set is large by 8-bit standards — 158 base instructions with variants pushing toward 700 when you count all the addressing modes. But the fundamental execution model is simple: fetch an instruction, decode it, execute it. No pipeline. No branch prediction. No speculative execution. No out-of-order dispatch. The behavior is deterministic. If you clock the Z80 at 4 MHz, you can calculate exactly how many T-states each instruction takes and predict your program's execution time down to the microsecond.&lt;/p&gt;
&lt;p&gt;This determinism is extraordinarily valuable in embedded systems. When you're designing an engine controller or a medical device, you need to know — not estimate, &lt;em&gt;know&lt;/em&gt; — that your interrupt handler will complete within a specific time window. Pipelined processors with branch prediction make this analysis much harder. Simple processors make it trivial.&lt;/p&gt;
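&lt;p&gt;To make that concrete, here's the whole analysis for a classic Z80 delay loop, done in a few lines of Python. The T-state counts (7 for &lt;code&gt;LD B,n&lt;/code&gt;, 13/8 for &lt;code&gt;DJNZ&lt;/code&gt; taken/not taken) are the standard figures from the Zilog manual; the loop itself is mine, chosen for illustration:&lt;/p&gt;

```python
# Cycle-exact timing of a Z80 delay loop:
#       LD B,n       ; 7 T-states
# loop: DJNZ loop    ; 13 T-states when taken, 8 on the final pass
# At 4 MHz one T-state is exactly 250 ns, so timing is plain arithmetic.

CLOCK_HZ = 4_000_000
T_STATE_NS = 1e9 / CLOCK_HZ           # 250 ns per T-state

def delay_loop_t_states(b):
    """Total T-states for LD B,b followed by a DJNZ spin loop."""
    return 7 + 13 * (b - 1) + 8

t = delay_loop_t_states(100)
print(t)                              # 1302 T-states
print(t * T_STATE_NS / 1000)          # 325.5 microseconds, exactly
```

&lt;p&gt;No pipeline state, no cache, no branch predictor: the answer is exact, every single run.&lt;/p&gt;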
&lt;p&gt;The 6502 takes this even further. With only 56 instructions and 13 addressing modes, the entire ISA fits on a single reference card. You can hold the complete instruction set in your head. This isn't a limitation — it's a feature. Engineers who can reason about every instruction their processor executes build more reliable systems than engineers who rely on abstractions they don't fully understand.&lt;/p&gt;
&lt;p&gt;The 8051 instruction set is similarly compact: 111 instructions, most executing in one or two machine cycles. The architecture includes bit-addressable memory — a feature that seems quirky until you're writing firmware for a device with dozens of individual control signals, at which point it becomes indispensable.&lt;/p&gt;
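&lt;p&gt;The mapping behind that feature is simple enough to model. On the 8051, bit addresses 00H–7FH cover internal RAM bytes 20H–2FH: 16 bytes giving 128 individually addressable bits. A small Python sketch of the address arithmetic:&lt;/p&gt;

```python
# 8051 bit addressing: bit addresses 0x00-0x7F map onto internal RAM
# bytes 0x20-0x2F (16 bytes x 8 bits = 128 addressable bits).

def bit_address(bit_addr):
    """Return (byte_address, bit_position) for an 8051 bit address."""
    assert bit_addr in range(0x80), "only 0x00-0x7F map to RAM bits"
    return 0x20 + bit_addr // 8, bit_addr % 8

print(bit_address(0x00))   # (32, 0): byte 0x20, bit 0
print(bit_address(0x7F))   # (47, 7): byte 0x2F, bit 7
```

&lt;p&gt;In firmware, instructions like &lt;code&gt;SETB&lt;/code&gt; and &lt;code&gt;CLR&lt;/code&gt; operate on those bit addresses directly: one control signal per bit, no read-modify-write masking required.&lt;/p&gt;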
&lt;h4&gt;4. Power, Size, and Cost&lt;/h4&gt;
&lt;p&gt;The survivors share a common economic profile: they're cheap to manufacture, cheap to buy, and cheap to power.&lt;/p&gt;
&lt;p&gt;A Z84C00 in CMOS draws microwatts in standby. A W65C02S runs on a coin cell battery for years. An 8051 derivative can be manufactured on mature process nodes that were paid off decades ago, with die sizes so small that the packaging costs more than the silicon. When your processor costs $0.50 in volume and runs on the leakage current of a lithium cell, the engineering case for replacing it with something faster but more expensive becomes very hard to make.&lt;/p&gt;
&lt;p&gt;This is where processor longevity intersects with the economics I've written about in the &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox series&lt;/a&gt;. The relevant cost isn't just the chip — it's the total cost of the design: the processor, the toolchain, the engineering time, the qualification testing, the regulatory certification, and the opportunity cost of a redesign. A $0.50 Z80 clone in a proven design with ten years of field data is almost impossible to displace, even if a $0.30 ARM Cortex-M0 is technically superior, because the redesign and requalification costs dwarf the per-unit savings.&lt;/p&gt;
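&lt;p&gt;A back-of-the-envelope break-even calculation shows why. The unit prices below are the ones from the paragraph above; the $250,000 redesign-and-requalification figure is purely an assumption for illustration:&lt;/p&gt;

```python
# Break-even volume for swapping a proven $0.50 part for a $0.30 one.
# The redesign cost is an illustrative assumption, not real project data.

old_unit_cost = 0.50       # proven Z80-clone design
new_unit_cost = 0.30       # hypothetical Cortex-M0 replacement
redesign_cost = 250_000    # assumed engineering + requalification cost

saving_per_unit = old_unit_cost - new_unit_cost
break_even_units = redesign_cost / saving_per_unit
print(f"{break_even_units:,.0f} units")    # 1,250,000 units to break even
```

&lt;p&gt;Ship fewer than a million and a quarter units and the "inferior" chip is the rational choice.&lt;/p&gt;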
&lt;h4&gt;5. Inertia and Institutional Knowledge&lt;/h4&gt;
&lt;p&gt;The final factor is the hardest to quantify and the most powerful: institutional inertia.&lt;/p&gt;
&lt;p&gt;Somewhere in Germany, there's a factory running a production line controlled by Z80-based PLCs installed in 1988. The line produces automotive components. It runs 24/7. It works. The engineer who designed the control system retired fifteen years ago. The firmware was written in Z80 assembly and documented in a binder that lives in a filing cabinet near the line.&lt;/p&gt;
&lt;p&gt;Replacing this system would require: reverse-engineering the existing firmware (the original source code may or may not still exist), designing a new control system, writing new firmware, testing it against every production scenario the old system handles, qualifying the new system for automotive safety standards, scheduling downtime for installation, and training operators on the new system. The cost runs into hundreds of thousands of dollars. The risk is non-trivial — any bug could halt production.&lt;/p&gt;
&lt;p&gt;So they order more Z80s. And the Z80 stays in production for another year.&lt;/p&gt;
&lt;p&gt;Multiply this scenario by thousands of factories, millions of installed devices, and billions of lines of proven firmware, and you begin to understand why some processors simply cannot die. The cost of replacing them exceeds the cost of maintaining them, indefinitely.&lt;/p&gt;
&lt;p&gt;This is also why the &lt;a href="https://tinycomputers.io/posts/exploring-ti-84%2B.html"&gt;TI-84+ still uses a Z80&lt;/a&gt;. Texas Instruments has decades of TI-BASIC software, decades of teacher training materials, decades of standardized test approvals, and a user base that expects backward compatibility with programs written in 2004. The Z80 isn't the best processor for a modern calculator. But replacing it would require replacing &lt;em&gt;everything else&lt;/em&gt;, and "everything else" is where the real value lives.&lt;/p&gt;
&lt;h3&gt;The Newcomen Pattern&lt;/h3&gt;
&lt;p&gt;There's a historical analogy I keep returning to. Thomas Newcomen built his atmospheric steam engine in 1712. It was inefficient — converting roughly 1% of the heat energy in coal into useful work. James Watt's improved design, introduced in the 1760s, was dramatically better: separate condenser, double-acting cylinder, and eventually five times the thermal efficiency. By any rational engineering measure, the Newcomen engine should have vanished overnight.&lt;/p&gt;
&lt;p&gt;It didn't. Newcomen engines continued to be built and operated for decades after Watt's design was available. In some mining operations, they remained in service into the 19th century. The reasons were the same ones that keep Z80s in factories today: the existing engines worked, the operators knew how to maintain them, the replacement cost was high, and the performance of the old engine was &lt;em&gt;adequate&lt;/em&gt; for the task.&lt;/p&gt;
&lt;p&gt;"Adequate for the task" is the phrase that explains processor longevity better than any technical specification. The Z80 is adequate for a graphing calculator. The 6502 is adequate for a simple embedded controller. The 8051 is adequate for a washing machine. And "adequate" plus "proven" plus "cheap" plus "available from multiple sources" is a combination that "superior but new and unfamiliar" almost never beats.&lt;/p&gt;
&lt;h3&gt;The Numbers Tell the Story&lt;/h3&gt;
&lt;p&gt;It's worth pausing to appreciate the sheer scale of the survivors' deployment.&lt;/p&gt;
&lt;p&gt;The 8051 family has been manufactured in quantities estimated at over 10 billion units. That's not a typo. Ten billion. More 8051 derivatives have been produced than any other processor architecture in history, including x86. They're in your car — a modern automobile contains dozens of microcontrollers, many of them 8051 variants, handling everything from window controls to tire pressure monitoring. They're in your thermostat, your microwave, your garage door opener.&lt;/p&gt;
&lt;p&gt;The Z80 and its clones have shipped in quantities that are harder to pin down precisely, but conservative estimates exceed a billion units across all manufacturers and derivatives. The 6502 family, counting all variants from the original through the 65C816 that powered the Apple IIGS and the Super Nintendo, is in a similar range.&lt;/p&gt;
&lt;p&gt;The 68000 family took a different path — fewer total units but higher-value applications. Where the 8051 went wide and cheap, the 68k went deep and capable. It dominated the workstation market before RISC architectures displaced it, then settled into a long career in automotive and industrial control. NXP's ColdFire and subsequent QorIQ Layerscape processors carry DNA that traces back to the original 68000. The architecture didn't die; it evolved.&lt;/p&gt;
&lt;p&gt;What's remarkable about these numbers is that they &lt;em&gt;continue to grow&lt;/em&gt;. These aren't static installed bases slowly decaying as old equipment is retired. New products are still being designed with 8051 cores. New Z80-compatible processors are still being fabricated — even after Littelfuse discontinued the original Z84C00 in 2024, third-party clones and the eZ80 keep the instruction set alive. When I built a &lt;a href="https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html"&gt;dual Z80 RetroShield&lt;/a&gt;, I ordered Z84C0020PEC chips that were still in stock from the final production runs. A 1976 design, manufactured nearly half a century later. And the fact that Zilog's discontinuation made international headlines tells you everything about how deeply embedded these chips remain — you don't mourn a processor nobody uses.&lt;/p&gt;
&lt;h3&gt;What This Means for Modern Processors&lt;/h3&gt;
&lt;p&gt;The ARM Cortex-M0, introduced in 2009, is arguably the first modern processor that has a plausible shot at matching the longevity of the 8-bit survivors. It's licensable (like the 8051), simple (like the 6502), power-efficient (like the Z84C00), and backed by an ecosystem that's growing rapidly. ARM's licensing model — selling the design, not the chip — mirrors the model that made the 8051 ubiquitous.&lt;/p&gt;
&lt;p&gt;RISC-V, as an open ISA, goes even further. No licensing fees, no single company that can discontinue the architecture, no vendor lock-in. I've &lt;a href="https://tinycomputers.io/posts/milk-v-mars-review.html"&gt;reviewed RISC-V boards&lt;/a&gt; and watched the ecosystem grow. If any modern ISA is positioned to last fifty years, it's RISC-V — not because it's the best architecture, but because it's the hardest to kill.&lt;/p&gt;
&lt;p&gt;But here's the uncomfortable truth for anyone designing a new processor architecture: the window for establishing a forty-year processor is probably closed. The Z80, 6502, 68000, and 8051 all emerged during a period when the microprocessor market was being established. There were no entrenched incumbents. Every design win was greenfield. Every new application — calculators, arcade cabinets, industrial controllers, medical devices — was being designed for the first time with microprocessors.&lt;/p&gt;
&lt;p&gt;That era is over. Every new design now competes against an installed base. Every new ISA competes against ARM's ecosystem. The switching costs that keep forty-year-old processors alive are the same switching costs that prevent new architectures from gaining traction. The moat works in both directions.&lt;/p&gt;
&lt;h3&gt;The Lesson&lt;/h3&gt;
&lt;p&gt;The processors that last aren't the ones that push the performance envelope. They're the ones that solve a problem well enough, cheaply enough, reliably enough, and from enough sources that replacing them is never worth the trouble. Technical excellence is necessary but not sufficient. What matters more is the web of dependencies — the toolchains, the trained engineers, the certified designs, the proven firmware, the institutional knowledge — that accumulates around a processor over decades.&lt;/p&gt;
&lt;p&gt;The Z80 will outlive many of the engineers reading this, not because it's a great processor, but because it's woven into the fabric of systems that nobody has a compelling reason to redesign. The 8051 will outlive the Z80, because it's woven into even more systems. And somewhere in a high school classroom, a student is pressing buttons on a &lt;a href="https://tinycomputers.io/posts/exploring-ti-84%2B.html"&gt;TI-84+&lt;/a&gt; that runs on a fifty-year-old instruction set, completely unaware that the chip executing their quadratic formula has been doing this job since before their grandparents started dating.&lt;/p&gt;
&lt;p&gt;That's longevity. Not the kind you engineer. The kind that happens when everything around the chip conspires to keep it in place.&lt;/p&gt;
&lt;div style="margin-top: 3em; padding-top: 1em; border-top: 1px solid #ccc; font-size: 0.85em; color: #666;"&gt;
&lt;strong&gt;Image credits:&lt;/strong&gt; Hitachi HD68000 and Intel P8051 photographs by Konstantin Lanzet, via &lt;a href="https://commons.wikimedia.org/wiki/File:KL_Hitachi_HD68000.jpg"&gt;Wikimedia Commons&lt;/a&gt;. Licensed under GFDL and CC BY-SA 3.0 respectively.
&lt;/div&gt;</description><category>6502</category><category>8051</category><category>68000</category><category>embedded systems</category><category>isa</category><category>microprocessors</category><category>mos technology</category><category>motorola</category><category>processor architecture</category><category>retrocomputing</category><category>second-sourcing</category><category>z80</category><category>zilog</category><guid>https://tinycomputers.io/posts/why-some-chips-last-40-years.html</guid><pubDate>Sun, 08 Mar 2026 16:00:00 GMT</pubDate></item><item><title>Designing a Dual Z80 RetroShield: Two CPUs, One Bus, Zero GUI (Part 1)</title><link>https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/designing-a-dual-z80-retroshield-part-1_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;19 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/dual-z80/zilog-scc-dip40.jpeg" alt="A Zilog Z0853006PSC SCC chip in a DIP-40 package, marked with the Zilog logo and a 1981 copyright date" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;The RetroShield Z80 by Erturk Kocalar at &lt;a href="https://baud.rs/87wbBL"&gt;8bitforce.com&lt;/a&gt; is one of my favorite pieces of hardware. A real Zilog Z80 CPU on a shield that plugs into an Arduino Mega. The Arduino emulates memory and I/O while the Z80 executes real instructions on real silicon. I've used it to &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;boot CP/M&lt;/a&gt;, &lt;a href="https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html"&gt;play Zork over WiFi&lt;/a&gt;, &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;port it to the Arduino Giga R1&lt;/a&gt;, and even &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;commission a custom level-converter shield&lt;/a&gt; to bridge the voltage gap.&lt;/p&gt;
&lt;p&gt;But a single Z80 is, well, a single Z80. Real multi-processor Z80 systems existed in the 1980s — machines like the &lt;a href="https://baud.rs/tTpLxt"&gt;Cromemco System Three&lt;/a&gt; and some S-100 configurations ran multiple Z80s on a shared bus, with bus arbitration mediating access. The question that kept nagging at me: could I fit a second Z80 onto the RetroShield?&lt;/p&gt;
&lt;p&gt;I should be honest about something: PCB design is one of the areas of computing where I'm least knowledgeable. I'm comfortable with firmware, with compilers, with operating systems — but the physical layer, the world of copper traces and drill files and design rule checks, is territory I've mostly avoided. I can read a schematic, but I've never designed a board from scratch. What I wanted to find out was whether modern AI tools could bridge that gap — whether I could use AI to help me understand, alter, and extend &lt;a href="https://baud.rs/87wbBL"&gt;Erturk Kocalar's&lt;/a&gt; existing RetroShield design into something new without becoming a PCB design expert first.&lt;/p&gt;
&lt;p&gt;This is part one of a two-part series. This piece covers the design: architecture decisions, schematic work, PCB layout, autorouting, and Gerber generation. Part two will cover the physical boards arriving from the fab, assembly, bring-up, and the firmware that makes two Z80s cooperate.&lt;/p&gt;
&lt;p&gt;One more thing worth mentioning up front: every step of this design was done without a GUI. That was intentional. I wanted to see how far I could get with just a terminal, command-line EDA tools, AI assistance, and Python scripts that modify PCB files directly. Partly because I think text-based workflows compose better with AI — it's much easier for an AI to generate a Python script that manipulates a text-based PCB file than to drive a graphical EDA tool. And partly because I wanted the entire process to be reproducible and scriptable, not trapped in a series of mouse clicks I'd never remember.&lt;/p&gt;
&lt;h3&gt;The Original Design&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://gitlab.com/8bitforce/retroshield-hw/-/tree/master/hardware/kz80?ref_type=heads"&gt;stock RetroShield Z80&lt;/a&gt; is a clean, simple board. A 55.88mm × 53.34mm two-layer PCB carrying:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;U1&lt;/strong&gt; — A &lt;a href="https://baud.rs/FUCwFg"&gt;Z80 CPU&lt;/a&gt; in a DIP-40 package&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;J1&lt;/strong&gt; — A 2×18 pin header (36 pins) that plugs into &lt;a href="https://baud.rs/CWPoOM"&gt;Arduino Mega 2560&lt;/a&gt; pins 22–53&lt;/li&gt;
&lt;li&gt;A handful of passives: decoupling caps (C1, C2), a clock cap (C3), a clock series resistor (R1), an LED current-limiting resistor (R3), and a bus activity LED&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The J1 header carries everything the Z80 needs: 16 address lines (A0–A15), 8 data lines (D0–D7), and control signals (CLK, RESET, INT, NMI, MREQ, IORQ, RD, WR). The Arduino drives the clock, provides the data when the Z80 reads, captures the data when the Z80 writes, and emulates whatever memory and I/O map you define in firmware. It's elegant in its simplicity — the Z80 thinks it's talking to a real computer, and in a sense, it is.&lt;/p&gt;
&lt;p&gt;The schematic and PCB files use the gEDA format — text-based files that are human-readable and, crucially, scriptable. The schematic (&lt;code&gt;.sch&lt;/code&gt;) defines the logical connections. The PCB (&lt;code&gt;.pcb&lt;/code&gt;) defines the physical layout: component footprints, copper traces, vias, and board outline. Both are just text. This matters a lot for what comes next.&lt;/p&gt;
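&lt;p&gt;That scriptability is worth a quick illustration. The sketch below is hypothetical and heavily simplified (real &lt;code&gt;Element[...]&lt;/code&gt; records carry footprint geometry, pads, and flags), but it shows the kind of text-level manipulation a &lt;code&gt;.pcb&lt;/code&gt; file invites:&lt;/p&gt;

```python
# Hypothetical sketch: cloning a component in a gEDA-style .pcb file.
# Real Element[...] records are far richer; this only illustrates the
# point that the file is plain text a script can rewrite.

pcb_text = '''Element["" "DIP-40" "U1" "Z80" 10000 10000]
Element["" "0805" "C1" "100n" 20000 20000]'''

def clone_refdes(text, old, new):
    """Duplicate every element whose refdes is `old`, renamed to `new`."""
    out = []
    for line in text.splitlines():
        out.append(line)
        if f'"{old}"' in line:
            out.append(line.replace(f'"{old}"', f'"{new}"'))
    return "\n".join(out)

print(clone_refdes(pcb_text, "U1", "U2"))
```

&lt;p&gt;Try doing that with a binary file format and a mouse.&lt;/p&gt;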
&lt;h3&gt;Why Two Z80s?&lt;/h3&gt;
&lt;p&gt;The honest answer is that I wanted to see if it could be done. But there are genuinely interesting things you can do with two processors sharing a bus:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Asymmetric multiprocessing.&lt;/strong&gt; One Z80 runs CP/M as the primary CPU. The second handles I/O — serial communication, disk access, network operations — freeing the primary CPU from waiting on slow peripherals. This mirrors how some S-100 systems used coprocessor boards.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cooperative multitasking.&lt;/strong&gt; Both CPUs execute independent programs, taking turns on the shared bus. The Arduino arbitrates access using the Z80's built-in BUSRQ/BUSACK mechanism — a hardware handshake designed exactly for this purpose. One CPU gets the bus, executes for a while, then yields so the other can run.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Debugging and instrumentation.&lt;/strong&gt; The second CPU can monitor the first. Watch the address bus to trace execution. Compare outputs. Run the same code on both CPUs and verify they produce identical results — useful for testing Z80 clones or FPGA implementations against real silicon.&lt;/p&gt;
&lt;p&gt;The Z80 was designed for multiprocessor operation. As Rodnay Zaks details in &lt;a href="https://baud.rs/IvCPVA"&gt;&lt;em&gt;Programming the Z80&lt;/em&gt;&lt;/a&gt;, it has dedicated bus request (BUSRQ) and bus acknowledge (BUSAK) pins specifically for multi-master bus sharing. Steve Ciarcia's &lt;a href="https://baud.rs/eLG5hK"&gt;&lt;em&gt;Build Your Own Z80 Computer&lt;/em&gt;&lt;/a&gt; covers the hardware side of these signals in practical detail. Most hobbyist projects never use them. This one does.&lt;/p&gt;
&lt;h3&gt;Architecture: Shared Bus with Independent Control&lt;/h3&gt;
&lt;p&gt;The first design I considered — and quickly rejected — gave each Z80 its own independent header. Two 36-pin headers, two complete sets of address, data, and control lines. This would have worked electrically, but it was wrong for several reasons. It would have required either two Arduino Megas or consumed all the I/O on one Mega with nothing left for bus arbitration. The board would have been enormous. And it wouldn't have reflected how real multi-processor Z80 systems actually worked.&lt;/p&gt;
&lt;p&gt;The right approach is a shared bus. Both Z80s connect to the same address and data lines through J1. They take turns driving the bus, just like in a real S-100 system. What each CPU needs independently is its own set of control signals — its own clock, its own reset, its own interrupt lines, and its own bus request/acknowledge pair.&lt;/p&gt;
&lt;p&gt;I checked the Arduino Mega's pin budget. J1 uses pins 22–53 (32 I/O pins). The Mega still has pins 2–21 (20 pins) plus analog pins A0–A15 (16 more, usable as digital I/O) — 36 pins sitting idle. A second CPU's control signals only need about 10 pins. There was plenty of room.&lt;/p&gt;
&lt;p&gt;The solution: a small supplementary 2×6 header (J2, 12 pins) carrying CPU2's independent control signals to the Arduino's remaining pins:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Pin 1:  +5V         Pin 2:  GND
Pin 3:  CLK_2       Pin 4:  RESET_2
Pin 5:  INT_2       Pin 6:  NMI_2
Pin 7:  MREQ_2      Pin 8:  IORQ_2
Pin 9:  RD_2        Pin 10: WR_2
Pin 11: BUSRQ_2     Pin 12: BUSAK_2
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;BUSRQ and BUSAK are the key pins. The Arduino firmware pulls BUSRQ low on whichever CPU should yield the bus. That CPU finishes its current machine cycle, tristates its outputs, and asserts BUSAK to signal it's off the bus. The other CPU can then drive the bus freely. It's the same mechanism Zilog designed in 1976 — I'm just finally using it.&lt;/p&gt;
&lt;h3&gt;Building the Schematic — Without a Schematic Editor&lt;/h3&gt;
&lt;p&gt;The original project used classic gEDA tools (gschem, pcb), which are no longer packaged for Ubuntu 24.04. The modern replacement is lepton-eda, a maintained fork that reads the same file formats. But since the whole point was to avoid a GUI, even lepton-schematic's graphical mode was off the table.&lt;/p&gt;
&lt;p&gt;This is where AI earned its keep. I don't have the gEDA file format memorized — I've never needed to. But AI can work through the format specification and generate correct output. I described what I wanted (a second Z80 sharing the existing bus, with independent control signals on a new header), and the AI helped me produce the schematic files, the symbol definitions, and eventually the PCB modifications. I still had to understand the architecture and make the design decisions, but the AI handled the translation from intent to file format.&lt;/p&gt;
&lt;p&gt;gEDA schematic files are text. A component placement looks like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;C 44300 47700 1 0 0 z80-1.sym
{
T 44400 59000 5 10 1 1 0 0 1
refdes=U2
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That's a Z80 symbol placed at coordinates (44300, 47700), with reference designator U2. Net connections are similarly textual — &lt;code&gt;N&lt;/code&gt; entries define wire segments, &lt;code&gt;U&lt;/code&gt; entries define bus rippers. You can write an entire schematic in a text editor if you understand the coordinate system.&lt;/p&gt;
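&lt;p&gt;Because the format is line-oriented text, extracting placements is straightforward. A minimal sketch, with the field layout taken from the gEDA file-format documentation — &lt;code&gt;list_components&lt;/code&gt; is a hypothetical helper, not a script from this project:&lt;/p&gt;

```python
import re

# Pull component placements out of a gEDA/lepton-eda schematic.
# "C x y selectable angle mirror basename" lines place symbols; the
# {...} block that follows holds attributes such as refdes=U2.
def list_components(sch_text):
    components = []
    current = None
    for line in sch_text.splitlines():
        m = re.match(r"C (\d+) (\d+) \d+ \d+ \d+ (\S+)", line)
        if m:
            current = {"x": int(m.group(1)), "y": int(m.group(2)),
                       "symbol": m.group(3), "refdes": None}
            components.append(current)
        elif line.startswith("refdes=") and current is not None:
            current["refdes"] = line.split("=", 1)[1]
    return components

sch = """C 44300 47700 1 0 0 z80-1.sym
{
T 44400 59000 5 10 1 1 0 0 1
refdes=U2
}"""
print(list_components(sch))
# [{'x': 44300, 'y': 47700, 'symbol': 'z80-1.sym', 'refdes': 'U2'}]
```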
&lt;p&gt;I created a new schematic page, &lt;code&gt;kz80_cpu2.sch&lt;/code&gt;, for the second CPU. In gEDA's multi-page scheme, nets with the same name on different pages are automatically connected. So CPU2's address pins connect to nets named A0, A1, ..., A15 — the same net names used on page 1 — and the netlister merges them into shared nets. The shared bus happens at the netlist level without any explicit cross-page wiring.&lt;/p&gt;
&lt;p&gt;The one component that didn't exist yet was the 2×6 control header. I wrote a new gEDA symbol file (&lt;code&gt;ctrlhdr2x6-1.sym&lt;/code&gt;) from scratch — a rectangular body with 12 pins, labeled with the control signal names, specifying the HEADER12_1 footprint. It's about 30 lines of text, all hand-written.&lt;/p&gt;
&lt;p&gt;CPU2's schematic connections break down cleanly:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared with CPU1&lt;/strong&gt; (same net names, auto-merged): A0–A15, D0–D7, +5V, GND&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Independent to CPU2&lt;/strong&gt; (new nets with &lt;code&gt;_2&lt;/code&gt; suffix): CLK_2, RESET_2, INT_2, NMI_2, MREQ_2, IORQ_2, RD_2, WR_2, BUSRQ_2, BUSAK_2&lt;/p&gt;
&lt;p&gt;The total net count went from 37 to 48 — only 11 new nets for an entirely new processor. That's the elegance of the shared-bus approach.&lt;/p&gt;
&lt;h3&gt;Modifying the PCB — With Python&lt;/h3&gt;
&lt;p&gt;Here's where the CLI-only constraint got interesting. The normal workflow would be: run &lt;code&gt;lepton-sch2pcb&lt;/code&gt; to update the PCB with new components from the schematic, then open the PCB in a graphical editor to place and route them. But &lt;code&gt;lepton-sch2pcb&lt;/code&gt; had trouble finding footprints in pcb-rnd's library paths, and I didn't have a graphical editor anyway.&lt;/p&gt;
&lt;p&gt;So I had AI write a Python script (&lt;code&gt;add_cpu2_shared.py&lt;/code&gt;) to modify the PCB file directly. The pcb-rnd file format is text-based, with clearly delimited blocks for each component (Element), each copper trace (Line), each via (Via), and the netlist (NetList). The script:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Widened the board&lt;/strong&gt; from 55.88mm to 86.36mm — an extra 30.48mm to accommodate the second Z80 and control header, placed on the right half of the board.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Inserted five new Element blocks&lt;/strong&gt; — U2 (Z80, DIP-40), J2 (2×6 header), C4 and C5 (decoupling and clock caps), and R2 (clock series resistor). Each Element block is essentially a footprint definition: pin positions, pad dimensions, drill sizes, silkscreen outlines. I copied the dimensional parameters from the existing components to maintain consistency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Updated the netlist&lt;/strong&gt; in two ways. For shared nets (A0–A15, D0–D7, +5V, GND), the script found each existing net block and appended &lt;code&gt;Connect("U2-xx")&lt;/code&gt; entries. For CPU2's independent control signals, it created 11 entirely new net blocks. The +5V net picked up four new connections: U2's VCC pin, U2's WAIT pin (tied high — active low, so high means "not waiting"), C4, and J2.&lt;/p&gt;
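&lt;p&gt;For the shared nets, the edit reduces to finding the right &lt;code&gt;Net(...)&lt;/code&gt; block and appending &lt;code&gt;Connect&lt;/code&gt; entries before its closing parenthesis. A hedged sketch of that step — &lt;code&gt;append_connects&lt;/code&gt; is a hypothetical stand-in for the real script's logic, with the block layout following pcb-rnd's textual netlist format:&lt;/p&gt;

```python
import re

# Locate a named Net(...) block in a pcb-rnd .pcb file and append
# Connect("...") entries before its closing parenthesis. Hypothetical
# sketch of what add_cpu2_shared.py did for A0-A15, D0-D7, +5V, GND.
def append_connects(pcb_text, net_name, new_pins):
    pattern = re.compile(
        r'(Net\("%s".*?\(\n)(.*?)(\n\s*\))' % re.escape(net_name), re.S)
    def repl(m):
        extra = "".join('\n\t\tConnect("%s")' % pin for pin in new_pins)
        return m.group(1) + m.group(2) + extra + m.group(3)
    return pattern.sub(repl, pcb_text, count=1)

pcb = 'Net("A0" "(unknown)")\n(\n\t\tConnect("U1-30")\n)\n'
print(append_connects(pcb, "A0", ["U2-30"]))
```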
&lt;p&gt;The result was a valid PCB file with all components placed and all nets defined — but no copper traces connecting anything.&lt;/p&gt;
&lt;h3&gt;Autorouting: Let the Machine Do the Tedious Part&lt;/h3&gt;
&lt;p&gt;With components placed and nets defined, the board needed routing — actual copper traces connecting all those pins. Doing this by hand over SSH would have been masochistic. This is exactly what autorouters exist for.&lt;/p&gt;
&lt;p&gt;The workflow: export the PCB to Specctra DSN format (an industry-standard interchange format for autorouters), run &lt;a href="https://baud.rs/bdZw62"&gt;Freerouting&lt;/a&gt;, then import the results back.&lt;/p&gt;
&lt;h4&gt;First Attempt (Failed)&lt;/h4&gt;
&lt;p&gt;The first attempt exported the PCB with the original CPU1 traces still in place, hoping Freerouting would preserve them and only route the new nets. Instead, Freerouting spent 50+ seconds per pass trying to work around traces it couldn't associate with its own net encoding. After 48 passes and 40 minutes, it was still failing to route several nets.&lt;/p&gt;
&lt;h4&gt;Second Attempt (Clean Slate)&lt;/h4&gt;
&lt;p&gt;Another AI-generated Python script (&lt;code&gt;strip_traces.py&lt;/code&gt;) removed all existing copper traces from the PCB file. This was a careful operation — the script had to remove &lt;code&gt;Line[...]&lt;/code&gt; entries inside Layer blocks (copper traces) while preserving &lt;code&gt;ElementLine[...]&lt;/code&gt; entries (component silkscreen outlines that look syntactically similar).&lt;/p&gt;
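&lt;p&gt;The trick is anchoring the match at the start of the line, which protects &lt;code&gt;ElementLine&lt;/code&gt; entries that merely &lt;em&gt;contain&lt;/em&gt; the word Line. A simplified sketch — &lt;code&gt;strip_copper_lines&lt;/code&gt; is a hypothetical helper, and the real script additionally limited itself to Lines inside &lt;code&gt;Layer(...)&lt;/code&gt; blocks:&lt;/p&gt;

```python
import re

# Drop copper Line[...] entries while keeping ElementLine[...]
# silkscreen entries. re.match anchors at the line start, so a line
# beginning (after whitespace) with "ElementLine" never matches.
def strip_copper_lines(pcb_text):
    kept = []
    for line in pcb_text.splitlines():
        if re.match(r"\s*Line\s*[\[(]", line):
            continue  # copper trace: drop it
        kept.append(line)
    return "\n".join(kept)

pcb = "\n".join([
    'Layer(1 "top")',
    "(",
    '\tLine[1000 2000 3000 4000 1000 2000 "clearline"]',
    ")",
    '\tElementLine [100 200 300 200 1000]',
])
print(strip_copper_lines(pcb))
```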
&lt;p&gt;With a clean board, Freerouting ran in headless mode:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;java&lt;span class="w"&gt; &lt;/span&gt;-jar&lt;span class="w"&gt; &lt;/span&gt;/tmp/freerouting.jar&lt;span class="w"&gt; &lt;/span&gt;-de&lt;span class="w"&gt; &lt;/span&gt;kz80.dsn&lt;span class="w"&gt; &lt;/span&gt;-do&lt;span class="w"&gt; &lt;/span&gt;kz80.ses&lt;span class="w"&gt; &lt;/span&gt;-mp&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It completed the initial routing in 10 passes, then spent another 49 passes optimizing trace length, converging at pass 59 with the message: &lt;em&gt;"There were only 10.60 track length increase in the last 5 passes, so it's very likely that autorouter can't improve the result further."&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Total routing time: about three minutes. The result: 191 wires decomposed into 897 individual trace segments, plus 82 vias for layer transitions. Every net connected. Every design rule satisfied.&lt;/p&gt;
&lt;h4&gt;Importing Routes Back&lt;/h4&gt;
&lt;p&gt;One more headless problem: pcb-rnd's SES import requires the GUI. I tried &lt;code&gt;xvfb-run&lt;/code&gt; with action commands, but it hung waiting for GTK widget interactions that couldn't happen without a display.&lt;/p&gt;
&lt;p&gt;The solution was yet another AI-generated Python script (&lt;code&gt;ses_to_pcb.py&lt;/code&gt;) that parsed the Freerouting SES output and injected the routes directly into the PCB file as copper Line entries. The main complication was coordinate system conversion — the SES file uses a bottom-left origin (y increases upward) while pcb-rnd uses a top-left origin (y increases downward). The script also handled via translation, mapping Freerouting's via definitions to pcb-rnd's format with appropriate pad sizes, drill diameters, and clearances.&lt;/p&gt;
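&lt;p&gt;The y-axis flip at the heart of that conversion is a one-liner: mirror y about the board height. A minimal sketch — &lt;code&gt;ses_to_pcb_point&lt;/code&gt; is a hypothetical helper, and the integer unit values below are illustrative rather than pcb-rnd's actual internal units:&lt;/p&gt;

```python
# SES uses a bottom-left origin with y increasing upward; pcb-rnd uses
# a top-left origin with y increasing downward, so y is mirrored about
# the board height. x passes through unchanged.
def ses_to_pcb_point(x_ses, y_ses, board_height):
    return x_ses, board_height - y_ses

# A point 1000 units above the SES origin on a 5334-unit-tall board
# sits 4334 units below pcb-rnd's top edge.
print(ses_to_pcb_point(500, 1000, 5334))  # (500, 4334)
```

&lt;p&gt;Note the flip is its own inverse: applying it twice returns the original coordinate, which makes round-trip checks easy.&lt;/p&gt;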
&lt;p&gt;897 trace segments and 82 vias injected. The PCB was fully routed.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/dual-z80/top-copper.png" alt="Top copper layer of the dual Z80 RetroShield PCB viewed in Gerber Viewer, showing 897 autorouted trace segments and 82 vias connecting both CPUs to the shared bus" style="width: 100%; max-width: 800px; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1.5em 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The top copper layer after Freerouting — 897 trace segments connecting 48 nets across both Z80s, the J1 bus header, and the J2 control header. Every trace was placed by the autorouter; none were drawn by hand.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Generating Production Files&lt;/h3&gt;
&lt;p&gt;The final step was generating Gerber files — the industry-standard format that PCB fabrication houses use to manufacture boards. pcb-rnd's command-line exporter handled this cleanly:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pcb-rnd&lt;span class="w"&gt; &lt;/span&gt;-x&lt;span class="w"&gt; &lt;/span&gt;gerber&lt;span class="w"&gt; &lt;/span&gt;--all-layers&lt;span class="w"&gt; &lt;/span&gt;kz80.pcb
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produced 11 files covering top and bottom copper, solder mask, silkscreen, paste stencil, board outline, and drill locations. pcb-rnd uses verbose filenames (&lt;code&gt;kz80.top.copper.none.3.gbr&lt;/code&gt;), so a renaming script converted them to the standard extensions (&lt;code&gt;.gtl&lt;/code&gt;, &lt;code&gt;.gbl&lt;/code&gt;, &lt;code&gt;.gts&lt;/code&gt;, etc.) that fabrication houses expect.&lt;/p&gt;
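&lt;p&gt;The renaming step amounts to a substring-to-extension map. A sketch under stated assumptions — the verbose filename shown above is real, but &lt;code&gt;fab_name&lt;/code&gt; and the exact source-layer name patterns vary by pcb-rnd version, so treat the mapping as illustrative:&lt;/p&gt;

```python
# Map pcb-rnd's verbose Gerber layer names onto the conventional
# extensions most fabs expect. Patterns are assumptions, not an exact
# transcript of the project's renaming script.
EXT_MAP = {
    "top.copper": ".gtl",
    "bottom.copper": ".gbl",
    "top.mask": ".gts",
    "bottom.mask": ".gbs",
    "top.silk": ".gto",
    "bottom.silk": ".gbo",
    "outline": ".gko",
}

def fab_name(verbose_name, base="kz80"):
    for key, ext in EXT_MAP.items():
        if key in verbose_name:
            return base + ext
    return verbose_name  # leave drill files etc. untouched

print(fab_name("kz80.top.copper.none.3.gbr"))  # kz80.gtl
```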
&lt;p&gt;I also added &lt;code&gt;tinycomputers.io&lt;/code&gt; to the top silkscreen layer, placed directly below the existing &lt;code&gt;www.8bitforce.com&lt;/code&gt; text — a small nod to both projects.&lt;/p&gt;
&lt;p&gt;The final Gerber package: 35KB zipped, ready for fabrication.&lt;/p&gt;
&lt;h3&gt;The Final Board&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/dual-z80/silkscreen.png" alt="Top silkscreen layer of the dual Z80 RetroShield PCB in Gerber Viewer, showing U1 and U2 Z80 CPU footprints, J1 and J2 headers, component labels, and tinycomputers.io branding" style="width: 100%; max-width: 800px; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1.5em 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The top silkscreen: U1 (left) and U2 (right) with the J1 bus header on the far left and the J2 control header between the two CPUs. The silkscreen includes the original 8bitforce.com credit alongside tinycomputers.io.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Here's what changed from the original RetroShield to the dual-CPU version:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Original&lt;/th&gt;
&lt;th&gt;Dual CPU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Board dimensions&lt;/td&gt;
&lt;td&gt;55.88 × 53.34mm&lt;/td&gt;
&lt;td&gt;86.36 × 53.34mm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layers&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Z80 CPUs&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Headers&lt;/td&gt;
&lt;td&gt;J1 (36 pins)&lt;/td&gt;
&lt;td&gt;J1 (36) + J2 (12) = 48 pins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nets&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Through-hole components&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SMD components&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace segments&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;897&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vias&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The board is wider but not taller. The second Z80 sits to the right of the first, with the J2 control header between them. Both CPUs share the J1 bus connection, and the Arduino firmware will manage who drives the bus at any given moment.&lt;/p&gt;
&lt;h3&gt;The Toolchain Nobody Uses&lt;/h3&gt;
&lt;p&gt;It's worth stepping back to note what just happened. An entire PCB was designed — schematic capture, component placement, autorouting, Gerber generation — without opening a single graphical application. Every step was either a command-line tool invocation or an AI-generated Python script manipulating text files. And it was done by someone who, at the start of the project, couldn't have told you the difference between a Gerber file and a drill file.&lt;/p&gt;
&lt;p&gt;That was the whole point. I chose to avoid a GUI specifically because I wanted to test a hypothesis: that AI-assisted, text-based workflows could let someone with domain knowledge in adjacent areas (firmware, systems programming) operate effectively in an unfamiliar domain (PCB design). The text-based EDA formats made this possible — they gave the AI something it could read, reason about, and generate. A graphical tool would have put me back to square one, clicking through menus I didn't understand.&lt;/p&gt;
&lt;p&gt;I'm not claiming this is &lt;em&gt;better&lt;/em&gt; than using KiCad or Altium with a mouse. For complex boards with hundreds of components, graphical tools and experienced designers are indispensable. But for a modification like this — adding a known set of components to an existing, well-documented open-source design — AI plus text-based tools was surprisingly effective. I brought the architectural understanding (how Z80 bus arbitration works, which signals need to be shared versus independent) and the AI handled the translation into file formats I'd never touched before. Most of the time was spent understanding the &lt;em&gt;design&lt;/em&gt;, not fighting tools.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The Gerber files are at the fab now. In part two, I'll cover what happens when the physical boards arrive: inspection, assembly, first power-on, and the Arduino firmware that orchestrates two Z80s on a shared bus. The firmware is where the real complexity lives — bus arbitration timing, memory mapping for two independent address spaces, and the question of what to actually &lt;em&gt;run&lt;/em&gt; on a dual-Z80 system in 2026.&lt;/p&gt;
&lt;p&gt;Here's a preview of what the bus arbitration core looks like. The Arduino manages which CPU owns the shared bus at any given moment using the Z80's hardware BUSRQ/BUSAK handshake:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// --- Pin definitions (active low) ---&lt;/span&gt;
&lt;span class="c1"&gt;// CPU1 control (directly from J1 via existing RetroShield mapping)&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU1_CLK      A5&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU1_BUSRQ    A4    &lt;/span&gt;&lt;span class="c1"&gt;// directly from Arduino to CPU1 BUSRQ pin&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU1_BUSAK    A3    &lt;/span&gt;&lt;span class="c1"&gt;// directly from CPU1 BUSAK pin to Arduino&lt;/span&gt;

&lt;span class="c1"&gt;// CPU2 control (directly from J2 header)&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU2_CLK      2&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU2_BUSRQ    3&lt;/span&gt;
&lt;span class="cp"&gt;#define CPU2_BUSAK    4&lt;/span&gt;

&lt;span class="c1"&gt;// Bus state&lt;/span&gt;
&lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// BUSRQ is output (Arduino tells CPU to release bus)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OUTPUT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OUTPUT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// BUSAK is input (CPU tells Arduino it released bus)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSAK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INPUT_PULLUP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSAK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INPUT_PULLUP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Start with CPU1 active, CPU2 off the bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// HIGH = don't request bus release&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// LOW  = request CPU2 to release bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Wait for CPU2 to acknowledge it's off the bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;digitalRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSAK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;switch_to_cpu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;old_busrq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;old_busak&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSAK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSAK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;new_busrq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU1_BUSRQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPU2_BUSRQ&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Ask the active CPU to release the bus&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_busrq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Wait for acknowledgment (CPU finishes current machine cycle first)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;micros&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;digitalRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_busak&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;micros&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// hung CPU — shouldn't happen&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Bus is free. Release the new CPU onto it.&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;digitalWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_busrq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;active_cpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The critical detail is timing. When the Arduino pulls BUSRQ low, the Z80 doesn't stop immediately — it finishes its current machine cycle, which can take 3–6 clock periods depending on the instruction. Only then does it tristate its address, data, and control outputs and assert BUSAK. The &lt;code&gt;while&lt;/code&gt; loop waits for that handshake to complete. During the transition, neither CPU is driving the bus, and the Arduino must not attempt any bus operations.&lt;/p&gt;
&lt;p&gt;This is a simplified version — the full firmware in part two will handle clock generation for both CPUs, memory mapping, I/O dispatch, and the arbitration policy (round-robin, priority-based, or cooperative yield). But the handshake above is the foundation everything else builds on. It's the same protocol that made multi-Z80 S-100 systems work in the early 1980s.&lt;/p&gt;
&lt;p&gt;The hardware design is the easy part. Making two 50-year-old processors cooperate is the challenge.&lt;/p&gt;
&lt;h3&gt;Source Files&lt;/h3&gt;
&lt;p&gt;All schematics, PCB files, Gerber outputs, and helper scripts for this project are open source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/i4XqDV"&gt;dual-z80&lt;/a&gt;&lt;/strong&gt; — KiCad/gEDA source files, Gerber package, Python scripts for PCB manipulation, and build log&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;This is part one of a two-part series. Part two will cover board assembly, bring-up, and dual-CPU firmware.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Previous RetroShield posts: &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;CP/M on the RetroShield&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;Fiverr PCB Design&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;CP/M on the Giga R1&lt;/a&gt; · &lt;a href="https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html"&gt;Zork on the Giga&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description><category>arduino</category><category>dual cpu</category><category>freerouting</category><category>geda</category><category>gerber</category><category>hardware</category><category>lepton-eda</category><category>multiprocessor</category><category>pcb design</category><category>pcb-rnd</category><category>retro computing</category><category>retroshield</category><category>z80</category><guid>https://tinycomputers.io/posts/designing-a-dual-z80-retroshield-part-1.html</guid><pubDate>Fri, 06 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Investing in the Jevons Expansion</title><link>https://tinycomputers.io/posts/investing-in-the-jevons-expansion.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/investing-in-the-jevons-expansion_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;16 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This is the sixth piece in a series applying the Jevons Paradox framework to AI economics. The prior five built the theoretical case:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/the-paradox-of-cheap-compute/"&gt;The Paradox of Cheap Compute&lt;/a&gt; established the historical pattern — every time the cost of compute fell by an order of magnitude, total consumption expanded far beyond the efficiency gain.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion/"&gt;The Jevons Counter-Thesis&lt;/a&gt; argued that AI displacement models systematically undercount the demand expansion that follows when cognitive labor gets cheaper.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap/"&gt;Moore's Law for Intelligence&lt;/a&gt; mapped the inference cost curve and showed it mirrors early Moore's Law.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/something-big-is-happening-a-critique/"&gt;Something Big Is Happening — And Something Big Is Missing&lt;/a&gt; applied the framework to a specific displacement scenario and showed where the analysis breaks down.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox/"&gt;The AI Vampire Is Jevons Paradox&lt;/a&gt; identified the binding constraint: human judgment doesn't scale the way compute does.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This piece asks the practical question: if you believe the framework, what follows?&lt;/p&gt;
&lt;p&gt;I should be clear about what this is and what it isn't. This is not financial advice. I'm not recommending specific trades, allocations, or timing. What I'm doing is mapping a structural argument — Jevons-style demand expansion in AI — onto the physical and economic layers that expansion must pass through. The goal is to identify where expansion creates bottlenecks, because bottlenecks are where pricing power concentrates.&lt;/p&gt;
&lt;p&gt;The key insight is that you don't need to pick which AI company wins. You don't need to know whether OpenAI, Anthropic, Google, or some company that doesn't exist yet captures the application layer. What you need to identify are the fixed-supply inputs that &lt;em&gt;every&lt;/em&gt; AI company needs regardless of who wins. The expansion has to flow through certain physical chokepoints, and those chokepoints are investable.&lt;/p&gt;
&lt;h3&gt;The Framework in One Paragraph&lt;/h3&gt;
&lt;p&gt;For readers coming to this series fresh: Jevons Paradox describes what happens when a critical input gets dramatically cheaper. The intuitive expectation is that total spending on that input falls. The historical reality is the opposite — demand expands beyond the efficiency gain, and total consumption increases. Coal in the 19th century — as Jevons himself documented in &lt;a href="https://baud.rs/xjxPfz"&gt;&lt;em&gt;The Coal Question&lt;/em&gt;&lt;/a&gt; — transistors in the 20th, bandwidth in the 21st. The prior pieces in this series argue that AI inference costs are following the same curve, with the same structural conditions that produced Jevons outcomes in every prior case. If that argument holds, then what matters isn't whether AI gets more efficient — it's where the resulting demand expansion hits physical constraints.&lt;/p&gt;
&lt;h3&gt;The Objection That Isn't&lt;/h3&gt;
&lt;p&gt;The most common pushback I get on this series is some version of: "GPUs are hitting diminishing returns, capex is already enormous, and there's a natural ceiling on how far the expansion can go." Variations appear in coverage from &lt;a href="https://baud.rs/B5ATWQ"&gt;Northeastern&lt;/a&gt; and &lt;a href="https://baud.rs/bcFAl5"&gt;illuminem&lt;/a&gt;, often framed as a correction to the Jevons thesis.&lt;/p&gt;
&lt;p&gt;It's a reasonable-sounding objection. It's also wrong — and understanding &lt;em&gt;why&lt;/em&gt; it's wrong actually strengthens the Jevons case.&lt;/p&gt;
&lt;p&gt;The objection treats a technology-specific constraint as an input-level constraint. GPUs hitting diminishing returns doesn't mean &lt;em&gt;inference&lt;/em&gt; is hitting diminishing returns. It means GPUs are reaching the end of their particular S-curve. But GPUs aren't the only way to run inference. Custom ASICs, TPUs, NPUs, and novel architectures are opening entirely new cost curves &lt;em&gt;below&lt;/em&gt; the GPU curve. The GPU plateau isn't a ceiling — it's a handoff.&lt;/p&gt;
&lt;p&gt;The numbers are already visible. Broadcom controls roughly 70% of the custom AI ASIC market, reporting $5.2 billion in AI semiconductor revenue in Q3 alone, with &lt;a href="https://baud.rs/zcsDXo"&gt;five major hyperscaler customers&lt;/a&gt; driving demand. &lt;a href="https://baud.rs/znj9ak"&gt;Marvell's custom XPU pipeline&lt;/a&gt; spans AWS, Google, Meta, and Microsoft, with AI revenue reaching $2.6 billion in FY2026. Google's TPU transition from v6 to v7 delivered a &lt;a href="https://baud.rs/4aoJ1v"&gt;roughly 70% cost-per-token reduction&lt;/a&gt;. Taalas, a startup building hardwired inference chips, &lt;a href="https://baud.rs/QxPpqN"&gt;claims 1000x performance per watt&lt;/a&gt; versus general-purpose GPUs. Custom ASICs handle an estimated 20% of inference workloads today and are &lt;a href="https://baud.rs/eIj2sQ"&gt;projected to reach 70–75% by 2028&lt;/a&gt;, with custom ASIC shipments growing at 44.6% annually versus 16.1% for GPUs.&lt;/p&gt;
&lt;p&gt;Every prior Jevons cycle worked exactly this way. Newcomen's engine didn't just get incrementally better — it was replaced by Watt's engine, then Corliss, then turbines. Each new technology started a fresh S-curve before the previous one fully flattened. Moore's Law didn't ride a single technology either — as Chris Miller chronicles in &lt;a href="https://baud.rs/8MdhcB"&gt;&lt;em&gt;Chip War&lt;/em&gt;&lt;/a&gt;, bipolar gave way to NMOS, then CMOS, then FinFET, now gate-all-around. The pattern is always multiple overlapping S-curves, each beginning before the last one peaks.&lt;/p&gt;
&lt;p&gt;The data supports the mechanism: &lt;a href="https://baud.rs/O6Q4Tc"&gt;every 50% reduction in inference cost has been associated with a 200–300% increase in deployment&lt;/a&gt;. That's textbook Jevons elasticity.&lt;/p&gt;
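&lt;p&gt;A quick back-of-envelope sketch of that elasticity, using only the cited ranges (illustrative arithmetic, not a model):&lt;/p&gt;

```typescript
// Back-of-envelope check of the cited elasticity: halve the cost per token,
// grow deployment by the cited 200–300% range, and see what happens to spend.
// Figures are the article's cited ranges, not new data.
const costMultiplier = 0.5;            // 50% reduction in inference cost
const deploymentMultipliers = [3, 4];  // 200% and 300% increases in deployment
const spendMultipliers = deploymentMultipliers.map((d) => costMultiplier * d);
// Total spend per unit of prior spend: 1.5 to 2.0, so consumption outruns the efficiency gain.
```

&lt;p&gt;Cost falls by half, deployment triples or quadruples, and total spend still lands at 1.5–2x the prior level. That is the Jevons mechanism in one line of arithmetic.&lt;/p&gt;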
&lt;p&gt;"Diminishing returns on GPUs" isn't a ceiling on inference. It's the moment the next technology takes over. That's the &lt;em&gt;mechanism&lt;/em&gt; of Jevons Paradox, not a counterpoint to it.&lt;/p&gt;
&lt;h3&gt;The Investment Layers&lt;/h3&gt;
&lt;p&gt;If Jevons-style expansion is real, it has to flow through physical infrastructure. I think about this in four layers, ordered from deepest (most expansion-certain) to shallowest (most speculative).&lt;/p&gt;
&lt;h4&gt;Layer 1: Energy and Power&lt;/h4&gt;
&lt;p&gt;Energy is the binding constraint. If AI demand expands at anything close to Jevons rates, someone has to generate the electricity. Data center electricity demand is on track to double this year, with the sector's total consumption &lt;a href="https://baud.rs/8hWfJa"&gt;surpassing Canada's national usage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The structural problem is deeper than just demand growth. As Vaclav Smil details in &lt;a href="https://baud.rs/OMSIzZ"&gt;&lt;em&gt;Energy and Civilization&lt;/em&gt;&lt;/a&gt;, energy transitions are slow precisely because the physical infrastructure is massive and long-lived. Roughly 70% of the U.S. electrical grid was built between the 1950s and 1970s. Much of it is approaching end-of-life at the exact moment AI is driving the largest incremental demand increase in decades. This isn't a problem that resolves quickly. Power plants take years to permit and build. Grid transmission upgrades take longer.&lt;/p&gt;
&lt;p&gt;Nuclear is where the smart money is moving. Constellation Energy's merger with Calpine creates a fleet of 21 nuclear reactors plus 50 natural gas plants — essentially a baseload power platform positioned for AI demand. Amazon signed a 1.92 GW power purchase agreement at Susquehanna and committed $500 million to small modular reactor development. These aren't speculative bets on future demand — they're capacity commitments predicated on demand that's already contractually visible.&lt;/p&gt;
&lt;p&gt;Hyperscaler capital expenditure tells the same story: $602 billion planned for 2026, roughly 75% tied to AI infrastructure. Goldman Sachs estimates cumulative AI infrastructure spending of $1.15 trillion between 2025 and 2027. That capital has to buy electricity, and the electricity has to come from somewhere.&lt;/p&gt;
&lt;h4&gt;Layer 2: Physical Infrastructure&lt;/h4&gt;
&lt;p&gt;Between the power plant and the GPU sits an enormous amount of physical equipment: transformers, switchgear, power distribution units, cooling systems, racks, cabling. This is the picks-and-shovels layer — it benefits regardless of which AI stack wins.&lt;/p&gt;
&lt;p&gt;Eaton reported data center orders up 70% year-over-year. Transformers have become a bottleneck, with lead times stretching to 18+ months for large power transformers. Vertiv, which makes power management and thermal systems, is sitting on a $9.5 billion backlog. Liquid cooling, once a niche technology, is becoming standard for high-density AI compute racks.&lt;/p&gt;
&lt;p&gt;Grid transmission and distribution may be the most underappreciated bottleneck. You can build a data center in 18 months. Getting grid interconnection can take three to five years. The physical infrastructure required to move power from generation to consumption is the constraint that's hardest to accelerate — and it benefits from AI expansion regardless of which models, chips, or cloud providers ultimately dominate.&lt;/p&gt;
&lt;h4&gt;Layer 3: Custom Silicon&lt;/h4&gt;
&lt;p&gt;The GPU-to-ASIC transition described above isn't just evidence that the Jevons expansion continues — it's itself a Jevons trigger. Each new silicon architecture that enters production at lower cost-per-token reopens the demand curve.&lt;/p&gt;
&lt;p&gt;Broadcom's AI semiconductor revenue is &lt;a href="https://baud.rs/9Hp791"&gt;doubling year-over-year to roughly $8.2 billion in Q1 FY2026&lt;/a&gt;. Marvell's custom XPU pipeline is expanding across all major hyperscalers. Both companies are positioned on the ASIC side of the GPU-to-ASIC transition — the side that's growing at 44.6% versus 16.1%.&lt;/p&gt;
&lt;p&gt;Nvidia still dominates training workloads, and Blackwell delivers a &lt;a href="https://baud.rs/5ns8n0"&gt;10x cost-per-token reduction for open-source inference models&lt;/a&gt; — which is itself a massive Jevons input. But inference is bifurcating. Training demands flexibility and programmability (Nvidia's strength). Inference at scale demands efficiency and cost optimization (where ASICs excel). The market is splitting, and both sides drive expansion.&lt;/p&gt;
&lt;h4&gt;Layer 4: The Application Tier&lt;/h4&gt;
&lt;p&gt;This is where it gets speculative. Cloud providers and hyperscalers function as toll booths — they collect revenue proportional to total compute consumed, making them natural beneficiaries of demand expansion. But the application tier above them is where you're picking winners, not betting on expansion itself.&lt;/p&gt;
&lt;p&gt;AI-native companies become viable only at cheaper inference price points. The legal tech startup that can offer document review at one-tenth the cost of a junior associate doesn't exist at $20 per million tokens. It might exist at $2. It definitely exists at $0.20. Each step down the cost curve unlocks a new tier of applications.&lt;/p&gt;
&lt;p&gt;The contrarian opportunity in this layer is latent demand — the markets that don't exist yet because the service was too expensive for most people. Roughly 80% of Americans who need a lawyer can't afford one. Most small businesses can't afford financial planning. Most students can't afford tutoring. If inference costs follow a Jevons trajectory, these aren't aspirational markets — they're inevitable markets. But investing in them means picking which company captures each one, which is a fundamentally different bet than investing in the infrastructure that serves all of them.&lt;/p&gt;
&lt;h3&gt;Who Else Is Making This Bet&lt;/h3&gt;
&lt;p&gt;This framework isn't contrarian anymore. &lt;a href="https://baud.rs/Wy7mZE"&gt;Satya Nadella tweeted&lt;/a&gt; "Jevons paradox strikes again!" when DeepSeek demonstrated cheaper inference without reducing demand. Microsoft's AI revenue hit $13 billion, up 175% year-over-year. &lt;a href="https://baud.rs/xdNj4l"&gt;Fortune noted&lt;/a&gt; that Nadella's optimism was explicitly grounded in the paradox — cheaper AI means more AI, not less.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/aRLPY8"&gt;Andreessen Horowitz made the economic case directly&lt;/a&gt;: cheaper tokens unlock more demand than efficiency saves. Their thesis is that foundation model economics follow the same curve as prior compute economics — falling costs expand the addressable market faster than they reduce per-unit revenue.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/Qcm7AN"&gt;NPR's Planet Money covered the thesis&lt;/a&gt; in mainstream terms, bringing Jevons Paradox from an obscure 19th-century economic observation to a household framework for understanding AI economics. &lt;a href="https://baud.rs/V6W8hJ"&gt;Nathan Witkin's analysis&lt;/a&gt; showed that employment in software development, translation, and radiology &lt;em&gt;increased&lt;/em&gt; after GPT-3 — exactly the demand expansion the model predicts. &lt;a href="https://baud.rs/KUEJyl"&gt;Markman Capital&lt;/a&gt; called the "flawed consensus" of GPU diminishing returns "one of the most dangerous misreadings of the current market."&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/rD0Spu"&gt;Deloitte&lt;/a&gt;, McKinsey, and Bain are all projecting massive infrastructure buildout. &lt;a href="https://baud.rs/8hWfJa"&gt;McKinsey's \$7 trillion estimate&lt;/a&gt; for data center scaling reflects the same underlying logic: if demand expands as costs fall, the physical infrastructure to support it is the bottleneck.&lt;/p&gt;
&lt;p&gt;Jevons went from an obscure economics reference to a mainstream investment framework in roughly twelve months. That's not because it's trendy — it's because the data keeps confirming the pattern.&lt;/p&gt;
&lt;h3&gt;Where the Thesis Could Be Wrong&lt;/h3&gt;
&lt;p&gt;Intellectual honesty requires mapping the failure modes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Demand elasticity might be lower than historical precedent.&lt;/strong&gt; Every prior Jevons cycle involved inputs with massive latent demand — coal for industrial heat, transistors for consumer electronics, bandwidth for media. AI inference might not have the same depth of latent demand. If the tasks AI performs well are narrower than the tasks coal or transistors enabled, the expansion could stall earlier than the model predicts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regulatory intervention could cap the expansion.&lt;/strong&gt; Energy policy, AI regulation, data center permitting restrictions — any of these could artificially constrain the physical infrastructure that the expansion requires. Jevons Paradox describes an economic dynamic, not a law of physics. It can be overridden by policy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The biological ceiling is real.&lt;/strong&gt; As I argued in &lt;a href="https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox/"&gt;The AI Vampire Is Jevons Paradox&lt;/a&gt;, human judgment is the input that doesn't scale. If every Jevons expansion in AI ultimately concentrates demand on human decision-making, and human decision-making has genuine cognitive limits, the expansion hits a different kind of constraint — one that can't be solved with more silicon or more power.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Timing risk is the most likely failure mode.&lt;/strong&gt; The direction of the thesis could be correct while the timeline is wrong. Infrastructure bottlenecks might resolve more slowly than demand builds, creating periods of overinvestment followed by correction. The historical base rate favors Jevons, but base rates describe probabilities, not certainties. Plenty of investors have been right about the direction and still lost money because they were wrong about the timing.&lt;/p&gt;
&lt;h3&gt;The Physical Footprint of Expansion&lt;/h3&gt;
&lt;p&gt;The deepest layers — energy and physical infrastructure — are the safest Jevons bets. They benefit from AI demand expansion regardless of which models, chips, or companies win. You don't need to know whether GPT-7 or Claude 6 is the better model to know that both of them will need electricity, transformers, cooling, and grid capacity.&lt;/p&gt;
&lt;p&gt;The further up the stack you go, the more you're picking winners rather than betting on expansion. Custom silicon is a strong middle ground — the GPU-to-ASIC transition is structural, and the companies positioned on the right side of it have visible demand. But the application tier is where the uncertainty concentrates, and that's where most retail investors focus their attention.&lt;/p&gt;
&lt;p&gt;The expansion has a physical footprint. Every token generated requires electricity. Every data center requires grid interconnection. Every custom ASIC requires a fab slot. Every cooling system requires water. The Jevons expansion, if it plays out as the framework predicts, will be visible not in stock prices or earnings calls but in the physical world.&lt;/p&gt;
&lt;p&gt;Jevons won't announce itself. It never does. It shows up in electricity bills, in transformer backorders, in cooling system lead times, in the quiet scramble to secure power purchase agreements years in advance. The signal isn't in what people say about AI. It's in what they're building to support it.&lt;/p&gt;</description><category>ai</category><category>asic</category><category>data centers</category><category>economics</category><category>energy</category><category>gpu</category><category>infrastructure</category><category>investing</category><category>jevons paradox</category><category>nuclear</category><category>semiconductors</category><category>utilities</category><guid>https://tinycomputers.io/posts/investing-in-the-jevons-expansion.html</guid><pubDate>Thu, 05 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Generating Technical Handbooks with AI: Parallel Agents, Source Code, and 2,400 Pages</title><link>https://tinycomputers.io/posts/generating-technical-handbooks-with-ai.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/generating-technical-handbooks-with-ai_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-handbook-generation/ballistics-engine-handbook-cover.jpg" alt="Cover of The Ballistics Engine Handbook — A Comprehensive Guide to Computational Exterior Ballistics, showing a bullet trajectory arc on a dark grid background" style="float: left; max-width: 300px; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Over the past few weeks I've generated three technical handbooks using Claude Code with Opus 4.6 and the Claude Agent SDK. &lt;a href="https://tinycomputers.io/data/ballistics-engine-handbook.pdf"&gt;The Ballistics Engine Handbook&lt;/a&gt; — 641 pages across 66 chapters covering computational exterior ballistics. &lt;a href="https://tinycomputers.io/data/lattice-handbook.pdf"&gt;The Lattice Handbook&lt;/a&gt; — 868 pages across 84 chapters documenting an entire programming language. &lt;a href="https://tinycomputers.io/data/sampo-cpu-handbook.pdf"&gt;The Sampo CPU Handbook&lt;/a&gt; — 871 pages across 82 chapters walking through the design, programming, and hardware implementation of a 16-bit RISC CPU.&lt;/p&gt;
&lt;p&gt;That's roughly 2,400 pages and 232 chapters of deeply technical content, generated from real codebases by AI agents that read actual source files before writing about them.&lt;/p&gt;
&lt;p&gt;These aren't ChatGPT summaries. They aren't the kind of vaguely plausible prose you get from asking an LLM to "write a book about X." Each handbook was produced by a framework that launches 10-12 Claude agents in parallel, each assigned a Part of the book, each with access to the real project source code, each writing &lt;a href="https://baud.rs/owj2PE"&gt;LaTeX&lt;/a&gt; chapters grounded in actual implementation. The result is documentation that references real functions, real CLI flags, real instruction encodings — because the agents read the code before writing about it.&lt;/p&gt;
&lt;h3&gt;Why Handbooks?&lt;/h3&gt;
&lt;p&gt;Developer documentation is chronically under-written. Most projects ship with a README, maybe some auto-generated API docs, and a handful of examples. If you're lucky, there's a tutorial. The gap between "reference documentation" and "understanding how to actually use this thing" is enormous, and it's the gap where handbooks live.&lt;/p&gt;
&lt;p&gt;A good handbook explains not just what the API surface looks like but why the design decisions were made, how the pieces fit together, what the edge cases are, and how to use the tool effectively in real-world scenarios. Writing one for a complex project is a multi-month effort. For a solo developer maintaining a project in their spare time, it's effectively impossible — the opportunity cost is too high.&lt;/p&gt;
&lt;p&gt;AI changes the math. If you can point agents at source code and get a coherent, accurate, 600-page handbook, the cost drops from months to hours. The output isn't a finished book — it's a first draft that needs review, editing, and correction. But it's a dramatically better starting point than a blank page.&lt;/p&gt;
&lt;h3&gt;The Source-Aware Approach&lt;/h3&gt;
&lt;p&gt;What makes this different from asking a model to "write a book about ballistics" or "write a book about CPU design" is that the agents have access to the actual codebase.&lt;/p&gt;
&lt;p&gt;Each agent runs with its working directory set to the real project — the ballistics-engine Rust crate, the Lattice C compiler, or the Sampo CPU's Verilog and assembly. The agents have Read, Glob, and Grep access. They can open source files, search for function signatures, trace data structures, and understand the actual implementation before writing about it.&lt;/p&gt;
&lt;p&gt;The chapter definitions in the generation script include explicit source file references:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sourceReferences: ["src/atmosphere.rs", "src/drag.rs", "src/drag_model.rs"]
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When an agent starts writing a chapter on atmosphere modeling, the prompt tells it: read &lt;code&gt;src/atmosphere.rs&lt;/code&gt; first. The agent opens the file, sees the ICAO standard atmosphere implementation, finds the actual function signatures and constants, and writes a chapter grounded in what the code actually does — not what a language model thinks atmosphere modeling might look like.&lt;/p&gt;
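&lt;p&gt;A rough sketch of what that per-chapter instruction might look like (the &lt;code&gt;buildChapterPrompt&lt;/code&gt; name and its fields are illustrative assumptions, not the actual &lt;code&gt;generate.mts&lt;/code&gt; code):&lt;/p&gt;

```typescript
// Illustrative only: a minimal version of how a chapter prompt could embed
// its source references. Names here are assumptions, not the real script.
interface ChapterRef {
  title: string;
  sourceReferences: string[];
}

function buildChapterPrompt(chapter: ChapterRef): string {
  // Turn each referenced file into an explicit read-first instruction.
  const readList = chapter.sourceReferences
    .map((path) => `- Read ${path} before writing`)
    .join("\n");
  return [
    `Write the chapter "${chapter.title}".`,
    "Ground every claim in the actual implementation:",
    readList,
  ].join("\n");
}
```

&lt;p&gt;The point is only that the file list travels inside the prompt, so the agent's first tool calls are reads of the named sources rather than free-form generation.&lt;/p&gt;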
&lt;p&gt;For the Ballistics Engine Handbook, this means chapters that reference real Rust functions, real CLI flags from &lt;code&gt;src/cli_api.rs&lt;/code&gt;, and real numerical methods from the solver. For the Sampo CPU Handbook, it means chapters that include actual Verilog module definitions, actual ISA encodings from the architecture spec, and actual assembler passes from the Rust toolchain. The agent reads &lt;code&gt;ENCODING.md&lt;/code&gt; and writes about instruction formats using the real bit layouts, not invented ones.&lt;/p&gt;
&lt;p&gt;This source-awareness is the difference between documentation that happens to sound plausible and documentation that is grounded in implementation. It doesn't eliminate hallucination — I'll get to that — but it dramatically reduces it.&lt;/p&gt;
&lt;h3&gt;The Parallel Agent Framework&lt;/h3&gt;
&lt;p&gt;The core of the system is a TypeScript file called &lt;code&gt;generate.mts&lt;/code&gt; that orchestrates parallel Claude Agent SDK sessions. Each handbook has its own version, but the architecture is the same.&lt;/p&gt;
&lt;p&gt;The book structure is defined as TypeScript data:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kd"&gt;interface&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Chapter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;pages&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;sections&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;Section&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;sourceReferences&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;interface&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Part&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;pageTarget&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;chapters&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;Chapter&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each Part contains its chapters, each chapter lists its sections, page target, and which source files the agent should read. The Ballistics Engine Handbook has 9 Parts. The Lattice Handbook has 11. The Sampo Handbook has 11 plus appendices.&lt;/p&gt;
&lt;p&gt;When you run &lt;code&gt;npx tsx generate.mts&lt;/code&gt;, every Part launches as a separate Claude agent simultaneously:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;promises&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;BOOK_STRUCTURE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;runPartAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;runAppendixAgent&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;settled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allSettled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promises&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the Sampo Handbook, that's 12 agents running in parallel. Each one receives a detailed prompt containing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The full table of contents (so it knows what other Parts cover — cross-reference awareness)&lt;/li&gt;
&lt;li&gt;Its specific chapters, sections, descriptions, and page targets&lt;/li&gt;
&lt;li&gt;Source file references to read before writing&lt;/li&gt;
&lt;li&gt;A style guide (more on this below)&lt;/li&gt;
&lt;li&gt;LaTeX formatting conventions, custom environments, and commands&lt;/li&gt;
&lt;/ul&gt;
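&lt;p&gt;Assembling those pieces into a single prompt might look roughly like this (the &lt;code&gt;buildPartPrompt&lt;/code&gt; function and its field names are assumptions for illustration, not the real script):&lt;/p&gt;

```typescript
// Hypothetical sketch of per-Part prompt assembly from the bullet points above.
// Field and function names are invented for illustration.
interface PartSpec {
  title: string;
  tableOfContents: string;   // full ToC, for cross-reference awareness
  chapterOutlines: string[]; // chapters, sections, descriptions, page targets
  styleGuide: string;
}

function buildPartPrompt(part: PartSpec): string {
  return [
    `You are writing the Part titled "${part.title}".`,
    "Full table of contents (so you know what other Parts cover):",
    part.tableOfContents,
    "Chapters to write, with sections and page targets:",
    part.chapterOutlines.join("\n"),
    "Style guide:",
    part.styleGuide,
    "Read each chapter's listed source files before writing its LaTeX.",
  ].join("\n\n");
}
```

&lt;p&gt;Because every agent sees the same table of contents, cross-references between Parts stay consistent even though the Parts are written concurrently.&lt;/p&gt;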
&lt;p&gt;Each agent calls the Claude Agent SDK's &lt;code&gt;query()&lt;/code&gt; function with Opus 4.6, running in the source project's directory:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/alexjokela/projects/ballistics-engine"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;allowedTools&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Glob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grep"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;permissionMode&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bypassPermissions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;maxTurns&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chapters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The agents write LaTeX &lt;code&gt;.tex&lt;/code&gt; chapter files to a &lt;code&gt;chapters/&lt;/code&gt; directory, which are &lt;code&gt;\include{}&lt;/code&gt;'d by the main &lt;code&gt;book.tex&lt;/code&gt;. Each agent logs its progress to a per-part log file. When all agents finish, the script reports results — duration, success/failure status, and file sizes for each generated chapter.&lt;/p&gt;
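&lt;p&gt;The top-level structure is the standard LaTeX book pattern; the main file might look roughly like this (chapter file names here are illustrative):&lt;/p&gt;

```latex
% Sketch of the book.tex include structure; file names are illustrative.
\documentclass[11pt]{book}
\begin{document}
\tableofcontents
\part{Architecture Overview}
\include{chapters/01-introduction}
\include{chapters/02-instruction-set}
% ... one \include per generated chapter file in chapters/ ...
\end{document}
```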
&lt;p&gt;&lt;code&gt;Promise.allSettled()&lt;/code&gt; is important here. If one Part fails — the agent hits a turn limit, encounters an error, or produces incomplete output — the other eleven agents keep running. You can rerun a single failed Part with &lt;code&gt;--part=N&lt;/code&gt; without regenerating the entire book.&lt;/p&gt;
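&lt;p&gt;The isolation behavior can be sketched like this (&lt;code&gt;runPartAgent&lt;/code&gt; stands in for the real per-Part generation function, and the simulated failure is illustrative):&lt;/p&gt;

```typescript
// Sketch of the failure isolation described above. runPartAgent is a
// stand-in for the real per-Part generation function; Part 3's failure
// is simulated for illustration.
async function runPartAgent(partNumber: number) {
  if (partNumber === 3) throw new Error("turn limit reached"); // simulated failure
  return `chapters for Part ${partNumber}`;
}

async function generateAll(parts: number[]) {
  // allSettled never rejects: each agent's success or failure is
  // recorded independently, so one crash can't take down the rest.
  const settled = await Promise.allSettled(parts.map((n) => runPartAgent(n)));
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? `Part ${parts[i]}: ok`
      : `Part ${parts[i]}: FAILED (${result.reason.message}); rerun with --part=${parts[i]}`
  );
}

generateAll([1, 2, 3, 4]).then((report) => report.forEach((line) => console.log(line)));
```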
&lt;p&gt;The parallelism is the key performance insight. A single agent writing all 82 chapters of the Sampo Handbook sequentially would take many hours. Twelve agents writing in parallel, each handling 7-8 chapters, complete the entire book in roughly 45 minutes to an hour of wall-clock time. The agents don't share state or coordinate — they work independently, which is what makes parallelism straightforward.&lt;/p&gt;
&lt;h3&gt;The Style Guide Problem&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-handbook-generation/lattice-handbook-cover.jpg" alt="Cover of The Lattice Handbook — A Comprehensive Guide to the Lattice Programming Language, showing a crystalline lattice structure on a deep purple background" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Each handbook has a distinct voice, defined in a &lt;code&gt;CLAUDE.md&lt;/code&gt; file that the agent reads before starting:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Ballistics Engine Handbook&lt;/strong&gt;: "Technical, authoritative, and practical. Inspired by O'Reilly's Definitive Guide series." Use real cartridge data in every example. Show the physics. Include safety warnings for anything involving pressure or load data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Lattice Handbook&lt;/strong&gt;: "Conversational, precise, and playful. Inspired by Why's (Poignant) Guide to Ruby and Eloquent Ruby." Use chemistry and materials science metaphors for the phase system — values are materials that can be fluid or crystallized, freezing is literally crystallization.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Sampo CPU Handbook&lt;/strong&gt;: "Technical, authoritative, and hands-on. Think of it as a lab notebook that became a textbook." Show both hex and binary for instruction encodings. Use real code from the project — never invent hypothetical assembly or Verilog.&lt;/p&gt;
&lt;p&gt;Maintaining consistent voice across 10-12 agents writing simultaneously is a genuine challenge. Each agent reads the same style guide, but interpretation varies. What works: detailed, specific instructions with concrete examples of what to do and what not to do. All three guides include an explicit list of banned words — "simple," "easy," "trivial," "obviously," "just" — because those words make struggling readers feel bad and they're the first thing an LLM reaches for when transitioning between concepts.&lt;/p&gt;
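&lt;p&gt;A banned-word list is also mechanically checkable. A small lint pass over the generated chapters (the word list comes from the style guides above; the script itself is illustrative, not part of the actual pipeline) might look like:&lt;/p&gt;

```typescript
// Illustrative lint pass: flag style-guide banned words in generated
// chapter text. The word list is taken from the style guides described
// in the article.
const BANNED = ["simple", "easy", "trivial", "obviously", "just"];

function findBannedWords(text: string): { word: string; count: number }[] {
  return BANNED.map((word) => {
    // Whole-word, case-insensitive match.
    const pattern = new RegExp(`\\b${word}\\b`, "gi");
    const matches = text.match(pattern);
    return { word, count: matches === null ? 0 : matches.length };
  }).filter((hit) => hit.count > 0);
}

const sample = "This is simple: just call the function. Obviously.";
console.log(findBannedWords(sample));
```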
&lt;p&gt;What doesn't work: vague instructions like "be conversational" or "keep it engaging." Every agent interprets those differently. The Lattice Handbook's metaphor system — where the phase-based type system is described using chemistry analogies — required explicit instructions: "Values are materials. Freezing is crystallization. Thawing is melting. Arenas are regions where crystals are stored." Without that specificity, some agents would use the metaphors and others wouldn't, and the book would feel like it had multiple authors — which, in a sense, it does.&lt;/p&gt;
&lt;p&gt;The "no AI self-reference" rule is also critical. The style guide explicitly states: "Book content must read as if written entirely by the author — no references to AI assistance." Without this, agents occasionally produce meta-commentary about their own generation process, which breaks immersion.&lt;/p&gt;
&lt;h3&gt;What Goes Wrong&lt;/h3&gt;
&lt;p&gt;An honest assessment of failure modes:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hallucinated APIs.&lt;/strong&gt; Despite source-awareness, agents sometimes invent function signatures, CLI flags, or configuration options that don't exist. This is the most dangerous failure mode because it reads authoritatively. The mitigation — explicit source file references — reduces but doesn't eliminate it. Every chapter needs a review pass where someone checks that the referenced functions and flags actually exist.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Uneven depth.&lt;/strong&gt; Some chapters come out thin — hitting the minimum viable content but lacking the depth a handbook reader expects. Others balloon beyond their page target with redundant examples. Page targets in the chapter definitions help, but agents treat them as loose guidelines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cross-reference gaps.&lt;/strong&gt; The agent writing Part III doesn't know exactly what the agent writing Part VIII said. Each agent gets the full table of contents for awareness, but not the actual content of other Parts. This means cross-references are sometimes vague ("as we'll see in Chapter 25") or occasionally contradictory. The LaTeX &lt;code&gt;\cref{}&lt;/code&gt; system helps — agents insert labels and cross-references that at least compile correctly — but semantic consistency across Parts requires a human review pass.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LaTeX formatting inconsistencies.&lt;/strong&gt; Different agents make different choices about when to use &lt;code&gt;\begin{notebox}&lt;/code&gt; vs. &lt;code&gt;\begin{tipbox}&lt;/code&gt;, how to format code listings, whether to put output inline or in a separate listing. The style guide constrains this, but the variation is noticeable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The confident-but-wrong problem.&lt;/strong&gt; AI writes with unwavering authority about implementation details it misread. An agent might open a Rust file, misinterpret a match arm, and write a paragraph confidently explaining behavior that the code doesn't actually produce. This is the hardest failure to catch because the prose sounds correct and references real source files — you have to actually trace the logic to find the error.&lt;/p&gt;
&lt;p&gt;The regeneration workflow handles most of these: rerun a single Part with &lt;code&gt;--part=N&lt;/code&gt;, review the output, iterate. A full regeneration of one Part takes about five minutes — fast enough to make iterative refinement practical.&lt;/p&gt;
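&lt;p&gt;A sketch of how that flag selection might work (the &lt;code&gt;--part=N&lt;/code&gt; flag is from the workflow above; the parsing details are illustrative):&lt;/p&gt;

```typescript
// Illustrative sketch of --part=N handling: pick one Part to
// regenerate, or all of them when the flag is absent. Parsing details
// are assumed, not taken from the actual generate.mts.
function selectParts(argv: string[], totalParts: number): number[] {
  const all = Array.from({ length: totalParts }, (_, i) => i + 1);
  const flag = argv.find((a) => a.startsWith("--part="));
  if (flag === undefined) return all; // no flag: regenerate the whole book
  const n = Number(flag.slice("--part=".length));
  if (!all.includes(n)) {
    throw new Error(`--part must be an integer from 1 to ${totalParts}`);
  }
  return [n]; // rerun a single failed Part
}

console.log(selectParts(["--part=5"], 12)); // selects only Part 5
```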
&lt;h3&gt;Results and Numbers&lt;/h3&gt;
&lt;p&gt;Across the three handbooks:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Ballistics Engine&lt;/th&gt;
&lt;th&gt;Lattice&lt;/th&gt;
&lt;th&gt;Sampo CPU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;641&lt;/td&gt;
&lt;td&gt;868&lt;/td&gt;
&lt;td&gt;871&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chapters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9 + Appendices&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;11 + Appendices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel Agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;Verilog/Rust/Assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Style&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;O'Reilly Guide&lt;/td&gt;
&lt;td&gt;Why's Poignant&lt;/td&gt;
&lt;td&gt;Lab Notebook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The model for all three is Claude Opus 4.6. Total generation time per handbook is roughly 45-60 minutes wall-clock with parallel agents — compared to what would be 8+ hours running sequentially.&lt;/p&gt;
&lt;h4&gt;What This Costs&lt;/h4&gt;
&lt;p&gt;All of this work was done on a Claude Max subscription at $200/month. At that tier, you get access to Opus 4.6 through Claude Code with what Anthropic describes as "significantly higher" usage limits than the $20 Pro plan. How much higher? That's where things get vague.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-handbook-generation/claude-subscription-usage.png" alt="The Claude usage settings page showing plan usage limits — a session meter at 4% used, a weekly 'All models' meter at 66% used, a 'Sonnet only' meter at 1% used, and an 'Extra usage' toggle with \$0.00 spent. No token counts, no rate limits, no concrete units — just percentages of an unstated total." style="max-width: 100%; margin: 1em 0 1.5em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Anthropic doesn't publish concrete token limits or rate caps for Max. The pricing page says you get "9x more usage" than Pro — but 9x of what? The Pro plan's limits are themselves unstated. You get a usage meter in the interface that fills up and eventually throttles you, but there's no documentation of what the meter measures, how it maps to tokens, or what the actual ceiling is. When you hit the limit, you're told to wait or upgrade. The $100/month tier exists between Pro and Max, and Anthropic is equally vague about how it differs from either.&lt;/p&gt;
&lt;p&gt;In practice, the Max subscription was sufficient to generate all three handbooks — 2,400 pages of content produced by dozens of parallel agent sessions running Opus 4.6 — within a single billing cycle, without hitting throttling that blocked the work. Whether that's representative of the limit or I happened to stay under it, I genuinely don't know. Anthropic's refusal to publish concrete limits makes it impossible to do the math in advance. You can't calculate cost-per-page or tokens-per-dollar because the denominator is secret.&lt;/p&gt;
&lt;p&gt;This is a strange posture for a company selling a product. The $200/month price point positions Max as a professional tool — something you'd expense to a business or justify as a productivity investment. Professional tools come with specs. You know how many build minutes your CI plan includes. You know how many API calls your database tier supports. You know how many seats your Slack plan covers. Anthropic is asking for $200/month and answering the question "what do I get for that?" with essentially "a lot — trust us."&lt;/p&gt;
&lt;p&gt;For what it's worth, the alternative would have been the API, where pricing is transparent: Opus 4.6 runs roughly $15 per million input tokens and $75 per million output tokens. Back-of-envelope math suggests that generating a single 800-page handbook through the API — with all the source file reading, prompt construction, and chapter output — would consume something in the range of several hundred dollars of tokens. Three handbooks would plausibly run $500-1,000+ through direct API billing. If that estimate is in the right ballpark, the Max subscription is a genuine bargain for this kind of heavy-generation workload — you just have to take it on faith because Anthropic won't show you the numbers.&lt;/p&gt;
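&lt;p&gt;The back-of-envelope math can be made explicit. The per-token prices are the Opus rates quoted above; the token counts are guesses for illustration, not measurements:&lt;/p&gt;

```typescript
// Back-of-envelope API cost sketch. The per-million-token prices are
// the Opus rates quoted in the text; the token counts below are rough
// assumptions, not measured values.
const INPUT_PER_MILLION = 15;  // USD per million input tokens
const OUTPUT_PER_MILLION = 75; // USD per million output tokens

function estimateHandbookCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_PER_MILLION
  );
}

// Assumed: an 800-page handbook is on the order of 1.5M output tokens
// (about 500 words per page), with perhaps 10M input tokens of source
// reading and prompt context across all agents. Both are guesses.
const perBook = estimateHandbookCost(10_000_000, 1_500_000);
console.log(`about $${perBook.toFixed(0)} per handbook, $${(3 * perBook).toFixed(0)} for three`);
```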
&lt;p&gt;For comparison against the alternative: a professional technical writer producing this volume of deeply technical content — requiring them to understand exterior ballistics physics, or compiler internals, or CPU microarchitecture — would represent months of full-time work at rates that would make the API costs look trivial. The AI-generated output is a first draft, not a finished product. But it's a first draft that covers the full scope, references real source code, and provides a structure that would take weeks to produce manually.&lt;/p&gt;
&lt;h3&gt;What This Means for Documentation&lt;/h3&gt;
&lt;p&gt;This connects to a theme I've been writing about in the &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox series&lt;/a&gt;: documentation is a classic example of latent demand suppressed by cost.&lt;/p&gt;
&lt;p&gt;Most open-source projects have mediocre documentation because good documentation is expensive to produce. A solo maintainer choosing between implementing features and writing a 600-page handbook will choose features every time. The handbook doesn't get written — not because it wouldn't be valuable, but because the cost of producing it exceeds the maintainer's available time.&lt;/p&gt;
&lt;p&gt;If handbook generation becomes cheap enough, every serious project gets one. The total volume of technical documentation doesn't decrease — it explodes. And the human role shifts from production to curation. The expensive work isn't writing 600 pages anymore. It's defining the structure — deciding what the book should cover, in what order, at what depth. It's reviewing the output for accuracy — catching hallucinated APIs, verifying that code examples actually run, ensuring cross-references are coherent. It's editing for voice — making sure the playful tone of the Lattice Handbook doesn't lapse into the authoritative register of the Ballistics Handbook.&lt;/p&gt;
&lt;p&gt;This is Jevons in miniature. Cheaper documentation doesn't mean less documentation work. It means more documentation exists, and humans focus on the higher-judgment parts: structure, accuracy, and editorial voice.&lt;/p&gt;
&lt;h3&gt;The Framework Is the Product&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://baud.rs/mKMEFE"&gt;&lt;code&gt;generate.mts&lt;/code&gt; pattern&lt;/a&gt; is reusable. The same architecture — define a book structure in TypeScript, launch parallel agents with source code access, collect LaTeX output — applies to any project with a codebase and a desired handbook.&lt;/p&gt;
&lt;p&gt;The bottleneck isn't the AI's ability to write. It's the human's ability to define what the handbook should contain and whether the output is correct. Defining the structure for the Sampo Handbook — 11 Parts, 82 chapters, hundreds of sections, source file references for each — took longer than running the generation. Reviewing and correcting the output takes longer than generating it.&lt;/p&gt;
&lt;p&gt;That bottleneck is itself a Jevons observation. When the cost of producing prose drops to near zero, the scarce input becomes human judgment about what the prose should say and whether it's right. The generation is the cheap part. The thinking is the expensive part. As it always has been.&lt;/p&gt;</description><category>ai</category><category>claude agent sdk</category><category>claude code</category><category>documentation</category><category>handbooks</category><category>latex</category><category>opus 4.6</category><category>parallel agents</category><category>technical writing</category><guid>https://tinycomputers.io/posts/generating-technical-handbooks-with-ai.html</guid><pubDate>Wed, 04 Mar 2026 17:00:00 GMT</pubDate></item><item><title>The AI Vampire Is Jevons Paradox</title><link>https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-ai-vampire-is-jevons-paradox_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;15 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/burne-jones-the-vampire-1897.jpg" alt="The Vampire, an 1897 painting by Philip Burne-Jones depicting a pale woman draped over a prostrate man — the visual origin of the vampire as metaphor for extraction" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Steve Yegge's &lt;a href="https://baud.rs/dJwDgQ"&gt;"The AI Vampire"&lt;/a&gt; has been circulating among developers and managers for the past few weeks, and it's striking a nerve. The core argument: AI makes you dramatically more productive — Yegge estimates 10x or more — but companies capture the entire surplus. You don't get a shorter workday. You get 10x the output at the same hours, with the cognitive load compressed into pure decision-making. The result is burnout on a scale the industry hasn't seen before. His prescription is blunt: calculate your $/hr, work three to four hours a day, and refuse to let the vampire drain you dry.&lt;/p&gt;
&lt;p&gt;It's a compelling piece, written with Yegge's characteristic directness and self-awareness. And it describes something real. But as I read it, I kept seeing something he doesn't name — a pattern I've been writing about for months.&lt;/p&gt;
&lt;p&gt;This is the fourth piece in what has become a series on Jevons Paradox and AI economics. The &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;first&lt;/a&gt; traced the paradox through the semiconductor industry. The &lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;second&lt;/a&gt; argued that AI displacement scenarios systematically undercount demand expansion. The &lt;a href="https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html"&gt;third&lt;/a&gt; explored what happens when the cost of intelligence follows a Moore's Law trajectory. Along the way, I responded to &lt;a href="https://tinycomputers.io/posts/something-big-is-happening-a-critique.html"&gt;Matt Shumer's displacement argument&lt;/a&gt; with the same framework.&lt;/p&gt;
&lt;p&gt;Those pieces all looked at the macro picture — markets expanding, new industries forming, total economic activity growing. Yegge is describing the micro picture. What it actually feels like to be a human worker inside a Jevons expansion. And what he's describing, whether he uses the term or not, is Jevons Paradox operating on human attention.&lt;/p&gt;
&lt;h3&gt;The Jevons Pattern, One More Time&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/meunier-descent-of-miners-1882.jpg" alt="Descent of the Miners into the Shaft, an 1882 painting by Constantin Meunier showing coal miners descending into a mine — the human beings at the point of production in the original Jevons cycle" style="max-width: 100%; margin: 0 0 1.5em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;The pattern is simple enough to state in a sentence: when a critical input gets cheaper, demand expands beyond the efficiency gain. Total consumption of the input rises, not falls.&lt;/p&gt;
&lt;p&gt;Coal got cheaper per unit of useful work. Total coal consumption surged as new applications became viable. Transistors got cheaper per unit of compute. Total compute spending grew by orders of magnitude. Bandwidth got cheaper per unit of data. Total data consumption exploded. The per-unit savings are overwhelmed by the explosion in total units demanded.&lt;/p&gt;
&lt;p&gt;In my previous pieces, I applied this at the macro level. Cognitive output gets cheaper through AI. New industries emerge. Demand for cognitive work expands. The economy restructures around abundant, cheap intelligence. That argument is about markets, GDP, and employment categories — the aerial view.&lt;/p&gt;
&lt;p&gt;But Jevons has always had a micro counterpart. When coal got cheaper, individual mines didn't shut down early — they ran harder, longer, extracting more because the economics now justified it. When compute got cheaper, individual developers didn't write less code — they wrote vastly more, because the constraints that had limited what was practical evaporated. The expansion creates pressure at every level of the system, not just at the top.&lt;/p&gt;
&lt;p&gt;The macro story is about new markets forming. The micro story is about what happens to the people at the point of production — the ones whose labor is the input that just got cheaper.&lt;/p&gt;
&lt;h3&gt;What Yegge Is Actually Describing&lt;/h3&gt;
&lt;p&gt;Yegge's framework centers on a value-capture trap. He presents two scenarios:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scenario A:&lt;/strong&gt; AI makes you 10x more productive. Your company captures the surplus. You now produce 10x the output at the same salary and hours. The company benefits. You burn out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scenario B:&lt;/strong&gt; You recognize the $/hr math. If you were worth $150/hr before AI and now produce 10x the output, your effective rate should be $1,500/hr — or equivalently, you should work one-tenth the hours for the same salary. You work three to four hours a day, produce what used to take a full day, and keep your sanity.&lt;/p&gt;
&lt;p&gt;He frames this as a choice between being exploited and being strategic. And he's honest about the difficulty of Scenario B — most people can't negotiate a three-hour workday, most companies won't accept it, and the competitive dynamics push relentlessly toward Scenario A.&lt;/p&gt;
&lt;p&gt;Yegge's most vivid metaphor is that "AI has turned us all into Jeff Bezos." At Amazon, Bezos sat atop a machine that handled volume — logistics, warehousing, customer service, shipping — while he focused exclusively on high-leverage decisions. AI does the same thing for individual workers. It absorbs the volume work — the boilerplate code, the routine analysis, the standard responses — and leaves you with a residue of pure judgment calls. Every decision is consequential. Every hour is cognitively expensive.&lt;/p&gt;
&lt;p&gt;He also has an important moment of self-awareness. Yegge acknowledges that his own experience — forty years of engineering, unlimited AI tokens, deep familiarity with the tools — represents "unrealistic beauty standards" for the average developer. He's the equivalent of the fitness influencer whose workout routine is their full-time job. Most people don't have his context, his autonomy, or his leverage to negotiate Scenario B.&lt;/p&gt;
&lt;p&gt;And he identifies a crucial accelerant: the startup gold rush. AI has made it cheap enough to launch a company that "a million founders are chasing the same six ideas." This intensifies competition, which intensifies the pressure to push the output dial higher, which feeds the vampire.&lt;/p&gt;
&lt;h3&gt;The Jevons Connection&lt;/h3&gt;
&lt;p&gt;Here's what Yegge is describing in Jevons terms.&lt;/p&gt;
&lt;p&gt;AI makes cognitive output dramatically cheaper. Jevons predicts that demand won't fall in response — it will increase. That's exactly what happens. Companies don't say "same output, fewer hours." They say "10x the output, same hours." The efficiency gain doesn't reduce consumption of the input. It increases consumption. This is the paradox, and it is playing out precisely as the model predicts.&lt;/p&gt;
&lt;p&gt;But there's something different about this Jevons cycle — something that doesn't have a precedent in the historical cases.&lt;/p&gt;
&lt;p&gt;Coal doesn't get tired. Transistors don't burn out. Bandwidth doesn't need a nap. Every prior Jevons cycle involved an inert input. You could mine more coal, fabricate more chips, lay more fiber. When demand expanded, supply expanded to meet it, and the system found a new equilibrium at higher volume. The input didn't resist. It didn't have a biological ceiling.&lt;/p&gt;
&lt;p&gt;Human attention does.&lt;/p&gt;
&lt;p&gt;AI creates a concentration effect that Yegge describes precisely: it absorbs high-volume, routine work and leaves humans with a residue of pure judgment. The judgment work is, by definition, the most cognitively expensive kind of work — the kind that requires deep focus, contextual understanding, and the willingness to be wrong. And demand for this judgment work expands Jevons-style as AI makes the overall process cheaper. More projects get launched. More code gets written. More decisions need to be made. The volume of judgment calls scales with the volume of output, even as AI handles everything else.&lt;/p&gt;
&lt;p&gt;The problem is that the biological supply of deep, focused judgment is fixed. The deep work literature — Cal Newport and others have documented this extensively — converges on roughly three to four hours per day as the upper bound for sustained, cognitively demanding work. This isn't a cultural preference or a lifestyle choice. It's a constraint imposed by neurobiology. Attention is a depletable resource that recovers on a fixed biological schedule.&lt;/p&gt;
&lt;p&gt;This is the first Jevons cycle where expanding demand hits a hard biological ceiling on the input.&lt;/p&gt;
&lt;p&gt;Yegge's startup observation is also a Jevons phenomenon. AI made starting a company cheaper, so the number of startups exploded. More startups means more competition. More competition means more pressure to maximize output per person. The expansion creates its own acceleration — a feedback loop where cheaper cognitive output produces more ventures, which produce more demand for cognitive output, which increases the pressure on the humans in the loop.&lt;/p&gt;
&lt;p&gt;And the "unrealistic beauty standards" problem has a Jevons name too: it's the efficiency benchmark effect. In every Jevons cycle, the most efficient user of the cheaper input sets the competitive pace for everyone else. The factory that adopted steam power first forced every competitor to adopt it or die. The company that adopted AI first forces every competitor to match its output-per-employee or lose. Yegge, with his forty years and unlimited tokens, is the equivalent of the first factory with a Watt engine. His output level becomes the standard against which everyone is measured — even though most people can't replicate his efficiency.&lt;/p&gt;
&lt;h3&gt;Where the Ceiling Matters&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/coal-thrusters-trapper-1854.jpg" alt="Two coal thrusters and a trapper in a British coal mine, from J. C. Cobden's White Slaves of England, 1854 — the human cost of running an input at maximum extraction" style="float: left; max-width: 40%; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;In every prior Jevons cycle, the resolution was supply expansion. Coal demand surged — mine more coal. Compute demand surged — fabricate more chips. Bandwidth demand surged — lay more fiber. The system found equilibrium at higher volume because the input could scale.&lt;/p&gt;
&lt;p&gt;Human cognitive capacity doesn't scale. You can't mine more judgment. You can't fabricate more attention. The three-to-four-hour ceiling on deep work isn't going to move because a company's OKRs demand it.&lt;/p&gt;
&lt;p&gt;This means a Jevons expansion in demand for human judgment has to resolve differently than prior cycles. There are really only three paths:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Better tooling that reduces the judgment burden.&lt;/strong&gt; AI gets good enough to handle more decisions autonomously, pushing the human-in-the-loop threshold higher. The frontier of what requires human judgment retreats as AI capability advances. This is already happening — the boundary between "AI can handle this" and "a human needs to decide" is moving rapidly. But it's not moving fast enough to outpace the demand expansion, which is why Yegge's burnout observation is accurate right now even if the long-term trajectory favors less human involvement.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Organizational restructuring.&lt;/strong&gt; More people, fewer high-stakes decisions each. Instead of one developer making judgment calls on 10x the output, you have three developers each handling a manageable portion. This is the "hire more" response, and it pushes back against the cost-reduction motive that drives Scenario A. Companies that pursue this path may produce better outcomes but at higher cost, which competitive dynamics tend to punish.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cultural pushback.&lt;/strong&gt; Yegge's $/hr formula. Workers internalize the fixed-supply economics of their own attention, price it accordingly, and refuse to let demand expansion drain it below sustainable levels. This is individually rational but collectively difficult — it requires either enough leverage to negotiate, or enough cultural shift to change expectations.&lt;/p&gt;
&lt;p&gt;Yegge's $/hr formula is, in Jevons terms, an attempt to set an equilibrium price for a fixed-supply resource. It is the cognitive equivalent of OPEC production quotas — an effort to prevent the price of a scarce input from being driven to zero by unconstrained demand. And like OPEC quotas, it works only if enough participants enforce it.&lt;/p&gt;
&lt;h3&gt;What This Means for the Macro Picture&lt;/h3&gt;
&lt;p&gt;I want to be honest about what Yegge's observation adds to the framework I've been building.&lt;/p&gt;
&lt;p&gt;My previous pieces argued that when cognitive output gets cheaper, demand expansion will create new economic activity that exceeds the displacement. I stand by that argument. But I underweighted the human-in-the-loop constraint. The demand expansion is real — new markets form, new companies launch, total economic activity grows. But every unit of that expanded activity still requires some quantum of human judgment, and that judgment runs on biological hardware with a fixed daily capacity.&lt;/p&gt;
&lt;p&gt;This doesn't invalidate the macro Jevons argument. Demand will expand. New industries will form. Total employment will restructure, not collapse. But the human attention constraint acts as a speed governor on the expansion. The economy can't scale cognitive output infinitely by just pushing the existing workforce harder, because the existing workforce has a biological ceiling on the input that matters most.&lt;/p&gt;
&lt;p&gt;This argues for Yegge's three-to-four-hour workday not as a lifestyle aspiration but as something closer to an economic inevitability — the natural equilibrium point for a Jevons cycle operating on a fixed-supply input. When demand for an input exceeds the maximum sustainable rate of supply, the system must either find a substitute (AI handling more decisions autonomously), expand the supplier base (more workers, shorter hours each), or accept a constrained equilibrium (the three-hour workday). Some combination of all three is likely.&lt;/p&gt;
&lt;p&gt;The interesting implication is that the Jevons expansion and the burnout crisis are not contradictory phenomena. They're the same phenomenon viewed from different vantage points. The macro analyst sees demand expanding and new economic activity forming. The individual worker sees an unsustainable cognitive load. Both are correct. They're describing different aspects of the same system adjusting to a radically cheaper input.&lt;/p&gt;
&lt;h3&gt;The Vampire and the Paradox&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ai-vampire-jevons/nosferatu-count-orlok-1922.jpg" alt="Max Schreck as Count Orlok in Nosferatu, 1922 — the vampire as an image of relentless, impersonal extraction" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Matt Shumer &lt;a href="https://tinycomputers.io/posts/something-big-is-happening-a-critique.html"&gt;worries about displacement&lt;/a&gt; — losing your job to AI. Steve Yegge worries about what happens to the people who aren't displaced — who keep their jobs but get vampired. Both are describing real phenomena. Neither is the whole picture.&lt;/p&gt;
&lt;p&gt;The Jevons framework encompasses both. Demand expansion creates new work, answering Shumer's displacement concern — the economy doesn't contract, it restructures. But the expansion concentrates cognitive load on the humans who remain in the loop, confirming Yegge's burnout observation — because the one input AI can't replace is the one input that can't scale.&lt;/p&gt;
&lt;p&gt;Shumer's error is modeling only the displacement side. Yegge's error is modeling only the extraction side. The full picture includes both: an economy producing vastly more cognitive output, creating genuinely new economic activity, while simultaneously pushing the humans at the center of it toward a biological wall.&lt;/p&gt;
&lt;p&gt;The vampire is real. It's also, like every Jevons cycle, a signal that something genuinely new is being created — that demand is expanding into territory that didn't exist before. The burnout isn't incidental to the expansion. It's a symptom of it. And like every prior Jevons cycle, the system will find an equilibrium — not because anyone plans it, but because a fixed-supply input eventually forces one. The question is how much damage the vampire does before we get there.&lt;/p&gt;</description><category>ai</category><category>burnout</category><category>critique</category><category>demand expansion</category><category>economics</category><category>jevons paradox</category><category>labor</category><category>productivity</category><category>steve yegge</category><category>technology</category><guid>https://tinycomputers.io/posts/the-ai-vampire-is-jevons-paradox.html</guid><pubDate>Wed, 04 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Something Big Is Happening — And Something Big Is Missing</title><link>https://tinycomputers.io/posts/something-big-is-happening-a-critique.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/something-big-is-happening-a-critique_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;18 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Matt Shumer's &lt;a href="https://baud.rs/POg6A7"&gt;"Something Big Is Happening"&lt;/a&gt; has been making the rounds — forwarded by founders, reposted by VCs, shared by worried parents and recent graduates. If you haven't read it, the core argument is straightforward: AI capabilities are advancing at an unprecedented pace, the public doesn't appreciate how fast things are moving, and roughly half of entry-level white-collar jobs will be displaced within one to five years. He frames this as a personal warning to the non-technical people in his life, drawing an explicit parallel to February 2020 — the moment before COVID when the warnings were there but most people weren't listening.&lt;/p&gt;
&lt;p&gt;It is a well-written, earnest piece, and it resonated for a reason. The capability gains are real. The perception gap is real. The practical advice is genuinely useful. Shumer deserves credit for engaging seriously with a question that most people in his position — CEO of an AI company — have financial incentives to either hype or deflect.&lt;/p&gt;
&lt;p&gt;But the piece has a hole in the center of it, and it's the same hole that appears in nearly every AI displacement argument I've encountered. I've written about this through the lens of &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt;, explored it as a &lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;direct counter-thesis to displacement scenarios&lt;/a&gt;, and examined what happens when you &lt;a href="https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html"&gt;apply Moore's Law to the cost of intelligence itself&lt;/a&gt;. The pattern is consistent, and Shumer's piece reproduces the analytical error at its core: it models what AI replaces without modeling what AI creates.&lt;/p&gt;
&lt;h3&gt;The Steelman&lt;/h3&gt;
&lt;p&gt;Before critiquing the piece, I want to present its strongest version in good faith — because Shumer gets several important things right, and dismissing the argument wholesale would be intellectually lazy.&lt;/p&gt;
&lt;p&gt;The capability curve is real. METR benchmarks show AI task completion doubling roughly every seven months, possibly accelerating. Shumer cites this data, and it's legitimate. I've experienced the curve firsthand. Over the past year and a half, I've built a &lt;a href="https://tinycomputers.io/posts/open-sourcing-a-high-performance-rust-based-ballistics-engine.html"&gt;high-performance Rust-based ballistics engine&lt;/a&gt; and &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;Lattice, an entire programming language&lt;/a&gt; with a novel phase-based type system — working across GPT-4, GPT-4o, Claude Haiku, Opus, and most recently Opus 4.6 with Claude Code. The progression itself is the data point. Early models could help with fragments. Today's frontier models reason across thousands of lines of interconnected code — tracking type systems, managing compiler passes, understanding how changes in one module ripple through the rest. These aren't toy demos. They're production-quality projects where the AI operated at the architectural level. The capability gap between late 2024 and early 2026 is genuinely striking.&lt;/p&gt;
&lt;p&gt;The perception gap is real too. Shumer makes a point that doesn't get enough attention: most people's experience with AI is limited to free-tier models that lag frontier capabilities by twelve months or more. Someone who tried ChatGPT once in 2024 and found it mediocre is extrapolating from hardware that's already obsolete. The gap between the free experience and the paid frontier experience is larger than most people realize, and Shumer is right to flag it.&lt;/p&gt;
&lt;p&gt;The self-improvement feedback loops are real. OpenAI has stated that GPT-5.3 Codex was "instrumental in creating itself." Anthropic's training pipeline uses prior Claude models to evaluate training examples. Dario Amodei predicts AI autonomously building next-generation versions within one to two years. These aren't speculative claims — they're descriptions of current practice, and they compress the improvement timeline.&lt;/p&gt;
&lt;p&gt;Shumer's practical advice is sound: use paid tools, select the best available models, spend an hour a day experimenting, build financial resilience, develop adaptability as a core skill. This is good counsel regardless of how the macro picture unfolds.&lt;/p&gt;
&lt;p&gt;And the urgency is not manufactured. Whatever you think the economic consequences will be, the pace of capability improvement is unprecedented in the history of technology. Shumer is right that most people are not paying attention. Where he goes wrong is in what he concludes from that observation.&lt;/p&gt;
&lt;h3&gt;The Substitution Fallacy&lt;/h3&gt;
&lt;p&gt;Here is Shumer's core analytical error — and the one that most critiques of his piece also miss.&lt;/p&gt;
&lt;p&gt;He treats "AI can do X" as equivalent to "AI will replace all humans doing X." His piece moves through a list of job categories — legal work, financial analysis, software engineering, medical analysis, customer service — and for each one, the logic is: AI can now perform this work at a level that rivals human professionals, therefore the humans performing this work are at risk. Implicit in this framing is the assumption that the economy stays roughly the same size, with machines doing work that humans used to do. The number of legal analyses needed stays constant. The number of financial models stays constant. The amount of software stays constant. AI just does it cheaper.&lt;/p&gt;
&lt;p&gt;This is the substitution frame, and it has been wrong by orders of magnitude at every prior technological inflection point.&lt;/p&gt;
&lt;p&gt;I explored this in detail in my &lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;Jevons counter-thesis&lt;/a&gt;. The mechanism is straightforward: when a critical input becomes dramatically cheaper, the addressable market for everything that uses that input expands. New use cases emerge that were previously uneconomical. Existing use cases scale to populations that were previously priced out. Total consumption of the now-cheaper input rises even as the per-unit cost falls.&lt;/p&gt;
&lt;p&gt;The numbers on latent demand are not speculative. Roughly 80% of Americans who need legal help cannot afford it. Personalized tutoring is a luxury good — $50 to $100 per hour puts it out of reach for the average family. Custom software development, at $50,000 or more per engagement, is inaccessible to most small businesses. Personalized financial planning is available only to households with six-figure investable assets. These aren't hypothetical markets. They are documented, unmet demand suppressed by the cost of the human intelligence required to serve them.&lt;/p&gt;
&lt;p&gt;When Shumer writes that his lawyer friend finds AI "rivals junior associates" for contract review and legal research, the Jevons question is: what happens when legal analysis costs one-hundredth what it costs today? The answer isn't "lawyers lose their jobs." It's "hundreds of millions of people who currently have zero legal representation suddenly have access to it." The total volume of legal analysis performed doesn't shrink. It explodes. Whether that explosion employs as many human lawyers as today is a genuine question — but it's a very different question from "AI replaces lawyers," and Shumer's piece never asks it.&lt;/p&gt;
&lt;p&gt;The same logic applies to every category on his list. If financial modeling becomes 100x cheaper, every small business gets CFO-grade analysis — a market expansion of orders of magnitude relative to the current financial services industry. If software development becomes 100x cheaper, the barrier between "person with an idea" and "working application" functionally disappears, and the total volume of software produced doesn't shrink — it expands to include millions of applications that nobody would build at current costs.&lt;/p&gt;
&lt;h3&gt;The Pandemic Analogy Problem&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/something-big-critique/taylor-power-loom-1862.jpg" alt="W.G. Taylor's Patent Power Loom Calico Machine, an 1862 engraving showing Victorian-era visitors in top hats and crinolines admiring an industrial power loom — technology as spectacle, observed without full understanding of its economic consequences" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;"Think back to February 2020." It's an emotionally effective opening, and it does exactly what Shumer intends — it activates the memory of a time when the warnings were there but most people didn't act until it was too late. As a rhetorical device, it works. As an analytical framework, it's misleading.&lt;/p&gt;
&lt;p&gt;COVID was a pure externality. It destroyed without creating. A virus doesn't generate new economic activity as it spreads. It imposes costs, disrupts supply chains, and kills people. The appropriate response was defensive: stockpile supplies, get vaccinated, stay home. The framing of individual survival — how do &lt;em&gt;I&lt;/em&gt; get through this — was correct for a pandemic because a pandemic doesn't create opportunity. It just destroys.&lt;/p&gt;
&lt;p&gt;Technology transitions are categorically different. They create as they destroy, and historically, the creation overwhelms the destruction. The better analogy — the one Shumer doesn't use — is the semiconductor revolution. Computing destroyed millions of clerical, typist, switchboard operator, and filing clerk jobs. It also created the software industry, the internet economy, the mobile ecosystem, social media, cloud computing, e-commerce logistics, and millions of roles that had no conceptual precursor in the prior economy. Total employment didn't shrink. It restructured and grew.&lt;/p&gt;
&lt;p&gt;The pandemic analogy does something else that's analytically costly: it frames the correct response as individual survival. How do &lt;em&gt;I&lt;/em&gt; prepare? How do &lt;em&gt;I&lt;/em&gt; stay ahead? This is the right question for a virus. It is the wrong question for a technology transition, where the correct frame is not "how do I survive displacement" but "what new things become possible?" Shumer's advice — use the tools, build adaptability, experiment daily — is good advice. But it's embedded in a survivalist frame that misses the larger economic picture. The person who learned to build websites in 1995 wasn't surviving the death of typesetting. They were participating in the creation of something that would be orders of magnitude larger than the industry it disrupted.&lt;/p&gt;
&lt;h3&gt;A Founder's Experience Is Not the Economy&lt;/h3&gt;
&lt;p&gt;"I describe what I want built, in plain English, and it just appears."&lt;/p&gt;
&lt;p&gt;I believe him. I've had similar experiences. When I built the ballistics engine and Lattice, there were moments where the workflow felt qualitatively different from anything I'd experienced in over three decades of writing software. The capability is real and it's striking.&lt;/p&gt;
&lt;p&gt;But Shumer is generalizing from the thinnest part of the adoption curve. A startup founder building prototypes with frontier AI tools is the absolute highest-leverage, lowest-friction use case for current technology. There are no compliance departments. No regulatory review. No integration with legacy systems built on COBOL. No liability frameworks that require a human signature. No union contracts. No procurement cycles measured in fiscal years.&lt;/p&gt;
&lt;p&gt;The gap between "a founder can build a prototype in an afternoon" and "a hospital deploys AI-driven diagnostics at scale" is measured in years, not months. Regulatory friction, institutional inertia, liability requirements, and cultural resistance are real. The FDA doesn't move at startup speed. Neither do insurance companies, government agencies, or school districts. These aren't trivial obstacles — they're the mechanisms through which society manages risk, and they exist for reasons.&lt;/p&gt;
&lt;p&gt;I don't want to overweight this argument. Institutional friction can be overstated, and appeals to regulation can become a way of avoiding engagement with the underlying capability shift. The important point is narrower: Shumer extrapolates from his personal productivity gain to "nothing done on a computer is safe," and that's an extrapolation error. A founder experiencing sudden personal leverage and projecting that curve onto civilization is a recognizable pattern in tech commentary. It's usually too bullish on the timeline and too narrow on the mechanism.&lt;/p&gt;
&lt;h3&gt;What Gets Created&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/something-big-critique/kaypro-ii-jerusalem-1984.jpg" alt="A boy using a Kaypro II computer running CP/M in Jerusalem, 1984 — at the beginning of a cost curve that would eventually put a supercomputer in every pocket" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;This is the biggest gap in Shumer's piece — and the biggest gap in most commentary on AI and employment. He spends thousands of words on what AI can replace. He spends zero words on what AI makes possible for the first time.&lt;/p&gt;
&lt;p&gt;I examined this through the &lt;a href="https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html"&gt;Moore's Law for Intelligence&lt;/a&gt; framework — the 10x / 100x / 1,000x staircase of what becomes viable as the cost per unit of machine intelligence drops. The historical pattern from semiconductors is unambiguous: each order-of-magnitude cost reduction didn't just make existing applications cheaper. It created entirely new categories of economic activity that were literally unimaginable at the prior price point.&lt;/p&gt;
&lt;p&gt;Nobody in 1975 predicted Instagram, Uber, or Spotify. Not because they required new physics — they required compute cheap enough to fit in a pocket. The applications were latent, waiting for the cost curve to reach them.&lt;/p&gt;
&lt;p&gt;The same is true for intelligence. We can identify the structural conditions for demand expansion even if we can't predict the specific applications:&lt;/p&gt;
&lt;p&gt;Small businesses with CFO-grade financial analysts — not because they hire CFOs, but because AI makes that analysis accessible at $50 per month instead of $200,000 per year. Personalized tutoring for every student — not an incremental improvement on existing education, but a qualitative shift in how learning works for the 95% of families who can't afford human tutors. Legal help for the 80% of Americans currently priced out. Preventive medicine embedded in every checkup, where an AI has read every relevant paper published in the last decade and cross-referenced it against the patient's complete history.&lt;/p&gt;
&lt;p&gt;And the nature of software engineering itself is changing — not replacing engineers but redefining the skill. The workflow is already shifting from "write code line by line" to "describe architecture, direct implementation, review output." At 100x cheaper inference, small teams build products that previously required departments. At 1,000x cheaper, the barrier between having an idea and having working software effectively disappears. That's not displacement of engineers — it's an explosion in the total volume of software that gets built, and it requires people who know what to build and why.&lt;/p&gt;
&lt;p&gt;We can't predict the Instagram of cheap cognition. But we can observe that the structural conditions — massive latent demand, rapidly falling costs, intense competition distributing gains to consumers — are identical to the conditions that preceded every prior wave of demand-driven economic expansion.&lt;/p&gt;
&lt;h3&gt;The Speed Question — Where Shumer Is Strongest&lt;/h3&gt;
&lt;p&gt;The legitimate uncertainty in Shumer's argument isn't whether displacement will happen. It will. The question is whether it happens faster than demand expansion can absorb.&lt;/p&gt;
&lt;p&gt;Prior Jevons cycles unfolded over decades. Agricultural mechanization displaced 90% of farm workers over a century. Computerization restructured white-collar work over roughly forty years. If AI compresses displacement into two to three years, the question of whether demand expansion keeps pace becomes genuinely urgent. This is where Shumer's urgency has teeth, and it's the argument I take most seriously.&lt;/p&gt;
&lt;p&gt;I was honest about this in both the &lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;counter-thesis&lt;/a&gt; and the &lt;a href="https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html"&gt;Moore's Law piece&lt;/a&gt;: the speed of this transition is unprecedented, and historical analogy doesn't fully resolve the timing question. The transitional pain for people whose livelihoods depend on cognitive labor that AI can replicate is real and potentially severe.&lt;/p&gt;
&lt;p&gt;But notice the asymmetry in Shumer's framing. Disruption happens at AI speed — step-function capability jumps, immediate adoption, rapid displacement. Demand expansion, on the other hand, is treated as essentially static or non-existent. The economy absorbs the shock and contracts. End of story.&lt;/p&gt;
&lt;p&gt;This asymmetry isn't supported by the evidence. The smartphone created a trillion-dollar app economy in under five years. Cloud computing spawned tens of thousands of SaaS companies within a decade. When a critical input becomes 100x cheaper, entrepreneurs move fast — because the profit opportunity is enormous. Shumer's own experience is evidence of this: he's a founder building products at unprecedented speed using AI tools. Scale that behavior across millions of entrepreneurs who suddenly have access to capabilities that were previously reserved for well-funded teams, and the demand side moves faster than any prior technology transition.&lt;/p&gt;
&lt;p&gt;The honest answer is that we don't know whether demand expansion will keep pace with displacement. That's a genuine uncertainty. But Shumer presents it as a foregone conclusion in one direction — displacement wins, full stop — and that's not an evidence-based position. It's a bet against the strongest empirical pattern in economic history.&lt;/p&gt;
&lt;h3&gt;What to Take from This&lt;/h3&gt;
&lt;p&gt;Shumer's practical advice to individuals is sound even if his macro analysis is incomplete. Use the tools. Build adaptability. Experiment daily. Don't ignore the capability curve — it's real, it's fast, and it will restructure how cognitive work gets done.&lt;/p&gt;
&lt;p&gt;But don't mistake a substitution-only model for the full picture. The most consistent empirical pattern in economic history is that when a critical input gets dramatically cheaper, total consumption increases and the economy restructures around the cheaper input. Coal. Transistors. Bandwidth. Lighting. Every time, the predictions that efficiency would destroy demand were wrong — not because displacement didn't happen, but because demand expansion overwhelmed it. Betting that this pattern has finally broken requires an extraordinary burden of proof that Shumer's piece — eloquent, urgent, and emotionally resonant as it is — does not meet.&lt;/p&gt;
&lt;p&gt;Something big &lt;em&gt;is&lt;/em&gt; happening. What's missing from the conversation is the other half of it.&lt;/p&gt;</description><category>ai</category><category>critique</category><category>demand expansion</category><category>displacement</category><category>economics</category><category>jevons paradox</category><category>labor</category><category>technology</category><category>white-collar</category><guid>https://tinycomputers.io/posts/something-big-is-happening-a-critique.html</guid><pubDate>Sun, 01 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Moore's Law for Intelligence: What Happens When Thinking Gets Cheap</title><link>https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;24 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/moores-law-intelligence/silicon-wafer.jpg" alt="A silicon wafer with an array of integrated circuit dies, the physical foundation of Moore's Law" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;I have written about &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox&lt;/a&gt; twice now — once through the history of the semiconductor industry, and once as a &lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;broader examination&lt;/a&gt; of what happens when the cost of a critical economic input collapses. The pattern is consistent: demand expands to overwhelm the savings. Coal. Transistors. Bandwidth. Lighting.&lt;/p&gt;
&lt;p&gt;Those pieces looked at the pattern itself. This one is different. I want to run a thought experiment forward, not backward.&lt;/p&gt;
&lt;p&gt;I've also spent a lot of time on this site looking backward at computing history — watching &lt;a href="https://tinycomputers.io/posts/stewart-cheifet-and-his-computer-chronicles.html"&gt;Stewart Cheifet walk viewers through the early personal computer revolution&lt;/a&gt; on &lt;em&gt;The Computer Chronicles&lt;/em&gt;, examining how &lt;a href="https://tinycomputers.io/posts/language-manipulators-what-a-1983-episode-of-the-computer-chronicles-got-right-and-wrong-about-word-processing.html"&gt;word processing went from a curiosity to a necessity&lt;/a&gt; in a single decade, tracing &lt;a href="https://tinycomputers.io/posts/george-morrow-pioneer-of-personal-computing.html"&gt;George Morrow's&lt;/a&gt; role in making personal computing real, and following &lt;a href="https://tinycomputers.io/posts/cpm-history-and-legacy.html"&gt;CP/M's arc&lt;/a&gt; from operating system of the future to historical footnote. I've &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;run CP/M on physical RetroShield hardware&lt;/a&gt;, explored the &lt;a href="https://tinycomputers.io/posts/motorola-68000-processor-and-the-ti-89-graphing-calculator.html"&gt;Motorola 68000&lt;/a&gt; that powered a generation of machines, and dug into &lt;a href="https://tinycomputers.io/posts/infocom-zork-history.html"&gt;how Infocom turned text adventures into a business&lt;/a&gt; at a time when 64K of RAM was generous. That immersion in where computing came from is exactly what makes the forward question so vivid — because at every stage, the people living through the transition couldn't see what was coming next. The engineers building CP/M didn't anticipate DOS. The engineers building DOS didn't anticipate the web. The engineers building the web didn't anticipate the iPhone. The pattern is always the same: cheaper compute enables things that were unimaginable at the prior cost.&lt;/p&gt;
&lt;p&gt;The question isn't "will AI destroy jobs?" or "is the doom scenario wrong?" The question is: &lt;strong&gt;what becomes possible when thinking gets cheap?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Because AI compute is following a cost curve that looks remarkably like the early decades of Moore's Law. And if that continues — if the cost per unit of machine intelligence drops by an order of magnitude every few years — the consequences extend far beyond making today's chatbots cheaper to run.&lt;/p&gt;
&lt;h3&gt;The Cost Curve&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/moores-law-intelligence/moores-law-transistor-count.png" alt="Microprocessor transistor counts from 1971 to 2011 plotted on a logarithmic scale, showing Moore's Law doubling trend" style="max-width: 100%; margin: 0 0 1.5em 0; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;Moore's Law — originally an annual doubling, revised by Gordon Moore in 1975 to roughly every two years — described the doubling of transistors per integrated circuit. But the economic consequence that mattered wasn't transistor density — it was cost per unit of compute. From the 1960s through the 2010s, the cost per FLOP declined at a compound rate that delivered roughly a 10x improvement every four to five years. A computation that cost $1 million in 1975 cost $1 by 2010. That decline didn't just make existing applications cheaper. It created entirely new categories of computing that were inconceivable at the prior cost structure.&lt;/p&gt;
&lt;p&gt;AI inference costs are now following a similar trajectory, but faster. OpenAI's text-davinci-003, released in late 2022, cost $20 per million tokens. GPT-4o mini, released in mid-2024, delivers substantially better performance at $0.15 per million input tokens — a 99% cost reduction in under two years. Claude, Gemini, and open-source models have followed similar curves. DeepSeek entered the market in early 2025 with pricing that undercut Western frontier models by roughly 90%, compressing the timeline further through competitive pressure.&lt;/p&gt;
&lt;p&gt;The GPU hardware underneath these models is on its own Moore's Law trajectory. GPU price-performance in FLOP/s per dollar doubles approximately every 2.5 years for ML-class hardware. Architectural improvements in transformers, mixture-of-experts routing, quantization, speculative decoding, and distillation compound on top of the hardware gains. The result is a cost curve where the effective price of a unit of machine reasoning is falling faster than the price of a transistor did during the semiconductor industry's most explosive growth phase.&lt;/p&gt;
&lt;p&gt;This matters because we know, empirically, what happens when the cost of a foundational input follows an exponential decline. We have sixty years of data on it. The compute industry went from a few thousand mainframes serving governments and large corporations to billions of devices in every pocket, every appliance, every traffic light. Total spending on computing didn't shrink as costs fell — it expanded by orders of magnitude, because each 10x cost reduction unlocked categories of use that didn't exist at the prior price point.&lt;/p&gt;
&lt;p&gt;The thought experiment is straightforward: apply that pattern to intelligence itself.&lt;/p&gt;
&lt;h3&gt;Today's Price Points Create Today's Use Cases&lt;/h3&gt;
&lt;p&gt;At current pricing — roughly $3 per million input tokens for a frontier model like Claude Sonnet — AI is economically viable for a specific class of applications. Customer support automation. Code assistance. Document summarization. Marketing copy. Translation. These are the use cases where the value generated per token comfortably exceeds the cost per token, and where the interaction pattern involves relatively short exchanges.&lt;/p&gt;
&lt;p&gt;But there are vast categories of potential use where current pricing makes the math uncomfortable or impossible. Consider:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Continuous monitoring and analysis.&lt;/strong&gt; A financial analyst who wants an AI to continuously watch SEC filings, earnings calls, patent applications, and news feeds across 500 companies — analyzing each document in full, cross-referencing against historical patterns, and generating alerts — would consume billions of tokens per month. At current prices, this costs tens of thousands of dollars monthly. At 100x cheaper, it costs the price of a SaaS subscription.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Full-codebase reasoning.&lt;/strong&gt; This one is already arriving. Anthropic's Claude Opus 4.6, working through Claude Code, can operate at the repository level — reading files, understanding architecture, running tests, and making changes across an entire codebase in a single session. I've used it to build a &lt;a href="https://tinycomputers.io/posts/open-sourcing-a-high-performance-rust-based-ballistics-engine.html"&gt;high-performance Rust-based ballistics engine&lt;/a&gt; and to develop &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;Lattice, an entire programming language&lt;/a&gt; with a &lt;a href="https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html"&gt;bytecode VM compiler&lt;/a&gt; — projects where the AI wasn't autocompleting fragments but reasoning across thousands of lines of interconnected code, tracking type systems, managing compiler passes, and understanding how changes in one module ripple through the rest. The constraint today isn't capability — it's cost. These sessions consume large volumes of tokens, which means they're viable for serious engineering work but not yet cheap enough to run continuously on every commit, every pull request, every deployment. At 100x cheaper, that changes. At 1,000x cheaper, every codebase has an always-on collaborator that has read everything and forgets nothing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Personalized education at scale.&lt;/strong&gt; A truly personalized AI tutor that adapts to a student's learning style, tracks their understanding across subjects, reviews their homework in detail, explains mistakes with patience, and adjusts its teaching strategy over months — this requires sustained, high-volume token consumption per student. Multiply by millions of students and the current cost structure breaks. At 100x cheaper, it's viable for a school district. At 1,000x cheaper, it's viable for an individual family.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Preventive medicine.&lt;/strong&gt; Analyzing a patient's complete medical history, genetic data, lifestyle information, lab results, and the current research literature to generate genuinely personalized health recommendations — not the generic advice a five-minute doctor's visit produces, but the kind of comprehensive analysis that currently only concierge medicine patients paying $10,000+ per year receive. At current token prices, this is prohibitively expensive for routine use. At 100x cheaper, it could be embedded in every annual checkup.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ambient intelligence.&lt;/strong&gt; The concept of AI that runs continuously in the background of your life — understanding your calendar, email, documents, and goals, proactively surfacing relevant information, drafting responses, scheduling meetings, flagging conflicts — requires sustained inference at volumes that would cost hundreds of dollars per day at current prices. At 1,000x cheaper, it costs less than your phone bill.&lt;/p&gt;
&lt;p&gt;These aren't science fiction scenarios. They're applications of current model capabilities at price points that don't yet exist. The models can already do most of this work. The cost curve is the bottleneck.&lt;/p&gt;
&lt;h3&gt;The 10x / 100x / 1,000x Framework&lt;/h3&gt;
&lt;p&gt;Moore's Law didn't deliver its benefits in a smooth, continuous flow. It came in thresholds — price points at which qualitatively new applications became viable. The pattern with AI compute is likely to follow the same staircase function.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;At 10x cheaper&lt;/strong&gt; (plausible within 1-2 years): AI becomes viable for tasks that are currently "almost worth it." Small businesses that can't justify $500/month for AI tooling find it worthwhile at $50/month. Individual professionals — accountants, lawyers, doctors, engineers — integrate AI into their daily workflow not as an occasional tool but as a constant companion. The volume of AI-mediated work increases dramatically, but the character of work doesn't fundamentally change. This is the equivalent of the minicomputer era — the same kind of computing, available to more people.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;At 100x cheaper&lt;/strong&gt; (plausible within 3-5 years): The applications listed above become economically viable. Continuous analysis, full-codebase reasoning, personalized education, preventive medicine at scale. At this price point, AI stops being a tool you use and starts being infrastructure you run on. Every document you write gets reviewed. Every decision you make gets a second opinion. Every student gets a tutor. Every patient gets a diagnostician. The total volume of inference consumed per capita increases by far more than 100x, because new use cases emerge that weren't contemplated at the prior price. This is the personal computer moment — qualitatively new categories of use.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;At 1,000x cheaper&lt;/strong&gt; (plausible within 5-8 years): Intelligence becomes ambient and disposable. You don't think about whether to use AI for a task any more than you think about whether to use electricity for a task. Every appliance, every vehicle, every building, every piece of infrastructure has embedded reasoning running continuously. Your home understands your patterns and adapts. Your car negotiates traffic in real time not just with sensors but with models that predict the behavior of every other vehicle. Agricultural equipment analyzes soil conditions at the individual plant level. Supply chains optimize in real time across thousands of variables. This is the smartphone moment — computing so cheap and pervasive that it becomes invisible.&lt;/p&gt;
&lt;h3&gt;The Compounding Effect&lt;/h3&gt;
&lt;p&gt;There's a dynamic in AI cost reduction that didn't exist with traditional Moore's Law: cheaper inference enables better models, which enables even cheaper inference.&lt;/p&gt;
&lt;p&gt;When inference is expensive, researchers are constrained in how they can train and evaluate models. Each experiment costs real money. Each architecture search consumes significant compute budgets. When inference costs drop, researchers can run more experiments, evaluate more architectures, and discover more efficient approaches — which further reduces costs. Distillation (training a smaller model to mimic a larger one) becomes more practical when the larger model is cheaper to run at scale. Synthetic data generation (using AI to create training data for other AI) becomes more economical. The cost reduction compounds on itself.&lt;/p&gt;
&lt;p&gt;This is already happening. GPT-4 was used to generate synthetic training data for GPT-4o. Claude's training pipeline uses prior Claude models to evaluate and filter training examples. Google's Gemini models help design the next generation of TPU chips that will run future Gemini models. The AI equivalent of "using computers to design better computers" arrived in year three of the current wave, decades earlier in the relative timeline than it took the semiconductor industry to reach the same recursive dynamic.&lt;/p&gt;
&lt;p&gt;The implication is that the cost curve isn't just declining — it's declining at an accelerating rate because each improvement enables the next one. The semiconductor industry saw this acceleration plateau after about fifty years as it approached physical limits of silicon. AI has no equivalent physical constraint on the horizon. The limits are architectural and algorithmic, and those limits have been falling faster than hardware limits ever did.&lt;/p&gt;
&lt;h3&gt;What the Semiconductor Analogy Actually Predicts&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/moores-law-intelligence/cray-1.jpg" alt="A Cray-1 supercomputer on display, showing its distinctive cylindrical tower design with bench seating and exposed cooling plumbing" style="float: right; max-width: 45%; margin: 0 0 1em 1.5em; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;In 1975, a Cray-1 supercomputer delivered about 160 MFLOPS and cost $8 million. In 2025, an iPhone delivers roughly 2 TFLOPS of GPU compute and costs $800. That's a 12,500x performance increase at a 10,000x cost decrease — a net improvement of roughly 100 million times in price-performance over fifty years.&lt;/p&gt;
&lt;p&gt;Nobody in 1975 predicted Instagram, Uber, Google Maps, or Spotify. Not because these applications required fundamentally new physics — they just required compute that was cheap enough to run in a device that fit in your pocket. The applications were latent, waiting for the cost curve to reach them.&lt;/p&gt;
&lt;p&gt;The history is instructive at each threshold. When a capable computer crossed below $20,000 in the early 1980s, it unlocked small business accounting — the same work mainframes did, just for smaller organizations. When it crossed below $2,000 in the mid-1990s, it unlocked home computing, and with it the web browser, email, and e-commerce. When capable compute crossed below $200 in the smartphone era, it unlocked ride-sharing, mobile payments, and social media — none of which had any conceptual precursor at the $20,000 price point. Each 10x reduction didn't just expand the existing market. It created a market that was literally unimaginable at the prior price.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/moores-law-intelligence/ibm-system-360.jpg" alt="An IBM System/360 Model 30 mainframe computer with its distinctive red cabinet and operator control panel" style="float: right; max-width: 45%; margin: 0 0 1em 1.5em; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;The same principle applies to intelligence. We are in the mainframe era of AI. The applications we see today — chatbots, code assistants, image generators — are the equivalent of payroll processing and scientific computation on 1960s mainframes. They are real and valuable, but they represent a tiny fraction of what becomes possible when the cost drops by five or six orders of magnitude.&lt;/p&gt;
&lt;p&gt;What are the Instagram and Uber equivalents of cheap intelligence? By definition, we can't fully predict them. But we can identify the structural conditions that will enable them:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When intelligence costs less than attention, delegation becomes default.&lt;/strong&gt; Today, the cognitive cost of formulating a good prompt, evaluating the output, and iterating often exceeds the cost of just doing the task yourself. As models get cheaper, faster, and better at understanding context, the threshold shifts. Eventually, not delegating a cognitive task to AI becomes the irrational choice, the way not using a calculator for arithmetic became irrational.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When intelligence costs less than data storage, everything gets analyzed.&lt;/strong&gt; Today, most data that organizations collect is never analyzed. It's stored, archived, and forgotten — because the cost of human analysis exceeds the expected value of the insights. When AI analysis is effectively free, every dataset gets examined. Every log file gets reviewed. Every customer interaction gets analyzed for patterns. The volume of insight generated from existing data increases by orders of magnitude.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When intelligence costs less than communication overhead, organizations restructure.&lt;/strong&gt; This is already starting. A significant fraction of white-collar work is coordination — meetings, emails, status updates, project management. These exist because humans need to synchronize their mental models of shared projects. AI tools are already compressing this layer: meeting summaries that eliminate the need for half the attendees, project dashboards that maintain themselves, codebases where an AI agent tracks the state of every open issue so developers don't have to sit through standup. When AI can maintain a comprehensive, always-current model of a project's state, much of the coordination overhead that justifies entire job categories — project managers, program managers, business analysts, internal consultants — begins to evaporate. An organization that currently needs 50 people to coordinate a complex project might need 10, with AI handling the information synthesis that previously required human intermediaries. That's a genuine productivity gain. It's also 40 people who need to find something else to do — and the honest answer is that we don't yet know how fast the demand side creates new roles to absorb them.&lt;/p&gt;
&lt;h3&gt;The Demand Expansion Is the Story&lt;/h3&gt;
&lt;p&gt;The instinct when hearing "AI gets 1,000x cheaper" is to think about cost savings. That's the substitution frame — doing the same things for less money. And yes, that will happen. But the semiconductor analogy tells us that cost savings are the boring part of the story.&lt;/p&gt;
&lt;p&gt;When compute got 1,000x cheaper between 1980 and 2000, the interesting story wasn't that scientific simulations got cheaper to run. It was that entirely new industries — PC software, internet services, mobile apps, social media, cloud computing — emerged to consume orders of magnitude more compute than the entire prior industry had used. The efficiency gain on existing applications was dwarfed by the demand expansion from new applications.&lt;/p&gt;
&lt;p&gt;The same will likely be true for intelligence. Consider bandwidth as a parallel case. In 1995, a 28.8 kbps modem made email and basic web pages viable. Nobody was streaming video — it was physically impossible at that bandwidth, not merely expensive. By 2005, broadband had made streaming music viable. By 2015, streaming 4K video was routine. By 2025, cloud gaming and real-time video conferencing were infrastructure-level assumptions. Total bandwidth consumption didn't decline as it got cheaper. It increased by roughly a million times, because each generation of cost reduction enabled applications that consumed orders of magnitude more bandwidth than the previous generation's entire output.&lt;/p&gt;
&lt;p&gt;The interesting story isn't that customer support gets cheaper. It's the applications that are currently impossible — not difficult, not expensive, but literally impossible at current price points — that become not just possible but routine.&lt;/p&gt;
&lt;p&gt;A world where every small business has a CFO-grade financial analyst. Where every patient has a diagnostician who has read every relevant paper published in the last decade. Where every student has a tutor who knows exactly where they're struggling and why. Where every local government has the analytical capacity currently reserved for federal agencies.&lt;/p&gt;
&lt;p&gt;And the nature of building software itself is changing in ways that go beyond "engineers with better tools." For most of computing history, writing code meant a human translating intent into syntax — line by line, function by function. AI assistance started as autocomplete: suggesting the next line, filling in boilerplate. But that phase is already ending. Today, with tools like Claude Code, the workflow has inverted. The human describes what they want — an architecture, a feature, a behavior — and the AI writes the implementation across files, runs the tests, and iterates on failures. The engineer's role shifts from writing code to directing and reviewing it, from syntax to judgment. At 10x cheaper, this is how professional developers work. At 100x cheaper, it's how small teams build products that previously required departments. At 1,000x cheaper, the barrier between "person with an idea" and "working software" functionally disappears. The entire concept of what it means to be a software engineer is being rewritten in real time — not by replacing engineers, but by redefining the skill from "can you write this code?" to "do you know what to build and why?"&lt;/p&gt;
&lt;p&gt;These aren't efficiency improvements on existing systems. They're new capabilities that create new categories of economic activity, new forms of organization, and new kinds of products and services that don't have current analogs — just as social media, ride-sharing, and cloud computing had no analogs in the mainframe era.&lt;/p&gt;
&lt;h3&gt;The Question That Matters&lt;/h3&gt;
&lt;p&gt;I should be honest about what I don't know. The displacement scenarios for white-collar labor are not fantasy. AI is already capable enough to handle work that was solidly middle-class professional territory two years ago — document review, financial analysis, code generation, customer support, content production. The scenarios where this accelerates faster than the economy can absorb are plausible, and anyone who dismisses them outright isn't paying attention. When a technology can replicate cognitive labor at a fraction of the cost, the transitional pain for the people whose livelihoods depend on that labor is real and potentially severe. The speed matters: prior technology transitions unfolded over decades, and AI compression of that timeline into years is a genuine uncertainty that historical analogy doesn't fully resolve.&lt;/p&gt;
&lt;p&gt;But there is a question that displacement scenarios consistently underweight, and it's the one I explored in my &lt;a href="https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html"&gt;Jevons counter-thesis&lt;/a&gt;: what happens on the demand side? Every model that projects mass unemployment from cheap AI is implicitly assuming that the economy remains roughly the same size, with machines doing the work humans used to do. That's the substitution frame. And the substitution frame has been wrong at every prior technological inflection point — not slightly wrong, but wrong by orders of magnitude.&lt;/p&gt;
&lt;p&gt;The semiconductor industry's answer, delivered over sixty years of data, is unambiguous. Every order-of-magnitude cost reduction generated more economic activity, more employment, and more total compute consumption than the one before it. The economy didn't shrink as compute got cheaper. It restructured around cheap compute and grew. Roughly 80% of Americans who need legal help can't afford it today. Personalized tutoring is a luxury good. Custom software is out of reach for most small businesses. These aren't speculative markets — they're documented unmet demand suppressed by the cost of human intelligence. When that cost collapses, the demand doesn't stay static.&lt;/p&gt;
&lt;p&gt;The honest answer is that both things will happen simultaneously. Jobs will be displaced — some permanently. And new categories of economic activity will emerge that are currently inconceivable, just as social media and cloud computing were inconceivable in the mainframe era. The question is which force dominates, and how fast the transition occurs. I think the historical pattern favors demand expansion, but I hold that view with the humility of someone who knows the speed of this particular transition is unprecedented.&lt;/p&gt;
&lt;p&gt;AI inference costs are following the same curve as semiconductors, possibly faster. The tokens-per-dollar ratio will improve by orders of magnitude. And when it does, the applications that emerge will make today's AI use cases look as quaint as running payroll on a room-sized mainframe.&lt;/p&gt;
&lt;p&gt;The thought experiment ends where all Jevons stories end: with more consumption, not less. More intelligence deployed, not less. More economic activity built on cheap cognition, not less. The cost curve is the enabling condition. What gets built on top of it is the part we can't fully predict — and historically, that's always been the most interesting part.&lt;/p&gt;</description><category>ai</category><category>compute costs</category><category>demand expansion</category><category>economics</category><category>inference</category><category>jevons paradox</category><category>moore's law</category><category>semiconductors</category><category>technology</category><category>tokens</category><guid>https://tinycomputers.io/posts/moores-law-for-intelligence-what-happens-when-thinking-gets-cheap.html</guid><pubDate>Sat, 28 Feb 2026 14:00:00 GMT</pubDate></item><item><title>An LLM Clean Room Z80 Emulator: Building from Specifications, Not Source Code</title><link>https://tinycomputers.io/posts/clean-room-z80-emulator.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/clean-room-z80-emulator_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;43 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/clean-room-z80-emulator/zilog-z80.jpg" alt="An original Zilog Z80 CPU in a white ceramic DIP-40 package, manufactured in Dallas, 1976" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 10px 20px rgba(0,0,0,.1);" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;There's a particular kind of satisfaction in building something from specifications rather than from someone else's implementation. When you take timing diagrams and instruction tables and turn them into working code, you're not copying — you're reconstructing. Every decision about how to decode an opcode, how to handle the flag register's undocumented bits, how to sequence a block transfer instruction — these become deliberate choices, informed by the original engineering documents but filtered through genuine understanding of the problem.&lt;/p&gt;
&lt;p&gt;This is the story of writing a complete Z80 emulator under clean room constraints, with the twist that the implementer is an LLM. I used &lt;a href="https://baud.rs/claude-code"&gt;Claude Code&lt;/a&gt; to write every line of C — the CPU core, the test suite, the system emulator — with a single, non-negotiable rule: &lt;strong&gt;no reference to existing Z80 emulator source code&lt;/strong&gt;. The inputs were the &lt;a href="https://baud.rs/EESjG1"&gt;Zilog Z80 CPU User Manual&lt;/a&gt;, my architectural plan, and the test ROMs to prove it works.&lt;/p&gt;
&lt;p&gt;An important clarification on what "clean room" means here. The constraint was no existing emulator source code — not "only the official Zilog manual." The Z80's undocumented behaviors (the F3/F5 flag bits, IXH/IXL half-index registers, DDCB register copy side effects) aren't in Zilog's official documentation. They come from decades of community reverse-engineering documented in references like Sean Young's "&lt;a href="https://baud.rs/s0MAzk"&gt;The Undocumented Z80 Documented&lt;/a&gt;" and similar technical write-ups. Claude's training data includes this secondary documentation, and the clean room constraint didn't prohibit drawing on that knowledge — it prohibited referencing how &lt;em&gt;other emulators implemented&lt;/em&gt; that knowledge. The distinction matters: a specification of behavior is not the same as someone else's code that implements it.&lt;/p&gt;
&lt;h3&gt;Why an LLM Clean Room?&lt;/h3&gt;
&lt;p&gt;The term "clean room" comes from the semiconductor and software industries, where it describes a development methodology designed to produce implementations that are legally and intellectually independent of existing ones. In the chip fabrication sense, it's a literal dust-free environment. In the software sense, it means building from specifications and documentation without ever examining existing implementations.&lt;/p&gt;
&lt;p&gt;When an LLM writes code, there's always the question: is this implementation derived from the specification I gave it, or is it pattern-matching against emulator source code in its training data? This is the central tension of using AI for systems programming. An LLM has likely seen dozens of Z80 emulators during training. If you just ask it to "write a Z80 emulator," you'll get something that works — but you can't know whether it's an original implementation or a recombination of memorized code.&lt;/p&gt;
&lt;p&gt;The clean room constraint changes the experiment. By explicitly instructing Claude that this is a clean room project — that all implementation must be derived solely from specifications and documentation, not from existing emulator source code — you're testing whether the model can work from first principles rather than from pattern recall. Can it read an instruction set specification, understand the semantics of each opcode, and produce correct flag computations without cribbing from someone else's &lt;code&gt;z80.c&lt;/code&gt;?&lt;/p&gt;
&lt;p&gt;Antirez explored this territory recently with his &lt;a href="https://baud.rs/Gwet5H"&gt;own Z80 emulator project&lt;/a&gt;, using Claude Code to generate a working ZX Spectrum emulator. His experiment demonstrated something important about LLM-assisted development: that providing an agent with proper specifications and documentation — rather than asking it to regurgitate training data — produces implementations that are genuinely novel assemblies of knowledge rather than memorized patterns. The code Claude produced for antirez passed the notoriously thorough ZEXALL test suite, validating every documented Z80 behavior including the undocumented flag bits. Antirez's conclusion was that the LLM wasn't decompressing training data — it was &lt;em&gt;assembling knowledge&lt;/em&gt;, the way a human developer would when working from a datasheet.&lt;/p&gt;
&lt;p&gt;Reading antirez's write-up was the catalyst for this project. I wanted to see whether the same approach — specifications in, working emulator out, clean room constraints enforced throughout — would hold up when I drove the process myself. The Z80 User Manual is one of the best-documented processor specifications ever written. Everything you need to build a working emulator is in that document. The question is whether an LLM, given that document as its source of truth and told not to reference existing implementations, can produce something correct.&lt;/p&gt;
&lt;h3&gt;The Process&lt;/h3&gt;
&lt;p&gt;The workflow looked nothing like "prompt and pray." I started by writing a detailed architectural plan: the CPU state struct layout, the instruction decoding strategy (bit field decomposition), the system emulator's responsibilities, the test coverage targets. This plan became Claude's specification — not just "write a Z80 emulator" but "implement the Z80 CPU using x/y/z/p/q bit field decoding of the opcode byte, with these specific callback interfaces, these T-state timing requirements, and this test structure."&lt;/p&gt;
&lt;p&gt;Claude then implemented each component: &lt;code&gt;z80.h&lt;/code&gt; first, then the full &lt;code&gt;z80.c&lt;/code&gt; CPU core, then the test suite, then the system emulator. I reviewed each piece, ran the tests, identified failures, and fed the errors back. The first compile had a T-state timing issue with DD/FD prefixed instructions — the prefix overhead was being double-counted. One test out of 117 failed. Claude diagnosed the problem (the prefix dispatch was adding 4 T-states on top of instruction timings that already included the prefix cost) and fixed it.&lt;/p&gt;
&lt;p&gt;This iterative loop — plan, implement, test, fix — is exactly how a human developer would work. The difference is velocity. The entire CPU core, all 1,300 lines of C covering every official Z80 instruction plus undocumented behaviors, was produced in a single session. A human developer working from the same specification would spend days or weeks reaching the same point. The LLM's advantage isn't that it knows more — it's that it can hold the entire instruction set specification in context and translate it to code without the cognitive overhead of context-switching between the manual and the editor.&lt;/p&gt;
&lt;h3&gt;The Architecture&lt;/h3&gt;
&lt;p&gt;What Claude produced is a four-file emulator:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;z80.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CPU state struct, flag constants, public API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;z80.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Complete Z80 CPU emulation core&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;z80_test.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;117 unit tests covering all instruction groups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;zxs.c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Unified emulator binary with ACIA serial and CP/M support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The design philosophy is straightforward: the CPU core knows nothing about the system it's running in. It communicates with the outside world exclusively through four callback functions — memory read, memory write, I/O in, and I/O out. The system emulator (&lt;code&gt;zxs.c&lt;/code&gt;) provides these callbacks and implements whatever hardware peripherals the target system requires.&lt;/p&gt;
&lt;p&gt;This separation matters. The same CPU core can run a Grant Searle BASIC SBC, an RC2014, or a CP/M program without any changes to &lt;code&gt;z80.c&lt;/code&gt;. The system-specific behavior lives entirely in the callbacks.&lt;/p&gt;
&lt;h3&gt;The CPU State&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/clean-room-z80-emulator/z80-architecture.png" alt="Z80 CPU architecture block diagram showing the register file with main and shadow registers, ALU, instruction decoder, and 8-bit data bus / 16-bit address bus" style="max-width: 100%; margin: 0 0 1em 0; border-radius: 4px; box-shadow: 0 10px 20px rgba(0,0,0,.1);" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;A Z80 has more programmer-visible state than you might expect if you're used to simpler processors. The main register file includes the accumulator &lt;code&gt;A&lt;/code&gt; and flags register &lt;code&gt;F&lt;/code&gt;, plus three general-purpose register pairs &lt;code&gt;BC&lt;/code&gt;, &lt;code&gt;DE&lt;/code&gt;, and &lt;code&gt;HL&lt;/code&gt;. But then there's a complete &lt;em&gt;shadow&lt;/em&gt; set of all those registers — &lt;code&gt;A'&lt;/code&gt;, &lt;code&gt;F'&lt;/code&gt;, &lt;code&gt;BC'&lt;/code&gt;, &lt;code&gt;DE'&lt;/code&gt;, &lt;code&gt;HL'&lt;/code&gt; — accessible only through the &lt;code&gt;EX AF,AF'&lt;/code&gt; and &lt;code&gt;EXX&lt;/code&gt; exchange instructions.&lt;/p&gt;
&lt;p&gt;Add the two 16-bit index registers &lt;code&gt;IX&lt;/code&gt; and &lt;code&gt;IY&lt;/code&gt;, the stack pointer &lt;code&gt;SP&lt;/code&gt;, the program counter &lt;code&gt;PC&lt;/code&gt;, the interrupt vector register &lt;code&gt;I&lt;/code&gt;, and the memory refresh counter &lt;code&gt;R&lt;/code&gt;, and you've got a substantial amount of state to track. Then there's the interrupt system: two flip-flops &lt;code&gt;IFF1&lt;/code&gt; and &lt;code&gt;IFF2&lt;/code&gt;, the interrupt mode register (modes 0, 1, or 2), a halt flag, and a one-instruction delay flag for the &lt;code&gt;EI&lt;/code&gt; instruction.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Main registers */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Shadow registers */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;A_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;F_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;B_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;C_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;D_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;E_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;H_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;L_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Index registers */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Stack pointer and program counter */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Interrupt and refresh registers */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Interrupt state */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;IFF1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IFF2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;IM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;halted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ei_delay&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Cycle counter */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t_states&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Memory and I/O callbacks */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;z80_read_fn&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;mem_read&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;z80_write_fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mem_write&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;z80_in_fn&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;io_in&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;z80_out_fn&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;io_out&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80_t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I chose to store each register as an individual byte rather than using unions or bitfields to form 16-bit pairs. This makes the code more explicit — when you see &lt;code&gt;cpu-&amp;gt;B&lt;/code&gt;, you know exactly what's being accessed. Register pairs are assembled and disassembled through inline helper functions like &lt;code&gt;rp_bc()&lt;/code&gt; and &lt;code&gt;set_bc()&lt;/code&gt;. The compiler optimizes these away completely, so there's no performance cost for the clarity.&lt;/p&gt;
&lt;h3&gt;Instruction Decoding: The Bit Field Approach&lt;/h3&gt;
&lt;p&gt;The Z80's instruction encoding has a structure that isn't immediately obvious if you're just looking at an opcode table, but becomes clear once you read the User Manual carefully. Every opcode byte can be decomposed into bit fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;x&lt;/code&gt; = bits 7:6 (the two highest bits)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;y&lt;/code&gt; = bits 5:3 (the middle three bits)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;z&lt;/code&gt; = bits 2:0 (the lowest three bits)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;p&lt;/code&gt; = bits 5:4 (y &amp;gt;&amp;gt; 1)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;q&lt;/code&gt; = bit 3 (y &amp;amp; 1)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These fields determine the instruction's category and operands. For the unprefixed opcodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;x=0&lt;/strong&gt;: Miscellaneous — relative jumps, 16-bit loads, 16-bit arithmetic, INC/DEC, 8-bit loads with immediate data, and the accumulator rotate/DAA/CPL/SCF/CCF group&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;x=1&lt;/strong&gt;: Register-to-register loads (&lt;code&gt;LD r, r'&lt;/code&gt;), with the special case of &lt;code&gt;LD (HL),(HL)&lt;/code&gt; encoding &lt;code&gt;HALT&lt;/code&gt; instead&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;x=2&lt;/strong&gt;: ALU operations between the accumulator and a register (&lt;code&gt;ADD A,r&lt;/code&gt; through &lt;code&gt;CP r&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;x=3&lt;/strong&gt;: Returns, jumps, calls, stack operations, RST vectors, I/O, exchange instructions, interrupt control, and the prefix bytes for extended instruction groups&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This structure means you can decode most unprefixed instructions with a three-level switch on x, then z (or y), rather than a 256-entry lookup table. The code reads more like the specification:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* HALT */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;halted&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;PC&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* LD r[y], r[z] */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;set_reg8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;get_reg8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* ALU A, r[z] */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;do_alu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;get_reg8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The register index mapping — 0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A — is used consistently throughout the instruction set. Index 6 always means the memory byte pointed to by &lt;code&gt;HL&lt;/code&gt;, which is why &lt;code&gt;LD (HL),(HL)&lt;/code&gt; would be meaningless (load memory from the same memory location) and gets repurposed as &lt;code&gt;HALT&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;The Prefix System&lt;/h3&gt;
&lt;p&gt;The Z80 extends its instruction set through prefix bytes: &lt;code&gt;CB&lt;/code&gt;, &lt;code&gt;ED&lt;/code&gt;, &lt;code&gt;DD&lt;/code&gt;, and &lt;code&gt;FD&lt;/code&gt;. Each opens up a different dimension of functionality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CB prefix&lt;/strong&gt;: Rotate/shift operations and bit manipulation. The same x/y/z decode applies, but now x=0 is rotate/shift, x=1 is &lt;code&gt;BIT&lt;/code&gt; (test), x=2 is &lt;code&gt;RES&lt;/code&gt; (reset), and x=3 is &lt;code&gt;SET&lt;/code&gt;. This gives you eight different rotate/shift operations (including the undocumented &lt;code&gt;SLL&lt;/code&gt;) on any of the eight register positions, and bit test/set/reset for any of eight bit positions on any register. That's 256 instructions from a single prefix byte.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ED prefix&lt;/strong&gt;: Extended operations that don't fit the main opcode map. Block transfer and search instructions (&lt;code&gt;LDI&lt;/code&gt;, &lt;code&gt;LDIR&lt;/code&gt;, &lt;code&gt;LDD&lt;/code&gt;, &lt;code&gt;LDDR&lt;/code&gt;, &lt;code&gt;CPI&lt;/code&gt;, &lt;code&gt;CPIR&lt;/code&gt;, and their output counterparts), 16-bit arithmetic with carry (&lt;code&gt;ADC HL,rp&lt;/code&gt; and &lt;code&gt;SBC HL,rp&lt;/code&gt;), extended I/O (&lt;code&gt;IN r,(C)&lt;/code&gt; and &lt;code&gt;OUT (C),r&lt;/code&gt;), interrupt mode selection, and a handful of register transfer instructions (&lt;code&gt;LD I,A&lt;/code&gt;, &lt;code&gt;LD A,R&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DD and FD prefixes&lt;/strong&gt;: These modify the &lt;em&gt;following&lt;/em&gt; instruction by replacing &lt;code&gt;HL&lt;/code&gt; with &lt;code&gt;IX&lt;/code&gt; or &lt;code&gt;IY&lt;/code&gt; respectively. Wherever the unprefixed instruction uses &lt;code&gt;HL&lt;/code&gt; as a 16-bit register, the prefixed version uses &lt;code&gt;IX&lt;/code&gt; or &lt;code&gt;IY&lt;/code&gt;. Wherever it accesses &lt;code&gt;(HL)&lt;/code&gt; as a memory operand, the prefixed version accesses &lt;code&gt;(IX+d)&lt;/code&gt; or &lt;code&gt;(IY+d)&lt;/code&gt;, where &lt;code&gt;d&lt;/code&gt; is a signed displacement byte inserted between the opcode and any immediate data.&lt;/p&gt;
&lt;p&gt;This substitution extends to the individual &lt;code&gt;H&lt;/code&gt; and &lt;code&gt;L&lt;/code&gt; registers in many contexts — &lt;code&gt;LD A,H&lt;/code&gt; becomes &lt;code&gt;LD A,IXH&lt;/code&gt; with a DD prefix. These "half-index" register operations are technically undocumented but universally supported by real silicon and widely used by software. A clean room implementation needs to handle them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DDCB and FDCB&lt;/strong&gt;: The most complex prefix combination. For bit operations on indexed memory &lt;code&gt;(IX+d)&lt;/code&gt;, the displacement byte comes &lt;em&gt;before&lt;/em&gt; the opcode byte, not after:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;DD CB d op    →    operation on (IX+d)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This reversed order exists because the Z80's internal pipeline needs the displacement early to begin the memory access while decoding the operation. It's an elegant microarchitectural optimization that reveals itself in the instruction encoding.&lt;/p&gt;
&lt;p&gt;There's an additional subtlety: undocumented behavior where DDCB/FDCB rotate and set/reset operations also copy their result into a register specified by the &lt;code&gt;z&lt;/code&gt; field of the opcode. &lt;code&gt;RLC (IX+5)&lt;/code&gt; with a &lt;code&gt;z&lt;/code&gt; field of 0 also loads the result into &lt;code&gt;B&lt;/code&gt;. This behavior is consistent across all real Z80 chips and is relied upon by some software.&lt;/p&gt;
&lt;h3&gt;The ALU&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/clean-room-z80-emulator/z80-die-shot.jpg" alt="High-resolution die photograph of a Zilog Z80 CPU showing the silicon layout of the ALU, register file, and control logic" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 10px 20px rgba(0,0,0,.1);" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;The eight ALU operations — ADD, ADC, SUB, SBC, AND, XOR, OR, CP — share a common pattern in how they affect the flags register. Getting the flags right is the single most important aspect of Z80 emulation, and the area where most subtle bugs hide.&lt;/p&gt;
&lt;p&gt;The Z80's flag register contains eight bits, six of which are documented:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bit&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Sign (copy of bit 7 of result)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Z&lt;/td&gt;
&lt;td&gt;Zero (result is zero)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;F5&lt;/td&gt;
&lt;td&gt;Undocumented (copy of bit 5 of result*)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;Half-carry (carry from bit 3 to bit 4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;F3&lt;/td&gt;
&lt;td&gt;Undocumented (copy of bit 3 of result*)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;P/V&lt;/td&gt;
&lt;td&gt;Parity (logic ops) or Overflow (arithmetic ops)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;Subtract (set if last operation was subtraction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;Carry (carry out of bit 7)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The asterisk on F3 and F5 matters. For most operations, bits 3 and 5 come from the &lt;em&gt;result&lt;/em&gt;. But for &lt;code&gt;CP&lt;/code&gt; (compare), they come from the &lt;em&gt;operand&lt;/em&gt;, not the result. This is because &lt;code&gt;CP&lt;/code&gt; is internally a subtraction that discards the result and keeps only the flags — but the Z80 designers connected the F3 and F5 flag inputs to the operand bus rather than the internal result bus for this particular instruction. It's the kind of detail that only shows up when you're testing against real hardware behavior.&lt;/p&gt;
&lt;p&gt;The overflow flag computation deserves special attention. For addition, overflow occurs when two operands of the same sign produce a result of the opposite sign:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For subtraction, overflow occurs when two operands of different signs produce a result whose sign differs from the first operand:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These one-liners replace what would otherwise be multi-branch conditional logic. They work because XOR detects sign differences, and AND combines the two conditions.&lt;/p&gt;
&lt;h3&gt;Block Operations&lt;/h3&gt;
&lt;p&gt;The Z80's block instructions are one of its most powerful features and one of the trickiest to implement correctly. &lt;code&gt;LDIR&lt;/code&gt; (Load, Increment, Repeat) copies a block of memory from the address in &lt;code&gt;HL&lt;/code&gt; to the address in &lt;code&gt;DE&lt;/code&gt;, decrementing &lt;code&gt;BC&lt;/code&gt; as a counter, and repeating until &lt;code&gt;BC&lt;/code&gt; reaches zero.&lt;/p&gt;
&lt;p&gt;The implementation requires careful attention to the repeat mechanism. When &lt;code&gt;BC&lt;/code&gt; is not yet zero, &lt;code&gt;LDIR&lt;/code&gt; decrements &lt;code&gt;PC&lt;/code&gt; by 2 so that the next instruction fetch re-executes the same &lt;code&gt;LDIR&lt;/code&gt; opcode. The repeated iteration takes 21 T-states; the final iteration (when BC reaches zero) takes only 16 T-states. This asymmetry matters for cycle-accurate emulation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* LDI/LDD/LDIR/LDDR */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_hl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;wb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_de&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Increment or decrement based on instruction */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;set_hl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_hl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;set_de&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_de&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;set_hl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_hl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;set_de&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_de&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;set_bc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_bc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* ... flag computation ... */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rp_bc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;PC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The flag behavior during block operations is another area where the specification requires careful reading. The &lt;code&gt;P/V&lt;/code&gt; flag reflects whether &lt;code&gt;BC&lt;/code&gt; is non-zero after the decrement — acting as a "more data" indicator. The undocumented F3 and F5 flags come from the sum of the transferred byte and the accumulator, with F5 derived from bit 1 rather than bit 5 of that sum. These details are well-documented in the secondary literature but require careful implementation.&lt;/p&gt;
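&lt;p&gt;A standalone sketch of that computation (the &lt;code&gt;FLAG_*&lt;/code&gt; masks are illustrative stand-ins for the emulator's own constants):&lt;/p&gt;

```c
#include <stdint.h>

#define FLAG_PV 0x04  /* illustrative flag masks; the emulator's header */
#define FLAG_F3 0x08  /* defines its own names for these bit positions  */
#define FLAG_F5 0x20

/* Flag bits produced by an LDI/LDD transfer. 'value' is the byte just
 * copied, 'a' is the accumulator, 'bc' is BC after the decrement.
 * S, Z, and C are unaffected and preserved by the caller; H and N are
 * reset. */
static uint8_t ldi_flags(uint8_t value, uint8_t a, uint16_t bc) {
    uint8_t n = (uint8_t)(value + a);   /* internal sum the flags come from */
    uint8_t f = 0;
    f |= (n & FLAG_F3);                 /* F3 = bit 3 of the sum */
    if (n & 0x02) f |= FLAG_F5;         /* F5 = bit 1 of the sum, not bit 5 */
    if (bc != 0)  f |= FLAG_PV;         /* P/V = "more data" indicator */
    return f;
}
```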
&lt;p&gt;The search variants (&lt;code&gt;CPI&lt;/code&gt;, &lt;code&gt;CPIR&lt;/code&gt;, etc.) are even more nuanced. They compare the accumulator against memory, set Z if a match is found, and terminate on either a match or &lt;code&gt;BC&lt;/code&gt; reaching zero. The flags after a search operation encode both whether a match was found &lt;em&gt;and&lt;/em&gt; whether the counter has been exhausted — two independent pieces of information packed into the flag register.&lt;/p&gt;
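&lt;p&gt;A sketch of how &lt;code&gt;CPI&lt;/code&gt; packs those two facts into the flag register (bit masks written out literally; this follows the commonly documented behavior, where F3 and F5 come from the comparison result with the half-borrow subtracted):&lt;/p&gt;

```c
#include <stdint.h>

/* Flag layout: S=0x80 Z=0x40 F5=0x20 H=0x10 F3=0x08 PV=0x04 N=0x02 C=0x01 */
static uint8_t cpi_flags(uint8_t a, uint8_t value, uint16_t bc) {
    uint8_t result = (uint8_t)(a - value);
    uint8_t hf = ((a & 0x0F) < (value & 0x0F)) ? 0x10 : 0;
    uint8_t n = (uint8_t)(result - (hf ? 1 : 0)); /* half-borrow folded in first */
    uint8_t f = 0x02;                             /* N: compares always set it */
    f |= (result & 0x80);                         /* S from the comparison */
    if (result == 0) f |= 0x40;                   /* Z means a match was found */
    f |= hf;                                      /* H from the low-nibble borrow */
    f |= (n & 0x08);                              /* F3 = bit 3 of (result - H) */
    if (n & 0x02) f |= 0x20;                      /* F5 = bit 1 of (result - H) */
    if (bc != 0) f |= 0x04;                       /* P/V: counter not yet zero */
    return f;                                     /* C is preserved by the caller */
}
```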
&lt;h3&gt;T-State Timing&lt;/h3&gt;
&lt;p&gt;Every Z80 instruction has a specific T-state (clock cycle) count that's documented in the User Manual. For an emulator driving a simulated UART or polling for terminal input at realistic intervals, accurate timing is essential.&lt;/p&gt;
&lt;p&gt;The timing model uses a simple accumulator. Each call to &lt;code&gt;z80_step()&lt;/code&gt; returns the number of T-states consumed and adds them to a running total in the CPU state. The system emulator uses this to determine when to poll for input or deliver interrupts:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;quit_flag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;t_states&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7373&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;t_states&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;z80_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* Poll for serial input, deliver interrupts */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The value 7373 represents approximately 2 milliseconds at 3.6864 MHz — the crystal frequency used by many Z80 SBC designs, chosen historically because it divides cleanly to produce standard baud rates. At 9600 baud with 10 bits per character (start, 8 data, stop), you get 960 characters per second, or one character every 3,840 clock cycles. Polling at 7373-cycle intervals therefore checks for input roughly once every two character times; because the host operating system buffers stdin, no keystrokes are lost, and the polling overhead stays negligible.&lt;/p&gt;
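&lt;p&gt;The arithmetic is easy to sanity-check; the 7373 used in the loop is this value rounded up from 7372.8:&lt;/p&gt;

```c
/* Derivation of the polling constants used above. */
enum {
    CLOCK_HZ         = 3686400,                    /* 3.6864 MHz crystal */
    POLL_TSTATES     = CLOCK_HZ * 2 / 1000,        /* 2 ms = 7372 cycles */
    BITS_PER_CHAR    = 10,                         /* start + 8 data + stop */
    CHARS_PER_SEC    = 9600 / BITS_PER_CHAR,       /* 960 at 9600 baud */
    TSTATES_PER_CHAR = CLOCK_HZ / CHARS_PER_SEC    /* 3840 cycles */
};
```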
&lt;p&gt;Conditional instructions have different cycle counts depending on whether the condition is met. A &lt;code&gt;JR Z,d&lt;/code&gt; takes 12 T-states when the jump is taken but only 7 when it falls through. &lt;code&gt;CALL cc,nn&lt;/code&gt; takes 17 T-states when taken, 10 when not. The Z80 has no pipeline to flush; the extra cycles are real work that only a taken branch performs: adding the signed displacement to PC for &lt;code&gt;JR&lt;/code&gt;, or pushing the return address onto the stack for &lt;code&gt;CALL&lt;/code&gt;.&lt;/p&gt;
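&lt;p&gt;In the emulator this falls out naturally from returning the T-state count per instruction. A simplified sketch (with a stand-in struct instead of the real &lt;code&gt;z80_t&lt;/code&gt; and its memory callback):&lt;/p&gt;

```c
#include <stdint.h>

typedef struct { uint16_t PC; const uint8_t *mem; } cpu_sketch_t;

/* JR cc,d: the displacement byte is always fetched, but only a taken
 * jump spends the extra 5 cycles adding it to PC, hence 12 vs 7. */
static int jr_cond(cpu_sketch_t *c, int condition) {
    int8_t d = (int8_t)c->mem[c->PC++];
    if (condition) {
        c->PC = (uint16_t)(c->PC + d);
        return 12;
    }
    return 7;
}
```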
&lt;h3&gt;The Interrupt System&lt;/h3&gt;
&lt;p&gt;The Z80 supports three interrupt modes and a non-maskable interrupt. Mode 1 is the simplest and most commonly used in SBC designs: a maskable interrupt causes the CPU to push the current PC and jump to address &lt;code&gt;0x0038&lt;/code&gt;, just like an &lt;code&gt;RST 38h&lt;/code&gt; instruction.&lt;/p&gt;
&lt;p&gt;Mode 2 is more sophisticated. The interrupting device places a vector byte on the data bus, which is combined with the &lt;code&gt;I&lt;/code&gt; register to form a 16-bit address into a vector table in memory. The CPU reads the actual interrupt service routine address from that table location. This provides up to 128 different interrupt vectors, enabling complex multi-device interrupt schemes.&lt;/p&gt;
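&lt;p&gt;The Mode 2 address formation reduces to a few lines (a sketch; &lt;code&gt;mem&lt;/code&gt; stands in for the emulator's memory-read callback):&lt;/p&gt;

```c
#include <stdint.h>

/* Mode 2 dispatch: the device's vector byte and the I register form a
 * pointer into a table of ISR addresses, stored little-endian. */
static uint16_t im2_isr_address(uint8_t i_reg, uint8_t vector, const uint8_t *mem) {
    uint16_t table_entry = (uint16_t)((i_reg << 8) | vector);
    return (uint16_t)(mem[table_entry] | (mem[(uint16_t)(table_entry + 1)] << 8));
}
```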
&lt;p&gt;The &lt;code&gt;EI&lt;/code&gt; instruction has a subtle but critical behavior: it doesn't enable interrupts immediately. Instead, it sets a one-instruction delay, so the &lt;em&gt;next&lt;/em&gt; instruction after &lt;code&gt;EI&lt;/code&gt; executes before any pending interrupt can be serviced. This guarantees that &lt;code&gt;EI; RETI&lt;/code&gt; (enable interrupts, then return from interrupt) executes atomically — the return completes before any new interrupt can preempt it.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* EI */&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;IFF1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;IFF2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ei_delay&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And in the interrupt handler:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80_interrupt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z80_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;IFF1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ei_delay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;/* ... process interrupt ... */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;DAA: The Most Misunderstood Instruction&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;DAA&lt;/code&gt; (Decimal Adjust Accumulator) is arguably the Z80's most complex single instruction. It adjusts the result of a previous addition or subtraction to produce a valid BCD (Binary-Coded Decimal) result. The adjustment depends on three pieces of state: the current value of the accumulator, the carry flag, and the half-carry flag. It also behaves differently depending on whether the previous operation was addition or subtraction (tracked by the N flag).&lt;/p&gt;
&lt;p&gt;The algorithm: if the lower nibble exceeds 9 or the half-carry flag is set, add (or subtract) 0x06. If the accumulator exceeds 0x99 or the carry flag is set, add (or subtract) 0x60 and leave the carry flag set to record that the upper correction was applied.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;daa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z80_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;correction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;carry&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80_CF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80_HF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x0F&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;correction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x06&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;carry&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x99&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;correction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x60&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;carry&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80_CF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80_NF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;correction&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;correction&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sz53p&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;carry&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80_NF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80_HF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;BCD arithmetic was important in the era when the Z80 was designed. Financial calculations, display drivers, and industrial controllers all needed decimal precision without floating-point hardware. The Z80's DAA instruction made BCD arithmetic practical on an 8-bit processor by adjusting binary results back into valid decimal digits after each operation.&lt;/p&gt;
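&lt;p&gt;A worked example makes the mechanism concrete. This standalone helper applies the addition-case corrections by hand (an illustration, not the emulator's &lt;code&gt;daa()&lt;/code&gt;): adding the packed-BCD bytes 0x19 and 0x28 gives the binary sum 0x41, which the low-nibble correction turns into the decimal answer 0x47.&lt;/p&gt;

```c
#include <stdint.h>

/* Add two packed-BCD bytes and apply the DAA-style corrections.
 * Returns the adjusted result; *carry_out is set when the decimal
 * result exceeds 99. */
static uint8_t bcd_add(uint8_t x, uint8_t y, int *carry_out) {
    uint16_t sum = (uint16_t)x + y;
    int half = ((x & 0x0F) + (y & 0x0F)) > 0x0F;   /* carry out of bit 3 */
    uint8_t a = (uint8_t)sum;
    if (half || (a & 0x0F) > 9) a += 0x06;         /* low-nibble correction */
    *carry_out = (sum > 0xFF) || ((uint8_t)sum > 0x99);
    if (*carry_out) a += 0x60;                     /* high-nibble correction */
    return a;
}
```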
&lt;h3&gt;Testing: 117 Ways to Be Wrong&lt;/h3&gt;
&lt;p&gt;Writing a test suite for a CPU emulator is an exercise in paranoia. Every instruction has multiple paths through the flag logic, multiple edge cases in operand handling, and multiple interactions with the rest of the CPU state. The test suite covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Register loads&lt;/strong&gt;: 8-bit immediate, register-to-register, 16-bit immediate, indirect through BC/DE/HL, absolute addressing, HL indirect&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;8-bit ALU&lt;/strong&gt;: All eight operations with basic values, carry/borrow propagation, overflow detection, half-carry, undocumented flag bits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;16-bit arithmetic&lt;/strong&gt;: ADD HL with carry, SBC HL, ADC HL&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INC/DEC&lt;/strong&gt;: 8-bit with overflow and half-carry edge cases, 16-bit wrapping&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rotates and shifts&lt;/strong&gt;: RLCA/RRCA/RLA/RRA (accumulator), CB-prefixed RLC/RRC/RL/RR/SLA/SRA/SRL on registers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BIT/SET/RES&lt;/strong&gt;: Test, set, and reset individual bits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Jumps and branches&lt;/strong&gt;: JP, JR, DJNZ with taken/not-taken paths&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calls and returns&lt;/strong&gt;: CALL/RET with condition codes, RST vectors&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stack operations&lt;/strong&gt;: PUSH/POP for all register pairs including AF&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Block operations&lt;/strong&gt;: LDI/LDIR/LDD, CPI/CPIR, INI/OUTI&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exchange instructions&lt;/strong&gt;: EX AF, EXX, EX DE,HL, EX (SP),HL&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Interrupt system&lt;/strong&gt;: IM modes, Mode 1 and Mode 2 dispatch, NMI, EI delay&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IX/IY indexed&lt;/strong&gt;: Loads, stores, arithmetic, IXH/IXL access, DDCB bit operations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;T-state timing&lt;/strong&gt;: Verified counts for representative instructions from each group&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R register&lt;/strong&gt;: Increment behavior, bit 7 preservation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each test sets up a specific CPU state, loads a short instruction sequence into memory, executes it, and verifies the results. The test framework is minimal — just macros for assertions and a runner that reports pass/fail:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gh"&gt;Z80 CPU Test Suite&lt;/span&gt;
&lt;span class="gh"&gt;==================&lt;/span&gt;
test_nop                                                    PASS
&lt;span class="gh"&gt;test_ld_reg_imm                                             PASS&lt;/span&gt;
&lt;span class="gh"&gt;...&lt;/span&gt;
test_r_bit7_preserved                                       PASS

==================
Results: 117/117 passed
&lt;/pre&gt;&lt;/div&gt;
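&lt;p&gt;The macros behind that output might look something like this sketch (names and details are illustrative, not the test file's actual code):&lt;/p&gt;

```c
#include <stdio.h>

static int tests_run = 0, tests_passed = 0;

/* Bail out of the current test function on mismatch. */
#define ASSERT_EQ(actual, expected)                               \
    do {                                                          \
        if ((actual) != (expected)) {                             \
            printf("  expected 0x%X, got 0x%X\n",                 \
                   (unsigned)(expected), (unsigned)(actual));     \
            return 0;                                             \
        }                                                         \
    } while (0)

/* Run a test function and print the aligned PASS/FAIL line. */
#define RUN_TEST(fn)                                              \
    do {                                                          \
        tests_run++;                                              \
        int ok = fn();                                            \
        if (ok) tests_passed++;                                   \
        printf("%-60s%s\n", #fn, ok ? "PASS" : "FAIL");           \
    } while (0)

static int test_example(void) { ASSERT_EQ(2 + 2, 4); return 1; }
```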

&lt;p&gt;All 117 pass. But passing unit tests isn't the same as passing real software. The real validation comes from booting actual ROMs.&lt;/p&gt;
&lt;h3&gt;The System Emulator&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;zxs&lt;/code&gt; binary wraps the CPU core with enough peripheral emulation to run two classes of software: Grant Searle-style BASIC SBCs with ACIA serial I/O, and CP/M .COM programs with a minimal BDOS shim.&lt;/p&gt;
&lt;h4&gt;ACIA Serial Emulation&lt;/h4&gt;
&lt;p&gt;The Motorola MC6850 ACIA (Asynchronous Communications Interface Adapter) is the serial chip used in the Grant Searle Z80 SBC design and many similar projects. It presents two registers to the CPU:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Status register&lt;/strong&gt; (base address): Bit 0 = Receive Data Register Full (RDRF), Bit 1 = Transmit Data Register Empty (TDRE)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data register&lt;/strong&gt; (base + 1): Read for received data, write to transmit&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The emulation maps these to terminal I/O. TDRE is always set (the "transmitter" is always ready since we're writing directly to stdout). RDRF is set when non-blocking &lt;code&gt;read()&lt;/code&gt; has captured a character from stdin. The ACIA's interrupt capability is emulated — when receive interrupts are enabled and data is available, the emulator delivers an RST 38h interrupt to the CPU.&lt;/p&gt;
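&lt;p&gt;The register read path is only a few lines (a sketch with an illustrative state struct, not the emulator's actual types):&lt;/p&gt;

```c
#include <stdint.h>

/* ACIA read sketch: RDRF (bit 0) tracks whether a byte is waiting;
 * TDRE (bit 1) is permanently set because stdout never blocks. */
typedef struct { int have_byte; uint8_t rx_byte; } acia_t;

static uint8_t acia_read(acia_t *a, uint8_t reg) {
    if (reg == 0)                          /* status register */
        return (uint8_t)(0x02 | (a->have_byte ? 0x01 : 0x00));
    a->have_byte = 0;                      /* data register: reading clears RDRF */
    return a->rx_byte;
}
```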
&lt;h4&gt;Serial Port Auto-Detection&lt;/h4&gt;
&lt;p&gt;Rather than hardcoding the ACIA port address, the emulator scans the loaded ROM for &lt;code&gt;IN A,(n)&lt;/code&gt; (&lt;code&gt;DB xx&lt;/code&gt;) and &lt;code&gt;OUT (n),A&lt;/code&gt; (&lt;code&gt;D3 xx&lt;/code&gt;) instruction patterns. It collects the referenced port addresses and looks for adjacent pairs (status + data ports) that have both IN and OUT references — the signature of a serial peripheral. For the Grant Searle ROM, this reliably detects port base &lt;code&gt;0x80&lt;/code&gt;. For ROMs that use different port configurations, a &lt;code&gt;--port&lt;/code&gt; flag provides a manual override.&lt;/p&gt;
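&lt;p&gt;The core of the scan fits in a short function (a sketch of the idea only; the real detector must also be careful about bytes that are operands rather than opcodes):&lt;/p&gt;

```c
#include <stdint.h>
#include <stddef.h>

/* Count IN A,(n) (0xDB nn) and OUT (n),A (0xD3 nn) references per
 * port, then look for an adjacent pair where both ports have both
 * kinds of reference -- the signature of a status+data serial pair. */
static int detect_serial_base(const uint8_t *rom, size_t len) {
    int in_refs[256] = {0}, out_refs[256] = {0};
    for (size_t i = 0; i + 1 < len; i++) {
        if (rom[i] == 0xDB) in_refs[rom[i + 1]]++;
        if (rom[i] == 0xD3) out_refs[rom[i + 1]]++;
    }
    for (int p = 0; p < 255; p++)          /* base = status, base+1 = data */
        if (in_refs[p] && in_refs[p + 1] && out_refs[p] && out_refs[p + 1])
            return p;
    return -1;                             /* not found; fall back to --port */
}
```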
&lt;h4&gt;CP/M Mode&lt;/h4&gt;
&lt;p&gt;For &lt;code&gt;.com&lt;/code&gt; and &lt;code&gt;.cim&lt;/code&gt; files, the emulator switches to CP/M mode: the program is loaded at &lt;code&gt;0x0100&lt;/code&gt;, the stack pointer is set to &lt;code&gt;0xFFFE&lt;/code&gt; with a return address of &lt;code&gt;0x0000&lt;/code&gt; pushed, and BDOS calls are intercepted at address &lt;code&gt;0x0005&lt;/code&gt;. Only the essential BDOS functions are implemented — console output (function 2) and string output (function 9) — but this is enough to run many CP/M utilities and test programs.&lt;/p&gt;
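&lt;p&gt;The shim itself is tiny. A sketch (register and memory access via illustrative parameters rather than the real &lt;code&gt;z80_t&lt;/code&gt;):&lt;/p&gt;

```c
#include <stdio.h>
#include <stdint.h>

/* Dispatch a BDOS call on the function number in C: function 2 prints
 * the character in E, function 9 prints the '$'-terminated string at
 * DE. Returns the number of characters written (handy for testing). */
static int bdos_call(uint8_t c_reg, uint8_t e_reg, uint16_t de,
                     const uint8_t *mem) {
    int written = 0;
    if (c_reg == 2) {
        putchar(e_reg);
        written = 1;
    } else if (c_reg == 9) {
        for (uint16_t p = de; mem[p] != '$'; p++, written++)
            putchar(mem[p]);
    }
    return written;
}
```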
&lt;h4&gt;System Auto-Detection&lt;/h4&gt;
&lt;p&gt;File extension determines the system type: &lt;code&gt;.com&lt;/code&gt; and &lt;code&gt;.cim&lt;/code&gt; files run in CP/M mode, everything else runs as a BASIC SBC. Intel HEX files are detected and parsed regardless of extension. The &lt;code&gt;--system&lt;/code&gt; flag overrides auto-detection when needed.&lt;/p&gt;
&lt;h3&gt;Booting BASIC&lt;/h3&gt;
&lt;p&gt;The real test of any emulator is whether it runs real software. Here's what happens when you point &lt;code&gt;zxs&lt;/code&gt; at Grant Searle's BASIC ROM:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;./zxs&lt;span class="w"&gt; &lt;/span&gt;basic.rom
Loaded&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8192&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;at&lt;span class="w"&gt; &lt;/span&gt;0x0000
BASIC&lt;span class="w"&gt; &lt;/span&gt;SBC&lt;span class="w"&gt; &lt;/span&gt;mode,&lt;span class="w"&gt; &lt;/span&gt;serial&lt;span class="w"&gt; &lt;/span&gt;port&lt;span class="w"&gt; &lt;/span&gt;base:&lt;span class="w"&gt; &lt;/span&gt;0x80&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;Ctrl+&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
Z80&lt;span class="w"&gt; &lt;/span&gt;SBC&lt;span class="w"&gt; &lt;/span&gt;By&lt;span class="w"&gt; &lt;/span&gt;Grant&lt;span class="w"&gt; &lt;/span&gt;Searle

Memory&lt;span class="w"&gt; &lt;/span&gt;top?
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That banner — "Z80 SBC By Grant Searle" — represents thousands of Z80 instructions executing correctly. The ROM initializes memory, configures the ACIA, sets up the interrupt handler, and enters the BASIC interpreter's command loop. Each of those steps exercises a different subset of the CPU's instruction set. A single incorrectly implemented instruction — a wrong flag bit, a miscounted displacement, a botched stack operation — would cause the ROM to crash or produce garbage output.&lt;/p&gt;
&lt;p&gt;The RC2014 BASIC ROM boots as well, though it requires specifying the serial port base since its ROM references multiple I/O addresses:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;./zxs&lt;span class="w"&gt; &lt;/span&gt;--port&lt;span class="w"&gt; &lt;/span&gt;0x80&lt;span class="w"&gt; &lt;/span&gt;rc2014_56k.hex
Loaded&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8154&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;HEX&lt;span class="w"&gt; &lt;/span&gt;file
BASIC&lt;span class="w"&gt; &lt;/span&gt;SBC&lt;span class="w"&gt; &lt;/span&gt;mode,&lt;span class="w"&gt; &lt;/span&gt;serial&lt;span class="w"&gt; &lt;/span&gt;port&lt;span class="w"&gt; &lt;/span&gt;base:&lt;span class="w"&gt; &lt;/span&gt;0x80&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;Ctrl+&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

RC2014&lt;span class="w"&gt; &lt;/span&gt;-&lt;span class="w"&gt; &lt;/span&gt;MS&lt;span class="w"&gt; &lt;/span&gt;Basic&lt;span class="w"&gt; &lt;/span&gt;Loader
z88dk&lt;span class="w"&gt; &lt;/span&gt;-&lt;span class="w"&gt; &lt;/span&gt;feilipu

Memory&lt;span class="w"&gt; &lt;/span&gt;top?
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Intel HEX file loading is handled transparently — the emulator detects the format by checking for the &lt;code&gt;:&lt;/code&gt; record marker and parses the standard Intel HEX record format (data records, EOF records, address fields, checksums).&lt;/p&gt;
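&lt;p&gt;Parsing a single record is straightforward once the field layout is fixed. A sketch (the real loader also handles line endings, extended record types, and bounds checks):&lt;/p&gt;

```c
#include <stdint.h>

static int hex_nibble(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    return -1;
}

static int hex_byte(const char *s) {
    int hi = hex_nibble(s[0]), lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Parse one ":LLAAAATT<data>CC" record into mem[]; returns the record
 * type (0 = data, 1 = EOF) or -1 on checksum failure. All record
 * bytes, checksum included, must total 0 mod 256. */
static int hex_record(const char *line, uint8_t *mem) {
    if (line[0] != ':') return -1;
    int len  = hex_byte(line + 1);
    int addr = (hex_byte(line + 3) << 8) | hex_byte(line + 5);
    int type = hex_byte(line + 7);
    uint8_t sum = (uint8_t)(len + (addr >> 8) + (addr & 0xFF) + type);
    for (int i = 0; i < len; i++) {
        int b = hex_byte(line + 9 + 2 * i);
        mem[addr + i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }
    sum = (uint8_t)(sum + hex_byte(line + 9 + 2 * len));
    return sum == 0 ? type : -1;
}
```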
&lt;h3&gt;What I Learned About LLM Clean Room Development&lt;/h3&gt;
&lt;p&gt;This project taught me as much about working with LLMs as it did about the Z80. Some observations:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Specification quality determines output quality.&lt;/strong&gt; When I gave Claude a vague instruction like "implement the Z80," the result would have been a generic emulator shaped by whatever training data dominates. When I gave it a detailed architectural plan — bit field decoding, specific callback interfaces, T-state requirements — the result was a coherent, well-structured implementation that reflected the design decisions in the specification. Antirez observed the same thing: the LLM performs dramatically better when you provide documentation and constraints rather than open-ended prompts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LLMs can work from datasheets, not just from memory.&lt;/strong&gt; The clean room constraint was the whole point: could Claude produce correct Z80 flag behavior, proper DDCB/FDCB displacement ordering, accurate block operation semantics — all derived from specification knowledge rather than memorized source code? The 117 passing tests and booting ROMs suggest it can. The code doesn't look like any particular existing emulator. The bit field decoder, the ALU structure, the prefix dispatch — these are architecturally reasonable but stylistically original.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The bug pattern was illuminating.&lt;/strong&gt; The one test failure in the initial implementation was a T-state timing issue: DD/FD prefix overhead was being double-counted. This is exactly the kind of bug a human developer would make when implementing prefix dispatch — a bookkeeping error at the boundary between the prefix handler and the main decoder. It was not the kind of error you'd see from copying existing code, where the timing would already be correct. The bug was &lt;em&gt;original&lt;/em&gt;, which paradoxically increases confidence that the implementation is too.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Z80's instruction encoding is remarkably systematic.&lt;/strong&gt; Once you express the x/y/z/p/q bit field decomposition in the architectural plan, the entire instruction set becomes a small number of patterns applied consistently across register indices and operation codes. Claude picked up on this structure immediately and produced a decoder that reads like the specification. The elegance of Zilog's encoding is invisible in an opcode table but obvious in a decoder — and an LLM can see that structure when pointed at it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The DD/FD prefix system is essentially a register renaming mechanism.&lt;/strong&gt; It doesn't introduce new operations — it modifies existing ones by replacing HL with IX or IY. Expressing this in the plan as "replace HL→IX/IY, H→IXH/IYH, L→IXL/IYL, (HL)→(IX+d)/(IY+d)" gave Claude the conceptual framework to implement DD/FD support as a modifier on the existing decoder rather than duplicating 200+ instruction handlers.&lt;/p&gt;
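&lt;p&gt;One way to express that renaming in C is a single point of indirection that the prefix handler retargets (a sketch with illustrative names, not the emulator's actual structure):&lt;/p&gt;

```c
#include <stdint.h>

typedef enum { USE_HL, USE_IX, USE_IY } idx_mode_t;

typedef struct {
    uint16_t HL, IX, IY;
    idx_mode_t mode;     /* set by the DD/FD handler, reset after one opcode */
} regs_t;

/* Every decoder access to HL goes through this; a DD or FD prefix
 * silently swaps in IX or IY without duplicating any handlers. */
static uint16_t *hl_ptr(regs_t *r) {
    switch (r->mode) {
    case USE_IX: return &r->IX;
    case USE_IY: return &r->IY;
    default:     return &r->HL;
    }
}
```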
&lt;p&gt;&lt;strong&gt;Flag behavior is the specification.&lt;/strong&gt; Two Z80 emulators can produce identical results for every instruction and still differ in their flag register output. The undocumented F3 and F5 bits, the special CP flag behavior, the block instruction flag computations — these are what distinguish a correct emulator from an approximately correct one. Claude got the CP flag anomaly right (F3/F5 from the operand, not the result), which suggests it was working from specification knowledge about the Z80's internal bus routing rather than just copying a known-good flag computation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Clean room constraints make LLM output more trustworthy, not less.&lt;/strong&gt; There's an irony here: by &lt;em&gt;restricting&lt;/em&gt; what the LLM can reference, you get &lt;em&gt;more&lt;/em&gt; confidence in the result. If Claude had produced code that looked suspiciously like MAME's Z80 core, you'd wonder whether it was simply reciting training data. Instead, it produced an implementation that's structurally sound, stylistically distinct, and correct — the hallmarks of working from specifications rather than from examples.&lt;/p&gt;
&lt;h3&gt;The Code&lt;/h3&gt;
&lt;p&gt;The complete source is &lt;a href="https://baud.rs/Ae0K75"&gt;on GitHub&lt;/a&gt; — five files totaling roughly 3,000 lines of C. It builds with &lt;code&gt;make&lt;/code&gt;, produces zero warnings with &lt;code&gt;-Wall -Wextra&lt;/code&gt;, and runs Grant Searle and RC2014 BASIC ROMs out of the box.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;make
cc&lt;span class="w"&gt; &lt;/span&gt;-Wall&lt;span class="w"&gt; &lt;/span&gt;-Wextra&lt;span class="w"&gt; &lt;/span&gt;-O2&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;zxs&lt;span class="w"&gt; &lt;/span&gt;zxs.c&lt;span class="w"&gt; &lt;/span&gt;z80.c
cc&lt;span class="w"&gt; &lt;/span&gt;-Wall&lt;span class="w"&gt; &lt;/span&gt;-Wextra&lt;span class="w"&gt; &lt;/span&gt;-O2&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;z80_test&lt;span class="w"&gt; &lt;/span&gt;z80_test.c&lt;span class="w"&gt; &lt;/span&gt;z80.c

$&lt;span class="w"&gt; &lt;/span&gt;./z80_test
Z80&lt;span class="w"&gt; &lt;/span&gt;CPU&lt;span class="w"&gt; &lt;/span&gt;Test&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Suite&lt;/span&gt;
&lt;span class="o"&gt;==================&lt;/span&gt;
...
Results:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;117&lt;/span&gt;/117&lt;span class="w"&gt; &lt;/span&gt;passed

$&lt;span class="w"&gt; &lt;/span&gt;./zxs&lt;span class="w"&gt; &lt;/span&gt;basic.rom
Z80&lt;span class="w"&gt; &lt;/span&gt;SBC&lt;span class="w"&gt; &lt;/span&gt;By&lt;span class="w"&gt; &lt;/span&gt;Grant&lt;span class="w"&gt; &lt;/span&gt;Searle
Memory&lt;span class="w"&gt; &lt;/span&gt;top?
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There is more to do. ZEXALL compliance would be the next validation milestone — it tests every instruction against known-good results captured from real Z80 hardware. ZX Spectrum emulation would require adding ULA video, keyboard matrix scanning, and Spectrum-specific memory banking. Cycle-exact timing would enable accurate sound emulation and demo-scene effects.&lt;/p&gt;
&lt;p&gt;But for now, the ROM boots, BASIC runs, and every line of the emulator traces back to Z80 specifications and documentation rather than someone else's &lt;code&gt;z80.c&lt;/code&gt;. An LLM wrote it, but a human designed it, constrained it, tested it, and validated it against real hardware ROM images. The clean room constraint didn't just produce a trustworthy emulator — it produced a trustworthy &lt;em&gt;process&lt;/em&gt; for using LLMs on systems programming tasks. Give the model a specification instead of an open-ended prompt. Enforce constraints that prevent training data regurgitation. Validate against real-world artifacts, not just unit tests.&lt;/p&gt;
&lt;p&gt;Antirez asked whether LLMs create original code or decompress training data. This project is one more data point on the side of original creation — but only when you set up the conditions for it. The clean room is what makes the difference.&lt;/p&gt;</description><category>acia</category><category>ai</category><category>c</category><category>claude</category><category>clean room</category><category>cp/m</category><category>cpu design</category><category>emulation</category><category>grant searle</category><category>instruction decoding</category><category>llm</category><category>rc2014</category><category>retrocomputing</category><category>serial</category><category>z80</category><category>zilog</category><guid>https://tinycomputers.io/posts/clean-room-z80-emulator.html</guid><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate></item><item><title>The Jevons Counter-Thesis: Why AI Displacement Scenarios Underweight Demand Expansion</title><link>https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;34 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/jevons-counter-thesis/jevons-portrait.jpg" alt="Portrait of William Stanley Jevons, the English economist who first described the paradox of efficiency-driven demand expansion in 1865" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Citrini Research recently published a piece called &lt;a href="https://baud.rs/global-intel-crisis"&gt;"The 2028 Global Intelligence Crisis"&lt;/a&gt; — a thought experiment modeling a scenario in which AI-driven white-collar displacement triggers a cascading economic crisis. In their telling, AI replaces workers, spending drops, firms invest more in AI to protect margins, AI improves, and the cycle repeats. They call it the "Intelligence Displacement Spiral" and project a 57% peak-to-trough drawdown in the S&amp;amp;P 500. No natural brake. No soft landing.&lt;/p&gt;
&lt;p&gt;It is a well-constructed stress test, and worth reading on its own terms. But the scenario achieves its conclusion by modeling only the displacement side of an efficiency revolution while treating the demand-expansion side as essentially zero. This is the core analytical gap, and it is precisely the gap that Jevons Paradox addresses.&lt;/p&gt;
&lt;p&gt;I have &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;written about Jevons Paradox before&lt;/a&gt; in the context of the semiconductor industry — how improvements in energy efficiency from the transistor through GPUs have consistently driven &lt;em&gt;more&lt;/em&gt; total energy consumption, not less, by making computing cheap enough to permeate every corner of the economy. The same framework applies to AI and cognitive labor, and the Citrini piece is a useful foil for exploring why.&lt;/p&gt;
&lt;h3&gt;What Jevons Paradox Actually Says&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/jevons-counter-thesis/coal-question-cover.jpg" alt="Title page of The Coal Question by W. Stanley Jevons, second edition, 1866, published by Macmillan and Co." style="float: left; max-width: 200px; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;In 1865, the English economist William Stanley Jevons observed something counterintuitive about coal consumption in Britain. James Watt's steam engine had made coal use dramatically more efficient — you could extract far more useful work per ton of coal than before. The intuitive expectation was that Britain would use less coal. The opposite happened. Total coal consumption surged, because the efficiency gains made coal-powered activities so much cheaper that entirely new applications emerged. Factories that couldn't justify coal-powered machinery at the old efficiency levels now could. Industries that had never used steam power adopted it. The per-unit savings were overwhelmed by the explosion in total units demanded.&lt;/p&gt;
&lt;p&gt;This pattern has recurred across nearly every major input cost reduction in economic history. Semiconductor efficiency improved by roughly a trillionfold over six decades, and total spending on computing did not decline — it expanded from a niche military and scientific expenditure to a multi-trillion-dollar global industry. Bandwidth costs collapsed through the 1990s and 2000s, and total bandwidth consumption didn't decrease — it increased by orders of magnitude as streaming video, social media, cloud computing, and mobile internet emerged. LED lighting is roughly 90% more efficient than incandescent bulbs, and total global illumination has increased, not decreased, as cheap lighting enabled new architectural designs, 24-hour commercial operations, and decorative applications that were uneconomical before.&lt;/p&gt;
&lt;p&gt;The mechanism is straightforward: when a critical input becomes dramatically cheaper, the addressable market for everything that uses that input expands. New use cases emerge that were previously uneconomical. Existing use cases scale to populations that were previously priced out. The total consumption of the now-cheaper input rises even as the per-unit cost falls.&lt;/p&gt;
&lt;p&gt;The Citrini piece implicitly models AI as a &lt;em&gt;substitution&lt;/em&gt; technology — it replaces human cognitive labor, and that's the end of the transaction. Jevons Paradox suggests AI is simultaneously, and perhaps primarily, an &lt;em&gt;expansion&lt;/em&gt; technology — it makes cognitive services so cheap that demand for them can grow faster than the displacement effect.&lt;/p&gt;
&lt;h3&gt;Latent Demand Is Enormous and Unmeasured&lt;/h3&gt;
&lt;p&gt;The Citrini scenario treats the economy as having a fixed quantity of cognitive work. AI absorbs that work, the workers who performed it lose their income, and aggregate demand collapses. But the reason cognitive work costs what it does is that human intelligence has been scarce and expensive. This scarcity has suppressed enormous categories of demand that simply don't show up in current GDP accounting because they've never been economically feasible.&lt;/p&gt;
&lt;p&gt;Consider education. The average American family cannot afford personalized tutoring. A human tutor at $50-100 per hour is a luxury good. If AI reduces the cost of competent, personalized educational support to near zero, the addressable market isn't the current tutoring market — it's every student in the country. That is a market expansion of potentially 50x or more relative to the existing tutoring industry. The humans who previously worked as tutors are displaced, yes, but the economic activity generated by tens of millions of students receiving personalized education — and the downstream productivity gains from a better-educated workforce — is a new demand category that didn't exist before.&lt;/p&gt;
&lt;p&gt;The same logic applies across dozens of sectors. Legal services: roughly 80% of Americans who need legal help cannot afford it. Personalized financial planning: currently available only to households with six-figure investable assets. Preventive health analysis: limited by the number of available clinicians. Custom software for small businesses: a $50,000 engagement is out of reach for a business generating $300,000 in annual revenue. Architecture and design services for middle-income homeowners. Personalized nutrition and fitness programming. Translation and localization for businesses that currently operate only in one language.&lt;/p&gt;
&lt;p&gt;These are not speculative categories. They are documented, unmet needs constrained by the cost of the human intelligence required to serve them. When AI collapses those costs, the question is whether the demand expansion across all of these categories — and others we haven't imagined — can offset the displacement in existing roles. The Citrini piece assumes the answer is no without modeling the question. Jevons Paradox, and the historical base rate, suggests the answer is more likely yes.&lt;/p&gt;
&lt;h3&gt;The Citrini Piece's Own Evidence Supports Jevons&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/jevons-counter-thesis/uk-coal-production.png" alt="UK coal production from 1860 to 2010 showing the dramatic surge in output that Jevons predicted despite improving efficiency — production quadrupled from 75 million tonnes in 1860 to nearly 300 million tonnes by 1913" style="max-width: 100%; margin: 1em 0; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;The piece acknowledges two canonical examples of technological displacement that didn't produce net job losses: ATMs (bank teller employment rose for 20 years after their introduction, because cheaper branch operations led to more branches) and the internet (travel agencies, the Yellow Pages, and brick-and-mortar retail were disrupted, but entirely new industries emerged in their place). It then dismisses both by asserting that "AI is different because it improves at the very tasks humans would redeploy to."&lt;/p&gt;
&lt;p&gt;But this is precisely what critics said at every prior inflection point. When mechanized looms were introduced in the early 19th century, displaced textile workers could not "redeploy" to weaving — the machines did the very thing they were trained for. What actually happened was that radically cheaper cloth created demand for fashion, retail distribution, global trade logistics, cotton cultivation, and marketing — categories that scarcely existed at the prior cost structure. The weavers didn't get their old jobs back. They moved into an economy that had been restructured around abundant, cheap textiles, and that economy was far larger than the one it replaced.&lt;/p&gt;
&lt;p&gt;The Citrini piece's own scenario contains evidence of Jevons-style expansion that it frames exclusively as destruction. The section on agentic commerce describes consumers using AI agents that eliminate friction — price-matching across platforms, renegotiating subscriptions, rebooking travel. The article frames this as the death of intermediation moats. But it is equally a story of market expansion. When an AI agent assembles a complete travel itinerary faster and cheaper than Expedia, the result isn't just that Expedia loses revenue. It's that people who previously found trip planning too cumbersome or too expensive now take trips. Total travel volume can increase even as per-trip intermediation costs fall.&lt;/p&gt;
&lt;p&gt;The DoorDash example is even more explicit. The article describes vibe-coded competitors passing 90-95% of delivery fees to drivers, and AI agents shopping across twenty platforms for the best deal. Delivery becomes cheaper for consumers and more remunerative for drivers. The article frames this as destruction of DoorDash's moat. From a Jevons perspective, it's a textbook demand expansion setup: cheaper delivery means more people order delivery, more restaurants offer delivery, and total delivery volume grows.&lt;/p&gt;
&lt;h3&gt;The Feedback Loop Has a Natural Brake&lt;/h3&gt;
&lt;p&gt;The article's most powerful rhetorical device is the claim that the Intelligence Displacement Spiral has "no natural brake." This is the critical assertion on which the entire doom scenario depends, and it is the assertion most directly challenged by Jevons Paradox.&lt;/p&gt;
&lt;p&gt;The natural brake is price-driven demand expansion. As AI makes cognitive services cheaper, consumers gain access to goods and services they couldn't previously afford. This is true even for displaced workers operating at lower income levels. A former product manager earning $45,000 as an Uber driver cannot afford a human financial advisor, but can access AI-driven financial planning for near zero cost. They cannot afford a human tutor for their children, but can access AI tutoring. They cannot afford custom software to start a small business, but can build an application using AI tools. The consumption basket shifts — less spending on expensive human-mediated services, more consumption of cheap AI-mediated services that were previously unattainable.&lt;/p&gt;
&lt;p&gt;This doesn't make the individual worse off on net — it partially offsets the income decline through dramatically lower cost of living for intelligence-intensive services. The article's "Ghost GDP" concept — output that shows up in national accounts but doesn't circulate through the real economy — assumes that the efficiency gains accrue entirely to capital owners. But the article itself documents intense competition. Dozens of vibe-coded delivery startups competing for share. Agentic shoppers forcing prices down across every category. Stablecoin payment rails bypassing card interchange fees. In competitive markets, efficiency gains don't stay with producers — they flow to consumers through lower prices. That flow is the transmission mechanism through which Jevons effects operate, and the article describes it vividly while somehow not recognizing it as a countervailing force.&lt;/p&gt;
&lt;h3&gt;The OpEx Substitution Framing Conceals the Demand Side&lt;/h3&gt;
&lt;p&gt;The article makes an astute observation that AI investment increased even as the economy contracted, because companies were substituting AI OpEx for labor OpEx. A company spending $100M on employees and $5M on AI shifts to $70M on employees and $20M on AI — total spending falls while AI spending rises. This explains why the AI infrastructure complex continued performing even as the broader economy deteriorated.&lt;/p&gt;
&lt;p&gt;This is a credible supply-side analysis. But it omits the demand-side consequence. If a company produces the same output with fewer workers at lower total cost, competitive pressure pushes the price of that output down. Falling prices expand the addressable market. A SaaS product that cost $500,000 annually and was affordable only to the Fortune 500 now costs $50,000 and is accessible to mid-market companies. A consulting engagement that cost $2 million and was reserved for large enterprises now costs $200,000 and is available to growth-stage companies. The total number of transactions can grow even as per-transaction revenue falls.&lt;/p&gt;
&lt;p&gt;The article models a world where output stays constant, prices stay constant, costs drop, and the entire surplus accrues to shareholders. In practice, the intense competition the article itself describes — incumbents in knife-fights with each other and with upstart challengers — is precisely the mechanism that prevents this. Competition distributes efficiency gains through lower prices, and lower prices expand markets.&lt;/p&gt;
&lt;h3&gt;The Intelligence Premium Unwind Is Also a Jevons Story&lt;/h3&gt;
&lt;p&gt;The article's most compelling framing is that human intelligence has been the scarce input in the economy for all of modern history, and AI is unwinding that premium. Every institution — the labor market, mortgage underwriting, the tax code — was designed for a world where human cognition was expensive and irreplaceable. As AI makes intelligence abundant, these institutions crack.&lt;/p&gt;
&lt;p&gt;Jevons Paradox applied to this framing produces a different conclusion. When intelligence becomes abundant and cheap, the economy doesn't just produce the same cognitive output more efficiently — it restructures around consuming vastly more intelligence. We don't merely replicate the existing quantity of analysis, decisions, creative output, and coordination at lower cost. We produce orders of magnitude more of it.&lt;/p&gt;
&lt;p&gt;The article's own data point supports this: by March 2027 in their scenario, the median American was consuming 400,000 tokens per day, a 10x increase from the end of 2026. The article cites this as evidence of disruption, but it is fundamentally a Jevons data point. People are consuming &lt;em&gt;more&lt;/em&gt; intelligence, not less. That consumption drives economic activity — someone is building the products and services that consume those tokens, maintaining the infrastructure, curating quality, arbitrating edge cases, and inventing new applications.&lt;/p&gt;
&lt;p&gt;The question is whether that new economic activity employs enough people at high enough wages to offset the displacement. The article assumes it doesn't. History suggests it tends to, though the transition period can be painful and the new employment categories often look nothing like the old ones.&lt;/p&gt;
&lt;h3&gt;The GDP Composition Argument Cuts Both Ways&lt;/h3&gt;
&lt;p&gt;The article makes much of the fact that 70% of US GDP is consumer spending, and that white-collar workers drive a disproportionate share of that spending. When those workers lose income, the consumption base collapses, and GDP follows. This is mechanically sound as far as it goes.&lt;/p&gt;
&lt;p&gt;But Jevons Paradox suggests that the &lt;em&gt;composition&lt;/em&gt; of GDP shifts during efficiency revolutions, not just the level. When agricultural mechanization displaced 90% of farm workers over the course of a century, it did not produce a permanent 90% unemployment rate. GDP restructured around manufacturing and services — categories that were economically marginal when most human labor was occupied with food production. The displaced agricultural workers didn't return to farming. They moved into an economy where cheap food freed up income and labor for other activities.&lt;/p&gt;
&lt;p&gt;The analogous question for AI is: when cognitive labor becomes cheap, what does the economy restructure around? The Citrini piece doesn't attempt to answer this, which is understandable — predicting the specific industries of the future is a fool's errand. But the &lt;em&gt;pattern&lt;/em&gt; is well-established. Cheap food led to a manufacturing economy. Cheap manufacturing led to a services economy. Cheap cognitive services lead to something else. The article's scenario assumes the chain terminates with "cheap cognitive services lead to nothing," which is historically unprecedented.&lt;/p&gt;
&lt;p&gt;One plausible direction: the economy shifts toward activities where physical presence, human trust, and embodied experience carry a premium precisely &lt;em&gt;because&lt;/em&gt; cognitive tasks are commoditized. Healthcare delivery (not diagnosis, but care), skilled trades, experiential services, community-oriented businesses, and creative work that is valued specifically for its human origin. These are not futuristic speculations — they are existing sectors where human presence is intrinsic to the value proposition. As AI deflates the cost of cognitive services, the &lt;em&gt;relative&lt;/em&gt; value of irreducibly human activities increases, and spending may shift toward them.&lt;/p&gt;
&lt;p&gt;Another direction, potentially larger: entirely new categories of economic activity that we cannot yet name, because they only become viable when intelligence is cheap and abundant. The internet didn't just make existing activities more efficient — it created social media, the gig economy, e-commerce logistics, content creation as a profession, and cloud computing as an industry. None of these were predicted in advance. The equivalent AI-native industries may already be emerging in nascent form, invisible to a GDP accounting framework built for the prior economic structure.&lt;/p&gt;
&lt;h3&gt;Where the Speed Concern Is Legitimate&lt;/h3&gt;
&lt;p&gt;The strongest element of the Citrini scenario is the speed argument. Prior Jevons cycles unfolded over decades — long enough for institutions, education systems, and labor markets to adapt. The article's timeline compresses displacement into roughly 18-24 months, far faster than the demand-expansion side can respond.&lt;/p&gt;
&lt;p&gt;This is a legitimate concern, and it's where the Jevons counter-argument is weakest. If displacement is fast and demand expansion is slow, the interim period can be genuinely severe — even if the long-run equilibrium is positive. Policy response, education, and institutional adaptation all operate on timescales measured in years, not quarters.&lt;/p&gt;
&lt;p&gt;However, the article makes an asymmetric assumption on this point. It models disruption happening at AI speed — step-function capability jumps, immediate corporate adoption, rapid layoff cycles. But it models demand expansion as essentially static, only emerging when government intervention eventually arrives. This ignores that entrepreneurial response to dramatically cheaper inputs has historically been fast. The smartphone created a trillion-dollar app economy in under five years. Cloud computing spawned tens of thousands of SaaS companies within a decade. When a critical input becomes 100x cheaper, entrepreneurs move quickly to build products that exploit the new cost structure, because the profit opportunity is enormous.&lt;/p&gt;
&lt;p&gt;The article's scenario includes dozens of vibe-coded delivery startups appearing rapidly, which is itself evidence of fast entrepreneurial response to cheaper intelligence. It just doesn't extend that observation to other sectors.&lt;/p&gt;
&lt;h3&gt;Where the Counter-Argument Must Be Honest&lt;/h3&gt;
&lt;p&gt;Jevons Paradox is not a universal law. It describes a tendency — a strong historical pattern — not an iron guarantee. The Citrini piece's most potent rebuttal is that prior Jevons cycles involved specific resource inputs (coal, compute, bandwidth, lighting), while AI targets the general-purpose input of intelligence itself. If AI can perform not only existing cognitive tasks but also the &lt;em&gt;new&lt;/em&gt; tasks that would emerge from demand expansion, then the rebound effect could be muted or eliminated entirely. A coal-fired loom couldn't design fashion or run a retail chain. But an AI that can code, analyze, write, plan, and reason might well be capable of staffing the very industries that Jevons expansion would create.&lt;/p&gt;
&lt;p&gt;This is a genuine uncertainty, and intellectual honesty requires acknowledging it. The question reduces to whether human judgment, taste, coordination, creativity, physical presence, and social trust constitute a durable residual demand — activities where humans remain preferred or necessary even when AI is technically capable — or whether those too get absorbed over time. The honest answer is that we don't know.&lt;/p&gt;
&lt;p&gt;What we do know is that the historical base rate strongly favors Jevons over the doom loop. Every prior prediction that a general-purpose technology would produce permanent mass unemployment — mechanized agriculture, factory automation, computerization, the internet — has been wrong, and wrong for the same reason: the predictions modeled displacement without modeling demand expansion. The Citrini piece, for all its sophistication, repeats that analytical pattern.&lt;/p&gt;
&lt;h3&gt;The Bottom Line&lt;/h3&gt;
&lt;p&gt;The Citrini piece is worth reading as a risk scenario. The transitional pain it describes is plausible, and portfolio construction should account for it. But as a base case for the future of the economy, it requires assuming that the most consistent empirical pattern in economic history — that radically cheaper inputs generate demand that exceeds displacement — has finally broken. That's a bet against a very long track record.&lt;/p&gt;
&lt;p&gt;For more on the mechanics of Jevons Paradox and how it has played out across the semiconductor industry from vacuum tubes to modern AI accelerators, see my earlier piece: &lt;a href="https://tinycomputers.io/posts/jevons-paradox.html"&gt;Jevons Paradox and the Semiconductor Industry&lt;/a&gt;.&lt;/p&gt;</description><category>ai</category><category>citrini research</category><category>demand expansion</category><category>economics</category><category>efficiency</category><category>jevons paradox</category><category>labor displacement</category><category>macroeconomics</category><category>technology</category><category>white-collar employment</category><guid>https://tinycomputers.io/posts/the-jevons-counter-thesis-why-ai-displacement-scenarios-underweight-demand-expansion.html</guid><pubDate>Tue, 24 Feb 2026 14:00:00 GMT</pubDate></item><item><title>A Stack-Based Bytecode VM for Lattice: 100 Opcodes, Serialization, and a Self-Hosted Compiler</title><link>https://tinycomputers.io/posts/a-stack-based-bytecode-vm-for-lattice.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/a-stack-based-bytecode-vm-for-lattice_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;29 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;When I &lt;a href="https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html"&gt;first wrote about&lt;/a&gt; Lattice's move from a tree-walking interpreter to a bytecode VM, the instruction set had 62 opcodes, concurrency primitives still delegated to the tree-walker, and programs couldn't be serialized. The VM was a foundation — correct and complete enough to become the default, but clearly a starting point.&lt;/p&gt;
&lt;p&gt;That was ten versions ago. The bytecode VM now has 100 opcodes, compiles concurrency primitives into standalone sub-chunks with zero AST dependency at runtime, ships a binary serialization format for ahead-of-time compilation, includes an ephemeral bump arena for short-lived string temporaries, and — perhaps most satisfyingly — has a self-hosted compiler written entirely in Lattice that produces the same &lt;code&gt;.latc&lt;/code&gt; bytecode files as the C implementation.&lt;/p&gt;
&lt;p&gt;This post walks through what changed and why. The full technical treatment is available as a &lt;a href="https://tinycomputers.io/papers/lattice_vm.pdf"&gt;research paper&lt;/a&gt;; this is the practitioner's version.&lt;/p&gt;
&lt;h3&gt;Why Keep Going&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html"&gt;original bytecode VM&lt;/a&gt; solved the immediate problems: it eliminated recursive AST dispatch overhead and gave Lattice a single execution path for file execution, the REPL, and the WASM playground. But three issues remained.&lt;/p&gt;
&lt;p&gt;First, &lt;code&gt;OP_SCOPE&lt;/code&gt; and &lt;code&gt;OP_SELECT&lt;/code&gt; — Lattice's structured concurrency opcodes — still stored AST node pointers in the constant pool and dropped into the tree-walking evaluator at runtime. This meant the AST had to stay alive during concurrent execution, which defeated one of the main motivations for having a bytecode VM in the first place.&lt;/p&gt;
&lt;p&gt;Second, the AST dependency made serialization impossible. You can serialize bytecode to a file, but you can't easily serialize an arbitrary C pointer to an AST node. Programs had to be parsed and compiled on every run.&lt;/p&gt;
&lt;p&gt;Third, the dispatch loop used a plain &lt;code&gt;switch&lt;/code&gt; statement. Not a crisis, but computed goto dispatch is a well-known improvement for bytecode interpreters, and leaving it on the table felt unnecessary.&lt;/p&gt;
&lt;p&gt;All three problems are solved now. Let me start with the instruction set, since everything else builds on it.&lt;/p&gt;
&lt;h3&gt;100 Opcodes&lt;/h3&gt;
&lt;p&gt;The instruction set grew from 62 to 100 opcodes, organized into 16 functional categories:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Representative opcodes&lt;/th&gt;
&lt;th style="text-align: right;"&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stack manipulation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONSTANT&lt;/code&gt;, &lt;code&gt;NIL&lt;/code&gt;, &lt;code&gt;TRUE&lt;/code&gt;, &lt;code&gt;FALSE&lt;/code&gt;, &lt;code&gt;UNIT&lt;/code&gt;, &lt;code&gt;POP&lt;/code&gt;, &lt;code&gt;DUP&lt;/code&gt;, &lt;code&gt;SWAP&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arithmetic/logical&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ADD&lt;/code&gt;, &lt;code&gt;SUB&lt;/code&gt;, &lt;code&gt;MUL&lt;/code&gt;, &lt;code&gt;DIV&lt;/code&gt;, &lt;code&gt;MOD&lt;/code&gt;, &lt;code&gt;NEG&lt;/code&gt;, &lt;code&gt;NOT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bitwise&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BIT_AND&lt;/code&gt;, &lt;code&gt;BIT_OR&lt;/code&gt;, &lt;code&gt;BIT_XOR&lt;/code&gt;, &lt;code&gt;BIT_NOT&lt;/code&gt;, &lt;code&gt;LSHIFT&lt;/code&gt;, &lt;code&gt;RSHIFT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparison&lt;/td&gt;
&lt;td&gt;&lt;code&gt;EQ&lt;/code&gt;, &lt;code&gt;NEQ&lt;/code&gt;, &lt;code&gt;LT&lt;/code&gt;, &lt;code&gt;GT&lt;/code&gt;, &lt;code&gt;LTEQ&lt;/code&gt;, &lt;code&gt;GTEQ&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONCAT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variables&lt;/td&gt;
&lt;td&gt;&lt;code&gt;GET/SET_LOCAL&lt;/code&gt;, &lt;code&gt;GET/SET/DEFINE_GLOBAL&lt;/code&gt;, &lt;code&gt;GET/SET_UPVALUE&lt;/code&gt;, &lt;code&gt;CLOSE_UPVALUE&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control flow&lt;/td&gt;
&lt;td&gt;&lt;code&gt;JUMP&lt;/code&gt;, &lt;code&gt;JUMP_IF_FALSE&lt;/code&gt;, &lt;code&gt;JUMP_IF_TRUE&lt;/code&gt;, &lt;code&gt;JUMP_IF_NOT_NIL&lt;/code&gt;, &lt;code&gt;LOOP&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Functions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CALL&lt;/code&gt;, &lt;code&gt;CLOSURE&lt;/code&gt;, &lt;code&gt;RETURN&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iterators&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ITER_INIT&lt;/code&gt;, &lt;code&gt;ITER_NEXT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data structures&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BUILD_ARRAY&lt;/code&gt;, &lt;code&gt;INDEX&lt;/code&gt;, &lt;code&gt;SET_INDEX&lt;/code&gt;, &lt;code&gt;GET_FIELD&lt;/code&gt;, &lt;code&gt;INVOKE&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exceptions/defer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PUSH_EXCEPTION_HANDLER&lt;/code&gt;, &lt;code&gt;THROW&lt;/code&gt;, &lt;code&gt;DEFER_PUSH&lt;/code&gt;, &lt;code&gt;DEFER_RUN&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase system&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FREEZE&lt;/code&gt;, &lt;code&gt;THAW&lt;/code&gt;, &lt;code&gt;CLONE&lt;/code&gt;, &lt;code&gt;MARK_FLUID&lt;/code&gt;, &lt;code&gt;REACT&lt;/code&gt;, &lt;code&gt;BOND&lt;/code&gt;, &lt;code&gt;SEED&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Builtins/modules&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PRINT&lt;/code&gt;, &lt;code&gt;IMPORT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SCOPE&lt;/code&gt;, &lt;code&gt;SELECT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integer fast paths&lt;/td&gt;
&lt;td&gt;&lt;code&gt;INC_LOCAL&lt;/code&gt;, &lt;code&gt;DEC_LOCAL&lt;/code&gt;, &lt;code&gt;ADD_INT&lt;/code&gt;, &lt;code&gt;SUB_INT&lt;/code&gt;, &lt;code&gt;LOAD_INT8&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wide variants&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONSTANT_16&lt;/code&gt;, &lt;code&gt;GET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;SET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;DEFINE_GLOBAL_16&lt;/code&gt;, &lt;code&gt;CLOSURE_16&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Special&lt;/td&gt;
&lt;td&gt;&lt;code&gt;RESET_EPHEMERAL&lt;/code&gt;, &lt;code&gt;HALT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The growth came from three directions: the integer fast-path opcodes (8 new), the wide constant variants (5 new), and the concurrency/arena opcodes. Let me explain each.&lt;/p&gt;
&lt;h4&gt;Integer Fast Paths&lt;/h4&gt;
&lt;p&gt;Tight loops like &lt;code&gt;for i in 0..1000&lt;/code&gt; spend most of their time incrementing a counter and comparing it to a bound. The generic &lt;code&gt;OP_ADD&lt;/code&gt; has to check whether its operands are integers, floats, or strings (for concatenation), which adds branching overhead on every iteration.&lt;/p&gt;
&lt;p&gt;The integer fast-path opcodes — &lt;code&gt;OP_ADD_INT&lt;/code&gt;, &lt;code&gt;OP_SUB_INT&lt;/code&gt;, &lt;code&gt;OP_MUL_INT&lt;/code&gt;, &lt;code&gt;OP_LT_INT&lt;/code&gt;, &lt;code&gt;OP_LTEQ_INT&lt;/code&gt; — skip the type check entirely and operate directly on &lt;code&gt;int64_t&lt;/code&gt; values. &lt;code&gt;OP_INC_LOCAL&lt;/code&gt; and &lt;code&gt;OP_DEC_LOCAL&lt;/code&gt; handle the &lt;code&gt;i += 1&lt;/code&gt; and &lt;code&gt;i -= 1&lt;/code&gt; patterns as single-byte instructions that modify the stack slot in place, no push or pop required.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_LOAD_INT8&lt;/code&gt; encodes a signed byte directly in the instruction stream. The integer &lt;code&gt;42&lt;/code&gt; becomes two bytes (&lt;code&gt;OP_LOAD_INT8&lt;/code&gt;, &lt;code&gt;0x2A&lt;/code&gt;) instead of an &lt;code&gt;OP_CONSTANT&lt;/code&gt; instruction plus an eight-byte constant pool entry. Any integer in [-128, 127] gets this treatment.&lt;/p&gt;
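&lt;p&gt;The fast-path idea is easy to picture with a toy stack machine. The sketch below is not the real VM (Lattice values carry type tags; only the handler names come from the post): it just shows what &lt;code&gt;OP_ADD_INT&lt;/code&gt; and &lt;code&gt;OP_INC_LOCAL&lt;/code&gt; buy once the operands are known to be raw &lt;code&gt;int64_t&lt;/code&gt; values.&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack machine: the fast-path precondition (both operands are
   integers) is baked in, so no type dispatch is needed at all. */
typedef struct { int64_t stack[16]; int top; } MiniVM;

static void push(MiniVM *vm, int64_t v) { vm->stack[vm->top++] = v; }
static int64_t pop(MiniVM *vm)          { return vm->stack[--vm->top]; }

/* OP_ADD_INT: raw int64_t add, no tag check on either operand. */
static void op_add_int(MiniVM *vm) {
    int64_t b = pop(vm), a = pop(vm);
    push(vm, a + b);
}

/* OP_INC_LOCAL: bump a stack slot in place; no push/pop traffic. */
static void op_inc_local(MiniVM *vm, int slot) {
    vm->stack[slot] += 1;
}
```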
&lt;h4&gt;Wide Constant Variants&lt;/h4&gt;
&lt;p&gt;The original instruction set used a single byte for constant pool indices, limiting each chunk to 256 constants. This is fine for most functions, but the self-hosted compiler — a 2,000-line Lattice program compiled as a single top-level script — blows past that limit easily.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_CONSTANT_16&lt;/code&gt;, &lt;code&gt;OP_GET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;OP_SET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;OP_DEFINE_GLOBAL_16&lt;/code&gt;, and &lt;code&gt;OP_CLOSURE_16&lt;/code&gt; use two-byte big-endian indices, supporting up to 65,536 constants per chunk. The compiler automatically switches to wide variants when an index exceeds 255.&lt;/p&gt;
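&lt;p&gt;The narrow-or-wide decision is a one-line branch in the emitter. This is a sketch with assumed names (&lt;code&gt;emit_constant&lt;/code&gt;, the opcode values), not the actual compiler code:&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

enum { OP_CONSTANT = 1, OP_CONSTANT_16 = 2 };   /* opcode values invented */

typedef struct { uint8_t code[8]; int len; } Chunk;

static void emit(Chunk *c, uint8_t b) { c->code[c->len++] = b; }

/* One-byte index while it fits; two-byte big-endian index past 255. */
static void emit_constant(Chunk *c, uint16_t index) {
    if (index <= 255) {
        emit(c, OP_CONSTANT);
        emit(c, (uint8_t)index);
    } else {
        emit(c, OP_CONSTANT_16);
        emit(c, (uint8_t)(index >> 8));     /* high byte first */
        emit(c, (uint8_t)(index & 0xFF));
    }
}
```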
&lt;h3&gt;The Compiler&lt;/h3&gt;
&lt;p&gt;The bytecode compiler performs a single-pass walk over the AST. It maintains a chain of &lt;code&gt;Compiler&lt;/code&gt; structs linked via &lt;code&gt;enclosing&lt;/code&gt; pointers — one per function being compiled. Variable references resolve through three tiers: local (scan the current compiler's locals array), upvalue (recursively check enclosing compilers), and global (fall through to &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;).&lt;/p&gt;
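&lt;p&gt;The three-tier resolution order can be sketched as a walk over the &lt;code&gt;enclosing&lt;/code&gt; chain. The struct layout here is an assumption for illustration; the real compiler also records upvalue indices as it walks:&lt;/p&gt;

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCALS 8

typedef struct Compiler {
    struct Compiler *enclosing;        /* compiler of the enclosing function */
    const char *locals[MAX_LOCALS];
    int local_count;
} Compiler;

enum { RESOLVE_LOCAL, RESOLVE_UPVALUE, RESOLVE_GLOBAL };

static int find_local(const Compiler *c, const char *name) {
    for (int i = c->local_count - 1; i >= 0; i--)
        if (strcmp(c->locals[i], name) == 0) return i;
    return -1;
}

/* Tier 1: current locals. Tier 2: any enclosing compiler's locals
   (an upvalue capture). Tier 3: fall through to OP_GET_GLOBAL. */
static int resolve(const Compiler *c, const char *name) {
    if (find_local(c, name) >= 0) return RESOLVE_LOCAL;
    for (const Compiler *e = c->enclosing; e; e = e->enclosing)
        if (find_local(e, name) >= 0) return RESOLVE_UPVALUE;
    return RESOLVE_GLOBAL;
}
```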
&lt;p&gt;Three compilation modes handle different use cases. &lt;code&gt;compile()&lt;/code&gt; is the standard file mode — it compiles all declarations and emits an implicit call to &lt;code&gt;main()&lt;/code&gt; if one is defined. &lt;code&gt;compile_module()&lt;/code&gt; is for imports — identical to &lt;code&gt;compile()&lt;/code&gt; but skips the auto-call. &lt;code&gt;compile_repl()&lt;/code&gt; preserves the last expression on the stack as the iteration's return value (displayed with &lt;code&gt;=&amp;gt;&lt;/code&gt; prefix) and keeps the known-enum table alive across REPL iterations so enum declarations persist.&lt;/p&gt;
&lt;p&gt;The compiler implements several optimizations during code generation. Binary operations on literal operands are folded at compile time — &lt;code&gt;3 + 4&lt;/code&gt; emits a single &lt;code&gt;OP_LOAD_INT8 7&lt;/code&gt; rather than two loads and an &lt;code&gt;OP_ADD&lt;/code&gt;. The pattern &lt;code&gt;x += 1&lt;/code&gt; is detected and emitted as the single-byte &lt;code&gt;OP_INC_LOCAL&lt;/code&gt;, which modifies the stack slot in place. And every statement is wrapped by &lt;code&gt;compile_stmt_reset()&lt;/code&gt;, which appends &lt;code&gt;OP_RESET_EPHEMERAL&lt;/code&gt; to trigger the ephemeral arena cleanup.&lt;/p&gt;
&lt;h3&gt;Computed Goto Dispatch&lt;/h3&gt;
&lt;p&gt;The dispatch loop now uses GCC/Clang's labels-as-values extension for computed goto:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#ifdef VM_USE_COMPUTED_GOTO&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dispatch_table&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;OP_CONSTANT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lbl_OP_CONSTANT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;OP_NIL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lbl_OP_NIL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... all 100 entries&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="cp"&gt;#endif&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(;;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="cp"&gt;#ifdef VM_USE_COMPUTED_GOTO&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dispatch_table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="cp"&gt;#endif&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each opcode handler ends with a &lt;code&gt;goto *dispatch_table[READ_BYTE()]&lt;/code&gt; rather than breaking back to the top of the loop. This removes the switch statement's bounds check and the unconditional jump back to a shared dispatch point, leaving a single indirect jump at the end of each handler. Because every handler re-dispatches from its own site, the CPU's branch predictor can learn per-opcode target patterns instead of funneling every prediction through one switch.&lt;/p&gt;
&lt;p&gt;On platforms without the extension, it falls back to a standard switch. The VM works correctly either way.&lt;/p&gt;
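&lt;p&gt;The same labels-as-values pattern works in a self-contained toy interpreter, which is a convenient way to see the per-handler re-dispatch in isolation (the opcodes and structure here are invented; GCC or Clang is required):&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

enum { OP_PUSH1, OP_ADD, OP_HALT };

static int64_t run(const uint8_t *code) {
    /* Table of label addresses, indexed by opcode (GCC/Clang extension). */
    static void *dispatch[] = { &&lbl_PUSH1, &&lbl_ADD, &&lbl_HALT };
    int64_t stack[16];
    int top = 0;
    const uint8_t *ip = code;

#define DISPATCH() goto *dispatch[*ip++]   /* each handler re-dispatches */

    DISPATCH();
lbl_PUSH1:
    stack[top++] = 1;
    DISPATCH();
lbl_ADD:
    top--;
    stack[top - 1] += stack[top];
    DISPATCH();
lbl_HALT:
    return stack[top - 1];
#undef DISPATCH
}
```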
&lt;h3&gt;Pre-Compiled Concurrency&lt;/h3&gt;
&lt;p&gt;This is the change I'm most pleased with, because it solves the problem cleanly.&lt;/p&gt;
&lt;p&gt;Lattice has three concurrency primitives: &lt;code&gt;scope&lt;/code&gt; defines a concurrent region, &lt;code&gt;spawn&lt;/code&gt; launches a task within that region, and &lt;code&gt;select&lt;/code&gt; multiplexes over channels. In the tree-walker, these work by passing AST node pointers to spawned threads, which then evaluate the subtrees independently. The bytecode VM's original implementation did the same thing — &lt;code&gt;OP_SCOPE&lt;/code&gt; stored an &lt;code&gt;Expr*&lt;/code&gt; pointer in the constant pool and called the tree-walking evaluator at runtime.&lt;/p&gt;
&lt;p&gt;The solution is to compile each concurrent body into a standalone &lt;code&gt;Chunk&lt;/code&gt; at compile time. The compiler provides two helpers: &lt;code&gt;compile_sub_body()&lt;/code&gt; for statement blocks and &lt;code&gt;compile_sub_expr()&lt;/code&gt; for expressions. Each creates a fresh &lt;code&gt;Compiler&lt;/code&gt;, compiles the code into a new chunk, emits &lt;code&gt;OP_HALT&lt;/code&gt;, and stores the resulting chunk in the parent's constant pool as a &lt;code&gt;VAL_CLOSURE&lt;/code&gt; constant.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_SCOPE&lt;/code&gt; uses variable-length encoding: a spawn count, a sync body chunk index, and one chunk index per spawn body. At runtime, the VM:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Exports locals&lt;/strong&gt; to the global environment using the &lt;code&gt;local_names&lt;/code&gt; debug table, so sub-chunks can access parent variables via &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runs the sync body&lt;/strong&gt; (if present) via a recursive &lt;code&gt;vm_run()&lt;/code&gt; call&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spawns threads&lt;/strong&gt; for each spawn body, each running on a cloned VM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Joins&lt;/strong&gt; all threads and propagates errors&lt;/li&gt;
&lt;/ol&gt;
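&lt;p&gt;Decoding the variable-length &lt;code&gt;OP_SCOPE&lt;/code&gt; operands might look roughly like the following sketch, assuming one-byte fields in the order the post describes (the real encoding may use wider indices):&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t spawn_count;
    uint8_t sync_chunk;        /* constant-pool index of the sync body */
    uint8_t spawn_chunks[8];   /* one chunk index per spawn body */
} ScopeOperands;

/* Read the operands following an OP_SCOPE byte; returns bytes consumed.
   This sketch assumes at most 8 spawns and one-byte fields. */
static int read_scope(const uint8_t *ip, ScopeOperands *out) {
    const uint8_t *start = ip;
    out->spawn_count = *ip++;
    out->sync_chunk = *ip++;
    for (int i = 0; i < out->spawn_count; i++)
        out->spawn_chunks[i] = *ip++;
    return (int)(ip - start);
}
```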
&lt;p&gt;&lt;code&gt;OP_SELECT&lt;/code&gt; similarly encodes per-arm metadata: flags, channel expression chunk index, body chunk index, and binding name index. The VM evaluates channel expressions, polls for readiness, and executes the winning arm.&lt;/p&gt;
&lt;p&gt;The key insight is that sub-chunks run as &lt;code&gt;FUNC_SCRIPT&lt;/code&gt; without lexical access to the parent's locals. Since they can't use upvalues to reach into the parent frame, the VM exports the parent's live locals into the global environment before running any sub-chunk, using a pushed scope that gets popped after all sub-chunks complete. This is slightly more expensive than true lexical capture, but it keeps the sub-chunks completely self-contained — no AST, no parent frame dependency, fully serializable.&lt;/p&gt;
&lt;h3&gt;Bytecode Serialization&lt;/h3&gt;
&lt;p&gt;With AST dependency eliminated, serialization becomes straightforward. The &lt;code&gt;.latc&lt;/code&gt; binary format starts with an 8-byte header:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;[4C 41 54 43]  magic: "LATC"
[01 00]        format version: 1
[00 00]        reserved
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The rest is a recursive chunk encoding: code length + bytecode bytes, line numbers for source mapping, typed constants (with a one-byte type tag for each), and local name debug info. Constants use seven type tags:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: right;"&gt;Tag&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Encoding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;0&lt;/td&gt;
&lt;td&gt;Int&lt;/td&gt;
&lt;td&gt;8-byte signed LE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;1&lt;/td&gt;
&lt;td&gt;Float&lt;/td&gt;
&lt;td&gt;8-byte IEEE 754&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;td&gt;Bool&lt;/td&gt;
&lt;td&gt;1 byte&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;3&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;length-prefixed (u32 + bytes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;4&lt;/td&gt;
&lt;td&gt;Nil&lt;/td&gt;
&lt;td&gt;no payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;td&gt;Unit&lt;/td&gt;
&lt;td&gt;no payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;td&gt;Closure&lt;/td&gt;
&lt;td&gt;param count + variadic flag + recursive sub-chunk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;Closure&lt;/code&gt; tag is what makes this recursive: a function constant contains its parameter metadata followed by a complete serialized sub-chunk. Nested functions serialize naturally to arbitrary depth.&lt;/p&gt;
&lt;p&gt;The CLI integrates this cleanly:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Compile to .latc&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;compile&lt;span class="w"&gt; &lt;/span&gt;input.lat&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;output.latc

&lt;span class="c1"&gt;# Run pre-compiled bytecode (auto-detects .latc suffix)&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;output.latc

&lt;span class="c1"&gt;# Or compile and run in one step (the default)&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;input.lat
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Loading validates magic bytes, checks the format version, and uses a bounds-checking &lt;code&gt;ByteReader&lt;/code&gt; that produces descriptive error messages for truncated or malformed inputs.&lt;/p&gt;
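&lt;p&gt;A bounds-checking reader of this kind is small but load-bearing, since every read must fail cleanly on truncated input. This sketch uses assumed function names (&lt;code&gt;ByteReader&lt;/code&gt; is from the post; the rest is not) and checks the magic and version from the header above:&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
    int error;          /* sticky: set instead of ever reading past len */
} ByteReader;

static int br_read_u8(ByteReader *r, uint8_t *out) {
    if (r->error || r->pos + 1 > r->len) { r->error = 1; return 0; }
    *out = r->data[r->pos++];
    return 1;
}

/* Little-endian u32, used here for length-prefixed string payloads. */
static int br_read_u32le(ByteReader *r, uint32_t *out) {
    if (r->error || r->pos + 4 > r->len) { r->error = 1; return 0; }
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)r->data[r->pos++] << (8 * i);
    *out = v;
    return 1;
}

/* Validate the 8-byte header: "LATC" magic, format version 1. */
static int br_check_header(ByteReader *r) {
    if (r->len < 8 || memcmp(r->data, "LATC", 4) != 0) return 0;
    if (r->data[4] != 1) return 0;   /* unsupported format version */
    r->pos = 8;
    return 1;
}
```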
&lt;h3&gt;The Ephemeral Bump Arena&lt;/h3&gt;
&lt;p&gt;String concatenation is a common source of short-lived allocations. An expression like &lt;code&gt;"hello " + name + "!"&lt;/code&gt; creates intermediate strings that are immediately consumed and discarded. In a language with deep-clone-on-read semantics, these temporaries add up.&lt;/p&gt;
&lt;p&gt;The ephemeral bump arena is a simple optimization: string concatenation in &lt;code&gt;OP_ADD&lt;/code&gt; and &lt;code&gt;OP_CONCAT&lt;/code&gt; allocates into a bump arena (&lt;code&gt;vm-&amp;gt;ephemeral&lt;/code&gt;) instead of the general-purpose heap. These allocations are tagged with &lt;code&gt;REGION_EPHEMERAL&lt;/code&gt;, and &lt;code&gt;OP_RESET_EPHEMERAL&lt;/code&gt; — emitted by the compiler at every statement boundary — resets the arena in O(1), reclaiming all temporary strings at once.&lt;/p&gt;
&lt;p&gt;The tricky part is escape analysis. If a temporary string gets assigned to a global variable, stored in an array, or passed to a compiled closure, it needs to be promoted out of the ephemeral arena before the arena is reset. The VM handles this at specific escape points: &lt;code&gt;OP_DEFINE_GLOBAL&lt;/code&gt;, &lt;code&gt;OP_CALL&lt;/code&gt; (for compiled closures), &lt;code&gt;array.push&lt;/code&gt;, and &lt;code&gt;OP_SET_INDEX_LOCAL&lt;/code&gt;. Each of these calls &lt;code&gt;vm_promote_value()&lt;/code&gt;, which deep-clones the string to the regular heap if its region is ephemeral.&lt;/p&gt;
&lt;p&gt;The arena uses a page-based allocator with 4 KB pages. Resetting doesn't free pages — it just moves the bump pointer back to zero, so subsequent allocations reuse the same memory without any &lt;code&gt;malloc&lt;/code&gt;/&lt;code&gt;free&lt;/code&gt; overhead. The full design and safety proof are covered in a &lt;a href="https://tinycomputers.io/papers/lattice_arena_safety.pdf"&gt;companion paper&lt;/a&gt;.&lt;/p&gt;
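&lt;p&gt;The page-based bump allocator with O(1) reset fits in a few lines. The layout below is an assumption (the real arena tags allocations with &lt;code&gt;REGION_EPHEMERAL&lt;/code&gt; and frees its pages at VM teardown):&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_PAGE 4096

typedef struct Page {
    struct Page *next;
    unsigned char bytes[ARENA_PAGE];
} Page;

typedef struct {
    Page *head;      /* first page, kept across resets */
    Page *current;   /* page currently being bumped */
    size_t used;     /* bump offset into the current page */
} Arena;

/* Bump-allocate n bytes (sketch assumes n <= ARENA_PAGE); grows by
   appending pages and reuses previously grown pages after a reset. */
static void *arena_alloc(Arena *a, size_t n) {
    if (!a->current || a->used + n > ARENA_PAGE) {
        Page *p = (a->current && a->current->next)
                      ? a->current->next
                      : calloc(1, sizeof(Page));
        if (a->current) a->current->next = p; else a->head = p;
        a->current = p;
        a->used = 0;
    }
    void *out = a->current->bytes + a->used;
    a->used += n;
    return out;
}

/* O(1) reset: rewind to the first page; nothing is freed, so the next
   statement's temporaries reuse the same memory with no malloc/free. */
static void arena_reset(Arena *a) {
    a->current = a->head;
    a->used = 0;
}
```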
&lt;h3&gt;Closures and the Storage Hack&lt;/h3&gt;
&lt;p&gt;The upvalue system hasn't changed architecturally since the &lt;a href="https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html"&gt;first VM post&lt;/a&gt; — it's still the Lua-inspired open/closed model where &lt;code&gt;ObjUpvalue&lt;/code&gt; structs start pointing into the stack and get closed (deep-cloned to the heap) when variables go out of scope. But the encoding grew to accommodate the wider instruction set.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_CLOSURE&lt;/code&gt; uses variable-length encoding: a constant pool index for the function's compiled chunk, an upvalue count, and then &lt;code&gt;[is_local, index]&lt;/code&gt; byte pairs for each captured variable. &lt;code&gt;OP_CLOSURE_16&lt;/code&gt; uses a two-byte big-endian function index for chunks with more than 256 constants.&lt;/p&gt;
&lt;p&gt;The storage hack — repurposing &lt;code&gt;closure.body&lt;/code&gt; (NULL), &lt;code&gt;closure.native_fn&lt;/code&gt; (Chunk pointer), &lt;code&gt;closure.captured_env&lt;/code&gt; (ObjUpvalue** cast), and &lt;code&gt;region_id&lt;/code&gt; (upvalue count) — remains unchanged. A sentinel value &lt;code&gt;VM_NATIVE_MARKER&lt;/code&gt; distinguishes C-native functions from compiled closures:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#define VM_NATIVE_MARKER ((struct Expr **)(uintptr_t)0x1)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A closure with &lt;code&gt;body == NULL&lt;/code&gt; and &lt;code&gt;native_fn != NULL&lt;/code&gt; is either a C native (if &lt;code&gt;default_values == VM_NATIVE_MARKER&lt;/code&gt;) or a compiled bytecode function (otherwise). This avoids adding VM-specific fields to the &lt;code&gt;LatValue&lt;/code&gt; union, which matters when values are deep-cloned frequently.&lt;/p&gt;
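&lt;p&gt;The resulting discrimination logic is a two-step check. The field names below follow the post; the surrounding types are stand-ins:&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Expr;   /* AST node type, opaque here */

#define VM_NATIVE_MARKER ((struct Expr **)(uintptr_t)0x1)

/* Stand-in for the closure fields the discrimination actually touches. */
typedef struct {
    struct Expr *body;             /* non-NULL only for tree-walker closures */
    void *native_fn;               /* C function pointer OR Chunk pointer */
    struct Expr **default_values;  /* VM_NATIVE_MARKER flags a C native */
} Closure;

enum { KIND_TREE, KIND_NATIVE, KIND_BYTECODE };

static int closure_kind(const Closure *c) {
    if (c->body != NULL) return KIND_TREE;
    if (c->default_values == VM_NATIVE_MARKER) return KIND_NATIVE;
    return KIND_BYTECODE;   /* body NULL, native_fn holds a Chunk pointer */
}
```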
&lt;h3&gt;The Self-Hosted Compiler&lt;/h3&gt;
&lt;p&gt;The file &lt;code&gt;compiler/latc.lat&lt;/code&gt; is a bytecode compiler written entirely in Lattice — approximately 2,060 lines that read &lt;code&gt;.lat&lt;/code&gt; source, produce bytecode, and write &lt;code&gt;.latc&lt;/code&gt; files using the same binary format as the C implementation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Use the self-hosted compiler&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;compiler/latc.lat&lt;span class="w"&gt; &lt;/span&gt;input.lat&lt;span class="w"&gt; &lt;/span&gt;output.latc

&lt;span class="c1"&gt;# Run the result&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;output.latc
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The architecture mirrors the C compiler: lexing via the built-in &lt;code&gt;tokenize()&lt;/code&gt; function, a recursive-descent parser, single-pass code emission, and scope management with upvalue resolution. But Lattice's value semantics required some creative workarounds.&lt;/p&gt;
&lt;p&gt;The biggest constraint is that structs and maps are pass-by-value. In C, the compiler uses a &lt;code&gt;Compiler&lt;/code&gt; struct with mutable fields — local arrays, scope depth, a chunk pointer. In Lattice, passing a struct to a function creates a copy, so mutations in the callee don't propagate back. The self-hosted compiler works around this with parallel global arrays: &lt;code&gt;code&lt;/code&gt;, &lt;code&gt;constants&lt;/code&gt;, &lt;code&gt;c_lines&lt;/code&gt;, &lt;code&gt;local_names&lt;/code&gt;, &lt;code&gt;local_depths&lt;/code&gt;, &lt;code&gt;local_captured&lt;/code&gt;. Since array mutations via &lt;code&gt;.push()&lt;/code&gt; and index assignment are in-place (via &lt;code&gt;resolve_lvalue&lt;/code&gt;), global arrays work where structs don't.&lt;/p&gt;
&lt;p&gt;Nested function compilation uses explicit &lt;code&gt;save_compiler()&lt;/code&gt; / &lt;code&gt;restore_compiler()&lt;/code&gt; functions that copy all global arrays to local temporaries and back. It's verbose but correct. The Buffer type (used for serialization output) is also pass-by-value, so a global &lt;code&gt;ser_buf&lt;/code&gt; accumulates serialized bytes across function calls.&lt;/p&gt;
&lt;p&gt;Other language constraints: no &lt;code&gt;else if&lt;/code&gt; (requires &lt;code&gt;else { if ... }&lt;/code&gt; or &lt;code&gt;match&lt;/code&gt;), mandatory type annotations on function parameters (&lt;code&gt;fn foo(a: any)&lt;/code&gt;), and &lt;code&gt;test&lt;/code&gt; is a keyword so you can't use it as an identifier.&lt;/p&gt;
&lt;p&gt;The self-hosted compiler currently handles expressions, variables, functions with closures, control flow (if/else, while, loop, for, break, continue, match), structs, enums, exceptions, defer, string interpolation, and imports. Not yet implemented: concurrency primitives and advanced phase operations (react, bond, seed). The bootstrapping chain is:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;latc.lat → [C VM interprets] → output.latc → [C VM executes]
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Full self-hosting — where &lt;code&gt;latc.lat&lt;/code&gt; compiles itself — requires adding concurrency support and closing the remaining feature gaps.&lt;/p&gt;
&lt;h3&gt;The VM Execution Engine&lt;/h3&gt;
&lt;p&gt;The VM maintains a 4,096-slot value stack, a 256-frame call stack, an exception handler stack (64 entries), a defer stack (256 entries), a global environment, the open upvalue linked list, the ephemeral arena, and a module cache. A pre-allocated &lt;code&gt;fast_args[16]&lt;/code&gt; buffer avoids heap allocation for most native function calls.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;OP_CALL&lt;/code&gt; instruction discriminates three callee types. Native C functions (marked with &lt;code&gt;VM_NATIVE_MARKER&lt;/code&gt;) get the fast path — arguments are popped into &lt;code&gt;fast_args&lt;/code&gt;, the C function pointer is invoked, and the return value is pushed. No call frame allocated. Compiled closures get the full treatment: the VM promotes ephemeral values in the current frame (so the callee's &lt;code&gt;OP_RESET_EPHEMERAL&lt;/code&gt; doesn't invalidate the caller's temporaries), then pushes a new &lt;code&gt;CallFrame&lt;/code&gt; with the instruction pointer at byte 0 of the callee's chunk. Callable structs look up a constructor-named field and dispatch accordingly.&lt;/p&gt;
&lt;p&gt;Exception handling uses a handler stack. &lt;code&gt;OP_PUSH_EXCEPTION_HANDLER&lt;/code&gt; records the current IP, chunk, call frame index, and stack top. When &lt;code&gt;OP_THROW&lt;/code&gt; executes, the nearest handler is popped, the call frame and value stacks are unwound, the error value is pushed, and execution resumes at the handler's saved IP. Deferred blocks interact correctly — &lt;code&gt;OP_DEFER_RUN&lt;/code&gt; executes all defer entries registered at or above the current frame before the frame is popped by &lt;code&gt;OP_RETURN&lt;/code&gt;.&lt;/p&gt;
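&lt;p&gt;The handler-stack discipline reduces to push-a-resume-point and pop-and-unwind. This is a minimal sketch with assumed field names, omitting the error value push and the defer interaction:&lt;/p&gt;

```c
#include <assert.h>

typedef struct {
    int ip;            /* saved resume point (start of the catch block) */
    int frame_index;   /* call-stack depth to unwind to */
    int stack_top;     /* value-stack height to unwind to */
} Handler;

typedef struct {
    Handler handlers[64];
    int handler_count;
    int ip, frame_index, stack_top;   /* the state being saved/restored */
} ExcVM;

/* OP_PUSH_EXCEPTION_HANDLER: record where a throw should resume. */
static void push_handler(ExcVM *vm, int resume_ip) {
    Handler *h = &vm->handlers[vm->handler_count++];
    h->ip = resume_ip;
    h->frame_index = vm->frame_index;
    h->stack_top = vm->stack_top;
}

/* OP_THROW: pop the nearest handler and unwind both stacks to it.
   Returns 0 for an uncaught error. */
static int do_throw(ExcVM *vm) {
    if (vm->handler_count == 0) return 0;
    Handler h = vm->handlers[--vm->handler_count];
    vm->frame_index = h.frame_index;
    vm->stack_top = h.stack_top;     /* the error value would be pushed here */
    vm->ip = h.ip;
    return 1;
}
```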
&lt;p&gt;Iterators avoid closure allocation entirely. &lt;code&gt;OP_ITER_INIT&lt;/code&gt; converts a range or array into an internal iterator occupying two stack slots (collection + cursor index). &lt;code&gt;OP_ITER_NEXT&lt;/code&gt; advances the cursor, pushes the next element, or jumps to a specified offset when exhausted. The tree-walker used closure-based iterators for &lt;code&gt;for&lt;/code&gt; loops — the bytecode version is simpler and avoids the allocation.&lt;/p&gt;
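&lt;p&gt;The two-slot iterator can be sketched for ranges. Here the exhaustion case returns 0 instead of taking a jump offset, and the loop body is expected to pop the pushed element (all names invented):&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t stack[16]; int top; } IterVM;

/* OP_ITER_INIT for a range: the iterator is just two stack slots,
   the exclusive end and a cursor. No closure is allocated. */
static void iter_init(IterVM *vm, int64_t start, int64_t end) {
    vm->stack[vm->top++] = end;
    vm->stack[vm->top++] = start;   /* cursor sits on top */
}

/* OP_ITER_NEXT: advance the cursor in place and push the element;
   returns 0 when exhausted (the real opcode jumps to an offset). */
static int iter_next(IterVM *vm) {
    int64_t end = vm->stack[vm->top - 2];
    int64_t cur = vm->stack[vm->top - 1];
    if (cur >= end) return 0;
    vm->stack[vm->top - 1] = cur + 1;
    vm->stack[vm->top++] = cur;
    return 1;
}
```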
&lt;h3&gt;Ref&amp;lt;T&amp;gt;: The Escape Hatch from Value Semantics&lt;/h3&gt;
&lt;p&gt;Everything described so far operates in a world where values are deep-cloned on every read. Maps are pass-by-value. Structs are pass-by-value. Pass a collection to a function and the function gets its own copy — mutations don't propagate back. This is correct and eliminates aliasing bugs, but it creates a real problem: how do you share mutable state when you actually need to?&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Ref&amp;lt;T&amp;gt;&lt;/code&gt; is the answer. It's a reference-counted shared mutable wrapper — the one type in Lattice that deliberately breaks value semantics:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;LatRef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// the wrapped inner value&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;refcount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// reference count&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When a &lt;code&gt;Ref&lt;/code&gt; is cloned (which happens on every variable read, like everything else), the VM bumps the refcount and copies the pointer. It does &lt;em&gt;not&lt;/em&gt; deep-clone the inner value. Multiple copies of a &lt;code&gt;Ref&lt;/code&gt; share the same underlying &lt;code&gt;LatRef&lt;/code&gt;, so mutations through one are visible through all others. This is the explicit opt-in to reference semantics that the rest of the language avoids.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;let r = Ref::new([1, 2, 3])
let r2 = r              // shallow copy — same LatRef
r.push(4)
print(r2.get())          // [1, 2, 3, 4] — shared state
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The VM provides transparent proxying: &lt;code&gt;OP_INDEX&lt;/code&gt;, &lt;code&gt;OP_SET_INDEX&lt;/code&gt;, and &lt;code&gt;OP_INVOKE&lt;/code&gt; all check for &lt;code&gt;VAL_REF&lt;/code&gt; and delegate to the inner value. Indexing into a &lt;code&gt;Ref&amp;lt;Array&amp;gt;&lt;/code&gt; indexes the inner array. Calling &lt;code&gt;.push()&lt;/code&gt; on a &lt;code&gt;Ref&amp;lt;Array&amp;gt;&lt;/code&gt; mutates the inner array directly. At the language level, a Ref mostly behaves like the value it wraps — you just get shared mutation instead of isolated copies.&lt;/p&gt;
&lt;p&gt;Ref has its own methods — &lt;code&gt;get()&lt;/code&gt;/&lt;code&gt;deref()&lt;/code&gt; to clone the inner value out, &lt;code&gt;set(v)&lt;/code&gt; to replace it, &lt;code&gt;inner_type()&lt;/code&gt; to inspect the wrapped type — plus proxied methods for whatever the inner value supports (map &lt;code&gt;set&lt;/code&gt;/&lt;code&gt;get&lt;/code&gt;/&lt;code&gt;keys&lt;/code&gt;, array &lt;code&gt;push&lt;/code&gt;/&lt;code&gt;pop&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;p&gt;The phase system applies to Refs too. Freezing a Ref blocks all mutation: &lt;code&gt;set()&lt;/code&gt;, &lt;code&gt;push()&lt;/code&gt;, index assignment all check &lt;code&gt;obj-&amp;gt;phase == VTAG_CRYSTAL&lt;/code&gt; and error with "cannot set on a frozen Ref." This makes frozen Refs safe to share across concurrent boundaries — they're immutable handles to immutable data.&lt;/p&gt;
&lt;p&gt;This introduces a third memory management strategy alongside the dual-heap (mark-and-sweep for fluid values, arenas for crystal values) and the ephemeral bump arena. Refs use reference counting: &lt;code&gt;ref_retain()&lt;/code&gt; on clone, &lt;code&gt;ref_release()&lt;/code&gt; on free, with the inner value freed when the count hits zero. It's a deliberate trade-off — reference counting is simple and deterministic, and since Refs are the uncommon case (most Lattice code uses value semantics), the lack of cycle collection hasn't been an issue in practice.&lt;/p&gt;
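&lt;p&gt;The retain/release pair is essentially the whole strategy. Below is a sketch with a simplified inner value (the real &lt;code&gt;LatRef&lt;/code&gt; wraps a full &lt;code&gt;LatValue&lt;/code&gt; and frees it recursively):&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified LatRef: an int stands in for the wrapped LatValue. */
typedef struct {
    int value;
    size_t refcount;
} LatRef;

static LatRef *ref_new(int v) {
    LatRef *r = malloc(sizeof *r);
    r->value = v;
    r->refcount = 1;
    return r;
}

/* Cloning a Ref copies the pointer and bumps the count: no deep clone. */
static LatRef *ref_retain(LatRef *r) {
    r->refcount++;
    return r;
}

/* Drop one reference; free the cell when the count hits zero.
   Returns 1 if the inner value was freed. */
static int ref_release(LatRef *r) {
    if (--r->refcount == 0) { free(r); return 1; }
    return 0;
}
```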
&lt;h3&gt;Validation&lt;/h3&gt;
&lt;p&gt;The VM is validated by &lt;strong&gt;815 tests&lt;/strong&gt; covering every feature: arithmetic, closures, upvalues, phase transitions, exception handling, defer, iterators, data structures, concurrency, modules, bytecode serialization, and the self-hosted compiler.&lt;/p&gt;
&lt;p&gt;All 815 tests pass under both normal compilation and AddressSanitizer builds (&lt;code&gt;make asan&lt;/code&gt;), which dynamically checks for heap buffer overflows, use-after-free, stack buffer overflows, and memory leaks. For a VM with manual memory management, upvalue lifetime tracking, and an ephemeral arena that reclaims memory at statement boundaries, ASan validation is essential.&lt;/p&gt;
&lt;p&gt;Both execution modes — bytecode VM (default) and tree-walker (&lt;code&gt;--tree-walk&lt;/code&gt;) — share the same test suite and produce identical results:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;make&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;# bytecode VM: 815 passed&lt;/span&gt;
make&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;TREE_WALK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;# tree-walker: 815 passed&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Feature parity across the shared language surface is complete:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th style="text-align: center;"&gt;Tree-walker&lt;/th&gt;
&lt;th style="text-align: center;"&gt;Bytecode VM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phase system (freeze/thaw/clone/forge)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Closures with upvalues&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exception handling (try/catch/throw)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defer blocks&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pattern matching&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structs with methods&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enums with payloads&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arrays, maps, tuples, sets, buffers&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iterators (for-in, ranges)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Module imports&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency (scope/spawn/select)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase reactions/bonds/seeds&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contracts (require/ensure)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variable tracking (history)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bytecode serialization (.latc)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Computed goto dispatch&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ephemeral bump arena&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialized integer ops&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The last four rows are VM-only features that have no tree-walker equivalent.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The VM is feature-complete but not performance-optimized. The obvious next steps are register allocation to reduce stack traffic, type-specialized dispatch paths guided by runtime profiling, tail call optimization for recursive patterns, and constant pool deduplication across compilation units. Further out, the bytecode provides a natural intermediate representation for JIT compilation.&lt;/p&gt;
&lt;p&gt;On the self-hosting front, adding concurrency primitives to &lt;code&gt;latc.lat&lt;/code&gt; would close the gap to full self-compilation — where the Lattice compiler compiles itself, producing a &lt;code&gt;.latc&lt;/code&gt; file that can then compile other programs without the C implementation in the loop.&lt;/p&gt;
&lt;p&gt;The full technical details — including encoding diagrams, the complete opcode listing, compilation walkthroughs, and references to related work in Lua, CPython, YARV, and WebAssembly — are in the &lt;a href="https://tinycomputers.io/papers/lattice_vm.pdf"&gt;research paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The source code is at &lt;a href="https://baud.rs/fIe3gx"&gt;github.com/ajokela/lattice&lt;/a&gt;, and the project site is at &lt;a href="https://baud.rs/bwvnYT"&gt;lattice-lang.org&lt;/a&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;git clone https://github.com/ajokela/lattice.git
cd lattice &amp;amp;&amp;amp; make
./clat
&lt;/pre&gt;&lt;/div&gt;</description><category>bytecode</category><category>c</category><category>closures</category><category>compilers</category><category>concurrency</category><category>interpreters</category><category>language design</category><category>lattice</category><category>phase system</category><category>programming languages</category><category>self-hosting</category><category>serialization</category><category>upvalues</category><category>virtual machine</category><guid>https://tinycomputers.io/posts/a-stack-based-bytecode-vm-for-lattice.html</guid><pubDate>Fri, 20 Feb 2026 18:00:00 GMT</pubDate></item><item><title>Rue: Steve Klabnik's AI-Assisted Experiment in Memory Safety Without the Pain</title><link>https://tinycomputers.io/posts/rue-programming-language-review.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/rue-programming-language-review_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;30 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;The Pitch&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/rue-programming-language/rue-lang-dev-homepage.png" alt="The rue-lang.dev homepage showing the tagline 'Exploring memory safety that's easier to use' with feature cards for Early Stage, Familiar Syntax, and Native Compilation" style="float: right; max-width: 50%; margin: 0 0 1em 1.5em; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;Every few years, a new programming language appears that promises to fix what its predecessors got wrong. Most quietly vanish. A handful become infrastructure. What makes &lt;a href="https://baud.rs/rue-lang"&gt;Rue&lt;/a&gt; interesting is not that it promises to replace Rust—its creator explicitly says it won't—but that the person making it is one of the most qualified people alive to attempt the experiment.&lt;/p&gt;
&lt;p&gt;Steve Klabnik spent thirteen years in the Rust ecosystem. He co-authored &lt;a href="https://baud.rs/rust-book-nostarch"&gt;&lt;em&gt;The Rust Programming Language&lt;/em&gt;&lt;/a&gt;, the official book that has introduced more people to Rust than any other resource. He served on the Rust core team, led its documentation team, and worked at Mozilla during Rust's formative years. He later joined Oxide Computer Company, where he contributed to a scratch-built operating system written in Rust. Before all of that, he was a prolific contributor to Ruby on Rails and authored &lt;a href="https://baud.rs/rust-ruby-learn"&gt;&lt;em&gt;Rust for Rubyists&lt;/em&gt;&lt;/a&gt;, one of the earliest attempts to bridge the Ruby and Rust communities.&lt;/p&gt;
&lt;p&gt;If anyone has earned the right to look at Rust and say "I think we can do this differently," it's Klabnik. And that's exactly what Rue is: a research project asking whether memory safety without garbage collection can be achieved with less cognitive overhead than Rust demands.&lt;/p&gt;
&lt;p&gt;But Rue is also something else entirely—an experiment in whether a single person, assisted by an AI, can build a programming language from scratch without funding, without a team, and without a multi-year timeline. The compiler is written in Rust, designed by Klabnik, and implemented primarily by Claude, Anthropic's AI assistant. The project went from nothing to a working compiler in roughly two weeks, producing approximately 100,000 lines of Rust code across 700+ commits.&lt;/p&gt;
&lt;h3&gt;The Person Behind It&lt;/h3&gt;
&lt;p&gt;Understanding Rue requires understanding Klabnik's trajectory through programming language communities. He entered professional programming through Ruby, becoming one of the most prolific open-source contributors to the Rails ecosystem in the early 2010s. His final commits to Rails landed in late 2013, around the time he discovered Rust 0.5 during a Christmas visit to his parents' house in rural Pennsylvania.&lt;/p&gt;
&lt;p&gt;What followed was a thirteen-year involvement with Rust that few can match. Beyond the official book (now in its second edition, co-authored with Carol Nichols and published by No Starch Press), Klabnik shaped how the Rust community communicates, documents, and teaches. His work on Rust's documentation established patterns that other language communities later adopted. At Mozilla, he helped shepherd Rust through its 1.0 release. At Oxide, he worked on systems software at the lowest levels the language supports.&lt;/p&gt;
&lt;p&gt;Klabnik describes himself as having been an AI skeptic until 2025, when he found that large language models had crossed a threshold of genuine usefulness for programming. The shift was dramatic enough that he now writes most of his code with AI assistance. Rue is the product of that conversion—not just a language experiment but a methodology experiment, testing what happens when a deeply experienced language designer directs an AI to implement his vision.&lt;/p&gt;
&lt;p&gt;The name follows a deliberate pattern from his career: Ruby, Rust, Rue. He notes three associations—"rue the day" (the negative connotation, a nod to the skepticism any new language faces), the rue plant (paralleling Rust's fungal connotation), and brevity.&lt;/p&gt;
&lt;h3&gt;What Rue Is Today&lt;/h3&gt;
&lt;p&gt;Let's be direct about the current state: Rue is a version 0.1.0 research project. The website prominently warns that it is "not ready for real use" and to "expect bugs, missing features, and breaking changes." Klabnik himself has described the language as "still very janky" and cautions against reading too deeply into current implementation details. The &lt;a href="https://baud.rs/rue-readme"&gt;GitHub README&lt;/a&gt; puts it plainly: "Not everything in here is good, or accurate, or anything: I'm just messing around."&lt;/p&gt;
&lt;p&gt;With that caveat firmly in place, here's what exists.&lt;/p&gt;
&lt;p&gt;Rue compiles to native machine code targeting x86-64 and ARM64. There is no virtual machine, no interpreter, and no garbage collector. The compiler is written in Rust (95.8% of the repository) and produces binaries directly. It builds using &lt;a href="https://baud.rs/buck2-build"&gt;Buck2&lt;/a&gt;, Meta's build system, and the project includes a specification, a ten-chapter tutorial, a blog, and a benchmark dashboard that tracks compilation time, memory usage, and binary size across Linux and macOS.&lt;/p&gt;
&lt;p&gt;The language itself is statically typed with type inference. Variables are declared with &lt;code&gt;let&lt;/code&gt; (immutable by default) or &lt;code&gt;let mut&lt;/code&gt; (mutable), following Rust's convention. The type system includes signed and unsigned integers (&lt;code&gt;i8&lt;/code&gt; through &lt;code&gt;i64&lt;/code&gt;, &lt;code&gt;u8&lt;/code&gt; through &lt;code&gt;u64&lt;/code&gt;), booleans, fixed-size arrays, structs, and enums. There are no strings. There is no standard library. There are no traits, no closures, no iterators, no modules, no error handling beyond panics, and no heap allocation. Generics exist behind a &lt;code&gt;--preview comptime&lt;/code&gt; flag—more on that below—but they are not yet part of the stable language.&lt;/p&gt;
&lt;p&gt;Read that list of absences again. It is long. And it is honest.&lt;/p&gt;
&lt;h3&gt;The Type System and Memory Model&lt;/h3&gt;
&lt;p&gt;Rue's central technical bet is that affine types with mutable value semantics can deliver memory safety more intuitively than Rust's borrow checker and lifetime annotations.&lt;/p&gt;
&lt;p&gt;In practice, this means values in Rue are moved by default. When you assign a struct to a new variable or pass it to a function, the original becomes invalid—the compiler enforces single ownership at the type level. This is Rust's move semantics without the escape hatches that references and borrowing provide. If you want a type to be copied instead of moved, you annotate the struct definition with &lt;code&gt;@copy&lt;/code&gt;, which enables value duplication.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/rue-programming-language/rue-borrow-inout.png" alt="Rue's Borrow and Inout tutorial page showing the inout keyword at the call site with comparisons to Go and Python where mutation is invisible" style="float: right; max-width: 50%; margin: 0 0 1em 1.5em; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;Where Rust provides shared references (&lt;code&gt;&amp;amp;T&lt;/code&gt;) and mutable references (&lt;code&gt;&amp;amp;mut T&lt;/code&gt;) governed by the borrow checker's aliasing rules, Rue provides two simpler mechanisms: &lt;code&gt;borrow&lt;/code&gt; and &lt;code&gt;inout&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;borrow&lt;/code&gt; parameter grants read-only access to a value without copying it. An &lt;code&gt;inout&lt;/code&gt; parameter grants temporary mutable access—the function can modify the value, and changes persist after the function returns. Critically, both are marked at the call site, not just in the function signature. When reading Rue code, you can immediately see which arguments a function will modify:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sort(inout values, borrow config);
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The tutorial makes the design motivation explicit: "When you read code, you want to understand what it does without tracing through every function." In Go or Python, a function receiving a slice or object might mutate it invisibly. In Rue, mutation is always syntactically visible where the function is called.&lt;/p&gt;
&lt;p&gt;This is genuinely appealing. One of Rust's persistent pain points is that understanding what a function does to its arguments requires reading the signature carefully—and even then, interior mutability patterns like &lt;code&gt;RefCell&lt;/code&gt; can make the signature misleading. Rue's approach trades expressiveness for legibility.&lt;/p&gt;
&lt;p&gt;The trade-off, however, is severe. Without general references, you cannot build self-referential data structures. Linked lists, trees, and graphs—the bread and butter of systems programming data structures—have no obvious implementation path in current Rue. The Hacker News discussion around the language's announcement surfaced this concern immediately: without references that can be stored in data structures, you cannot implement iterators that borrow from containers. This is not a missing feature that will be added later; it is a fundamental consequence of the design choice.&lt;/p&gt;
&lt;p&gt;Klabnik has acknowledged this directly: "There is going to inherently be some expressiveness loss. There is no silver bullet." The question Rue poses is whether the expressiveness that remains is sufficient for a useful class of programs. The answer today is: we don't know yet.&lt;/p&gt;
&lt;h3&gt;What the Tutorial Reveals&lt;/h3&gt;
&lt;p&gt;The ten-chapter tutorial walks from installation through a capstone quicksort implementation. It covers variables, types, functions, control flow, arrays, structs, enums, and the borrow/inout system. The final chapter implements partitioning and recursive sorting, demonstrating how the language's features compose.&lt;/p&gt;
&lt;p&gt;Working through it reveals a language that feels like a simplified Rust with the hard parts surgically removed. Pattern matching on enums is exhaustive, as in Rust. Structs have named fields and move semantics. Arrays are fixed-size with runtime bounds checking that panics on out-of-bounds access. Integers overflow-check by default.&lt;/p&gt;
&lt;p&gt;The syntax is deliberately familiar to anyone who has written Rust, Go, or C. Function signatures look like Rust without lifetimes:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nt"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nt"&gt;arr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;inout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="cp"&gt;]&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;low&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;u64&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;high&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;u64&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;u64&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Control flow uses &lt;code&gt;if&lt;/code&gt;/&lt;code&gt;else&lt;/code&gt; and &lt;code&gt;while&lt;/code&gt; without parentheses around conditions. There is no &lt;code&gt;for&lt;/code&gt; loop—iteration requires manual index management with &lt;code&gt;while&lt;/code&gt;, which feels like a notable omission even for an early-stage language.&lt;/p&gt;
&lt;p&gt;What's conspicuously absent from the tutorial is any program that does something useful beyond computation. There is no file I/O, no string handling, no network access, no memory allocation. The only output mechanism is &lt;code&gt;@dbg()&lt;/code&gt;, a built-in debug print. The actual &lt;code&gt;hello.rue&lt;/code&gt; in the examples directory is revealing: it is &lt;code&gt;fn main() -&amp;gt; i32 { 42 }&lt;/code&gt;. There are no strings, so there is no "Hello, World." The fizzbuzz example is equally telling—it returns integers (1 for Fizz, 2 for Buzz, 3 for FizzBuzz) because it cannot print the words. The specification explicitly excludes coverage of a standard library "when one exists."&lt;/p&gt;
&lt;h3&gt;The AI Development Story&lt;/h3&gt;
&lt;p&gt;The most widely discussed aspect of Rue is not its type system but how it was built. Klabnik's blog posts describe a process where he provided architectural direction and design decisions while Claude wrote the vast majority of the implementation code. The project's blog posts are co-credited to both Klabnik and Claude, with some posts authored solely by the AI.&lt;/p&gt;
&lt;p&gt;The timeline is striking. The first commit landed on December 15, 2025. By December 22—one week later—the compiler could handle basic types, structs, control flow, and had accumulated 130 commits. By early January 2026, the project had 777 specification tests across two platforms and the compiler had grown to handle enums, pattern matching, arrays, and the borrow/inout system.&lt;/p&gt;
&lt;p&gt;The repository now contains over 700 commits from 4 contributors, with 1,100+ GitHub stars. For a personal hobby project by a well-known developer, this represents meaningful interest—though not the kind of momentum that suggests organic community adoption.&lt;/p&gt;
&lt;p&gt;This development model raises legitimate questions. When Claude writes the compiler and Claude writes the blog posts, what does it mean for a human to have "designed" the language? Klabnik's answer is that he makes all architectural and design decisions—what features to include, how the type system works, what trade-offs to accept—while the AI handles implementation. He compares it to an architect and a construction crew: the architect doesn't lay bricks, but the building reflects the architect's vision.&lt;/p&gt;
&lt;p&gt;The analogy is imperfect. A construction crew doesn't suggest design changes mid-build based on patterns learned from every other building ever constructed. But the broader point—that directing implementation is itself a form of authorship—is reasonable, and Klabnik's thirteen years of language implementation experience give him credibility that a less experienced designer directing an AI would lack.&lt;/p&gt;
&lt;h3&gt;What the Source Code Reveals&lt;/h3&gt;
&lt;p&gt;The documentation and tutorial tell one story. The &lt;a href="https://baud.rs/eWuAO3"&gt;GitHub repository&lt;/a&gt; tells a more nuanced one. Digging into the examples directory and compiler crates reveals features in various stages of development that the official tutorial has not yet caught up with.&lt;/p&gt;
&lt;p&gt;Most notably, generics exist as a preview feature. The &lt;code&gt;examples/generics.rue&lt;/code&gt; file demonstrates Zig-style comptime type parameters:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kd"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;comptime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bigger&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;comptime T: type&lt;/code&gt; syntax tells the compiler to monomorphize—creating specialized versions (&lt;code&gt;max__i32&lt;/code&gt;, &lt;code&gt;max__bool&lt;/code&gt;, etc.) for each concrete type used at call sites. This is a meaningful step beyond what the tutorial covers, and it follows the Zig model rather than Rust's trait-bounded generics. Whether this approach will scale to real-world code—and whether it can support the kind of type-level abstraction that traits and interfaces provide—remains to be seen. You must compile with &lt;code&gt;rue --preview comptime&lt;/code&gt; to access this feature, signaling that it is experimental even by Rue's standards.&lt;/p&gt;
&lt;p&gt;The compiler architecture itself is surprisingly mature for a project of this age. The repository contains 18 separate crates: a lexer, parser, intermediate representation (&lt;code&gt;rue-rir&lt;/code&gt;), an abstract intermediate representation (&lt;code&gt;rue-air&lt;/code&gt;), control flow graph analysis (&lt;code&gt;rue-cfg&lt;/code&gt;), code generation, linking, a fuzzer, a spec test runner, and a VS Code extension. This is not a toy single-file compiler—it is a modular pipeline that reflects real compiler engineering, even if the language it compiles remains minimal.&lt;/p&gt;
&lt;p&gt;Beyond generics, the gaps are still significant. There are no traits or interfaces, so there is no polymorphism beyond what enums and comptime monomorphization provide. There are no closures or first-class functions. There is no error handling mechanism—functions can panic, but there is no &lt;code&gt;Result&lt;/code&gt; type or equivalent. There are no modules or visibility controls. There are no methods on types. There is no string type.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;unchecked&lt;/code&gt; keyword provides an escape hatch similar to Rust's &lt;code&gt;unsafe&lt;/code&gt;, allowing low-level memory operations while keeping such code "visibly separate from normal safe code." The specification includes chapters on unchecked code syntax and unchecked intrinsics, suggesting that Klabnik recognizes the eventual need for the kind of low-level access that Rust provides through its unsafe system.&lt;/p&gt;
&lt;p&gt;The performance page tracks compilation metrics—time, memory, binary size—across x86-64 and ARM64 on both Linux and macOS, but provides no runtime benchmarks comparing Rue's generated code to C, Rust, or Go. This is understandable for a research project, but it means we cannot evaluate whether Rue's native compilation delivers competitive performance.&lt;/p&gt;
&lt;h3&gt;The Fundamental Question&lt;/h3&gt;
&lt;p&gt;Every new systems language must answer the same question: what programs can I write in this language that I couldn't write (or couldn't write as well) in an existing one?&lt;/p&gt;
&lt;p&gt;For Rust, the answer was clear from early on: memory-safe systems programs without garbage collection pauses. For Go, it was concurrent network services with fast compilation. For Zig, it was a C replacement with better defaults and comptime.&lt;/p&gt;
&lt;p&gt;Rue's answer, as it stands today, is: nothing. Not yet. You cannot write a program in Rue that you couldn't write more easily in any mainstream language, because Rue cannot perform I/O, allocate memory, manipulate strings, or interact with the operating system.&lt;/p&gt;
&lt;p&gt;This is not a criticism so much as a statement of development stage. The interesting question is what Rue's answer &lt;em&gt;could&lt;/em&gt; become if development continues. The borrow/inout model is genuinely simpler than Rust's borrow checker for the cases it handles. Generics are already emerging through the comptime preview. If Rue can stabilize them, grow a heap allocator, a string type, and enough standard library to write real programs—without reintroducing the complexity it was designed to avoid—it could serve programmers who find Rust's learning curve prohibitive but need more safety than C or Go provide.&lt;/p&gt;
&lt;p&gt;That is a big "if." History is littered with languages that simplified an existing language's hard parts and then discovered that the hard parts existed for good reasons. Rust's borrow checker is complex because the problems it solves are complex. Removing it and replacing it with a simpler system necessarily means either accepting less expressiveness or finding a genuinely novel solution that the Rust team somehow missed in over a decade of research.&lt;/p&gt;
&lt;p&gt;Klabnik is explicit that he doesn't claim to have found such a solution. Rue is a research project exploring a design space, not a product claiming to have mapped it.&lt;/p&gt;
&lt;h3&gt;Who Should Pay Attention&lt;/h3&gt;
&lt;p&gt;If you're looking for a language to write software in today, Rue is not it. The project's own documentation says so clearly and repeatedly.&lt;/p&gt;
&lt;p&gt;If you're interested in programming language design, Rue is worth following for two reasons. First, the borrow/inout model is a clean articulation of an alternative to Rust's reference system, and watching it encounter real-world requirements will be educational regardless of whether it succeeds. Second, the AI-assisted development methodology is itself a data point in the ongoing question of what role AI can play in building complex software systems.&lt;/p&gt;
&lt;p&gt;If you're a Rust programmer who has ever thought "there must be a simpler way to express this," Rue represents one concrete exploration of what "simpler" might look like—and what it costs. The language makes the trade-offs visible in a way that abstract arguments about borrow checker complexity do not.&lt;/p&gt;
&lt;p&gt;And if you're Steve Klabnik, messing around with a hobby project after thirteen years of working on someone else's language, Rue looks like exactly the kind of thing a deeply experienced language person should be doing with their evenings and weekends.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Rue is best understood as three things simultaneously: a technical experiment in memory safety ergonomics, a methodological experiment in AI-assisted language development, and a personal project by someone with unusually deep expertise in the problem space. As a technical experiment, it has articulated a clean alternative to Rust's borrow checker that is worth studying even if it proves too restrictive for general use. As a methodological experiment, the speed of development—700+ commits and a working multi-platform compiler in weeks—is genuinely remarkable, though the long-term maintainability of AI-generated compiler code remains unproven. As a personal project, it is honest about its limitations in a way that many language announcements are not.&lt;/p&gt;
&lt;p&gt;The language cannot do anything useful yet. It may never be able to. But the questions it asks—can memory safety be simpler? can one person with an AI build a compiler? what happens when you remove the borrow checker and try something else?—are worth asking. And the person asking them has spent over a decade earning the credibility to make the attempt interesting rather than naive.&lt;/p&gt;
&lt;p&gt;Watch the &lt;a href="https://baud.rs/eWuAO3"&gt;GitHub repository&lt;/a&gt;. Read the &lt;a href="https://baud.rs/8joWb8"&gt;specification&lt;/a&gt;. Try the &lt;a href="https://baud.rs/bUJS2D"&gt;tutorial&lt;/a&gt; if you're curious about what a post-Rust systems language might feel like. Just don't write anything important in it yet.&lt;/p&gt;</description><category>ai-assisted development</category><category>claude</category><category>compilers</category><category>language design</category><category>memory safety</category><category>programming languages</category><category>rue</category><category>rust</category><category>steve klabnik</category><category>systems programming</category><guid>https://tinycomputers.io/posts/rue-programming-language-review.html</guid><pubDate>Fri, 20 Feb 2026 12:00:00 GMT</pubDate></item><item><title>Review of "Crafting Interpreters" by Robert Nystrom</title><link>https://tinycomputers.io/posts/review-of-crafting-interpreters-by-robert-nystrom.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/review-of-crafting-interpreters-by-robert-nystrom_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;25 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/crafting-interpreters/cover-0001.png" alt="Crafting Interpreters by Robert Nystrom — cover art depicting a hand-drawn mountain with paths representing compilation stages from source code to machine code" class="book-cover-image" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em;"&gt;&lt;/p&gt;
&lt;p&gt;There is a particular category of programming book that transcends its subject matter, becoming not just a reference but an experience. &lt;a href="https://baud.rs/crafting-interp"&gt;"Crafting Interpreters" by Robert Nystrom&lt;/a&gt; belongs firmly in this category. Originally published in 2021 after six years of development, the book tackles what many programmers consider one of the most intimidating topics in computer science—building a programming language from scratch—and makes it not just accessible but genuinely enjoyable.&lt;/p&gt;
&lt;p&gt;Nystrom is no stranger to technical writing that connects with practitioners. His earlier book &lt;a href="https://baud.rs/game-dev-patterns"&gt;"Game Programming Patterns"&lt;/a&gt; demonstrated a talent for explaining complex software concepts through clear prose and practical examples. With "Crafting Interpreters," he applies that same skill to language implementation, a domain traditionally guarded by &lt;a href="https://baud.rs/compilers-dragon"&gt;dense academic texts&lt;/a&gt; and &lt;a href="https://baud.rs/backus-naur-form"&gt;formal notation&lt;/a&gt; that sends most working programmers running.&lt;/p&gt;
&lt;p&gt;The book's central premise is both ambitious and elegant: build the same programming language twice. First as a tree-walk interpreter in Java (called &lt;code&gt;jlox&lt;/code&gt;), then as a bytecode virtual machine in C (called &lt;code&gt;clox&lt;/code&gt;). This dual implementation strategy isn't just a structural gimmick. It serves a deep pedagogical purpose, allowing readers to first grasp the conceptual architecture of language processing in a high-level language before rebuilding everything from raw memory and pointer arithmetic. The result is a book that manages to teach compiler theory, language design, software architecture, and low-level systems programming simultaneously.&lt;/p&gt;
&lt;p&gt;The entire text is freely available at &lt;a href="https://baud.rs/crafting-interpreters-site"&gt;craftinginterpreters.com&lt;/a&gt;, which speaks to Nystrom's commitment to making this knowledge widely accessible. The physical edition, published with care and featuring hand-drawn illustrations throughout, is worth owning for anyone who works through the material.&lt;/p&gt;
&lt;h3&gt;The Language: Lox&lt;/h3&gt;
&lt;p&gt;Before diving into implementation, Nystrom introduces Lox, the language both interpreters will execute. Lox is a dynamically typed, garbage-collected language with C-family syntax that supports first-class functions, closures, and class-based object orientation with single inheritance. It is deliberately modest in scope—no arrays, no module system, no standard library to speak of—but this restraint is precisely the point.&lt;/p&gt;
&lt;p&gt;Every feature in Lox exists because it teaches something important about language implementation. Dynamic typing means building a runtime type system. Garbage collection means understanding memory management at the deepest level. Closures require wrestling with variable capture and lifetime semantics. Classes and inheritance demand method resolution and the vtable-like dispatch mechanisms that underpin most object-oriented languages. Lox is small enough to implement in a book but complex enough that implementing it forces the reader to confront every major challenge in language design.&lt;/p&gt;
&lt;p&gt;The choice of a custom language rather than a subset of an existing one is significant. It frees Nystrom from having to explain why certain features are omitted or work differently than readers might expect. Lox is exactly what it needs to be, nothing more.&lt;/p&gt;
&lt;h3&gt;Part II: The Tree-Walk Interpreter (&lt;code&gt;jlox&lt;/code&gt;)&lt;/h3&gt;
&lt;p&gt;The first implementation spans ten chapters and builds a complete interpreter in Java. Nystrom begins where every language implementation must—with scanning. The scanner chapter walks through converting raw source text into tokens, handling string literals, numbers, keywords, and the inevitable edge cases that make lexical analysis more interesting than it first appears.&lt;/p&gt;
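&lt;p&gt;To make the flavor of that chapter concrete, here is a minimal single-pass scanner sketch in Python. The book's scanner is in Java; the token shapes and keyword set below are illustrative, not Nystrom's code:&lt;/p&gt;

```python
# Minimal scanner sketch: one pass over the source, emitting (kind, value)
# tuples for numbers, identifiers/keywords, strings, and single characters.
KEYWORDS = {"and", "class", "else", "false", "fun", "for", "if", "nil",
            "or", "print", "return", "super", "this", "true", "var", "while"}

def scan(source):
    tokens, i = [], 0
    while i < len(source):
        c = source[i]
        if c.isspace():
            i += 1
        elif c.isdigit():                      # number literal
            j = i
            while j < len(source) and (source[j].isdigit() or source[j] == "."):
                j += 1
            tokens.append(("NUMBER", float(source[i:j])))
            i = j
        elif c.isalpha() or c == "_":          # identifier or keyword
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            word = source[i:j]
            kind = word.upper() if word in KEYWORDS else "IDENTIFIER"
            tokens.append((kind, word))
            i = j
        elif c == '"':                         # string literal (unterminated
            j = source.index('"', i + 1)       # strings raise, as a sketch)
            tokens.append(("STRING", source[i + 1:j]))
            i = j + 1
        else:                                  # single-character operators
            tokens.append((c, c))
            i += 1
    tokens.append(("EOF", None))
    return tokens
```

&lt;p&gt;Scanning &lt;code&gt;var x = 1.5;&lt;/code&gt; with this sketch yields a keyword token, an identifier, an operator, a number, and a terminating EOF—the same stream shape the book's parser consumes.&lt;/p&gt;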
&lt;p&gt;From there, the book moves into parsing, where Nystrom introduces recursive descent parsing with a clarity that makes the technique feel almost obvious in hindsight. Rather than reaching for parser generators like &lt;a href="https://baud.rs/yacc-parser"&gt;YACC&lt;/a&gt; or &lt;a href="https://baud.rs/antlr-parser"&gt;ANTLR&lt;/a&gt;, every line of the parser is written by hand. This decision is characteristic of the book's philosophy: no black boxes, no magic, no dependencies. The reader understands every piece because the reader built every piece.&lt;/p&gt;
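&lt;p&gt;The technique is easy to sketch outside Java: each grammar rule becomes a function, and precedence falls out of which functions call which. A minimal Python illustration, evaluating as it parses for brevity (jlox builds a syntax tree instead; the names here are illustrative):&lt;/p&gt;

```python
# Hand-written recursive descent: one function per precedence level.
import re

def tokenize(s):
    return re.findall(r"\d+|[-+*/()]", s) + ["EOF"]

def parse(src):
    toks = tokenize(src)
    pos = 0

    def peek():
        return toks[pos]

    def advance():
        nonlocal pos
        pos += 1
        return toks[pos - 1]

    # expression -> term ( ("+" | "-") term )*
    def expression():
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term -> primary ( ("*" | "/") primary )*
    def term():
        value = primary()
        while peek() in ("*", "/"):
            op = advance()
            rhs = primary()
            value = value * rhs if op == "*" else value / rhs
        return value

    # primary -> NUMBER | "(" expression ")"
    def primary():
        tok = advance()
        if tok == "(":
            value = expression()
            advance()  # consume ")"
            return value
        return float(tok)

    return expression()
```

&lt;p&gt;Because &lt;code&gt;term&lt;/code&gt; sits below &lt;code&gt;expression&lt;/code&gt; in the call chain, &lt;code&gt;2+3*4&lt;/code&gt; groups the multiplication first with no precedence table at all—the grammar's shape does the work.&lt;/p&gt;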
&lt;p&gt;The chapters on expression evaluation and statement execution establish the runtime model, but the book truly hits its stride in the chapters on scope and environments. Nystrom's explanation of lexical scoping—using a chain of environment objects that form what he calls a "cactus stack"—is one of the clearest treatments of this topic in any programming text. The hand-drawn illustration of nested environments, with their parent pointers threading back through enclosing scopes, communicates in a single image what paragraphs of formal specification struggle to convey.&lt;/p&gt;
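&lt;p&gt;The environment-chain idea is compact enough to sketch directly: each scope holds a map of its own bindings plus a pointer to its enclosing scope, and lookup walks outward until the name is found. This Python sketch mirrors the shape of jlox's &lt;code&gt;Environment&lt;/code&gt; class but is not Nystrom's code:&lt;/p&gt;

```python
# Environment chain: nested scopes linked by parent pointers.
class Environment:
    def __init__(self, enclosing=None):
        self.values = {}
        self.enclosing = enclosing

    def define(self, name, value):
        self.values[name] = value

    def get(self, name):
        env = self
        while env is not None:          # walk outward through parents
            if name in env.values:
                return env.values[name]
            env = env.enclosing
        raise NameError(f"Undefined variable '{name}'.")

    def assign(self, name, value):
        env = self
        while env is not None:          # assignment also resolves outward
            if name in env.values:
                env.values[name] = value
                return
            env = env.enclosing
        raise NameError(f"Undefined variable '{name}'.")

# A block creates a child environment; inner definitions shadow outer ones.
globals_env = Environment()
globals_env.define("x", "outer")
block = Environment(enclosing=globals_env)
block.define("x", "inner")
```

&lt;p&gt;Several closures can each hold a different child of the same parent environment alive at once, which is exactly what gives the structure its branching, cactus-like shape.&lt;/p&gt;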
&lt;p&gt;Functions and closures represent the first major conceptual challenge, and Nystrom handles them with characteristic patience. The problem of captured variables—where a closure must hold onto variables from an enclosing scope that may have already returned—is presented as a puzzle to be solved rather than a rule to be memorized. The resolver pass that performs static analysis to determine variable binding is introduced as a natural response to a concrete bug, not as an abstract compiler phase.&lt;/p&gt;
&lt;p&gt;The object-oriented chapters add classes, methods, constructors, inheritance, and super expressions. By the time &lt;code&gt;jlox&lt;/code&gt; is complete, the reader has built a language implementation capable of running recursive algorithms, managing object hierarchies, and handling the scoping rules that trip up even experienced programmers in production languages.&lt;/p&gt;
&lt;p&gt;What makes this section exceptional is Nystrom's willingness to show the design process, not just the final design. When a naive approach creates a bug or performance problem, the reader sees it happen and participates in fixing it. This iterative development style mirrors how real software is built and teaches debugging intuition alongside language implementation.&lt;/p&gt;
&lt;h3&gt;Part III: The Bytecode Virtual Machine (&lt;code&gt;clox&lt;/code&gt;)&lt;/h3&gt;
&lt;p&gt;If Part II is the approachable on-ramp, Part III is where the book reveals its true ambition. Across seventeen chapters, Nystrom rebuilds everything in C, this time compiling Lox to bytecode and executing it on a stack-based virtual machine. The motivation is made concrete early: &lt;code&gt;jlox&lt;/code&gt; takes 72 seconds to compute the 40th Fibonacci number recursively, while C can do it in half a second. The bytecode VM will close that gap dramatically.&lt;/p&gt;
&lt;p&gt;The transition from Java to C is itself educational. Readers who have grown comfortable with Java's automatic memory management, dynamic arrays, and hash maps must now implement all of these from scratch. Nystrom builds a dynamic array type, a hash table, and ultimately a mark-sweep garbage collector, all in service of the language implementation. These data structures are not taught in isolation—they emerge because the VM needs them.&lt;/p&gt;
&lt;p&gt;The chunk and instruction design chapters teach the reader to think about data representation at the byte level. Each bytecode instruction is a single byte, followed by operands that encode constants, variable slots, or jump offsets. The disassembler that Nystrom builds alongside the VM is a thoughtful touch, providing a debugging tool that makes the otherwise invisible bytecode tangible.&lt;/p&gt;
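&lt;p&gt;The core dispatch loop of such a VM is small. This Python sketch shows the shape—single-byte opcodes, an operand byte indexing a constant pool, and a value stack—though clox does all of this in C with raw bytes, and the opcode names here are illustrative:&lt;/p&gt;

```python
# Stack-based bytecode dispatch loop in miniature.
OP_CONSTANT, OP_ADD, OP_MULTIPLY, OP_RETURN = range(4)

def run(code, constants):
    stack, ip = [], 0
    while True:
        op = code[ip]; ip += 1
        if op == OP_CONSTANT:           # next byte indexes the constant pool
            stack.append(constants[code[ip]]); ip += 1
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_MULTIPLY:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OP_RETURN:
            return stack.pop()

# Bytecode for (1 + 2) * 3:
chunk = bytes([OP_CONSTANT, 0, OP_CONSTANT, 1, OP_ADD,
               OP_CONSTANT, 2, OP_MULTIPLY, OP_RETURN])
```

&lt;p&gt;A disassembler of the kind the book builds is just the inverse walk over &lt;code&gt;chunk&lt;/code&gt;: print each opcode name and its operand instead of executing it.&lt;/p&gt;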
&lt;p&gt;The single-pass compiler that replaces &lt;code&gt;jlox&lt;/code&gt;'s separate parsing and resolution phases is a masterclass in practical compiler construction. Nystrom uses &lt;a href="https://baud.rs/parser-techniques"&gt;Pratt parsing&lt;/a&gt; for expressions—a technique he explains with such clarity that this chapter alone has become a widely referenced resource for anyone implementing expression parsers. The Pratt parser's elegant handling of precedence and associativity through a simple table of parsing functions is one of those ideas that, once understood, feels like it should have been obvious all along.&lt;/p&gt;
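&lt;p&gt;The essence of Pratt parsing fits in a few lines: a table of binding powers and a loop that recurses whenever the next operator binds more tightly. A minimal Python sketch, evaluating directly rather than emitting clox bytecode (not the book's code):&lt;/p&gt;

```python
# Pratt-style precedence climbing driven by a binding-power table.
import re

BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
APPLY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
         "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def parse_precedence(tokens, pos=0, min_bp=0):
    """Return (value, next position) for a flat token list."""
    lhs = float(tokens[pos]); pos += 1
    while pos < len(tokens):
        op = tokens[pos]
        bp = BINDING_POWER.get(op, 0)
        if bp <= min_bp:                # not strong enough to bind here
            break
        pos += 1
        # Recurse so tighter-binding operators claim the right operand.
        rhs, pos = parse_precedence(tokens, pos, bp)
        lhs = APPLY[op](lhs, rhs)
    return lhs, pos

def evaluate(src):
    value, _ = parse_precedence(re.findall(r"\d+|[-+*/]", src))
    return value
```

&lt;p&gt;Both precedence and left-associativity come from the single &lt;code&gt;bp &amp;lt;= min_bp&lt;/code&gt; comparison: &lt;code&gt;2+3*4&lt;/code&gt; recurses into the multiplication, while &lt;code&gt;10-3-4&lt;/code&gt; folds left to right.&lt;/p&gt;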
&lt;p&gt;The chapters on closures in &lt;code&gt;clox&lt;/code&gt; deserve special mention. Where &lt;code&gt;jlox&lt;/code&gt; could lean on Java's garbage collector and object references to capture variables, &lt;code&gt;clox&lt;/code&gt; must solve the "upvalue" problem explicitly. Nystrom introduces the concept of upvalues—runtime objects that represent captured variables—and walks through the mechanism by which stack-allocated locals are "closed over" and moved to the heap when their enclosing function returns. The complexity of this implementation, managed through careful incremental development, demonstrates why closures are considered one of the hardest features to implement correctly in a bytecode VM.&lt;/p&gt;
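&lt;p&gt;The mechanism can be modeled in miniature: an open upvalue points at a live stack slot, and closing it copies the value into the upvalue itself so the closure keeps working after the frame is gone. A Python model of the idea (clox does this with raw pointers into its value stack; the names here are illustrative):&lt;/p&gt;

```python
# Upvalue sketch: open = aliases a stack slot; closed = owns the value.
class Upvalue:
    def __init__(self, stack, slot):
        self.stack, self.slot = stack, slot   # open: points into the stack
        self.closed = None

    def get(self):
        return self.closed if self.stack is None else self.stack[self.slot]

    def set(self, value):
        if self.stack is None:
            self.closed = value
        else:
            self.stack[self.slot] = value

    def close(self):
        self.closed = self.stack[self.slot]   # hoist the value off the stack
        self.stack = None

stack = [0]           # local variable `counter` lives in slot 0
uv = Upvalue(stack, 0)
uv.set(uv.get() + 1)  # a closure increments the still-live local
uv.close()            # enclosing function returns: value moves to the heap
stack.clear()         # the stack frame is gone, but the upvalue survives
```

&lt;p&gt;While the variable is still on the stack, every closure sharing the upvalue sees the same mutations; after &lt;code&gt;close()&lt;/code&gt;, the value lives in the upvalue object and the stack slot can be reused.&lt;/p&gt;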
&lt;p&gt;The garbage collection chapter is the book's peak of systems programming depth. Nystrom implements a mark-sweep collector, explaining reachability, root sets, and the tricolor abstraction. The treatment is practical rather than theoretical—the reader sees exactly when collection triggers, how objects are traced, and why the collector must handle the subtle case of the VM itself allocating memory during collection (which could invalidate pointers being traced). The self-adjusting heap threshold that balances collection frequency against memory usage is a detail that separates a textbook GC from one that works in practice.&lt;/p&gt;
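&lt;p&gt;The mark-sweep core is simple to model: mark everything reachable from the roots, then sweep the rest. In this Python sketch the worklist plays the role of the gray set in the tricolor abstraction, though a real collector like clox's works on raw allocations rather than Python objects:&lt;/p&gt;

```python
# Mark-sweep sketch: trace reachability from roots, discard the unmarked.
class Obj:
    def __init__(self, name, refs=()):
        self.name, self.refs, self.marked = name, list(refs), False

def collect(heap, roots):
    worklist = list(roots)            # "gray" objects awaiting tracing
    while worklist:
        obj = worklist.pop()
        if not obj.marked:
            obj.marked = True         # blacken this object
            worklist.extend(obj.refs) # gray its children
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False              # reset marks for the next cycle
    return live                       # unmarked objects are swept

a = Obj("a"); b = Obj("b", refs=[a]); orphan = Obj("orphan")
heap = [a, b, orphan]
heap = collect(heap, roots=[b])       # orphan is unreachable and collected
```

&lt;p&gt;The subtle case Nystrom highlights—allocation happening during collection—does not arise in this toy, but it is why clox must be careful about when objects enter the gray set.&lt;/p&gt;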
&lt;h3&gt;Writing Style and Presentation&lt;/h3&gt;
&lt;p&gt;Nystrom's prose is the book's secret weapon. Technical writing about compilers tends toward one of two failure modes: impenetrable formalism or hand-waving oversimplification. Nystrom avoids both. His writing is conversational without being sloppy, precise without being dry. Footnotes contain genuine wit. Asides acknowledge the reader's likely confusion at exactly the moments when confusion is most natural.&lt;/p&gt;
&lt;p&gt;The hand-drawn illustrations scattered throughout the book serve a purpose beyond aesthetics. They signal that this is a personal, crafted work rather than a mass-produced textbook. The diagrams of memory layouts, parse trees, and stack states during execution are clearer than their machine-generated equivalents in most compiler texts, partly because they include exactly the detail needed and nothing more.&lt;/p&gt;
&lt;p&gt;The "Design Note" sections that appear between chapters are mini-essays on language design philosophy—why dynamic typing exists, what makes a feature "elegant," how language designers balance expressiveness against implementation complexity. These sections transform the book from a pure implementation guide into something closer to a meditation on programming language design as a creative discipline.&lt;/p&gt;
&lt;h3&gt;Strengths&lt;/h3&gt;
&lt;p&gt;The book's greatest achievement is making compiler construction feel like a natural extension of everyday programming rather than a specialized academic pursuit. By avoiding formal grammars, &lt;a href="https://baud.rs/automata-theory-book"&gt;automata theory&lt;/a&gt;, and the mathematical notation that dominates traditional compiler texts, Nystrom demonstrates that you don't need a PhD to build a working language implementation.&lt;/p&gt;
&lt;p&gt;The dual-implementation approach pays dividends throughout. Concepts that are murky in one implementation become clear in the other. The tree-walk interpreter makes the abstract concepts tangible; the bytecode VM reveals the performance and engineering considerations that production language implementations face. Together, they provide a stereoscopic view of language implementation that neither could achieve alone.&lt;/p&gt;
&lt;p&gt;The no-dependency philosophy deserves praise. There is no lexer generator, no parser generator, no framework, no library. Every line of code in both implementations is written in the book and understood by the reader. This means that upon completion, the reader owns their understanding completely—there is no mysterious tool doing critical work behind the scenes.&lt;/p&gt;
&lt;p&gt;The incremental development style produces a book that is remarkably difficult to get lost in. Each chapter begins with working code and ends with working code. The reader is never more than a few pages from being able to compile and run something. For a topic as complex as language implementation, this steady cadence of progress is essential for maintaining motivation.&lt;/p&gt;
&lt;h3&gt;Limitations&lt;/h3&gt;
&lt;p&gt;The book is not without its shortcomings. The choice of dynamic typing for Lox means that static type systems—one of the most active and important areas of modern language design—receive no coverage. Type inference, generics, algebraic data types, and pattern matching are absent. A reader completing both implementations still would not know how to add a type checker, which is arguably the most practically relevant compiler phase for working programmers today.&lt;/p&gt;
&lt;p&gt;Optimization is largely unexplored. The &lt;code&gt;clox&lt;/code&gt; VM is faster than &lt;code&gt;jlox&lt;/code&gt; by virtue of being a bytecode interpreter written in C, but Nystrom does not cover constant folding, dead code elimination, register allocation, or any of the optimization passes that distinguish a teaching compiler from a production one. JIT compilation, increasingly the standard for high-performance language runtimes, is mentioned only in passing.&lt;/p&gt;
&lt;p&gt;The error handling and recovery throughout both implementations is minimal. Production parsers need sophisticated error recovery to provide useful diagnostics. Nystrom acknowledges this gap but does not address it, leaving readers who want to build user-facing tools with significant work ahead of them.&lt;/p&gt;
&lt;p&gt;Lox's deliberate simplicity means that several common language features—arrays, iterators, modules, pattern matching, exception handling—are left as exercises. While this keeps the book focused, it means that readers must figure out on their own how to implement the features that most real languages require. The gap between Lox and a practical language is significant.&lt;/p&gt;
&lt;h3&gt;Who Should Read This Book&lt;/h3&gt;
&lt;p&gt;"Crafting Interpreters" is ideal for working programmers who have always been curious about how languages work but have been intimidated by the traditional compiler literature. Comfortable familiarity with Java and C is assumed—this is not a book for learning either language. But the reader need not have any prior knowledge of compilers, formal languages, or automata theory.&lt;/p&gt;
&lt;p&gt;Computer science students will find it an excellent companion to a formal compilers course, providing the practical intuition that textbooks like Aho's "Dragon Book" deliberately omit. Conversely, self-taught programmers who never took a compilers course will find this book fills a significant gap in their education.&lt;/p&gt;
&lt;p&gt;Language enthusiasts who have tinkered with toy interpreters but never built anything with closures, classes, or garbage collection will find exactly the guidance they need to level up. And anyone who simply enjoys beautifully crafted technical writing will find the book rewarding even as a pure reading experience.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;"Crafting Interpreters" is one of the best programming books published in recent years. It takes a subject that most programmers consider forbiddingly complex and renders it not just comprehensible but engaging. Nystrom's combination of clear writing, thoughtful pedagogy, practical focus, and genuine craft produces a book that teaches far more than its nominal subject. Beyond scanning, parsing, and code generation, the reader learns how to approach complex software design, how to build systems incrementally, and how to think about the tools they use every day at a deeper level.&lt;/p&gt;
&lt;p&gt;The book will not make you a compiler engineer. It will not teach you how to build a production language runtime, optimize generated code, or implement a sophisticated type system. What it will do is demystify the machinery that powers every programming language you have ever used, and give you the confidence and foundation to explore further. For most programmers, that is more than enough. It is, in fact, exactly what was needed.&lt;/p&gt;</description><category>bytecode</category><category>c</category><category>compilers</category><category>garbage collection</category><category>interpreters</category><category>java</category><category>language design</category><category>parsing</category><category>programming languages</category><category>robert nystrom</category><category>virtual machines</category><guid>https://tinycomputers.io/posts/review-of-crafting-interpreters-by-robert-nystrom.html</guid><pubDate>Thu, 19 Feb 2026 16:30:00 GMT</pubDate></item><item><title>Upgrading ROCm 7.0 to 7.2 on AMD Strix Halo (gfx1151)</title><link>https://tinycomputers.io/posts/upgrading-rocm-7.0-to-7.2-on-amd-strix-halo-gfx1151.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/upgrading-rocm-7.0-to-7.2-on-amd-strix-halo-gfx1151_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;15 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;If you're running AMD's Strix Halo hardware -- specifically the Ryzen AI MAX+ 395 with its integrated Radeon 8060S GPU -- you already know the software ecosystem is a moving target. The gfx1151 architecture sits in an awkward spot: powerful hardware that isn't officially listed on AMD's ROCm support matrix, yet functional enough to run real workloads with the right driver stack. When ROCm 7.2 landed in early 2026, upgrading from 7.0.2 was a priority. The newer stack brings an updated HSA runtime, a refreshed amdgpu kernel module, and broader compatibility improvements that matter on bleeding-edge silicon.&lt;/p&gt;
&lt;p&gt;This post documents the complete upgrade procedure from ROCm 7.0.2 to 7.2 on a production Ubuntu 24.04 system. It's not a theoretical exercise -- this was performed on a live server running QEMU virtual machines and network services, with the expectation that everything would come back online after a single reboot.&lt;/p&gt;
&lt;p&gt;AMD's official documentation states that in-place ROCm upgrades are not supported. The recommended path is a full uninstall followed by a clean reinstall. That's exactly what we did, and the entire process took about 20 minutes of wall-clock time (excluding the reboot).&lt;/p&gt;
&lt;h3&gt;System Overview&lt;/h3&gt;
&lt;p&gt;The target system is a &lt;a href="https://baud.rs/WZgnl1"&gt;Bosgame mini PC&lt;/a&gt; running the Ryzen AI MAX+ 395 APU. If you've read the &lt;a href="https://tinycomputers.io/posts/amd-ai-max+-395-system-review-a-comprehensive-analysis/"&gt;earlier review&lt;/a&gt; of this hardware, you'll be familiar with the specs. For context on this upgrade, here's what matters:&lt;/p&gt;
&lt;h4&gt;Hardware&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CPU&lt;/strong&gt;: AMD Ryzen AI MAX+ 395, 16 cores / 32 threads, Zen 5&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU&lt;/strong&gt;: Integrated Radeon 8060S, 40 Compute Units, RDNA 3.5 (gfx1151)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt;: 128 GB LPDDR5X, unified architecture with up to 96 GB allocatable to the GPU&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Peak GPU Clock&lt;/strong&gt;: 2,900 MHz&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Software (Pre-Upgrade)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OS&lt;/strong&gt;: Ubuntu 24.04.3 LTS (Noble Numbat)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kernel&lt;/strong&gt;: 6.14.0-37-generic (HWE, pinned)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROCm&lt;/strong&gt;: 7.0.2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;amdgpu-dkms&lt;/strong&gt;: 6.14.14 (from &lt;code&gt;repo.radeon.com/amdgpu/30.10.2&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROCk Module&lt;/strong&gt;: 6.14.14&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Running Services&lt;/h4&gt;
&lt;p&gt;The system was actively serving several roles during the upgrade:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Five QEMU virtual machines (three x86, two aarch64)&lt;/li&gt;
&lt;li&gt;A PXE boot server (dnsmasq) for the local network&lt;/li&gt;
&lt;li&gt;Docker daemon with various containers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of these services are tied to the GPU driver stack, so the plan was to perform the upgrade and reboot without shutting them down first. The VMs and network services would come back automatically after the reboot.&lt;/p&gt;
&lt;h3&gt;Why Upgrade&lt;/h3&gt;
&lt;p&gt;ROCm 7.0.2 worked on this hardware. Models loaded, inference ran, &lt;code&gt;rocminfo&lt;/code&gt; detected the GPU. So why bother upgrading?&lt;/p&gt;
&lt;p&gt;Three reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Driver maturity for gfx1151&lt;/strong&gt;: The amdgpu kernel module jumped from 6.14.14 to 6.16.13 between the two releases. That's not a minor revision -- it represents months of kernel driver development. On hardware that isn't officially supported, newer drivers tend to bring meaningful stability improvements as AMD's internal teams encounter and fix issues on adjacent architectures.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;HSA Runtime improvements&lt;/strong&gt;: ROCm 7.2 ships HSA Runtime Extension version 1.15, up from 1.11 in ROCm 7.0.2. The HSA (Heterogeneous System Architecture) runtime is the lowest layer of the ROCm software stack -- it handles device discovery, memory management, and kernel dispatch. Improvements here affect everything built on top of it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ecosystem alignment&lt;/strong&gt;: PyTorch wheels, Ollama builds, and other ROCm-dependent tools increasingly target 7.2 as the baseline. Running 7.0.2 was becoming an exercise in version pinning and compatibility workarounds.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;The Kernel Hold: Why It Matters&lt;/h3&gt;
&lt;p&gt;Before diving into the procedure, a note on kernel management. This system runs the Ubuntu HWE (Hardware Enablement) kernel, which provides newer kernel versions on LTS releases. At the time of this upgrade, the HWE kernel was 6.14.0-37-generic. The upstream kernel had already moved to 6.17, but we didn't want the ROCm upgrade to pull in a kernel that AMD's DKMS module might not build against.&lt;/p&gt;
&lt;p&gt;The solution is &lt;code&gt;apt-mark hold&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;apt-mark&lt;span class="w"&gt; &lt;/span&gt;hold&lt;span class="w"&gt; &lt;/span&gt;linux-generic-hwe-24.04&lt;span class="w"&gt; &lt;/span&gt;linux-headers-generic-hwe-24.04&lt;span class="w"&gt; &lt;/span&gt;linux-image-generic-hwe-24.04
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This prevents &lt;code&gt;apt&lt;/code&gt; from upgrading the kernel meta-packages, effectively pinning the system to 6.14.0-37-generic. The hold was already in place before the upgrade and remained untouched throughout. After the upgrade, we confirmed it was still active:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;apt-mark&lt;span class="w"&gt; &lt;/span&gt;showhold
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;linux-generic-hwe-24.04
linux-headers-generic-hwe-24.04
linux-image-generic-hwe-24.04
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you're running Strix Halo or any other hardware where kernel compatibility with &lt;code&gt;amdgpu-dkms&lt;/code&gt; is uncertain, kernel holds are essential. A kernel upgrade that breaks the DKMS build means no GPU driver after reboot.&lt;/p&gt;
&lt;h3&gt;Upgrade Procedure&lt;/h3&gt;
&lt;h4&gt;Step 1: Uninstall the Current ROCm Stack&lt;/h4&gt;
&lt;p&gt;AMD provides the &lt;code&gt;amdgpu-uninstall&lt;/code&gt; script for exactly this purpose. It removes all ROCm userspace packages and the amdgpu-dkms kernel module in a single operation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-uninstall&lt;span class="w"&gt; &lt;/span&gt;-y
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This command removed approximately 120 packages, including the full HIP runtime, rocBLAS, MIOpen, MIGraphX, ROCm SMI, the LLVM-based compiler toolchain, and the Mesa graphics drivers that ship with ROCm. The DKMS module was purged, which means the amdgpu kernel module was removed from the 6.14.0-37-generic kernel's module tree.&lt;/p&gt;
&lt;p&gt;After the ROCm stack was removed, we purged the &lt;code&gt;amdgpu-install&lt;/code&gt; meta-package itself:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;purge&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This also cleaned up the APT repository entries that &lt;code&gt;amdgpu-install&lt;/code&gt; had configured in &lt;code&gt;/etc/apt/sources.list.d/&lt;/code&gt;. The old repos -- &lt;code&gt;repo.radeon.com/amdgpu/30.10.2&lt;/code&gt;, &lt;code&gt;repo.radeon.com/rocm/apt/7.0.2&lt;/code&gt;, and &lt;code&gt;repo.radeon.com/graphics/7.0.2&lt;/code&gt; -- were all removed automatically.&lt;/p&gt;
&lt;h4&gt;Step 2: Clean Up Leftover Files&lt;/h4&gt;
&lt;p&gt;The package removal was thorough but not perfect. A few leftover directories remained in &lt;code&gt;/opt/&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;ls&lt;span class="w"&gt; &lt;/span&gt;/opt/&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;rocm
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rocm-7.0.0
rocm-7.0.2
rocm-7.9.0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;rocm-7.0.0&lt;/code&gt; directory was from a previous installation attempt. The &lt;code&gt;rocm-7.9.0&lt;/code&gt; directory was from an earlier experiment with a release candidate build. The &lt;code&gt;rocm-7.0.2&lt;/code&gt; directory contained a single orphaned shared library (&lt;code&gt;libamdhip64.so.6&lt;/code&gt;) that dpkg couldn't remove because the directory wasn't empty. All three were cleaned up manually:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;rm&lt;span class="w"&gt; &lt;/span&gt;-rf&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm-7.0.0&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm-7.0.2&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm-7.9.0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It's worth checking for stale ROCm directories after any uninstall. They consume negligible disk space but can confuse build systems and scripts that scan &lt;code&gt;/opt/rocm*&lt;/code&gt; for active installations.&lt;/p&gt;
&lt;h4&gt;Step 3: Install the ROCm 7.2 Installer&lt;/h4&gt;
&lt;p&gt;AMD distributes ROCm through a meta-package called &lt;code&gt;amdgpu-install&lt;/code&gt;. Each ROCm release has its own version of this package, which configures the appropriate APT repositories. The 7.2 installer was downloaded directly from AMD's repository:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/tmp
wget&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/amdgpu-install/7.2/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;./amdgpu-install_7.2.70200-1_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;update
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After installation and &lt;code&gt;apt update&lt;/code&gt;, three new repositories were active:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;https://repo.radeon.com/amdgpu/30.30/ubuntu noble&lt;/code&gt; -- the kernel driver and Mesa components&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://repo.radeon.com/rocm/apt/7.2 noble&lt;/code&gt; -- the ROCm userspace stack&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://repo.radeon.com/graphics/7.2/ubuntu noble&lt;/code&gt; -- graphics libraries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The version numbering can be confusing. The &lt;code&gt;amdgpu-install&lt;/code&gt; package version is &lt;code&gt;30.30.0.0.30300000-2278356.24.04&lt;/code&gt;, which maps to the amdgpu driver release 30.30. The ROCm version is 7.2.0. These are different version tracks that AMD maintains in parallel.&lt;/p&gt;
&lt;h4&gt;Step 4: Install ROCm 7.2&lt;/h4&gt;
&lt;p&gt;With the repositories configured, the actual installation was a single command:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;--usecase&lt;span class="o"&gt;=&lt;/span&gt;graphics,rocm
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;--usecase=graphics,rocm&lt;/code&gt; flag tells the installer to include both the Mesa graphics drivers and the full ROCm compute stack. This is the right choice for a system that needs both display output and GPU compute capabilities.&lt;/p&gt;
&lt;p&gt;The installation took approximately 10 minutes and included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;amdgpu-dkms 6.16.13&lt;/strong&gt;: The kernel module, compiled via DKMS against the running kernel&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full ROCm 7.2 stack&lt;/strong&gt;: HIP runtime, hipcc compiler, rocBLAS, rocFFT, MIOpen, MIGraphX, RCCL, ROCm SMI, ROCProfiler, and dozens of other libraries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mesa graphics&lt;/strong&gt;: Updated EGL, OpenGL, and Vulkan drivers from the amdgpu Mesa fork&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROCm LLVM toolchain&lt;/strong&gt;: The LLVM-based compiler infrastructure that HIP uses for kernel compilation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The DKMS build is the critical step. During installation, DKMS compiled the amdgpu module against the kernel headers for 6.14.0-37-generic. The output confirmed a successful build:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;depmod...
update-initramfs: Generating /boot/initrd.img-6.14.0-37-generic
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The initramfs was regenerated to include the new module, ensuring it would be loaded at boot.&lt;/p&gt;
&lt;h4&gt;Step 5: Verify DKMS&lt;/h4&gt;
&lt;p&gt;Before rebooting, we confirmed the DKMS status:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;dkms&lt;span class="w"&gt; &lt;/span&gt;status
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;amdgpu/6.16.13-2278356.24.04, 6.14.0-37-generic, x86_64: installed
virtualbox/7.0.16, 6.14.0-36-generic, x86_64: installed
virtualbox/7.0.16, 6.14.0-37-generic, x86_64: installed
virtualbox/7.0.16, 6.8.0-100-generic, x86_64: installed
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The new amdgpu module (6.16.13) was built and installed for 6.14.0-37-generic. Note that it was built only for the currently running kernel, unlike VirtualBox, which had modules for older kernels as well. This is expected -- a freshly registered DKMS module is compiled for the running kernel at install time, while the VirtualBox entries for 6.14.0-36 and 6.8.0-100 are leftovers from builds performed back when those kernels were installed and their headers were present.&lt;/p&gt;
&lt;h4&gt;Step 6: Reboot&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;reboot
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The server came back online in approximately 50 seconds.&lt;/p&gt;
&lt;h3&gt;Post-Reboot Verification&lt;/h3&gt;
&lt;h4&gt;rocminfo&lt;/h4&gt;
&lt;p&gt;The first check after reboot was &lt;code&gt;rocminfo&lt;/code&gt;, which queries the HSA runtime for available agents:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rocminfo
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;ROCk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;6.16&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;
&lt;span class="o"&gt;=====================&lt;/span&gt;
&lt;span class="n"&gt;HSA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;System&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Attributes&lt;/span&gt;
&lt;span class="o"&gt;=====================&lt;/span&gt;
&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="mf"&gt;1.18&lt;/span&gt;
&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Ext&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mf"&gt;1.15&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;==========&lt;/span&gt;
&lt;span class="n"&gt;HSA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Agents&lt;/span&gt;
&lt;span class="o"&gt;==========&lt;/span&gt;
&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AMD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RYZEN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;395&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Radeon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8060&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gfx1151&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Marketing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="n"&gt;AMD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Radeon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Graphics&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Compute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Unit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Max&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Clock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Freq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MHz&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;2900&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="n"&gt;APU&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ISA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgcn&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amdhsa&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gfx1151&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ISA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgcn&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amdhsa&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gfx11&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;generic&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Key observations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ROCk module 6.16.13&lt;/strong&gt;: The new kernel module loaded successfully.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime Ext Version 1.15&lt;/strong&gt;: Upgraded from 1.11 in ROCm 7.0.2.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;gfx1151 detected&lt;/strong&gt;: The GPU was recognized with its correct ISA identifier.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;gfx11-generic ISA&lt;/strong&gt;: ROCm 7.2 also exposes a generic gfx11 ISA, which allows software compiled for the broader RDNA 3 family to run on this device without gfx1151-specific builds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;APU memory&lt;/strong&gt;: The memory properties correctly identify this as an APU with unified memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;ROCm SMI&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rocm-smi
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Device  Node  Temp    Power     SCLK  MCLK     Fan  Perf  VRAM%  GPU%
0       1     33.0C   9.087W    N/A   1000Mhz  0%   auto  0%     0%
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The GPU was visible and reporting telemetry. The 0% VRAM reading is expected on an APU -- &lt;code&gt;rocm-smi&lt;/code&gt; reports dedicated VRAM usage, but on a unified memory architecture, GPU memory allocations come from system RAM and aren't reflected in this counter.&lt;/p&gt;
&lt;h4&gt;ROCm Version&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;cat&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm/.info/version
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;7.2.0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;DKMS&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;dkms&lt;span class="w"&gt; &lt;/span&gt;status
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Confirmed &lt;code&gt;amdgpu/6.16.13&lt;/code&gt; remained installed for 6.14.0-37-generic after reboot.&lt;/p&gt;
&lt;h3&gt;PyTorch Validation&lt;/h3&gt;
&lt;p&gt;With the driver stack verified, the next step was confirming that PyTorch could see and use the GPU. ROCm 7.2 ships with prebuilt PyTorch wheels on AMD's repository.&lt;/p&gt;
&lt;h4&gt;Installing PyTorch for ROCm 7.2&lt;/h4&gt;
&lt;p&gt;We set up a Python virtual environment and installed the ROCm-specific wheels:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;python3&lt;span class="w"&gt; &lt;/span&gt;-m&lt;span class="w"&gt; &lt;/span&gt;venv&lt;span class="w"&gt; &lt;/span&gt;.venv
&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;.venv/bin/activate
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;--upgrade&lt;span class="w"&gt; &lt;/span&gt;pip
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The PyTorch wheel for ROCm 7.2 requires a matching ROCm-specific build of Triton. Both are available from AMD's manylinux repository. The order matters -- Triton must be installed first, since the PyTorch wheel declares it as a dependency with a specific version that doesn't exist on PyPI:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/triton-3.5.1%2Brocm7.2.0.gita272dfa8-cp312-cp312-linux_x86_64.whl
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torch-2.9.1%2Brocm7.2.0.lw.git7e1940d4-cp312-cp312-linux_x86_64.whl
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torchvision-0.24.0%2Brocm7.2.0.gitb919bd0c-cp312-cp312-linux_x86_64.whl
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These are the ROCm 7.2 builds for Python 3.12. AMD also provides wheels for Python 3.10, 3.11, and 3.13.&lt;/p&gt;
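&lt;p&gt;The &lt;code&gt;cp312&lt;/code&gt; in each filename is the CPython ABI tag, and it must match the interpreter inside the virtual environment or pip will refuse the wheel. A small check (a hypothetical helper, not part of AMD's tooling) to see which tag your venv needs:&lt;/p&gt;

```python
import sys

# Build the CPython wheel tag (e.g. "cp312") for the running interpreter.
# This tag must appear in the wheel filename for pip to accept it.
def wheel_tag():
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

print(wheel_tag())
```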
&lt;h4&gt;Smoke Test&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"PyTorch:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"CUDA available:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Device:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_device_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"VRAM:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_device_properties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_memory&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"GB"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;PyTorch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.9&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;rocm7&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;git7e1940d4&lt;/span&gt;
&lt;span class="n"&gt;CUDA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;Device&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AMD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Radeon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Graphics&lt;/span&gt;
&lt;span class="n"&gt;VRAM&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;103.1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GB&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;PyTorch detected the GPU through HIP, ROCm's CUDA-compatible runtime. The 103.1 GB figure represents the total addressable memory on this unified-memory APU, which includes both the 96 GB GPU allocation and additional system memory accessible through the HSA runtime.&lt;/p&gt;
&lt;p&gt;Note the use of &lt;code&gt;torch.cuda&lt;/code&gt; despite this being an AMD GPU. ROCm's HIP runtime presents itself through PyTorch's CUDA interface, so all CUDA API calls in PyTorch (device selection, memory management, kernel launches) work transparently with AMD hardware.&lt;/p&gt;
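&lt;p&gt;A slightly deeper smoke test than the version printout is to actually dispatch a compute kernel and verify the result. A minimal sketch -- it falls back to CPU when no GPU is visible, so the same script runs anywhere:&lt;/p&gt;

```python
import torch

# Pick the GPU if the HIP/CUDA runtime reports one; otherwise fall back
# to CPU so the check still exercises the same code path.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(512, 512, device=device)
b = torch.randn(512, 512, device=device)
c = a @ b  # dispatches a matmul kernel on the chosen device

# Compare against a CPU reference to confirm the kernel computed correctly.
ref = a.cpu() @ b.cpu()
assert torch.allclose(c.cpu(), ref, atol=1e-3)
print("matmul OK on", device)
```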
&lt;h3&gt;Before and After Summary&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;ROCm 7.0.2&lt;/th&gt;
&lt;th&gt;ROCm 7.2.0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ROCm Version&lt;/td&gt;
&lt;td&gt;7.0.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.2.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;amdgpu-dkms&lt;/td&gt;
&lt;td&gt;6.14.14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.16.13&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ROCk Module&lt;/td&gt;
&lt;td&gt;6.14.14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.16.13&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HSA Runtime Ext&lt;/td&gt;
&lt;td&gt;1.11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.15&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;amdgpu Repo&lt;/td&gt;
&lt;td&gt;30.10.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30.30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyTorch&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;2.9.1+rocm7.2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Triton&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;3.5.1+rocm7.2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel&lt;/td&gt;
&lt;td&gt;6.14.0-37-generic&lt;/td&gt;
&lt;td&gt;6.14.0-37-generic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel Holds&lt;/td&gt;
&lt;td&gt;In place&lt;/td&gt;
&lt;td&gt;In place&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Notes on gfx1151 Support&lt;/h3&gt;
&lt;p&gt;It's worth being explicit about the support situation. As of February 2026, gfx1151 (Strix Halo) is &lt;strong&gt;not listed&lt;/strong&gt; on AMD's official ROCm support matrix. The supported RDNA 3 targets are gfx1100 (Navi 31, RX 7900 XTX) and gfx1101 (Navi 32). Strix Halo's gfx1151 is an RDNA 3.5 derivative that shares much of the ISA with gfx1100 but has architectural differences in the memory subsystem and compute unit layout.&lt;/p&gt;
&lt;p&gt;In practice, ROCm 7.2 works on gfx1151. The kernel driver loads, &lt;code&gt;rocminfo&lt;/code&gt; detects the GPU, and PyTorch can allocate tensors and dispatch compute kernels. The &lt;code&gt;gfx11-generic&lt;/code&gt; ISA target in ROCm 7.2 is particularly helpful -- it provides a compatibility path for software that hasn't been explicitly compiled for gfx1151.&lt;/p&gt;
&lt;p&gt;However, "works" and "fully supported" are different things. There are known quirks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;rocm-smi VRAM reporting&lt;/strong&gt;: Always shows 0% on the APU since it only tracks discrete VRAM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No official PyTorch gfx1151 builds&lt;/strong&gt;: The ROCm PyTorch wheels target gfx1100. They run on gfx1151 through ISA compatibility, but performance may not be optimal&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large model loading latency&lt;/strong&gt;: Moving large models to the GPU device can be slow on the unified memory architecture, as the HSA runtime handles page migration differently than discrete GPU DMA transfers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you're considering this hardware for production AI workloads, treat ROCm support as "functional but experimental." It works well enough for development, testing, and moderate inference workloads. For production training or latency-sensitive deployment, stick with hardware on AMD's official support list.&lt;/p&gt;
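&lt;p&gt;One community workaround worth knowing about (not an official AMD recommendation): the HSA runtime honors an &lt;code&gt;HSA_OVERRIDE_GFX_VERSION&lt;/code&gt; environment variable, and Strix Halo users commonly set it to present the GPU as gfx1100 when a library ships only gfx1100 kernels. A sketch:&lt;/p&gt;

```shell
# Present gfx1151 as gfx1100 (11.0.0) to the HSA runtime for this shell.
# Community-reported workaround; remove it once native gfx1151 builds are
# available, since overridden kernels may not be tuned for RDNA 3.5.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
echo "$HSA_OVERRIDE_GFX_VERSION"
```

&lt;p&gt;Set it only for the process that needs it; leaving it exported globally can mask genuine gfx1151 support as it lands in newer library releases.&lt;/p&gt;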
&lt;h3&gt;Rollback Plan&lt;/h3&gt;
&lt;p&gt;If the upgrade fails -- the DKMS module doesn't build, the GPU isn't detected after reboot, or something else goes wrong -- the rollback path is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Uninstall ROCm 7.2:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-uninstall&lt;span class="w"&gt; &lt;/span&gt;-y
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;purge&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install
&lt;/pre&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;Reinstall ROCm 7.0.2:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;wget&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/amdgpu-install/30.10.2/ubuntu/noble/amdgpu-install_30.10.2.0.30100200-2226257.24.04_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;./amdgpu-install_30.10.2.0.30100200-2226257.24.04_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;update
sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;--usecase&lt;span class="o"&gt;=&lt;/span&gt;graphics,rocm
sudo&lt;span class="w"&gt; &lt;/span&gt;reboot
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The entire rollback takes about 15 minutes. Keep the old &lt;code&gt;amdgpu-install&lt;/code&gt; deb URL handy -- it's not linked from AMD's current download pages once a newer version is published.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Upgrading ROCm on hardware that isn't officially supported always carries some risk, but this upgrade from 7.0.2 to 7.2 on gfx1151 was uneventful. The procedure follows AMD's documented uninstall-reinstall approach with no deviations. The kernel hold strategy kept the kernel stable, the DKMS module built cleanly against 6.14.0-37-generic, and all post-reboot checks passed.&lt;/p&gt;
&lt;p&gt;The improvements in ROCm 7.2 -- particularly the HSA runtime bump to 1.15 and the introduction of the &lt;code&gt;gfx11-generic&lt;/code&gt; ISA target -- represent meaningful progress for Strix Halo users. The ecosystem is slowly catching up to the hardware. It's not there yet, but each release closes the gap.&lt;/p&gt;
&lt;p&gt;For anyone running a Ryzen AI MAX+ 395 or similar Strix Halo hardware on Ubuntu 24.04, this upgrade is worth doing. The procedure is well-defined, the rollback path is clear, and the newer driver stack brings tangible benefits. Just remember to hold your kernel first.&lt;/p&gt;
&lt;h3&gt;Recommended Resources&lt;/h3&gt;
&lt;h4&gt;Hardware&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/WZgnl1"&gt;Bosgame M5 AI Mini PC (Ryzen AI MAX+ 395)&lt;/a&gt; - The system used in this post&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/q87EAZ"&gt;GMKtec EVO X2 (Ryzen AI MAX+ 395)&lt;/a&gt; - Another Strix Halo mini PC option on Amazon&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Books&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/NTAPGg"&gt;&lt;em&gt;Deep Learning with PyTorch&lt;/em&gt;&lt;/a&gt; by Stevens, Antiga, Huang, Viehmann - Comprehensive guide to building, training, and tuning neural networks with PyTorch&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/Iu8KR4"&gt;&lt;em&gt;Programming PyTorch for Deep Learning&lt;/em&gt;&lt;/a&gt; by Ian Pointer - Practical guide to creating and deploying deep learning applications&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/zmKSQj"&gt;&lt;em&gt;Understanding Deep Learning&lt;/em&gt;&lt;/a&gt; by Simon Prince - Modern treatment of deep learning fundamentals&lt;/li&gt;
&lt;/ul&gt;</description><category>amd</category><category>amdgpu</category><category>dkms</category><category>driver upgrade</category><category>gfx1151</category><category>gpu computing</category><category>linux</category><category>pytorch</category><category>rocm</category><category>ryzen ai</category><category>strix halo</category><category>ubuntu</category><guid>https://tinycomputers.io/posts/upgrading-rocm-7.0-to-7.2-on-amd-strix-halo-gfx1151.html</guid><pubDate>Wed, 18 Feb 2026 16:00:00 GMT</pubDate></item><item><title>From Tree-Walker to Bytecode VM: Compiling Lattice</title><link>https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/from-tree-walker-to-bytecode-vm-compiling-lattice_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;16 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Lattice is a programming language built around a &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;crystallization-based phase system&lt;/a&gt; — values start as mutable "flux" and can be frozen into immutable "fix," with the runtime enforcing the transition and providing &lt;a href="https://tinycomputers.io/posts/mutability-as-a-first-class-concept-the-lattice-phase-system.html"&gt;reactions, bonds, contracts, and temporal tracking&lt;/a&gt; around it. It's implemented in C with no external dependencies.&lt;/p&gt;
&lt;p&gt;When I started building &lt;a href="https://baud.rs/q5yFwI"&gt;Lattice&lt;/a&gt;, a tree-walking interpreter was the obvious first move. You parse source into an AST, walk the nodes recursively, and evaluate as you go. It's straightforward, easy to debug, and lets you iterate on language semantics quickly without worrying about a second representation. &lt;a href="https://baud.rs/crafting-interpreters"&gt;&lt;em&gt;Crafting Interpreters&lt;/em&gt;&lt;/a&gt; calls this approach "the simplest way to build an interpreter," and it's right.&lt;/p&gt;
&lt;p&gt;But tree-walkers have well-known limitations. Every expression evaluation descends through function calls — &lt;code&gt;eval_expr&lt;/code&gt; calling &lt;code&gt;eval_binary&lt;/code&gt; calling &lt;code&gt;eval_expr&lt;/code&gt; twice more. The overhead compounds. You're chasing pointers through heap-allocated AST nodes with poor cache locality. And the call stack of the host language (C, in Lattice's case) becomes tangled with the call stack of the guest language, making it harder to implement features like error recovery and coroutines cleanly.&lt;/p&gt;
&lt;p&gt;Lattice v0.3.0 shipped a bytecode compiler and stack-based virtual machine alongside the tree-walker. In v0.3.1, the bytecode VM became the default for file execution, the interactive REPL, and the browser-based playground. The tree-walker is still available via &lt;code&gt;--tree-walk&lt;/code&gt;, but the VM now handles everything. This post walks through the architecture of that VM, some design decisions that turned out to matter, and a mutation bug that only surfaces when you combine deep-clone-on-read semantics with in-place method dispatch.&lt;/p&gt;
&lt;h3&gt;Architecture Overview&lt;/h3&gt;
&lt;p&gt;The bytecode pipeline has three stages: lexing and parsing (shared with the tree-walker), compilation from AST to bytecode chunks, and execution on a stack-based VM. The compiler and VM together add about 8,200 lines of C to the codebase, bringing the total to around 33,000 lines.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;Chunk&lt;/code&gt; is the compilation unit — a dynamic array of bytecode instructions, a constant pool, and debug metadata mapping instructions back to source line numbers. The compiler walks the AST and emits bytes into a chunk. The VM reads bytes from the chunk and executes them against a value stack.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;// bytecode array&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;// constant pool&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;const_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;const_cap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;// source line per instruction&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;local_names&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// slot → variable name (debug)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;local_name_cap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The VM itself is a &lt;code&gt;for(;;)&lt;/code&gt; loop with a &lt;code&gt;switch&lt;/code&gt; on the current opcode byte — the textbook approach. No computed gotos, no threaded dispatch, no JIT. Just a switch. On modern hardware with branch prediction, a well-organized switch over 62 opcodes is fast enough that the overhead is negligible compared to the cost of actual operations (string allocation, hash table lookups, deep cloning).&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(;;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_CONSTANT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_ADD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_CALL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// 59 more cases&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The value stack holds 4,096 slots. The call frame stack holds 256 frames. Each &lt;code&gt;CallFrame&lt;/code&gt; tracks its own instruction pointer, a base pointer into the value stack for its local variables, and an array of captured upvalues for closures. When you call a function, the VM pushes a new frame pointing at the callee's chunk. When the function returns, the frame pops and execution resumes in the caller.&lt;/p&gt;
&lt;h3&gt;The Instruction Set&lt;/h3&gt;
&lt;p&gt;Lattice's instruction set has 62 opcodes. Some are standard — &lt;code&gt;OP_ADD&lt;/code&gt;, &lt;code&gt;OP_JUMP_IF_FALSE&lt;/code&gt;, &lt;code&gt;OP_RETURN&lt;/code&gt;. Others exist because of Lattice-specific semantics.&lt;/p&gt;
&lt;p&gt;The phase system needs dedicated opcodes. &lt;code&gt;OP_FREEZE&lt;/code&gt; pops a value, deep-clones it into a crystal region with &lt;code&gt;VTAG_CRYSTAL&lt;/code&gt; tags, and pushes the frozen result. &lt;code&gt;OP_THAW&lt;/code&gt; does the reverse. &lt;code&gt;OP_MARK_FLUID&lt;/code&gt; sets the phase tag to &lt;code&gt;VTAG_FLUID&lt;/code&gt; — this is what &lt;code&gt;flux&lt;/code&gt; bindings emit after their initializer. &lt;code&gt;OP_FREEZE_VAR&lt;/code&gt; and &lt;code&gt;OP_THAW_VAR&lt;/code&gt; handle the case where &lt;code&gt;freeze(x)&lt;/code&gt; targets a named variable and needs to write back the result, carrying extra operands to identify the variable's location (local slot, upvalue, or global name).&lt;/p&gt;
&lt;p&gt;Phase reactions and bonds each have their own opcodes: &lt;code&gt;OP_REACT&lt;/code&gt;, &lt;code&gt;OP_UNREACT&lt;/code&gt;, &lt;code&gt;OP_BOND&lt;/code&gt;, &lt;code&gt;OP_UNBOND&lt;/code&gt;, &lt;code&gt;OP_SEED&lt;/code&gt;, &lt;code&gt;OP_UNSEED&lt;/code&gt;. These could theoretically be implemented as native function calls, but making them opcodes lets the compiler emit the variable name as a constant operand — the VM needs the name to look up the correct reaction/bond registration in its tracking tables, and encoding it in the bytecode avoids a runtime string lookup.&lt;/p&gt;
&lt;p&gt;Structured concurrency uses an interesting hybrid. &lt;code&gt;OP_SCOPE&lt;/code&gt; and &lt;code&gt;OP_SELECT&lt;/code&gt; each carry a constant-pool index that stores a pointer to the original AST &lt;code&gt;Expr*&lt;/code&gt; node. When the VM hits one of these opcodes, it invokes the tree-walking evaluator on that subtree. This is a deliberate design choice — the concurrency primitives involve spawning threads and managing channels, which requires the evaluator's full environment machinery. Rather than reimplement all of that in the VM, the bytecode compiler punts to the tree-walker for these specific constructs. The rest of the program runs on the VM; only &lt;code&gt;scope&lt;/code&gt; and &lt;code&gt;select&lt;/code&gt; blocks briefly drop into interpretation.&lt;/p&gt;
&lt;h3&gt;Closures and Upvalues&lt;/h3&gt;
&lt;p&gt;Closures are where bytecode VMs get interesting, and Lattice follows the upvalue model that Lua pioneered and Crafting Interpreters popularized.&lt;/p&gt;
&lt;p&gt;When a function is defined inside another function and references variables from the enclosing scope, those variables need to outlive their original stack frame. The solution is upvalues — indirection objects that start pointing into the stack and get "closed over" when the variable goes out of scope.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;ObjUpvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// points to stack slot or &amp;amp;closed&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;closed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;// holds value after scope exit&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;ObjUpvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// linked list for open upvalues&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ObjUpvalue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;While the enclosing function is still executing, &lt;code&gt;location&lt;/code&gt; points directly at the stack slot. When the enclosing function returns, &lt;code&gt;OP_CLOSE_UPVALUE&lt;/code&gt; copies the stack value into the &lt;code&gt;closed&lt;/code&gt; field and repoints &lt;code&gt;location&lt;/code&gt; to &lt;code&gt;&amp;amp;closed&lt;/code&gt;. The closure never notices the handoff — it always dereferences &lt;code&gt;location&lt;/code&gt;. This is why upvalues work: they're a level of indirection that transparently survives stack-frame destruction.&lt;/p&gt;
&lt;p&gt;The compiler resolves variable references in three stages: first it checks local scope (&lt;code&gt;resolve_local&lt;/code&gt;), then upvalues (&lt;code&gt;resolve_upvalue&lt;/code&gt;, which walks the compiler chain recursively), then falls back to globals via &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;. The &lt;code&gt;OP_CLOSURE&lt;/code&gt; instruction is followed by a series of &lt;code&gt;(is_local, index)&lt;/code&gt; byte pairs, one per upvalue, telling the VM whether to capture from the current frame's stack or from the parent frame's upvalue array.&lt;/p&gt;
&lt;p&gt;A concrete example makes this clearer. Consider a counter factory:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn make_counter() {
    flux count = 0
    return |n| { count += n; count }
}

let c = make_counter()
print(c(5))   // 5
print(c(3))   // 8
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When &lt;code&gt;make_counter&lt;/code&gt; returns, its stack frame is destroyed — but &lt;code&gt;count&lt;/code&gt; needs to survive, because the returned closure references it. During compilation, the compiler sees that the closure's body references &lt;code&gt;count&lt;/code&gt;, which is local to the enclosing &lt;code&gt;make_counter&lt;/code&gt;. It emits an &lt;code&gt;(is_local=true, index=1)&lt;/code&gt; upvalue descriptor. At runtime, &lt;code&gt;OP_CLOSURE&lt;/code&gt; calls &lt;code&gt;capture_upvalue()&lt;/code&gt;, which either reuses an existing &lt;code&gt;ObjUpvalue&lt;/code&gt; pointing at that stack slot or creates a new one. When &lt;code&gt;make_counter&lt;/code&gt; returns, &lt;code&gt;OP_CLOSE_UPVALUE&lt;/code&gt; copies the stack value of &lt;code&gt;count&lt;/code&gt; into the upvalue's &lt;code&gt;closed&lt;/code&gt; field and repoints &lt;code&gt;location&lt;/code&gt;. The closure keeps working, oblivious to the frame being gone.&lt;/p&gt;
&lt;p&gt;One implementation detail worth noting: Lattice stores the upvalue array by repurposing the closure's &lt;code&gt;captured_env&lt;/code&gt; field (normally an &lt;code&gt;Env*&lt;/code&gt; in the tree-walker) and the upvalue count in the &lt;code&gt;region_id&lt;/code&gt; field. This avoids adding new fields to the &lt;code&gt;LatValue&lt;/code&gt; union, which matters when values are deep-cloned frequently — every field adds to the clone cost.&lt;/p&gt;
&lt;h3&gt;Compiling for the REPL&lt;/h3&gt;
&lt;p&gt;A REPL that runs on a bytecode VM needs different compilation from file execution. The difference is small but important.&lt;/p&gt;
&lt;p&gt;In file mode, &lt;code&gt;compile_module()&lt;/code&gt; compiles a complete program and terminates with &lt;code&gt;OP_UNIT; OP_RETURN&lt;/code&gt; — the module returns unit, and any expression results along the way are discarded with &lt;code&gt;OP_POP&lt;/code&gt;. This is the right behavior for scripts: you don't want every intermediate expression to accumulate on the stack.&lt;/p&gt;
&lt;p&gt;In REPL mode, &lt;code&gt;compile_repl()&lt;/code&gt; needs the opposite behavior for the last expression. When you type &lt;code&gt;42&lt;/code&gt; at the REPL prompt, you want to see &lt;code&gt;=&amp;gt; 42&lt;/code&gt;. So if the last item in the compiled chunk is a bare expression statement, &lt;code&gt;compile_repl()&lt;/code&gt; compiles the expression but &lt;em&gt;skips the &lt;code&gt;OP_POP&lt;/code&gt;&lt;/em&gt;, leaving the value on the stack. Then it emits &lt;code&gt;OP_RETURN&lt;/code&gt;, and the VM receives the value as the chunk's return value.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last_is_expr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ITEM_STMT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;STMT_EXPR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_is_expr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;emit_byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OP_RETURN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;// value already on stack&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;emit_byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OP_UNIT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;// no expression — return unit&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;emit_byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OP_RETURN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For function definitions, struct declarations, and enum definitions, the result is unit, and the REPL silently suppresses the &lt;code&gt;=&amp;gt;&lt;/code&gt; output. This matches user expectations — defining a function shouldn't print anything. The effect in practice:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"hello"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;" world"&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"hello world"&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;49&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each line is independently compiled and executed on the persistent VM. Globals defined in one line (&lt;code&gt;flux x = 10&lt;/code&gt;) are visible in subsequent lines because they're stored in the VM's environment, which persists across iterations. The &lt;code&gt;Chunk&lt;/code&gt; for each line is freed after execution — constants that matter (like global variable values) have already been deep-cloned into the environment.&lt;/p&gt;
&lt;p&gt;The other critical difference is enum persistence. &lt;code&gt;compile_module()&lt;/code&gt; frees its known-enum registry after compilation, because the compiler is done. &lt;code&gt;compile_repl()&lt;/code&gt; must not, because enums defined in REPL iteration N need to be visible in iteration N+1. The REPL calls &lt;code&gt;compiler_free_known_enums()&lt;/code&gt; only on exit. The same lifetime concern applies to parsed programs — struct and function declarations store &lt;code&gt;Expr*&lt;/code&gt; pointers that compiled chunks reference at runtime. The REPL accumulates all parsed programs in a dynamic array and frees them only when the session ends.&lt;/p&gt;
&lt;h3&gt;The Global Mutation Bug&lt;/h3&gt;
&lt;p&gt;This is the story I find most instructive, because it reveals a subtle interaction between two independently reasonable design decisions.&lt;/p&gt;
&lt;p&gt;Lattice has &lt;strong&gt;deep-clone-on-read&lt;/strong&gt; semantics. When you access a variable, the environment doesn't hand you a reference to the stored value — it hands you a fresh deep clone. This eliminates aliasing entirely: two variables never share underlying memory, passing a map to a function gives the function its own copy, and there's no way to create spooky action at a distance through shared mutable state.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;env_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lat_map_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value_deep_clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// always a fresh copy&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is expensive but correct. It gives Lattice pure value semantics without needing a borrow checker or persistent data structures.&lt;/p&gt;
&lt;p&gt;The tree-walking evaluator handles in-place mutation (like &lt;code&gt;array.push()&lt;/code&gt;) with a separate &lt;code&gt;resolve_lvalue()&lt;/code&gt; mechanism that obtains a direct mutable pointer into the environment's storage, bypassing the deep clone. Push, pop, index assignment — these all go through &lt;code&gt;resolve_lvalue&lt;/code&gt; and mutate the stored value directly.&lt;/p&gt;
&lt;p&gt;The bytecode VM needed the same distinction. For local variables, this is straightforward: locals live on the value stack, and the VM has a direct pointer to them via &lt;code&gt;frame-&amp;gt;slots[slot]&lt;/code&gt;. I added &lt;code&gt;OP_INVOKE_LOCAL&lt;/code&gt;, which takes a stack slot index as an operand and passes a pointer to &lt;code&gt;vm_invoke_builtin()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_INVOKE_LOCAL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;slots&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// direct pointer&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm_invoke_builtin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_var_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// builtin mutated obj in-place — mutation persists&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... fall through to closure/method dispatch&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When &lt;code&gt;.push()&lt;/code&gt; grows the array by reallocating &lt;code&gt;obj-&amp;gt;as.array.elems&lt;/code&gt; and incrementing &lt;code&gt;obj-&amp;gt;as.array.len&lt;/code&gt;, it's directly modifying the stack slot. The mutation persists because &lt;code&gt;obj&lt;/code&gt; &lt;em&gt;is&lt;/em&gt; the variable.&lt;/p&gt;
&lt;p&gt;For globals, the situation is different. Globals live in the environment (a scope-chain of hash maps), and &lt;code&gt;env_get()&lt;/code&gt; deep-clones. The generic &lt;code&gt;OP_INVOKE&lt;/code&gt; opcode works by evaluating the receiver expression onto the stack — which, for a global variable, means emitting &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;, which calls &lt;code&gt;env_get()&lt;/code&gt;, which deep-clones — and then dispatching the method on the cloned value. After the builtin mutates the clone, &lt;code&gt;OP_INVOKE&lt;/code&gt; pops and &lt;em&gt;frees&lt;/em&gt; it. The mutation vanishes.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux nums = [1, 2, 3]
nums.push(4)
print(nums)  // still [1, 2, 3] — the push mutated a clone
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the kind of bug that's obvious in retrospect but invisible when you're implementing things one piece at a time. &lt;code&gt;env_get()&lt;/code&gt; deep-cloning is correct. &lt;code&gt;OP_INVOKE&lt;/code&gt; popping the receiver after dispatch is correct. Each piece behaves correctly in isolation. The bug emerges from their composition.&lt;/p&gt;
&lt;p&gt;The fix is &lt;code&gt;OP_INVOKE_GLOBAL&lt;/code&gt; — a new opcode that knows the receiver is a global variable and writes back after mutation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_INVOKE_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;env_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;VM_ERROR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"undefined variable '%s'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm_invoke_builtin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* handle error */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// Write back the mutated clone to the environment&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;env_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... fall through for non-builtin methods&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The compiler emits &lt;code&gt;OP_INVOKE_GLOBAL&lt;/code&gt; when it sees a method call on an identifier that isn't a local variable or an upvalue:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;EXPR_METHOD_CALL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXPR_IDENT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;resolve_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;// ... emit OP_INVOKE_LOCAL&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;upvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;resolve_upvalue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;// Not local, not upvalue — must be global&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;// ... emit OP_INVOKE_GLOBAL&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... fall through to generic OP_INVOKE&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This gives us three tiers of method dispatch: &lt;code&gt;OP_INVOKE_LOCAL&lt;/code&gt; for locals (direct pointer, no clone), &lt;code&gt;OP_INVOKE_GLOBAL&lt;/code&gt; for globals (clone + write-back), and &lt;code&gt;OP_INVOKE&lt;/code&gt; for everything else (computed receivers like &lt;code&gt;get_array().push(x)&lt;/code&gt;, where there's nothing to write back to). With the fix:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux nums = [1, 2, 3]
nums.push(4)
nums.push(5)
print(nums)  // [1, 2, 3, 4, 5] — mutations persist
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All mutating builtins — &lt;code&gt;push&lt;/code&gt;, &lt;code&gt;pop&lt;/code&gt;, &lt;code&gt;set&lt;/code&gt;, &lt;code&gt;remove&lt;/code&gt;, &lt;code&gt;insert&lt;/code&gt;, &lt;code&gt;remove_at&lt;/code&gt; — now work correctly on global variables. The same pattern applies to maps, sets, and any other type with in-place methods.&lt;/p&gt;
&lt;p&gt;The broader lesson is that deep-clone-on-read semantics create an impedance mismatch with in-place mutation. In a reference-based language, &lt;code&gt;obj.push(x)&lt;/code&gt; just works — &lt;code&gt;obj&lt;/code&gt; is a reference, and the mutation happens wherever the reference points. In a value-based language, you need to explicitly handle the write-back for every level of variable storage. The tree-walker's &lt;code&gt;resolve_lvalue&lt;/code&gt; is one solution. The VM's tiered invoke opcodes are another. Both exist because of the same underlying tension.&lt;/p&gt;
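&lt;p&gt;The write-back dance is easy to model outside of C. Here is a minimal Python sketch of deep-clone-on-read semantics (the &lt;code&gt;env_get&lt;/code&gt;/&lt;code&gt;env_set&lt;/code&gt; names mirror the C functions above, but the code is illustrative, not Lattice's implementation):&lt;/p&gt;

```python
import copy

class Env:
    """Toy global environment with deep-clone-on-read value semantics."""
    def __init__(self):
        self._slots = {}

    def env_set(self, name, value):
        self._slots[name] = value

    def env_get(self, name):
        # Reads hand out a deep clone, never a reference.
        return copy.deepcopy(self._slots[name])

def invoke_without_writeback(env, name, method, *args):
    clone = env.env_get(name)
    getattr(clone, method)(*args)   # mutates the clone only
    return clone                    # the environment never sees the change

def invoke_global(env, name, method, *args):
    clone = env.env_get(name)
    getattr(clone, method)(*args)
    env.env_set(name, clone)        # OP_INVOKE_GLOBAL-style write-back

env = Env()
env.env_set("nums", [1, 2, 3])
invoke_without_writeback(env, "nums", "append", 4)
print(env.env_get("nums"))          # still [1, 2, 3]
invoke_global(env, "nums", "append", 4)
print(env.env_get("nums"))          # [1, 2, 3, 4]
```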
&lt;h3&gt;The WASM Playground&lt;/h3&gt;
&lt;p&gt;Lattice's browser-based &lt;a href="https://baud.rs/odS816"&gt;playground&lt;/a&gt; compiles the entire VM to WebAssembly via Emscripten. The WASM API exposes four functions: &lt;code&gt;lat_init()&lt;/code&gt;, &lt;code&gt;lat_run_line()&lt;/code&gt;, &lt;code&gt;lat_is_complete()&lt;/code&gt;, and &lt;code&gt;lat_destroy()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The playground runs the same bytecode VM as the native binary. Each line of input goes through the same pipeline: lex, parse, &lt;code&gt;compile_repl()&lt;/code&gt;, &lt;code&gt;vm_run()&lt;/code&gt;. The &lt;code&gt;lat_is_complete()&lt;/code&gt; function checks bracket depth to determine whether the user is mid-expression, enabling multi-line input by waiting for balanced braces before compiling.&lt;/p&gt;
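&lt;p&gt;The bracket-depth idea behind &lt;code&gt;lat_is_complete()&lt;/code&gt; fits in a few lines. This Python sketch mirrors the concept, not the C implementation, and its string handling is deliberately naive:&lt;/p&gt;

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())

def is_complete(source):
    """True when brackets are balanced, i.e. the REPL can compile
    the buffered input instead of waiting for another line."""
    depth = 0
    in_string = False
    for ch in source:
        if ch == '"':               # toggle on quotes; no escape handling
            in_string = not in_string
        elif not in_string:
            if ch in OPENERS:
                depth += 1
            elif ch in CLOSERS:
                depth -= 1
    return depth == 0 and not in_string

print(is_complete("flux nums = [1, 2, 3]"))   # True: compile now
print(is_complete("flux f = { nested:"))      # False: wait for more input
```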
&lt;p&gt;Previously the playground used the tree-walking evaluator, which meant code could behave differently in the browser than on the command line. Switching the WASM build to the bytecode VM eliminates that inconsistency — the playground, the REPL, and file execution all use the same compilation and execution path.&lt;/p&gt;
&lt;h3&gt;What Didn't Change&lt;/h3&gt;
&lt;p&gt;It's worth noting what the bytecode VM &lt;em&gt;doesn't&lt;/em&gt; change about Lattice.&lt;/p&gt;
&lt;p&gt;The value representation is identical. A &lt;code&gt;LatValue&lt;/code&gt; is still a tagged union with a type tag, phase tag, and payload. Phase transitions still deep-clone data across heap regions. The dual-heap architecture — mark-and-sweep for fluid data, arena-based regions for crystal data — is unchanged. Global variables still live in a scope-chain environment.&lt;/p&gt;
&lt;p&gt;The parser and AST are completely shared. The compiler reads the same &lt;code&gt;Program&lt;/code&gt; structure that the tree-walker reads. A single set of test programs validates both execution paths — all 771 tests pass on both.&lt;/p&gt;
&lt;p&gt;The phase system compiles one-to-one. &lt;code&gt;freeze()&lt;/code&gt; becomes &lt;code&gt;OP_FREEZE&lt;/code&gt;. &lt;code&gt;thaw()&lt;/code&gt; becomes &lt;code&gt;OP_THAW&lt;/code&gt;. Bonds, reactions, seeds, pressure constraints — each has a corresponding opcode that does exactly what the tree-walker's evaluator function did, just driven by bytecode dispatch instead of recursive AST traversal.&lt;/p&gt;
&lt;h3&gt;Performance&lt;/h3&gt;
&lt;p&gt;I haven't done rigorous benchmarking, and I'm deliberately not making performance claims. The motivation for the bytecode VM wasn't speed — it was consistency (one execution path everywhere) and architectural cleanliness (the VM is easier to extend than the tree-walker's deeply nested switch statements).&lt;/p&gt;
&lt;p&gt;That said, bytecode VMs are generally faster than tree-walkers for the structural reasons mentioned earlier: better cache locality (sequential byte array vs. pointer-chasing through AST nodes), less call overhead (one switch dispatch vs. recursive function calls), and a compact representation that fits more of the program in cache. Whether this matters for Lattice programs depends on the workload. For a language whose core runtime cost is dominated by deep cloning, the dispatch overhead is rarely the bottleneck.&lt;/p&gt;
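&lt;p&gt;The structural difference is easy to see in miniature: a tree-walker recurses through heap-allocated node objects, while a bytecode VM walks a flat byte array from a single dispatch point. A toy Python sketch of the latter (illustrative only; Lattice's real VM is the C code discussed above):&lt;/p&gt;

```python
# Toy stack machine: opcodes and operands are plain bytes in one array.
OP_CONST, OP_ADD, OP_MUL, OP_RET = range(4)

def run(code, constants):
    stack, ip = [], 0
    while True:
        op = code[ip]
        ip += 1                     # single dispatch point, sequential fetch
        if op == OP_CONST:
            stack.append(constants[code[ip]])
            ip += 1
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OP_RET:
            return stack.pop()

# (2 + 3) * 4, already "compiled" to bytecode:
code = bytes([OP_CONST, 0, OP_CONST, 1, OP_ADD, OP_CONST, 2, OP_MUL, OP_RET])
print(run(code, [2, 3, 4]))  # 20
```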
&lt;h3&gt;Looking Forward&lt;/h3&gt;
&lt;p&gt;The VM is feature-complete but not optimized. There's no constant folding, no dead code elimination, no register allocation (it's a pure stack machine). The &lt;code&gt;OP_SCOPE&lt;/code&gt; and &lt;code&gt;OP_SELECT&lt;/code&gt; concurrency opcodes still delegate to the tree-walker. The dispatch loop is a plain switch rather than computed gotos.&lt;/p&gt;
&lt;p&gt;These are all well-understood optimizations with clear implementation paths. The point of v0.3.1 is that the bytecode VM is now the default, passes all tests, and handles the full language surface including the phase system. Optimization is a separate project.&lt;/p&gt;
&lt;p&gt;The source code is at &lt;a href="https://baud.rs/fIe3gx"&gt;github.com/ajokela/lattice&lt;/a&gt;, and you can try it in the browser at &lt;a href="https://baud.rs/bwvnYT"&gt;lattice-lang.web.app&lt;/a&gt;. The bytecode VM, compiler, REPL, and all 62 opcodes are in four files: &lt;code&gt;compiler.c&lt;/code&gt;, &lt;code&gt;vm.c&lt;/code&gt;, &lt;code&gt;chunk.c&lt;/code&gt;, and &lt;code&gt;opcode.c&lt;/code&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;git clone https://github.com/ajokela/lattice.git
cd lattice &amp;amp;&amp;amp; make
./clat
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Recommended Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/crafting-interpreters"&gt;&lt;em&gt;Crafting Interpreters&lt;/em&gt;&lt;/a&gt; by Robert Nystrom - The definitive guide to building interpreters and bytecode VMs, and a major influence on Lattice's upvalue implementation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/P6ofTE"&gt;&lt;em&gt;Writing A Compiler In Go&lt;/em&gt;&lt;/a&gt; by Thorsten Ball - Practical companion covering bytecode compilation and stack-based VMs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/BSTqlt"&gt;&lt;em&gt;Engineering a Compiler&lt;/em&gt;&lt;/a&gt; by Cooper &amp;amp; Torczon - Comprehensive treatment of compiler internals from front-end to optimization&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/JhMFPU"&gt;&lt;em&gt;Compilers: Principles, Techniques, and Tools&lt;/em&gt;&lt;/a&gt; by Aho, Lam, Sethi, Ullman - The classic &lt;em&gt;Dragon Book&lt;/em&gt; covering parsing, code generation, and optimization theory&lt;/li&gt;
&lt;/ul&gt;</description><category>bytecode</category><category>c</category><category>compilers</category><category>interpreters</category><category>language design</category><category>lattice</category><category>programming languages</category><category>virtual machine</category><guid>https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html</guid><pubDate>Tue, 17 Feb 2026 18:00:00 GMT</pubDate></item><item><title>Image Editing on 10-Year-Old GPUs: NVIDIA P40 vs AMD Strix Halo</title><link>https://tinycomputers.io/posts/image-editing-on-10-year-old-gpus-nvidia-p40-vs-amd-strix-halo.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/image-editing-on-10-year-old-gpus-nvidia-p40-vs-amd-strix-halo_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;20 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;There's a certain satisfaction in making old hardware do new tricks. When NVIDIA released the Tesla P40 in 2016, deep learning was still finding its footing. ImageNet classification was the benchmark everyone cared about, GANs were generating blurry faces, and the idea of a 57-billion-parameter image editing model would have seemed like science fiction.&lt;/p&gt;
&lt;p&gt;Around the middle of 2017, when the P40 was seeing peak adoption in datacenters, I found myself in an advanced pattern recognition course, the final credits for my master's in computer science. (The course name hadn't been updated to reflect more contemporary terminology like "machine learning," let alone "deep learning.") The textbook was Bishop's &lt;a href="https://baud.rs/pme3zz"&gt;&lt;em&gt;Pattern Recognition and Machine Learning&lt;/em&gt;&lt;/a&gt;, a book that managed to make Bayesian inference feel both rigorous and approachable. We spent the last two weeks of the course on deep learning with TensorFlow, but we had no GPU infrastructure; everything ran on CPU. It would have been great to experience the P40 in its prime, when 24 GB of VRAM and 3,840 CUDA cores made it one of the most capable inference GPUs money could buy. Instead, I'm getting acquainted with it a decade later, asking it to do things its designers never imagined.&lt;/p&gt;
&lt;p&gt;Fast forward to 2026, and here I am, running a 57-billion-parameter model on four of these decade-old GPUs — and comparing the results against AMD's latest Strix Halo APU, a chip that didn't exist until 2025.&lt;/p&gt;
&lt;p&gt;The model in question is &lt;a href="https://baud.rs/W8MlgE"&gt;FireRed-Image-Edit-1.0&lt;/a&gt; from FireRedTeam, a 57.7 GB diffusion model built on the QwenImageEditPlusPipeline architecture. It takes an input image and a text prompt, then produces an edited version. The kind of thing that would have required a massive cloud GPU a couple of years ago.&lt;/p&gt;
&lt;p&gt;This post documents the full journey: the precision pitfalls of running modern diffusion models on Pascal-era GPUs, the quantization trade-offs that make or break image quality, and the head-to-head performance comparison that produced some genuinely surprising results. All of the inference scripts and output images are available on &lt;a href="https://baud.rs/V3qpTJ"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;h4&gt;NVIDIA Tesla P40 (2016)&lt;/h4&gt;
&lt;p&gt;The P40 was NVIDIA's inference-focused datacenter GPU from the Pascal generation. The key specs for our purposes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Architecture&lt;/strong&gt;: Pascal (sm_6.1)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CUDA Cores&lt;/strong&gt;: 3,840&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt;: 24 GB GDDR5&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Bandwidth&lt;/strong&gt;: 346 GB/s&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FP32 Performance&lt;/strong&gt;: 12 TFLOPS&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FP16 Performance&lt;/strong&gt;: Limited — no native FP16 tensor cores&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BF16 Support&lt;/strong&gt;: None&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Price today&lt;/strong&gt;: &lt;a href="https://baud.rs/QaDJDo"&gt;~$100-200 per card on the secondary market&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have four of these cards in a server, giving me 96 GB of total VRAM — but spread across four separate memory spaces, which introduces its own challenges.&lt;/p&gt;
&lt;h4&gt;AMD Ryzen AI MAX+ 395 / Strix Halo (2025)&lt;/h4&gt;
&lt;p&gt;AMD's Strix Halo is a different beast entirely. It's an APU — CPU and GPU on the same die, sharing the same memory pool:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GPU Architecture&lt;/strong&gt;: RDNA 3.5 (gfx1151)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute Units&lt;/strong&gt;: 40 CUs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt;: 128 GB unified LPDDR5X (32 GB for CPU, 96 GB for VRAM)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Bandwidth&lt;/strong&gt;: ~256 GB/s (shared)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BF16 Support&lt;/strong&gt;: Yes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FP16 Support&lt;/strong&gt;: Yes (Fast F16 Operation)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROCm&lt;/strong&gt;: 7.9.0&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Price&lt;/strong&gt;: &lt;a href="https://baud.rs/q87EAZ"&gt;~$2,000+ for the complete system&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The unified memory architecture means all 96 GB is accessible to the GPU without any PCIe transfer overhead, and the entire model can live in a single memory space.&lt;/p&gt;
&lt;h3&gt;The Model: FireRed-Image-Edit-1.0&lt;/h3&gt;
&lt;p&gt;FireRed-Image-Edit is a diffusion-based image editing model with three major components:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transformer&lt;/td&gt;
&lt;td&gt;40.9 GB&lt;/td&gt;
&lt;td&gt;QwenImageTransformer2DModel, 60 layers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text Encoder&lt;/td&gt;
&lt;td&gt;16.6 GB&lt;/td&gt;
&lt;td&gt;Qwen2.5-VL 7B vision-language model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VAE&lt;/td&gt;
&lt;td&gt;~0.3 GB&lt;/td&gt;
&lt;td&gt;AutoencoderKL for encoding/decoding images&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Total: &lt;strong&gt;57.7 GB&lt;/strong&gt; of model weights. The scheduler is FlowMatchEulerDiscreteScheduler, and the pipeline uses true classifier-free guidance (CFG), which roughly doubles the memory needed during inference since it runs both conditional and unconditional passes.&lt;/p&gt;
&lt;p&gt;The test task: take this input image and apply the prompt &lt;em&gt;"Add a red hat on the cat"&lt;/em&gt; — the model draws a cat wearing a red hat onto the book cover, rendered in the style of the O'Reilly animal illustrations.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/firered-input.png.webp" alt="Input image — a person holding an O'Reilly Python book" style="width: 480px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: right; padding: 20px;" loading="lazy"&gt;&lt;/p&gt;
&lt;h3&gt;The P40 Challenge: When FP16 Breaks Everything&lt;/h3&gt;
&lt;h4&gt;The Precision Problem&lt;/h4&gt;
&lt;p&gt;The first — and biggest — challenge with the P40s is numerical precision. Modern diffusion models are designed for BF16 (bfloat16), which has the same exponent range as FP32 (8 exponent bits, range ±3.4×10³⁸) but with reduced mantissa precision. The P40, being a Pascal-era GPU, supports neither BF16 nor proper FP16 tensor operations.&lt;/p&gt;
&lt;p&gt;FP16 has only 5 exponent bits, giving it a range of ±65,504. This might seem sufficient, but the diffusion scheduler's internal sigma values and the VAE's convolution operations routinely produce intermediate values that overflow this range. The FlowMatchEulerDiscreteScheduler, in particular, works with sigma schedules that can produce large intermediate values during the noise prediction and scaling steps. When these overflow FP16's limited range, they become NaN or Inf, and these corrupt values propagate through every subsequent operation — matrix multiplications, attention computations, residual connections — until the entire tensor is garbage.&lt;/p&gt;
&lt;p&gt;The result: NaN propagation that silently corrupts the entire pipeline, producing an all-black output image.&lt;/p&gt;
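&lt;p&gt;The failure mode is reproducible in a few lines of numpy: one overflow yields inf, inf arithmetic manufactures NaN, and NaN absorbs every value it touches downstream:&lt;/p&gt;

```python
import numpy as np

# FP16 tops out at 65,504; a sigma-scaled intermediate can exceed that.
x = np.float16(60000.0)
doubled = x * np.float16(2.0)   # overflows silently to inf, no exception
bad = doubled - doubled         # inf - inf yields nan

latents = np.ones(8, dtype=np.float16)
latents[0] = bad
print(np.isinf(doubled))        # True
print(latents.mean())           # nan: one element poisons the whole tensor

# The same arithmetic is unremarkable in FP32:
print(np.float32(60000.0) * np.float32(2.0))  # 120000.0
```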
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/firered-p40-fp16-black.png.webp" alt="The output of FP16 inference on the P40 — a completely black image from NaN corruption" style="width: 480px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: left; padding: 20px;" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;This was the most time-consuming discovery in the entire project. The model would load, the progress bar would advance through all 40 denoising steps without any indication of trouble, and then the output would be perfectly black — &lt;code&gt;mean=0.0, min=0, max=0&lt;/code&gt;. No error messages. No warnings. No NaN detection exceptions. Just silent numerical corruption that only becomes visible when you look at the final image.&lt;/p&gt;
&lt;p&gt;The debugging process was particularly frustrating because the corruption happens gradually. Partial NaN contamination in early steps doesn't crash anything — the attention mechanisms and residual connections continue to produce tensor outputs of the expected shapes. The model appears to be working normally right up until the final image is decoded from all-zero latents.&lt;/p&gt;
&lt;h4&gt;The FP32 Solution (and a Speed Surprise)&lt;/h4&gt;
&lt;p&gt;The fix was to run the entire pipeline in FP32 — scheduler, VAE, and all non-quantized transformer layers. The quantized weights themselves stay compressed (INT8 or NF4), but every arithmetic operation uses full 32-bit precision.&lt;/p&gt;
&lt;p&gt;It wasn't enough to just set the quantization compute dtype to FP32 — that only fixes the dequantized matmul operations inside the quantized layers. The scheduler's sigma arithmetic, the VAE's convolution operations, and the non-quantized components (layer norms, biases, attention scaling) all needed FP32 as well. Similarly, loading the pipeline with &lt;code&gt;torch_dtype=torch.float32&lt;/code&gt; but leaving the transformer's non-quantized layers in FP16 caused a dtype mismatch in the attention mechanism — PyTorch's scaled dot-product attention requires query, key, and value tensors to share the same dtype. Every component in the computational chain needed to be FP32.&lt;/p&gt;
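&lt;p&gt;Expressed as loader configuration, the quantization side of the fix looks roughly like this. This is a sketch using the bitsandbytes-backed &lt;code&gt;BitsAndBytesConfig&lt;/code&gt; options from diffusers; exact arguments depend on library versions, and as the text notes, the compute dtype alone is not sufficient:&lt;/p&gt;

```python
import torch
from diffusers import BitsAndBytesConfig

# 4-bit variant: weights stay NF4, but dequantized matmuls run in FP32.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float32,
)
# This fixes only the quantized layers. The scheduler's sigma math, the
# VAE, layer norms, and attention scaling must be FP32 as well, or the
# pipeline still produces NaN on Pascal hardware.
```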
&lt;p&gt;The one exception is the text encoder, which runs once before the denoising loop begins. It stays in FP16 on its own GPU, and its output embeddings are upcast to FP32 when transferred to the main device. This is safe because the text encoder doesn't participate in the iterative process where precision errors compound.&lt;/p&gt;
&lt;p&gt;Here's where things got interesting: &lt;strong&gt;FP32 was actually faster than FP16 on the P40.&lt;/strong&gt; The first attempts with FP16 ran at approximately 9 minutes per denoising step. After switching to FP32, the same operations completed in about 2.4 minutes per step with NF4, and 1.5 minutes per step with INT8. The P40's FP32 throughput is its native strength — it was designed for FP32 datacenter inference, after all. FP16 on Pascal is handled through slower pathways that add overhead rather than saving it.&lt;/p&gt;
&lt;h4&gt;Multi-GPU Device Orchestration&lt;/h4&gt;
&lt;p&gt;With 57.7 GB of model weights and only 24 GB per GPU, some form of model sharding or quantization is mandatory. After extensive testing, the optimal configuration for the P40s turned out to be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GPU 0&lt;/strong&gt;: INT8-quantized transformer (~22 GB)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU 1&lt;/strong&gt;: Text encoder in FP16 (~16.6 GB)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU 2&lt;/strong&gt;: VAE in FP32 (~6.6 GB including decode workspace)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU 3&lt;/strong&gt;: Unused&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This layout requires significant monkey-patching of the diffusers pipeline. The &lt;code&gt;_execution_device&lt;/code&gt; property must be overridden to ensure latents are created on the correct GPU. The &lt;code&gt;encode_prompt&lt;/code&gt; method needs patching to route inputs to the text encoder's GPU and move the resulting embeddings back. And for the INT8 configuration, the VAE's encode and decode methods need wrappers to handle cross-device tensor transfers.&lt;/p&gt;
&lt;p&gt;The text encoder stays in FP16 for the reason covered earlier: it runs once, before the denoising loop, so its precision never feeds the iterative path where errors compound. Keeping it on its own GPU also leaves GPU 0's full 24 GB for the quantized transformer.&lt;/p&gt;
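&lt;p&gt;The wrapper pattern behind those patches is generic: move inputs to the component's GPU, run it, move outputs back to the main device. A stripped-down sketch with stand-in objects instead of real torch tensors (the actual code patches diffusers pipeline methods):&lt;/p&gt;

```python
class FakeTensor:
    """Stand-in for a torch tensor that only tracks its device."""
    def __init__(self, device):
        self.device = device
    def to(self, device):
        return FakeTensor(device)

def route_through(component_fn, component_device, home_device):
    """Wrap a component so callers never see which GPU it lives on."""
    def wrapper(tensor):
        out = component_fn(tensor.to(component_device))
        return out.to(home_device)   # e.g. embeddings back to GPU 0
    return wrapper

def text_encoder(t):
    assert t.device == "cuda:1", "encoder expects its own GPU"
    return FakeTensor("cuda:1")

encode_prompt = route_through(text_encoder, "cuda:1", "cuda:0")
result = encode_prompt(FakeTensor("cuda:0"))
print(result.device)  # cuda:0
```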
&lt;h4&gt;Quantization Quality: INT8 vs NF4&lt;/h4&gt;
&lt;p&gt;With the FP32 pipeline in place, I tested both INT8 (8-bit) and NF4 (4-bit) quantization for the transformer:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NF4 (4-bit quantization):&lt;/strong&gt;
The NF4 approach uses bitsandbytes' normalized float 4-bit quantization with double quantization enabled. The transformer compresses from 40.9 GB to roughly 10 GB, easily fitting on a single P40 alongside the VAE. However, the output quality was significantly degraded — heavy noise and grain throughout the image, even at the full 40 denoising steps. Each denoising step introduces small numerical errors from the 4-bit weight approximations, and these errors compound across 40 iterations.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/firered-p40-nf4-noisy.png.webp" alt="P40 NF4 output — 4-bit quantization introduces heavy noise that compounds over 40 denoising steps" style="width: 480px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: right; padding: 20px;" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;INT8 (8-bit quantization):&lt;/strong&gt;
INT8 produced dramatically better results. The output was clean and sharp, visually comparable to what you'd expect from full-precision inference on a modern GPU. The 8-bit precision preserves enough information in the weights that the per-step errors don't accumulate into visible artifacts.&lt;/p&gt;
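&lt;p&gt;The compounding effect is easy to simulate: inject weight-quantization error of the appropriate magnitude at each of 40 steps and watch it accumulate. This is a toy numpy model of error growth, not actual NF4 or INT8 arithmetic:&lt;/p&gt;

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization over [-1, 1]."""
    scale = 2 ** (bits - 1) - 1
    return np.round(x * scale) / scale

rng = np.random.default_rng(0)
signal = rng.uniform(-1, 1, size=10_000)

def accumulated_error(bits, steps=40):
    x = signal.copy()
    for _ in range(steps):
        # each "denoising step" re-applies the quantized-weight error
        x = x + (quantize(signal, bits) - signal)
    return np.abs(x - signal).mean()

err4 = accumulated_error(4)   # NF4-like precision
err8 = accumulated_error(8)   # INT8-like precision
print(err4 / err8)            # roughly 18x worse for 4-bit
```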
&lt;div style="clear: both;"&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/firered-p40-int8-clean.png.webp" alt="P40 INT8 output — clean and sharp, with a cat in a red hat added to the book cover" style="width: 480px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: left; padding: 20px;" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;The trade-off is memory: the INT8 transformer occupies ~22 GB, nearly filling an entire P40. This is why the VAE had to move to a third GPU — there wasn't enough headroom on GPU 0 for the VAE's convolution workspace during the decode phase. An early attempt that kept the VAE on GPU 0 ran all 40 denoising steps successfully, only to crash with an out-of-memory error at the very last operation.&lt;/p&gt;
&lt;div style="clear: both;"&gt;&lt;/div&gt;

&lt;h3&gt;The Strix Halo Experience: Simplicity Wins&lt;/h3&gt;
&lt;h4&gt;BF16 Full Precision&lt;/h4&gt;
&lt;p&gt;Running the same model on the Strix Halo was refreshingly simple. With 96 GB of unified VRAM and native BF16 support, the entire pipeline loads in a few lines:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;QwenImageEditPlusPipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"cuda:0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;No quantization. No multi-GPU patching. No device transfer hooks. No FP32 workarounds. The model loads in BF16 and runs natively.&lt;/p&gt;
&lt;div style="clear: both;"&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/firered-strix-bf16-clean.png.webp" alt="Strix Halo BF16 output — visually identical to the P40 INT8 result" style="width: 480px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: right; padding: 20px;" loading="lazy"&gt;&lt;/p&gt;
&lt;p&gt;During inference, the pipeline consumed approximately 75 GB of VRAM (the true CFG doubles the workspace requirements), well within the 96 GB budget.&lt;/p&gt;
&lt;p&gt;The first run did take about 35 minutes of JIT kernel compilation before producing any inference steps — ROCm compiles HIP kernels for the gfx1151 architecture on first use. During this phase, the GPU sits at 100% utilization with no visible progress, which can be alarming if you're not expecting it. The GPU temperature climbed from 31°C idle to 69°C, and power draw went from 9W to 119W as the compiler worked through the hundreds of unique kernel configurations needed by a 60-layer transformer. These compiled kernels are cached, so subsequent runs skip this overhead entirely.&lt;/p&gt;
&lt;div style="clear: both;"&gt;&lt;/div&gt;

&lt;h4&gt;Quantization on Strix Halo: Does It Help?&lt;/h4&gt;
&lt;p&gt;Given the surprising performance parity between the two systems at full precision, I tested whether quantization could speed up the Strix Halo by reducing memory traffic. The theory was that if the workload is memory-bandwidth-limited, smaller model weights should mean faster inference.&lt;/p&gt;
&lt;p&gt;The results were definitive:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Per-Step Time&lt;/th&gt;
&lt;th&gt;40-Step Estimate&lt;/th&gt;
&lt;th&gt;VRAM Used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BF16 (full precision)&lt;/td&gt;
&lt;td&gt;82.6s&lt;/td&gt;
&lt;td&gt;55 min&lt;/td&gt;
&lt;td&gt;~75 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NF4 (4-bit)&lt;/td&gt;
&lt;td&gt;83.5s&lt;/td&gt;
&lt;td&gt;56 min&lt;/td&gt;
&lt;td&gt;~30 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INT8 (8-bit)&lt;/td&gt;
&lt;td&gt;94.9s&lt;/td&gt;
&lt;td&gt;63 min&lt;/td&gt;
&lt;td&gt;~44 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;NF4 quantization produced virtually identical speed to full BF16. The model shrank from 75 GB to 30 GB of VRAM usage, but inference time didn't improve at all. INT8 was actually &lt;em&gt;slower&lt;/em&gt; — the bitsandbytes INT8 matmul path adds dequantization overhead that more than offsets any memory bandwidth savings.&lt;/p&gt;
&lt;p&gt;This tells us something important about the Strix Halo's performance profile for this workload: &lt;strong&gt;it's compute-bound, not memory-bound.&lt;/strong&gt; The RDNA 3.5 GPU's 40 compute units are the bottleneck, not the LPDDR5X memory bandwidth. Reducing the model size doesn't help because the GPU is already busy with arithmetic, not waiting on memory.&lt;/p&gt;
&lt;p&gt;This contrasts with LLM inference workloads (text generation), where the Strix Halo's large memory pool is a genuine advantage — LLM token generation is almost entirely memory-bound, making quantization directly translate to speed improvements. Each token generation pass reads the entire model's weights but performs relatively little computation per weight. Diffusion models are the opposite: each denoising step runs a full forward pass through 60 transformer layers with dense matrix multiplications, attention computations, and residual connections. The arithmetic intensity is much higher, putting the pressure squarely on the GPU's compute units rather than its memory subsystem.&lt;/p&gt;
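&lt;p&gt;A back-of-envelope roofline model makes the distinction concrete. The throughput, bandwidth, and reuse figures below are illustrative assumptions, not measurements of either machine; only the ratio between the two regimes matters.&lt;/p&gt;

```python
# Back-of-envelope roofline model: why quantization speeds up LLM decode
# but not a diffusion step. All hardware and model numbers here are
# illustrative assumptions, not measurements of either system.

def step_time(flops, bytes_moved, peak_flops, peak_bw):
    """A kernel is bounded below by max(compute time, memory time)."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

PEAK_FLOPS = 25e12   # assumed usable BF16 throughput
PEAK_BW = 256e9      # shared LPDDR5X bandwidth

params = 20e9        # roughly matches a ~41 GB BF16 transformer

# LLM decode: every weight is read for ~2 FLOPs of work, so memory time
# dominates and shrinking the weights helps directly.
llm_bf16 = step_time(2 * params, 2.0 * params, PEAK_FLOPS, PEAK_BW)
llm_nf4  = step_time(2 * params, 0.5 * params, PEAK_FLOPS, PEAK_BW)

# Diffusion step: each weight is reused across thousands of latent
# positions, so arithmetic intensity is high and compute time dominates.
reuse = 4096         # assumed effective reuse factor per weight
diff_bf16 = step_time(2 * reuse * params, 2.0 * params, PEAK_FLOPS, PEAK_BW)
diff_nf4  = step_time(2 * reuse * params, 0.5 * params, PEAK_FLOPS, PEAK_BW)

print(f"LLM decode: BF16 {llm_bf16:.3f}s, NF4 {llm_nf4:.3f}s")
print(f"Diffusion:  BF16 {diff_bf16:.2f}s, NF4 {diff_nf4:.2f}s")
```

&lt;p&gt;Under these assumptions the decode step speeds up about 4x with 4-bit weights, while the diffusion step doesn't change at all, matching the benchmark table above.&lt;/p&gt;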
&lt;h3&gt;Head-to-Head: The Numbers&lt;/h3&gt;
&lt;p&gt;Here's the complete performance comparison across all tested configurations:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Per-Step&lt;/th&gt;
&lt;th&gt;40 Steps&lt;/th&gt;
&lt;th&gt;Image Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Strix Halo&lt;/td&gt;
&lt;td&gt;BF16 full precision&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82.6s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;55 min&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clean, sharp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strix Halo&lt;/td&gt;
&lt;td&gt;NF4 (4-bit)&lt;/td&gt;
&lt;td&gt;83.5s&lt;/td&gt;
&lt;td&gt;56 min&lt;/td&gt;
&lt;td&gt;Clean (10-step test)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strix Halo&lt;/td&gt;
&lt;td&gt;INT8 (8-bit)&lt;/td&gt;
&lt;td&gt;94.9s&lt;/td&gt;
&lt;td&gt;63 min&lt;/td&gt;
&lt;td&gt;Clean (10-step test)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4× P40&lt;/td&gt;
&lt;td&gt;INT8 + FP32 pipeline&lt;/td&gt;
&lt;td&gt;87.5s&lt;/td&gt;
&lt;td&gt;58 min&lt;/td&gt;
&lt;td&gt;Clean, sharp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4× P40&lt;/td&gt;
&lt;td&gt;NF4 + FP32 pipeline&lt;/td&gt;
&lt;td&gt;145.9s&lt;/td&gt;
&lt;td&gt;97 min&lt;/td&gt;
&lt;td&gt;Heavy noise/grain&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The headline result: &lt;strong&gt;a single AMD Strix Halo APU from 2025 is about 6% faster per step than four NVIDIA P40s from 2016 running INT8-quantized inference.&lt;/strong&gt; That's not exactly the generational leap you might expect from a decade of GPU evolution.&lt;/p&gt;
&lt;p&gt;To be fair, the comparison isn't entirely apples-to-apples. The P40 is running an 8-bit quantized model (less computation per step but with dequantization overhead), while the Strix Halo runs the full BF16 model. The P40's dedicated GDDR5 provides 346 GB/s of bandwidth to a single GPU, while the Strix Halo's LPDDR5X shares its ~256 GB/s between the CPU and GPU. And the P40 setup requires three GPUs working in concert, while the Strix Halo uses a single unified memory space.&lt;/p&gt;
&lt;h3&gt;Lessons Learned&lt;/h3&gt;
&lt;h4&gt;Old GPUs Are Surprisingly Capable&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://baud.rs/QaDJDo"&gt;Four P40s&lt;/a&gt; at ~$500 total produce inference quality and speed that's competitive with a &lt;a href="https://baud.rs/q87EAZ"&gt;$2,000+ modern APU system&lt;/a&gt;. The P40's 346 GB/s memory bandwidth per card and strong FP32 throughput remain relevant even for models that were designed for hardware two generations newer. The main challenge is software engineering — working around the precision limitations and multi-GPU complexity takes significant effort.&lt;/p&gt;
&lt;h4&gt;Precision Matters More Than Speed&lt;/h4&gt;
&lt;p&gt;The single most impactful discovery in this project was that FP16 silently corrupts diffusion model outputs on Pascal GPUs. There are no error messages, no NaN warnings during inference — just a black image at the end. The fix (using FP32 everywhere) actually improved performance, which was counterintuitive. The lesson: when dealing with older hardware, always validate your numerical precision assumptions before optimizing for speed.&lt;/p&gt;
&lt;h4&gt;Quantization Is Not Free&lt;/h4&gt;
&lt;p&gt;On the P40s, INT8 quantization was essential (the model simply wouldn't fit otherwise) and produced excellent results. NF4 was too aggressive — the 4-bit precision degraded output quality visibly.&lt;/p&gt;
&lt;p&gt;On the Strix Halo, quantization was unnecessary and even counterproductive. INT8 added overhead without any speed benefit, and NF4 didn't save time despite dramatically reducing memory usage. The takeaway: quantization's value depends entirely on your bottleneck. If you're compute-bound, smaller weights don't help.&lt;/p&gt;
&lt;h4&gt;Unified Memory Is Underrated&lt;/h4&gt;
&lt;p&gt;The Strix Halo's greatest advantage wasn't raw performance — it was simplicity. Loading a 57.7 GB model into a single 96 GB memory space eliminates an entire category of engineering problems: no device placement, no cross-GPU tensor transfers, no monkey-patching encode/decode methods, no VAE OOM surprises at the decode step. The inference script for the Strix Halo is about 50 lines. The P40 version is over 150, most of it careful device orchestration code.&lt;/p&gt;
&lt;p&gt;For anyone who values development velocity and code maintainability over squeezing the last dollar of cost-efficiency out of used datacenter hardware, unified memory APUs have a compelling argument even when they don't win on raw throughput.&lt;/p&gt;
&lt;h3&gt;What About Newer NVIDIA GPUs?&lt;/h3&gt;
&lt;p&gt;It's worth putting these numbers in context. An NVIDIA RTX 4090 with 24 GB of VRAM and native BF16/FP16 tensor core support would likely run this model (with INT8 quantization) at roughly 10-15 seconds per step — 5-8x faster than either system tested here. An A100 with 80 GB could run it unquantized in BF16 at similar or better speeds. The P40 and Strix Halo are both firmly in the "budget/accessible" tier of AI hardware.&lt;/p&gt;
&lt;p&gt;The more interesting comparison is cost-per-step. &lt;a href="https://baud.rs/QaDJDo"&gt;Four P40s from eBay&lt;/a&gt; cost about $500 total (plus a server that can host them). The &lt;a href="https://baud.rs/q87EAZ"&gt;Strix Halo system&lt;/a&gt; runs about $2,000+. Both produce essentially the same result at the same speed. The P40 route demands more engineering knowledge; the Strix Halo route demands more money.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Both systems successfully ran a 57.7 GB diffusion model that would have been considered impossibly large for consumer hardware just a few years ago. The P40s did it through clever quantization and multi-GPU orchestration. The Strix Halo did it by brute force — 96 GB of memory and native BF16 support.&lt;/p&gt;
&lt;p&gt;The performance story is more nuanced than "newer is always better." For diffusion model inference, the NVIDIA P40 — a card you can buy for $100 on eBay — remains remarkably competitive when properly configured. It requires more engineering effort, and you need to know the precision pitfalls, but the results speak for themselves.&lt;/p&gt;
&lt;p&gt;The Strix Halo's strength lies not in raw speed but in its unified memory architecture and modern instruction set support. It eliminates the multi-GPU complexity entirely, runs native BF16 without precision hacks, and provides a development experience that's orders of magnitude simpler. For iterating on models, testing new architectures, or just avoiding the headaches of cross-device tensor management, that simplicity has real value.&lt;/p&gt;
&lt;p&gt;If you're considering hardware for running large diffusion models locally, the choice comes down to how you value your time versus your budget. Four P40s and a weekend of debugging will get you to roughly the same place as a Strix Halo system that just works out of the box. Both paths lead to a cat in a red hat.&lt;/p&gt;
&lt;h3&gt;Recommended Resources&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Hardware&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/QaDJDo"&gt;NVIDIA Tesla P40 24GB&lt;/a&gt; - The GPU used in this post. Available on eBay for a fraction of the original price. You'll need a server with PCIe x16 slots and adequate cooling.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/q87EAZ"&gt;GMKtec EVO-X2 (AMD Ryzen AI MAX+ 395)&lt;/a&gt; - A compact Strix Halo mini PC with 128GB unified LPDDR5X 8000MHz, WiFi 7, and USB4. A representative platform for running large models on Strix Halo.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Books&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/pme3zz"&gt;&lt;em&gt;Pattern Recognition and Machine Learning&lt;/em&gt;&lt;/a&gt; by Christopher M. Bishop - The classic that introduced many to Bayesian methods and kernel machines. Still one of the best foundations for understanding the statistical principles behind modern ML.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/dnhZCN"&gt;&lt;em&gt;Hands-On Generative AI with Transformers and Diffusion Models&lt;/em&gt;&lt;/a&gt; by Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker - A practical guide to building and fine-tuning diffusion models using the Hugging Face ecosystem, including the diffusers library used in this post.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/vTOHER"&gt;&lt;em&gt;Understanding Deep Learning&lt;/em&gt;&lt;/a&gt; by Simon J.D. Prince - A thorough modern treatment of deep learning fundamentals through diffusion models, with excellent visualizations and mathematical rigor.&lt;/li&gt;
&lt;/ul&gt;</description><category>ai hardware</category><category>amd strix halo</category><category>benchmarks</category><category>bf16</category><category>bitsandbytes</category><category>diffusion models</category><category>firered</category><category>fp32</category><category>gfx1151</category><category>gpu computing</category><category>image generation</category><category>int8</category><category>machine learning</category><category>nf4</category><category>nvidia p40</category><category>pascal</category><category>pytorch</category><category>quantization</category><category>rdna 3.5</category><category>rocm</category><guid>https://tinycomputers.io/posts/image-editing-on-10-year-old-gpus-nvidia-p40-vs-amd-strix-halo.html</guid><pubDate>Tue, 17 Feb 2026 18:00:00 GMT</pubDate></item><item><title>Part 4: 132 Tests, Zero Failures - Verifying the Sampo CPU on Real Hardware</title><link>https://tinycomputers.io/posts/sampo-fpga-isa-verification.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/sampo-fpga-isa-verification_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;12 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Part 1&lt;/a&gt;, we designed the Sampo 16-bit RISC architecture. In &lt;a href="https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html"&gt;Part 2&lt;/a&gt;, we synthesized it to an ECP5 FPGA on the ULX3S board. In &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Part 3&lt;/a&gt;, we built an LLVM backend so Rust could compile for it. But there was a glaring gap in the project: we'd never systematically verified that the hardware actually implements the ISA correctly.&lt;/p&gt;
&lt;p&gt;The "Hello, Sampo!" demo program exercises maybe 10 of the CPU's 66 instructions. The LLVM backend generates code that assumes the hardware matches the spec. If a single instruction is subtly wrong - a carry flag not set, a branch offset miscalculated, a byte load sign-extending when it shouldn't - the entire toolchain is built on sand.&lt;/p&gt;
&lt;p&gt;This post documents the process of building a comprehensive test suite, running it in simulation, finding a real pipeline hazard bug in the CPU, and then the surprisingly treacherous journey of getting those tests running on real FPGA hardware.&lt;/p&gt;
&lt;h3&gt;The Test Strategy&lt;/h3&gt;
&lt;p&gt;The approach is straightforward: write assembly programs that exercise every instruction in the ISA, compare results against known-good values, and report PASS or FAIL over UART. The testbench monitors the serial output, and if it sees "FAIL" anywhere, the test run fails.&lt;/p&gt;
&lt;p&gt;Each test follows the same pattern:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;; Load known inputs&lt;/span&gt;
&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x1234&lt;/span&gt;
&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x5678&lt;/span&gt;

&lt;span class="c1"&gt;; Execute the instruction under test&lt;/span&gt;
&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;

&lt;span class="c1"&gt;; Check the result&lt;/span&gt;
&lt;span class="nf"&gt;MOV&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R10&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;; actual value&lt;/span&gt;
&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x68AC&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; expected value&lt;/span&gt;
&lt;span class="nf"&gt;JALX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;check_eq&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;; prints PASS or FAIL&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;check_eq&lt;/code&gt; subroutine compares R4 (actual) against R5 (expected) and prints the result over the UART. This makes the test output human-readable and machine-parseable:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;=== ALU Tests ===
ADD basic: PASS
ADD zero: PASS
ADD carry out: PASS
ADD overflow: PASS
SUB basic: PASS
...
Done.
&lt;/pre&gt;&lt;/div&gt;
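&lt;p&gt;The machine-parseable side of this is deliberately trivial. A checker along the following lines is all the testbench needs; how the transcript gets captured (a simulator log file, or a pyserial read on real hardware) is left out of the sketch.&lt;/p&gt;

```python
# Pass/fail gate for a captured UART transcript: the run must reach
# "Done." and no individual test may have printed FAIL.

def suite_passed(uart_text):
    return "Done." in uart_text and "FAIL" not in uart_text

capture = "=== ALU Tests ===\nADD basic: PASS\nADD zero: PASS\nDone.\n"
print(suite_passed(capture))                              # True
print(suite_passed(capture.replace("PASS", "FAIL", 1)))   # False
```

&lt;p&gt;Requiring the trailing "Done." also catches runs that hang or crash partway through, which would otherwise look like a pass by absence of failures.&lt;/p&gt;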

&lt;h3&gt;The Test Framework&lt;/h3&gt;
&lt;p&gt;Every test program begins with a block of helper subroutines that handle UART communication and result reporting. The core is a busy-wait loop that polls the MC6850-compatible UART status register:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="na"&gt;.equ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_STATUS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x80&lt;/span&gt;
&lt;span class="na"&gt;.equ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_DATA&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;0x81&lt;/span&gt;

&lt;span class="nl"&gt;print_char:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; R5 = character to output&lt;/span&gt;
&lt;span class="nl"&gt;.wait:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;INI&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_STATUS&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; Read status register&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;AND&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;; Copy to R7&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-2&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;; Check if TX ready (bit 1)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;BNE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;.wait&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="c1"&gt;; Loop until ready&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;OUTI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_DATA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;; Send character&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;JR&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;RA&lt;/span&gt;&lt;span class="w"&gt;                 &lt;/span&gt;&lt;span class="c1"&gt;; Return&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;check_eq&lt;/code&gt; helper prints "PASS" or "FAIL" based on a register comparison, and the &lt;code&gt;print_str&lt;/code&gt; helper walks a null-terminated string byte by byte. These routines are duplicated in each test file rather than linked - there's no linker in this toolchain, just a single-file assembler.&lt;/p&gt;
&lt;h3&gt;Test Coverage&lt;/h3&gt;
&lt;p&gt;We organized the tests into 10 programs, each targeting a specific area of the instruction set:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Program&lt;/th&gt;
&lt;th&gt;Instructions Tested&lt;/th&gt;
&lt;th&gt;Test Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;test_alu&lt;/td&gt;
&lt;td&gt;ADD, SUB, AND, OR, XOR, NEG + flags&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_addi&lt;/td&gt;
&lt;td&gt;ADDI with signed immediates + flags&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_shift&lt;/td&gt;
&lt;td&gt;SLL, SRL, SRA, ROL, ROR, SWAP (1/4/8-bit variants)&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_muldiv&lt;/td&gt;
&lt;td&gt;MUL, MULH, DIV, DIVU, REM, REMU&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_loadstore&lt;/td&gt;
&lt;td&gt;LW, LB, LBU, SW, SB + offset variants&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_branch&lt;/td&gt;
&lt;td&gt;All 16 branch conditions (taken + not taken)&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_jump&lt;/td&gt;
&lt;td&gt;J, JR, JALR, JX, JALX&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_stack&lt;/td&gt;
&lt;td&gt;PUSH, POP, CMP, TEST, MOV&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_misc&lt;/td&gt;
&lt;td&gt;EXX, GETF, SETF, SCF, CCF, NOP&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_extended&lt;/td&gt;
&lt;td&gt;ADDIX, SUBIX, ANDIX, ORIX, XORIX, SLLX, SRLX, SRAX&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;132&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The branch tests are particularly thorough - each of the 16 conditions (BEQ, BNE, BLT, BGE, BLTU, BGEU, BMI, BPL, BVS, BVC, BCS, BCC, BGT, BLE, BHI, BLS) gets tested both for the taken and not-taken case. We set up flags with arithmetic, then verify the branch goes the right way.&lt;/p&gt;
&lt;h3&gt;Finding a Real Bug: The Pipeline Hazard&lt;/h3&gt;
&lt;p&gt;The first time we ran the full test suite in simulation, 130 of 132 tests passed. Two tests in &lt;code&gt;test_loadstore&lt;/code&gt; were failing: the multi-word store/load test and a load with offset test.&lt;/p&gt;
&lt;p&gt;The failing pattern was consistent: any test that performed a store followed immediately by a load from a different address would read stale data. The load would return the value from the &lt;em&gt;previous&lt;/em&gt; memory operation instead of the current one.&lt;/p&gt;
&lt;p&gt;The root cause was a pipeline hazard between the MEMORY and FETCH states. Here's what was happening:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Cycle N:   MEMORY state - store completes, mem_ready asserts
Cycle N+1: FETCH state  - new instruction fetch begins
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The problem: &lt;code&gt;mem_ready&lt;/code&gt; is a one-cycle delayed version of &lt;code&gt;mem_valid&lt;/code&gt; (because the RAM is synchronous). When the CPU transitioned directly from MEMORY to FETCH after a store — stores originally skipped the WRITEBACK state — the &lt;code&gt;mem_ready&lt;/code&gt; signal from the store was still asserted during the first cycle of the new FETCH. The CPU latched the stale &lt;code&gt;mem_rdata&lt;/code&gt; from the previous store operation as if it were the new instruction.&lt;/p&gt;
&lt;p&gt;The fix was to add a WRITEBACK state after every MEMORY operation - not just loads, but stores too. This gives &lt;code&gt;mem_ready&lt;/code&gt; a cycle to deassert before the next FETCH begins:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;Before&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MEMORY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FETCH&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;still&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;MEMORY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WRITEBACK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FETCH&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deasserts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;during&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WRITEBACK&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A one-line change to the next-state logic:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="no"&gt;`ST_MEMORY&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_ready&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// Always go through WRITEBACK after MEMORY.&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// For stores: allows mem_ready to deassert before&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// next FETCH (prevents stale rdata latch).&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;next_state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;`ST_WRITEBACK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is exactly the kind of bug that simulation catches and manual inspection misses. The instruction executes correctly in isolation - it's only the &lt;em&gt;interaction&lt;/em&gt; between consecutive memory operations that triggers the hazard. After the fix, all 132 tests passed in simulation.&lt;/p&gt;
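&lt;p&gt;The failure mode is easy to reproduce outside Verilog. The toy model below (signal names from this post; timing reduced to the one-cycle-delayed &lt;code&gt;mem_ready&lt;/code&gt; and nothing else) shows why a single intervening state is sufficient:&lt;/p&gt;

```python
# Toy model of the hazard: mem_ready trails the MEMORY state by one
# cycle (synchronous RAM), so right after a store the stale ready/rdata
# pair is still on the bus during the first FETCH cycle.

def run(states):
    """Walk the state sequence; return what FETCH latches, if anything."""
    mem_ready, mem_rdata = False, None
    latched = None
    for state in states:
        if state == "FETCH" and mem_ready:
            latched = mem_rdata        # latches whatever the bus holds
        if state == "MEMORY":
            mem_ready, mem_rdata = True, "stale store data"
        else:
            mem_ready = False          # deasserts one cycle later
    return latched

# Before the fix: a store went MEMORY to FETCH directly.
print(run(["MEMORY", "FETCH"]))                # stale data latched
# After the fix: WRITEBACK gives mem_ready a cycle to drop.
print(run(["MEMORY", "WRITEBACK", "FETCH"]))   # nothing latched
```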
&lt;h3&gt;Taking It to the FPGA&lt;/h3&gt;
&lt;p&gt;With simulation clean, the next step was running the tests on real hardware. The ULX3S board has an &lt;a href="https://baud.rs/bJSrEK"&gt;FTDI&lt;/a&gt; FT231X USB-serial chip connected to the FPGA, so UART output appears on a serial port at 115200 baud.&lt;/p&gt;
&lt;p&gt;There was an immediate practical problem: the test programs run fast. At 12.5 MHz, the entire 20-test ALU suite completes in about 30 milliseconds. By the time openFPGALoader finishes programming the FPGA and releases the USB port, the test output is long gone. The FTDI chip has a small receive buffer, but 364 characters of test output overflows it before you can open the serial port.&lt;/p&gt;
&lt;p&gt;The solution: patch the hex files to loop instead of halting. Replace the HALT instruction with a delay loop followed by a jump back to the reset vector. The test runs, outputs its results, waits about half a second, and starts over. You can open the serial port at any time and catch a complete iteration.&lt;/p&gt;
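&lt;p&gt;The splice itself is a few lines of Python. In the sketch below, the one-word-per-line hex format and the placeholder words (&lt;code&gt;DELAY0&lt;/code&gt;, &lt;code&gt;JUMP_0100&lt;/code&gt;) are assumptions for illustration; the real &lt;code&gt;hex_loop_patch.py&lt;/code&gt; emits actual Sampo encodings.&lt;/p&gt;

```python
# Sketch of the hex_loop_patch.py idea: find the HALT word (0xE100) in
# the assembled image and splice in delay-loop words plus a jump back
# to the reset vector.

HALT = "E100"

def patch_halt(hex_lines, loop_words):
    """Replace the first HALT with the loop sequence. Safe only when
    HALT is the last instruction, since splicing shifts later words."""
    out, patched = [], False
    for word in hex_lines:
        if not patched and word.strip().upper() == HALT:
            out.extend(loop_words)   # delay loop, then jump to reset
            patched = True
        else:
            out.append(word)
    return out

image = ["0100", "ABCD", "E100"]
patched = patch_halt(image, ["DELAY0", "DELAY1", "JUMP_0100"])
print(patched)  # ['0100', 'ABCD', 'DELAY0', 'DELAY1', 'JUMP_0100']
```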
&lt;h4&gt;The Delay Loop Patch&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;hex_loop_patch.py&lt;/code&gt; script performs binary patching on the assembled hex files. It finds the HALT instruction (encoded as &lt;code&gt;0xE100&lt;/code&gt;) and replaces it with a delay loop:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;; Delay ~0.38 seconds at 12.5 MHz&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x0008&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; outer counter&lt;/span&gt;
&lt;span class="nl"&gt;outer:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0xFFFF&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; inner counter = 65535&lt;/span&gt;
&lt;span class="nl"&gt;inner:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;BNE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;inner&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;BNE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;outer&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;JX&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;0x0100&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;; jump back to reset vector&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The first version of this script &lt;em&gt;inserted&lt;/em&gt; these 10 words at the HALT position. This seemed obviously correct. The tests ran on FPGA. Characters appeared on the serial port.&lt;/p&gt;
&lt;p&gt;They were the wrong characters.&lt;/p&gt;
&lt;h3&gt;The Address Shift Bug&lt;/h3&gt;
&lt;p&gt;The FPGA output for the "Hello, Sampo!" test program was &lt;code&gt;\x08\x08\x08\x08&lt;/code&gt; - four backspace characters, repeating forever. The ALU test suite showed truncated output with roughly 45% of characters missing. Same pattern at 12.5 MHz and 6.25 MHz, ruling out timing violations. Simulation with realistic UART timing (1,080 cycles per byte, matching the hardware baud rate) passed perfectly.&lt;/p&gt;
&lt;p&gt;I spent considerable time investigating the wrong theories. Was the UART transmitter dropping bytes? Was there a clock domain crossing issue? Was &lt;code&gt;$readmemh&lt;/code&gt; in Yosys interpreting the hex file differently from Icarus Verilog? None of these panned out.&lt;/p&gt;
&lt;p&gt;The breakthrough came from staring at &lt;code&gt;\x08&lt;/code&gt;. That's the byte value 8. Where would 8 come from? The "Hello, Sampo!" program loads its message pointer with &lt;code&gt;LIX R4, message&lt;/code&gt; where &lt;code&gt;message&lt;/code&gt; is the label for the string data. In the assembled hex, &lt;code&gt;message&lt;/code&gt; resolves to address &lt;code&gt;0x011E&lt;/code&gt; - the byte immediately after the HALT instruction.&lt;/p&gt;
&lt;p&gt;And there it was. Look at the assembly structure:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;done:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;HALT&lt;/span&gt;&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="c1"&gt;; address 0x011C&lt;/span&gt;
&lt;span class="nl"&gt;message:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="na"&gt;.asciz&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Hello, Sampo!\n"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;; address 0x011E&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The string data lives immediately after HALT. When &lt;code&gt;hex_loop_patch.py&lt;/code&gt; &lt;em&gt;inserts&lt;/em&gt; 10 words of delay loop code at the HALT position, it pushes the string data down by 20 bytes. But the &lt;code&gt;LIX R4, 0x011E&lt;/code&gt; instruction still points to the original address. At &lt;code&gt;0x011E&lt;/code&gt; there's now the second word of &lt;code&gt;LIX R8, 0x0008&lt;/code&gt; - which contains the value &lt;code&gt;0x0008&lt;/code&gt;. The low byte is &lt;code&gt;0x08&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The CPU faithfully reads byte &lt;code&gt;0x08&lt;/code&gt; from the patched address, outputs it via UART, advances the pointer to &lt;code&gt;0x011F&lt;/code&gt; where the high byte is &lt;code&gt;0x00&lt;/code&gt; (the null terminator), and stops. One &lt;code&gt;\x08&lt;/code&gt; per iteration, four iterations captured. Mystery solved.&lt;/p&gt;
&lt;p&gt;This same address shift corrupted every test program. The test strings ("ADD basic: ", "PASS\n", etc.) all live after HALT and all got displaced. The CPU was reading from locations that now contained delay loop machine code instead of ASCII text. Some fragments of text survived because adjacent strings partially overlapped with their shifted locations, producing the truncated output we saw.&lt;/p&gt;
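&lt;p&gt;The failure mode is easy to reproduce in miniature. The following sketch (hypothetical, not taken from the actual patcher) models a program image as a list of 16-bit words containing one absolute pointer, and contrasts inserting words with overwriting in place:&lt;/p&gt;

```python
# Hypothetical miniature of the address-shift bug: a "program" image
# whose word at index `ptr` is referenced by an absolute address baked
# into the code. Inserting words shifts the data; the reference doesn't move.
def patch_insert(words, at, patch):
    """Insert patch words at index `at` -- shifts everything after it."""
    return words[:at] + patch + words[at:]

def patch_overwrite(words, at, patch):
    """Overwrite in place -- image size (and every address) unchanged."""
    out = list(words)
    out[at:at + len(patch)] = patch
    return out

HALT = 0xE100
image = [0x1234, HALT, 0x0048, 0x0000]   # word 2 holds string data ('H')
ptr = 2                                   # absolute index referenced by the code

inserted = patch_insert(image, 1, [0xAAAA, 0xBBBB])      # buggy approach
overwritten = patch_overwrite(image, 1, [0x9FFF])        # same-size replacement

print(hex(inserted[ptr]))     # 0xbbbb -- the pointer now reads patch code
print(hex(overwritten[ptr]))  # 0x48   -- data still where the pointer expects
```

The same-size replacement is exactly why the corrected patcher swaps one HALT word for one jump word instead of splicing in the whole loop.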
&lt;h4&gt;The Fix&lt;/h4&gt;
&lt;p&gt;The correct approach: don't shift any data. Place the delay loop at address &lt;code&gt;0x0000&lt;/code&gt; - the 256 bytes of unused memory before the &lt;code&gt;0x0100&lt;/code&gt; reset vector - and replace the single-word HALT with a single-word relative &lt;code&gt;J&lt;/code&gt; (jump) instruction that jumps backward to the loop code. One word replaces one word. No data moves.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Place delay loop at address 0x0000 (unused space)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LOOP_PATCH&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;loop_base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;

&lt;span class="c1"&gt;# Replace HALT with J instruction to address 0x0000&lt;/span&gt;
&lt;span class="c1"&gt;# J encoding: opcode 0x9, 12-bit signed offset&lt;/span&gt;
&lt;span class="c1"&gt;# target = PC + 2 + (sign_extend(offset) &amp;lt;&amp;lt; 1)&lt;/span&gt;
&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_addr&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;halt_addr&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;j_word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x9000&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xFFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;halt_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j_word&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There's a subtle complication: the J instruction shares opcode &lt;code&gt;0x9&lt;/code&gt; with JR (register indirect jump) and JALR (jump and link register). The decoder distinguishes them by specific bit patterns in the offset field. If the calculated offset happens to have &lt;code&gt;bits[3:0] == 0x1&lt;/code&gt; and &lt;code&gt;bits[11:8] != 0xF&lt;/code&gt;, the decoder interprets it as JALR instead of J. The script tries successive target addresses (&lt;code&gt;0x0000&lt;/code&gt;, &lt;code&gt;0x0002&lt;/code&gt;, &lt;code&gt;0x0004&lt;/code&gt;, ...) until it finds one that doesn't collide with the JR/JALR encoding space.&lt;/p&gt;
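&lt;p&gt;The collision-avoidance search can be sketched in a few lines of Python. This mirrors the encoding rules as described above (opcode &lt;code&gt;0x9&lt;/code&gt;, 12-bit signed word offset, JALR collision when &lt;code&gt;bits[3:0] == 0x1&lt;/code&gt; and &lt;code&gt;bits[11:8] != 0xF&lt;/code&gt;); the function names are illustrative, not the actual script's:&lt;/p&gt;

```python
def encode_j(halt_addr, target_addr):
    """Encode a relative J: target = PC + 2 + (sign_extend(offset) << 1)."""
    offset = (target_addr - halt_addr - 2) // 2   # exact: byte delta is even
    return 0x9000 | (offset & 0xFFF)

def collides_with_jalr(word):
    """True if the 12-bit offset field would decode as JALR instead of J."""
    offset = word & 0xFFF
    return (offset & 0xF) == 0x1 and (offset >> 8) != 0xF

def find_safe_jump(halt_addr):
    """Try successive even target addresses in the zero page until the
    encoded J word doesn't fall in the JR/JALR encoding space."""
    for target in range(0x0000, 0x0100, 2):
        word = encode_j(halt_addr, target)
        if not collides_with_jalr(word):
            return target, word
    raise ValueError("no collision-free target found")

target, word = find_safe_jump(0x011C)   # HALT at 0x011C, as in the example
print(hex(target), hex(word))
```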
&lt;p&gt;After the fix, the patched hex files have exactly the same number of words as the originals. The only changes are the delay loop code written to the zero page and the HALT word replaced with a backward jump.&lt;/p&gt;
&lt;p&gt;With the corrected patcher, the "Hello, Sampo!" program finally works on the FPGA - looping cleanly with zero character loss:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/sampo-fpga-isa-verification/HelloSampo.png" style="width: 100%; max-width: 720px; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1em 0;" loading="lazy" alt="Terminal showing Hello, Sampo! repeating on the ULX3S FPGA via cu serial connection"&gt;&lt;/p&gt;
&lt;h3&gt;The Testbench: Trusting but Verifying&lt;/h3&gt;
&lt;p&gt;One important discovery during this process: the simulation testbench had &lt;code&gt;tx_ready = 1&lt;/code&gt; permanently. The simulated UART never pushed back on the CPU - it accepted every byte instantly. This meant the CPU's busy-wait loop (&lt;code&gt;INI R6, ACIA_STATUS / ADDI R7, -2 / BNE wait&lt;/code&gt;) was never actually tested in simulation. The status register always returned "ready," so the loop body executed zero times.&lt;/p&gt;
&lt;p&gt;On real hardware, the UART transmitter takes about 87 microseconds per byte at 115200 baud. The busy-wait loop runs hundreds of times per character, exercising the INI instruction, the AND/ADDI flag-setting sequence, and the BNE branch in a tight loop. If any of those instructions had a subtle bug, it would only manifest on hardware.&lt;/p&gt;
&lt;p&gt;We added realistic UART timing to the testbench:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;parameter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TX_BYTE_CYCLES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;108&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// ~1080 cycles per byte&lt;/span&gt;
&lt;span class="kt"&gt;reg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mh"&gt;15&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mh"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;always&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="k"&gt;posedge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;clk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx_valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tx_ready&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;tx_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TX_BYTE_CYCLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;tx_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With this change, simulation exercises the same code paths as the hardware. All 132 tests still pass - the UART flow control logic was correct all along; it just wasn't being tested.&lt;/p&gt;
&lt;h3&gt;Running All Tests on the FPGA&lt;/h3&gt;
&lt;video controls style="width: 100%; max-width: 720px; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 0 0 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/sampo-fpga-test-suite.mp4" type="video/mp4"&gt;
Your browser does not support the video tag.
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;With the patch bug fixed, we ran the complete suite. Each test requires a separate FPGA build (Yosys synthesis, nextpnr place-and-route, ecppack bitstream generation), programming via JTAG, and serial capture. The Makefile automates the entire pipeline:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nf"&gt;fpga-%&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;&lt;span class="nv"&gt;BUILD_DIR&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;/&lt;span class="n"&gt;sampo_&lt;/span&gt;%.&lt;span class="n"&gt;bit&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;openFPGALoader&lt;span class="w"&gt; &lt;/span&gt;-b&lt;span class="w"&gt; &lt;/span&gt;ulx3s&lt;span class="w"&gt; &lt;/span&gt;$&amp;lt;
&lt;span class="w"&gt;    &lt;/span&gt;sleep&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;python3&lt;span class="w"&gt; &lt;/span&gt;fpga_capture.py&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;SERIAL_PORT&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;SERIAL_BAUD&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;fpga_capture.py&lt;/code&gt; script opens the serial port, discards the first partial iteration (we might join mid-stream), waits for the &lt;code&gt;=== ... ===&lt;/code&gt; header line that starts each test, captures everything until the header repeats, and outputs one clean iteration.&lt;/p&gt;
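&lt;p&gt;The core of that capture logic is independent of the serial port and can be modeled as a pure function over a stream of lines. This is a simplified sketch with an invented name and header convention, not the actual script:&lt;/p&gt;

```python
def capture_one_iteration(lines, header_prefix="==="):
    """Return one complete test iteration from a looping output stream.

    Skips everything before the first header line (we may have joined
    mid-iteration), then collects lines until the same header repeats.
    """
    it = iter(lines)
    for line in it:                      # discard the partial iteration
        if line.startswith(header_prefix):
            header = line
            break
    else:
        return []                        # stream ended with no header seen
    captured = [header]
    for line in it:
        if line == header:               # header repeated: iteration complete
            return captured
        captured.append(line)
    return captured                      # stream ended before a full repeat

stream = ["PASS", "=== FPGA: test_alu ===", "ADD basic: PASS",
          "All tests passed!", "=== FPGA: test_alu ===", "ADD basic: PASS"]
print(capture_one_iteration(stream))
# ['=== FPGA: test_alu ===', 'ADD basic: PASS', 'All tests passed!']
```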
&lt;p&gt;The results:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;========================================
=== FPGA: test_alu ===
========================================
=== ALU Tests ===
ADD basic: PASS
ADD zero: PASS
ADD carry out: PASS
...
AND clr C/V: PASS
All tests passed!

========================================
=== FPGA: test_addi ===
========================================
...
All tests passed!

...

========================================
FPGA Test Summary: 10 passed, 0 failed
========================================
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All 10 test suites pass. All 132 individual tests pass. Zero failures on real hardware.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Suite&lt;/th&gt;
&lt;th&gt;Tests&lt;/th&gt;
&lt;th&gt;FPGA Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ALU (ADD, SUB, AND, OR, XOR)&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADDI (immediate arithmetic)&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shift (SLL, SRL, SRA, ROL, SWAP)&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MulDiv (MUL, DIV, REM variants)&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load/Store (LW, LB, LBU, SW, SB)&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Branch (all 16 conditions)&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jump (J, JR, JALR, JX, JALX)&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stack (PUSH, POP, CMP, TEST, MOV)&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Misc (EXX, GETF, SETF, SCF, CCF, NOP)&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extended (ADDIX, SUBIX, SLLX, etc.)&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;132&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;All PASS&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;What This Means&lt;/h3&gt;
&lt;p&gt;Having all 132 ISA tests pass on hardware is a significant milestone for the project. It means:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Verilog RTL is correct.&lt;/strong&gt; Every instruction in the Sampo ISA produces the right result, sets the right flags, and handles edge cases (zero, overflow, carry, sign extension) correctly. Not just in behavioral simulation, but in synthesized logic on a real FPGA running at 12.5 MHz.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The assembler is correct.&lt;/strong&gt; All 66 instructions encode properly. Branch offsets calculate correctly. Extended instructions (LIX, JALX, OUTX) with their 32-bit encoding work. The &lt;code&gt;sasm&lt;/code&gt; Rust assembler and the Verilog decoder agree on every instruction format.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The LLVM backend has a solid foundation.&lt;/strong&gt; When the Rust compiler generates a &lt;code&gt;ADD&lt;/code&gt; or &lt;code&gt;BNE&lt;/code&gt; or &lt;code&gt;JALX&lt;/code&gt;, the hardware will execute it correctly. The test suite doesn't exercise every possible code generation pattern, but it validates every primitive instruction that the compiler builds upon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The UART subsystem works end-to-end.&lt;/strong&gt; Status register polling, TX busy-wait, byte transmission, baud rate generation - all verified on hardware. The MC6850-compatible interface works exactly as specified.&lt;/p&gt;
&lt;h3&gt;Lessons Learned&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Test your assumptions.&lt;/strong&gt; The testbench had &lt;code&gt;tx_ready = 1&lt;/code&gt;. It went unnoticed because simulation "worked." The real hardware exercises code paths that simulation shortcuts. Add realistic peripheral timing to your testbenches from day one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Binary patching is fragile.&lt;/strong&gt; Inserting bytes into a binary without updating references is a classic relocation bug - the same class of problem that linkers exist to solve. If your patch changes the size of anything, every address reference past the patch point is wrong. The fix - placing the patch in unused address space and using a same-size replacement instruction - avoids the problem entirely.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simulation is necessary but not sufficient.&lt;/strong&gt; The pipeline hazard bug was caught by simulation. The address shift bug was invisible to simulation (both used the same patching script, and the original programs - without patching - worked fine). You need both simulation and hardware testing, exercising different code paths and different failure modes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Systematic testing finds bugs that demos don't.&lt;/strong&gt; "Hello, Sampo!" worked on the FPGA from day one. It exercises &lt;code&gt;LIX&lt;/code&gt;, &lt;code&gt;LBU&lt;/code&gt;, &lt;code&gt;CMP&lt;/code&gt;, &lt;code&gt;BEQ&lt;/code&gt;, &lt;code&gt;INI&lt;/code&gt;, &lt;code&gt;OUTI&lt;/code&gt;, &lt;code&gt;ADDI&lt;/code&gt;, and &lt;code&gt;J&lt;/code&gt; - about 8 instructions. The pipeline hazard only manifested when a store was followed by a load to a different address, a pattern that doesn't occur in a simple print loop. You need tests specifically designed to exercise corner cases.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The entire Sampo project - assembler, emulator, Verilog RTL, FPGA build scripts, test suite, and LLVM backend - is open source on &lt;a href="https://baud.rs/r74wA8"&gt;GitHub&lt;/a&gt;. With hardware verification complete, the next steps might be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Running Rust-compiled code on the FPGA.&lt;/strong&gt; The LLVM backend generates assembly, the assembler produces hex files, and we now know the hardware executes them correctly. Closing this loop - &lt;code&gt;cargo build&lt;/code&gt; to blinking LEDs - is the obvious next milestone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adding more peripherals.&lt;/strong&gt; The ULX3S has 32MB of SDRAM, an HDMI output, a microSD slot, and an ESP32 co-processor. Each of these opens up interesting possibilities for a working 16-bit computer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance optimization.&lt;/strong&gt; The CPU currently runs at 12.5 MHz with a multi-cycle FSM (5-8 cycles per instruction). Pipelining could push this significantly higher on the ECP5.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But first: 132 tests, zero failures. The Sampo CPU works.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This is Part 4 of the Sampo series. &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Part 1&lt;/a&gt; covers architecture design, &lt;a href="https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html"&gt;Part 2&lt;/a&gt; covers FPGA implementation, and &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Part 3&lt;/a&gt; covers the LLVM backend.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Recommended Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/wvPosK"&gt;OrangeCrab ECP5 FPGA Board&lt;/a&gt; - A compact Lattice ECP5 board with DDR3 and USB-C, available on Amazon&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/6U3DBr"&gt;ECP5 FPGA Development Boards&lt;/a&gt; - Other ECP5 boards available on Amazon&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/RGjpAj"&gt;&lt;em&gt;Getting Started with FPGAs&lt;/em&gt;&lt;/a&gt; by Russell Merrick - Beginner-friendly introduction with Verilog and VHDL examples&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/bJSrEK"&gt;FTDI USB Serial Adapters&lt;/a&gt; - Useful for UART debugging with FPGAs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/dBX5Ij"&gt;USB Logic Analyzers&lt;/a&gt; - Essential for debugging digital signals&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Source Code&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/r74wA8"&gt;github.com/ajokela/sampo&lt;/a&gt;&lt;/strong&gt; - CPU architecture, assembler, emulator, Verilog RTL, test suite, and FPGA build scripts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/GCQDRa"&gt;github.com/ajokela/llvm-sampo&lt;/a&gt;&lt;/strong&gt; - LLVM backend and Rust target specification&lt;/li&gt;
&lt;/ul&gt;</description><category>cpu design</category><category>ecp5</category><category>fpga</category><category>hardware</category><category>isa</category><category>risc</category><category>sampo</category><category>testing</category><category>uart</category><category>ulx3s</category><category>verification</category><category>verilog</category><guid>https://tinycomputers.io/posts/sampo-fpga-isa-verification.html</guid><pubDate>Sun, 15 Feb 2026 20:00:00 GMT</pubDate></item><item><title>Playing Zork on a Real Z80: From CP/M Boot to the Great Underground Empire</title><link>https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/zork-on-retroshield-z80-arduino-giga_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;16 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This is the third post in a series about running CP/M 2.2 on a real Z80 processor connected to an Arduino Giga R1 WiFi. The &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;first post&lt;/a&gt; covered getting the custom level converter shield designed and manufactured. The &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;second post&lt;/a&gt; documented the hardware stack, the catastrophic TXB0108 level converter failures, the shadow register workaround, and the Rust sector server that provides disk I/O over WiFi. That post ended with a promise: CP/M was close to booting, and all the pieces were in place.&lt;/p&gt;
&lt;p&gt;This post is about keeping that promise. It covers the final debugging push from "almost boots" to a fully interactive game of Zork I running on real Z80 hardware — and the performance crisis that nearly made the whole thing unusable.&lt;/p&gt;
&lt;h3&gt;The Story So Far&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield-bare-pcb.jpeg" alt="The bare Arduino Giga R1 Shield V0.1 PCB — a red board with nine TXB0108 level converter ICs in antistatic packaging" style="float: right; width: 45%; max-width: 420px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;The hardware is straightforward: a &lt;a href="https://baud.rs/87wbBL"&gt;RetroShield Z80&lt;/a&gt; — a real &lt;a href="https://baud.rs/tFkBkH"&gt;Zilog Z80&lt;/a&gt; CPU on a shield board — plugged into an &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1 WiFi&lt;/a&gt; through a custom level converter PCB. The Giga's STM32H747 (480MHz Cortex-M7) provides 64KB of Z80 RAM as a byte array in its internal SRAM, clocks the Z80, and serves memory read/write requests. Disk I/O goes over WiFi to a Rust TCP sector server instead of an SD card.&lt;/p&gt;
&lt;p&gt;The level converter uses nine &lt;a href="https://baud.rs/hY6ydl"&gt;TXB0108&lt;/a&gt; bidirectional level shifters to bridge the Giga's 3.3V logic and the RetroShield's 5V. And those TXB0108s are the source of almost every interesting engineering decision in the project. Their auto-direction sensing fails for several Z80 bus signals: &lt;code&gt;IORQ_N&lt;/code&gt; and &lt;code&gt;RD_N&lt;/code&gt; are permanently stuck, &lt;code&gt;WR_N&lt;/code&gt; only works during memory cycles, and the data bus is invisible from Z80-to-Arduino during I/O operations. The address bus works but lags by 1-3 clock ticks through the converter.&lt;/p&gt;
&lt;p&gt;These failures forced a fundamentally different approach to interfacing with the Z80. Instead of passively watching bus signals, the Arduino actively decodes the Z80's instruction stream and maintains software copies of the CPU's internal state:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Guard-only M1 detection&lt;/strong&gt; — a timing table (&lt;code&gt;tStates[256]&lt;/code&gt;) tells us how many clock cycles each instruction takes; the next memory read after the guard expires is the next opcode fetch&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Software PC (softPC)&lt;/strong&gt; — a software copy of the Z80's program counter, immune to address bus lag&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shadow registers&lt;/strong&gt; — software copies of A, B, C, D, E, H, L, F, and SP, updated by decoding each opcode from &lt;code&gt;z80RAM[softPC]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre-writes&lt;/strong&gt; — memory store instructions write their values directly to &lt;code&gt;z80RAM&lt;/code&gt; at opcode detection time, using shadow register values and softPC-derived addresses, because the Z80's physical bus writes go to wrong addresses due to the propagation delay&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deferred writes&lt;/strong&gt; — for read-modify-write instructions like &lt;code&gt;INC (HL)&lt;/code&gt;, where pre-writing would cause the Z80 to read an already-modified value and double-apply the operation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The full technical details of this architecture are in the &lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;previous post&lt;/a&gt;. What matters here is where that post left off: the shadow register system was working, the sector server was serving disk images over WiFi, and partial serial output confirmed that the Z80 was executing real code. What remained was completeness testing — making sure every instruction the Z80 actually executed was tracked correctly in the shadows.&lt;/p&gt;
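&lt;p&gt;As a rough model of the guard-only M1 scheme (the real firmware is Arduino C++ covered in the previous post; this Python sketch uses invented names and a tiny instruction subset to show the mechanism only):&lt;/p&gt;

```python
# Guard-only M1 detection: because IORQ_N/RD_N are unreadable through the
# level converter, the Arduino can't see which memory read is an opcode
# fetch. Instead, a per-opcode T-state guard counts down; the first memory
# read after the guard expires must be the next M1 fetch, and a software
# program counter (soft PC) tracks where that fetch lands.
T_STATES = {0x00: 4, 0x3E: 7}   # NOP, LD A,n -- tiny illustrative subset
LENGTH   = {0x00: 1, 0x3E: 2}   # instruction lengths in bytes

def detect_m1(ram, ticks):
    """Walk `ticks` clock ticks; return (tick, addr, opcode) for each
    detected opcode fetch."""
    soft_pc, guard, fetches = 0, 0, []
    for tick in range(ticks):
        if guard == 0:                       # guard expired: this is an M1
            op = ram[soft_pc]
            fetches.append((tick, soft_pc, op))
            guard = T_STATES[op]             # arm guard for this instruction
            soft_pc += LENGTH[op]            # advance the software PC
        guard -= 1
    return fetches

ram = [0x3E, 0x42, 0x00, 0x00]               # LD A,0x42 ; NOP ; NOP
print(detect_m1(ram, 15))
# [(0, 0, 62), (7, 2, 0), (11, 3, 0)] -- fetches 7 and 4 T-states apart
```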
&lt;h3&gt;CP/M Boots&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield-assembled-top.jpeg" alt="The assembled stack: Arduino Giga R1 WiFi (blue) mounted on the red level converter PCB, with the RetroShield Z80 and its 40-pin Z80 DIP chip partially inserted on the right" style="float: left; width: 50%; max-width: 460px; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;The first milestone came faster than expected. After expanding the shadow register switch statement to cover more of the Z80 instruction set — POP instructions, ADD HL with register pairs, DAA (decimal adjust), EX (SP),HL — CP/M booted.&lt;/p&gt;
&lt;p&gt;The boot loader loaded all 53 sectors of &lt;code&gt;CPM.SYS&lt;/code&gt; from the sector server over WiFi. The BIOS cold boot initialized correctly. And the console printed:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;RetroShield CP/M 2.2
56K TPA

a&amp;gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A real Z80, running real &lt;a href="https://baud.rs/YxWgtr"&gt;CP/M 2.2&lt;/a&gt;, with 56KB of Transient Program Area, booting from a disk image served over WiFi from a Rust TCP server. The &lt;code&gt;DIR&lt;/code&gt; command worked and showed the contents of drive A:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;a&amp;gt;dir
A: ZORK1    COM : ZORK1    DAT : ZORK2    COM : ZORK2    DAT
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Zork was right there, waiting.&lt;/p&gt;
&lt;h3&gt;The "Bad Load" Bug&lt;/h3&gt;
&lt;p&gt;Running &lt;code&gt;ZORK1.COM&lt;/code&gt; produced a single line of output and then nothing:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;zork1&lt;/span&gt;
&lt;span class="n"&gt;Bad&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;load&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;"Bad load" is a CP/M CCP (Console Command Processor) error. It means the CCP tried to load the .COM file into the TPA and something went wrong — either a disk read failed, or the CCP's internal logic decided the load was corrupt.&lt;/p&gt;
&lt;h4&gt;Finding the Root Cause&lt;/h4&gt;
&lt;p&gt;The CCP loads .COM files by repeatedly calling BDOS function 20 (Read Sequential), advancing the DMA address by 128 bytes after each successful sector read, until the file is fully loaded. The load loop lives in the CCP code at address &lt;code&gt;0xE6DE&lt;/code&gt;. After each BDOS call, it checks whether the DMA address has exceeded the TPA boundary at &lt;code&gt;0xE000&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;E6F5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;save&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;BDOS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;
&lt;span class="n"&gt;E6F6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;DMA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;byte&lt;/span&gt;
&lt;span class="n"&gt;E6F7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SUB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;compare&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;against&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;E6F8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SBC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;below&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;E6F9&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SBC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;TPA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boundary&lt;/span&gt;
&lt;span class="n"&gt;E6FB&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;E771&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;past&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TPA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loading&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;SBC A,D&lt;/code&gt; instruction at &lt;code&gt;E6F9&lt;/code&gt; (opcode &lt;code&gt;0x9A&lt;/code&gt;) subtracts the D register and the carry flag from A. This is a 16-bit comparison implemented as a high-byte subtract-with-borrow after the low-byte subtract at &lt;code&gt;E6F7&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The problem: &lt;strong&gt;opcode &lt;code&gt;0x9A&lt;/code&gt; was not in the shadow register tracking.&lt;/strong&gt; The switch statement had &lt;code&gt;SBC A,A&lt;/code&gt; (0x9F), &lt;code&gt;SBC A,B&lt;/code&gt; (0x98), and &lt;code&gt;SBC A,C&lt;/code&gt; (0x99), but not &lt;code&gt;SBC A,D&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Without tracking, the &lt;code&gt;SBC A,D&lt;/code&gt; instruction didn't update &lt;code&gt;shadowF&lt;/code&gt;. The carry flag in the shadow still reflected the preceding &lt;code&gt;SUB E&lt;/code&gt; instruction, which had set carry=0 (no borrow, since 0x80 - 0x00 = 0x80). But the real Z80 computed &lt;code&gt;SBC A,D&lt;/code&gt; with the actual register values and got carry=1 (borrow). When the &lt;code&gt;JP NC,E771&lt;/code&gt; branch came, our shadow said NC=true (carry clear, branch taken) while the Z80 said NC=false (carry set, branch not taken).&lt;/p&gt;
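To make the divergence concrete, here is the two-step compare with Z80-style borrow semantics. The 0x80 − 0x00 values for `SUB E` come from the trace above; the 0x23 and 0xE0 operands for `SBC A,D` are illustrative stand-ins, since the exact registers at the failure point aren't in the log:

```cpp
#include <cstdint>

// Z80-style 8-bit SUB: sets *carry to the borrow-out, ignores carry-in.
uint8_t sub8(uint8_t a, uint8_t b, bool *carry) {
    int r = (int)a - (int)b;
    *carry = (r < 0);
    return (uint8_t)r;
}

// Z80-style 8-bit SBC: subtracts operand AND the incoming carry (borrow).
uint8_t sbc8(uint8_t a, uint8_t b, bool *carry) {
    int r = (int)a - (int)b - (*carry ? 1 : 0);
    *carry = (r < 0);
    return (uint8_t)r;
}
```

With opcode 0x9A untracked, the shadow keeps the `SUB`'s carry=0 and takes the `JP NC` branch, while the silicon's carry=1 from the `SBC` keeps it in the load loop.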
&lt;p&gt;SoftPC jumped to the "Bad load" error handler. The real Z80 continued the load loop. From that point on, softPC and the Z80's actual program counter were desynchronized — every subsequent opcode decode was wrong, and the system was effectively running blind.&lt;/p&gt;
&lt;h4&gt;The Fix&lt;/h4&gt;
&lt;p&gt;Add the missing instructions. All of them:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// SBC A,r — subtract with carry&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x98&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shadowF&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FLAG_C&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="n"&gt;shadowF&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flagsSub8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x99&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* SBC A,C */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9A&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* SBC A,D */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9B&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* SBC A,E */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9C&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* SBC A,H */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9D&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* SBC A,L */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x9E&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* SBC A,(HL) */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// ADC A,r — add with carry (same gap)&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x8A&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* ADC A,D */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x8B&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* ADC A,E */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x8C&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* ADC A,H */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x8D&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* ADC A,L */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x8E&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* ADC A,(HL) */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After this fix, &lt;code&gt;ZORK1.COM&lt;/code&gt; loaded all 68 sectors successfully — 8,704 bytes from DMA address &lt;code&gt;0x0100&lt;/code&gt; to &lt;code&gt;0x2300&lt;/code&gt;, with every BDOS read returning success.&lt;/p&gt;
&lt;h3&gt;Zork Starts — Barely&lt;/h3&gt;
&lt;p&gt;With the load fixed, &lt;a href="https://baud.rs/UdOkDt"&gt;Zork&lt;/a&gt; launched. It read its &lt;code&gt;.DAT&lt;/code&gt; file from disk. The copyright text appeared:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;ZORK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;I:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Great&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Underground&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Empire&lt;/span&gt;
&lt;span class="n"&gt;Copyright&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1981&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1982&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1983&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Infocom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;All&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rights&lt;/span&gt;
&lt;span class="n"&gt;reserved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;ZORK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;registered&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trademark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Infocom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Revision&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;88&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Serial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;840726&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And then... nothing. Or rather, something, but at glacial speed. At approximately 18,000 Z80 clock cycles per second, the text took minutes to render. The game was technically running but practically frozen. Typing a command and waiting for a response meant staring at a blank terminal for an eternity.&lt;/p&gt;
&lt;p&gt;On a 480MHz Cortex-M7, 18,000 Z80 cycles per second means the Arduino was spending roughly &lt;strong&gt;26,000 of its own CPU cycles on every single Z80 clock tick&lt;/strong&gt;. Something was catastrophically wrong with the hot loop.&lt;/p&gt;
&lt;h3&gt;The Performance Crisis&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield-assembled-overhead.jpeg" alt="Overhead view of the full hardware stack — the Giga's blue board seated on the red level converter shield, with the RetroShield Z80 extending to the right, USB cable connected" style="float: right; width: 50%; max-width: 460px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;I added a performance counter that measured actual Z80 cycles per second. The numbers were dire: 9,000–18,000 cycles/sec depending on what the Z80 was doing. A real Z80 runs at 2.5–8 MHz. We were two to three orders of magnitude too slow.&lt;/p&gt;
&lt;p&gt;Five bottlenecks were hiding in the hot loop, each one multiplying the others.&lt;/p&gt;
&lt;h4&gt;Bottleneck 1: A Two-Microsecond Nap on Every Tick&lt;/h4&gt;
&lt;p&gt;Every clock tick included &lt;code&gt;delayMicroseconds(2)&lt;/code&gt; — a 2,000-nanosecond delay to let signals settle through the TXB0108 after toggling the clock. The TXB0108's actual propagation delay is about 4–12 nanoseconds. This was a 200x safety margin I'd added early in debugging and never removed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Replace with 24 inline NOP instructions. At 480MHz, each NOP is ~2ns, giving roughly 50ns of settle time — still 4x more than the TXB0108 needs, but 40x faster than the delay.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;__attribute__&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;always_inline&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;busSettle&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kr"&gt;__asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="s"&gt;"nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="s"&gt;"nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;nop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Bottleneck 2: Flipping 8 Pins on Every Single Tick&lt;/h4&gt;
&lt;p&gt;This was the real killer. At the end of every &lt;code&gt;cpu_tick()&lt;/code&gt; call, &lt;code&gt;setDataBusInput()&lt;/code&gt; was called to tri-state the data bus pins — switching all 8 data lines from output to input mode. Then at the start of the next memory read, &lt;code&gt;setDataBusOutput()&lt;/code&gt; switched them all back. Each direction change went through the Arduino HAL &lt;code&gt;pinMode()&lt;/code&gt; function 8 times.&lt;/p&gt;
&lt;p&gt;On the STM32H747 with the &lt;a href="https://baud.rs/arduino-mbed"&gt;mbed-based Arduino core&lt;/a&gt;, each &lt;code&gt;pinMode()&lt;/code&gt; call involves HAL abstraction layers, pin table lookups, and clock configuration checks. Eight calls took approximately 16–32 microseconds. This was happening on &lt;em&gt;every single clock tick&lt;/em&gt;, both directions — 16 &lt;code&gt;pinMode()&lt;/code&gt; calls per tick.&lt;/p&gt;
&lt;p&gt;The irony: this direction switching was completely unnecessary. Since all Z80 bus writes are suppressed (pre-writes handle memory stores in software), the data bus never needs to read anything from the Z80 during normal operation. The bus can stay in output mode permanently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Remove the per-tick &lt;code&gt;setDataBusInput()&lt;/code&gt; call entirely. For the rare cases where direction changes are still needed (certain IO operations), replace &lt;code&gt;pinMode()&lt;/code&gt; with direct GPIO MODER register writes:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#define GPIO_SET_OUTPUT(port, pin) \&lt;/span&gt;
&lt;span class="cp"&gt;    ((port)-&amp;gt;MODER = ((port)-&amp;gt;MODER &amp;amp; ~(3U &amp;lt;&amp;lt; ((pin)*2))) \&lt;/span&gt;
&lt;span class="cp"&gt;                     | (1U &amp;lt;&amp;lt; ((pin)*2)))&lt;/span&gt;
&lt;span class="cp"&gt;#define GPIO_SET_INPUT(port, pin) \&lt;/span&gt;
&lt;span class="cp"&gt;    ((port)-&amp;gt;MODER = ((port)-&amp;gt;MODER &amp;amp; ~(3U &amp;lt;&amp;lt; ((pin)*2))))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;One register write per pin instead of a full HAL function call.&lt;/p&gt;
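The MODER arithmetic can be checked off-target by pointing the macros at a stand-in register block. `FakePort` is invented purely for this host-side check; on the Giga, `port` would be `GPIOE`, `GPIOI`, and so on:

```cpp
#include <cstdint>

// Stand-in for the STM32 GPIO register block so the MODER bit arithmetic
// can be verified on a host machine (not HAL code).
struct FakePort { volatile uint32_t MODER; };

// Same macros as above: 2 mode bits per pin, 01 = output, 00 = input.
#define GPIO_SET_OUTPUT(port, pin) \
    ((port)->MODER = ((port)->MODER & ~(3U << ((pin)*2))) | (1U << ((pin)*2)))
#define GPIO_SET_INPUT(port, pin) \
    ((port)->MODER = ((port)->MODER & ~(3U << ((pin)*2))))
```

Each invocation compiles down to a load, two bit operations, and a store on the real register, with no table lookups in between.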
&lt;h4&gt;Bottleneck 3: Arduino HAL in the Hot Loop&lt;/h4&gt;
&lt;p&gt;The Arduino &lt;code&gt;digitalRead()&lt;/code&gt; and &lt;code&gt;digitalWrite()&lt;/code&gt; functions are convenient abstractions, but on the STM32H747 they carry significant overhead — pin number lookups, port mapping tables, multiple function calls per operation. The original RetroShield code for the &lt;a href="https://baud.rs/JJg3wB"&gt;Mega 2560&lt;/a&gt; used direct AVR port registers (&lt;code&gt;PORTA&lt;/code&gt;, &lt;code&gt;PORTL&lt;/code&gt;) for fast parallel I/O. On the Giga, the pins are scattered across GPIO ports B, E, G, H, I, J, and K — no single-register solution — but direct register access is still orders of magnitude faster than the HAL.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Map every Arduino pin to its STM32H747 GPIO port and pin number, then replace all hot-path I/O with direct register access.&lt;/p&gt;
&lt;p&gt;The clock signal (toggled every tick) went from ~200ns per call through HAL to ~4ns via the BSRR (Bit Set/Reset Register):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="cp"&gt;#define CLK_HIGH  digitalWrite(uP_CLK, HIGH)&lt;/span&gt;

&lt;span class="c1"&gt;// After — single atomic register write&lt;/span&gt;
&lt;span class="cp"&gt;#define CLK_HIGH  (GPIOK-&amp;gt;BSRR = (1U &amp;lt;&amp;lt; 2))    &lt;/span&gt;&lt;span class="c1"&gt;// PK2&lt;/span&gt;
&lt;span class="cp"&gt;#define CLK_LOW   (GPIOK-&amp;gt;BSRR = (1U &amp;lt;&amp;lt; 18))   &lt;/span&gt;&lt;span class="c1"&gt;// PK2 reset&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The BSRR register is an elegant STM32 feature: bits [15:0] set outputs high, bits [31:16] set outputs low, and the entire operation is atomic — no read-modify-write cycle needed.&lt;/p&gt;
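Because the set and reset halves live in one word, several pins of a port can be driven high and low in a single store. Here is a sketch of composing that word for three pins from a 3-bit value; the pin numbers are parameters, not the shield's actual mapping:

```cpp
#include <cstdint>

// Build one BSRR word driving three pins of a port from a 3-bit value:
// bits [15:0] set pins high, bits [31:16] reset them low — one atomic store.
uint32_t bsrrFor3Pins(uint8_t val, unsigned p0, unsigned p1, unsigned p2) {
    uint32_t w = 0;
    w |= (val & 0x1) ? (1U << p0) : (1U << (p0 + 16));
    w |= (val & 0x2) ? (1U << p1) : (1U << (p1 + 16));
    w |= (val & 0x4) ? (1U << p2) : (1U << (p2 + 16));
    return w;   // on target: GPIOI->BSRR = w;
}
```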
&lt;p&gt;For the address bus (16 pins read every memory cycle), three GPIO IDR (Input Data Register) reads replace sixteen individual &lt;code&gt;digitalRead()&lt;/code&gt; calls:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;readAddress&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jIDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPIOJ&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;IDR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;kIDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPIOK&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;IDR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gIDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPIOG&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;IDR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jIDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// A0 = PJ12&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gIDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// A1 = PG13&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... 14 more bit extractions&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the data bus (8 pins written every memory read cycle), pins sharing the same GPIO port are combined into a single BSRR write. Port I has three data bus pins, so they get folded into one register operation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kr"&gt;inline&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;writeDataBus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;byte&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;GPIOE&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;BSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;GPIOK&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;BSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x02&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;GPIOB&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;BSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x04&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;GPIOH&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;BSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x08&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// Port I: combine 3 data bus pins into one write&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iBSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;iBSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// PI13&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;iBSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// PI10&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;iBSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// PI15&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;GPIOI&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;BSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iBSRR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;GPIOG&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;BSRR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
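&lt;p&gt;The single-write trick leans on the split layout of BSRR: bits 0–15 set pins, bits 16–31 reset them, and the hardware applies the whole word atomically, so one store can drive several pins high and low at once. A minimal sketch of the encoding (the helper names are mine, and the pin numbers are illustrative, not the shield's full mapping):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// Build one BSRR word for a single pin: a bit in the low half sets
// the pin, the same bit shifted into the high half resets it.
constexpr uint32_t bsrrBit(unsigned pin, bool high) {
    return high ? (1U << pin) : (1U << (pin + 16));
}

// Merge several pin changes into one atomic register write,
// the way writeDataBus() folds Port I's three data pins together.
constexpr uint32_t bsrrMerge(uint32_t a, uint32_t b, uint32_t c) {
    return a | b | c;
}
```

&lt;p&gt;Because set and reset bits occupy disjoint halves of the word, ORing the per-pin words together never clobbers another pin's state — which is exactly why three read-modify-write HAL calls can collapse into one store.&lt;/p&gt;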

&lt;h4&gt;Bottleneck 4: USB Serial Polling Every Tick&lt;/h4&gt;
&lt;p&gt;The MC6850 ACIA emulation checked &lt;code&gt;Serial.available()&lt;/code&gt; on every clock tick to detect incoming keystrokes. On the Giga, USB CDC serial operations are expensive — each call may involve USB stack processing. At 700K ticks/sec, checking every tick means 700,000 USB stack queries per second.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Check every 256 ticks. That's still a 2,700 Hz polling rate — more than fast enough for interactive typing, and it eliminates 99.6% of the USB overhead.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;clock_cycle_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;CONTROL_RTS_STATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Serial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;reg6850_STATUS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mb"&gt;0b00000001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// RDRF set&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONTROL_RX_INT_ENABLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INT_N_LOW&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
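&lt;p&gt;The guard works because 256 is a power of two: masking the counter with 0xFF and comparing against zero is the divisionless equivalent of testing modulo 256, so the poll fires exactly once per 256 ticks. A sketch of the trick and the resulting poll rate (helper names are mine):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// Divisionless "every 256th tick" test, as used in the ACIA poll guard.
constexpr bool pollThisTick(uint32_t n) {
    return (n & 0xFFU) == 0;
}

// Poll rate that falls out of a given tick throughput.
constexpr uint32_t pollsPerSecond(uint32_t ticksPerSecond) {
    return ticksPerSecond / 256;
}
```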

&lt;h4&gt;Bottleneck 5: I-Cache Thrashing from Forced Inlining&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;cpu_tick()&lt;/code&gt; function is around 1,200 lines of code, dominated by the shadow register tracking &lt;code&gt;switch&lt;/code&gt; statement with hundreds of cases. It was marked &lt;code&gt;inline __attribute__((always_inline))&lt;/code&gt;, which forces the compiler to inline the entire function body into &lt;code&gt;loop()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The STM32H747's instruction cache is 16KB. Inlining a 1,200-line function creates a binary blob that doesn't fit, causing constant cache misses. Every iteration of the main loop refills the I-cache from flash.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Change to &lt;code&gt;__attribute__((noinline))&lt;/code&gt;. The function-call overhead (a few nanoseconds for the branch and return) is negligible compared to the cost of the cache thrashing. This change also shrank the compiled binary by ~8 KB, from 284 KB to 276 KB.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;__attribute__&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;noinline&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu_tick&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... 1,200 lines of bus interface and shadow tracking&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;The Result&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Z80 cycles/sec&lt;/td&gt;
&lt;td&gt;~9,000&lt;/td&gt;
&lt;td&gt;~690,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effective Z80 clock&lt;/td&gt;
&lt;td&gt;~0.009 MHz&lt;/td&gt;
&lt;td&gt;~0.69 MHz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;284 KB&lt;/td&gt;
&lt;td&gt;276 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time per Z80 tick&lt;/td&gt;
&lt;td&gt;~111 µs&lt;/td&gt;
&lt;td&gt;~1.4 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A &lt;strong&gt;75x speedup&lt;/strong&gt;. The system went from roughly 50,000 Cortex-M7 cycles per Z80 tick down to about 700. Enough for Zork to be fully interactive.&lt;/p&gt;
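&lt;p&gt;The cycle figures fall out of simple division, assuming the Giga's Cortex-M7 runs at its stock 480 MHz (the clock isn't restated above, so treat the constant as an assumption):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// Assumed Cortex-M7 core clock of the Arduino Giga R1.
constexpr uint32_t kM7Hz = 480000000U;

// M7 cycles spent per emulated Z80 tick at a given tick throughput.
constexpr uint32_t m7CyclesPerTick(uint32_t z80TicksPerSec) {
    return kM7Hz / z80TicksPerSec;
}
```

&lt;p&gt;At ~9,000 ticks/sec that works out to ~53,000 M7 cycles per tick; at ~690,000 ticks/sec it drops to ~695 — matching the "roughly 50,000 down to about 700" figures above.&lt;/p&gt;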
&lt;h3&gt;Network Reconnection&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-shield-detail-usb.jpeg" alt="Close-up of the USB connection end of the Arduino Giga R1 mounted on the level converter shield, showing the jumper wire connecting 3.3V power between boards" style="float: left; width: 40%; max-width: 380px; margin: 0 1.5em 1em 0; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;With the performance problem solved, a new issue appeared: the TCP connection to the sector server dropped during long idle periods. Zork is a text adventure — the player types a command, the game responds, and then nothing happens until the next command. During that idle time (which could be minutes while you think about whether to go north or east), the WiFi TCP socket would quietly die. The next disk operation would fail with "Bad Sector."&lt;/p&gt;
&lt;p&gt;The fix was automatic reconnection logic. Before each disk operation, &lt;code&gt;ensureServerConnection()&lt;/code&gt; checks if the TCP socket is still alive. If not, it reconnects to the sector server, re-opens the disk image file that was previously open, and re-seeks to the last position — all transparently, so the Z80 never knows the connection dropped.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;ensureServerConnection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connected&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Serial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[NET] Connection lost, reconnecting..."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SERVER_IP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SERVER_PORT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;Serial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[NET] Reconnected to server"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diskFileOpen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;diskActiveFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;netSendFileCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DISK_CMD_OPEN_RW&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;diskActiveFile&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;netReadStatus&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;diskFileOpen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;// Re-seek to last known position&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seekCmd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;seekCmd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;DISK_CMD_SEEK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;seekCmd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;diskSeekPos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;seekCmd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diskSeekPos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;seekCmd&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diskSeekPos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xFF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seekCmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;netReadStatus&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
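&lt;p&gt;The re-seek command packs the saved offset as three little-endian bytes after the opcode, giving a 24-bit range (16 MB) — comfortably larger than any CP/M disk image. A sketch of the packing and its inverse (the helper names are mine, not the firmware's):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// Pack a 24-bit file offset little-endian after a one-byte opcode,
// mirroring the seekCmd[] layout in ensureServerConnection().
void packSeek(uint8_t cmd[4], uint8_t opcode, uint32_t pos) {
    cmd[0] = opcode;
    cmd[1] = pos & 0xFF;          // low byte first
    cmd[2] = (pos >> 8) & 0xFF;
    cmd[3] = (pos >> 16) & 0xFF;  // high byte last
}

// Inverse: recover the offset from the wire bytes.
uint32_t unpackSeek(const uint8_t cmd[4]) {
    return cmd[1] | (uint32_t(cmd[2]) << 8) | (uint32_t(cmd[3]) << 16);
}
```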

&lt;video controls style="float: right; width: 50%; max-width: 460px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;
&lt;source src="https://tinycomputers.io/zork-on-retroshield-z80-giga.mp4" type="video/mp4"&gt;
&lt;source src="https://tinycomputers.io/zork-on-retroshield-z80-giga.mov" type="video/quicktime"&gt;
Your browser does not support the video tag.
&lt;/source&gt;&lt;/source&gt;&lt;/video&gt;

&lt;h3&gt;Playing Zork&lt;/h3&gt;
&lt;p&gt;With all the pieces in place — shadow registers covering every instruction CP/M and Zork use, GPIO registers replacing Arduino HAL calls, network reconnection handling idle timeouts — it was time to play.&lt;/p&gt;
&lt;p&gt;Here's a complete boot-to-gameplay session, captured from the serial terminal:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;192.168&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;192.168&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mf"&gt;0.248&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;9000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Boot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Starting&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;RetroShield&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Boot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Loader&lt;/span&gt;
&lt;span class="n"&gt;Copyright&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Alex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Jokela&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tinycomputers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;

&lt;span class="n"&gt;Loading&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CPM&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SYS&lt;/span&gt;&lt;span class="o"&gt;.....................................................&lt;/span&gt;
&lt;span class="n"&gt;Boot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;RetroShield&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CP&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.2&lt;/span&gt;
&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TPA&lt;/span&gt;

&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;zork1&lt;/span&gt;
&lt;span class="n"&gt;ZORK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Great&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Underground&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Empire&lt;/span&gt;
&lt;span class="n"&gt;Copyright&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1981&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1982&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1983&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Infocom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;All&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rights&lt;/span&gt;
&lt;span class="n"&gt;reserved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;ZORK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;registered&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trademark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Infocom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Revision&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Serial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;840726&lt;/span&gt;

&lt;span class="n"&gt;West&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;House&lt;/span&gt;
&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;standing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;an&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;west&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;house&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;with&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boarded&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;front&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;door&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;There&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;small&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mailbox&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;here&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mailbox&lt;/span&gt;
&lt;span class="n"&gt;Opening&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;small&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mailbox&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reveals&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;leaflet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;take&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;leaflet&lt;/span&gt;
&lt;span class="n"&gt;Taken&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;south&lt;/span&gt;
&lt;span class="n"&gt;South&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;House&lt;/span&gt;
&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;facing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;south&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;side&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;house&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;There&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;
&lt;span class="n"&gt;door&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;here&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;windows&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boarded&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;east&lt;/span&gt;
&lt;span class="n"&gt;Behind&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;House&lt;/span&gt;
&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;behind&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;house&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;leads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;forest&lt;/span&gt;
&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;east&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;one&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;corner&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;house&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;there&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;small&lt;/span&gt;
&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;which&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;slightly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ajar&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;
&lt;span class="n"&gt;With&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;great&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;effort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;far&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;enough&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;allow&lt;/span&gt;
&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;enter&lt;/span&gt;
&lt;span class="n"&gt;Kitchen&lt;/span&gt;
&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;kitchen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;house&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seems&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;
&lt;span class="n"&gt;have&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;been&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;used&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;recently&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;preparation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;food&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;passage&lt;/span&gt;
&lt;span class="n"&gt;leads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;west&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;staircase&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;be&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;leading&lt;/span&gt;
&lt;span class="n"&gt;upward&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;chimney&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;leads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;down&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;east&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;small&lt;/span&gt;
&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;which&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;On&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;an&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;elongated&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;brown&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;smelling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;hot&lt;/span&gt;
&lt;span class="n"&gt;peppers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bottle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sitting&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;glass&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bottle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;water&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every command produces the correct response, at interactive speed. The text appears as fast as you'd expect from a terminal session — no perceptible delay between pressing Enter and seeing the game's response.&lt;/p&gt;
&lt;h3&gt;How It All Fits Together&lt;/h3&gt;
&lt;p&gt;Here's what happens when you type &lt;code&gt;open mailbox&lt;/code&gt; at the Zork prompt, end to end:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Each keystroke arrives over USB serial. Every 256 Z80 clock ticks, the Arduino checks &lt;code&gt;Serial.available()&lt;/code&gt;, finds a character, and sets the MC6850 ACIA status register's RDRF bit.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Z80 is spinning in the BIOS console input loop, repeatedly executing &lt;code&gt;IN A,(0x80)&lt;/code&gt; to check the ACIA status register. Our shadow register system detects each &lt;code&gt;IN&lt;/code&gt; instruction at M1 time, calls &lt;code&gt;handle_io_read(0x80)&lt;/code&gt;, and drives the status byte onto the data bus during the IO cycle.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When RDRF is set, the Z80 executes &lt;code&gt;IN A,(0x81)&lt;/code&gt; to read the character. We return the byte from &lt;code&gt;Serial.read()&lt;/code&gt;, and &lt;code&gt;shadowA&lt;/code&gt; gets updated to match.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The BIOS echoes the character by executing &lt;code&gt;OUT (0x81),A&lt;/code&gt;. We detect this at M1 time, use &lt;code&gt;shadowA&lt;/code&gt; for the data value, and call &lt;code&gt;Serial.write()&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the user presses Enter, the CCP passes the command to Zork. Zork parses it and starts executing game logic — hundreds of thousands of Z80 instructions manipulating its internal data structures.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When Zork needs to read from its &lt;code&gt;.DAT&lt;/code&gt; file, the BIOS executes a sequence of &lt;code&gt;OUT&lt;/code&gt; instructions to set up the disk operation: filename characters to port &lt;code&gt;0x13&lt;/code&gt;, seek position to ports &lt;code&gt;0x14&lt;/code&gt;/&lt;code&gt;0x15&lt;/code&gt;/&lt;code&gt;0x19&lt;/code&gt;, DMA address to ports &lt;code&gt;0x16&lt;/code&gt;/&lt;code&gt;0x17&lt;/code&gt;, and a block read command to port &lt;code&gt;0x18&lt;/code&gt;. Each &lt;code&gt;OUT&lt;/code&gt; is intercepted by the shadow register system and forwarded to the sector server over WiFi.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The sector server reads 128 bytes from the disk image file, sends them back over TCP. The Arduino writes them directly into &lt;code&gt;z80RAM&lt;/code&gt; at the DMA address.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Zork processes the data, generates response text, and prints it character by character through the ACIA — each character going through the same &lt;code&gt;OUT (0x81),A&lt;/code&gt; → &lt;code&gt;shadowA&lt;/code&gt; → &lt;code&gt;Serial.write()&lt;/code&gt; path.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
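&lt;p&gt;Steps 1–4 can be condensed into a small sketch of the ACIA port dispatch. This is not the shipped firmware — the receive buffer is reduced to a single pending byte and the Serial plumbing is stubbed out — but the port numbers (0x80 status, 0x81 data) and the MC6850 RDRF/TDRE bit positions match the description above:&lt;/p&gt;

```cpp
// Simplified sketch of the ACIA dispatch from steps 1-4 above; one pending
// byte stands in for the Serial-backed receive buffer of the real firmware.
static unsigned char rxByte = 0;
static bool rxReady = false;          // set by the throttled Serial poll

// Handles IN A,(port), detected at M1 time by the shadow register system.
static unsigned char handle_io_read(unsigned char port) {
    if (port == 0x80) {               // ACIA status register
        unsigned char status = 0x02;  // TDRE set: always ready to transmit
        if (rxReady) status |= 0x01;  // RDRF set: a received byte is waiting
        return status;
    }
    if (port == 0x81) {               // ACIA data register
        rxReady = false;              // reading the data register clears RDRF
        return rxByte;
    }
    return 0xFF;                      // unmapped port
}
```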
&lt;p&gt;Every single one of these operations relies on the shadow register system. The Z80 has no idea the Arduino can't see half its bus signals. It thinks it's talking to normal memory and I/O ports. The Arduino, meanwhile, is running a parallel simulation of the Z80's register state, intercepting every instruction, and making the illusion seamless.&lt;/p&gt;
&lt;h3&gt;The Full Architecture&lt;/h3&gt;
&lt;p&gt;For reference, here's the complete technical stack:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Arduino Giga R1 WiFi (STM32H747, 480MHz Cortex-M7, 1MB SRAM, WiFi)&lt;/li&gt;
&lt;li&gt;RetroShield Z80 (real Zilog Z80 CPU, 5V logic)&lt;/li&gt;
&lt;li&gt;Custom level converter PCB (nine TXB0108PW, &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;design details here&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Z80 Memory:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;64KB byte array in the Giga's internal SRAM (&lt;code&gt;uint8_t z80RAM[65536]&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Bus Interface (direct STM32 GPIO registers):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clock: GPIOK pin 2 (BSRR for set/clear)&lt;/li&gt;
&lt;li&gt;Address: 3 IDR reads (GPIOJ, GPIOK, GPIOG) → 16-bit extraction&lt;/li&gt;
&lt;li&gt;Data: 6 BSRR writes (GPIOE, GPIOK, GPIOB, GPIOH, GPIOI combined, GPIOG)&lt;/li&gt;
&lt;li&gt;Control: MREQ via GPIOK pin 7, WR via GPIOE pin 6&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Software Architecture:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Guard-only M1 detection with &lt;code&gt;tStates[256]&lt;/code&gt; timing table&lt;/li&gt;
&lt;li&gt;Software PC tracking (&lt;code&gt;softPC&lt;/code&gt;) for all branch types, including conditional&lt;/li&gt;
&lt;li&gt;Shadow registers (A, B, C, D, E, H, L, F, SP) with full ALU flag computation&lt;/li&gt;
&lt;li&gt;Pre-writes for all memory store instructions&lt;/li&gt;
&lt;li&gt;Deferred writes for read-modify-write instructions (INC/DEC (HL), CB prefix on (HL))&lt;/li&gt;
&lt;li&gt;IO handling at M1 time using shadow register values&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Disk I/O:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rust TCP sector server on the local network (192.168.0.248:9000)&lt;/li&gt;
&lt;li&gt;128-byte CP/M sector transfers over WiFi&lt;/li&gt;
&lt;li&gt;Automatic reconnection with file re-open and seek restore&lt;/li&gt;
&lt;/ul&gt;
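&lt;p&gt;The disk-port protocol from the walkthrough can be sketched as a small &lt;code&gt;OUT&lt;/code&gt;-port state machine. Hedged: only the port assignments come from this post — the little-endian byte order and field widths are assumptions for illustration:&lt;/p&gt;

```cpp
// Hedged sketch of the BIOS disk-port protocol: OUT writes to ports
// 0x13-0x17/0x19 stage a request, and a write to the command port 0x18
// fires it. Byte order and field widths are illustrative assumptions.
struct DiskRequest {
    char name[12];
    unsigned nameLen;
    unsigned long seekPos;   // 24-bit seek from ports 0x14 (lo), 0x15 (mid), 0x19 (hi)
    unsigned dmaAddr;        // 16-bit DMA address from ports 0x16 (lo), 0x17 (hi)
    bool fired;
};

static DiskRequest req = {};

static void handle_io_write(unsigned char port, unsigned char value) {
    switch (port) {
    case 0x13:                       // next filename character
        if (req.nameLen < sizeof req.name - 1) req.name[req.nameLen++] = (char)value;
        break;
    case 0x14: req.seekPos = (req.seekPos & 0xFFFF00ul) | value; break;
    case 0x15: req.seekPos = (req.seekPos & 0xFF00FFul) | ((unsigned long)value << 8); break;
    case 0x19: req.seekPos = (req.seekPos & 0x00FFFFul) | ((unsigned long)value << 16); break;
    case 0x16: req.dmaAddr = (req.dmaAddr & 0xFF00u) | value; break;
    case 0x17: req.dmaAddr = (req.dmaAddr & 0x00FFu) | ((unsigned)value << 8); break;
    case 0x18: req.fired = true; break;  // block read: forward request to the sector server
    }
}
```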
&lt;p&gt;&lt;strong&gt;Console I/O:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Emulated MC6850 ACIA on ports 0x80/0x81&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/xwHWlp"&gt;USB&lt;/a&gt; CDC serial at 115200 baud&lt;/li&gt;
&lt;li&gt;Interrupt-driven receive with throttled polling (every 256 ticks)&lt;/li&gt;
&lt;/ul&gt;
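&lt;p&gt;The throttling itself is simple but worth seeing, since it keeps the &lt;code&gt;Serial&lt;/code&gt; check off the per-tick hot path. A minimal sketch with illustrative names, not the actual firmware:&lt;/p&gt;

```cpp
// Sketch of the throttled receive poll: the Serial buffer is checked only
// once every 256 Z80 clock ticks, keeping polling cost off the hot path.
static unsigned long tickCount = 0;

// Returns true on every 256th tick; the caller would then check
// Serial.available() and latch the ACIA RDRF bit if a byte arrived.
static bool timeToPollSerial() {
    tickCount++;
    return (tickCount & 0xFF) == 0;
}
```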
&lt;h3&gt;Pin Mapping Reference&lt;/h3&gt;
&lt;p&gt;For anyone attempting a similar project, here's the complete mapping from Arduino digital pins to STM32H747 GPIO ports. This is essential for the direct register access that makes the performance optimization possible:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Arduino Pin&lt;/th&gt;
&lt;th&gt;STM32 Port/Pin&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CLK&lt;/td&gt;
&lt;td&gt;D52&lt;/td&gt;
&lt;td&gt;PK2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MREQ_N&lt;/td&gt;
&lt;td&gt;D41&lt;/td&gt;
&lt;td&gt;PK7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WR_N&lt;/td&gt;
&lt;td&gt;D40&lt;/td&gt;
&lt;td&gt;PE6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IORQ_N&lt;/td&gt;
&lt;td&gt;D39&lt;/td&gt;
&lt;td&gt;PI14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INT_N&lt;/td&gt;
&lt;td&gt;D50&lt;/td&gt;
&lt;td&gt;PI11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RESET_N&lt;/td&gt;
&lt;td&gt;D38&lt;/td&gt;
&lt;td&gt;PJ7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 0&lt;/td&gt;
&lt;td&gt;D49&lt;/td&gt;
&lt;td&gt;PE4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 1&lt;/td&gt;
&lt;td&gt;D48&lt;/td&gt;
&lt;td&gt;PK0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 2&lt;/td&gt;
&lt;td&gt;D47&lt;/td&gt;
&lt;td&gt;PB2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 3&lt;/td&gt;
&lt;td&gt;D46&lt;/td&gt;
&lt;td&gt;PH15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 4&lt;/td&gt;
&lt;td&gt;D45&lt;/td&gt;
&lt;td&gt;PI13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 5&lt;/td&gt;
&lt;td&gt;D44&lt;/td&gt;
&lt;td&gt;PG10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 6&lt;/td&gt;
&lt;td&gt;D43&lt;/td&gt;
&lt;td&gt;PI10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bit 7&lt;/td&gt;
&lt;td&gt;D42&lt;/td&gt;
&lt;td&gt;PI15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A0&lt;/td&gt;
&lt;td&gt;D22&lt;/td&gt;
&lt;td&gt;PJ12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A1&lt;/td&gt;
&lt;td&gt;D23&lt;/td&gt;
&lt;td&gt;PG13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A2&lt;/td&gt;
&lt;td&gt;D24&lt;/td&gt;
&lt;td&gt;PG12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A3&lt;/td&gt;
&lt;td&gt;D25&lt;/td&gt;
&lt;td&gt;PJ0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A4&lt;/td&gt;
&lt;td&gt;D26&lt;/td&gt;
&lt;td&gt;PJ14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A5&lt;/td&gt;
&lt;td&gt;D27&lt;/td&gt;
&lt;td&gt;PJ1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A6&lt;/td&gt;
&lt;td&gt;D28&lt;/td&gt;
&lt;td&gt;PJ15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A7&lt;/td&gt;
&lt;td&gt;D29&lt;/td&gt;
&lt;td&gt;PJ2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A8&lt;/td&gt;
&lt;td&gt;D37&lt;/td&gt;
&lt;td&gt;PJ6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A9&lt;/td&gt;
&lt;td&gt;D36&lt;/td&gt;
&lt;td&gt;PK6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A10&lt;/td&gt;
&lt;td&gt;D35&lt;/td&gt;
&lt;td&gt;PJ5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A11&lt;/td&gt;
&lt;td&gt;D34&lt;/td&gt;
&lt;td&gt;PK5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A12&lt;/td&gt;
&lt;td&gt;D33&lt;/td&gt;
&lt;td&gt;PJ4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A13&lt;/td&gt;
&lt;td&gt;D32&lt;/td&gt;
&lt;td&gt;PK4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A14&lt;/td&gt;
&lt;td&gt;D31&lt;/td&gt;
&lt;td&gt;PJ3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Addr A15&lt;/td&gt;
&lt;td&gt;D30&lt;/td&gt;
&lt;td&gt;PK3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The address bus pins are spread across three GPIO ports (J, G, K), so a 16-bit address read requires three IDR register reads and individual bit extraction. Not ideal, but still orders of magnitude faster than sixteen &lt;code&gt;digitalRead()&lt;/code&gt; calls.&lt;/p&gt;
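&lt;p&gt;Concretely, the extraction can be sketched as a pure function over three raw &lt;code&gt;IDR&lt;/code&gt; snapshots (GPIOJ, GPIOG, GPIOK), with the bit positions taken from the table above — a simulation-friendly sketch rather than the actual firmware:&lt;/p&gt;

```cpp
// Assemble a 16-bit Z80 address from three raw IDR snapshots, following the
// pin-mapping table above. The arguments stand in for reads of the
// STM32H747 GPIOJ/GPIOG/GPIOK input data registers.
static unsigned assembleAddress(unsigned long idrJ, unsigned long idrG, unsigned long idrK) {
    unsigned long a = 0;
    a |= ((idrJ >> 12) & 1) << 0;   // A0  = PJ12
    a |= ((idrG >> 13) & 1) << 1;   // A1  = PG13
    a |= ((idrG >> 12) & 1) << 2;   // A2  = PG12
    a |= ((idrJ >> 0)  & 1) << 3;   // A3  = PJ0
    a |= ((idrJ >> 14) & 1) << 4;   // A4  = PJ14
    a |= ((idrJ >> 1)  & 1) << 5;   // A5  = PJ1
    a |= ((idrJ >> 15) & 1) << 6;   // A6  = PJ15
    a |= ((idrJ >> 2)  & 1) << 7;   // A7  = PJ2
    a |= ((idrJ >> 6)  & 1) << 8;   // A8  = PJ6
    a |= ((idrK >> 6)  & 1) << 9;   // A9  = PK6
    a |= ((idrJ >> 5)  & 1) << 10;  // A10 = PJ5
    a |= ((idrK >> 5)  & 1) << 11;  // A11 = PK5
    a |= ((idrJ >> 4)  & 1) << 12;  // A12 = PJ4
    a |= ((idrK >> 4)  & 1) << 13;  // A13 = PK4
    a |= ((idrJ >> 3)  & 1) << 14;  // A14 = PJ3
    a |= ((idrK >> 3)  & 1) << 15;  // A15 = PK3
    return (unsigned)a;
}
```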
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The immediate win would be using the Giga's 8MB SDRAM as a disk cache. Download entire disk images over WiFi at boot, then serve all disk I/O from memory. No network latency, no TCP overhead, no reconnection logic needed. CP/M running at SRAM speed on a RAM disk — faster than any physical media the Z80 ever had access to.&lt;/p&gt;
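&lt;p&gt;Under that scheme a sector read collapses to a bounds-checked memory copy — roughly the following (hypothetical sketch; &lt;code&gt;diskImage&lt;/code&gt;, its size, and the function names are assumptions, not existing firmware):&lt;/p&gt;

```cpp
// Sketch of the SDRAM disk-cache idea: with the whole disk image resident
// in memory, serving a 128-byte CP/M sector is just a bounds-checked copy.
static unsigned char z80RAM[65536];
static unsigned char diskImage[256 * 1024];  // stand-in for the SDRAM-resident image

// Copy one 128-byte sector from the in-memory image into Z80 RAM at dmaAddr.
static bool readSector(unsigned long seekPos, unsigned dmaAddr) {
    if (seekPos + 128 > sizeof diskImage) return false;  // past end of image
    if (dmaAddr + 128u > sizeof z80RAM) return false;    // DMA would overflow RAM
    for (unsigned i = 0; i < 128; i++)
        z80RAM[dmaAddr + i] = diskImage[seekPos + i];
    return true;
}
```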
&lt;p&gt;There's also the question of the TXB0108 itself. The level converter PCB works, but three of its nine ICs are essentially decorative — the signals they're supposed to translate (&lt;code&gt;IORQ_N&lt;/code&gt;, &lt;code&gt;RD_N&lt;/code&gt;, and data bus Z80→Arduino during IO) are broken, and the software works around them. A v0.2 of the board replaces five of the nine TXB0108s with purpose-matched ICs that don't rely on auto-direction sensing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;74LVC541&lt;/strong&gt; (U1–U3): Unidirectional buffers for the address bus and control inputs (&lt;code&gt;MREQ_N&lt;/code&gt;, &lt;code&gt;IORQ_N&lt;/code&gt;, &lt;code&gt;RD_N&lt;/code&gt;, &lt;code&gt;WR_N&lt;/code&gt;). VCC at 3.3V with 5V-tolerant inputs — they simply translate 5V→3.3V with no direction ambiguity. This eliminates the stuck-HIGH failures on &lt;code&gt;IORQ_N&lt;/code&gt; and &lt;code&gt;RD_N&lt;/code&gt; entirely.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;74AHCT541&lt;/strong&gt; (U4): Unidirectional buffer for control outputs (&lt;code&gt;CLK&lt;/code&gt;, &lt;code&gt;RESET_N&lt;/code&gt;, &lt;code&gt;INT_N&lt;/code&gt;, &lt;code&gt;NMI_N&lt;/code&gt;). VCC at 5V with TTL-compatible inputs that accept 3.3V drive levels — clean 3.3V→5V translation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SN74LVC4245A&lt;/strong&gt; (U5): Bidirectional transceiver for the data bus, with an explicit DIR pin controlled by a Giga GPIO. No more auto-sensing guesswork — the firmware tells the chip which side is driving, so Z80→Arduino data is visible during IO writes for the first time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TXB0108&lt;/strong&gt; (U6–U9): Retained for the remaining 40 channels of pass-through GPIO, where auto-direction sensing works fine.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The firmware payoff is substantial: the entire shadow register architecture — roughly 1,300 lines of opcode tracking, softPC maintenance, pre-writes, and deferred write logic — could be replaced by a single &lt;code&gt;digitalWrite()&lt;/code&gt; to flip the data bus direction pin. That's a lot of complexity removed for one additional GPIO wire.&lt;/p&gt;
&lt;p&gt;But there's something satisfying about the current approach. The shadow register system transforms a passive bus controller into something that understands the Z80's instruction stream at a semantic level. The Arduino doesn't just shuttle bytes — it knows what the Z80 is thinking. And if the goal is to play &lt;a href="https://baud.rs/UdOkDt"&gt;Zork&lt;/a&gt; in the Great Underground Empire on real 1980s hardware controlled by a modern microcontroller over WiFi, well, we're there.&lt;/p&gt;
&lt;p&gt;A note on tooling: this project would have taken considerably longer without &lt;a href="https://baud.rs/claude-code"&gt;Claude Code&lt;/a&gt;. The debugging cycle for a project like this — where you're staring at Z80 opcode tables, cross-referencing flag behavior across hundreds of instructions, and hunting for one wrong carry bit in a 2,600-line Arduino sketch — is brutal. Claude Code served as a tireless pair programmer throughout the process, helping trace through instruction semantics, spotting missing opcodes in the shadow register implementation, working through the GPIO register mappings for the STM32H747, and iterating on performance optimizations. The feedback loop that would normally stretch across days of manual datasheet cross-referencing compressed into hours.&lt;/p&gt;
&lt;div style="clear: both;"&gt;&lt;/div&gt;

&lt;div class="sponsor-widget"&gt;
&lt;div class="sponsor-widget-header"&gt;&lt;a href="https://baud.rs/youwpy"&gt;&lt;img src="https://tinycomputers.io/images/pcbway-logo.png" alt="PCBWay" style="height: 22px; vertical-align: middle; margin-right: 8px;"&gt;&lt;/a&gt; Sponsored Hardware&lt;/div&gt;
&lt;p&gt;This project was made possible by &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;, who sponsored the manufacturing of the custom level converter shield. PCBWay offers PCB prototyping, assembly, CNC machining, and 3D printing services — from one-off prototypes to production runs. Their support covered the fabrication costs for this board, letting me focus on the engineering instead of the budget. If you have a PCB design ready to go, check them out at &lt;a href="https://baud.rs/youwpy"&gt;pcbway.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;h3&gt;Source Code&lt;/h3&gt;
&lt;p&gt;All source code, firmware, and hardware design files for this project are open source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/GvjdJK"&gt;retroshield-z80-cpm-giga&lt;/a&gt;&lt;/strong&gt; — Arduino Giga R1 firmware, CP/M system files, and disk image (BSD 3-Clause)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/60cj4a"&gt;retroshield-sector-server&lt;/a&gt;&lt;/strong&gt; — Rust TCP sector server for WiFi-based disk I/O (BSD 3-Clause)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/9s81Mz"&gt;retroshield-level-shifter-pcb&lt;/a&gt;&lt;/strong&gt; — KiCad design files, Gerber files, BOM, and schematic for the level converter shield (CC BY-SA 4.0)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is the third post in the Arduino Giga R1 + RetroShield Z80 series:&lt;/em&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;My Experience Using Fiverr for Custom PCB Design: A $468 Arduino Giga Shield&lt;/a&gt; — designing the level converter&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html"&gt;Porting CP/M to the Arduino Giga R1: When Level Converters Fight Back&lt;/a&gt; — the hardware stack, TXB0108 failures, shadow registers, and sector server&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Playing Zork on a Real Z80 (this post) — getting CP/M to boot, the "Bad load" bug, 75x performance optimization, and interactive Zork gameplay&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;</description><category>arduino</category><category>arduino giga</category><category>cp/m</category><category>gpio</category><category>hardware</category><category>infocom</category><category>level shifter</category><category>performance</category><category>retro computing</category><category>retroshield</category><category>rust</category><category>sector server</category><category>stm32</category><category>wifi</category><category>z80</category><category>zork</category><guid>https://tinycomputers.io/posts/zork-on-retroshield-z80-arduino-giga.html</guid><pubDate>Sun, 15 Feb 2026 12:00:00 GMT</pubDate></item><item><title>Mutability as a First-Class Concept: The Lattice Phase System</title><link>https://tinycomputers.io/posts/mutability-as-a-first-class-concept-the-lattice-phase-system.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/mutability-as-a-first-class-concept-the-lattice-phase-system_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;11 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h2&gt;Mutability as a First-Class Concept: The Lattice Phase System&lt;/h2&gt;
&lt;p&gt;Most programming languages treat mutability as a binary annotation. You write &lt;code&gt;const&lt;/code&gt; or &lt;code&gt;let&lt;/code&gt;, &lt;code&gt;final&lt;/code&gt; or &lt;code&gt;var&lt;/code&gt;, and the compiler enforces it statically. Rust goes further with its borrow checker, enforcing exclusive mutable access at compile time. JavaScript offers &lt;code&gt;Object.freeze()&lt;/code&gt;, a runtime operation that's shallow by default and provides no mechanism for observation or validation. These are all useful tools, but they share a common limitation: mutability is something you &lt;em&gt;declare&lt;/em&gt;, not something you &lt;em&gt;work with&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In &lt;a href="https://baud.rs/bwvnYT"&gt;Lattice&lt;/a&gt;, I've been building something different. Mutability — what Lattice calls &lt;em&gt;phase&lt;/em&gt; — is a first-class runtime property that can be queried, constrained, validated, coordinated across variables, observed reactively, and even tracked historically. Over the last several releases (v0.2.3 through v0.2.6), this system has grown from simple freeze/thaw semantics into a full lifecycle framework. This post walks through that progression and the design decisions behind it.&lt;/p&gt;
&lt;h3&gt;The Metaphor: Crystallization&lt;/h3&gt;
&lt;p&gt;Lattice is built around the metaphor of crystallization. Values begin in a &lt;strong&gt;fluid&lt;/strong&gt; state (mutable) and can be &lt;strong&gt;frozen&lt;/strong&gt; into a &lt;strong&gt;crystal&lt;/strong&gt; state (immutable). The &lt;code&gt;thaw()&lt;/code&gt; operation creates a mutable copy of a crystal value, and &lt;code&gt;clone()&lt;/code&gt; performs a deep copy regardless of phase. This vocabulary isn't just cosmetic — it shapes how you think about data lifecycle in a Lattice program.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux temperature = 72.5       // fluid: mutable
temperature = 68.0             // allowed

freeze(temperature)            // now crystal: immutable
// temperature = 70.0          // ERROR: cannot mutate crystal value

flux copy = thaw(temperature)  // new fluid copy
copy = 70.0                    // allowed
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;flux&lt;/code&gt; keyword declares a fluid (mutable) binding. The &lt;code&gt;fix&lt;/code&gt; keyword declares a crystal (immutable) binding. And &lt;code&gt;let&lt;/code&gt; infers phase from context — fluid if the value is fluid, crystal if crystal. This alone isn't novel. What makes Lattice's approach interesting is everything that builds on top of it.&lt;/p&gt;
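&lt;p&gt;A quick sketch of the three binding forms just described (the exact inference behavior here is my reading of the rules above, not verified against every release):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fix pi = 3.14159           // crystal from the start
// pi = 3.0                // ERROR: cannot mutate crystal value

flux raw = 1.0
let mirror = raw           // let infers fluid: raw is fluid

freeze(raw)
let snapshot = raw         // let infers crystal: raw is now crystal
&lt;/pre&gt;&lt;/div&gt;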
&lt;h3&gt;Phase Constraints: Mutability in Your Type Signatures&lt;/h3&gt;
&lt;p&gt;The first major addition (v0.2.3) was phase constraints on function parameters. In most languages, a function that receives data has no way to express whether it expects mutable or immutable input. You might document it, or rely on convention, but the language doesn't help. In Lattice, you can annotate parameters with their expected phase:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn mutate(data: flux Map) {
    data.set("modified", true)
}

fn inspect(data: fix Map) {
    print(data.get("name"))
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The runtime checks phase at call time. Pass a crystal value to &lt;code&gt;mutate()&lt;/code&gt; and you get an error. Pass a fluid value to &lt;code&gt;inspect()&lt;/code&gt; and it works fine — fluid is compatible with fix because it &lt;em&gt;can&lt;/em&gt; be read. The constraint is about what the function &lt;em&gt;needs&lt;/em&gt;, not what the caller &lt;em&gt;has&lt;/em&gt;.&lt;/p&gt;
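&lt;p&gt;Using the &lt;code&gt;mutate()&lt;/code&gt; and &lt;code&gt;inspect()&lt;/code&gt; functions above, the call-time check plays out like this (a sketch; the error wording is illustrative):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux settings = Map::new()
settings["name"] = "demo"

mutate(settings)     // OK: fluid satisfies flux
inspect(settings)    // OK: fluid is readable, satisfies fix

freeze(settings)
inspect(settings)    // OK: crystal satisfies fix
// mutate(settings)  // ERROR: crystal value passed to flux parameter
&lt;/pre&gt;&lt;/div&gt;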
&lt;p&gt;The shorthand syntax uses &lt;code&gt;~&lt;/code&gt; for flux and &lt;code&gt;*&lt;/code&gt; for fix:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn process(data: ~Map) { ... }  // needs mutable
fn display(data: *Map) { ... }  // needs immutable
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Phase-Dependent Dispatch&lt;/h4&gt;
&lt;p&gt;Phase constraints enable something more powerful: dispatch based on runtime phase. You can define multiple implementations of the same function with different phase signatures, and the runtime selects the best match:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mutable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;before&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serializing&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"serialized_at"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;time_now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;immutable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serialize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;directly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;side&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;effects&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"localhost"&lt;/span&gt;
&lt;span class="n"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;overload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;adds&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;

&lt;span class="n"&gt;freeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;overload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mutation&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The overload resolution uses a scoring system. An exact phase match (fluid argument to flux parameter) scores highest. A compatible match (fluid to unphased) scores lower. An incompatible match (crystal to flux) is rejected entirely. When multiple overloads exist, the best-scoring one wins.&lt;/p&gt;
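&lt;p&gt;A small sketch of how that scoring resolves between an exact and a compatible overload. I'm assuming here that an unannotated parameter is unphased and accepts either phase, per the scoring rules above:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn describe(data: ~Map) { print("exact: flux overload") }
fn describe(data: Map)  { print("compatible: unphased overload") }

flux m = Map::new()
describe(m)        // fluid to flux is an exact match, wins

freeze(m)
describe(m)        // crystal to flux is rejected; unphased overload wins
&lt;/pre&gt;&lt;/div&gt;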
&lt;p&gt;This is genuinely useful in practice. A caching layer might have one implementation that updates a cache (requires mutable data) and another that reads through (works with immutable data). A serialization function might add metadata to mutable structures but serialize immutable ones directly. The caller doesn't need to know — the runtime dispatches based on what the data actually &lt;em&gt;is&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;Crystallization Contracts: Validation at the Phase Boundary&lt;/h3&gt;
&lt;p&gt;The next question was: when data freezes, how do you ensure it's in a valid state? In real systems, immutable data often represents finalized configuration, committed transactions, or published records. You want to validate before that transition happens.&lt;/p&gt;
&lt;p&gt;Version 0.2.5 introduced crystallization contracts — validation closures attached to &lt;code&gt;freeze()&lt;/code&gt; with the &lt;code&gt;where&lt;/code&gt; keyword:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nv"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Map&lt;/span&gt;::&lt;span class="nv"&gt;new&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt;
&lt;span class="nv"&gt;config&lt;/span&gt;[&lt;span class="s2"&gt;"host"&lt;/span&gt;]&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"localhost"&lt;/span&gt;
&lt;span class="nv"&gt;config&lt;/span&gt;[&lt;span class="s2"&gt;"port"&lt;/span&gt;]&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;
&lt;span class="nv"&gt;config&lt;/span&gt;[&lt;span class="s2"&gt;"workers"&lt;/span&gt;]&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;

&lt;span class="nv"&gt;freeze&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;config&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;where&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nv"&gt;v&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nv"&gt;v&lt;/span&gt;.&lt;span class="nv"&gt;has&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"host"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;{&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;throw&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"config missing 'host'"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;}
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nv"&gt;v&lt;/span&gt;.&lt;span class="nv"&gt;has&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"port"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;{&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;throw&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"config missing 'port'"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;}
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;v&lt;/span&gt;[&lt;span class="s2"&gt;"workers"&lt;/span&gt;]&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;{&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;throw&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"need at least 1 worker"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;}
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The contract receives a deep clone of the value (so the validation can't accidentally mutate the original), runs the closure, and if the closure throws, the freeze is aborted and the value remains fluid. If validation passes, the value transitions to crystal.&lt;/p&gt;
&lt;p&gt;This maps cleanly to real-world patterns. Database ORMs validate before persisting. Configuration systems validate before applying. Form submissions validate before accepting. The difference is that in Lattice, this validation is attached to the &lt;em&gt;phase transition itself&lt;/em&gt;, not to a separate method you have to remember to call.&lt;/p&gt;
&lt;p&gt;Contracts compose naturally with the rest of the phase system. You can use them with phase bonds (discussed next) or with phase-dependent dispatch. A function that accepts &lt;code&gt;fix Map&lt;/code&gt; knows its argument passed whatever contract was attached at freeze time.&lt;/p&gt;
&lt;h3&gt;Phase Bonds: Coordinated Freezing&lt;/h3&gt;
&lt;p&gt;Individual freeze/thaw operations work well for isolated values, but real programs have related data that should transition together. A web request's headers, body, and metadata should probably all be immutable before you send it. A transaction's debit and credit entries should freeze atomically.&lt;/p&gt;
&lt;p&gt;Phase bonds (also v0.2.5) let you declare these relationships:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux&lt;span class="w"&gt; &lt;/span&gt;header&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;Map::new()
flux&lt;span class="w"&gt; &lt;/span&gt;body&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;Map::new()
flux&lt;span class="w"&gt; &lt;/span&gt;footer&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;Map::new()

header["content-type"]&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;"text/html"
body["content"]&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;"&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Hello&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;"
footer["timestamp"]&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;time_now()

bond(header,&lt;span class="w"&gt; &lt;/span&gt;body,&lt;span class="w"&gt; &lt;/span&gt;footer)

freeze(header)&lt;span class="w"&gt;              &lt;/span&gt;//&lt;span class="w"&gt; &lt;/span&gt;cascades&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;body&lt;span class="w"&gt; &lt;/span&gt;AND&lt;span class="w"&gt; &lt;/span&gt;footer
print(phase_of(body))&lt;span class="w"&gt;       &lt;/span&gt;//&lt;span class="w"&gt; &lt;/span&gt;"crystal"
print(phase_of(footer))&lt;span class="w"&gt;     &lt;/span&gt;//&lt;span class="w"&gt; &lt;/span&gt;"crystal"
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;bond(target, ...deps)&lt;/code&gt; call links dependencies to a target. When the target freezes, all its dependencies freeze too. Bonds are also transitive — if A is bonded to B and B is bonded to C, freezing A cascades through B to C.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux a = 1
flux b = 2
flux c = 3

bond(a, b)    // b depends on a
bond(b, c)    // c depends on b

freeze(a)     // freezes a → b → c
print(phase_of(c))  // "crystal"
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can remove bonds with &lt;code&gt;unbond()&lt;/code&gt; before the freeze happens:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;bond(header, body, footer)
unbond(header, footer)    // footer no longer cascades

freeze(header)            // freezes header and body, NOT footer
print(phase_of(footer))   // "fluid"
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Bonds solve a coordination problem that most languages leave to discipline. In a typical codebase, you'd need to remember to freeze all related values, or wrap them in a container and freeze that. Bonds make the relationship explicit and enforced.&lt;/p&gt;
&lt;h3&gt;Phase Reactions: Observing State Transitions&lt;/h3&gt;
&lt;p&gt;With constraints, contracts, and bonds, you can control &lt;em&gt;how&lt;/em&gt; and &lt;em&gt;when&lt;/em&gt; phase transitions happen. But sometimes you also need to know &lt;em&gt;that&lt;/em&gt; they happened. Logging, cache invalidation, UI updates, audit trails — these are all responses to state changes.&lt;/p&gt;
&lt;p&gt;Version 0.2.6 adds phase reactions: callbacks that fire automatically when a variable's phase changes.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux data = [1, 2, 3]

react(data, |phase, val| {
    print("data is now " + phase + ": " + to_string(val))
})

freeze(data)   // prints: "data is now crystal: [1, 2, 3]"
thaw(data)     // prints: "data is now fluid: [1, 2, 3]"
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The callback receives two arguments: the new phase name (as a string — "crystal", "fluid") and a deep clone of the current value. Multiple callbacks can be registered on the same variable, and they fire in registration order:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux counter = 0

react(counter, |phase, val| {
    print("logger: counter is now " + phase)
})

react(counter, |phase, val| {
    if phase == "crystal" {
        print("audit: counter finalized at " + to_string(val))
    }
})

counter = 42
freeze(counter)
// prints:
//   logger: counter is now crystal
//   audit: counter finalized at 42
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Reactions also fire during bond cascades. If variable B is bonded to A and has a reaction registered, freezing A will cascade to B and trigger B's reaction:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux primary = Map::new()
flux replica = Map::new()

bond(primary, replica)

react(replica, |phase, val| {
    print("replica transitioned to " + phase)
})

freeze(primary)
// prints: "replica transitioned to crystal"
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is a powerful combination. Bonds handle the &lt;em&gt;coordination&lt;/em&gt; of transitions, and reactions handle the &lt;em&gt;observation&lt;/em&gt;. Together they let you build systems where phase changes propagate and trigger side effects in a predictable, declarative way.&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;unreact()&lt;/code&gt; to remove all reactions from a variable:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;react(data, |phase, val| { print("fired") })
unreact(data)
freeze(data)  // no output — reaction was removed
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If a reaction callback throws an error, it propagates as a reaction error, giving you a clean way to handle failures in the observation chain.&lt;/p&gt;
&lt;h3&gt;Temporal Values: Phase History and Time Travel&lt;/h3&gt;
&lt;p&gt;The last piece of the phase system (also v0.2.5) is temporal values — the ability to track a variable's phase transitions and value changes over time.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux counter = 0
track("counter")

counter = 10
counter = 20
freeze(counter)

let history = phases("counter")
// [{phase: "fluid", value: 0},
//  {phase: "fluid", value: 10},
//  {phase: "fluid", value: 20},
//  {phase: "crystal", value: 20}]
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;track()&lt;/code&gt; function enables recording for a named variable. Every assignment and phase transition creates a snapshot. The &lt;code&gt;phases()&lt;/code&gt; function returns the full history as an array of maps, and &lt;code&gt;rewind()&lt;/code&gt; lets you retrieve past values by offset:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux x = 100
track("x")
x = 200
x = 300

print(rewind("x", 0))  // 300 (current)
print(rewind("x", 1))  // 200 (one step back)
print(rewind("x", 2))  // 100 (two steps back)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Temporal values serve primarily as a debugging and auditing tool. When something goes wrong with a frozen value, you can inspect its history to see what mutations happened before the freeze. When testing phase-dependent dispatch, you can verify that the right transitions occurred. In production systems, you can use temporal tracking for audit logs or undo functionality.&lt;/p&gt;
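&lt;p&gt;A minimal undo sketch built on those primitives, assuming that assigning the result of &lt;code&gt;rewind()&lt;/code&gt; back to the variable behaves like any other assignment:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux draft = "hello"
track("draft")

draft = "hello world"
draft = "hello wrold"          // oops, typo

draft = rewind("draft", 1)     // restore the previous value
print(draft)                   // "hello world"
&lt;/pre&gt;&lt;/div&gt;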
&lt;h3&gt;The Bigger Picture: Why This Matters&lt;/h3&gt;
&lt;p&gt;Most programming languages treat mutability as a compiler concern — something to check at build time and forget about. Lattice treats it as a runtime property with the same richness as types or values. This opens up patterns that are difficult or impossible in other languages:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gradual freezing.&lt;/strong&gt; Data starts fluid, accumulates state through a pipeline, and freezes when it's complete. Contracts validate at the boundary. Bonds ensure related data transitions together. This maps naturally to request processing, form building, transaction assembly, and configuration loading.&lt;/p&gt;
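&lt;p&gt;That gradual-freezing pattern can be sketched by combining the pieces from earlier sections: a contract on the target plus a bond so related parts crystallize together (the request/headers structure is hypothetical):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux request = Map::new()
flux headers = Map::new()

request["path"] = "/status"
headers["accept"] = "text/plain"

bond(request, headers)        // headers freeze when request does

freeze(request) where |v| {
    if !v.has("path") { throw("request missing 'path'") }
}

print(phase_of(headers))      // "crystal", cascaded from request
&lt;/pre&gt;&lt;/div&gt;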
&lt;p&gt;&lt;strong&gt;Observable state transitions.&lt;/strong&gt; Reactions let you attach behavior to phase changes without coupling the code that freezes with the code that responds. A module can register a reaction on shared data without knowing who will trigger the freeze or when it will happen.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phase-aware APIs.&lt;/strong&gt; Functions can express their mutability requirements in their signatures and dispatch based on the caller's data. Libraries can offer mutable and immutable code paths transparently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Auditability.&lt;/strong&gt; Temporal tracking provides a built-in mechanism for understanding how data evolved, without external logging infrastructure.&lt;/p&gt;
&lt;p&gt;None of these features require abandoning the simple mental model. At its core, Lattice still has fluid and crystal — mutable and immutable. Everything else is opt-in machinery for programs that need more control.&lt;/p&gt;
&lt;h3&gt;Comparison with Other Approaches&lt;/h3&gt;
&lt;p&gt;It's worth comparing this to how other languages handle mutability:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Rust&lt;/th&gt;
&lt;th&gt;JavaScript&lt;/th&gt;
&lt;th&gt;Lattice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mutability declaration&lt;/td&gt;
&lt;td&gt;&lt;code&gt;let&lt;/code&gt; / &lt;code&gt;let mut&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;const&lt;/code&gt; / &lt;code&gt;let&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fix&lt;/code&gt; / &lt;code&gt;flux&lt;/code&gt; / &lt;code&gt;let&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enforcement&lt;/td&gt;
&lt;td&gt;Compile-time&lt;/td&gt;
&lt;td&gt;Runtime (shallow)&lt;/td&gt;
&lt;td&gt;Runtime (deep)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase transitions&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Object.freeze()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;freeze()&lt;/code&gt; / &lt;code&gt;thaw()&lt;/code&gt; / &lt;code&gt;clone()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validation on freeze&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Crystallization contracts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coordinated freezing&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Phase bonds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transition observation&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Phase reactions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase-dependent dispatch&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Overload resolution by phase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;History tracking&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Temporal values&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Rust's borrow checker is more powerful for preventing data races at compile time — Lattice doesn't attempt that. JavaScript's &lt;code&gt;Object.freeze()&lt;/code&gt; is more pragmatic but also more limited — it's shallow, provides no observation, and offers no coordination. Lattice occupies a different point in the design space: mutability as a &lt;em&gt;domain concept&lt;/em&gt; rather than a &lt;em&gt;compiler constraint&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;Implementation Notes&lt;/h3&gt;
&lt;p&gt;The phase system is implemented in C as part of Lattice's tree-walking interpreter. Phase tags are stored directly on values (&lt;code&gt;VTAG_FLUID&lt;/code&gt;, &lt;code&gt;VTAG_CRYSTAL&lt;/code&gt;, &lt;code&gt;VTAG_UNPHASED&lt;/code&gt;), so phase checks are single comparisons. Bonds are stored as a dynamic array of &lt;code&gt;BondEntry&lt;/code&gt; structs on the evaluator, each mapping a target variable name to its dependencies. Reactions use a similar structure — &lt;code&gt;ReactionEntry&lt;/code&gt; maps a variable name to an array of callback closures. Temporal tracking stores &lt;code&gt;HistorySnapshot&lt;/code&gt; arrays containing phase names and deep-cloned values.&lt;/p&gt;
&lt;p&gt;The deep cloning is important throughout. Contract validation receives a clone so it can't mutate the original. Reaction callbacks receive clones so observers can't interfere with each other. Temporal snapshots are clones so history is independent of current state. This means the phase system has allocation costs proportional to value size, but it also means the invariants are strong — no spooky action at a distance.&lt;/p&gt;
&lt;p&gt;Freeze cascading through bonds is recursive, and reactions fire during cascading, so a single &lt;code&gt;freeze()&lt;/code&gt; call can trigger an arbitrary chain of transitions and callbacks. Error propagation is straightforward: if any reaction throws, the error surfaces immediately with context about which reaction failed.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The phase system's core feature set is reaching a natural plateau. There are a few directions I'm considering:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Partial freezing&lt;/strong&gt; already exists in a basic form — you can freeze individual struct fields or map keys while leaving the container mutable. Expanding this to support more granular control (freeze all fields matching a pattern, freeze a subtree) could be useful for large data structures.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phase-aware pattern matching&lt;/strong&gt; lets you match on phase in &lt;code&gt;match&lt;/code&gt; expressions using &lt;code&gt;~&lt;/code&gt; and &lt;code&gt;*&lt;/code&gt; qualifiers. This is already implemented but could be extended with more complex phase patterns.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compile-time phase inference&lt;/strong&gt; is a longer-term goal. If the interpreter can prove that a value is always crystal by a certain point, it could skip runtime checks. This would bring some of Rust's static guarantees to Lattice without requiring explicit lifetime annotations.&lt;/p&gt;
&lt;p&gt;For now, the phase system provides a cohesive set of tools for working with mutability as a first-class concept. Whether you're building a configuration loader that validates before committing, a pipeline that coordinates related state transitions, or a reactive system that responds to phase changes, Lattice gives you the vocabulary and the enforcement to do it declaratively.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Lattice is open source and available at &lt;a href="https://baud.rs/bwvnYT"&gt;lattice-lang.org&lt;/a&gt;. The language compiles and runs on macOS and Linux with no dependencies beyond a C11 compiler. You can try it in your browser via the &lt;a href="https://baud.rs/odS816"&gt;playground&lt;/a&gt;, or clone the repo and run &lt;code&gt;make &amp;amp;&amp;amp; ./clat&lt;/code&gt; to start the REPL.&lt;/p&gt;</description><category>language design</category><category>lattice</category><category>mutability</category><category>phase system</category><category>programming languages</category><category>type systems</category><guid>https://tinycomputers.io/posts/mutability-as-a-first-class-concept-the-lattice-phase-system.html</guid><pubDate>Sat, 14 Feb 2026 23:00:00 GMT</pubDate></item><item><title>Porting CP/M to the Arduino Giga R1: When Level Converters Fight Back</title><link>https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/cpm-on-arduino-giga-r1-wifi_tts.mp3" type="audio/mpeg"&gt;

&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;18 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;My &lt;a href="https://tinycomputers.io/posts/cpm-on-physical-retroshield-z80.html"&gt;previous CP/M build&lt;/a&gt; runs great. A real Z80 on a &lt;a href="https://baud.rs/87wbBL"&gt;RetroShield&lt;/a&gt;, DRAM shield for 64KB, SD card for disk images, all sitting on an &lt;a href="https://baud.rs/DzXGr4"&gt;Arduino Mega 2560&lt;/a&gt;. It boots CP/M, runs Zork, the works. So naturally I decided to make my life harder.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/giga-level-converter-retroshield.jpeg" alt="The Arduino Giga R1 WiFi (blue) mounted on the custom red level converter PCB, with the RetroShield Z80 partially inserted on the right" style="float: right; width: 55%; max-width: 500px; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://baud.rs/poSQeo"&gt;Arduino Giga R1 WiFi&lt;/a&gt; is a significantly more powerful board: a dual-core STM32H747 running at 480MHz, 1MB of internal SRAM, 8MB of SDRAM, and built-in WiFi. Where the Mega's 16MHz AVR crawled through bus cycles, the Giga could theoretically fly. And all that internal RAM means we can ditch the &lt;a href="https://baud.rs/iJn6Sd"&gt;KDRAM2560&lt;/a&gt; DRAM shield entirely — just a 64KB byte array in SRAM.&lt;/p&gt;
&lt;p&gt;There was just one problem. The Giga runs at 3.3V logic. The Z80 runs at 5V. And as I'd learn the hard way, bridging that gap would consume more debugging hours than everything else combined.&lt;/p&gt;
&lt;p&gt;This post documents the port: the hardware stack, the architectural pivot to WiFi-based disk I/O, the level converter nightmare, the shadow register workaround that saved the project, and the Rust sector server that ties it all together. If you want the backstory on the &lt;a href="https://tinycomputers.io/posts/fiverr-pcb-design-arduino-giga-shield.html"&gt;custom level converter PCB&lt;/a&gt;, designed by &lt;a href="https://baud.rs/tkQg41"&gt;Elijah on Fiverr&lt;/a&gt;, that's a separate post.&lt;/p&gt;
&lt;div class="sponsor-widget"&gt;
&lt;div class="sponsor-widget-header"&gt;&lt;a href="https://baud.rs/youwpy"&gt;&lt;img src="https://tinycomputers.io/images/pcbway-logo.png" alt="PCBWay" style="height: 22px; vertical-align: middle; margin-right: 8px;"&gt;&lt;/a&gt; Sponsored Hardware&lt;/div&gt;
&lt;p&gt;This project was made possible by &lt;a href="https://baud.rs/youwpy"&gt;PCBWay&lt;/a&gt;, who sponsored the manufacturing of the custom level converter shield. PCBWay offers PCB prototyping, assembly, CNC machining, and 3D printing services — from one-off prototypes to production runs. Their support covered the fabrication costs for this board, letting me focus on the engineering instead of the budget. If you have a PCB design ready to go, check them out at &lt;a href="https://baud.rs/youwpy"&gt;pcbway.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;h3&gt;The Hardware Stack&lt;/h3&gt;
&lt;p&gt;The upgraded system has fewer physical components than the Mega version, but more going on under the hood.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Mega Version&lt;/th&gt;
&lt;th&gt;Giga Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Processor&lt;/td&gt;
&lt;td&gt;ATmega2560, 16MHz&lt;/td&gt;
&lt;td&gt;STM32H747, 480MHz Cortex-M7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logic Level&lt;/td&gt;
&lt;td&gt;5V native&lt;/td&gt;
&lt;td&gt;3.3V (needs level converter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Z80 RAM&lt;/td&gt;
&lt;td&gt;KDRAM2560 DRAM shield&lt;/td&gt;
&lt;td&gt;64KB byte array in SRAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bus I/O&lt;/td&gt;
&lt;td&gt;AVR port registers (parallel)&lt;/td&gt;
&lt;td&gt;digitalRead/Write (per-pin)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk Storage&lt;/td&gt;
&lt;td&gt;SD card (software SPI)&lt;/td&gt;
&lt;td&gt;WiFi to sector server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra Hardware&lt;/td&gt;
&lt;td&gt;DRAM shield + SD adapter&lt;/td&gt;
&lt;td&gt;Level converter board only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The RetroShield Z80 plugs in the same way — it uses the same physical pin positions. The level converter board sits between the Giga and the RetroShield, translating all bus signals between 3.3V and 5V. The board uses &lt;a href="https://baud.rs/hY6ydl"&gt;TXB0108&lt;/a&gt; bidirectional level converters, which sense the drive direction automatically.&lt;/p&gt;
&lt;p&gt;At least, that's what they're supposed to do.&lt;/p&gt;
&lt;h4&gt;CP/M Memory Map&lt;/h4&gt;
&lt;p&gt;The Z80 sees the same 64KB address space as on the Mega:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;0000&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;00&lt;/span&gt;&lt;span class="n"&gt;FF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Zero&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jump&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FCBs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;0100&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;DFFF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;TPA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Transient&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Program&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Area&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;56&lt;/span&gt;&lt;span class="n"&gt;KB&lt;/span&gt;
&lt;span class="n"&gt;E000&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;E7FF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;CCP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;E800&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;F5FF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;BDOS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Basic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Disk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Operating&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;Sys&lt;/span&gt;&lt;span class="n"&gt;tem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;F600&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;FFFF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;BIOS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Basic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;Sys&lt;/span&gt;&lt;span class="n"&gt;tem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On the Mega, this lived in the KDRAM2560's dynamic RAM with its complex refresh timing. On the Giga, it's just:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;byte&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;One line. The Giga's 1MB of internal SRAM makes the entire DRAM shield unnecessary.&lt;/p&gt;
&lt;h3&gt;WiFi Instead of SD&lt;/h3&gt;
&lt;p&gt;The original plan was to keep the SD card. The Mega version used software SPI on pins 4-7 since the RetroShield claims the hardware SPI pins. On the Giga, the RetroShield still claims all 76 digital pins — but this time there are no spare analog pins conveniently routed for software SPI either.&lt;/p&gt;
&lt;p&gt;The initial idea was to wire a MicroSD adapter to the analog pins. But that meant more custom wiring on top of the already-custom level converter board. And anyone trying to replicate this project would need to solder yet another adapter.&lt;/p&gt;
&lt;p&gt;Then it hit me: the Giga has WiFi built in. Why not serve disk images over the network?&lt;/p&gt;
&lt;p&gt;The more I thought about it, the more sense it made:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No additional hardware.&lt;/strong&gt; WiFi is built into the Giga. Zero extra wiring.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better reproducibility.&lt;/strong&gt; The project already requires a custom level converter. Adding another custom wiring job makes it harder for others to build.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The sector server is just software.&lt;/strong&gt; Anyone can download and run a binary. Compare that to soldering an SD adapter to analog pins.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It plays to the Giga's strengths.&lt;/strong&gt; If you're upgrading from a Mega, you might as well use what makes the Giga special: WiFi and RAM.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Future potential.&lt;/strong&gt; The 8MB SDRAM could cache entire disk images downloaded over WiFi at boot. CP/M on a RAM disk — faster than any physical media the Z80 ever had access to.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The tradeoff is that the system is no longer self-contained. It needs a WiFi network and a computer running the sector server. For a project that already requires a custom level converter PCB, this felt acceptable.&lt;/p&gt;
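&lt;p&gt;The wire protocol can stay dead simple: the BIOS hands over a track and sector, and the server turns that into a file offset. A sketch of the offset math, assuming the classic 8-inch SSSD layout that CP/M 2.2 distributions used (26 sectors of 128 bytes per track, sectors numbered from 1):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;/* Byte offset of a CP/M sector inside the disk image file.
   The geometry here is an assumption: 26 x 128-byte sectors per
   track, with BIOS sector numbers starting at 1. */
#define SECTOR_SIZE        128L
#define SECTORS_PER_TRACK   26L

long sector_offset(int track, int sector) {
    return (track * SECTORS_PER_TRACK + (sector - 1)) * SECTOR_SIZE;
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Track 0, sector 1 lands at offset 0, and every request becomes a single 128-byte read or write at the computed offset.&lt;/p&gt;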
&lt;h3&gt;The Level Converter Problem&lt;/h3&gt;
&lt;p&gt;This is where the project nearly died.&lt;/p&gt;
&lt;p&gt;The TXB0108 is a popular bidirectional level converter. It uses auto-direction sensing — whichever side drives a signal stronger "wins," and the converter translates accordingly. This works well for push-pull signals like SPI, where drive direction is unambiguous.&lt;/p&gt;
&lt;p&gt;It does not work well for a Z80 bus.&lt;/p&gt;
&lt;h4&gt;Signal-by-Signal Breakdown&lt;/h4&gt;
&lt;p&gt;I built a pin diagnostic sketch to test each signal through the level converter. Here's what I found:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Pin&lt;/th&gt;
&lt;th&gt;Expected&lt;/th&gt;
&lt;th&gt;Actual&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MREQ_N&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;Toggles on memory access&lt;/td&gt;
&lt;td&gt;Toggles correctly&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WR_N&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;Toggles on writes&lt;/td&gt;
&lt;td&gt;Memory writes only&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RD_N&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;Toggles on reads&lt;/td&gt;
&lt;td&gt;Stuck HIGH always&lt;/td&gt;
&lt;td&gt;Broken&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IORQ_N&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;Toggles on I/O access&lt;/td&gt;
&lt;td&gt;Stuck HIGH always&lt;/td&gt;
&lt;td&gt;Broken&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address Bus&lt;/td&gt;
&lt;td&gt;22-37&lt;/td&gt;
&lt;td&gt;16-bit address&lt;/td&gt;
&lt;td&gt;All bits correct&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Bus (A→Z80)&lt;/td&gt;
&lt;td&gt;42-49&lt;/td&gt;
&lt;td&gt;Arduino drives data&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Bus (Z80→A)&lt;/td&gt;
&lt;td&gt;42-49&lt;/td&gt;
&lt;td&gt;Z80 drives data&lt;/td&gt;
&lt;td&gt;Memory writes only&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Two signals completely stuck. Two signals working only half the time. The auto-direction sensing that makes the TXB0108 convenient is exactly what makes it unreliable here — the Z80's bus signals have complex timing relationships where drive strength varies throughout the cycle.&lt;/p&gt;
&lt;h4&gt;RD_N: Permanently Stuck&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;RD_N&lt;/code&gt; on pin 53 never toggles. It reads HIGH regardless of what the Z80 is doing. Pin 53 is the hardware SPI SS pin on the Mega form factor, and may have internal pull-ups or other conflicts on the Giga's STM32. Combined with the TXB0108's direction sensing, the signal simply can't get through.&lt;/p&gt;
&lt;p&gt;The fix is trivial once you realize it: during any bus cycle, the Z80 is either reading or writing. They're mutually exclusive. So:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#define STATE_RD_N (!STATE_WR_N)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If &lt;code&gt;WR_N&lt;/code&gt; isn't asserted, it must be a read. This works for all standard Z80 bus operations.&lt;/p&gt;
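&lt;p&gt;Folded into the bus loop, the dispatch comes down to a few lines per clock tick. A simplified sketch (&lt;code&gt;readAddressBus()&lt;/code&gt;, &lt;code&gt;readDataBus()&lt;/code&gt;, and &lt;code&gt;writeDataBus()&lt;/code&gt; are hypothetical wrappers around the per-pin &lt;code&gt;digitalRead&lt;/code&gt;/&lt;code&gt;digitalWrite&lt;/code&gt; calls):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;// Simplified memory service, run once per Z80 clock tick.
// With RD_N unusable, "not writing" is treated as a read.
void serviceMemory(void) {
    if (digitalRead(uP_MREQ_N)) return;   // no memory cycle in progress

    uint16_t addr = readAddressBus();
    if (!STATE_WR_N) {                    // WR_N asserted: the Z80 is writing
        z80RAM[addr] = readDataBus();
    } else {                              // otherwise serve a read
        writeDataBus(z80RAM[addr]);
    }
}
&lt;/pre&gt;&lt;/div&gt;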
&lt;h4&gt;IORQ_N: The Big One&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;IORQ_N&lt;/code&gt; on pin 39 is also stuck HIGH. This is the signal that tells us the Z80 wants to talk to a peripheral — console I/O, disk I/O, everything. Without it, we have no way to detect I/O operations through the bus.&lt;/p&gt;
&lt;p&gt;But we have something the bus doesn't know about: we control the Z80's memory. Every byte the Z80 fetches comes from &lt;code&gt;z80RAM[]&lt;/code&gt;, which we serve. We can read the opcode stream and know exactly what instruction the Z80 is executing — including &lt;code&gt;OUT (n), A&lt;/code&gt; (opcode &lt;code&gt;0xD3&lt;/code&gt;) and &lt;code&gt;IN A, (n)&lt;/code&gt; (opcode &lt;code&gt;0xDB&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;So instead of watching for IORQ_N to go low, we watch for the Z80 to fetch an I/O instruction from memory.&lt;/p&gt;
&lt;h4&gt;Data Bus: Invisible During I/O OUT&lt;/h4&gt;
&lt;p&gt;This was the subtlest failure. During &lt;code&gt;OUT (n), A&lt;/code&gt;, the Z80 puts the A register value on the data bus. We should be able to read it. But we can't.&lt;/p&gt;
&lt;p&gt;The TXB0108 latches the last strongly-driven value. Since the Arduino drives the data bus during memory reads (pushing 3.3V through the converter to the Z80's 5V side), the converter's direction gets stuck. When the Z80 tries to drive data back during an I/O write, its 5V output can't overcome the converter's latched direction.&lt;/p&gt;
&lt;p&gt;I confirmed this by sampling the data bus at every clock tick during an &lt;code&gt;OUT&lt;/code&gt; cycle. It showed &lt;code&gt;0x80&lt;/code&gt; (the last value the Arduino had driven — a port number) at every single tick. Zero variation. The Z80's output was completely invisible.&lt;/p&gt;
&lt;h4&gt;WR_N: Only Works Sometimes&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;WR_N&lt;/code&gt; toggles correctly during memory write cycles but never during I/O write cycles. The timing or drive strength during I/O is just different enough that the converter can't track it.&lt;/p&gt;
&lt;p&gt;This meant we couldn't use WR_N to detect when an I/O write was complete either. Every signal we'd normally use for I/O detection was either stuck or unreliable.&lt;/p&gt;
&lt;h3&gt;Shadow Register Tracking&lt;/h3&gt;
&lt;p&gt;The solution to the data bus problem is to never read the data bus during I/O at all. Instead, we maintain a shadow copy of the Z80's A register by watching the opcode stream.&lt;/p&gt;
&lt;p&gt;Since we serve every byte the Z80 reads from &lt;code&gt;z80RAM[]&lt;/code&gt;, we can decode the instruction stream in real time. When we see &lt;code&gt;LD A, 0x03&lt;/code&gt; (opcode &lt;code&gt;0x3E 0x03&lt;/code&gt;), we set &lt;code&gt;shadowA = 0x03&lt;/code&gt;. When we later see &lt;code&gt;OUT (0x10), A&lt;/code&gt; (opcode &lt;code&gt;0xD3 0x10&lt;/code&gt;), we already know A contains &lt;code&gt;0x03&lt;/code&gt; — no need to read the bus.&lt;/p&gt;
&lt;h4&gt;M1 Detection&lt;/h4&gt;
&lt;p&gt;The first challenge is knowing when the Z80 is fetching an opcode (M1 cycle) versus reading data. We detect M1 by watching for the first MREQ-active read after a MREQ-inactive cycle, i.e. the inactive-to-active transition of MREQ:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mreq_active&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;digitalRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uP_MREQ_N&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mreq_active&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;prevMREQ_active&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;opcodeSkip&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;opcode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// This is an M1 fetch — decode the instruction&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;The opcodeSkip Counter&lt;/h4&gt;
&lt;p&gt;After M1, the Z80 runs a refresh cycle, which reuses the address bus. With the I register still 0 after reset, refresh addresses land right on top of the boot code at 0x0000 and up. If we mistook a refresh cycle for another M1, we'd corrupt the shadow registers by "decoding" whatever data happened to be at the refresh address.&lt;/p&gt;
&lt;p&gt;The fix is a 256-entry lookup table that tells us how many MREQ-active read cycles to skip after each opcode:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;skipCount&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// 0x00-0x0F: NOP=1, LD BC,nn=3, ...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... 256 entries covering every Z80 opcode&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A 1-byte instruction like &lt;code&gt;NOP&lt;/code&gt; gets skip=1 (refresh only). A 3-byte instruction like &lt;code&gt;LD HL, nn&lt;/code&gt; gets skip=3 (refresh + 2 operand reads). I/O instructions get skip=0 because their skipping is handled by the I/O state machine. After each M1, we set &lt;code&gt;opcodeSkip = skipCount[opcode]&lt;/code&gt; and decrement it on each subsequent MREQ-active read cycle.&lt;/p&gt;
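&lt;p&gt;Seen in isolation, the counter logic is small. If this handler runs once per MREQ-active read cycle, a zero count means a genuine M1 fetch and anything else is a refresh or operand cycle to swallow (&lt;code&gt;decode_m1()&lt;/code&gt; stands in for the decode switch in the next section):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;// One call per MREQ-active read cycle.
static int opcodeSkip = 0;

void on_mreq_read(uint16_t addr) {
    if (opcodeSkip == 0) {
        uint8_t opcode = z80RAM[addr];
        decode_m1(opcode);                   // genuine M1 fetch: decode it
        opcodeSkip = skipCount[opcode];      // NOP: 1, LD HL,nn: 3, OUT: 0
    } else {
        opcodeSkip--;                        // refresh or operand cycle: ignore
    }
}
&lt;/pre&gt;&lt;/div&gt;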
&lt;h4&gt;Register Tracking&lt;/h4&gt;
&lt;p&gt;We don't need to track every Z80 register — just enough to know what value A holds when an &lt;code&gt;OUT&lt;/code&gt; happens. The BIOS uses a relatively small set of instructions to load registers before I/O:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opcode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// === IO Instructions ===&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xD3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// OUT (n), A&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;handle_io_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xDB&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// IN A, (n)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;ioResponse&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;handle_io_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ioResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// IN updates A&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// === A register tracking ===&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x3E&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// LD A, n&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xAF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="c1"&gt;// XOR A&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x79&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;// LD A, C&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x78&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowB&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;// LD A, B&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x7C&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowH&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;// LD A, H&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x7D&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;// LD A, L&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x7E&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;shadowH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowL&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xE6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// AND n&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xF6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// OR n&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x2F&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;// CPL&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x3C&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                         &lt;/span&gt;&lt;span class="c1"&gt;// INC A&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x3D&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowA&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;                         &lt;/span&gt;&lt;span class="c1"&gt;// DEC A&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// === B, C, H, L tracking (needed because A loads from them) ===&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x06&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// LD B, n&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x0E&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shadowC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// LD C, n&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x21&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// LD HL, nn&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;shadowL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;shadowH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80RAM&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;uP_ADDR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... plus INC/DEC HL, LD between registers, etc.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This isn't a full Z80 emulator — it's just enough to track register flow from loads to I/O instructions. If the BIOS uses an instruction we don't track, &lt;code&gt;shadowA&lt;/code&gt; will be wrong and the I/O operation will get bad data. But the CP/M BIOS is a known codebase, so we can enumerate exactly which instructions it uses and make sure they're covered.&lt;/p&gt;
&lt;h4&gt;I/O State Machine&lt;/h4&gt;
&lt;p&gt;For &lt;code&gt;OUT&lt;/code&gt;, we handle the write immediately when we detect &lt;code&gt;0xD3&lt;/code&gt; at M1 — we already know the port (from RAM) and the data (from &lt;code&gt;shadowA&lt;/code&gt;). Then we let the skip counter consume the remaining machine cycles.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;IN&lt;/code&gt;, it's trickier because the Z80 needs to actually read our response off the data bus. We use a small state machine:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;IO_IDLE&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;detect&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;xDB&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;IO_IN_PENDING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="n"&gt;handle_io_read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;IO_IN_PENDING&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;opcodeSkip&lt;/span&gt; &lt;span class="n"&gt;reaches&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;IO_IN_DRIVING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;IO_IN_DRIVING&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt; &lt;span class="n"&gt;MREQ&lt;/span&gt; &lt;span class="n"&gt;goes&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;IO_IDLE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;release&lt;/span&gt; &lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume&lt;/span&gt; &lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;During &lt;code&gt;IO_IN_DRIVING&lt;/code&gt;, we keep the response byte on the data bus and ignore other processing until the Z80's next M1 fetch (signaled by MREQ going active again).&lt;/p&gt;
&lt;h3&gt;The Sector Server&lt;/h3&gt;
&lt;p&gt;With the Arduino side handling bus-level I/O through shadow registers, the disk I/O ports translate to network messages. The Z80 BIOS writes a filename character-by-character to port &lt;code&gt;0x13&lt;/code&gt;, writes seek bytes to ports &lt;code&gt;0x14&lt;/code&gt;/&lt;code&gt;0x15&lt;/code&gt;/&lt;code&gt;0x19&lt;/code&gt;, sets the DMA address via ports &lt;code&gt;0x16&lt;/code&gt;/&lt;code&gt;0x17&lt;/code&gt;, then triggers a block read/write on port &lt;code&gt;0x18&lt;/code&gt;. The Arduino accumulates this state, then forwards the operation to the sector server over TCP.&lt;/p&gt;
&lt;h4&gt;Protocol&lt;/h4&gt;
&lt;p&gt;The server speaks a simple binary protocol that mirrors the BIOS port commands:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Byte&lt;/th&gt;
&lt;th&gt;Payload&lt;/th&gt;
&lt;th&gt;Response&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OPEN_READ&lt;/td&gt;
&lt;td&gt;0x01&lt;/td&gt;
&lt;td&gt;filename\0&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CREATE&lt;/td&gt;
&lt;td&gt;0x02&lt;/td&gt;
&lt;td&gt;filename\0&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OPEN_APPEND&lt;/td&gt;
&lt;td&gt;0x03&lt;/td&gt;
&lt;td&gt;filename\0&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SEEK_START&lt;/td&gt;
&lt;td&gt;0x04&lt;/td&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLOSE&lt;/td&gt;
&lt;td&gt;0x05&lt;/td&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DIR&lt;/td&gt;
&lt;td&gt;0x06&lt;/td&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;status + listing\0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OPEN_RW&lt;/td&gt;
&lt;td&gt;0x07&lt;/td&gt;
&lt;td&gt;filename\0&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SEEK&lt;/td&gt;
&lt;td&gt;0x08&lt;/td&gt;
&lt;td&gt;3 bytes LE offset&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;READ_BLOCK&lt;/td&gt;
&lt;td&gt;0x10&lt;/td&gt;
&lt;td&gt;(none)&lt;/td&gt;
&lt;td&gt;status + 128 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WRITE_BLOCK&lt;/td&gt;
&lt;td&gt;0x11&lt;/td&gt;
&lt;td&gt;128 bytes&lt;/td&gt;
&lt;td&gt;status&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Status is a single byte: &lt;code&gt;0x00&lt;/code&gt; for OK, &lt;code&gt;0x01&lt;/code&gt; for error. Block size is 128 bytes — a CP/M sector.&lt;/p&gt;
&lt;h4&gt;Implementation&lt;/h4&gt;
&lt;p&gt;The server is written in Rust with minimal dependencies (just &lt;code&gt;socket2&lt;/code&gt; for &lt;code&gt;SO_REUSEADDR&lt;/code&gt;). It started single-threaded, which worked fine until the Giga crashed and rebooted. The old TCP connection would hang in the server's blocking &lt;code&gt;read_exact()&lt;/code&gt;, and the Giga's new connection attempt would queue indefinitely. Classic deadlock.&lt;/p&gt;
&lt;p&gt;The fix was threaded connections with timeouts:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incoming&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;match&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nb"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;base_dir&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;base_dir&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;move&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;handle_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;base_dir&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nb"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;eprintln!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[!] Accept error: {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each client gets its own thread. The accept loop never blocks. Read timeouts (30s mid-command, 300s idle) automatically drop dead connections. &lt;code&gt;SO_REUSEADDR&lt;/code&gt; lets the server restart instantly without port conflicts.&lt;/p&gt;
&lt;p&gt;The server also sanitizes filenames (rejecting path traversal and special characters) and tracks session metrics:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gh"&gt;Session Summary&lt;/span&gt;
&lt;span class="gh"&gt;---------------&lt;/span&gt;
Duration:        00:00:00
Commands:        11
Files opened:    2
Seeks:           1
Sectors read:    5
Sectors written: 1
Bytes read:      640 (640 B)
Bytes written:   128 (128 B)
Errors:          0
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Running It&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Build&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sector_server&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;cargo&lt;span class="w"&gt; &lt;/span&gt;build&lt;span class="w"&gt; &lt;/span&gt;--release

&lt;span class="c1"&gt;# Serve CP/M files on port 9000&lt;/span&gt;
./sector_server/target/release/sector_server&lt;span class="w"&gt; &lt;/span&gt;./kz80_cpm&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;9000&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The directory needs &lt;code&gt;boot.bin&lt;/code&gt;, &lt;code&gt;CPM.SYS&lt;/code&gt;, and the disk images (&lt;code&gt;A.DSK&lt;/code&gt;, &lt;code&gt;B.DSK&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;h3&gt;Proof of Concept&lt;/h3&gt;
&lt;p&gt;Before wiring up the full shadow register machinery, I wrote a minimal POC sketch that tests just the WiFi + sector server communication. It connects, opens files, reads blocks, seeks, and closes — verifying the network layer end-to-end without any Z80 involvement.&lt;/p&gt;
&lt;p&gt;All 8 tests passed:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;=== Sector Server POC ===

WiFi: connecting to TP-Link_A8A8 ... OK (192.168.0.75)
Server: connecting to 192.168.0.248:9000 ... OK

--- Test 1: OPEN_READ boot.bin ---
  Status: OK
--- Test 2: READ_BLOCK (128 bytes) ---
  Status: OK
  Data:
F3 31 00 04 3E 03 D3 80 3E 15 D3 80 21 83 00 CD
7B 00 21 43 01 CD 70 00 3E 01 D3 10 DB 11 E6 02
--- Test 3: READ_BLOCK (next 128 bytes) ---
  Status: OK
  Data:
23 18 F8 0D 0A 52 65 74 72 6F 53 68 69 65 6C 64
20 5A 38 30 20 42 6F 6F 74 20 4C 6F 61 64 65 72
--- Test 4: CLOSE ---
  Status: OK
--- Test 5: OPEN_RW A.DSK ---
  Status: OK
--- Test 6: SEEK offset=6656 ---
  Status: OK
--- Test 7: READ_BLOCK (directory sector) ---
  Status: OK
  Data:
00 5A 4F 52 4B 31 20 20 20 43 4F 4D 00 00 00 44
--- Test 8: CLOSE ---
  Status: OK

=== All tests complete ===
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Some things to note: the first bytes of &lt;code&gt;boot.bin&lt;/code&gt; are &lt;code&gt;F3 31 00 04&lt;/code&gt;, which disassembles to &lt;code&gt;DI; LD SP, 0x0400&lt;/code&gt; — the boot loader's first instructions, correct. The second block contains the ASCII string "RetroShield Z80 Boot Loader." And the directory sector from &lt;code&gt;A.DSK&lt;/code&gt; shows &lt;code&gt;ZORK1   COM&lt;/code&gt; — Zork is on the disk, waiting.&lt;/p&gt;
&lt;h3&gt;Boot Sequence&lt;/h3&gt;
&lt;p&gt;When the full sketch runs, the expected boot sequence is:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="o"&gt;======================================&lt;/span&gt;
&lt;span class="n"&gt;RetroShield&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CP&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.2&lt;/span&gt;
&lt;span class="n"&gt;Arduino&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Giga&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WiFi&lt;/span&gt;
&lt;span class="o"&gt;======================================&lt;/span&gt;

&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RAM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="n"&gt;KB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SRAM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WiFi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;192.168&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;192.168&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mf"&gt;0.248&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;9000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Boot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Starting&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Z80&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After "Starting Z80...", the Z80 boot loader runs. It opens &lt;code&gt;CPM.SYS&lt;/code&gt; via the sector server, loads the CCP and BDOS into memory at &lt;code&gt;0xE000&lt;/code&gt;, jumps to the BIOS cold boot at &lt;code&gt;0xF600&lt;/code&gt;, and if everything works, prints a banner and the &lt;code&gt;A&amp;gt;&lt;/code&gt; prompt.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The WiFi communication works. The shadow register tracking mechanism works — I've confirmed partial serial output from the Z80 (the ACIA console I/O goes through the same shadow register path). The instruction skip counter prevents refresh cycle confusion. All the pieces are in place.&lt;/p&gt;
&lt;p&gt;What remains is completeness testing. The shadow register tracking needs to cover every instruction the BIOS and CCP actually use. If the Z80 executes an instruction that modifies A through a path we don't track — say, a &lt;code&gt;POP AF&lt;/code&gt; or &lt;code&gt;EX AF, AF'&lt;/code&gt; — the shadow will be wrong and the next &lt;code&gt;OUT&lt;/code&gt; will send garbage. The fix is straightforward (add more cases to the switch statement) but requires methodical testing.&lt;/p&gt;
&lt;p&gt;There's also the 8MB SDRAM sitting unused on the Giga. Once CP/M boots reliably, the obvious next step is downloading entire disk images into SDRAM over WiFi at startup. At that point, all disk I/O becomes memory-mapped — no network latency, no TCP overhead. CP/M running at memory speed on a RAM disk, served from a real Z80 that thinks it's talking to a floppy drive.&lt;/p&gt;
&lt;p&gt;Once the project is stable and CP/M boots reliably, I plan to open-source the full KiCad PCB design files for the level converter shield, along with the Arduino sketch and sector server. The level converter board uses nine TXB0108PW ICs in TSSOP-20 packages to translate all 72 signals between the Giga's 3.3V and the RetroShield's 5V — it's a straightforward two-layer design that anyone could get fabricated.&lt;/p&gt;
&lt;p&gt;The debugging journey from "IORQ_N is stuck" to "let's just decode the entire instruction stream in software" was not the path I expected to take. But it turned a level converter limitation into something arguably more interesting: a system where the Arduino doesn't just babysit the Z80's bus signals, but understands what the Z80 is thinking.&lt;/p&gt;</description><category>arduino</category><category>arduino giga</category><category>cp/m</category><category>hardware</category><category>level shifter</category><category>retro computing</category><category>retroshield</category><category>rust</category><category>sector server</category><category>stm32</category><category>wifi</category><category>z80</category><guid>https://tinycomputers.io/posts/cpm-on-arduino-giga-r1-wifi.html</guid><pubDate>Sat, 14 Feb 2026 21:00:00 GMT</pubDate></item><item><title>Milk-V Mars Review: SiFive's RISC-V Enters the SBC Arena</title><link>https://tinycomputers.io/posts/milk-v-mars-review.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/milk-v-mars-review_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;22 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/milk-v-mars/IMG_4333.jpeg" alt="Milk-V Mars packaging" style="float: right; max-width: 45%; margin: 0 0 20px 25px; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;The Milk-V Mars is a compact RISC-V single board computer built around StarFive's JH7110 system-on-chip, featuring four SiFive U74-MC application cores and 8GB of LPDDR4 RAM. Priced affordably and shipping in a Raspberry Pi-like form factor, the Mars targets developers and enthusiasts who want to explore the RISC-V ecosystem on real hardware without investing in high-end development boards.&lt;/p&gt;
&lt;p&gt;This review arrives at an interesting moment for RISC-V single board computers. We've already benchmarked the Orange Pi RV2 with its 8-core Ky X1 processor, and the Milk-V Mars gives us a second data point in the RISC-V SBC landscape - this time with cores from SiFive, the company widely regarded as the pioneer of commercial RISC-V silicon. How does a board built on SiFive's mature U74 core design compare to the newer Ky X1, and where does it land against the ARM and x86 competition? The answers are illuminating, if not entirely encouraging.&lt;/p&gt;
&lt;h3&gt;Hardware Architecture: SiFive U74-MC on StarFive JH7110&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/milk-v-mars/IMG_4337.jpeg" alt="Milk-V Mars board top view" style="float: left; max-width: 45%; margin: 0 25px 20px 0; border-radius: 4px;"&gt;&lt;/p&gt;
&lt;p&gt;At the heart of the Milk-V Mars sits the StarFive JH7110, a 28nm SoC that pairs four SiFive U74-MC application cores with a single SiFive S7 monitor core. The U74 is SiFive's 64-bit application processor core implementing the RV64GC instruction set (rv64imafdc) - the standard RISC-V profile with integer multiplication, atomic operations, single and double-precision floating point, and compressed instructions. The cores feature an in-order, dual-issue, 8-stage pipeline with separate L1 instruction and data caches, backed by a shared 2MB L2 cache. For readers wanting to understand the ISA itself, &lt;a href="https://baud.rs/pbDcC6"&gt;The RISC-V Reader&lt;/a&gt; is an excellent and concise introduction to the architecture and its design rationale.&lt;/p&gt;
&lt;p&gt;The SiFive U74 has an important place in RISC-V history. As one of the first commercially available application-class RISC-V cores, it powered the HiFive Unmatched development board that many early RISC-V adopters used to explore the architecture. The U74-MC in the JH7110 is the multi-core cluster configuration of the U74, implemented with cost in mind and targeting embedded and SBC applications rather than server or workstation use.&lt;/p&gt;
&lt;p&gt;However, the U74's in-order pipeline design places it at a fundamental architectural disadvantage compared to modern out-of-order ARM cores. While the Cortex-A76 in a Raspberry Pi 5 can speculatively execute instructions, reorder them for optimal throughput, and predict branches with sophisticated algorithms refined over years of iteration, the U74 executes instructions strictly in program order. This simplicity reduces silicon area and power consumption but significantly limits single-threaded performance - the metric that matters most for many real-world workloads.&lt;/p&gt;
&lt;p&gt;The JH7110 is fabricated on TSMC's 28nm process, which is two to three generations behind the 8nm and 6nm processes used in contemporary ARM SoCs like the Rockchip RK3588. This older process node limits clock speeds, increases power consumption per transistor, and constrains the amount of logic that can be economically integrated into the die.&lt;/p&gt;
&lt;h3&gt;Board Specifications&lt;/h3&gt;
&lt;p&gt;Our test unit came configured with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;SoC:&lt;/strong&gt; StarFive JH7110 (28nm process)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPU:&lt;/strong&gt; 4x SiFive U74-MC @ 1.5 GHz (RV64GC)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor Core:&lt;/strong&gt; 1x SiFive S7&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RAM:&lt;/strong&gt; 8GB LPDDR4&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storage:&lt;/strong&gt; microSD card (29GB card installed, expandable)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Networking:&lt;/strong&gt; Gigabit Ethernet&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OS:&lt;/strong&gt; Debian GNU/Linux Bookworm (kernel 5.15.0)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;USB:&lt;/strong&gt; USB 3.0 and USB 2.0 ports&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Video:&lt;/strong&gt; HDMI output&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPIO:&lt;/strong&gt; 40-pin header (Raspberry Pi compatible layout)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The 8GB of RAM is generous for a board in this class - more than many ARM SBCs ship with by default - and proved ample for our Rust compilation benchmarks. The microSD card storage is serviceable but not fast; the board does support eMMC modules for better storage performance, though our test unit used a standard SD card.&lt;/p&gt;
&lt;p&gt;One notable issue we encountered during setup: the factory image shipped with a root partition of only 3.8GB on a 29GB SD card. This is common with SBC images but particularly problematic here because installing the Rust toolchain and build dependencies requires significant disk space. We expanded the partition using parted and resize2fs before proceeding - a standard operation, but one that new users might find daunting.&lt;/p&gt;
&lt;h3&gt;Software Environment&lt;/h3&gt;
&lt;p&gt;The Milk-V Mars runs Debian Bookworm with a Linux 5.15.0 kernel, a relatively old kernel version compared to the 6.x kernels shipping on more recent ARM and RISC-V boards. The kernel includes StarFive-specific patches for JH7110 hardware support, but the older base means newer kernel features and optimizations aren't available.&lt;/p&gt;
&lt;p&gt;The software experience is functional but bare. The system ships as a minimal Debian installation - no build tools, no git, no development packages preinstalled. Setting up the board for development required installing build-essential, git, curl, and bc via apt before we could proceed with Rust installation. The Debian repositories worked without issues, and rustup installed cleanly for the riscv64gc-unknown-linux-gnu target, delivering Rust 1.93.1 - the latest stable release at the time of testing.&lt;/p&gt;
&lt;p&gt;This is actually a positive sign for the RISC-V ecosystem. Rust's official support for RISC-V has matured to the point where rustup "just works" on riscv64gc targets. Compare this to just a few years ago when RISC-V Rust development required building the compiler from source. The toolchain infrastructure has come a long way, even if the hardware performance hasn't yet caught up to ARM. If you're new to Rust and want to understand the language driving these benchmarks, &lt;a href="https://baud.rs/kAPJDa"&gt;&lt;em&gt;The Rust Programming Language&lt;/em&gt;&lt;/a&gt; by Klabnik and Nichols is the definitive starting point.&lt;/p&gt;
&lt;h3&gt;Performance Testing: Rust Compilation Benchmarks&lt;/h3&gt;
&lt;p&gt;To assess the Milk-V Mars's CPU performance, we ran our standard benchmark: compiling the &lt;a href="https://baud.rs/QckusG"&gt;ballistics-engine&lt;/a&gt; Rust project in release mode, three times from a clean build state. This workload exercises all CPU cores through parallel compilation units, stresses the memory subsystem with large intermediate data structures, and measures real-world compiler and linker performance - the kind of task a developer would encounter daily.&lt;/p&gt;
&lt;h4&gt;Milk-V Mars Compilation Times&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Run 1:&lt;/strong&gt; 939.13 seconds (15 minutes 39 seconds)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Run 2:&lt;/strong&gt; 933.45 seconds (15 minutes 33 seconds)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Run 3:&lt;/strong&gt; 935.96 seconds (15 minutes 35 seconds)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Average:&lt;/strong&gt; 936.18 seconds (15 minutes 36 seconds)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Standard Deviation:&lt;/strong&gt; 2.85 seconds (0.30%)&lt;/li&gt;
&lt;/ul&gt;
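&lt;p&gt;The summary figures above are easy to reproduce. This short Python sketch recomputes the average and spread from the three raw run times, using the sample standard deviation (which divides by n - 1):&lt;/p&gt;

```python
# Recompute the benchmark summary statistics from the three raw run times.
from statistics import mean, stdev

runs = [939.13, 933.45, 935.96]  # seconds, runs 1-3

avg = mean(runs)
sd = stdev(runs)  # sample standard deviation (divides by n - 1)

print(f"average: {avg:.2f}s")                  # average: 936.18s
print(f"std dev: {sd:.2f}s ({sd / avg:.2%})")  # std dev: 2.85s (0.30%)
```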
&lt;p&gt;The first thing that stands out is the remarkable consistency. A standard deviation of just 2.85 seconds across three 15-minute runs indicates stable thermal behavior and no throttling - the board maintained consistent performance throughout the entire test sequence. This suggests good power delivery and adequate thermal management, even without active cooling.&lt;/p&gt;
&lt;p&gt;The second thing that stands out is the absolute time: over 15 minutes per compilation. This is the slowest result in our entire benchmark fleet.&lt;/p&gt;
&lt;h4&gt;Comparative Analysis&lt;/h4&gt;
&lt;p&gt;Here's how the Milk-V Mars stacks up against every board we've benchmarked:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;Cores&lt;/th&gt;
&lt;th&gt;Average Time&lt;/th&gt;
&lt;th&gt;vs. Mars&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Orange Pi 5 Max&lt;/td&gt;
&lt;td&gt;ARM64&lt;/td&gt;
&lt;td&gt;Cortex-A55/A76&lt;/td&gt;
&lt;td&gt;8 (4+4)&lt;/td&gt;
&lt;td&gt;62.31s&lt;/td&gt;
&lt;td&gt;15.0x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Raspberry Pi CM5&lt;/td&gt;
&lt;td&gt;ARM64&lt;/td&gt;
&lt;td&gt;Cortex-A76&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;71.04s&lt;/td&gt;
&lt;td&gt;13.2x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;LattePanda IOTA&lt;/td&gt;
&lt;td&gt;x86_64&lt;/td&gt;
&lt;td&gt;Intel N150&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;72.21s&lt;/td&gt;
&lt;td&gt;13.0x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Raspberry Pi 5&lt;/td&gt;
&lt;td&gt;ARM64&lt;/td&gt;
&lt;td&gt;Cortex-A76&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;76.65s&lt;/td&gt;
&lt;td&gt;12.2x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Banana Pi CM5-Pro&lt;/td&gt;
&lt;td&gt;ARM64&lt;/td&gt;
&lt;td&gt;Cortex-A53/A72&lt;/td&gt;
&lt;td&gt;8 (4+4)&lt;/td&gt;
&lt;td&gt;167.15s&lt;/td&gt;
&lt;td&gt;5.6x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Horizon X3 CM&lt;/td&gt;
&lt;td&gt;ARM64&lt;/td&gt;
&lt;td&gt;Cortex-A53&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;378.81s&lt;/td&gt;
&lt;td&gt;2.5x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Orange Pi RV2&lt;/td&gt;
&lt;td&gt;RISC-V&lt;/td&gt;
&lt;td&gt;Ky(R) X1&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;650.60s&lt;/td&gt;
&lt;td&gt;1.4x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Milk-V Mars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RISC-V&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;SiFive U74-MC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;936.18s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;baseline&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The Milk-V Mars is 15x slower than the fastest board in our fleet (Orange Pi 5 Max) and 12.2x slower than the ubiquitous Raspberry Pi 5. Even the much-maligned Horizon X3 CM with its ancient Cortex-A53 cores is 2.5x faster.&lt;/p&gt;
&lt;h4&gt;Understanding the Performance Gap&lt;/h4&gt;
&lt;p&gt;The 15x gap between the Milk-V Mars and the Orange Pi 5 Max isn't surprising when you understand the architectural differences at play:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In-order vs. out-of-order execution.&lt;/strong&gt; The U74's in-order pipeline means the CPU stalls whenever it encounters a cache miss, branch misprediction, or data dependency. Modern out-of-order cores like the Cortex-A76 can continue executing independent instructions while waiting for stalled operations to complete. For a workload like Rust compilation, which involves complex data structures, frequent branching, and irregular memory access patterns, out-of-order execution provides enormous benefits. Hennessy and Patterson's &lt;a href="https://baud.rs/Y0TnVh"&gt;Computer Architecture: A Quantitative Approach&lt;/a&gt; covers the engineering trade-offs between in-order and out-of-order pipelines in detail.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Clock speed.&lt;/strong&gt; The U74 runs at 1.5 GHz compared to 2.4 GHz for the Cortex-A76 cores in the Pi 5 and Orange Pi 5 Max. This 1.6x frequency disadvantage compounds with the architectural differences to create a much larger effective performance gap.&lt;/p&gt;
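&lt;p&gt;A back-of-the-envelope decomposition makes the compounding concrete. Assuming both chips sustain their rated clocks for the full build (a simplification), dividing out the frequency ratio isolates the per-clock architectural deficit:&lt;/p&gt;

```python
# Split the overall gap vs. the Raspberry Pi 5 into a clock-frequency factor
# and a per-clock (IPC/microarchitecture) factor, assuming sustained clocks.
mars_time, mars_ghz = 936.18, 1.5  # Milk-V Mars
pi5_time, pi5_ghz = 76.65, 2.4     # Raspberry Pi 5

total_gap = mars_time / pi5_time       # ~12.2x overall
clock_gap = pi5_ghz / mars_ghz         # 1.6x explained by frequency alone
per_clock_gap = total_gap / clock_gap  # ~7.6x left for the architecture

print(f"overall {total_gap:.1f}x = clock {clock_gap:.1f}x * per-clock {per_clock_gap:.1f}x")
```

In other words, even at equal clocks the Cortex-A76 retires this workload roughly 7.6x faster per cycle - the combined effect of out-of-order execution, cache hierarchy, and compiler maturity.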
&lt;p&gt;&lt;strong&gt;Process technology.&lt;/strong&gt; The JH7110's 28nm fabrication limits the transistor budget available for performance-enhancing features like larger caches, more complex branch predictors, and wider execution units. Modern ARM SoCs on 8nm or smaller processes can pack significantly more logic into the same power envelope.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compiler maturity.&lt;/strong&gt; While LLVM's RISC-V backend has improved substantially, it still lacks many of the target-specific optimizations that the ARM backend has accumulated over years of development. Code generation for RISC-V may not always exploit the microarchitecture as effectively as ARM code generation exploits Cortex-A76 features.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core count.&lt;/strong&gt; The Mars has four cores versus eight on the Orange Pi 5 Max. For parallel compilation workloads, more cores translate directly to faster builds, all else being equal.&lt;/p&gt;
&lt;h3&gt;RISC-V vs. RISC-V: Mars vs. Orange Pi RV2&lt;/h3&gt;
&lt;p&gt;Perhaps the most interesting comparison is between the two RISC-V boards in our fleet. The Orange Pi RV2 with its 8-core Ky(R) X1 processor compiled our benchmark in 650.60 seconds - 1.44x faster than the Milk-V Mars's 936.18 seconds.&lt;/p&gt;
&lt;p&gt;This 44% performance advantage for the Orange Pi RV2 can be attributed to two factors:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Double the cores.&lt;/strong&gt; The Ky X1 has eight cores versus the U74's four. For parallel compilation, this provides a near-linear speedup for the parallel portions of the build, though Amdahl's Law limits the overall benefit due to serial bottlenecks in linking and code generation.&lt;/p&gt;
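&lt;p&gt;Amdahl's Law can be sketched in a few lines. The parallel fraction below is an illustrative assumption, not a measured value for this build:&lt;/p&gt;

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n) for parallel fraction p.
def amdahl(p, n):
    return 1 / ((1 - p) + p / n)

p = 0.85  # assumed parallel fraction of the build (illustrative only)
four_core = amdahl(p, 4)   # ~2.76x over one core
eight_core = amdahl(p, 8)  # ~3.90x over one core

print(f"8 cores vs 4 cores: {eight_core / four_core:.2f}x")  # 1.41x
```

With that assumed fraction, doubling the core count buys only about a 1.41x speedup, strikingly close to the observed 1.44x gap, though per-core differences also contribute.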
&lt;p&gt;&lt;strong&gt;Newer core design.&lt;/strong&gt; The Ky X1 represents a more recent RISC-V core design than the U74, likely incorporating microarchitectural improvements that boost instructions-per-clock (IPC). While details about the Ky X1's pipeline are scarce, its performance-per-core appears to be modestly better than the U74's.&lt;/p&gt;
&lt;p&gt;However, both RISC-V boards exist in the same performance tier - dramatically slower than ARM and x86 alternatives. The 286-second gap between the Mars and the RV2 is notable but pales in comparison to the 874-second gap between the Mars and the Orange Pi 5 Max. Both RISC-V platforms are firmly in "pioneer hardware" territory, demonstrating architectural viability rather than competitive performance.&lt;/p&gt;
&lt;h3&gt;The SiFive Legacy: Historical Context&lt;/h3&gt;
&lt;p&gt;It's worth acknowledging what the SiFive U74 represents historically. SiFive, founded in 2015 by the creators of the RISC-V instruction set at UC Berkeley, was the first company to offer commercial RISC-V cores. The U74 was one of its earliest application-class cores, following the U54 that powered the HiFive Unleashed, and it was designed at a time when RISC-V was still primarily an academic project.&lt;/p&gt;
&lt;p&gt;The HiFive Unmatched, built around the FU740 - the JH7110's direct predecessor, using the same U74 cores - was one of the first RISC-V boards capable of running a full Linux desktop. It proved that RISC-V could work for general-purpose computing. The Milk-V Mars, using the JH7110 with U74-MC cores, is essentially a cost-reduced descendant of that pioneering effort.&lt;/p&gt;
&lt;p&gt;In this light, the U74's performance is less a failure and more a measure of how far the RISC-V ecosystem has come - and how far it still needs to go. The U74 was designed for correctness and compatibility, not performance leadership. Newer SiFive cores like the P670 and P870 incorporate out-of-order execution, wider issue widths, and more sophisticated branch prediction - features that should dramatically close the gap with ARM. We haven't yet seen these cores in affordable SBC form factors, but when they arrive, the comparison should be far more competitive.&lt;/p&gt;
&lt;h3&gt;Practical Considerations&lt;/h3&gt;
&lt;h4&gt;Who Should Buy the Milk-V Mars?&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;RISC-V learners and enthusiasts.&lt;/strong&gt; If you want to understand RISC-V at a hardware level - boot Linux, explore the ISA, compile software, and experiment with the architecture - the Mars is a low-cost entry point. The 8GB of RAM is generous, and the Debian software environment is functional enough for exploration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Embedded RISC-V developers.&lt;/strong&gt; If you're developing software that will eventually run on RISC-V embedded systems, the Mars provides a Linux-capable development and testing platform. Cross-compilation workflows (develop on x86, test on RISC-V) are viable, and native compilation works - it just takes patience.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SBC collectors and architecture enthusiasts.&lt;/strong&gt; The Mars represents an important moment in RISC-V history. By bringing SiFive's pioneering application-class cores to an affordable SBC, it holds both practical and historical value for anyone tracking the evolution of processor architectures.&lt;/p&gt;
&lt;h4&gt;Who Should Look Elsewhere?&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Anyone needing competitive performance.&lt;/strong&gt; If compilation speed, application responsiveness, or computational throughput matter for your use case, the &lt;a href="https://baud.rs/idapji"&gt;Raspberry Pi 5&lt;/a&gt; delivers 12x better performance at a similar price point. There is no workload where the Mars outperforms modern ARM alternatives.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Production deployments.&lt;/strong&gt; The combination of slow CPU performance, older kernel, and limited software ecosystem makes the Mars unsuitable for production use cases. Even RISC-V-specific production deployments would benefit from waiting for next-generation hardware.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI and machine learning.&lt;/strong&gt; The Mars lacks any AI acceleration hardware, and its CPU performance makes even lightweight inference workloads impractical. Our experience running TinyLlama on the Orange Pi RV2 at 0.44 tokens per second suggests the Mars would be even slower given its lower core count and comparable per-core performance.&lt;/p&gt;
&lt;h3&gt;The Bigger Picture: RISC-V's Two-Board Story&lt;/h3&gt;
&lt;p&gt;Having now benchmarked two RISC-V single board computers, a pattern emerges. Both the Milk-V Mars (SiFive U74, 4 cores, 936s) and the Orange Pi RV2 (Ky X1, 8 cores, 651s) occupy a performance tier roughly 10-15x slower than mainstream ARM platforms. This isn't a coincidence - it reflects the current state of RISC-V application processor development.&lt;/p&gt;
&lt;p&gt;The good news is that this gap isn't architectural. Nothing about the RISC-V instruction set prevents high-performance implementations. SiFive's P-series cores, Ventana's Veyron, and Alibaba's Xuantie C920 all promise ARM-competitive performance. The gap exists because the affordable SBC market is currently served by first-generation cores on older process nodes with immature compiler support.&lt;/p&gt;
&lt;p&gt;The bad news is that closing this gap requires simultaneous progress on multiple fronts: newer core designs with out-of-order execution, migration to modern process nodes (7nm and below), compiler optimizations specific to new microarchitectures, and operating system tuning for RISC-V hardware features. This is years of work, and it's unclear whether the SBC market generates enough volume to justify the investment.&lt;/p&gt;
&lt;p&gt;For now, RISC-V single board computers remain in the "interesting to explore, impractical to deploy" category. The Milk-V Mars embodies this perfectly: a board that works, runs real software, and delivers real results - just not quickly enough to compete with the ARM and x86 boards sitting next to it on the shelf.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;The Milk-V Mars is a functional, stable, and affordable RISC-V single board computer that delivers on its basic promise: a Linux-capable platform built on SiFive's pioneering U74 cores. Its 8GB of RAM, Gigabit Ethernet, and standard Debian environment make it a reasonable development platform for RISC-V exploration.&lt;/p&gt;
&lt;p&gt;Its performance, however, tells the honest story of where RISC-V stands in early 2026. At 936 seconds average for our Rust compilation benchmark, the Mars is the slowest board we've tested - 15x slower than the Orange Pi 5 Max, 12.2x slower than the Raspberry Pi 5, and 1.4x slower than the Orange Pi RV2. These numbers reflect the U74's in-order pipeline, 1.5 GHz clock speed, and 28nm process node, compounded by RISC-V's still-maturing compiler toolchain.&lt;/p&gt;
&lt;p&gt;The remarkable consistency across benchmark runs (0.3% standard deviation) shows that the hardware is well-behaved and thermally stable - it's simply not fast. The board does exactly what it's supposed to do; it just does it slowly by modern standards.&lt;/p&gt;
&lt;p&gt;For RISC-V enthusiasts, the Milk-V Mars offers an affordable window into the architecture that may eventually reshape computing. For everyone else, the Raspberry Pi 5 remains the obvious choice for general-purpose single board computing. The Mars is a board for the curious and the patient - those willing to trade performance today for a front-row seat to an architectural revolution that's still finding its footing.&lt;/p&gt;
&lt;p&gt;The question isn't whether RISC-V will eventually match ARM and x86 performance in affordable SBCs. It will. The question is when, and whether boards like the Milk-V Mars will be remembered as charming relics of the early days or as stepping stones that helped build the ecosystem that made RISC-V competitive. Either way, they deserve a place in the story.&lt;/p&gt;
&lt;h3&gt;Specifications Summary&lt;/h3&gt;
&lt;p&gt;Processor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;StarFive JH7110 (28nm process)&lt;/li&gt;
&lt;li&gt;4x SiFive U74-MC @ 1.5 GHz (RV64GC: rv64imafdc)&lt;/li&gt;
&lt;li&gt;1x SiFive S7 monitor core&lt;/li&gt;
&lt;li&gt;MMU: Sv39&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Memory &amp;amp; Storage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;8GB LPDDR4 RAM (2GB and 4GB options also available)&lt;/li&gt;
&lt;li&gt;microSD card slot&lt;/li&gt;
&lt;li&gt;eMMC module support&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Video:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HDMI output&lt;/li&gt;
&lt;li&gt;Hardware video decoding support&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Connectivity:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gigabit Ethernet&lt;/li&gt;
&lt;li&gt;USB 3.0 and USB 2.0 ports&lt;/li&gt;
&lt;li&gt;40-pin GPIO header (Raspberry Pi compatible)&lt;/li&gt;
&lt;li&gt;I2C, SPI, UART&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Physical:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compact SBC form factor&lt;/li&gt;
&lt;li&gt;Passive cooling adequate for sustained workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Benchmark Performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rust compilation: 936.18 seconds average&lt;/li&gt;
&lt;li&gt;15.0x slower than Orange Pi 5 Max (ARM64, RK3588)&lt;/li&gt;
&lt;li&gt;12.2x slower than Raspberry Pi 5 (ARM64, BCM2712)&lt;/li&gt;
&lt;li&gt;5.6x slower than Banana Pi CM5-Pro (ARM64, RK3576)&lt;/li&gt;
&lt;li&gt;2.5x slower than Horizon X3 CM (ARM64, Sunrise X3)&lt;/li&gt;
&lt;li&gt;1.4x slower than Orange Pi RV2 (RISC-V, Ky X1)&lt;/li&gt;
&lt;li&gt;Standard deviation: 2.85 seconds (0.30% - excellent consistency)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Software:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Debian GNU/Linux Bookworm&lt;/li&gt;
&lt;li&gt;Linux kernel 5.15.0 (StarFive patches)&lt;/li&gt;
&lt;li&gt;Rust 1.93.1 (official rustup support for riscv64gc)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Recommendation: Suitable for RISC-V exploration and development; not competitive with ARM or x86 alternatives for performance-sensitive workloads.&lt;/p&gt;</description><category>benchmarks</category><category>hardware review</category><category>jh7110</category><category>milk-v</category><category>milk-v mars</category><category>open source hardware</category><category>risc v</category><category>risc-v sbc</category><category>rust</category><category>sifive</category><category>single board computers</category><category>starfive</category><category>u74</category><guid>https://tinycomputers.io/posts/milk-v-mars-review.html</guid><pubDate>Fri, 13 Feb 2026 21:30:00 GMT</pubDate></item><item><title>Introducing Lattice: A Crystallization-Based Programming Language</title><link>https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;p&gt;Most programming languages treat mutability as a binary property. A variable is either mutable or it's not. You declare it one way, and that's the end of the story. Rust adds nuance with its ownership and borrowing model, and functional languages sidestep the question by making everything immutable by default, but the fundamental framing remains the same: mutability is a static attribute decided at declaration time.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://baud.rs/4ysPkF"&gt;Lattice&lt;/a&gt; takes a different approach. In Lattice, mutability is a &lt;em&gt;phase&lt;/em&gt; — a state that a value passes through over its lifetime, like matter transitioning between liquid and solid. A value starts as mutable &lt;strong&gt;flux&lt;/strong&gt;, and when you're done shaping it, you &lt;strong&gt;freeze&lt;/strong&gt; it into immutable &lt;strong&gt;fix&lt;/strong&gt;. Need to modify it again? &lt;strong&gt;Thaw&lt;/strong&gt; it back to flux. Want to build something complex and immutable in one shot? Use a &lt;strong&gt;forge&lt;/strong&gt; block — a controlled mutation zone whose output automatically crystallizes.&lt;/p&gt;
&lt;p&gt;This isn't just a metaphor. The phase system is woven through Lattice's entire runtime, from its type representation to its memory management architecture. This post is a deep dive into what that means, how it works at the implementation level, and why it represents a genuinely different way of thinking about the relationship between mutability and memory.&lt;/p&gt;
&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/introducing-lattice-a-crystallization-based-programming-language_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;36 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;The Problem Lattice Solves&lt;/h3&gt;
&lt;p&gt;Every language designer eventually confronts the same tension: programmers need mutability to build things, but mutability is the source of most bugs. Shared mutable state causes race conditions. Unexpected mutation causes aliasing bugs. Mutable references that outlive their owners cause use-after-free errors.&lt;/p&gt;
&lt;p&gt;Different languages resolve this tension in different ways, and each approach carries trade-offs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Garbage-collected languages&lt;/strong&gt; (Java, Python, Go, JavaScript) let you mutate freely and use a garbage collector to clean up. This is convenient but pushes the cost to runtime — GC pauses, unpredictable memory usage, and no compile-time guarantees about who can modify what. You gain ease of use but lose control.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://baud.rs/gSnSwR"&gt;Rust's ownership model&lt;/a&gt;&lt;/strong&gt; provides compile-time guarantees through a sophisticated borrow checker. You can have either one mutable reference or many immutable references, but not both. This eliminates data races at compile time, but the cost is complexity — the borrow checker is notoriously difficult for newcomers, lifetime annotations add syntactic weight, and certain patterns (like self-referential structs or graph structures) require unsafe escape hatches.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Functional languages&lt;/strong&gt; (Haskell, Erlang, Clojure) default to immutability and model mutation through controlled mechanisms like monads, processes, or atoms. This produces correct programs but can feel unnatural for inherently stateful problems, and persistent data structures carry performance overhead.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;C and C++&lt;/strong&gt; give you full manual control and zero overhead, at the cost of memory safety. &lt;code&gt;const&lt;/code&gt; in C is advisory at best — you can cast it away, and the compiler won't stop you from freeing memory that someone else is still using.&lt;/p&gt;
&lt;p&gt;Lattice's phase system is an attempt to find a different point in this design space. The core insight is that in most programs, values have a natural lifecycle: they're constructed (requiring mutation), then used (requiring stability), and occasionally reconstructed (requiring mutation again). The phase system makes this lifecycle explicit and enforceable.&lt;/p&gt;
&lt;h3&gt;The Phase Model&lt;/h3&gt;
&lt;p&gt;Lattice has three binding keywords that correspond to mutability phases:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;flux&lt;/code&gt;&lt;/strong&gt; declares a mutable binding. A flux variable can be reassigned, and its contents can be modified in place. This is where you do your work — building arrays, populating maps, incrementing counters.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux counter = 0
counter += 1
counter += 1
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;fix&lt;/code&gt;&lt;/strong&gt; declares an immutable binding. A fix variable cannot be reassigned, and its contents cannot be modified. Attempting to mutate a fix binding is an error.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;fix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;freeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.14159&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// pi = 2.0  -- error: cannot assign to crystal binding&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;let&lt;/code&gt;&lt;/strong&gt; is the inferred form (available in casual mode). It doesn't enforce a phase — the value keeps whatever phase tag it already has.&lt;/p&gt;
&lt;p&gt;The transitions between phases are explicit function calls:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;freeze(value)&lt;/code&gt;&lt;/strong&gt; transitions a value from fluid to crystal. In strict mode, this is a &lt;em&gt;consuming&lt;/em&gt; operation — the original binding is removed from the environment. You can't accidentally keep a mutable reference to something you've declared immutable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;thaw(value)&lt;/code&gt;&lt;/strong&gt; creates a mutable deep clone of a crystal value. The original remains frozen; you get a completely independent mutable copy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;clone(value)&lt;/code&gt;&lt;/strong&gt; creates a deep copy without changing phase.&lt;/li&gt;
&lt;/ul&gt;
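&lt;p&gt;For readers more at home in mainstream languages, the transitions can be approximated in Python. This is an analogy to illustrate the semantics, not how Lattice implements them: &lt;code&gt;freeze&lt;/code&gt; becomes a read-only view, &lt;code&gt;thaw&lt;/code&gt; a mutable deep clone:&lt;/p&gt;

```python
# Python analogy for Lattice's freeze/thaw (illustrative, not Lattice's runtime).
from copy import deepcopy
from types import MappingProxyType

def freeze(mapping):
    # Read-only view over a private copy: later writes are rejected.
    return MappingProxyType(dict(mapping))

def thaw(frozen):
    # Independent mutable deep clone; the original stays frozen.
    return deepcopy(dict(frozen))

flux_cfg = {"host": "localhost"}
fixed_cfg = freeze(flux_cfg)

try:
    fixed_cfg["port"] = "8080"  # mutation of the frozen value fails
except TypeError:
    pass

draft = thaw(fixed_cfg)
draft["port"] = "8080"          # the thawed clone mutates freely
```

One real difference: Lattice's strict-mode &lt;code&gt;freeze&lt;/code&gt; consumes the original binding, while this Python sketch leaves the original dictionary reachable.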
&lt;p&gt;And then there's the &lt;strong&gt;&lt;code&gt;forge&lt;/code&gt;&lt;/strong&gt; block, which is perhaps the most interesting construct:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fix config = forge {
    flux temp = Map::new()
    temp.set("host", "localhost")
    temp.set("port", "8080")
    temp.set("debug", "true")
    freeze(temp)
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A forge block is a scoped computation whose result is automatically frozen. Inside the forge, you can use flux variables and mutate freely. But whatever value the block produces comes out crystallized. The temporary mutable state is gone — only the finished, immutable result survives.&lt;/p&gt;
&lt;p&gt;This addresses a real pain point. In functional languages, building a complex immutable data structure often requires awkward chains of constructor calls or builder patterns. In Lattice, you just... build it, mutably, in a forge block, and it comes out frozen. The forge acknowledges that construction is inherently a mutable process, while insisting that the &lt;em&gt;result&lt;/em&gt; of construction should be stable.&lt;/p&gt;
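&lt;p&gt;The same analogy extends to &lt;code&gt;forge&lt;/code&gt;: run a builder against scoped mutable state, then hand back only a crystallized result. Again, a Python sketch of the idea rather than Lattice's implementation:&lt;/p&gt;

```python
# Python sketch of a forge block (illustrative): mutate inside, freeze on exit.
from types import MappingProxyType

def forge(build):
    temp = {}                      # scoped mutable state ("flux")
    build(temp)                    # arbitrary in-place construction
    return MappingProxyType(temp)  # only the frozen result escapes

config = forge(lambda m: m.update(host="localhost", port="8080", debug="true"))

print(config["host"])  # reads succeed; any write raises TypeError
```

Unlike Lattice, Python cannot stop the builder from stashing a reference to the temporary state; the real forge enforces that boundary in the language itself.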
&lt;h3&gt;Under the Hood: How the Phase System Maps to Memory&lt;/h3&gt;
&lt;p&gt;Lattice is implemented as a tree-walking interpreter in C — roughly 6,000 lines across the lexer, parser, phase checker, and evaluator. The implementation reveals some interesting design decisions about how phase semantics interact with memory management.&lt;/p&gt;
&lt;h4&gt;Value Representation&lt;/h4&gt;
&lt;p&gt;Every runtime value in Lattice is a &lt;code&gt;LatValue&lt;/code&gt; struct — a tagged union carrying a type tag, a phase tag, and the value payload:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;ValueType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// VAL_INT, VAL_STR, VAL_ARRAY, VAL_MAP, ...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;PhaseTag&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;phase&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// VTAG_FLUID, VTAG_CRYSTAL, VTAG_UNPHASED&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;union&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Primitive values (integers, floats, booleans) live inline in the union — no heap allocation. Compound values (strings, arrays, structs, maps, closures) own heap-allocated payloads. A string holds a heap-allocated character buffer. An array holds a &lt;code&gt;malloc&lt;/code&gt;'d element buffer. A map holds a pointer to an open-addressing hash table.&lt;/p&gt;
&lt;h4&gt;Deep-Clone-on-Read: Value Semantics Without a Compiler&lt;/h4&gt;
&lt;p&gt;The most consequential design decision in Lattice's runtime is that &lt;strong&gt;every variable read produces a deep clone&lt;/strong&gt;. When you access a variable, the environment doesn't hand you a reference to the stored value — it hands you a complete, independent copy.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;env_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lat_map_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value_deep_clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// always a fresh copy&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is expensive. Every array access clones the entire array. Every map read clones every key-value pair. But it eliminates an entire class of bugs. There is no aliasing in Lattice. Two variables never point to the same underlying memory. When you pass a map to a function, the function gets its own copy — mutations inside the function don't leak back to the caller. When you assign an array to a new variable, you get two independent arrays.&lt;/p&gt;
&lt;p&gt;This is the implementation strategy that makes Lattice's arrays and maps value types. In most languages, objects and collections are reference types — assigning them to a new variable creates a new reference to the same data. In Lattice, assignment means duplication. This is closer to how values work in mathematics than how they work in most programming languages.&lt;/p&gt;
&lt;p&gt;For in-place mutation within a scope (like &lt;code&gt;array.push()&lt;/code&gt; or &lt;code&gt;map.set()&lt;/code&gt;), Lattice uses a separate &lt;code&gt;resolve_lvalue()&lt;/code&gt; mechanism that obtains a direct mutable pointer into the environment's storage, bypassing the deep clone. This means local mutations are efficient — it's only cross-scope communication that pays the cloning cost.&lt;/p&gt;
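&lt;p&gt;The split between the two access paths can be sketched like this — a toy model with hypothetical names, not Lattice's actual code:&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy environment: reads hand back a copy, while the lvalue-resolution
   path hands back a direct mutable pointer, skipping the deep clone. */
typedef struct { char name[16]; int value; } Slot;
typedef struct { Slot slots[8]; size_t count; } Env;

/* Read path: caller receives an independent copy of the value. */
int env_read(const Env *env, const char *name, int *out) {
    for (size_t i = 0; i < env->count; i++)
        if (strcmp(env->slots[i].name, name) == 0) {
            *out = env->slots[i].value;
            return 1;
        }
    return 0;
}

/* Mutation path: caller receives a pointer into the environment's storage. */
int *resolve_lvalue(Env *env, const char *name) {
    for (size_t i = 0; i < env->count; i++)
        if (strcmp(env->slots[i].name, name) == 0)
            return &env->slots[i].value;
    return NULL;
}
```

&lt;p&gt;Mutating the copy returned by the read path never touches the stored value; only the lvalue path writes through.&lt;/p&gt;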
&lt;h4&gt;The Dual Heap Architecture&lt;/h4&gt;
&lt;p&gt;Lattice's memory subsystem uses what the implementation calls a &lt;code&gt;DualHeap&lt;/code&gt; — two separate allocation regions with different management strategies:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The FluidHeap&lt;/strong&gt; manages mutable data using a mark-and-sweep garbage collector. It maintains a linked list of all heap allocations, with a mark bit on each. When memory pressure crosses a threshold (1 MB by default), the GC walks all reachable values from the environment and a shadow root stack, marks what's alive, and sweeps everything else.&lt;/p&gt;
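&lt;p&gt;In miniature, that bookkeeping is an intrusive linked list of headers with a mark bit each — the following is a sketch of the idea, not Lattice's implementation:&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy intrusive allocation list: every heap object carries a link
   and a mark bit in its header. */
typedef struct Obj {
    struct Obj *next;
    bool marked;
    int payload;
} Obj;

static Obj *all_objs = NULL;

Obj *heap_alloc(int payload) {
    Obj *o = malloc(sizeof *o);
    o->next = all_objs;
    o->marked = false;
    o->payload = payload;
    all_objs = o;
    return o;
}

void mark(Obj *o) { if (o) o->marked = true; }

/* Sweep: free everything unmarked, clear marks on survivors.
   Returns how many objects were freed. */
size_t sweep(void) {
    size_t freed = 0;
    Obj **link = &all_objs;
    while (*link) {
        Obj *o = *link;
        if (o->marked) { o->marked = false; link = &o->next; }
        else           { *link = o->next; free(o); freed++; }
    }
    return freed;
}
```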
&lt;p&gt;&lt;strong&gt;The RegionManager&lt;/strong&gt; manages immutable data using arena-based regions. Each freeze creates a new region backed by a page-based arena — a linked list of 4 KB pages with bump allocation. When a value is frozen, it is deep-cloned entirely into the region's arena, giving crystal data cache locality and enabling O(1) bulk deallocation when the region becomes unreachable. Regions are collected during GC cycles based on reachability analysis.&lt;/p&gt;
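&lt;p&gt;A bump-allocating page arena is only a few lines of C. This sketch (hypothetical names, no alignment handling) shows why bulk deallocation is cheap — freeing the region is one walk over the page list, regardless of how many values were allocated into it:&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096  /* matches the 4 KB pages described above */

/* A region is a linked list of pages with bump allocation. */
typedef struct Page {
    struct Page *next;
    size_t used;
    unsigned char data[PAGE_SIZE];
} Page;

typedef struct { Page *head; } Region;

/* Bump-allocate n bytes (sketch assumes n <= PAGE_SIZE). */
void *region_alloc(Region *r, size_t n) {
    if (!r->head || r->head->used + n > PAGE_SIZE) {
        Page *p = malloc(sizeof *p);   /* start a fresh page */
        p->next = r->head;
        p->used = 0;
        r->head = p;
    }
    void *out = r->head->data + r->head->used;
    r->head->used += n;
    return out;
}

/* Bulk deallocation: one pass over the page list. */
void region_free(Region *r) {
    while (r->head) {
        Page *p = r->head;
        r->head = p->next;
        free(p);
    }
}
```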
&lt;p&gt;The key insight here is that &lt;strong&gt;immutable and mutable data have different lifecycle characteristics&lt;/strong&gt; and benefit from different management strategies. Mutable data changes frequently and has unpredictable lifetimes — mark-and-sweep handles this well. Immutable data, once created, never changes and tends to be long-lived — arena-based region allocation is more efficient for this pattern, as it enables bulk deallocation and better cache locality.&lt;/p&gt;
&lt;p&gt;This is conceptually similar to generational garbage collection (where young objects are collected differently from old objects), but the split is based on &lt;em&gt;mutability&lt;/em&gt; rather than &lt;em&gt;age&lt;/em&gt;. Lattice's phase tags provide the runtime with information that generational GCs have to infer statistically.&lt;/p&gt;
&lt;p&gt;The following chart shows how this plays out in practice across several benchmark programs. Fluid peak memory represents the high-water mark of the GC-managed heap, while crystal arena data shows how much data has been frozen into arena-backed regions:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Fluid Peak vs Crystal Arena Data" src="https://tinycomputers.io/images/lattice_fluid_vs_crystal.png"&gt;&lt;/p&gt;
&lt;h4&gt;Freeze and Thaw at the Memory Level&lt;/h4&gt;
&lt;p&gt;When you call &lt;code&gt;freeze()&lt;/code&gt; on a value, the runtime creates a new crystal region with a fresh arena, deep-clones the entire value tree into it, sets the &lt;code&gt;phase&lt;/code&gt; field to &lt;code&gt;VTAG_CRYSTAL&lt;/code&gt; on every node, and frees the original fluid heap pointers. The data physically migrates from the fluid heap into arena pages — freeze is a move operation, not just a metadata flip. This gives frozen data cache locality within contiguous arena pages and completely separates it from the garbage-collected fluid heap.&lt;/p&gt;
&lt;p&gt;But in strict mode, &lt;code&gt;freeze()&lt;/code&gt; is also a &lt;em&gt;consuming&lt;/em&gt; operation. It removes the original binding from the environment and returns the frozen value. This is effectively a move — after &lt;code&gt;freeze(x)&lt;/code&gt;, there is no &lt;code&gt;x&lt;/code&gt; anymore. You can bind the result to a new name (&lt;code&gt;fix y = freeze(x)&lt;/code&gt;), but the mutable original is gone. This prevents a common bug pattern where you freeze a value but accidentally keep mutating the original through a still-live reference.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;thaw()&lt;/code&gt; is more expensive: it performs a complete deep clone of the crystal value and then recursively sets all phase tags to &lt;code&gt;VTAG_FLUID&lt;/code&gt;. The original crystal value is untouched — you get a completely independent mutable copy. This is consistent with the principle that crystal values are permanent. Thawing doesn't melt the original; it creates a new fluid copy.&lt;/p&gt;
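&lt;p&gt;Ignoring the arena migration and the binding removal, the retagging half of freeze and thaw reduces to a recursive clone that stamps a new phase on every node — a stripped-down model with hypothetical structure names:&lt;/p&gt;

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { VTAG_FLUID, VTAG_CRYSTAL } Phase;

/* Toy tree-shaped value; the phase tag is set recursively on clone. */
typedef struct Node {
    Phase phase;
    int data;
    struct Node *child;
} Node;

Node *clone_with_phase(const Node *n, Phase p) {
    if (!n) return NULL;
    Node *c = malloc(sizeof *c);
    c->phase = p;
    c->data = n->data;
    c->child = clone_with_phase(n->child, p);
    return c;
}

/* freeze: deep clone, every node tagged crystal.
   thaw: deep clone, every node tagged fluid; the crystal original
   is untouched. */
Node *freeze_value(const Node *n) { return clone_with_phase(n, VTAG_CRYSTAL); }
Node *thaw_value(const Node *n)   { return clone_with_phase(n, VTAG_FLUID); }
```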
&lt;p&gt;In practice, both operations are fast. Across the benchmark suite, freeze and thaw costs stay well under a millisecond even for complex data structures:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Freeze/Thaw Cost by Benchmark" src="https://tinycomputers.io/images/lattice_freeze_thaw_timing.png"&gt;&lt;/p&gt;
&lt;p&gt;The number and type of phase transitions vary by workload. Some benchmarks are freeze-heavy (building immutable snapshots), others are thaw-heavy (repeatedly modifying frozen state), and some use deep clones for full value duplication:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Phase Transitions by Benchmark" src="https://tinycomputers.io/images/lattice_phase_transitions.png"&gt;&lt;/p&gt;
&lt;h3&gt;How This Compares to Existing Systems&lt;/h3&gt;
&lt;h4&gt;vs. Rust's Ownership and Borrowing&lt;/h4&gt;
&lt;p&gt;Rust solves the mutability problem at compile time through static analysis. The borrow checker ensures that mutable references are unique and that immutable references don't coexist with mutable ones. This gives Rust zero-runtime-cost safety guarantees that Lattice can't match.&lt;/p&gt;
&lt;p&gt;But Rust's approach operates at the reference level — it tracks who has access to data, not the data's intrinsic state. You can have an &lt;code&gt;&amp;amp;mut&lt;/code&gt; to data that is conceptually "done being built," or an &lt;code&gt;&amp;amp;&lt;/code&gt; to data that you wish you could modify. The permission model and the data lifecycle are orthogonal.&lt;/p&gt;
&lt;p&gt;Lattice's phase system operates on the data itself. A frozen value &lt;em&gt;is&lt;/em&gt; immutable — not because the type system prevents you from obtaining a mutable reference, but because the value has transitioned to a state where mutation doesn't apply. This is a simpler mental model at the cost of runtime enforcement rather than compile-time proof.&lt;/p&gt;
&lt;p&gt;The consuming &lt;code&gt;freeze()&lt;/code&gt; in strict mode is reminiscent of Rust's move semantics, where using a value after moving it is a compile error. Lattice achieves a similar effect at runtime — freeze consumes the binding, preventing further mutable access. It's not as strong a guarantee (runtime vs. compile time), but it's the same intuition: once you've declared something immutable, the mutable version shouldn't exist anymore.&lt;/p&gt;
&lt;h4&gt;vs. Garbage Collection&lt;/h4&gt;
&lt;p&gt;Traditional garbage collectors (Java, Go, Python) are phase-agnostic. They track reachability, not mutability. A &lt;code&gt;final&lt;/code&gt; field in Java prevents reassignment but doesn't inform the GC. An immutable object in Python is collected the same way as a mutable one.&lt;/p&gt;
&lt;p&gt;Lattice's dual-heap architecture uses phase information to make better allocation decisions. Crystal values go into arena-managed memory with reachability-based collection. Fluid values go into a mark-and-sweep heap. The GC can reason about immutable data more efficiently because it &lt;em&gt;knows&lt;/em&gt; the data won't change — it doesn't need to re-scan crystal regions for updated references.&lt;/p&gt;
&lt;p&gt;This is a form of phase-informed memory management that, to my knowledge, doesn't have a direct precedent in mainstream languages. The closest analogy might be Clojure's persistent data structures, which are structurally shared and immutable, but Clojure doesn't use this information to drive its garbage collection strategy differently.&lt;/p&gt;
&lt;h4&gt;vs. Functional Immutability&lt;/h4&gt;
&lt;p&gt;Haskell and other pure functional languages are immutable by default, with mutation confined to monads (&lt;code&gt;IORef&lt;/code&gt;, &lt;code&gt;STRef&lt;/code&gt;) or similar controlled mechanisms. This is elegant but can be awkward for imperative algorithms where you need to build something up step by step.&lt;/p&gt;
&lt;p&gt;Lattice's forge blocks address this directly. Instead of threading a builder through a chain of pure function calls, you write imperative mutation inside a forge and get an immutable result. This acknowledges that construction and consumption are different activities that benefit from different mutability guarantees.&lt;/p&gt;
&lt;p&gt;The philosophical difference is that functional languages treat immutability as the default and mutation as the exception. Lattice treats mutability as a &lt;em&gt;phase&lt;/em&gt; that values pass through — both flux and fix are natural, expected states, and the language provides explicit tools for transitioning between them.&lt;/p&gt;
&lt;h4&gt;vs. C/C++ Manual Memory Management&lt;/h4&gt;
&lt;p&gt;C gives you &lt;code&gt;malloc&lt;/code&gt; and &lt;code&gt;free&lt;/code&gt; and wishes you the best. C++ adds RAII, smart pointers, and &lt;code&gt;const&lt;/code&gt; correctness, but &lt;code&gt;const&lt;/code&gt; in both languages is fundamentally a compiler hint — it can be cast away, and the runtime has no awareness of it. A &lt;code&gt;const&lt;/code&gt; pointer in C doesn't prevent someone else from modifying the data through a non-const pointer to the same memory. The &lt;code&gt;const&lt;/code&gt; is a property of the &lt;em&gt;reference&lt;/em&gt;, not the &lt;em&gt;data&lt;/em&gt;.&lt;/p&gt;
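&lt;p&gt;The reference-not-data nature of &lt;code&gt;const&lt;/code&gt; is easy to demonstrate in a few lines of C — no cast required, just a second path to the same memory:&lt;/p&gt;

```c
#include <assert.h>

/* `const` restricts this particular pointer, not the data it points at. */
void reader(const int *p) {
    /* *p = 5; would not compile here... */
    (void)p;
}

int const_view_demo(void) {
    int x = 1;
    const int *view = &x;   /* an "immutable" view of x */
    reader(view);
    x = 2;                  /* ...but x itself is still mutable */
    return *view;           /* the data behind the const view has changed */
}
```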
&lt;p&gt;Lattice's phase tags live on the data itself. When a value is crystal, it's crystal regardless of how you access it. There's no way to "cast away" a freeze — the only path back to mutability is &lt;code&gt;thaw()&lt;/code&gt;, which creates a new independent copy. This is a stronger guarantee than &lt;code&gt;const&lt;/code&gt; provides, because it operates on values rather than references.&lt;/p&gt;
&lt;p&gt;C++ move semantics share DNA with Lattice's consuming &lt;code&gt;freeze()&lt;/code&gt; in strict mode. A &lt;code&gt;std::move&lt;/code&gt; in C++ transfers ownership of resources, leaving the source in a valid-but-unspecified state. Lattice's strict freeze does something similar — it removes the binding entirely, ensuring the mutable version ceases to exist. But where C++ moves are primarily about avoiding copies for performance, Lattice's consuming freeze is about semantic correctness — ensuring that the transition from mutable to immutable is clean and total. Scott Meyers' &lt;a href="https://baud.rs/OK4IwA"&gt;Effective Modern C++&lt;/a&gt; remains the best guide to understanding these move semantics and other modern C++ patterns that Lattice's design draws from.&lt;/p&gt;
&lt;h4&gt;The Static Phase Checker&lt;/h4&gt;
&lt;p&gt;It's worth noting that Lattice doesn't rely solely on runtime enforcement. Before any code executes, a static phase checker walks the AST and catches phase violations at analysis time. This checker maintains its own scope stack mapping variable names to their declared phases and rejects programs that attempt to reassign crystal bindings, freeze already-frozen values, thaw already-fluid values, or use &lt;code&gt;let&lt;/code&gt; in strict mode where an explicit phase declaration is required.&lt;/p&gt;
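&lt;p&gt;The core of such a checker is small. Here is a sketch of the scope map and the reassignment rule, with hypothetical names — the real checker also handles the freeze/thaw and &lt;code&gt;let&lt;/code&gt; violations listed above:&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Names bound to their declared phases; reassignment of a fix (crystal)
   binding is rejected before any code runs. */
typedef enum { PH_FLUX, PH_FIX } DeclPhase;
typedef struct { char name[16]; DeclPhase phase; } Binding;
typedef struct { Binding binds[16]; int count; } CheckerScope;

void declare(CheckerScope *s, const char *name, DeclPhase p) {
    strncpy(s->binds[s->count].name, name, 15);
    s->binds[s->count].name[15] = '\0';
    s->binds[s->count].phase = p;
    s->count++;
}

/* Returns false (a phase violation) when the target was declared fix,
   or was never declared at all. */
bool check_assign(const CheckerScope *s, const char *name) {
    for (int i = s->count - 1; i >= 0; i--)
        if (strcmp(s->binds[i].name, name) == 0)
            return s->binds[i].phase == PH_FLUX;
    return false;
}
```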
&lt;p&gt;The static checker also enforces spawn boundaries — if Lattice's concurrency model (&lt;code&gt;spawn&lt;/code&gt;) is used, fluid bindings from the enclosing scope cannot be captured across the spawn point. Only crystal values can be shared into spawned computations. This is checked &lt;em&gt;before&lt;/em&gt; evaluation begins, catching potential data races at parse time rather than at runtime.&lt;/p&gt;
&lt;p&gt;This two-layer approach — static checking before evaluation, runtime enforcement during — provides confidence without requiring a full type system or borrow checker. It catches the obvious mistakes early and enforces the subtle invariants at runtime. For the theoretical foundations behind this kind of phase-based type analysis, Benjamin Pierce's &lt;a href="https://baud.rs/oMfDwe"&gt;Types and Programming Languages&lt;/a&gt; is the standard reference.&lt;/p&gt;
&lt;h3&gt;The Language Beyond Phases&lt;/h3&gt;
&lt;p&gt;While the phase system is Lattice's defining feature, the language has other characteristics worth noting.&lt;/p&gt;
&lt;p&gt;Structs in Lattice can hold closures as fields, enabling object-like patterns without a class system. A struct with function fields and a &lt;code&gt;self&lt;/code&gt; parameter in each closure behaves much like an object with methods — but the data flow is explicit, and there's no hidden &lt;code&gt;this&lt;/code&gt; pointer or vtable dispatch. When a closure captures &lt;code&gt;self&lt;/code&gt;, it receives a deep clone, ensuring that method calls don't produce spooky action at a distance.&lt;/p&gt;
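&lt;p&gt;The same pattern translates directly to C with function pointers and an explicit &lt;code&gt;self&lt;/code&gt; argument — with the caveat that this sketch passes &lt;code&gt;self&lt;/code&gt; by pointer, whereas a Lattice closure captures a deep clone:&lt;/p&gt;

```c
#include <assert.h>

/* An object-like struct: "methods" are plain function pointers taking an
   explicit self parameter — no hidden this pointer, no vtable dispatch. */
typedef struct Counter {
    int count;
    void (*incr)(struct Counter *self);
    int  (*get)(const struct Counter *self);
} Counter;

static void counter_incr(Counter *self) { self->count++; }
static int  counter_get(const Counter *self) { return self->count; }

Counter counter_new(void) {
    return (Counter){ .count = 0, .incr = counter_incr, .get = counter_get };
}
```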
&lt;p&gt;Control flow is expression-based — &lt;code&gt;if&lt;/code&gt;/&lt;code&gt;else&lt;/code&gt; blocks, &lt;code&gt;match&lt;/code&gt; expressions, and bare blocks all return values. This reduces the need for temporary variables and makes code more compositional. Error handling uses &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;catch&lt;/code&gt; blocks with explicit error values rather than exception hierarchies.&lt;/p&gt;
&lt;p&gt;The self-hosted REPL is particularly notable. Written entirely in Lattice, it demonstrates that the language is expressive enough to implement its own interactive environment — parsing multi-line input, evaluating expressions, and managing session state. Running &lt;code&gt;./clat&lt;/code&gt; without arguments drops into this REPL, while &lt;code&gt;./clat file.lat&lt;/code&gt; executes a program directly.&lt;/p&gt;
&lt;p&gt;Lattice is implemented in C with no external dependencies. The entire codebase — roughly 6,000 lines across the lexer, parser, phase checker, evaluator, and data structures — compiles with a single &lt;code&gt;make&lt;/code&gt; invocation. This is a deliberate choice. The language is meant to be small, understandable, and self-contained. You can read the entire implementation in an afternoon. If you're interested in this kind of work, Robert Nystrom's &lt;a href="https://baud.rs/uTpA6y"&gt;&lt;em&gt;Crafting Interpreters&lt;/em&gt;&lt;/a&gt; is the best practical guide to building language implementations from scratch — it covers both tree-walking interpreters and bytecode VMs, and Lattice's architecture shares several design decisions with Nystrom's Lox language. For the C implementation side, Kernighan and Ritchie's &lt;a href="https://baud.rs/71h6l3"&gt;&lt;em&gt;The C Programming Language&lt;/em&gt;&lt;/a&gt; remains the definitive reference for writing the kind of clean, minimal C that Lattice targets.&lt;/p&gt;
&lt;h3&gt;Runtime Characteristics&lt;/h3&gt;
&lt;p&gt;To understand how the dual-heap architecture behaves in practice, Lattice includes a benchmark suite that exercises different memory patterns — allocation churn, closure-heavy computation, event sourcing, freeze/thaw cycles, game state rollback, long-lived crystal data, persistent tree construction, and undo/redo stacks.&lt;/p&gt;
&lt;p&gt;The overview below shows peak RSS (resident set size) alongside the number of live crystal regions at program exit. Benchmarks that use the phase system heavily (like freeze/thaw cycles and persistent trees) maintain more live regions, while purely fluid workloads like allocation churn and closure-heavy computation have none:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Peak RSS and Crystal Regions Overview" src="https://tinycomputers.io/images/lattice_overview.png"&gt;&lt;/p&gt;
&lt;p&gt;The memory churn ratio — total bytes allocated divided by peak live bytes — reveals how aggressively each benchmark recycles memory. A high ratio means the program allocates and discards data rapidly, relying on the GC to keep the working set small. Benchmarks using crystal regions (shown in purple) tend to have lower churn because frozen data is long-lived by design:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Memory Churn Ratio" src="https://tinycomputers.io/images/lattice_churn_ratio.png"&gt;&lt;/p&gt;
&lt;h3&gt;Research Papers&lt;/h3&gt;
&lt;p&gt;For readers interested in the formal foundations and empirical analysis, two companion papers are available:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tinycomputers.io/papers/lattice_paper.pdf"&gt;The Lattice Phase System: First-Class Immutability with Dual-Heap Memory Management&lt;/a&gt;&lt;/strong&gt; — The full research paper covering the language design, formal operational semantics, six proved safety properties (phase monotonicity, value isolation, consuming freeze, forge soundness, heap separation, and thaw independence), implementation details of the dual-heap architecture, and empirical evaluation across eight benchmarks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tinycomputers.io/papers/lattice_formal_semantics.pdf"&gt;Formal Semantics of the Lattice Phase System&lt;/a&gt;&lt;/strong&gt; — A standalone formal treatment containing the complete semantic domains, static phase-checking rules, big-step operational semantics, memory model, and full proofs of all six safety theorems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Looking Forward&lt;/h3&gt;
&lt;p&gt;Lattice is at version 0.1.3, which means it's early. The dual-heap architecture is fully wired into the evaluator — freeze operations physically migrate data into arena-backed crystal regions, providing cache locality and O(1) bulk deallocation for immutable data. The mark-and-sweep GC handles fluid values, while crystal regions are collected through reachability analysis during GC cycles.&lt;/p&gt;
&lt;p&gt;The deep-clone-on-read strategy is correct but expensive. Future versions may introduce structural sharing for crystal values (since they can't be modified, sharing is safe) or copy-on-write semantics for fluid values that haven't actually been mutated. The phase tags provide the runtime with exactly the information needed to make these optimizations — which values can be shared safely, and which might change.&lt;/p&gt;
&lt;p&gt;There's also the question of concurrency. The phase system provides a natural foundation for safe concurrent programming: crystal values can be freely shared across threads (they're immutable), while fluid values are confined to their owning scope. The &lt;code&gt;spawn&lt;/code&gt; keyword exists in the parser and phase checker, with static analysis already preventing fluid bindings from crossing spawn boundaries — though concurrent execution isn't yet implemented.&lt;/p&gt;
&lt;p&gt;The source code is available on &lt;a href="https://baud.rs/fIe3gx"&gt;GitHub&lt;/a&gt; under the BSD 3-Clause license, and the project site is at &lt;a href="https://baud.rs/4ysPkF"&gt;lattice-lang.org&lt;/a&gt;. If you're interested in language design, memory management, or just want to play with a language that treats mutability as a physical process rather than a type annotation, it's worth a look.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;git clone https://github.com/ajokela/lattice.git
cd lattice &amp;amp;&amp;amp; make
./clat
&lt;/pre&gt;&lt;/div&gt;</description><category>c</category><category>immutability</category><category>interpreter</category><category>language design</category><category>lattice</category><category>memory management</category><category>mutability</category><category>phase system</category><category>programming languages</category><category>value semantics</category><guid>https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html</guid><pubDate>Tue, 10 Feb 2026 18:00:00 GMT</pubDate></item><item><title>SectorZ: A C Compiler in 733 Bytes of Z80 Assembly</title><link>https://tinycomputers.io/posts/sectorz-a-c-compiler-in-733-bytes-of-z80-assembly.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;p&gt;&lt;img src="https://tinycomputers.io/images/workbench.png" alt="A vintage green-phosphor CRT monitor displaying C source code and Z80 assembly on a 1970s electronics workbench" style="float: right; max-width: 350px; margin: 0 0 1em 1.5em; border-radius: 8px;"&gt;&lt;/p&gt;
&lt;p&gt;A friend recently brought a project called &lt;a href="https://baud.rs/sectorc"&gt;SectorC&lt;/a&gt; to my attention, and it demonstrates something remarkable: a C compiler that fits in a 512-byte x86-16 boot sector. It compiles a substantial subset of C (variables, functions, if/while, 14 binary operators, pointer dereference, inline assembly) in less space than most error messages.&lt;/p&gt;
&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/sectorz-a-c-compiler-in-733-bytes-of-z80-assembly_tts.mp3" type="audio/mpeg"&gt;
Your browser does not support the audio element.
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;19 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;I wanted to see if the same idea could work on the Z80.&lt;/p&gt;
&lt;p&gt;The Z80 is a fundamentally different machine from x86-16. It has no memory-to-memory move instructions, no string operations like &lt;code&gt;stosw&lt;/code&gt;, no segment registers that double as a free 64K hash table. It's an 8-bit processor pretending to be 16-bit through register pairs. Every operation that x86 does in one instruction tends to take two or three on Z80. So the question wasn't whether a Z80 version would be bigger (it obviously would) but whether it could stay small enough to be interesting.&lt;/p&gt;
&lt;p&gt;The answer is 733 bytes.&lt;/p&gt;
&lt;h3&gt;What It Compiles&lt;/h3&gt;
&lt;p&gt;SectorZ implements the same "Barely C" language as SectorC. All tokens must be separated by spaces, which eliminates the need for a real tokenizer. You write C, but with mandatory whitespace:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;58&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;98&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;209&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;211&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;129&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;72&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The supported feature set:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Global variable declarations (&lt;code&gt;int name ;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Function definitions (&lt;code&gt;void name ( ) { ... }&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Assignment (&lt;code&gt;x = expr ;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Function calls (&lt;code&gt;func ( ) ;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;If statements (&lt;code&gt;if ( expr ) { ... }&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;While loops (&lt;code&gt;while ( expr ) { ... }&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;14 binary operators: &lt;code&gt;+ - * &amp;amp; | ^ &amp;lt;&amp;lt; &amp;gt;&amp;gt; == != &amp;lt; &amp;gt; &amp;lt;= &amp;gt;=&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Pointer dereference for read (&lt;code&gt;* expr&lt;/code&gt;) and write (&lt;code&gt;* expr = expr ;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Address-of operator (&lt;code&gt;&amp;amp; var&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Inline machine code (&lt;code&gt;asm ( byte byte ... ) ;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Parenthesized subexpressions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No function arguments, no local variables, no return values, no preprocessor, no error checking. The programmer is trusted completely, in the grand tradition of &lt;a href="https://baud.rs/71h6l3"&gt;1970s C&lt;/a&gt;.&lt;/p&gt;
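&lt;p&gt;The mandatory-whitespace rule is what lets the "tokenizer" collapse to almost nothing: scanning to the next space always yields a complete token. Sketched here in C for clarity (SectorZ itself does this in Z80 assembly):&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* With mandatory spaces between tokens, tokenizing is just copying
   characters until the next separator. Advances *src past the token;
   returns the token length (0 at end of input). */
size_t next_token(const char **src, char *out, size_t cap) {
    const char *p = *src;
    while (*p == ' ' || *p == '\n') p++;   /* skip separators */
    size_t n = 0;
    while (*p && *p != ' ' && *p != '\n' && n + 1 < cap)
        out[n++] = *p++;
    out[n] = '\0';
    *src = p;
    return n;
}
```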
&lt;h3&gt;The Architecture Tax&lt;/h3&gt;
&lt;p&gt;SectorC fits in 512 bytes partly because x86-16 is dense. The &lt;code&gt;stosw&lt;/code&gt; instruction stores AX to &lt;code&gt;[ES:DI]&lt;/code&gt; and increments DI, all in a single byte. On the Z80, the equivalent operation (store a 16-bit value and advance the pointer) takes three bytes at minimum. SectorC uses segment registers to create a free 64K lookup table for variable and function hashing. The Z80 has no segments.&lt;/p&gt;
&lt;p&gt;This is the fundamental challenge: the Z80 instruction set is more orthogonal and regular than x86, but it pays for that regularity with verbosity. A simple "emit a 3-byte instruction" helper that writes an opcode and a 16-bit address costs 7 bytes. SectorC does the same thing with &lt;code&gt;stosw&lt;/code&gt; and a single &lt;code&gt;mov&lt;/code&gt;, effectively 4 bytes.&lt;/p&gt;
&lt;p&gt;The result is that SectorZ is larger than its x86 counterpart. But 733 bytes for a self-contained C compiler on an 8-bit processor from 1976 still feels pretty good.&lt;/p&gt;
&lt;h3&gt;Memory Layout&lt;/h3&gt;
&lt;p&gt;The compiler loads at address &lt;code&gt;$0000&lt;/code&gt; and uses the upper portion of the Z80's 64K address space for its data structures:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Address&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$0000&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;733 bytes&lt;/td&gt;
&lt;td&gt;Compiler code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$D000&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;256 bytes&lt;/td&gt;
&lt;td&gt;Function trampoline table (64 entries x 4 bytes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$D100&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;256 bytes&lt;/td&gt;
&lt;td&gt;Variable storage (128 entries x 2 bytes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$D200&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3 bytes&lt;/td&gt;
&lt;td&gt;Tokenizer state (semicolon buffer, number flag, EOF flag)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$D300+&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~11.5K&lt;/td&gt;
&lt;td&gt;Generated code output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The compiler reads source code character by character from the MC6850 ACIA serial port, compiles it to Z80 machine code starting at &lt;code&gt;$D300&lt;/code&gt;, and then calls &lt;code&gt;main()&lt;/code&gt; through the trampoline table. The entire process (read, compile, execute) happens without ever touching a disk.&lt;/p&gt;
&lt;h3&gt;Key Design Decisions&lt;/h3&gt;
&lt;h4&gt;HL as the Code Pointer&lt;/h4&gt;
&lt;p&gt;The most important register allocation decision in the whole compiler is using HL as the output pointer. Z80's &lt;code&gt;LD (HL), n&lt;/code&gt; instruction stores an immediate byte to the address in HL in just 2 bytes. The alternative, using DE with &lt;code&gt;LD A, n / LD (DE), A&lt;/code&gt;, costs 3 bytes per emit site. Since the compiler emits bytes constantly, this saves roughly 25 bytes across all the emit sequences. It does mean HL is permanently occupied, so the tokenizer has to push/pop HL around every call, but the trade-off is clearly worth it.&lt;/p&gt;
&lt;h4&gt;The &lt;code&gt;atoi&lt;/code&gt; Hash Trick&lt;/h4&gt;
&lt;p&gt;This is borrowed directly from SectorC, and it's the single cleverest idea in the whole design. The tokenizer hashes every identifier using the same algorithm as &lt;code&gt;atoi&lt;/code&gt;: &lt;code&gt;hash = hash * 10 + char&lt;/code&gt;. For numeric tokens, it subtracts &lt;code&gt;'0'&lt;/code&gt; from each character first, so the hash is the actual integer value. For identifiers, the raw ASCII values are accumulated.&lt;/p&gt;
&lt;p&gt;The key insight is that the hash doubles as a lookup key. Variable names hash to 16-bit values; the low byte (masked to even alignment) indexes into the variable table at &lt;code&gt;$D100&lt;/code&gt;. Function names hash similarly, with the low byte (masked to 4-byte alignment) indexing into the trampoline table at &lt;code&gt;$D000&lt;/code&gt;. Keywords like &lt;code&gt;if&lt;/code&gt;, &lt;code&gt;while&lt;/code&gt;, and &lt;code&gt;void&lt;/code&gt; hash to fixed values that the compiler checks directly.&lt;/p&gt;
&lt;p&gt;No symbol table. No string comparison. Just arithmetic.&lt;/p&gt;
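&lt;p&gt;A minimal Python sketch of the scheme, assuming only the algorithm described above (the function name and the explicit 16-bit truncation are mine, standing in for the DE register pair):&lt;/p&gt;

```python
# Sketch of the SectorC/SectorZ tokenizer hash: hash = hash * 10 + char,
# truncated to 16 bits as it would be in a Z80 register pair.

def tok_hash(token: str, numeric: bool = False) -> int:
    h = 0
    for ch in token:
        v = ord(ch) - ord('0') if numeric else ord(ch)  # digits: real value
        h = (h * 10 + v) & 0xFFFF
    return h

print(tok_hash("42", numeric=True))  # 42 -- numbers hash to themselves
print(tok_hash("if"))                # 1152 -- a fixed constant to compare
print(tok_hash("while"))             # another fixed keyword constant
```

&lt;p&gt;Because &lt;code&gt;"42"&lt;/code&gt; hashes to 42 and &lt;code&gt;"if"&lt;/code&gt; always hashes to 1152, numbers, identifiers, and keywords all flow through the same code path.&lt;/p&gt;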
&lt;h4&gt;The &lt;code&gt;cp_de_imm&lt;/code&gt; Trick&lt;/h4&gt;
&lt;p&gt;Comparing a 16-bit register pair against a constant is expensive on Z80. The naive approach (&lt;code&gt;LD A, E / CP low / JR NZ, skip / LD A, D / CP high&lt;/code&gt;) costs 8 bytes, and the compiler does this check constantly (for every keyword and punctuation token). SectorZ uses an inline-constant trick instead:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;cp_de_imm:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ex&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; Swap HL with return address on stack&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; Load low byte of constant&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;cp&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;e&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="c1"&gt;; Compare with E&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;nz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;cp_de_ne&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; Load high byte of constant&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;cp&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;d&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="c1"&gt;; Compare with D&lt;/span&gt;
&lt;span class="nl"&gt;cp_de_ne:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;; Skip past constant regardless&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ex&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; Restore HL, fix return address&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The 16-bit constant is embedded directly after the &lt;code&gt;CALL&lt;/code&gt;, as a &lt;code&gt;DW&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;cp_de_imm&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;dw&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;tok_if&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; The constant to compare against&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;do_if&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;; Branch if DE == tok_if&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The function reads the constant from the return address, advances the return address past it, and restores everything. Each comparison site costs just 5 bytes (3 for the call, 2 for the constant) instead of 8. With 15+ comparison sites in the compiler, this saves around 45 bytes.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;EX (SP), HL&lt;/code&gt; instruction is the hero here. It atomically swaps HL with the top of the stack, which is exactly what we need: get the return address into HL for reading, then put the updated address back. This instruction doesn't exist on x86 (SectorC uses &lt;code&gt;lodsw&lt;/code&gt; with a different approach), and it's one of the few places where Z80 is genuinely more elegant.&lt;/p&gt;
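&lt;p&gt;The calling convention can be modeled in a few lines of Python (a sketch, not a Z80 emulation; the memory contents and addresses here are illustrative):&lt;/p&gt;

```python
# Model of the cp_de_imm convention: the 16-bit constant sits in the
# instruction stream right after the CALL, so the helper reads it via
# the return address, then advances that address past the constant.

def cp_de_imm(mem, ret_addr, de):
    """Return (zero_flag, new_return_address).

    The +2 models the INC HL steps that skip the embedded DW.
    """
    const = mem[ret_addr] | (mem[ret_addr + 1] << 8)  # little-endian word
    return de == const, ret_addr + 2

TOK_IF = 1152                 # atoi-style hash of "if" (0x0480)
mem = bytearray(16)
mem[3] = TOK_IF & 0xFF        # DW tok_if, placed right after a CALL at 0..2
mem[4] = TOK_IF >> 8

print(cp_de_imm(mem, 3, TOK_IF))  # (True, 5): match, resume past the DW
print(cp_de_imm(mem, 3, 0))       # (False, 5): still skips the constant
```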
&lt;h4&gt;Runtime Helpers for Binary Operations&lt;/h4&gt;
&lt;p&gt;SectorC generates inline code for binary operators. The x86 &lt;code&gt;ADD AX, BX&lt;/code&gt; is just 2 bytes, so inlining is cheap. On Z80, a 16-bit add is &lt;code&gt;ADD HL, DE&lt;/code&gt; (1 byte), but subtraction requires &lt;code&gt;OR A / SBC HL, DE&lt;/code&gt; (3 bytes), and multiplication doesn't exist as a single instruction at all.&lt;/p&gt;
&lt;p&gt;SectorZ moves all binary operations into runtime helper functions. The generated code for any binary expression follows the same pattern: push left operand, evaluate right operand, pop left into DE, swap, call helper. This costs 6 bytes per operator use in the generated code (1 push + 1 pop + 1 ex + 3 call), but the compiler only needs to emit a uniform sequence, which keeps the compiler itself small.&lt;/p&gt;
&lt;p&gt;The 14 runtime helpers add 109 bytes to the compiler. The shift, comparison, and multiply helpers would be prohibitively large to inline (the multiply routine alone is 25 bytes). By centralizing them, the compiler trades generated code density for compiler code density, which is the right call when you're trying to minimize the compiler.&lt;/p&gt;
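&lt;p&gt;A Python sketch of the emitter side (the opcode values are standard Z80 encodings; the helper address is a placeholder):&lt;/p&gt;

```python
# The fixed 6 bytes wrapped around every binary operator's right-hand
# side: PUSH HL / ...rhs code... / POP DE / EX DE,HL / CALL helper.

PUSH_HL, POP_DE, EX_DE_HL, CALL = 0xE5, 0xD1, 0xEB, 0xCD

def emit_binop(code: bytearray, helper_addr: int) -> None:
    code.append(PUSH_HL)      # save the left operand
    # ...the right operand's code is generated here, result in HL...
    code.append(POP_DE)       # left operand back, into DE
    code.append(EX_DE_HL)     # swap: left in HL, right in DE
    code.append(CALL)         # call the shared runtime helper
    code += helper_addr.to_bytes(2, "little")

code = bytearray()
emit_binop(code, 0x0123)      # 0x0123 is a hypothetical helper address
print(len(code), code.hex())  # 6 e5d1ebcd2301
```

&lt;p&gt;Every operator from &lt;code&gt;+&lt;/code&gt; to &lt;code&gt;&amp;gt;=&lt;/code&gt; emits this same shape; only the helper address changes.&lt;/p&gt;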
&lt;h4&gt;Function Trampolines&lt;/h4&gt;
&lt;p&gt;Functions are called through a trampoline table at &lt;code&gt;$D000&lt;/code&gt;. When the compiler encounters a function definition, it writes a 3-byte &lt;code&gt;JP actual_address&lt;/code&gt; instruction into the trampoline slot. When generated code calls a function, it calls the trampoline, which jumps to the real code.&lt;/p&gt;
&lt;p&gt;This eliminates forward-reference problems entirely. Functions can be called before they're defined (as long as the caller executes after the definition has been compiled). The trampoline table has 64 entries, which means the low 8 bits of a function name's hash, masked to 4-byte alignment, must be unique across all functions in a program. For typical small programs, this works fine.&lt;/p&gt;
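&lt;p&gt;The slot arithmetic, sketched in Python (the hash value is invented for illustration; &lt;code&gt;0xC3&lt;/code&gt; is the Z80 &lt;code&gt;JP nn&lt;/code&gt; opcode, and &lt;code&gt;$D300&lt;/code&gt; is where generated code starts):&lt;/p&gt;

```python
# Each trampoline slot holds JP nn: opcode 0xC3 plus a little-endian
# target. The slot index is the low byte of the name hash, 4-aligned.

JP = 0xC3

def slot_offset(name_hash: int) -> int:
    return name_hash & 0xFC           # low 8 bits, masked to 4-byte alignment

def install(table: bytearray, name_hash: int, target: int) -> None:
    off = slot_offset(name_hash)
    table[off] = JP
    table[off + 1:off + 3] = target.to_bytes(2, "little")

table = bytearray(256)                # 64 entries x 4 bytes, lives at $D000
install(table, 0x1234, 0xD300)        # hypothetical hash; real code address
off = slot_offset(0x1234)
print(hex(table[off]), table[off + 1:off + 3].hex())  # 0xc3 00d3
```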
&lt;h3&gt;The Semicolon Hack&lt;/h3&gt;
&lt;p&gt;One of the trickier parsing problems in Barely C is the semicolon. Consider &lt;code&gt;x = 3 + 4 ;&lt;/code&gt;. The expression parser reads tokens until it hits something that isn't an operator. When it reads &lt;code&gt;;&lt;/code&gt;, it doesn't match any operator, so it returns. But it has already consumed the semicolon. The statement parser needs that semicolon to know the statement is complete.&lt;/p&gt;
&lt;p&gt;SectorC's solution, which SectorZ copies, is a one-character pushback buffer. The tokenizer treats semicolons specially: if it encounters a &lt;code&gt;;&lt;/code&gt; while accumulating a token, it saves a flag and returns the current token. The next call to &lt;code&gt;tok_next&lt;/code&gt; checks the flag first and returns a synthetic semicolon token without reading any input.&lt;/p&gt;
&lt;p&gt;This is 15 bytes of code that takes the place of a much more complex token-lookahead mechanism.&lt;/p&gt;
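&lt;p&gt;In Python the whole mechanism is one flag and an early return (a sketch with simplified whitespace handling; token values are the same atoi-style hashes):&lt;/p&gt;

```python
# One-token semicolon pushback: a flag set by the tokenizer makes the
# *next* call return a synthetic ';' without reading any input.

TOK_SEMI = ord(';')   # 59

class Tokenizer:
    def __init__(self, text):
        self.chars = iter(text)
        self.semi_pending = False          # the "semicolon buffer" flag

    def next_token(self):
        if self.semi_pending:              # flag checked before any input
            self.semi_pending = False
            return TOK_SEMI                # synthetic semicolon, no read
        h = 0
        for ch in self.chars:
            if ch == ';':
                if h == 0:                 # bare ';' is its own token
                    return TOK_SEMI
                self.semi_pending = True   # consumed mid-token: remember it
                break
            if ch.isspace():
                if h:
                    break
                continue
            h = (h * 10 + ord(ch)) & 0xFFFF  # the atoi-style hash

        return h

t = Tokenizer("x;")
print(t.next_token())   # 120 -- hash of 'x'
print(t.next_token())   # 59  -- the pushed-back semicolon
```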
&lt;h3&gt;A Real Program: Prime Sieve&lt;/h3&gt;
&lt;p&gt;Hello World with &lt;code&gt;asm()&lt;/code&gt; blocks is a legitimate test, but it doesn't really exercise the compiler. Here's a &lt;a href="https://baud.rs/eratosthenes"&gt;Sieve of Eratosthenes&lt;/a&gt; that finds all primes below 100:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;58&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;98&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;209&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;asm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;211&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;129&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;printnum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;57344&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;printnum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;putch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This program demonstrates several things that aren't obvious from the language description.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pointer arithmetic as arrays.&lt;/strong&gt; Barely C has no array type, but &lt;code&gt;* ( s + i + i )&lt;/code&gt; reads a 16-bit value from address &lt;code&gt;s + 2*i&lt;/code&gt;, effectively treating a block of memory as an integer array. The sieve stores its flags at address 57344 (&lt;code&gt;$E000&lt;/code&gt;), well above both the compiler and the generated code.&lt;/p&gt;
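A rough C sketch of the same trick, with my own helper names standing in for what Barely C spells as a single dereference:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: Barely C writes this access as * ( s + i + i ).
   Pointers hold byte addresses, so adding the index twice steps
   through memory in 16-bit elements, little-endian like the Z80. */
static uint8_t memory[64];   /* stand-in for the flag block at $E000 */

static void store16(uint8_t *s, int i, uint16_t v) {
    s[i + i]     = (uint8_t)(v & 0xFF);   /* low byte first */
    s[i + i + 1] = (uint8_t)(v >> 8);
}

static uint16_t load16(const uint8_t *s, int i) {
    return (uint16_t)(s[i + i] | (s[i + i + 1] << 8));
}
```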
&lt;p&gt;&lt;strong&gt;Decimal output without division.&lt;/strong&gt; The language has no division or modulo operators, so &lt;code&gt;printnum&lt;/code&gt; extracts digits via repeated subtraction. The &lt;code&gt;f&lt;/code&gt; flag tracks whether a hundreds digit was printed, ensuring proper formatting of numbers like &lt;code&gt;2&lt;/code&gt; (just "2") vs &lt;code&gt;103&lt;/code&gt; ("103", not "13").&lt;/p&gt;
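The digit-peeling loop translates directly to C. The names and the output buffer below are mine (the real routine emits each digit through `putch`):

```c
#include <assert.h>
#include <string.h>

/* Repeated-subtraction digit extraction in the style of printnum.
   The f flag suppresses leading zeros while still forcing the tens
   digit of numbers like 103 to print. Assumes 0 <= x < 1000. */
static char out[4];
static int out_len;

static void emit(char ch) { out[out_len++] = ch; out[out_len] = '\0'; }

static void printnum(int x) {
    int f = 0, t = 0;
    out_len = 0; out[0] = '\0';
    while (x >= 100) { x = x - 100; f = f + 1; }   /* hundreds */
    if (f) emit((char)('0' + f));
    while (x >= 10) { x = x - 10; t = t + 1; }     /* tens */
    if (t || f) emit((char)('0' + t));
    emit((char)('0' + x));                          /* ones */
}
```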
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;putch&lt;/code&gt; function.&lt;/strong&gt; This is the I/O bridge between Barely C and the hardware. The &lt;code&gt;asm&lt;/code&gt; statement emits raw Z80 opcodes: &lt;code&gt;58 98 209&lt;/code&gt; is &lt;code&gt;LD A, ($D162)&lt;/code&gt; (load the low byte of variable &lt;code&gt;c&lt;/code&gt;), and &lt;code&gt;211 129&lt;/code&gt; is &lt;code&gt;OUT ($81), A&lt;/code&gt; (send it to the ACIA serial port). The programmer has to compute the variable's memory address from its hash (&lt;code&gt;c&lt;/code&gt; hashes to 99; masking to even alignment gives 98, which places the variable at address &lt;code&gt;$D162&lt;/code&gt;), which is admittedly inconvenient but functional.&lt;/p&gt;
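A sketch of that address computation. The `$D100` table base is my inference from the worked numbers (0xD100 + 98 = 0xD162), not something stated explicitly, and the hash of a multi-character name may differ from the single-character case shown:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction: a single-letter variable hashes to its
   ASCII code ('c' = 99), the low bit is masked off for 16-bit
   alignment, and the result indexes a table assumed to sit at $D100. */
#define VAR_TABLE_BASE 0xD100u   /* assumption, inferred from $D162 */

static uint16_t var_addr(uint8_t hash) {
    return (uint16_t)(VAR_TABLE_BASE + (hash & 0xFEu));
}
```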
&lt;p&gt;Running it through the emulator:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;cat&lt;span class="w"&gt; &lt;/span&gt;primes.bc&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'\x1a'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;retroshield&lt;span class="w"&gt; &lt;/span&gt;sectorz.bin
&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;11&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;17&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;19&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;23&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;29&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;31&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;37&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;41&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;43&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;47&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;53&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;61&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;67&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;71&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;73&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;79&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;83&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;89&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;97&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All 25 primes below 100, computed and printed by 733 bytes of compiler generating Z80 machine code on the fly.&lt;/p&gt;
&lt;h3&gt;Size Breakdown&lt;/h3&gt;
&lt;p&gt;Where do the 733 bytes go?&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Bytes&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Entry point&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;1.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Top-level parser (&lt;code&gt;compile&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;8.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Statement dispatch (&lt;code&gt;compile_stmts&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;7.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assignment and calls&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;4.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;asm&lt;/code&gt; statement&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;3.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control flow (&lt;code&gt;if&lt;/code&gt;, &lt;code&gt;while&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;8.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deref assign&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;3.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expression parser&lt;/td&gt;
&lt;td&gt;59&lt;/td&gt;
&lt;td&gt;8.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unary expressions&lt;/td&gt;
&lt;td&gt;73&lt;/td&gt;
&lt;td&gt;10.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Helpers (&lt;code&gt;emit_var&lt;/code&gt;, &lt;code&gt;emit3&lt;/code&gt;, &lt;code&gt;func_addr&lt;/code&gt;, &lt;code&gt;emit_test&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;4.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cp_de_imm&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;1.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokenizer&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;td&gt;13.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;getch&lt;/code&gt; (serial I/O)&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;3.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operator table&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;7.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime helpers&lt;/td&gt;
&lt;td&gt;109&lt;/td&gt;
&lt;td&gt;14.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The runtime helpers are the largest single component at 15% of the binary. The multiply routine alone is 25 bytes. If the Z80 had a hardware multiply instruction, the compiler would be noticeably smaller. The tokenizer at 13% is the next largest piece, driven primarily by the multiply-by-10 hash accumulation loop, which requires several register shuffles because the Z80 has no 16-bit multiply.&lt;/p&gt;
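The multiply-by-10 step itself is just shifts and adds. Here is the arithmetic in C, with each doubling standing in for an `ADD HL,HL` (the decomposition is standard; the exact register choreography in the tokenizer may differ):

```c
#include <assert.h>
#include <stdint.h>

/* n*10 without a multiply instruction: n*10 = n*8 + n*2, built from
   repeated doubling, the same shape as the Z80's ADD HL,HL shuffles. */
static uint16_t times10(uint16_t n) {
    uint16_t n2 = (uint16_t)(n + n);     /* n*2 */
    uint16_t n8 = (uint16_t)(n2 + n2);   /* n*4 */
    n8 = (uint16_t)(n8 + n8);            /* n*8 */
    return (uint16_t)(n8 + n2);          /* n*8 + n*2 = n*10 */
}
```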
&lt;p&gt;The operator table is pure data: 14 entries of 4 bytes each (token hash + helper address) plus a 2-byte sentinel. It's an unavoidable cost of supporting 14 operators, but the table-driven approach keeps the expression parser compact at 59 bytes.&lt;/p&gt;
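The table-driven idea, sketched in C with made-up hashes and helper addresses (the real table stores 14 token-hash/address pairs; ASCII codes stand in for token hashes here):

```c
#include <assert.h>
#include <stdint.h>

/* A linear scan over (token, helper-address) pairs terminated by a
   sentinel entry. All values below are illustrative. */
struct op { uint16_t tok; uint16_t helper; };

static const struct op ops[] = {
    { '+', 0x0200 },   /* hypothetical helper addresses */
    { '-', 0x0210 },
    { '*', 0x0220 },
    { 0,   0      },   /* sentinel terminates the scan */
};

static uint16_t find_helper(uint16_t tok) {
    for (const struct op *p = ops; p->tok; p++)
        if (p->tok == tok) return p->helper;
    return 0;          /* not a known operator */
}
```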
&lt;h3&gt;SectorC vs. SectorZ&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SectorC (x86-16)&lt;/th&gt;
&lt;th&gt;SectorZ (Z80)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;512 bytes&lt;/td&gt;
&lt;td&gt;733 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;x86-16 real mode&lt;/td&gt;
&lt;td&gt;Z80 bare metal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;I/O&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VGA memory, INT 16h&lt;/td&gt;
&lt;td&gt;MC6850 ACIA serial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Variables&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;64K segment&lt;/td&gt;
&lt;td&gt;256-byte table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Functions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct call&lt;/td&gt;
&lt;td&gt;JP trampoline table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Binary ops&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inline generated code&lt;/td&gt;
&lt;td&gt;CALL to runtime helpers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code emit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;stosw&lt;/code&gt; (1 byte)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;LD (HL),n / INC HL&lt;/code&gt; (3 bytes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token compare&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lodsw / cmp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;EX (SP),HL&lt;/code&gt; inline trick&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The 221-byte difference comes down to instruction set density. The x86 has a rich CISC heritage (string instructions, memory-to-register operations, implicit operand encoding) that makes tiny programs disproportionately easy. The Z80 is capable but verbose. Every extra byte in the instruction encoding cascades across every emit site, every comparison, every helper function.&lt;/p&gt;
&lt;p&gt;That said, the Z80 has a few tricks of its own. &lt;code&gt;EX (SP), HL&lt;/code&gt; is a single-byte instruction that enables the inline constant comparison technique. The &lt;code&gt;ADD HL, DE&lt;/code&gt; instruction does 16-bit addition in one byte. And &lt;code&gt;EX DE, HL&lt;/code&gt; swaps two register pairs in one byte, which is essential for getting operands into the right positions cheaply.&lt;/p&gt;
&lt;h3&gt;Running It&lt;/h3&gt;
&lt;p&gt;SectorZ runs on the &lt;a href="https://baud.rs/z80-emu"&gt;retro-z80-emulator&lt;/a&gt;, a Rust-based Z80 emulator that connects stdin/stdout to an emulated MC6850 ACIA serial port. It also runs on real hardware via the &lt;a href="https://baud.rs/QtfomG"&gt;RetroShield Z80&lt;/a&gt;. The compiler loads at address &lt;code&gt;$0000&lt;/code&gt;, reads source from serial, compiles and executes.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;z80asm&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;sectorz.bin&lt;span class="w"&gt; &lt;/span&gt;sectorz.asm
$&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;cat&lt;span class="w"&gt; &lt;/span&gt;examples/primes.bc&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'\x1a'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;retroshield&lt;span class="w"&gt; &lt;/span&gt;sectorz.bin
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;\x1a&lt;/code&gt; (Ctrl-Z) at the end signals EOF to the compiler. The emulator's serial implementation silently drops null bytes, so the traditional CP/M EOF marker of &lt;code&gt;$00&lt;/code&gt; doesn't work. A minor debugging adventure that reinforced the value of reading the emulator source code before assuming how it handles edge cases.&lt;/p&gt;
&lt;h3&gt;What's Missing&lt;/h3&gt;
&lt;p&gt;Quite a lot, obviously. No function arguments means all communication happens through global variables. No local scope means recursive functions can't maintain independent state. No &lt;code&gt;else&lt;/code&gt; clause. No &lt;code&gt;for&lt;/code&gt; loop. No &lt;code&gt;return&lt;/code&gt; statement (functions simply run to the closing brace and return from there). No character or string literals. No preprocessor.&lt;/p&gt;
&lt;p&gt;But these are the same limitations as SectorC. The point was never to build a production compiler. It's a demonstration that a meaningful C compiler can exist in a space that most programmers would consider insufficient for anything useful. Seven hundred thirty-three bytes is less than a single TCP packet. It's smaller than most compiler error messages. And yet it reads C source code, performs lexical analysis, parses expressions with arbitrary nesting, generates native machine code with forward-patched control flow, and executes the result, all on a processor designed in 1976.&lt;/p&gt;
&lt;p&gt;The source code is available on &lt;a href="https://baud.rs/z80-tiny"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you're interested in Z80 development, &lt;a href="https://baud.rs/Ch4htI"&gt;Design a Z80 Computer&lt;/a&gt; is a great hands-on guide, and &lt;a href="https://baud.rs/n39HUo"&gt;Learn Multiplatform Assembly Programming with ChibiAkumas&lt;/a&gt; covers Z80 assembly alongside other architectures.&lt;/p&gt;</description><category>8-bit</category><category>assembly</category><category>c</category><category>compiler</category><category>retrocomputing</category><category>retroshield</category><category>sectorc</category><category>z80</category><guid>https://tinycomputers.io/posts/sectorz-a-c-compiler-in-733-bytes-of-z80-assembly.html</guid><pubDate>Sun, 08 Feb 2026 02:00:00 GMT</pubDate></item><item><title>Review of "Getting Started with FPGAs" by Russell Merrick</title><link>https://tinycomputers.io/posts/review-of-getting-started-with-fpgas-by-russell-merrick.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/review-of-getting-started-with-fpgas-by-russell-merrick_tts.mp3" type="audio/mpeg"&gt;
Your browser does not support the audio element.
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;33 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Introduction and Overview&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/getting-started-with-fpgas/cover-001.png" alt="Getting Started with FPGAs by Russell Merrick" style="float: right; max-width: 250px; margin: 0 0 1em 1.5em;"&gt;&lt;/p&gt;
&lt;p&gt;Field programmable gate arrays occupy a fascinating position in the landscape of digital electronics: immensely powerful, endlessly flexible, and yet stubbornly inaccessible to newcomers. Unlike microcontrollers, which have benefited from decades of beginner-friendly ecosystems like Arduino and Raspberry Pi, FPGAs have long remained the province of electrical engineering graduates and industry professionals. The learning curve is steep, the toolchains are complex, and the fundamental paradigm shift from sequential software thinking to parallel hardware description is enough to discourage many aspiring digital designers before they write their first line of HDL. Russell Merrick's &lt;a href="https://baud.rs/fpga-no-starch"&gt;&lt;em&gt;Getting Started with FPGAs: Digital Circuit Design, Verilog, and VHDL for Beginners,&lt;/em&gt;&lt;/a&gt; published by No Starch Press in 2024, makes a deliberate and largely successful attempt to change that.&lt;/p&gt;
&lt;p&gt;Merrick brings a distinctive combination of credentials to this task. A University of Massachusetts electrical engineering graduate with a master's degree in the same field, he has worked in defense at BAE Systems and L-3 Communications, in aerospace at satellite propulsion startup Accion Systems, and in commercial electronics at fitness wearable company WHOOP. More importantly for a book of this nature, he has been creating FPGA educational content at &lt;a href="https://baud.rs/nandland"&gt;nandland.com&lt;/a&gt; and its accompanying YouTube channel since 2014, even designing his own FPGA development board, the &lt;a href="https://baud.rs/nandland-go"&gt;Nandland Go Board&lt;/a&gt;. This decade of answering beginner questions on Stack Overflow and producing tutorial content informs every page of the book. Merrick knows exactly where newcomers get stuck, because he has been watching them get stuck for years.&lt;/p&gt;
&lt;p&gt;The book spans 11 chapters plus two appendices across roughly 280 pages, targeting the &lt;a href="https://baud.rs/ZDV7tS"&gt;Lattice iCE40&lt;/a&gt; family of FPGAs. This choice of hardware is itself a pedagogical decision: iCE40 devices are inexpensive, the toolchain (&lt;a href="https://baud.rs/latticesemi-icecube2"&gt;iCEcube2&lt;/a&gt; and Diamond Programmer) is lightweight, and the open source community has embraced Lattice parts for low-level hacking. More expensive FPGAs from AMD (Xilinx) or Intel (Altera) come with sophisticated but overwhelming development environments that can intimidate beginners. By choosing the simpler end of the market, Merrick keeps the focus on understanding FPGA fundamentals rather than wrestling with tool complexity.&lt;/p&gt;
&lt;h3&gt;The Dual-Language Approach&lt;/h3&gt;
&lt;p&gt;Perhaps the most distinctive pedagogical choice in the book is Merrick's decision to present every code example in both Verilog and VHDL, side by side. This is no small commitment for an author. It effectively doubles the code content of the book and requires careful attention to ensure that both versions are correct, idiomatic, and illustrative of the same concepts. The payoff, however, is substantial: readers can follow along with whichever language suits their situation without needing to purchase a second book or mentally translate between the two.&lt;/p&gt;
&lt;p&gt;Merrick provides a thoughtful comparison of the two languages in Chapter 1 that avoids the partisan flame wars common in FPGA circles. He notes that VHDL, born from the U.S. Department of Defense and inheriting Ada's strong typing, requires more verbose code but catches errors at compile time. Verilog, syntactically closer to C and weakly typed, is more concise but will happily let you write incorrect code without complaint. He even includes a Google Trends analysis showing regional preferences: Verilog dominates in the United States, China, and South Korea, while VHDL is preferred in Germany and France. His practical advice is refreshingly simple: learn whichever language your school or employer uses.&lt;/p&gt;
&lt;p&gt;The dual-language presentation also serves as an implicit lesson in the differences between the two HDLs. Readers can observe firsthand how VHDL's strong typing forces explicit &lt;code&gt;resize()&lt;/code&gt; calls and type conversions that Verilog handles automatically, or how VHDL's &lt;code&gt;process&lt;/code&gt; blocks map to Verilog's &lt;code&gt;always&lt;/code&gt; blocks. These side-by-side comparisons provide a deeper understanding of both languages than either one alone could offer.&lt;/p&gt;
&lt;h3&gt;Building Foundations: Logic, Memory, and Time&lt;/h3&gt;
&lt;p&gt;The book's first four chapters establish the fundamental building blocks of FPGA design with admirable clarity. Chapter 1, "Meet the FPGA," provides historical context starting from the Xilinx XC2064 in 1985 and surveys the modern FPGA landscape, including the AMD acquisition of Xilinx for $35 billion and Intel's earlier purchase of Altera for $16.7 billion. The chapter's comparison of FPGAs versus microcontrollers versus ASICs across dimensions of cost, speed, power, flexibility, and ease of use is presented in a clean table that serves as a useful reference throughout the reader's career. Merrick is honest about where FPGAs fall short: they are more expensive than microcontrollers at scale, consume more power, and are harder to use. But when you need raw bandwidth, parallel computation, or hardware flexibility, nothing else will do.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/getting-started-with-fpgas/comparison-table-035.png" alt="Table 1-1: Comparing an FPGA vs. a Microcontroller vs. an ASIC" style="float: right; max-width: 350px; margin: 0 0 1em 1.5em;"&gt;&lt;/p&gt;
&lt;p&gt;Chapter 2 walks through hardware and tool setup, getting readers to their first working FPGA project: wiring switches to LEDs. This "hello world" equivalent may seem trivial, but it introduces the full development workflow: writing HDL code, creating a project, adding pin constraints, running the build, connecting the board, and programming the FPGA. Each step is a potential stumbling block for beginners, and Merrick guides through them methodically.&lt;/p&gt;
&lt;p&gt;Chapter 3 on Boolean algebra and the look-up table is where the book begins to reveal its deeper ambitions. Rather than treating logic gates as abstract mathematical curiosities, Merrick connects them directly to the physical reality inside an FPGA. The key insight, clearly articulated, is that discrete logic gates do not actually exist inside modern FPGAs. Instead, all Boolean operations are implemented through look-up tables, programmable devices that can represent any truth table you can imagine. A single three-input LUT can replace an AND gate, an OR gate, an XOR gate, or any combination thereof. This understanding, that LUTs and flip-flops are the two fundamental building blocks from which all FPGA designs are constructed, is the conceptual foundation upon which the entire book rests.&lt;/p&gt;
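That insight is easy to model in software. Here is a hypothetical 3-input LUT in C, where an 8-bit init value is the truth table and the inputs merely index into it (the init constants below are mine, chosen to reproduce AND and OR):

```c
#include <assert.h>
#include <stdint.h>

/* A 3-input LUT as data: the inputs form a 3-bit address that selects
   one bit of the 8-bit truth table. The same "hardware" computes AND,
   OR, XOR, or any other 3-input function, depending only on init. */
static int lut3(uint8_t init, int a, int b, int c) {
    int index = (a << 2) | (b << 1) | c;
    return (init >> index) & 1;
}
```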
&lt;p&gt;Chapter 4 introduces the flip-flop and with it the concept of state. Where LUTs handle combinational logic, flip-flops provide sequential logic, giving the FPGA memory of what happened previously. The chapter carefully distinguishes between combinational and sequential logic, explains the clock signal and its role in synchronizing operations, and warns about the dangers of latches, an accidental design pattern that causes unpredictable timing behavior. Merrick's years of answering beginner questions are evident here; the latch warning, including the specific synthesis warning message readers should watch for, addresses one of the most common FPGA beginner mistakes.&lt;/p&gt;
&lt;h3&gt;Simulation, Testing, and the Black Box Problem&lt;/h3&gt;
&lt;p&gt;Chapter 5, on simulation, contains some of the book's most valuable practical wisdom. Merrick frames the motivation perfectly: your FPGA is essentially a black box. You can change the inputs and observe the outputs, but you cannot see what is happening inside. Simulation cracks open that black box, letting you examine every internal signal, register, and wire as your design executes.&lt;/p&gt;
&lt;p&gt;He drives this point home with an anecdote from his professional experience: a coworker spent weeks debugging an FPGA design using oscilloscopes and logic analyzers, trying to find a data corruption issue on the physical hardware. Merrick checked the code out, built a simulation testbench, and found the bug within hours. The lesson is clear: simulation is not an optional nicety but an essential part of the FPGA development process that will save you enormous amounts of time.&lt;/p&gt;
&lt;p&gt;The chapter introduces &lt;a href="https://baud.rs/edaplayground"&gt;EDA Playground&lt;/a&gt;, a free web-based simulator, as the primary tool. This is a pragmatic choice that eliminates the barrier of downloading and configuring multi-gigabyte vendor tools. Readers learn to write testbenches, the HDL code that exercises a design by providing inputs and monitoring outputs. The first testbench, for the AND gate project from Chapter 3, walks through every detail: declaring signals, instantiating the unit under test, driving stimulus with delay statements, and generating waveform output for visual analysis. The progression to more sophisticated testing, including self-checking testbenches that automatically verify correctness and a discussion of formal verification, shows that Merrick understands the professional importance of testing even as he keeps the material accessible.&lt;/p&gt;
&lt;h3&gt;Common Modules and the Building-Block Philosophy&lt;/h3&gt;
&lt;p&gt;Chapter 6, "Common FPGA Modules," represents a turning point in the book's complexity. Having established the primitive components, LUTs and flip-flops, Merrick now shows how to combine them into reusable building blocks: multiplexers, demultiplexers, shift registers, RAM, and FIFOs. Each module is explained conceptually, implemented in both Verilog and VHDL, and connected to practical applications.&lt;/p&gt;
&lt;p&gt;The FIFO (First In, First Out) implementation is particularly well done. Merrick walks through the complete design including read and write address management, element counting, full and empty flags, and "almost full" and "almost empty" threshold flags. The code is production-quality, handling edge cases like simultaneous read and write operations and providing anticipatory flags that let higher-level modules stop writing before the FIFO actually overflows. This is not a toy example; it is a genuinely useful piece of infrastructure that readers can adapt for their own projects.&lt;/p&gt;
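As a rough software model of the same machinery (depth, threshold, and names are mine, not the book's; a hardware FIFO manages these flags with registers and comparators rather than a counter variable):

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 8
#define AF_THRESHOLD 6   /* illustrative "almost full" trip point */

static uint8_t mem[DEPTH];
static unsigned wr, rd, count;

static int fifo_full(void)        { return count == DEPTH; }
static int fifo_empty(void)       { return count == 0; }
static int fifo_almost_full(void) { return count >= AF_THRESHOLD; }

static void fifo_write(uint8_t v) {
    if (fifo_full()) return;            /* caller checks flags first */
    mem[wr] = v;
    wr = (wr + 1) % DEPTH;
    count++;
}

static uint8_t fifo_read(void) {
    uint8_t v = mem[rd];
    rd = (rd + 1) % DEPTH;
    count--;
    return v;
}
```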
&lt;p&gt;The Linear Feedback Shift Register (LFSR) implementation showcases a more specialized application: generating pseudo-random sequences using nothing more than shift registers and XOR gates. Merrick explains why this matters in practice, as LFSRs are used in everything from encryption to test pattern generation, and provides the implementation alongside a conceptual explanation of why specific feedback tap positions produce maximum-length sequences.&lt;/p&gt;
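For the curious, a software model of a small Fibonacci LFSR. The 4-bit width and taps at bits 3 and 2 are a standard maximal-length choice, not necessarily the book's:

```c
#include <assert.h>
#include <stdint.h>

/* One step of a 4-bit Fibonacci LFSR: XOR the tap bits, shift left,
   feed the XOR result back in. With taps 3 and 2 it cycles through
   all 15 nonzero states before repeating. */
static uint8_t lfsr_step(uint8_t state) {
    uint8_t bit = (uint8_t)(((state >> 3) ^ (state >> 2)) & 1u);
    return (uint8_t)(((state << 1) | bit) & 0x0Fu);
}
```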
&lt;h3&gt;The Build Process Demystified&lt;/h3&gt;
&lt;p&gt;Chapter 7 tackles synthesis, place and route, and crossing clock domains, subjects that many beginner texts either skip or relegate to appendices. Merrick treats them as essential knowledge, and rightly so. Understanding what happens when you press the "Build FPGA" button is critical for writing efficient, correct designs.&lt;/p&gt;
&lt;p&gt;The synthesis discussion is particularly strong. Merrick explains logic optimization, the tool's process of minimizing the resources your design consumes, and connects it to the utilization report that tells you how many LUTs, flip-flops, and block RAMs your design uses. He provides practical guidance on what to do when your design does not fit: switch to a larger FPGA, rewrite resource-intensive modules, or remove functionality. His anecdote about a division operation that forced a million-dollar hardware upgrade to a larger FPGA family illustrates the real-world consequences of resource-intensive code.&lt;/p&gt;
&lt;p&gt;The section on non-synthesizable code addresses a source of deep confusion for beginners transitioning from software: not all valid Verilog or VHDL code can be translated into physical hardware. Time delays, print statements, file operations, and certain loop constructs exist solely for simulation and will be silently ignored or flagged during synthesis. Merrick's treatment of synthesizable versus non-synthesizable for loops is especially valuable. In software, a for loop iterates sequentially over time. In synthesizable FPGA code, a for loop unrolls into replicated hardware that executes simultaneously in a single clock cycle. Beginners who expect a 10-iteration loop to take 10 clock cycles will be baffled when it completes in one. Merrick shows both the pitfall and the correct pattern for implementing sequential iteration using counters and if statements.&lt;/p&gt;
&lt;p&gt;The clock domain crossing section covers a topic that trips up even experienced FPGA designers. When signals pass between parts of a design running at different clock frequencies, metastability can cause unpredictable behavior. Merrick explains the problem and presents standard solutions: double-flop synchronizers for single-bit signals and FIFOs for multi-bit data transfers.&lt;/p&gt;
&lt;h3&gt;State Machines and the Memory Game&lt;/h3&gt;
&lt;p&gt;Chapter 8, "The State Machine," brings together all the preceding material in the book's most ambitious project: an interactive memory game using a seven-segment display. State machines are the standard way to implement sequential behavior in FPGAs, a series of states connected by transitions triggered by events. Merrick presents two implementation styles (two-process and one-process blocks), discusses best practices, and then launches into a full project that requires planning the state machine, organizing the design across multiple modules, interfacing with a seven-segment display, and writing comprehensive testbenches.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/getting-started-with-fpgas/state-machine-175.png" alt="Figure 8-1: A state machine for a turnstile" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em;"&gt;&lt;/p&gt;
&lt;p&gt;The memory game project is a strong capstone for the book's middle section. It requires the reader to synthesize knowledge of flip-flops, multiplexers, RAM, state machines, and physical I/O constraints into a working interactive system. The project is complex enough to feel like genuine FPGA engineering but simple enough to complete without despair.&lt;/p&gt;
&lt;h3&gt;FPGA Primitives and the Hardware Reality&lt;/h3&gt;
&lt;p&gt;Chapter 9 examines the specialized hardware blocks that differentiate FPGAs from simple arrays of LUTs and flip-flops. Block RAM provides dedicated memory resources that are faster and more efficient than flip-flop-based storage. The Digital Signal Processing (DSP) block offers hardened multiply-accumulate units that are essential for math-intensive applications like filtering and signal processing. The Phase-Locked Loop (PLL) generates new clock frequencies from an input clock, enabling designs that need multiple clock domains.&lt;/p&gt;
&lt;p&gt;Merrick explains both the capabilities and the creation process for each primitive, covering both instantiation (directly connecting to the hardware block in your code) and the GUI approach (using vendor tools to configure the block graphically). This dual treatment acknowledges that different workflows suit different situations and different engineers.&lt;/p&gt;
&lt;h3&gt;Numbers, Math, and the Rules of Binary Arithmetic&lt;/h3&gt;
&lt;p&gt;Chapter 10, "Numbers and Math," is arguably the most technically dense chapter in the book, and it may also be the most practically valuable. Performing mathematical operations inside an FPGA is fraught with subtle pitfalls that can produce silently incorrect results. Merrick distills years of hard-won experience into six clear rules that, if followed, will prevent the most common binary math errors.&lt;/p&gt;
&lt;p&gt;The chapter progresses methodically through addition, subtraction, multiplication, and division, showing both correct and incorrect implementations at each step. The emphasis on showing code that produces wrong answers is a particularly effective teaching strategy. When Merrick demonstrates that adding two 4-bit unsigned numbers (9 + 11) and storing the result in a 4-bit output gives 4 instead of 20, the reader understands viscerally why Rule #1 (the result should be at least 1 bit bigger than the biggest input) matters. The discussion of sign extension, the process of increasing a binary number's bit width while preserving its sign and value, is handled with exceptional clarity.&lt;/p&gt;
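&lt;p&gt;The truncation pitfall is easy to reproduce on any machine. Here is a minimal Rust sketch (my own illustration, not code from the book) that models 4-bit and 5-bit result registers by masking:&lt;/p&gt;

```rust
// Model a 4-bit unsigned adder whose result register is too narrow.
fn add_4bit(a: u8, b: u8) -> u8 {
    (a + b) & 0x0F // keep only the low 4 bits, as a 4-bit output would
}

// Rule #1: make the result at least 1 bit wider than the biggest input.
fn add_5bit(a: u8, b: u8) -> u8 {
    (a + b) & 0x1F // a 5-bit result register holds the true sum
}

fn main() {
    assert_eq!(add_4bit(9, 11), 4);  // 20 wraps to 4 in 4 bits
    assert_eq!(add_5bit(9, 11), 20); // 5 bits are enough for 9 + 11
    println!("ok");
}
```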
&lt;p&gt;The treatment of division is refreshingly honest. Merrick states plainly that division is resource-intensive and should be avoided when possible inside an FPGA. He presents three alternatives: restricting divisors to powers of 2 (implemented as simple shift-right operations), using precalculated lookup tables stored in block RAM, and spreading the operation across multiple clock cycles using iterative subtraction. His anecdote about "the million-dollar divide," where a single division operation forced an upgrade to a more expensive FPGA family at a cost exceeding $1 million in hardware changes, makes the point memorably.&lt;/p&gt;
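&lt;p&gt;Two of those alternatives are easy to sketch in software terms (my own Rust illustration, not the book's HDL): a power-of-2 divisor reduces to a shift, and iterative subtraction computes one step per loop iteration, which an FPGA would spread across clock cycles with a state machine:&lt;/p&gt;

```rust
// Alternative 1: division by a power of 2 is just a right shift.
fn div_by_8(x: u16) -> u16 {
    x >> 3 // same result as x / 8, but trivial in hardware
}

// Alternative 3: iterative subtraction. In an FPGA, each loop
// iteration would be one clock cycle of a dividing state machine.
fn div_iterative(mut dividend: u16, divisor: u16) -> u16 {
    let mut quotient = 0;
    while dividend >= divisor {
        dividend -= divisor;
        quotient += 1;
    }
    quotient
}

fn main() {
    assert_eq!(div_by_8(200), 25);
    assert_eq!(div_iterative(200, 8), 25);
    println!("ok");
}
```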
&lt;p&gt;The fixed-point arithmetic section that closes the chapter is an excellent primer on representing decimal values in hardware without the complexity and resource cost of floating-point. Merrick introduces the UX.Y and SX.Y notation for unsigned and signed fixed-point formats, explains the conversion between formats, and works through addition and multiplication examples that demonstrate the rules for matching decimal widths and sizing outputs.&lt;/p&gt;
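&lt;p&gt;A rough Rust model of those sizing rules (my sketch, not the book's examples): a UX.Y value is an integer scaled by 2&lt;sup&gt;Y&lt;/sup&gt;, addition of two U4.4 values needs one extra integer bit, and multiplying U4.4 by U4.4 yields U8.8 because the fractional widths add:&lt;/p&gt;

```rust
// U4.4 fixed point: 4 integer bits, 4 fractional bits, stored in a u8.
const FRAC_BITS: u32 = 4;

fn to_u4_4(x: f32) -> u8 {
    (x * (1u32 << FRAC_BITS) as f32) as u8
}

// Adding two U4.4 values needs a U5.4 result (one extra integer bit).
fn add_u4_4(a: u8, b: u8) -> u16 {
    a as u16 + b as u16 // still 4 fractional bits
}

// Multiplying U4.4 by U4.4 gives U8.8: fractional widths add.
fn mul_u4_4(a: u8, b: u8) -> u16 {
    a as u16 * b as u16 // result has 8 fractional bits
}

fn main() {
    let a = to_u4_4(2.5);  // 40, i.e. 2.5 * 16
    let b = to_u4_4(1.25); // 20, i.e. 1.25 * 16
    // 2.5 * 1.25 = 3.125; in U8.8 that is 3.125 * 256 = 800
    assert_eq!(mul_u4_4(a, b), 800);
    // 2.5 + 1.25 = 3.75; in x.4 format that is 3.75 * 16 = 60
    assert_eq!(add_u4_4(a, b), 60);
    println!("ok");
}
```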
&lt;h3&gt;I/O, SerDes, and the Edge of the Chip&lt;/h3&gt;
&lt;p&gt;The final technical chapter covers getting data in and out of the FPGA, where the digital logic meets the physical world. Merrick covers GPIO pin configuration including I/O buffers, output enable signals, and bidirectional communication. He explains operating voltage standards (LVCMOS33, TTL, LVCMOS25), drive strength in milliamps, and slew rate, the speed at which a signal transitions between high and low. The discussion of single-ended versus differential signaling provides useful background for understanding high-speed interfaces.&lt;/p&gt;
&lt;p&gt;The SerDes (serializer/deserializer) section introduces the concept of converting parallel data to serial for high-speed transmission and back again at the receiver. While the iCE40 FPGAs used in the book's projects do not include SerDes blocks, Merrick covers the topic at a conceptual level to prepare readers who may eventually work with more capable devices. This forward-looking coverage, explaining concepts that exceed the current hardware's capabilities, is a sensible choice that increases the book's long-term value.&lt;/p&gt;
&lt;h3&gt;The Appendices: Beyond the Technical&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/getting-started-with-fpgas/nandland-go-board-282.png" alt="Figure A-1: The Nandland Go Board" style="float: right; max-width: 300px; margin: 0 0 1em 1.5em;"&gt;&lt;/p&gt;
&lt;p&gt;Appendix A surveys three development boards compatible with the book's projects: the Nandland Go Board, the Lattice iCEstick, and the Alchitry Cu. The Nandland Go Board, designed by Merrick himself, is naturally the most fully supported option, but the inclusion of alternatives from other vendors demonstrates good faith.&lt;/p&gt;
&lt;p&gt;Appendix B, "Tips for a Career in FPGA Engineering," is an unusual and welcome addition to a technical book. Merrick covers resume construction, interview preparation, and job offer negotiation with the practical specificity of someone who has been on both sides of the hiring table. He advises listing HDL projects prominently on resumes, preparing for whiteboard coding exercises in Verilog or VHDL, and understanding that FPGA engineering positions often command premium salaries due to the specialized skill set. For readers considering FPGA development as a career rather than a hobby, this appendix alone could be worth the price of the book.&lt;/p&gt;
&lt;h3&gt;Strengths and Unique Value&lt;/h3&gt;
&lt;p&gt;The book's greatest strength is its accessibility. Merrick has an uncommon talent for explaining hardware concepts in software-friendly terms without being condescending or imprecise. His comparison, in which low-level FPGA programming is building with individual LEGO bricks while high-level microcontroller programming is working with preconstructed LEGO sets, captures the essential difference in a way that immediately resonates. The parallel-versus-serial thinking distinction, hammered home from the first chapter, is the single most important conceptual hurdle for software developers entering FPGA territory, and Merrick addresses it directly and repeatedly.&lt;/p&gt;
&lt;p&gt;The hands-on projects embedded throughout the book, from wiring switches to LEDs through blinking an LED, debouncing a switch, selectively blinking LEDs, and building a memory game, provide a satisfying progression of complexity. Each project builds on concepts from previous chapters and results in something that works on real hardware, providing the tangible feedback that keeps learners motivated.&lt;/p&gt;
&lt;p&gt;The dual Verilog/VHDL presentation is a genuine differentiator. Most FPGA books choose one language and leave readers of the other to fend for themselves. Merrick's commitment to both languages, while surely doubling his authorial workload, produces a reference that serves a broader audience and provides implicit comparative education that deepens understanding of both languages.&lt;/p&gt;
&lt;p&gt;The inclusion of professional engineering wisdom, from the million-dollar divide anecdote to the latch warnings to the career advice appendix, gives the book a practical grounding that purely academic treatments lack. Merrick writes as someone who has shipped FPGA designs in defense, aerospace, and consumer electronics, and that experience informs his choices about what to emphasize and what to warn against.&lt;/p&gt;
&lt;h3&gt;Limitations and Missed Opportunities&lt;/h3&gt;
&lt;p&gt;The book's commitment to the iCE40 platform, while pedagogically sound, does impose limitations. These are small, inexpensive FPGAs with limited resources, no hard processor cores, and minimal specialized IP blocks. Readers who complete the book and want to tackle more ambitious projects, say, implementing a RISC-V soft processor or building a video processing pipeline, will need to transition to AMD or Intel FPGA platforms with significantly different (and more complex) toolchains. The book provides a conceptual foundation for that transition but no practical guidance through it.&lt;/p&gt;
&lt;p&gt;The Windows-centric tooling requirement is a notable friction point. Merrick acknowledges that the iCE40 tools work best on Windows and recommends a virtual machine for Mac and Linux users. In an era when open-source FPGA tools like Yosys and nextpnr have matured significantly, especially for iCE40 targets, the absence of any mention of the open-source toolchain feels like a missed opportunity. For Linux users in particular, the Yosys/nextpnr/IceStorm flow provides a native, arguably simpler development experience than running Windows tools in a VM.&lt;/p&gt;
&lt;p&gt;The book could benefit from more substantial treatment of debugging workflows. While simulation is covered well, on-FPGA debugging receives only brief mention. Topics like using integrated logic analyzers, reading back internal signals through JTAG, or structured approaches to narrowing down hardware bugs would strengthen the book's practical value for readers who move beyond simulation into real hardware deployment.&lt;/p&gt;
&lt;p&gt;Some readers may find the book's pace in early chapters slow if they already have digital logic background from university courses. The extensive coverage of Boolean algebra, truth tables, and logic gates in Chapter 3, while valuable for true beginners, may feel redundant for readers with prior exposure. Conversely, the later chapters on clock domain crossing, SerDes, and fixed-point arithmetic accelerate considerably and could benefit from additional worked examples.&lt;/p&gt;
&lt;h3&gt;Who Should Read This Book&lt;/h3&gt;
&lt;p&gt;The book is best suited for three audiences. First, software developers who are curious about hardware and want to understand what happens below the abstraction layer of their programming languages. The explicit comparisons between software concepts (sequential execution, for loops, variables) and their FPGA counterparts (parallel execution, hardware replication, signals and registers) make this a natural bridge text. Second, electronics hobbyists and makers who have experience with microcontrollers like Arduino or Raspberry Pi and want to explore the next level of hardware control. Third, university students encountering FPGAs in coursework who need a more approachable companion text to supplement dense academic material.&lt;/p&gt;
&lt;p&gt;Experienced FPGA engineers will find little new technical content here, though the book's explanations may provide useful language for mentoring junior colleagues. Readers looking for advanced topics like high-level synthesis, SystemVerilog verification methodology, or FPGA-based machine learning acceleration will need to look elsewhere.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Getting Started with FPGAs&lt;/em&gt; succeeds at its stated goal: building a solid foundation for anyone interested in the world of FPGA design. Russell Merrick has distilled a decade of educational content creation into a coherent, well-structured text that respects the reader's intelligence while acknowledging the genuine difficulty of the subject matter. The dual-language approach, the hands-on projects, the honest treatment of where FPGAs excel and where they fall short, and the professional engineering perspective all contribute to a book that fills a genuine gap in the FPGA literature.&lt;/p&gt;
&lt;p&gt;The book does not attempt to be comprehensive. It will not teach you everything about Verilog or VHDL, will not prepare you to design a production FPGA system from scratch, and will not cover the full depth of any single topic it addresses. What it will do is give you a clear mental model of how FPGAs work at a fundamental level, equip you with enough Verilog and VHDL to write and simulate basic designs, and provide the conceptual vocabulary to continue learning independently. For a subject as intimidating as FPGA development, that is no small achievement.&lt;/p&gt;
&lt;p&gt;In the broader context of No Starch Press's catalog of accessible technical books, &lt;em&gt;Getting Started with FPGAs&lt;/em&gt; fits naturally alongside titles that demystify complex subjects without dumbing them down. It occupies a niche that has been surprisingly underserved: the true beginner FPGA book that takes the reader seriously. For anyone who has stared at an FPGA development board with a mixture of curiosity and trepidation, wondering how to bridge the gap between software thinking and hardware reality, this book provides a clear and well-lit path forward.&lt;/p&gt;</description><category>digital design</category><category>digital logic</category><category>embedded systems</category><category>fpga</category><category>hardware description language</category><category>lattice ice40</category><category>no starch press</category><category>russell merrick</category><category>verilog</category><category>vhdl</category><guid>https://tinycomputers.io/posts/review-of-getting-started-with-fpgas-by-russell-merrick.html</guid><pubDate>Sat, 07 Feb 2026 14:13:49 GMT</pubDate></item><item><title>Three Paths to Rust on Custom Hardware</title><link>https://tinycomputers.io/posts/three-paths-to-rust-on-custom-hardware.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/three-paths-to-rust-on-custom-hardware_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;18 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;If you want to run &lt;a href="https://baud.rs/gSnSwR"&gt;Rust&lt;/a&gt; on hardware that Rust was never designed for—a Z80 from 1976, a custom 16-bit RISC CPU, a &lt;a href="https://baud.rs/jt9HTI"&gt;Game Boy&lt;/a&gt;—you have a problem. The Rust compiler targets LLVM, and LLVM doesn't know your CPU exists.&lt;/p&gt;
&lt;p&gt;I've spent some time solving this problem in different ways. I built &lt;a href="https://tinycomputers.io/posts/rust-on-z80-an-llvm-backend-odyssey.html"&gt;LLVM backends for both the Z80&lt;/a&gt; and my own &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Sampo 16-bit RISC architecture&lt;/a&gt;. That's the "correct" solution—and it works—but it also costs countless hours of wrestling with TableGen definitions and GlobalISel pipelines, though agentic coding tools help immensely.&lt;/p&gt;
&lt;p&gt;There's a recent project that offers a different path entirely: &lt;a href="https://baud.rs/XiwoCV"&gt;Eurydice&lt;/a&gt;, a Rust-to-C transpiler developed by researchers at Inria and Microsoft. The premise is simple. If your target already has a C compiler, you can skip LLVM entirely:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Rust → Eurydice → C → existing C compiler → your target
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the Z80, that existing C compiler is &lt;a href="https://baud.rs/XOHX1N"&gt;SDCC&lt;/a&gt;, the Small Device C Compiler. It's mature, well-tested, and has supported the Z80 for decades.&lt;/p&gt;
&lt;p&gt;This article explores three distinct paths to getting Rust on custom hardware, and includes a hands-on walkthrough of the Eurydice approach—transpiling Rust to readable C, then compiling that C for the Z80 with SDCC.&lt;/p&gt;
&lt;h3&gt;Path 1: The Full LLVM Backend&lt;/h3&gt;
&lt;p&gt;This is what I did for both the Z80 and Sampo. You fork LLVM, implement a complete backend—register descriptions, instruction selection, calling conventions, type legalization, assembly printing—and teach the Rust compiler about your new target triple.&lt;/p&gt;
&lt;p&gt;The pipeline looks like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Rust source → rustc frontend → LLVM IR → Your Backend → Assembly → Binary
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Full Rust language support (within hardware constraints)&lt;/li&gt;
&lt;li&gt;Access to LLVM's optimization passes—constant folding, dead code elimination, register allocation&lt;/li&gt;
&lt;li&gt;A single backend that works for Rust, C (via Clang), and any other LLVM frontend&lt;/li&gt;
&lt;li&gt;Native code quality that improves as LLVM improves&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;What it costs:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLVM is roughly 30 million lines of C++. The learning curve is &lt;a href="https://baud.rs/Jy0EBX"&gt;vertical&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A minimal backend requires 25-30 files of TableGen, C++, and CMake configuration&lt;/li&gt;
&lt;li&gt;Type legalization—teaching LLVM that your 8-bit CPU can't natively handle 64-bit integers—is where 60% of the effort lives&lt;/li&gt;
&lt;li&gt;Keeping your fork synchronized with upstream LLVM is ongoing maintenance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the Z80, the register poverty problem alone was the bane of the effort. The &lt;a href="https://baud.rs/EsBekO"&gt;Z80&lt;/a&gt; has seven 8-bit registers, some of which can pair into 16-bit values, while LLVM's register allocator expects 16 or 32 general-purpose registers. Every function call, every 16-bit addition, every pointer dereference requires careful choreography of a register file designed when RAM was measured in kilobytes. If you follow this blog and have read about my efforts to get LLVM and Rust working for the Z80, you will recall that I needed hundreds of gigabytes of RAM on the build server just to fully legalize Rust's 64-bit and 128-bit types down to those 8-bit registers.&lt;/p&gt;
&lt;p&gt;For Sampo, the experience was smoother—a 16-bit RISC with 16 registers is closer to what LLVM expects. But "smoother" is relative. The &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Sampo LLVM backend&lt;/a&gt; still involved implementing GlobalISel pipelines, debugging opaque errors like "SmallVector capacity overflow," and building Rust's &lt;code&gt;libcore&lt;/code&gt; for a target that had never existed.&lt;/p&gt;
&lt;p&gt;The full LLVM approach gives you the best results. It's also the hardest path by a wide margin.&lt;/p&gt;
&lt;h3&gt;Path 2: Rust → C via Eurydice → Existing C Compiler&lt;/h3&gt;
&lt;p&gt;This is the path that caught my attention. Eurydice takes a fundamentally different approach: instead of teaching LLVM about your hardware, you transpile Rust to readable C and let an existing C compiler handle the target. This is the path other niche languages, such as &lt;a href="https://baud.rs/nimlang"&gt;Nim&lt;/a&gt;, use to produce portable code.&lt;/p&gt;
&lt;h4&gt;What Is Eurydice?&lt;/h4&gt;
&lt;p&gt;Eurydice grew out of the &lt;a href="https://baud.rs/uxrujt"&gt;Aeneas&lt;/a&gt; formal verification project. Its predecessor, KaRaMeL, compiled F* (a dependently typed functional language used for cryptographic proofs) to C. Eurydice adapts this infrastructure for Rust.&lt;/p&gt;
&lt;p&gt;The pipeline has two stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Charon&lt;/strong&gt; extracts rustc's Medium-level Intermediate Representation (MIR) and dumps it as a JSON &lt;code&gt;.llbc&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Eurydice&lt;/strong&gt; reads the &lt;code&gt;.llbc&lt;/code&gt;, applies roughly 30 &lt;a href="https://baud.rs/uTpA6y"&gt;optimization passes&lt;/a&gt; to lower Rust semantics to C, and emits &lt;code&gt;.c&lt;/code&gt; and &lt;code&gt;.h&lt;/code&gt; files&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The generated C is genuinely readable—not the kind of machine-generated nightmare you'd expect. Rust structs become C structs. Functions keep their names (with module prefixes). Control flow is preserved. The goal is C code a human could maintain, not just C code that compiles.&lt;/p&gt;
&lt;h4&gt;Why This Matters for Retro/Custom Hardware&lt;/h4&gt;
&lt;p&gt;Here's the insight that matters for this audience: many obscure targets already have a C compiler but will never get an LLVM backend. The Z80 has SDCC. The 6502 has cc65. The 68000 has multiple mature C compilers. The Game Boy has GBDK.&lt;/p&gt;
&lt;p&gt;If Eurydice can produce C that these compilers accept, you get Rust on all of these platforms without touching LLVM at all.&lt;/p&gt;
&lt;h4&gt;The Real-World Use Case&lt;/h4&gt;
&lt;p&gt;This isn't just theoretical. Eurydice's flagship use case is post-quantum cryptography. The ML-KEM (Kyber) key encapsulation algorithm was written and verified in Rust via the &lt;a href="https://baud.rs/3RNTiV"&gt;libcrux&lt;/a&gt; library, then transpiled to C via Eurydice for integration into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mozilla's NSS (Network Security Services)&lt;/li&gt;
&lt;li&gt;Microsoft's SymCrypt&lt;/li&gt;
&lt;li&gt;Google's BoringSSL&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These organizations need verified cryptographic implementations but can't take a dependency on the Rust toolchain in their C/C++ codebases. Eurydice bridges that gap.&lt;/p&gt;
&lt;h4&gt;Limitations&lt;/h4&gt;
&lt;p&gt;Eurydice is honest about what it can and can't do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No &lt;code&gt;dyn&lt;/code&gt; traits&lt;/strong&gt; — dynamic dispatch isn't yet supported (vtable generation is planned)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Const generics&lt;/strong&gt; can cause Charon's MIR extraction to fail&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Iterators&lt;/strong&gt; get compiled to while loops with runtime state management—functional but potentially less efficient than hand-written C loops&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monomorphization&lt;/strong&gt; is required for generics, producing separate C functions for each type instantiation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strict aliasing&lt;/strong&gt; — the generated code's handling of dynamically sized types violates C's strict-aliasing rules, requiring &lt;code&gt;-fno-strict-aliasing&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Panic-free code only&lt;/strong&gt; — Eurydice doesn't replicate Rust's panic semantics for integer overflow or bounds checking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For retro targets, some of these limitations are actually advantages. &lt;code&gt;no_std&lt;/code&gt; embedded Rust code tends to avoid &lt;code&gt;dyn&lt;/code&gt; traits and complex iterators. The code that runs well on a Z80—small functions, fixed-size arrays, simple control flow—is exactly the subset Eurydice handles best.&lt;/p&gt;
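&lt;p&gt;To make that subset concrete, here is the kind of code I mean (my own illustration of the style, not Eurydice test code): a fixed-size buffer, integer math, an explicit loop, and nothing dynamic:&lt;/p&gt;

```rust
// Checksum over a fixed-size buffer: no heap, no iterators, no dyn
// traits -- the shape of code that should transpile cleanly to C.
pub fn checksum(data: &[u8; 16]) -> u16 {
    let mut sum: u16 = 0;
    let mut i = 0;
    while i < 16 {
        sum = sum.wrapping_add(data[i] as u16);
        i += 1;
    }
    sum
}

fn main() {
    let buf = [1u8; 16];
    assert_eq!(checksum(&buf), 16);
    println!("ok");
}
```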
&lt;h3&gt;Path 3: Manual &lt;code&gt;no_std&lt;/code&gt; with FFI to C&lt;/h3&gt;
&lt;p&gt;The minimal approach. You write your core logic in Rust targeting a supported architecture, then manually bridge to C via FFI for anything target-specific.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#![no_std]&lt;/span&gt;
&lt;span class="cp"&gt;#![no_main]&lt;/span&gt;

&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"C"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80_out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cp"&gt;#[no_mangle]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"C"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;compute_trajectory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// Pure Rust computation here&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;some_math&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;unsafe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80_out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You compile the Rust portion for a supported target (like &lt;code&gt;thumbv6m-none-eabi&lt;/code&gt; for ARM Cortex-M0, one of the smallest Rust targets), extract the algorithm logic, and rewrite the hardware interface in C or assembly.&lt;/p&gt;
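&lt;p&gt;One way to make this methodology pay off is to keep the algorithm pure so it can be unit-tested on the host, leaving only the &lt;code&gt;unsafe&lt;/code&gt; FFI shim untested in Rust. A sketch of that split (the &lt;code&gt;some_math&lt;/code&gt; body is hypothetical; the snippet above elides it):&lt;/p&gt;

```rust
// Hypothetical pure core: the part you verify on the host before
// porting the hardware-facing shim to C or assembly for the target.
fn some_math(x: u16) -> u16 {
    // Placeholder computation; the original article leaves this elided.
    x.wrapping_mul(3).wrapping_add(7)
}

// Identical logic to the on-target compute_trajectory, minus the I/O.
fn compute_trajectory_pure() -> u16 {
    some_math(42)
}

fn main() {
    // Host-side test: the algorithm is checked without any Z80 hardware.
    assert_eq!(compute_trajectory_pure(), 133); // 42 * 3 + 7
    println!("ok");
}
```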
&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rust's type safety and ownership model for algorithm development&lt;/li&gt;
&lt;li&gt;No toolchain modifications required&lt;/li&gt;
&lt;li&gt;Works today with stable Rust&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;What it costs:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You're not actually running Rust on your target—you're using Rust as a development language and manually porting&lt;/li&gt;
&lt;li&gt;No automated pipeline; changes to the Rust code require manual re-porting&lt;/li&gt;
&lt;li&gt;You lose Rust's guarantees at the FFI boundary&lt;/li&gt;
&lt;li&gt;Testing requires maintaining parallel implementations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is really a development methodology, not a compilation strategy. It's useful for prototyping algorithms in Rust before implementing them in C for a constrained target, but it doesn't give you "Rust on Z80" in any meaningful sense.&lt;/p&gt;
&lt;h3&gt;Walkthrough: Rust → C → Z80&lt;/h3&gt;
&lt;p&gt;Let's do something concrete. We'll take a simple Rust program, transpile it to C with Eurydice, and compile the C for the Z80 with SDCC. I tested every step of this on my machine—what follows is real output, not approximations.&lt;/p&gt;
&lt;h4&gt;Prerequisites&lt;/h4&gt;
&lt;p&gt;You'll need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Nix&lt;/strong&gt; (recommended) or OCaml + OPAM for building Eurydice&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SDCC&lt;/strong&gt; for Z80 compilation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rust&lt;/strong&gt; (Eurydice pins its own nightly via Charon)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On macOS:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Install SDCC&lt;/span&gt;
brew&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;sdcc

&lt;span class="c1"&gt;# Install Nix (if you don't have it)&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;-L&lt;span class="w"&gt; &lt;/span&gt;https://nixos.org/nix/install&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sh
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nix is the path of least resistance here. Eurydice depends on specific versions of OCaml, Charon, and KaRaMeL, and the Nix flake pins all of them. You &lt;em&gt;can&lt;/em&gt; build everything manually with OPAM, but you'll be chasing version mismatches for an afternoon.&lt;/p&gt;
&lt;h4&gt;Step 1: Write a Rust Program&lt;/h4&gt;
&lt;p&gt;Create a small Rust project. The key constraint: it needs to stay within the subset Eurydice handles well. No &lt;code&gt;dyn&lt;/code&gt; traits, no complex iterators, no standard library I/O.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;cargo&lt;span class="w"&gt; &lt;/span&gt;init&lt;span class="w"&gt; &lt;/span&gt;--name&lt;span class="w"&gt; &lt;/span&gt;z80demo
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;z80demo
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Replace &lt;code&gt;src/main.rs&lt;/code&gt; with something appropriate for a Z80:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="sd"&gt;/// Simple GCD computation — the kind of algorithm&lt;/span&gt;
&lt;span class="sd"&gt;/// you'd actually want on constrained hardware.&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="sd"&gt;/// Compute LCM using GCD&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="sd"&gt;/// A lookup table — common pattern in embedded code&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;89&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;252&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;105&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is deliberately simple: &lt;code&gt;u16&lt;/code&gt; arithmetic (native to the Z80's 16-bit register pairs), no heap allocation, no traits, no closures. It's the kind of code that will transpile cleanly.&lt;/p&gt;
&lt;h4&gt;Step 2: Extract MIR with Charon&lt;/h4&gt;
&lt;p&gt;Charon hooks into the Rust compiler to extract its Medium-level Intermediate Representation (MIR). The critical detail I missed on my first attempt: Eurydice requires Charon to be invoked with &lt;code&gt;--preset=eurydice&lt;/code&gt;. Without it, Eurydice will reject the output with a cryptic error.&lt;/p&gt;
&lt;p&gt;Using Nix, you can run Charon directly without cloning or building anything:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;nix&lt;span class="w"&gt; &lt;/span&gt;--extra-experimental-features&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nix-command flakes"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'github:aeneasverif/eurydice#charon'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--&lt;span class="w"&gt; &lt;/span&gt;cargo&lt;span class="w"&gt; &lt;/span&gt;--preset&lt;span class="o"&gt;=&lt;/span&gt;eurydice
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The first run takes a while as Nix fetches and builds Charon's Rust toolchain. Subsequent runs complete in seconds:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Compiling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="mf"&gt;.1.0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Users&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;alexjokela&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;projects&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;eurydice&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;z80demo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Finished&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n n-Quoted"&gt;`dev`&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;profile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;[&lt;/span&gt;&lt;span class="n"&gt;unoptimized&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;debuginfo&lt;/span&gt;&lt;span class="err"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produces &lt;code&gt;z80demo.llbc&lt;/code&gt;—a 107KB JSON file containing the type declarations, function bodies, and trait implementations in Charon's intermediate format.&lt;/p&gt;
&lt;p&gt;If Charon fails, the error usually points to an unsupported Rust feature. The fix is almost always to simplify the Rust code—replace iterators with explicit loops, avoid const generics, use concrete types instead of generics where possible.&lt;/p&gt;
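&lt;p&gt;As a sketch of that kind of simplification (a hypothetical function, not part of the demo crate): an iterator chain leans on trait machinery that Charon may reject, while the equivalent explicit loop extracts cleanly.&lt;/p&gt;

```rust
// Hypothetical example: sum of the first n values as a u16.
// An iterator form like (0..n).sum() drags in Iterator trait
// machinery; the explicit loop below sticks to plain arithmetic
// and control flow, the subset Charon handles most reliably.
pub fn sum_to(n: u16) -> u16 {
    let mut total: u16 = 0;
    let mut i: u16 = 0;
    while i != n {
        total = total.wrapping_add(i);
        i += 1;
    }
    total
}

fn main() {
    assert_eq!(sum_to(5), 10); // 0 + 1 + 2 + 3 + 4
}
```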
&lt;h4&gt;Step 3: Transpile to C with Eurydice&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;nix&lt;span class="w"&gt; &lt;/span&gt;--extra-experimental-features&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nix-command flakes"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'github:aeneasverif/eurydice'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--&lt;span class="w"&gt; &lt;/span&gt;z80demo.llbc
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Eurydice runs the LLBC through roughly 30 cleanup and monomorphization passes and emits two files. The console output is terse:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;1&lt;/span&gt;&lt;span class="err"&gt;️⃣&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LLBC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;➡️&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;AST&lt;/span&gt;
&lt;span class="mf"&gt;2&lt;/span&gt;&lt;span class="err"&gt;️⃣&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Cleanup&lt;/span&gt;
&lt;span class="mf"&gt;3&lt;/span&gt;&lt;span class="err"&gt;️⃣&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Monomorphization&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;data&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;
&lt;span class="err"&gt;✅&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here's the actual generated &lt;code&gt;z80demo.c&lt;/code&gt; (comments and headers trimmed for clarity):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;"z80demo.h"&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt;
&lt;span class="n"&gt;Eurydice_arr_f5&lt;/span&gt;
&lt;span class="n"&gt;z80demo_FIBONACCI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;55U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;89U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;12U&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**&lt;/span&gt;
&lt;span class="cm"&gt; Simple GCD computation — the kind of algorithm&lt;/span&gt;
&lt;span class="cm"&gt; you'd actually want on constrained hardware.&lt;/span&gt;
&lt;span class="cm"&gt;*/&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**&lt;/span&gt;
&lt;span class="cm"&gt; Compute LCM using GCD&lt;/span&gt;
&lt;span class="cm"&gt;*/&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And the generated &lt;code&gt;z80demo.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;"eurydice_glue.h"&lt;/span&gt;

&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Eurydice_arr_f5_s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;12U&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Eurydice_arr_f5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Eurydice_arr_f5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A few things to notice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rust doc comments are preserved&lt;/strong&gt; as C comments. That's a nice touch.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arrays are wrapped in structs&lt;/strong&gt; (&lt;code&gt;Eurydice_arr_f5&lt;/code&gt;). This gives C arrays value semantics—you can return and assign them, matching Rust's behavior. The tradeoff is that array access goes through &lt;code&gt;.data[n]&lt;/code&gt; instead of &lt;code&gt;[n]&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arithmetic is widened to &lt;code&gt;uint32_t&lt;/code&gt;&lt;/strong&gt;. Eurydice promotes &lt;code&gt;u16 % u16&lt;/code&gt; to &lt;code&gt;uint32_t&lt;/code&gt; to avoid C's integer promotion pitfalls. On a Z80, this means 32-bit math library calls—SDCC handles this, but it's heavier than necessary. A hand-tuned version would keep the modulo at 16 bits.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;assert_eq!&lt;/code&gt; becomes &lt;code&gt;EURYDICE_ASSERT&lt;/code&gt;&lt;/strong&gt; with a pair struct. The generated &lt;code&gt;main()&lt;/code&gt; (which I'm omitting here) creates &lt;code&gt;const_uint16_t__x2&lt;/code&gt; structs to hold the two comparison operands. It works, but it's verbose compared to a simple &lt;code&gt;==&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Control flow is preserved&lt;/strong&gt;. The &lt;code&gt;if/else&lt;/code&gt; in &lt;code&gt;fib_lookup&lt;/code&gt;, the &lt;code&gt;while&lt;/code&gt; loop in &lt;code&gt;gcd&lt;/code&gt;—they're structurally identical to the Rust original.&lt;/li&gt;
&lt;/ul&gt;
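&lt;p&gt;The struct wrapper exists because Rust arrays are values: assigning or passing one copies it, whereas a bare C array decays to a pointer. A minimal sketch (hypothetical functions, not from the demo crate) of the Rust-side behavior the wrapper has to reproduce:&lt;/p&gt;

```rust
// Rust arrays are Copy when their element type is Copy, so passing
// one to a function hands over an independent copy -- mutation in
// the callee never touches the caller's array. Eurydice's struct
// wrapper gives the generated C the same copy-on-assign semantics.
fn bump_first(mut a: [u16; 3]) -> [u16; 3] {
    a[0] = 99;
    a
}

fn main() {
    let original = [1u16, 2, 3];
    let changed = bump_first(original);
    assert_eq!(original[0], 1); // caller's copy is untouched
    assert_eq!(changed[0], 99);
}
```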
&lt;h4&gt;Step 4: Adapt for Bare-Metal Z80&lt;/h4&gt;
&lt;p&gt;Here's where things get practical. Eurydice's &lt;code&gt;eurydice_glue.h&lt;/code&gt; includes &lt;code&gt;&amp;lt;stdio.h&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;stdlib.h&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;string.h&amp;gt;&lt;/code&gt;, and KaRaMeL headers—none of which exist on a bare-metal Z80. We need a minimal replacement that provides only what the generated code actually uses.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;eurydice_glue.h&lt;/code&gt; in the project directory:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cm"&gt;/*&lt;/span&gt;
&lt;span class="cm"&gt; * Minimal eurydice_glue.h for bare-metal Z80 via SDCC.&lt;/span&gt;
&lt;span class="cm"&gt; * Replaces the full Eurydice glue header with only what z80demo needs.&lt;/span&gt;
&lt;span class="cm"&gt; */&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef EURYDICE_GLUE_H&lt;/span&gt;
&lt;span class="cp"&gt;#define EURYDICE_GLUE_H&lt;/span&gt;

&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;

&lt;span class="cm"&gt;/* SDCC Z80: size_t is 16-bit */&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef _SIZE_T_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cp"&gt;#define _SIZE_T_DEFINED&lt;/span&gt;
&lt;span class="cp"&gt;#endif&lt;/span&gt;

&lt;span class="cm"&gt;/* On bare metal, assertions just halt the CPU */&lt;/span&gt;
&lt;span class="cp"&gt;#define EURYDICE_ASSERT(test, msg)  \&lt;/span&gt;
&lt;span class="cp"&gt;  do {                              \&lt;/span&gt;
&lt;span class="cp"&gt;    if (!(test)) {                  \&lt;/span&gt;
&lt;span class="cp"&gt;      __asm                         \&lt;/span&gt;
&lt;span class="cp"&gt;        halt                        \&lt;/span&gt;
&lt;span class="cp"&gt;      __endasm;                     \&lt;/span&gt;
&lt;span class="cp"&gt;    }                               \&lt;/span&gt;
&lt;span class="cp"&gt;  } while (0)&lt;/span&gt;

&lt;span class="cp"&gt;#endif &lt;/span&gt;&lt;span class="cm"&gt;/* EURYDICE_GLUE_H */&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the key insight for using Eurydice on constrained targets: the glue header is a compatibility layer, not a fundamental dependency. For any specific program, you can replace it with a minimal shim that provides only what that program's generated code actually references.&lt;/p&gt;
&lt;p&gt;Now create &lt;code&gt;z80_main.c&lt;/code&gt;—our bare-metal wrapper with serial I/O:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;

&lt;span class="cm"&gt;/* Import Eurydice-generated functions */&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="cm"&gt;/* Z80 serial output via port 0x01 (e.g., MC6850 ACIA) */&lt;/span&gt;
&lt;span class="n"&gt;__sfr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;__at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x01&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serial_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;serial_data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;'0'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;252&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;105&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GCD(252,105) = "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LCM(12,18) = "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fib(10) = "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kr"&gt;__asm&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;halt&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;__endasm&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Step 5: Compile with SDCC&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Compile the Eurydice-generated code&lt;/span&gt;
sdcc&lt;span class="w"&gt; &lt;/span&gt;-mz80&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;--std-c11&lt;span class="w"&gt; &lt;/span&gt;-I.&lt;span class="w"&gt; &lt;/span&gt;z80demo.c

&lt;span class="c1"&gt;# Compile our Z80 wrapper&lt;/span&gt;
sdcc&lt;span class="w"&gt; &lt;/span&gt;-mz80&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;--std-c11&lt;span class="w"&gt; &lt;/span&gt;z80_main.c

&lt;span class="c1"&gt;# Link — code at 0x0000, data at 0x8000&lt;/span&gt;
sdcc&lt;span class="w"&gt; &lt;/span&gt;-mz80&lt;span class="w"&gt; &lt;/span&gt;--code-loc&lt;span class="w"&gt; &lt;/span&gt;0x0000&lt;span class="w"&gt; &lt;/span&gt;--data-loc&lt;span class="w"&gt; &lt;/span&gt;0x8000&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;     &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;z80demo.ihx&lt;span class="w"&gt; &lt;/span&gt;z80_main.rel&lt;span class="w"&gt; &lt;/span&gt;z80demo.rel

&lt;span class="c1"&gt;# Convert to raw binary&lt;/span&gt;
makebin&lt;span class="w"&gt; &lt;/span&gt;-s&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;32768&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;z80demo.ihx&lt;span class="w"&gt; &lt;/span&gt;z80demo.bin
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both compilation steps complete with zero warnings. The linker produces a 32KB ROM image. According to the memory map, the &lt;code&gt;_CODE&lt;/code&gt; segment is 717 bytes—our Rust-originated logic plus I/O wrappers and SDCC's runtime support for 32-bit division.&lt;/p&gt;
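&lt;p&gt;A quick sanity check on the output is worth the ten seconds it takes. Since &lt;code&gt;makebin -s 32768&lt;/code&gt; pads the image to exactly 32&amp;nbsp;KiB, the byte count alone tells you whether the link and conversion steps behaved; the sketch below demonstrates on a zero-filled stand-in file, and you would run the same check against &lt;code&gt;z80demo.bin&lt;/code&gt;:&lt;/p&gt;

```shell
# makebin -s 32768 pads the ROM image to exactly 32 KiB, so any other byte
# count means the link or conversion step went wrong. Demonstrated on a
# zero-filled stand-in; check z80demo.bin the same way.
head -c 32768 /dev/zero > rom_standin.bin
size=$(wc -c rom_standin.bin | awk '{print $1}')
if test "$size" -eq 32768; then echo "ROM size OK"; fi
# prints: ROM size OK
```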
&lt;h4&gt;What the Z80 Assembly Looks Like&lt;/h4&gt;
&lt;p&gt;Here's the GCD function as SDCC compiled it, straight from the Eurydice-generated C:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;_z80demo_gcd:&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; while (b != 0U)&lt;/span&gt;
&lt;span class="err"&gt;00101&lt;/span&gt;&lt;span class="nl"&gt;$:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;d&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;or&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;e&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00103&lt;/span&gt;&lt;span class="no"&gt;$&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; b = (uint32_t)a % (uint32_t)t;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;__modsint&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; a = t;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;00101&lt;/span&gt;&lt;span class="no"&gt;$&lt;/span&gt;
&lt;span class="err"&gt;00103&lt;/span&gt;&lt;span class="nl"&gt;$:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; return a;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ex&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;SDCC's register-based calling convention (&lt;code&gt;sdcccall 1&lt;/code&gt;) passes the first 16-bit argument in &lt;code&gt;HL&lt;/code&gt; and the second in &lt;code&gt;DE&lt;/code&gt;, returning results in &lt;code&gt;DE&lt;/code&gt;. The GCD loop is tight—test for zero, call the modulo library routine, swap, repeat. The &lt;code&gt;__modsint&lt;/code&gt; call is where the &lt;code&gt;uint32_t&lt;/code&gt; widening in the C source lands: because the result is truncated back to 16 bits, SDCC can lower the operation to its 16-bit modulo helper, so the widening costs a library call rather than full 32-bit arithmetic.&lt;/p&gt;

&lt;p&gt;The Fibonacci lookup is even cleaner:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;_z80demo_fib_lookup:&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; if ((size_t)n &amp;lt; (size_t)12U)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;#0&lt;/span&gt;&lt;span class="no"&gt;x00&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;c&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;#0&lt;/span&gt;&lt;span class="no"&gt;x0c&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;NC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00102&lt;/span&gt;&lt;span class="no"&gt;$&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; return z80demo_FIBONACCI.data[(size_t)n]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;#_z80demo_FIBONACCI+0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;c&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;b&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; n * 2 (16-bit entries)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; base + offset&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;span class="err"&gt;00102&lt;/span&gt;&lt;span class="nl"&gt;$:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; return 0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;#0&lt;/span&gt;&lt;span class="no"&gt;x0000&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The bounds check compiles to a single &lt;code&gt;SUB&lt;/code&gt;/&lt;code&gt;JR NC&lt;/code&gt; pair. The array lookup uses &lt;code&gt;ADD HL,HL&lt;/code&gt; to compute the 16-bit element offset—exactly what you'd write by hand.&lt;/p&gt;
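&lt;p&gt;If you are curious how much of the 717-byte &lt;code&gt;_CODE&lt;/code&gt; segment is SDCC runtime rather than transpiled Rust, a rough and entirely optional audit is to tally the library helper calls in the assembly output. The sketch below runs against a stand-in listing; point the pattern at the real &lt;code&gt;.asm&lt;/code&gt; file SDCC leaves next to each &lt;code&gt;.rel&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical audit: tally the SDCC runtime helpers the compiled code
# calls; each distinct name pulls its support routine into _CODE.
# Demonstrated on a stand-in listing, not real SDCC output.
printf '    call  __modsint\n    call  __modsint\n    call  __divulong\n' > asm_standin.asm
grep -ohE '__[a-z]+(int|long)' asm_standin.asm | sort | uniq -c
```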
&lt;h4&gt;What Just Happened&lt;/h4&gt;
&lt;p&gt;We took Rust source code, ran two commands (Charon, then Eurydice), got readable C, wrote a 25-line glue header, and compiled for the Z80 with SDCC. Total code size: 717 bytes. No LLVM fork. No TableGen. No hours or days of debugging register allocation.&lt;/p&gt;
&lt;p&gt;The entire Eurydice pipeline—from Rust to C—preserves the structure of the original code. The SDCC step is standard Z80 C compilation, unchanged from what you'd do with hand-written C. The main adaptation work is replacing the glue header, which took about five minutes once I understood what the generated code actually referenced.&lt;/p&gt;
&lt;h3&gt;Comparing the Three Paths&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;LLVM Backend&lt;/th&gt;
&lt;th&gt;Eurydice → C&lt;/th&gt;
&lt;th&gt;Manual FFI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rust coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full &lt;code&gt;no_std&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Subset (no &lt;code&gt;dyn&lt;/code&gt;, limited generics)&lt;/td&gt;
&lt;td&gt;None (development aid only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native optimized&lt;/td&gt;
&lt;td&gt;Depends on C compiler&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Track LLVM upstream&lt;/td&gt;
&lt;td&gt;Track Eurydice + Charon&lt;/td&gt;
&lt;td&gt;Manual sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Automation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full pipeline&lt;/td&gt;
&lt;td&gt;Full pipeline&lt;/td&gt;
&lt;td&gt;Manual porting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLVM expertise&lt;/td&gt;
&lt;td&gt;Nix or OCaml&lt;/td&gt;
&lt;td&gt;Basic C/Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target reuse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All LLVM frontends&lt;/td&gt;
&lt;td&gt;C-only output&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The right choice depends on your timeline and ambitions. If you're building a serious toolchain for a custom CPU—something you'll maintain for years—the LLVM backend is worth the investment. If you need Rust on a platform that already has a C compiler and you're working with a constrained subset of the language, Eurydice is a compelling shortcut.&lt;/p&gt;
&lt;h3&gt;The Elephant in the Room&lt;/h3&gt;
&lt;p&gt;Eurydice works best for small, self-contained programs that avoid complex Rust features. Its primary limitation is Charon, the MIR extractor, which is "routinely foiled by more recent Rust features" according to the &lt;a href="https://baud.rs/LqCiem"&gt;LWN article&lt;/a&gt; that prompted this exploration. Const generics, complex trait bounds, and advanced pattern matching can all cause extraction failures.&lt;/p&gt;
&lt;p&gt;For embedded and retro targets, this might actually be fine. The Rust code you'd write for a Z80—&lt;code&gt;no_std&lt;/code&gt;, no allocator, fixed-size buffers, simple arithmetic—is exactly the subset that Eurydice handles well. You're not going to &lt;code&gt;impl Iterator&lt;/code&gt; your way through 64KB of address space.&lt;/p&gt;
&lt;p&gt;But if your Rust code is complex enough to genuinely benefit from Rust's type system—generics, trait objects, complex lifetime management—you've probably outgrown what Eurydice can transpile. At that point, you need an &lt;a href="https://baud.rs/Jy0EBX"&gt;LLVM backend&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Eurydice team is actively working on expanding coverage. Dynamic dispatch via vtables is the next major feature. Broader standard library support is an ambitious goal for 2026. The project is dual-licensed under Apache 2.0 and MIT, and accepts outside contributions.&lt;/p&gt;
&lt;h3&gt;Where This Leaves Us&lt;/h3&gt;
&lt;p&gt;For my own projects, the LLVM backends for Z80 and Sampo remain the right choice—they support the full &lt;code&gt;no_std&lt;/code&gt; Rust language and produce optimized native code. But if someone asked me "how do I get started running Rust on my &lt;a href="https://baud.rs/build-z80-ciarcia"&gt;retro hardware&lt;/a&gt; &lt;em&gt;this weekend&lt;/em&gt;," I'd point them at Eurydice and SDCC. The barrier to entry dropped from "understand GlobalISel" to "install Nix and run two commands."&lt;/p&gt;
&lt;p&gt;That's genuine progress. The path from Rust to weird hardware just got shorter.&lt;/p&gt;</description><category>compilers</category><category>eurydice</category><category>llvm</category><category>retrocomputing</category><category>rust</category><category>sampo</category><category>sdcc</category><category>transpiler</category><category>z80</category><guid>https://tinycomputers.io/posts/three-paths-to-rust-on-custom-hardware.html</guid><pubDate>Fri, 06 Feb 2026 18:00:00 GMT</pubDate></item><item><title>The Z-80 Microcomputer Handbook: A 1978 Reference That Outlived Its Era</title><link>https://tinycomputers.io/posts/the-z80-microcomputer-handbook-william-barden.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-z80-microcomputer-handbook-william-barden_tts.mp3" type="audio/mpeg"&gt;
Your browser does not support the audio element.
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;26 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/z80-barden-book/cover-001.jpg" style="width: 400px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: right; margin: 25px;"&gt;&lt;/p&gt;
&lt;p&gt;In 1978, William Barden, Jr. set out to write a book with a threefold purpose: to acquaint the reader with the hardware of the Z80, to discuss its "almost overwhelming" software instruction set, and to describe the microcomputer systems being built around it. The result was &lt;em&gt;&lt;a href="https://baud.rs/5brWaW"&gt;The Z-80 Microcomputer Handbook&lt;/a&gt;&lt;/em&gt;, published by Howard W. Sams &amp;amp; Co.—one of the most prolific technical publishers of the era. At just over 300 pages, Barden delivered on all three promises, producing a reference that served as both tutorial and encyclopedia for what was arguably the most important &lt;a href="https://baud.rs/DlwHJZ"&gt;microprocessor&lt;/a&gt; of the late 1970s.&lt;/p&gt;
&lt;p&gt;The copy I have is the eighth printing from 1985. Eight printings. The book was first published in 1978, and Howard W. Sams was still running the presses seven years later. I'll confess I hadn't fully appreciated that the Z80's popularity carried that kind of momentum into the mid-1980s—by which point Intel's 80286 had been on the market for three years and IBM's AT was already sitting on desks in corporate offices across America. Yet here was a book about an 8-bit processor from 1976, still selling briskly enough to justify another print run. That tells you something about the Z80's staying power, and something about the quality of Barden's handbook.&lt;/p&gt;
&lt;h3&gt;The Author and the Publisher&lt;/h3&gt;
&lt;p&gt;William Barden, Jr. was a prolific technical author who wrote extensively about microprocessors and microcomputers throughout the late 1970s and 1980s. His writing style sits in a comfortable middle ground between the dry precision of a Zilog datasheet and the conversational approachability of a hobbyist magazine column. He assumes the reader has some technical foundation but doesn't demand an electrical engineering degree. The prose is clear, methodical, and—when the subject matter allows—occasionally wry.&lt;/p&gt;
&lt;p&gt;Howard W. Sams &amp;amp; Co., a subsidiary of Macmillan, was headquartered in Indianapolis and had built a reputation as one of the go-to publishers for electronics and computing references. Their catalog included the famous &lt;em&gt;&lt;a href="https://baud.rs/samstechnical"&gt;Photofact&lt;/a&gt;&lt;/em&gt; service manuals and a long list of titles covering everything from transistor theory to amateur radio. A Sams book on your shelf carried a certain implicit endorsement: this was going to be technically sound, well-organized, and useful.&lt;/p&gt;
&lt;h3&gt;Three Sections, One Processor&lt;/h3&gt;
&lt;p&gt;Barden organized the book into three distinct sections, each approaching the Z80 from a different angle. Section I covers Z80 hardware—the architecture, interface signals and timing, addressing modes, instruction set, flags and arithmetic operations, interrupt sequences, and interfacing memory and I/O devices. Section II shifts to Z80 software, beginning with the assembly process itself and then working through the major instruction groups: data movement, arithmetic and logical operations, shifting and bit manipulation, list and table operations, subroutine calls, I/O and interrupt operations, and commonly used subroutines. Section III surveys five commercial microcomputer systems built around the Z80.&lt;/p&gt;
&lt;p&gt;This three-part structure gives the book a completeness that many competing references lacked. Readers who wanted to understand the Z80 at the silicon level could camp out in Section I. Programmers who needed to write assembly code had a thorough software reference in Section II. And anyone trying to decide which Z80 system to buy—or trying to understand what made these systems different from one another—could turn to Section III for a comparative tour.&lt;/p&gt;
&lt;h3&gt;The Hardware Foundation&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/z80-barden-book/architecture-016.jpg" style="width: 500px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: left; margin: 25px;"&gt;&lt;/p&gt;
&lt;p&gt;Section I opens with a concise but thorough treatment of the Z80's internal architecture. Barden walks the reader through the processor's register set—fourteen general-purpose 8-bit registers organized in two banks (A through L, and their primed counterparts A' through L'), plus the special-purpose registers: two index registers (IX and IY), the stack pointer, program counter, interrupt vector register, and memory refresh counter.&lt;/p&gt;
&lt;p&gt;The dual register bank architecture receives careful attention, and rightly so. The ability to swap the BC, DE, and HL pairs with a single EXX instruction—with EX AF,AF' handling the accumulator and flags—was one of the Z80's most distinctive features. Barden explains not just the mechanics but the motivation: fast interrupt handling without the overhead of pushing and popping registers to the stack. For real-time applications—process control, data acquisition, communications—this was a significant advantage over the Intel 8080A, which required explicit save-and-restore sequences.&lt;/p&gt;
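&lt;p&gt;The saving is easy to quantify: EXX and EX AF,AF' cost four T-states each, while pushing and popping a register pair costs 11 and 10 T-states respectively. A minimal interrupt handler in Zilog mnemonics—our illustrative sketch, not a listing from the book, with DATAPORT standing in for a system-specific device address—might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; Interrupt service using the alternate register set
INTHND: EX   AF,AF'        ; swap A and F with their primed twins (4 T-states)
        EXX                ; swap BC, DE, HL with the primed bank (4 T-states)
        IN   A,(DATAPORT)  ; service the device
        LD   (HL),A        ; buffer the byte via the alternate HL
        INC  HL
        EXX                ; restore the main bank
        EX   AF,AF'
        EI
        RETI
&lt;/code&gt;&lt;/pre&gt;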
&lt;p&gt;The flag register documentation is similarly thorough. Each of the six documented flags—Sign, Zero, Half-carry, Parity/Overflow, Subtract, and Carry—gets individual treatment, with clear diagrams showing bit positions and the conditions under which each flag is set or cleared. (Only four of them—Sign, Zero, Parity/Overflow, and Carry—can be tested directly by conditional jumps and calls; Half-carry and Subtract exist to support the DAA decimal-adjust instruction.) Barden's flag register diagram on page 19 is the kind of figure you'd photocopy and tape to the wall above your workbench.&lt;/p&gt;
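&lt;p&gt;A short fragment shows how those flags are consumed in practice—our example, not Barden's, with hypothetical branch labels:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; Flag register layout, bit 7 down to bit 0: S Z - H - P/V N C
        CP   B             ; compare A with B; updates all six flags
        JP   Z,EQUAL       ; Zero flag set: A equals B
        JP   C,BELOW       ; Carry set: A below B, unsigned
        JP   PE,OVERFL     ; Parity/Overflow set (signed overflow for CP)
&lt;/code&gt;&lt;/pre&gt;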
&lt;p&gt;Chapter 3 dives into the Z80's interface signals and timing with a level of detail that borders on the exhaustive. Every control signal is documented: MREQ, IORQ, RD, WR, RFSH, HALT, WAIT, INT, NMI, BUSRQ, BUSAK, and the rest. The timing diagrams for M1 cycles, memory read and write cycles, I/O cycles, interrupt acknowledge sequences, and bus request/acknowledge handshakes are presented with the precision needed for hardware designers wiring up actual systems. This is reference material, not light reading—but it's the kind of reference material you desperately need when your homebrew system isn't behaving and you're staring at an oscilloscope trace trying to figure out why.&lt;/p&gt;
&lt;p&gt;Chapter 4's treatment of addressing modes deserves special mention. The Z80 supported ten addressing modes—implied, immediate, extended immediate, register, register indirect, extended, modified page zero, relative, indexed, and bit addressing. Barden documents each with examples showing the instruction encoding at the bit level. The indexed addressing mode, using IX or IY plus a displacement byte, was a Z80 innovation that made structured data access far more practical than on the 8080A. Barden's diagrams showing multi-byte instruction formats, with op-codes, displacement values, and immediate data laid out byte by byte, are models of technical illustration.&lt;/p&gt;
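&lt;p&gt;The flavor of those byte-level diagrams is easy to convey with a pair of indexed loads—our encodings, checked against the standard Zilog opcode map rather than reproduced from the book:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;        LD   A,(IX+5)      ; encodes as DD 7E 05: prefix, opcode, displacement
        LD   (IY+2),B      ; encodes as FD 70 02: IY prefix, opcode, displacement
; The displacement is a signed byte, so one base register reaches
; addresses from IX-128 through IX+127.
&lt;/code&gt;&lt;/pre&gt;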
&lt;p&gt;The instruction set itself, covered in Chapter 5, is presented in tabular form with every detail a programmer needs: mnemonic, symbolic operation, flag effects, op-code encoding in binary, byte count, machine cycle count, and T-state count. These tables span dozens of pages and represent the kind of painstaking documentation that made the book worth keeping within arm's reach during coding sessions. The eleven instruction groups—from 8-bit loads through block transfers, arithmetic operations, rotates and shifts, bit manipulation, jumps, calls, and I/O—are each given systematic treatment.&lt;/p&gt;
&lt;p&gt;Chapter 8 rounds out the hardware section with a practical discussion of interfacing memory and I/O devices to the Z80. The treatment of the Z80 PIO (Parallel Input/Output) chip is particularly detailed, covering all four operating modes with programming examples. Barden walks through the initialization sequences for each mode, the interrupt vector configuration, and the handshaking protocols—exactly the kind of information you'd struggle to extract from &lt;a href="https://baud.rs/MIPV1T"&gt;Zilog's own documentation&lt;/a&gt; without considerable effort.&lt;/p&gt;
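&lt;p&gt;The shape of those initialization sequences can be sketched from the PIO's published programming model. The port label below is a hypothetical placeholder, and a real board's addressing should be checked against its schematic:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; Configure PIO port A for mode 3 (bit control) - illustrative sketch
        LD   A,0CFH        ; mode word: top two bits = 11 selects mode 3
        OUT  (PIOAC),A     ; PIOAC = port A control address (assumed)
        LD   A,0F0H        ; mode 3 expects a direction byte next: 1 = input
        OUT  (PIOAC),A     ; bits 7-4 input, bits 3-0 output
        LD   A,10H         ; interrupt vector (bit 0 must be zero)
        OUT  (PIOAC),A
&lt;/code&gt;&lt;/pre&gt;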
&lt;h3&gt;The Software Perspective&lt;/h3&gt;
&lt;p&gt;Section II opens with what might be the most pedagogically effective chapter in the book: Chapter 9, the Z80 Assembler. Rather than jumping straight into assembly language syntax, Barden starts with raw machine language. He presents a trivial program—adding the numbers one through ten—first as a series of mnemonics, then as hand-assembled machine code with each op-code and operand byte spelled out in hexadecimal. He then shows the same program rewritten with a loop, and walks through the manual assembly process step by step: calculating instruction lengths, assigning memory addresses, resolving label references, filling in the binary encoding of each instruction.&lt;/p&gt;
&lt;p&gt;This is brilliant pedagogy. By forcing the reader through the pain of manual assembly—calculating that a JP NZ,LOOP instruction at address 0105H needs to encode the target address 0103H as bytes 03H and 01H in little-endian order—Barden ensures they understand exactly what an assembler does before they start using one. The transition from manual assembly to symbolic assembly language feels earned rather than arbitrary. When Barden introduces labels, pseudo-operations, expression evaluation, and the two-pass assembly process, the reader understands &lt;em&gt;why&lt;/em&gt; these features exist, not just how to use them.&lt;/p&gt;
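&lt;p&gt;The exercise is worth reconstructing. Here is a sum-of-1-through-10 loop hand-assembled at 0100H—our listing, laid out to match the addresses in Barden's example rather than copied from it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0100  06 0A            LD   B,10     ; loop counter
0102  AF               XOR  A        ; clear the accumulator
0103  80       LOOP:   ADD  A,B      ; running total
0104  05               DEC  B        ; sets Z when B reaches zero
0105  C2 03 01         JP   NZ,LOOP  ; target 0103H stored low byte first
0108  76               HALT          ; A now holds 55 (37H)
&lt;/code&gt;&lt;/pre&gt;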
&lt;p&gt;Chapters 10 through 15 systematically work through the instruction groups from a programmer's perspective. Each chapter takes a logical group—data movement, arithmetic and logic, shifting and bit manipulation, list and table operations, subroutines, I/O and CPU control—and provides detailed examples showing how the instructions are used in practice. The block transfer instructions (LDI, LDIR, LDD, LDDR) and block search instructions (CPI, CPIR, CPD, CPDR) receive particularly good coverage, as these were among the Z80's most powerful features and had no equivalent in the 8080A instruction set.&lt;/p&gt;
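&lt;p&gt;A single LDIR replaces an entire 8080A copy loop. The sketch below—ours, with SRC and DST as hypothetical labels—moves 256 bytes in three setup instructions and one block move:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;        LD   HL,SRC        ; source pointer
        LD   DE,DST        ; destination pointer
        LD   BC,100H       ; byte count (256)
        LDIR               ; copy (HL) to (DE), repeating until BC = 0
&lt;/code&gt;&lt;/pre&gt;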
&lt;p&gt;Chapter 16, covering commonly used subroutines, is where the book transitions from reference to practical cookbook. Barden provides complete, tested subroutine implementations for comparison, timing loops, multiply, divide, multiple-precision arithmetic, ASCII-to-binary conversion, base conversion, memory fill, string comparison, and table search. Each subroutine is documented with its entry conditions, exit conditions, and register usage—the kind of disciplined documentation that professional assembly programmers live by. The table search routine, using IX as a base pointer with entry size in DE, is a clean example of the Z80's indexed addressing mode earning its keep.&lt;/p&gt;
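&lt;p&gt;A routine in that spirit is short enough to sketch. This is our reconstruction of the general technique—IX as base pointer, entry size in DE—not Barden's exact code, and the conventions (B holds the entry count, A the key, Z signals success) are our own:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; Search a table of B entries, each DE bytes long, keyed on the
; first byte of each entry. On return: Z set with IX pointing at
; the match, or Z reset if the key was not found.
SEARCH: CP   (IX+0)        ; compare key in A with entry's first byte
        RET  Z             ; found
        ADD  IX,DE         ; step to the next entry (does not touch Z)
        DJNZ SEARCH        ; DJNZ leaves the flags alone
        RET                ; exhausted: Z still reset from the last CP
&lt;/code&gt;&lt;/pre&gt;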
&lt;h3&gt;A Snapshot of the Ecosystem&lt;/h3&gt;
&lt;p&gt;Section III is where the book transforms from a processor reference into a historical document. In two chapters, Barden surveys five companies manufacturing Z80-based microcomputer systems: Zilog itself, Technical Design Labs (TDL), Cromemco, The Digital Group, and Radio Shack.&lt;/p&gt;
&lt;p&gt;The Zilog chapter covers the Z80 MCB (Microcomputer Board), a complete single-board computer measuring 7.7 by 7.75 inches. With 4K of dynamic RAM, up to 4K of EPROM or PROM, a PIO for parallel I/O, a USART for serial communication, and a CTC for timing, the MCB was a capable development platform. Barden documents the memory map, I/O port addressing, interrupt configuration, and the 1K monitor program with its eight commands for examining memory, setting breakpoints, and controlling program execution. The minimum MCB system—the board itself, a 5-volt power supply, and a Teletype ASR-33—was a complete development environment, albeit a spartan one.&lt;/p&gt;
&lt;p&gt;The Cromemco coverage reveals a more ambitious ecosystem. Their Z-1 and Z-2 systems were built around the &lt;a href="https://tinycomputers.io/posts/george-morrow-pioneer-of-personal-computing.html"&gt;S-100 bus&lt;/a&gt;, with the Z-2 offering a chassis with 21 card slots, a 30-amp power supply, and room for serious expansion. Cromemco's peripheral lineup included a TV DAZZLER for color graphics, a Digital Interface Board with analog-to-digital and digital-to-analog converters, and their BYTESAVER EPROM programmer board. Their CONTROL BASIC—a specialized BASIC interpreter designed for process control and automated testing—hints at the industrial applications that were already finding the Z80.&lt;/p&gt;
&lt;p&gt;The Digital Group section reveals a company taking a different approach: offering CPU boards for multiple processor families—Motorola 6800, MOS Technology 6502, Intel 8080A, and Z80—all interchangeable at the board level. Their Phi-Deck cassette storage system, with 800 bytes per second transfer rates and CRC error checking, was a notably sophisticated approach to the cassette storage problem.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/z80-barden-book/trs80-273.jpg" style="width: 500px; box-shadow: 0 30px 40px rgba(0,0,0,.1); float: left; margin: 25px;"&gt;&lt;/p&gt;
&lt;p&gt;And then there's Radio Shack. Barden's description of the TRS-80 stands out because of what it represented: a completely integrated, turnkey system that a consumer could purchase, take home, plug in, and immediately begin programming in BASIC. While the other systems in this chapter required varying degrees of assembly, configuration, and technical knowledge, the TRS-80 was designed for people who wanted to use a computer, not build one. With its 53-key keyboard, 12-inch monitor, cassette storage, and 4K of ROM containing a BASIC interpreter, it was a vision of the microcomputer's commercial future—even if its 64-character-by-16-line display and 4K of RAM seem quaint today.&lt;/p&gt;
&lt;p&gt;Reading these system descriptions in sequence, you can see the microcomputer market stratifying in real time. At one end, boards like Zilog's MCB served engineers and serious hobbyists who wanted maximum flexibility. In the middle, S-100 systems from Cromemco and others offered expandability with some degree of standardization. And at the consumer end, Radio Shack was proving that microcomputers could be mass-market products. All of them ran on the Z80.&lt;/p&gt;
&lt;h3&gt;The Intel Shadow&lt;/h3&gt;
&lt;p&gt;What makes the 1985 printing date so remarkable is the context in which someone would have been buying this book. By 1985, the microcomputer landscape had shifted dramatically from the world Barden documented in 1978.&lt;/p&gt;
&lt;p&gt;Intel had introduced the 8086 in 1978—the same year this book was published—and its cost-reduced sibling, the 8088, in 1979. When IBM chose the 8088 for its Personal Computer in 1981, the x86 architecture gained a gravitational pull that would reshape the entire industry. The Intel 80286, launched in 1982, brought protected mode, a 16-megabyte address space, and hardware memory management. When IBM built the 80286 into the PC/AT in August 1984, it created what would become the standard business computer platform for years to come. The AT was fast, expandable, and—critically—backward compatible with the enormous library of software already written for the original PC.&lt;/p&gt;
&lt;p&gt;By 1985, the trajectory was clear. The x86 architecture was the future of personal computing. CP/M, which had been the dominant operating system for Z80 machines, was fading in the face of MS-DOS. The TRS-80 line was winding down. Cromemco had pivoted to 68000-based systems. The Z80's reign as the king of personal computing was effectively over.&lt;/p&gt;
&lt;p&gt;And yet the eighth printing rolled off the presses.&lt;/p&gt;
&lt;p&gt;The Z80 endured because personal computing was never the whole story. The processor had found its way into embedded systems, industrial controllers, point-of-sale terminals, scientific instruments, and countless other applications where its simplicity, low cost, and well-understood behavior were more valuable than raw performance. The Z80 didn't need a 16-megabyte address space to control a factory floor. It didn't need protected mode to run a cash register. It needed to be cheap, reliable, and thoroughly documented—and books like Barden's were part of that documentation ecosystem.&lt;/p&gt;
&lt;p&gt;There's also the educational angle. In 1985, the Z80 was still one of the best processors for &lt;em&gt;learning&lt;/em&gt; computer architecture. Its instruction set was complex enough to illustrate real-world design tradeoffs—accumulator-based operations, register pairs for 16-bit addressing, multiple addressing modes, condition flags—without being so complex as to overwhelm a student. Many universities and technical colleges were still teaching microprocessor courses using the Z80 well into the late 1980s. For those students and their instructors, Barden's handbook was still entirely relevant.&lt;/p&gt;
&lt;h3&gt;The Book as Reference&lt;/h3&gt;
&lt;p&gt;Evaluating &lt;em&gt;The Z-80 Microcomputer Handbook&lt;/em&gt; as a technical reference, it holds up remarkably well within its domain. The instruction set tables in Chapter 5 are comprehensive and clearly formatted, with every detail needed for hand-coding or verifying assembler output. The appendices—covering electrical specifications, an 8080-to-Z80 instruction cross-reference, a complete instruction summary, binary and hexadecimal tables, and ASCII codes—round out the reference material.&lt;/p&gt;
&lt;p&gt;The 8080/Z80 comparison in Appendix B is particularly useful for readers coming from the Intel side. Since the Z80 included the entire 8080A instruction set as a subset (using Zilog's own mnemonics rather than Intel's), this cross-reference served as a Rosetta Stone for programmers transitioning between the two architectures. Many Z80 systems needed to run software originally written for the 8080A, and understanding the mapping between Intel and Zilog mnemonics was a practical necessity.&lt;/p&gt;
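&lt;p&gt;A few rows give the flavor of that cross-reference—our examples of the standard correspondence, not a reproduction of Appendix B:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; Intel 8080 mnemonic      Zilog Z80 mnemonic     (identical opcodes)
;   MOV  A,B         =       LD   A,B
;   MVI  A,5         =       LD   A,5
;   LXI  H,1234H     =       LD   HL,1234H
;   JNZ  addr        =       JP   NZ,addr
;   INX  H           =       INC  HL
&lt;/code&gt;&lt;/pre&gt;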
&lt;p&gt;Where the book shows its age most clearly is in Section III. The specific microcomputer systems described—the Zilog MCB, the TDL ZPU and Xitan, the Cromemco Z-1 and Z-2, the Digital Group systems, and the TRS-80—are all long discontinued. But this is precisely what makes Section III valuable today: it's a primary source document of a hardware ecosystem that existed for a brief, vibrant moment and then vanished. You won't find this level of detail about the Digital Group's Phi-Deck cassette system or TDL's System Monitor Board in Wikipedia.&lt;/p&gt;
&lt;h3&gt;A Companion Piece&lt;/h3&gt;
&lt;p&gt;Readers of this site may notice a natural pairing with Steve Ciarcia's &lt;em&gt;&lt;a href="https://tinycomputers.io/posts/build-your-own-z80-computer-steve-ciarcia.html"&gt;Build Your Own Z80 Computer&lt;/a&gt;&lt;/em&gt; (&lt;a href="https://baud.rs/build-z80-ciarcia"&gt;Amazon&lt;/a&gt;), which we reviewed previously. Where Ciarcia's book is a construction manual—guiding the reader through building a complete Z80 system from power supply to CRT terminal—Barden's handbook is a reference and survey. Ciarcia teaches you to &lt;em&gt;build&lt;/em&gt;; Barden teaches you to &lt;em&gt;understand&lt;/em&gt;. The two books complement each other almost perfectly, and it's easy to imagine a 1978-era hobbyist keeping both within reach: Barden's for looking up instruction encodings and timing specifications, Ciarcia's for wiring up the hardware to run them on.&lt;/p&gt;
&lt;p&gt;The difference in approach also reflects the different publishers. BYTE Books, which published Ciarcia, was rooted in the hobbyist magazine world and emphasized hands-on projects. Howard W. Sams had a longer tradition of comprehensive technical references. Each publisher played to its strengths.&lt;/p&gt;
&lt;h3&gt;Final Thoughts&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;The Z-80 Microcomputer Handbook&lt;/em&gt; is not a book that will teach you to build a computer, nor is it one that will dazzle you with narrative flair. It is, instead, something arguably more valuable: a thorough, well-organized, clearly written reference to a processor and its ecosystem, produced at the moment when that ecosystem was at its peak. Barden's systematic approach—hardware first, then software, then systems—gives the reader a complete understanding of the Z80 world from silicon to finished product.&lt;/p&gt;
&lt;p&gt;That the book was still being printed in 1985, with the IBM AT already on the market and the 80386 just a year away, is a testament to both the Z80's remarkable longevity and the quality of Barden's work. Eight printings don't happen by accident. They happen because engineers, students, hobbyists, and embedded systems designers kept walking into bookstores and electronics shops and deciding that yes, they still needed this book.&lt;/p&gt;
&lt;p&gt;Nearly five decades after its original publication, &lt;em&gt;&lt;a href="https://baud.rs/5brWaW"&gt;The Z-80 Microcomputer Handbook&lt;/a&gt;&lt;/em&gt; remains a worthwhile read for anyone interested in the foundations of microcomputing. Paired with Rodnay Zaks' &lt;em&gt;&lt;a href="https://baud.rs/EsBekO"&gt;Programming the Z80&lt;/a&gt;&lt;/em&gt; for software depth and J.S. Walker's &lt;em&gt;&lt;a href="https://baud.rs/Ch4htI"&gt;Design a Z80 Computer&lt;/a&gt;&lt;/em&gt; for a modern practical build guide, it forms part of an essential Z80 library. The architecture it documents influenced a generation of processor designs. The assembly language techniques it teaches remain relevant for anyone working close to the metal. And the ecosystem it surveys—that brief, fertile period when a handful of small companies were inventing the personal computer industry in real time—deserves to be remembered in the detail that Barden provided.&lt;/p&gt;</description><category>assembly language</category><category>book review</category><category>microprocessors</category><category>retrocomputing</category><category>vintage computing</category><category>william barden</category><category>z80</category><category>zilog</category><guid>https://tinycomputers.io/posts/the-z80-microcomputer-handbook-william-barden.html</guid><pubDate>Thu, 05 Feb 2026 21:00:00 GMT</pubDate></item></channel></rss>