<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io (Posts about sampo)</title><link>https://tinycomputers.io/</link><description></description><atom:link href="https://tinycomputers.io/categories/sampo.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
&lt;!-- div style="width: 100%" --&gt;
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /&gt; Creative Commons Attribution-ShareAlike&lt;/a&gt;&amp;nbsp;|&amp;nbsp;
&lt;!-- /div --&gt;
</copyright><lastBuildDate>Mon, 06 Apr 2026 22:12:58 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Part 4: 132 Tests, Zero Failures - Verifying the Sampo CPU on Real Hardware</title><link>https://tinycomputers.io/posts/sampo-fpga-isa-verification.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/sampo-fpga-isa-verification_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;12 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Part 1&lt;/a&gt;, we designed the Sampo 16-bit RISC architecture. In &lt;a href="https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html"&gt;Part 2&lt;/a&gt;, we synthesized it to an ECP5 FPGA on the ULX3S board. In &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Part 3&lt;/a&gt;, we built an LLVM backend so Rust could compile for it. But there was a glaring gap in the project: we'd never systematically verified that the hardware actually implements the ISA correctly.&lt;/p&gt;
&lt;p&gt;The "Hello, Sampo!" demo program exercises maybe 10 of the CPU's 66 instructions. The LLVM backend generates code that assumes the hardware matches the spec. If a single instruction is subtly wrong - a carry flag not set, a branch offset miscalculated, a byte load sign-extending when it shouldn't - the entire toolchain is built on sand.&lt;/p&gt;
&lt;p&gt;This post documents the process of building a comprehensive test suite, running it in simulation, finding a real pipeline hazard bug in the CPU, and then the surprisingly treacherous journey of getting those tests running on real FPGA hardware.&lt;/p&gt;
&lt;h3&gt;The Test Strategy&lt;/h3&gt;
&lt;p&gt;The approach is straightforward: write assembly programs that exercise every instruction in the ISA, compare results against known-good values, and report PASS or FAIL over UART. The testbench monitors the serial output, and if it sees "FAIL" anywhere, the test run fails.&lt;/p&gt;
&lt;p&gt;Each test follows the same pattern:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;; Load known inputs&lt;/span&gt;
&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x1234&lt;/span&gt;
&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x5678&lt;/span&gt;

&lt;span class="c1"&gt;; Execute the instruction under test&lt;/span&gt;
&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;

&lt;span class="c1"&gt;; Check the result&lt;/span&gt;
&lt;span class="nf"&gt;MOV&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R10&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;; actual value&lt;/span&gt;
&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x68AC&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; expected value&lt;/span&gt;
&lt;span class="nf"&gt;JALX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;check_eq&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;; prints PASS or FAIL&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;check_eq&lt;/code&gt; subroutine compares R4 (actual) against R5 (expected) and prints the result over the UART. This makes the test output human-readable and machine-parseable:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;=== ALU Tests ===
ADD basic: PASS
ADD zero: PASS
ADD carry out: PASS
ADD overflow: PASS
SUB basic: PASS
...
Done.
&lt;/pre&gt;&lt;/div&gt;
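&lt;p&gt;Because every result line has the same shape, a host-side script can tally it with a few lines of string handling. The following is only a minimal sketch, assuming each result is printed as &lt;code&gt;name: PASS&lt;/code&gt; or &lt;code&gt;name: FAIL&lt;/code&gt; on its own line; the &lt;code&gt;summarize&lt;/code&gt; function and the sample text (including its made-up FAIL line) are illustrative and not part of the project's tooling.&lt;/p&gt;

```python
def summarize(uart_text):
    """Split captured UART output into lists of passed and failed test names."""
    passed, failed = [], []
    for line in uart_text.splitlines():
        name, sep, verdict = line.rpartition(": ")
        if not sep:
            continue  # section headers, "Done." and blank lines carry no verdict
        verdict = verdict.strip()
        if verdict == "PASS":
            passed.append(name)
        elif verdict == "FAIL":
            failed.append(name)
    return passed, failed


# Hypothetical capture; the FAIL line is invented to show both branches.
sample = "=== ALU Tests ===\nADD basic: PASS\nADD zero: PASS\nSUB basic: FAIL\nDone.\n"
passed, failed = summarize(sample)
print(len(passed), "passed;", len(failed), "failed:", failed)
```

&lt;p&gt;Splitting on the last &lt;code&gt;": "&lt;/code&gt; keeps test names that contain spaces intact, and anything without a verdict is ignored rather than counted as a failure.&lt;/p&gt;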

&lt;h3&gt;The Test Framework&lt;/h3&gt;
&lt;p&gt;Every test program begins with a block of helper subroutines that handle UART communication and result reporting. The core is a busy-wait loop that polls the MC6850-compatible UART status register:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="na"&gt;.equ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_STATUS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x80&lt;/span&gt;
&lt;span class="na"&gt;.equ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_DATA&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;0x81&lt;/span&gt;

&lt;span class="nl"&gt;print_char:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; R5 = character to output&lt;/span&gt;
&lt;span class="nl"&gt;.wait:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;INI&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_STATUS&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; Read status register&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;AND&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;; Copy to R7&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-2&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;; Z set only when status == 0x02 (TX ready, bit 1)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;BNE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;.wait&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="c1"&gt;; Loop until ready&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;OUTI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_DATA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;; Send character&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;JR&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;RA&lt;/span&gt;&lt;span class="w"&gt;                 &lt;/span&gt;&lt;span class="c1"&gt;; Return&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;check_eq&lt;/code&gt; helper prints "PASS" or "FAIL" based on a register comparison, and the &lt;code&gt;print_str&lt;/code&gt; helper walks a null-terminated string byte by byte. These routines are duplicated in each test file rather than linked - there's no linker in this toolchain, just a single-file assembler.&lt;/p&gt;
&lt;h3&gt;Test Coverage&lt;/h3&gt;
&lt;p&gt;We organized the tests into 10 programs, each targeting a specific area of the instruction set:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Program&lt;/th&gt;
&lt;th&gt;Instructions Tested&lt;/th&gt;
&lt;th&gt;Test Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;test_alu&lt;/td&gt;
&lt;td&gt;ADD, SUB, AND, OR, XOR, NEG + flags&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_addi&lt;/td&gt;
&lt;td&gt;ADDI with signed immediates + flags&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_shift&lt;/td&gt;
&lt;td&gt;SLL, SRL, SRA, ROL, ROR, SWAP (1/4/8-bit variants)&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_muldiv&lt;/td&gt;
&lt;td&gt;MUL, MULH, DIV, DIVU, REM, REMU&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_loadstore&lt;/td&gt;
&lt;td&gt;LW, LB, LBU, SW, SB + offset variants&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_branch&lt;/td&gt;
&lt;td&gt;All 16 branch conditions (taken + not taken)&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_jump&lt;/td&gt;
&lt;td&gt;J, JR, JALR, JX, JALX&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_stack&lt;/td&gt;
&lt;td&gt;PUSH, POP, CMP, TEST, MOV&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_misc&lt;/td&gt;
&lt;td&gt;EXX, GETF, SETF, SCF, CCF, NOP&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;test_extended&lt;/td&gt;
&lt;td&gt;ADDIX, SUBIX, ANDIX, ORIX, XORIX, SLLX, SRLX, SRAX&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;132&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The branch tests are particularly thorough - each of the 16 conditions (BEQ, BNE, BLT, BGE, BLTU, BGEU, BMI, BPL, BVS, BVC, BCS, BCC, BGT, BLE, BHI, BLS) gets tested both for the taken and not-taken case. We set up flags with arithmetic, then verify the branch goes the right way.&lt;/p&gt;
&lt;h3&gt;Finding a Real Bug: The Pipeline Hazard&lt;/h3&gt;
&lt;p&gt;The first time we ran the full test suite in simulation, 130 of 132 tests passed. Two tests in &lt;code&gt;test_loadstore&lt;/code&gt; were failing: the multi-word store/load test and a load-with-offset test.&lt;/p&gt;
&lt;p&gt;The failing pattern was consistent: any test that performed a store followed immediately by a load from a different address would read stale data. The load would return the value from the &lt;em&gt;previous&lt;/em&gt; memory operation instead of the current one.&lt;/p&gt;
&lt;p&gt;The root cause was a pipeline hazard between the MEMORY and FETCH states. Here's what was happening:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Cycle N:   MEMORY state - store completes, mem_ready asserts
Cycle N+1: FETCH state  - new instruction fetch begins
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The problem: &lt;code&gt;mem_ready&lt;/code&gt; is a one-cycle delayed version of &lt;code&gt;mem_valid&lt;/code&gt; (because the RAM is synchronous). When a store took the CPU straight from MEMORY to FETCH, the &lt;code&gt;mem_ready&lt;/code&gt; signal from the store was still asserted during the first cycle of that FETCH. The CPU latched the stale &lt;code&gt;mem_rdata&lt;/code&gt; left over from the store as if it were the new instruction.&lt;/p&gt;
&lt;p&gt;The fix was to add a WRITEBACK state after every MEMORY operation - not just loads, but stores too. This gives &lt;code&gt;mem_ready&lt;/code&gt; a cycle to deassert before the next FETCH begins:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;Before&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MEMORY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FETCH&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;still&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;MEMORY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WRITEBACK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FETCH&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deasserts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;during&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;WRITEBACK&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A one-line change to the next-state logic:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="no"&gt;`ST_MEMORY&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_ready&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// Always go through WRITEBACK after MEMORY.&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// For stores: allows mem_ready to deassert before&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// next FETCH (prevents stale rdata latch).&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;next_state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;`ST_WRITEBACK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is exactly the kind of bug that simulation catches and manual inspection misses. The instruction executes correctly in isolation - it's only the &lt;em&gt;interaction&lt;/em&gt; between consecutive memory operations that triggers the hazard. After the fix, all 132 tests passed in simulation.&lt;/p&gt;
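&lt;p&gt;The timing can be reproduced with a toy cycle-by-cycle model. This is a sketch under stated assumptions, not the actual RTL: it assumes &lt;code&gt;mem_valid&lt;/code&gt; is driven during every FETCH and MEMORY cycle and that &lt;code&gt;mem_ready&lt;/code&gt; is simply &lt;code&gt;mem_valid&lt;/code&gt; delayed by one clock. The function names and state sequences are made up for illustration.&lt;/p&gt;

```python
def trace(states):
    """Toy model: mem_ready in cycle n equals mem_valid in cycle n-1."""
    rows = []
    prev_valid = False
    for state in states:
        # Assumption: only FETCH and MEMORY drive a memory request.
        mem_valid = state in ("FETCH", "MEMORY")
        mem_ready = prev_valid
        rows.append((state, mem_valid, mem_ready))
        prev_valid = mem_valid
    return rows


def stale_ready_on_fetch(states):
    """True if mem_ready is already high on the first cycle of any FETCH."""
    rows = trace(states)
    for i, (state, _valid, ready) in enumerate(rows):
        first_fetch_cycle = state == "FETCH" and (i == 0 or rows[i - 1][0] != "FETCH")
        if first_fetch_cycle and ready:
            return True
    return False


# One entry per clock cycle; MEMORY and FETCH each wait a cycle for mem_ready.
before_fix = ["MEMORY", "MEMORY", "FETCH", "FETCH"]
after_fix = ["MEMORY", "MEMORY", "WRITEBACK", "FETCH", "FETCH"]
print("before fix, stale ready seen:", stale_ready_on_fetch(before_fix))
print("after fix, stale ready seen: ", stale_ready_on_fetch(after_fix))
```

&lt;p&gt;In this model the extra WRITEBACK cycle is the only cycle in which no request is driven, which is what lets the delayed ready signal drop before the next fetch samples it.&lt;/p&gt;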
&lt;h3&gt;Taking It to the FPGA&lt;/h3&gt;
&lt;p&gt;With simulation clean, the next step was running the tests on real hardware. The ULX3S board has an &lt;a href="https://baud.rs/bJSrEK"&gt;FTDI&lt;/a&gt; FT231X USB-serial chip connected to the FPGA, so UART output appears on a serial port at 115200 baud.&lt;/p&gt;
&lt;p&gt;There was an immediate practical problem: the test programs run fast. At 12.5 MHz, the entire 20-test ALU suite completes in about 30 milliseconds. By the time openFPGALoader finishes programming the FPGA and releases the USB port, the test output is long gone. The FTDI chip has a small receive buffer, but 364 characters of test output overflows it before you can open the serial port.&lt;/p&gt;
&lt;p&gt;The solution: patch the hex files to loop instead of halting. Replace the HALT instruction with a delay loop followed by a jump back to the reset vector. The test runs, outputs its results, waits roughly 0.4 seconds, and starts over. You can open the serial port at any time and catch a complete iteration.&lt;/p&gt;
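&lt;p&gt;Opening the port mid-run means the capture usually starts partway through an iteration, so the host has to discard the leading fragment. A minimal sketch of that step, assuming each run starts with a &lt;code&gt;=== &lt;/code&gt; section header and ends with &lt;code&gt;Done.&lt;/code&gt; as in the sample output earlier; &lt;code&gt;first_complete_iteration&lt;/code&gt; is a hypothetical helper, not a function from the project's capture script.&lt;/p&gt;

```python
def first_complete_iteration(stream, start_marker="=== ", end_marker="Done."):
    """Return the first full run in a looping capture, or None if there is none."""
    start = stream.find(start_marker)
    if start == -1:
        return None  # no run header seen yet
    end = stream.find(end_marker, start)
    if end == -1:
        return None  # the capture stopped before the run finished
    return stream[start:end + len(end_marker)]


# Hypothetical capture that begins and ends in the middle of a run.
capture = "basic: PASS\nDone.\n=== ALU Tests ===\nADD basic: PASS\nDone.\n=== ALU Te"
print(first_complete_iteration(capture))
```

&lt;p&gt;A program without these markers, such as the "Hello, Sampo!" demo, would need a different delimiter.&lt;/p&gt;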
&lt;h4&gt;The Delay Loop Patch&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;hex_loop_patch.py&lt;/code&gt; script performs binary patching on the assembled hex files. It finds the HALT instruction (encoded as &lt;code&gt;0xE100&lt;/code&gt;) and replaces it with a delay loop:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;; Delay ~0.38 seconds at 12.5 MHz&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x0008&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; outer counter&lt;/span&gt;
&lt;span class="nl"&gt;outer:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0xFFFF&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; inner counter = 65535&lt;/span&gt;
&lt;span class="nl"&gt;inner:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;BNE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;inner&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;BNE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;outer&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;JX&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;0x0100&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;; jump back to reset vector&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The first version of this script &lt;em&gt;inserted&lt;/em&gt; these 10 words at the HALT position. This seemed obviously correct. The tests ran on FPGA. Characters appeared on the serial port.&lt;/p&gt;
&lt;p&gt;They were the wrong characters.&lt;/p&gt;
&lt;h3&gt;The Address Shift Bug&lt;/h3&gt;
&lt;p&gt;The FPGA output for the "Hello, Sampo!" test program was &lt;code&gt;\x08\x08\x08\x08&lt;/code&gt; - four backspace characters, repeating forever. The ALU test suite showed truncated output with roughly 45% of characters missing. Same pattern at 12.5 MHz and 6.25 MHz, ruling out timing violations. Simulation with realistic UART timing (1,080 cycles per byte, matching the hardware baud rate) passed perfectly.&lt;/p&gt;
&lt;p&gt;We spent considerable time chasing the wrong theories. Was the UART transmitter dropping bytes? Was there a clock domain crossing issue? Was &lt;code&gt;$readmemh&lt;/code&gt; in Yosys interpreting the hex file differently from Icarus Verilog? None of these panned out.&lt;/p&gt;
&lt;p&gt;The breakthrough came from staring at &lt;code&gt;\x08&lt;/code&gt;. That's the byte value 8. Where would 8 come from? The "Hello, Sampo!" program loads its message pointer with &lt;code&gt;LIX R4, message&lt;/code&gt; where &lt;code&gt;message&lt;/code&gt; is the label for the string data. In the assembled hex, &lt;code&gt;message&lt;/code&gt; resolves to address &lt;code&gt;0x011E&lt;/code&gt; - the byte immediately after the HALT instruction.&lt;/p&gt;
&lt;p&gt;And there it was. Look at the assembly structure:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;done:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;HALT&lt;/span&gt;&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="c1"&gt;; address 0x011C&lt;/span&gt;
&lt;span class="nl"&gt;message:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="na"&gt;.asciz&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Hello, Sampo!\n"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;; address 0x011E&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The string data lives immediately after HALT. When &lt;code&gt;hex_loop_patch.py&lt;/code&gt; &lt;em&gt;inserts&lt;/em&gt; 10 words of delay loop code at the HALT position, it pushes the string data down by 20 bytes. But the &lt;code&gt;LIX R4, 0x011E&lt;/code&gt; instruction still points to the original address. At &lt;code&gt;0x011E&lt;/code&gt; there's now the second word of &lt;code&gt;LIX R8, 0x0008&lt;/code&gt; - which contains the value &lt;code&gt;0x0008&lt;/code&gt;. The low byte is &lt;code&gt;0x08&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The CPU faithfully reads byte &lt;code&gt;0x08&lt;/code&gt; from the patched address, outputs it via UART, advances the pointer to &lt;code&gt;0x011F&lt;/code&gt; where the high byte is &lt;code&gt;0x00&lt;/code&gt; (the null terminator), and stops. One &lt;code&gt;\x08&lt;/code&gt; per iteration, four iterations captured. Mystery solved.&lt;/p&gt;
&lt;p&gt;This same address shift corrupted every test program. The test strings ("ADD basic: ", "PASS\n", etc.) all live after HALT and all got displaced. The CPU was reading from locations that now contained delay loop machine code instead of ASCII text. Some fragments of text survived because adjacent strings partially overlapped with their shifted locations, producing the truncated output we saw.&lt;/p&gt;
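&lt;p&gt;The effect is easy to reproduce on a list of words. The sketch below models only the two words around HALT and assumes little-endian byte order within a word, which matches the low-byte/high-byte description above. Apart from &lt;code&gt;0xE100&lt;/code&gt; and the &lt;code&gt;0x0008&lt;/code&gt; immediate, every value in &lt;code&gt;patch&lt;/code&gt; is a placeholder rather than a real Sampo encoding, and &lt;code&gt;byte_at&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
HALT = 0xE100     # HALT encoding quoted in the post
BASE = 0x011C     # byte address of HALT in the "Hello, Sampo!" image
MESSAGE = 0x011E  # byte address the LIX instruction still points at

# HALT followed by the first two characters of the string, low byte first.
original = [HALT, ord("H") + ord("e") * 256]

# Stand-in for the 10-word delay loop: only the 0x0008 immediate comes from
# the post; 0xAAAA is a placeholder, not a real instruction encoding.
patch = [0xAAAA, 0x0008] + [0xAAAA] * 8


def byte_at(words, base, addr):
    """Read one byte from a little-endian word list that starts at base."""
    word = words[(addr - base) // 2]
    return word % 256 if (addr - base) % 2 == 0 else word // 256


# Buggy patcher: the loop is spliced in and the string slides to a higher address.
inserted = patch + original[1:]

print(chr(byte_at(original, BASE, MESSAGE)))      # first character before patching
print(hex(byte_at(inserted, BASE, MESSAGE)))      # byte the CPU reads after insertion
print(hex(byte_at(inserted, BASE, MESSAGE + 1)))  # next byte, read as the terminator
```

&lt;p&gt;The pointer never changed; only the contents underneath it did, which is why the assembler-resolved address silently went stale.&lt;/p&gt;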
&lt;h4&gt;The Fix&lt;/h4&gt;
&lt;p&gt;The correct approach: don't shift any data. Place the delay loop at address &lt;code&gt;0x0000&lt;/code&gt; - the 256 bytes of unused memory before the &lt;code&gt;0x0100&lt;/code&gt; reset vector - and replace the single-word HALT with a single-word relative &lt;code&gt;J&lt;/code&gt; (jump) instruction that jumps backward to the loop code. One word replaces one word. No data moves.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Place delay loop at address 0x0000 (unused space)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LOOP_PATCH&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;loop_base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;

&lt;span class="c1"&gt;# Replace HALT with J instruction to address 0x0000&lt;/span&gt;
&lt;span class="c1"&gt;# J encoding: opcode 0x9, 12-bit signed offset&lt;/span&gt;
&lt;span class="c1"&gt;# target = PC + 2 + (sign_extend(offset) &amp;lt;&amp;lt; 1)&lt;/span&gt;
&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_addr&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;halt_addr&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;j_word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x9000&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xFFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;halt_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j_word&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There's a subtle complication: the J instruction shares opcode &lt;code&gt;0x9&lt;/code&gt; with JR (register indirect jump) and JALR (jump and link register). The decoder distinguishes them by specific bit patterns in the offset field. If the calculated offset happens to have &lt;code&gt;bits[3:0] == 0x1&lt;/code&gt; and &lt;code&gt;bits[11:8] != 0xF&lt;/code&gt;, the decoder interprets it as JALR instead of J. The script tries successive target addresses (&lt;code&gt;0x0000&lt;/code&gt;, &lt;code&gt;0x0002&lt;/code&gt;, &lt;code&gt;0x0004&lt;/code&gt;, ...) until it finds one that doesn't collide with the JR/JALR encoding space.&lt;/p&gt;
&lt;p&gt;After the fix, the patched hex files have exactly the same number of words as the originals. The only changes are the delay loop code written to the zero page and the HALT word replaced with a backward jump.&lt;/p&gt;
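&lt;p&gt;Put together, the in-place patch might look like the sketch below. It is not the real &lt;code&gt;hex_loop_patch.py&lt;/code&gt;: it assumes the word list starts at byte address 0, that the loop is placed at whichever target address is chosen, and it checks only the JALR pattern described above (the JR pattern is not given here, so it is not modelled). The loop words are placeholders, and all function names are hypothetical.&lt;/p&gt;

```python
HALT = 0xE100      # HALT encoding quoted in the post
J_OPCODE = 0x9000  # opcode 0x9 in the top nibble


def looks_like_jalr(word):
    """Collision rule from the text: bits[3:0] == 0x1 and bits[11:8] != 0xF."""
    field = word % 0x1000  # low 12 bits
    return field % 16 == 0x1 and field // 256 != 0xF


def encode_j(from_addr, to_addr):
    """Encode J so that to_addr == from_addr + 2 + 2 * offset."""
    offset = (to_addr - from_addr - 2) // 2
    return J_OPCODE + offset % 0x1000  # two's-complement wrap to 12 bits


def patch_in_place(words, loop_patch):
    """Return a same-length copy with HALT replaced by a jump to the loop."""
    out = list(words)
    halt_idx = out.index(HALT)  # assumes no earlier word equals 0xE100
    halt_addr = 2 * halt_idx    # assumes words[0] sits at byte address 0
    for target in range(0, 0x0100 - 2 * len(loop_patch) + 1, 2):
        j_word = encode_j(halt_addr, target)
        if not looks_like_jalr(j_word):
            base = target // 2
            out[base:base + len(loop_patch)] = loop_patch
            out[halt_idx] = j_word
            return out
    raise ValueError("no collision-free target below the reset vector")


# Hypothetical image: empty memory up to 0x011C, then HALT and one data word.
image = [0x0000] * 0x8E + [HALT, 0x6548]
placeholder_loop = [0x1111] * 10  # stand-in values, not real Sampo encodings
patched = patch_in_place(image, placeholder_loop)
print(len(patched) == len(image), hex(patched[0x8E]))
```

&lt;p&gt;Because the slice that receives the loop has the same length as the loop itself, the word count cannot change, which is the property the original insertion-based patcher violated.&lt;/p&gt;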
&lt;p&gt;With the corrected patcher, the "Hello, Sampo!" program finally works on the FPGA - looping cleanly with zero character loss:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/sampo-fpga-isa-verification/HelloSampo.png" style="width: 100%; max-width: 720px; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 1em 0;" loading="lazy" alt="Terminal showing Hello, Sampo! repeating on the ULX3S FPGA via cu serial connection"&gt;&lt;/p&gt;
&lt;h3&gt;The Testbench: Trusting but Verifying&lt;/h3&gt;
&lt;p&gt;One important discovery during this process: the simulation testbench had &lt;code&gt;tx_ready = 1&lt;/code&gt; permanently. The simulated UART never pushed back on the CPU - it accepted every byte instantly. This meant the CPU's busy-wait loop (&lt;code&gt;INI R6, ACIA_STATUS / AND R7, R6, R6 / ADDI R7, -2 / BNE .wait&lt;/code&gt;) was never actually exercised in simulation. The status register always returned "ready," so the poll succeeded on its first pass and the &lt;code&gt;BNE&lt;/code&gt; never branched back.&lt;/p&gt;
&lt;p&gt;On real hardware, the UART transmitter takes about 87 microseconds per byte at 115200 baud. The busy-wait loop spins repeatedly for every character (a byte occupies the transmitter for roughly 1,080 clock cycles at 12.5 MHz), exercising the INI instruction, the AND/ADDI flag-setting sequence, and the BNE branch in a tight loop. If any of those instructions had a subtle bug, it would only manifest on hardware.&lt;/p&gt;
&lt;p&gt;We added realistic UART timing to the testbench:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;parameter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TX_BYTE_CYCLES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;108&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// ~1080 cycles per byte&lt;/span&gt;
&lt;span class="kt"&gt;reg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mh"&gt;15&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mh"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;always&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="k"&gt;posedge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;clk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx_valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tx_ready&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;tx_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TX_BYTE_CYCLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx_delay_cnt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;tx_ready&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With this change, simulation exercises the same code paths as the hardware. All 132 tests still pass - the UART flow control logic was correct all along; it simply wasn't being tested.&lt;/p&gt;
&lt;h3&gt;Running All Tests on the FPGA&lt;/h3&gt;
&lt;video controls style="width: 100%; max-width: 720px; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15); margin: 0 0 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/sampo-fpga-test-suite.mp4" type="video/mp4"&gt;
Your browser does not support the video tag.
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;With the patch bug fixed, we ran the complete suite. Each test requires a separate FPGA build (Yosys synthesis, nextpnr place-and-route, ecppack bitstream generation), programming via JTAG, and serial capture. The Makefile automates the entire pipeline:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nf"&gt;fpga-%&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;&lt;span class="nv"&gt;BUILD_DIR&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;/&lt;span class="n"&gt;sampo_&lt;/span&gt;%.&lt;span class="n"&gt;bit&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;openFPGALoader&lt;span class="w"&gt; &lt;/span&gt;-b&lt;span class="w"&gt; &lt;/span&gt;ulx3s&lt;span class="w"&gt; &lt;/span&gt;$&amp;lt;
&lt;span class="w"&gt;    &lt;/span&gt;sleep&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;python3&lt;span class="w"&gt; &lt;/span&gt;fpga_capture.py&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;SERIAL_PORT&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;SERIAL_BAUD&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;fpga_capture.py&lt;/code&gt; script opens the serial port, discards the first partial iteration (we might join mid-stream), waits for the &lt;code&gt;=== ... ===&lt;/code&gt; header line that starts each test, captures everything until the header repeats, and outputs one clean iteration.&lt;/p&gt;
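&lt;p&gt;The framing logic described above can be sketched independently of the serial port. This is a minimal sketch, not the actual &lt;code&gt;fpga_capture.py&lt;/code&gt;: the names &lt;code&gt;is_header&lt;/code&gt; and &lt;code&gt;capture_one_iteration&lt;/code&gt; are hypothetical, the header test (a line that starts with &lt;code&gt;=== &lt;/code&gt; and ends with &lt;code&gt; ===&lt;/code&gt;) is an assumption, and the input is a plain list of lines rather than a live serial stream.&lt;/p&gt;

```python
def is_header(line):
    """Assumed header shape: '=== some title ==='."""
    text = line.strip()
    return text.startswith("=== ") and text.endswith(" ===")


def capture_one_iteration(lines):
    """Return one complete iteration from a repeating stream of lines.

    Lines before the first header are discarded (we may have joined
    mid-stream). Capture starts at the first header and stops when the
    same header appears again.
    """
    captured = []
    header = None
    for line in lines:
        if header is None:
            # Still skipping the partial iteration we joined in the middle of.
            if is_header(line):
                header = line.strip()
                captured.append(line)
            continue
        if line.strip() == header:
            break  # the program has looped; one iteration is complete
        captured.append(line)
    return captured


stream = [
    "SUB borrow: PASS",      # tail of a partial iteration
    "All tests passed!",
    "=== ALU Tests ===",     # first full header
    "ADD basic: PASS",
    "All tests passed!",
    "=== ALU Tests ===",     # header repeats, so capture stops here
    "ADD basic: PASS",
]
result = capture_one_iteration(stream)
print("\n".join(result))
```

&lt;p&gt;Keying on the repeated header, rather than on a fixed line count or a timeout alone, means the same logic works for suites of any length.&lt;/p&gt;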
&lt;p&gt;The results:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;========================================
=== FPGA: test_alu ===
========================================
=== ALU Tests ===
ADD basic: PASS
ADD zero: PASS
ADD carry out: PASS
...
AND clr C/V: PASS
All tests passed!

========================================
=== FPGA: test_addi ===
========================================
...
All tests passed!

...

========================================
FPGA Test Summary: 10 passed, 0 failed
========================================
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All 10 test suites pass. All 132 individual tests pass. Zero failures on real hardware.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Suite&lt;/th&gt;
&lt;th&gt;Tests&lt;/th&gt;
&lt;th&gt;FPGA Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ALU (ADD, SUB, AND, OR, XOR)&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADDI (immediate arithmetic)&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shift (SLL, SRL, SRA, ROL, SWAP)&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MulDiv (MUL, DIV, REM variants)&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load/Store (LW, LB, LBU, SW, SB)&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Branch (all 16 conditions)&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jump (J, JR, JALR, JX, JALX)&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stack (PUSH, POP, CMP, TEST, MOV)&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Misc (EXX, GETF, SETF, SCF, CCF, NOP)&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extended (ADDIX, SUBIX, SLLX, etc.)&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;All PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;132&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;All PASS&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;What This Means&lt;/h3&gt;
&lt;p&gt;Having all 132 ISA tests pass on hardware is a significant milestone for the project. It means:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Verilog RTL is correct.&lt;/strong&gt; Every instruction in the Sampo ISA produces the right result, sets the right flags, and handles edge cases (zero, overflow, carry, sign extension) correctly. Not just in behavioral simulation, but in synthesized logic on a real FPGA running at 12.5 MHz.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The assembler is correct.&lt;/strong&gt; All 66 instructions encode properly. Branch offsets calculate correctly. Extended instructions (LIX, JALX, OUTX) with their 32-bit encoding work. The &lt;code&gt;sasm&lt;/code&gt; Rust assembler and the Verilog decoder agree on every instruction format.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The LLVM backend has a solid foundation.&lt;/strong&gt; When the Rust compiler generates an &lt;code&gt;ADD&lt;/code&gt;, &lt;code&gt;BNE&lt;/code&gt;, or &lt;code&gt;JALX&lt;/code&gt;, the hardware will execute it correctly. The test suite doesn't exercise every possible code generation pattern, but it validates every primitive instruction that the compiler builds upon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The UART subsystem works end-to-end.&lt;/strong&gt; Status register polling, TX busy-wait, byte transmission, baud rate generation - all verified on hardware. The MC6850-compatible interface works exactly as specified.&lt;/p&gt;
&lt;h3&gt;Lessons Learned&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Test your assumptions.&lt;/strong&gt; The testbench had &lt;code&gt;tx_ready = 1&lt;/code&gt;. It went unnoticed because simulation "worked." The real hardware exercises code paths that simulation shortcuts. Add realistic peripheral timing to your testbenches from day one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Binary patching is fragile.&lt;/strong&gt; Inserting bytes into a binary without updating references is a classic relocation bug - the same class of problem that linkers exist to solve. If your patch changes the size of anything, every address reference past the patch point is wrong. The fix - placing the patch in unused address space and using a same-size replacement instruction - avoids the problem entirely.&lt;/p&gt;
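&lt;p&gt;A toy model makes the relocation problem concrete. This is only an illustration under stated assumptions: the instruction tuples, the mnemonics &lt;code&gt;POLL&lt;/code&gt;, &lt;code&gt;CALL&lt;/code&gt;, and &lt;code&gt;POLL_THEN_OUT&lt;/code&gt;, and the helper &lt;code&gt;jump_target&lt;/code&gt; are hypothetical and do not reflect Sampo's real encoding or the actual patching script. List indices stand in for byte addresses.&lt;/p&gt;

```python
# Toy program: each entry is (mnemonic, operand). A "J" operand is an
# absolute index into the list, standing in for a byte address.
program = [
    ("LOAD", 65),    # 0
    ("OUT", None),   # 1: the instruction we want to patch
    ("J", 4),        # 2: jump to index 4
    ("NOP", None),   # 3
    ("HALT", None),  # 4: intended jump target
]


def jump_target(prog):
    """Return the mnemonic that the first J instruction lands on."""
    for mnemonic, operand in prog:
        if mnemonic == "J":
            return prog[operand][0]
    return None


# Patch A: insert an instruction before OUT without updating references.
# Everything after the insertion point moves, but the J operand does not.
inserted = program[:1] + [("POLL", None)] + program[1:]

# Patch B: same-size replacement. OUT is overwritten in place with a
# transfer to index 5, and the patch body lives past the end of the
# original program, in otherwise unused space.
replaced = list(program)
replaced[1] = ("CALL", len(program))
replaced.append(("POLL_THEN_OUT", None))

print(jump_target(program))   # HALT
print(jump_target(inserted))  # NOP  (target shifted by one slot)
print(jump_target(replaced))  # HALT
```

&lt;p&gt;In patch A the jump still says "index 4", but index 4 now holds a different instruction. Patch B leaves every existing index untouched, so no reference needs fixing.&lt;/p&gt;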
&lt;p&gt;&lt;strong&gt;Simulation is necessary but not sufficient.&lt;/strong&gt; The pipeline hazard bug was caught by simulation. The address shift bug was invisible to simulation (both used the same patching script, and the original programs - without patching - worked fine). You need both simulation and hardware testing, exercising different code paths and different failure modes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Systematic testing finds bugs that demos don't.&lt;/strong&gt; "Hello, Sampo!" worked on the FPGA from day one. It exercises &lt;code&gt;LIX&lt;/code&gt;, &lt;code&gt;LBU&lt;/code&gt;, &lt;code&gt;CMP&lt;/code&gt;, &lt;code&gt;BEQ&lt;/code&gt;, &lt;code&gt;INI&lt;/code&gt;, &lt;code&gt;OUTI&lt;/code&gt;, &lt;code&gt;ADDI&lt;/code&gt;, and &lt;code&gt;J&lt;/code&gt; - just 8 of the ISA's instructions. The pipeline hazard only manifested when a store was followed by a load from a different address, a pattern that doesn't occur in a simple print loop. You need tests specifically designed to exercise corner cases.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The entire Sampo project - assembler, emulator, Verilog RTL, FPGA build scripts, test suite, and LLVM backend - is open source on &lt;a href="https://baud.rs/r74wA8"&gt;GitHub&lt;/a&gt;. With hardware verification complete, the next steps might be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Running Rust-compiled code on the FPGA.&lt;/strong&gt; The LLVM backend generates assembly, the assembler produces hex files, and we now know the hardware executes them correctly. Closing this loop - &lt;code&gt;cargo build&lt;/code&gt; to blinking LEDs - is the obvious next milestone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adding more peripherals.&lt;/strong&gt; The ULX3S has 32 MB of SDRAM, an HDMI output, a microSD slot, and an ESP32 co-processor. Each of these opens up interesting possibilities for a working 16-bit computer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance optimization.&lt;/strong&gt; The CPU currently runs at 12.5 MHz with a multi-cycle FSM (5-8 cycles per instruction). Pipelining could push this significantly higher on the ECP5.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But first: 132 tests, zero failures. The Sampo CPU works.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This is Part 4 of the Sampo series. &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Part 1&lt;/a&gt; covers architecture design, &lt;a href="https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html"&gt;Part 2&lt;/a&gt; covers FPGA implementation, and &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Part 3&lt;/a&gt; covers the LLVM backend.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Recommended Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/wvPosK"&gt;OrangeCrab ECP5 FPGA Board&lt;/a&gt; - A compact Lattice ECP5 board with DDR3 and USB-C, available on Amazon&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/6U3DBr"&gt;ECP5 FPGA Development Boards&lt;/a&gt; - Other ECP5 boards available on Amazon&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/RGjpAj"&gt;&lt;em&gt;Getting Started with FPGAs&lt;/em&gt;&lt;/a&gt; by Russell Merrick - Beginner-friendly introduction with Verilog and VHDL examples&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/bJSrEK"&gt;FTDI USB Serial Adapters&lt;/a&gt; - Useful for UART debugging with FPGAs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/dBX5Ij"&gt;USB Logic Analyzers&lt;/a&gt; - Essential for debugging digital signals&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Source Code&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/r74wA8"&gt;github.com/ajokela/sampo&lt;/a&gt;&lt;/strong&gt; - CPU architecture, assembler, emulator, Verilog RTL, test suite, and FPGA build scripts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/GCQDRa"&gt;github.com/ajokela/llvm-sampo&lt;/a&gt;&lt;/strong&gt; - LLVM backend and Rust target specification&lt;/li&gt;
&lt;/ul&gt;</description><category>cpu design</category><category>ecp5</category><category>fpga</category><category>hardware</category><category>isa</category><category>risc</category><category>sampo</category><category>testing</category><category>uart</category><category>ulx3s</category><category>verification</category><category>verilog</category><guid>https://tinycomputers.io/posts/sampo-fpga-isa-verification.html</guid><pubDate>Sun, 15 Feb 2026 20:00:00 GMT</pubDate></item><item><title>Three Paths to Rust on Custom Hardware</title><link>https://tinycomputers.io/posts/three-paths-to-rust-on-custom-hardware.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/three-paths-to-rust-on-custom-hardware_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;18 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;If you want to run &lt;a href="https://baud.rs/gSnSwR"&gt;Rust&lt;/a&gt; on hardware that Rust was never designed for (a Z80 from 1976, a custom 16-bit RISC CPU, a &lt;a href="https://baud.rs/jt9HTI"&gt;Game Boy&lt;/a&gt;), you have a problem. The Rust compiler targets LLVM, and LLVM doesn't know your CPU exists.&lt;/p&gt;
&lt;p&gt;I've spent some time solving this problem in different ways. I built &lt;a href="https://tinycomputers.io/posts/rust-on-z80-an-llvm-backend-odyssey.html"&gt;LLVM backends for both the Z80&lt;/a&gt; and my own &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Sampo 16-bit RISC architecture&lt;/a&gt;. That's the "correct" solution (and it works), but it also means countless hours wrestling with TableGen definitions and GlobalISel pipelines, even though agentic coding tools help immensely.&lt;/p&gt;
&lt;p&gt;There's a recent project that offers a different path entirely: &lt;a href="https://baud.rs/XiwoCV"&gt;Eurydice&lt;/a&gt;, a Rust-to-C transpiler developed by researchers at Inria and Microsoft. The premise is simple. If your target already has a C compiler, you can skip LLVM entirely:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Rust → Eurydice → C → existing C compiler → your target
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the Z80, that existing C compiler is &lt;a href="https://baud.rs/XOHX1N"&gt;SDCC&lt;/a&gt;, the Small Device C Compiler. It's mature, well-tested, and has supported the Z80 for decades.&lt;/p&gt;
&lt;p&gt;This article explores three distinct paths to getting Rust on custom hardware, and includes a hands-on walkthrough of the Eurydice approach: transpiling Rust to readable C, then compiling that C for the Z80 with SDCC.&lt;/p&gt;
&lt;h3&gt;Path 1: The Full LLVM Backend&lt;/h3&gt;
&lt;p&gt;This is what I did for both the Z80 and Sampo. You fork LLVM, implement a complete backend (register descriptions, instruction selection, calling conventions, type legalization, assembly printing) and teach the Rust compiler about your new target triple.&lt;/p&gt;
&lt;p&gt;The pipeline looks like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Rust source → rustc frontend → LLVM IR → Your Backend → Assembly → Binary
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Full Rust language support (within hardware constraints)&lt;/li&gt;
&lt;li&gt;Access to LLVM's optimization passes: constant folding, dead code elimination, register allocation&lt;/li&gt;
&lt;li&gt;A single backend that works for Rust, C (via Clang), and any other LLVM frontend&lt;/li&gt;
&lt;li&gt;Native code quality that improves as LLVM improves&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;What it costs:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLVM is roughly 30 million lines of C++. The learning curve is &lt;a href="https://baud.rs/Jy0EBX"&gt;vertical&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A minimal backend requires 25-30 files of TableGen, C++, and CMake configuration&lt;/li&gt;
&lt;li&gt;Type legalization (teaching LLVM that your 8-bit CPU can't natively handle 64-bit integers) is where 60% of the effort lives&lt;/li&gt;
&lt;li&gt;Keeping your fork synchronized with upstream LLVM is ongoing maintenance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the Z80, register poverty alone was the bane of the effort. The &lt;a href="https://baud.rs/EsBekO"&gt;Z80&lt;/a&gt; has seven 8-bit registers, some of which can pair into 16-bit values. LLVM's register allocator expects 16 or 32 general-purpose registers. Every function call, every 16-bit addition, every pointer dereference requires careful choreography of a register file designed when RAM was measured in kilobytes. If you follow this blog and have read about my efforts to get LLVM and Rust working for the Z80, you will recall that I needed hundreds of gigabytes of RAM on the build server just to get through the full expansion of Rust's 64-bit and 128-bit types onto 8-bit registers.&lt;/p&gt;
&lt;p&gt;For Sampo, the experience was smoother; a 16-bit RISC with 16 registers is closer to what LLVM expects. But "smoother" is relative. The &lt;a href="https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html"&gt;Sampo LLVM backend&lt;/a&gt; still involved implementing GlobalISel pipelines, debugging opaque errors like "SmallVector capacity overflow," and building Rust's &lt;code&gt;libcore&lt;/code&gt; for a target that had never existed.&lt;/p&gt;
&lt;p&gt;The full LLVM approach gives you the best results. It's also the hardest path by a wide margin.&lt;/p&gt;
&lt;h3&gt;Path 2: Rust → C via Eurydice → Existing C Compiler&lt;/h3&gt;
&lt;p&gt;This is the path that caught my attention. Eurydice takes a fundamentally different approach: instead of teaching LLVM about your hardware, you transpile Rust to readable C and let an existing C compiler handle the target. This is the same path that other niche languages, like &lt;a href="https://baud.rs/nimlang"&gt;Nim&lt;/a&gt;, use to produce portable code.&lt;/p&gt;
&lt;h4&gt;What Is Eurydice?&lt;/h4&gt;
&lt;p&gt;Eurydice grew out of the &lt;a href="https://baud.rs/uxrujt"&gt;Aeneas&lt;/a&gt; formal verification project. Its predecessor, KaRaMeL, compiled F* (a dependently typed functional language used for cryptographic proofs) to C. Eurydice adapts this infrastructure for Rust.&lt;/p&gt;
&lt;p&gt;The pipeline has two stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Charon&lt;/strong&gt; extracts rustc's Mid-level Intermediate Representation (MIR) and dumps it as a JSON &lt;code&gt;.llbc&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Eurydice&lt;/strong&gt; reads the &lt;code&gt;.llbc&lt;/code&gt;, applies roughly 30 &lt;a href="https://baud.rs/uTpA6y"&gt;optimization passes&lt;/a&gt; to lower Rust semantics to C, and emits &lt;code&gt;.c&lt;/code&gt; and &lt;code&gt;.h&lt;/code&gt; files&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The generated C is genuinely readable, not the kind of machine-generated nightmare you'd expect. Rust structs become C structs. Functions keep their names (with module prefixes). Control flow is preserved. The goal is C code a human could maintain, not just C code that compiles.&lt;/p&gt;
&lt;h4&gt;Why This Matters for Retro/Custom Hardware&lt;/h4&gt;
&lt;p&gt;Here's the insight that matters for this audience: many obscure targets already have a C compiler but will never get an LLVM backend. The Z80 has SDCC. The 6502 has cc65. The 68000 has multiple mature C compilers. The Game Boy has GBDK.&lt;/p&gt;
&lt;p&gt;If Eurydice can produce C that these compilers accept, you get Rust on all of these platforms without touching LLVM at all.&lt;/p&gt;
&lt;h4&gt;The Real-World Use Case&lt;/h4&gt;
&lt;p&gt;This isn't just theoretical. Eurydice's flagship use case is post-quantum cryptography. The ML-KEM (Kyber) key encapsulation algorithm was written and verified in Rust via the &lt;a href="https://baud.rs/3RNTiV"&gt;libcrux&lt;/a&gt; library, then transpiled to C via Eurydice for integration into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mozilla's NSS (Network Security Services)&lt;/li&gt;
&lt;li&gt;Microsoft's SymCrypt&lt;/li&gt;
&lt;li&gt;Google's BoringSSL&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These organizations need verified cryptographic implementations but can't take a dependency on the Rust toolchain in their C/C++ codebases. Eurydice bridges that gap.&lt;/p&gt;
&lt;h4&gt;Limitations&lt;/h4&gt;
&lt;p&gt;Eurydice is honest about what it can and can't do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No &lt;code&gt;dyn&lt;/code&gt; traits&lt;/strong&gt;: dynamic dispatch isn't yet supported (vtable generation is planned)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Const generics&lt;/strong&gt; can cause Charon's MIR extraction to fail&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Iterators&lt;/strong&gt; get compiled to while loops with runtime state management, functional but potentially less efficient than hand-written C loops&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monomorphization&lt;/strong&gt; is required for generics, producing separate C functions for each type instantiation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strict aliasing&lt;/strong&gt;: the generated code's handling of dynamically sized types violates C's strict-aliasing rules, requiring &lt;code&gt;-fno-strict-aliasing&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Panic-free code only&lt;/strong&gt;: Eurydice doesn't replicate Rust's panic semantics for integer overflow or bounds checking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For retro targets, some of these limitations are actually advantages. &lt;code&gt;no_std&lt;/code&gt; embedded Rust code tends to avoid &lt;code&gt;dyn&lt;/code&gt; traits and complex iterators. The code that runs well on a Z80 (small functions, fixed-size arrays, simple control flow) is exactly the subset Eurydice handles best.&lt;/p&gt;
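&lt;p&gt;The panic-free limitation listed above deserves a concrete picture. A debug build of Rust panics when a &lt;code&gt;u16&lt;/code&gt; multiply overflows, while C-style unsigned 16-bit arithmetic silently keeps the low 16 bits. The sketch below only models those two behaviors; the helper names are hypothetical, and exactly how a given piece of Eurydice output behaves depends on the generated code and the C compiler.&lt;/p&gt;

```python
U16_MODULUS = 0x10000  # 2**16, the number of distinct u16 values


def mul_u16_checked(a, b):
    """Model of a checked multiply: refuse results that do not fit in 16 bits."""
    product = a * b
    if product != product % U16_MODULUS:
        raise OverflowError("u16 multiply overflowed")
    return product


def mul_u16_wrapping(a, b):
    """Model of an unchecked 16-bit multiply: keep only the low 16 bits."""
    return (a * b) % U16_MODULUS


# 300 * 300 = 90000, which does not fit in 16 bits.
wrapped = mul_u16_wrapping(300, 300)
print(wrapped)  # 24464, i.e. 90000 - 65536

try:
    mul_u16_checked(300, 300)
except OverflowError as err:
    print("checked:", err)
```

&lt;p&gt;If a computation can overflow, it is safer to make the intended behavior explicit in the Rust source than to rely on a panic that may not survive transpilation.&lt;/p&gt;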
&lt;h3&gt;Path 3: Manual &lt;code&gt;no_std&lt;/code&gt; with FFI to C&lt;/h3&gt;
&lt;p&gt;The minimal approach. You write your core logic in Rust targeting a supported architecture, then manually bridge to C via FFI for anything target-specific.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#![no_std]&lt;/span&gt;
&lt;span class="cp"&gt;#![no_main]&lt;/span&gt;

&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"C"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80_out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cp"&gt;#[no_mangle]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"C"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;compute_trajectory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// Pure Rust computation here&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;some_math&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;unsafe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80_out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You compile the Rust portion for a supported target (like &lt;code&gt;thumbv6m-none-eabi&lt;/code&gt; for ARM Cortex-M0, one of the smallest officially supported Rust targets), extract the algorithm logic, and rewrite the hardware interface in C or assembly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rust's type safety and ownership model for algorithm development&lt;/li&gt;
&lt;li&gt;No toolchain modifications required&lt;/li&gt;
&lt;li&gt;Works today with stable Rust&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;What it costs:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You're not actually running Rust on your target; you're using Rust as a development language and manually porting&lt;/li&gt;
&lt;li&gt;No automated pipeline; changes to the Rust code require manual re-porting&lt;/li&gt;
&lt;li&gt;You lose Rust's guarantees at the FFI boundary&lt;/li&gt;
&lt;li&gt;Testing requires maintaining parallel implementations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is really a development methodology, not a compilation strategy. It's useful for prototyping algorithms in Rust before implementing them in C for a constrained target, but it doesn't give you "Rust on Z80" in any meaningful sense.&lt;/p&gt;
&lt;h3&gt;Walkthrough: Rust → C → Z80&lt;/h3&gt;
&lt;p&gt;Let's do something concrete. We'll take a simple Rust program, transpile it to C with Eurydice, and compile the C for the Z80 with SDCC. I tested every step of this on my machine; what follows is real output, not approximations.&lt;/p&gt;
&lt;h4&gt;Prerequisites&lt;/h4&gt;
&lt;p&gt;You'll need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Nix&lt;/strong&gt; (recommended) or OCaml + OPAM for building Eurydice&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SDCC&lt;/strong&gt; for Z80 compilation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rust&lt;/strong&gt; (Eurydice pins its own nightly via Charon)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On macOS:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Install SDCC&lt;/span&gt;
brew&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;sdcc

&lt;span class="c1"&gt;# Install Nix (if you don't have it)&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;-L&lt;span class="w"&gt; &lt;/span&gt;https://nixos.org/nix/install&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sh
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nix is the path of least resistance here. Eurydice depends on specific versions of OCaml, Charon, and KaRaMeL, and the Nix flake pins all of them. You &lt;em&gt;can&lt;/em&gt; build everything manually with OPAM, but you'll be chasing version mismatches for an afternoon.&lt;/p&gt;
&lt;h4&gt;Step 1: Write a Rust Program&lt;/h4&gt;
&lt;p&gt;Create a small Rust project. The key constraint: it needs to stay within the subset Eurydice handles well. No &lt;code&gt;dyn&lt;/code&gt; traits, no complex iterators, no standard library I/O.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;cargo&lt;span class="w"&gt; &lt;/span&gt;init&lt;span class="w"&gt; &lt;/span&gt;--name&lt;span class="w"&gt; &lt;/span&gt;z80demo
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;z80demo
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Replace &lt;code&gt;src/main.rs&lt;/code&gt; with something appropriate for a Z80:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="sd"&gt;/// Simple GCD computation — the kind of algorithm&lt;/span&gt;
&lt;span class="sd"&gt;/// you'd actually want on constrained hardware.&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="sd"&gt;/// Compute LCM using GCD&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="sd"&gt;/// A lookup table — common pattern in embedded code&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;89&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;252&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;105&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is deliberately simple: &lt;code&gt;u16&lt;/code&gt; arithmetic (native to the Z80's 16-bit register pairs), no heap allocation, no traits, no closures. It's the kind of code that will transpile cleanly.&lt;/p&gt;
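&lt;p&gt;One detail of &lt;code&gt;lcm&lt;/code&gt; is worth spelling out: it divides by the GCD before multiplying, which keeps the intermediate value inside &lt;code&gt;u16&lt;/code&gt; in cases where the plain product &lt;code&gt;a * b&lt;/code&gt; would not fit. The sketch below is illustrative only. &lt;code&gt;product_overflows&lt;/code&gt; and &lt;code&gt;lcm_divide_first&lt;/code&gt; are hypothetical names that are not part of the demo crate, and &lt;code&gt;gcd&lt;/code&gt; is assumed to be the same Euclidean loop that appears in the generated C.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```rust
/// Euclidean GCD, assumed to match the loop in the demo.
pub fn gcd(mut a: u16, mut b: u16) -> u16 {
    while b != 0 {
        let t = b;
        b = a % t;
        a = t;
    }
    a
}

/// Hypothetical helper: true when `a * b` does not fit in a u16.
pub fn product_overflows(a: u16, b: u16) -> bool {
    a.checked_mul(b).is_none()
}

/// Same formula as `lcm` in the demo: divide first, then multiply.
pub fn lcm_divide_first(a: u16, b: u16) -> u16 {
    if a == 0 || b == 0 {
        return 0;
    }
    (a / gcd(a, b)) * b
}
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;a = 1000&lt;/code&gt; and &lt;code&gt;b = 1500&lt;/code&gt;, the product 1,500,000 is far outside &lt;code&gt;u16&lt;/code&gt;, while &lt;code&gt;(1000 / 500) * 1500 = 3000&lt;/code&gt; fits comfortably. The result can still overflow when the true LCM exceeds 65,535, so dividing first only removes the avoidable overflow.&lt;/p&gt;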
&lt;h4&gt;Step 2: Extract MIR with Charon&lt;/h4&gt;
&lt;p&gt;Charon hooks into the Rust compiler to extract its Medium-level Intermediate Representation (MIR). The critical detail I missed on my first attempt: Eurydice requires Charon to be invoked with &lt;code&gt;--preset=eurydice&lt;/code&gt;. Without it, Eurydice will reject the output with a cryptic error.&lt;/p&gt;
&lt;p&gt;Using Nix, you can run Charon directly without cloning or building anything:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;nix&lt;span class="w"&gt; &lt;/span&gt;--extra-experimental-features&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nix-command flakes"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'github:aeneasverif/eurydice#charon'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--&lt;span class="w"&gt; &lt;/span&gt;cargo&lt;span class="w"&gt; &lt;/span&gt;--preset&lt;span class="o"&gt;=&lt;/span&gt;eurydice
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The first run takes a while as Nix fetches and builds Charon's Rust toolchain. Subsequent runs complete in seconds:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Compiling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="mf"&gt;.1.0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Users&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;alexjokela&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;projects&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;eurydice&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;z80demo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Finished&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n n-Quoted"&gt;`dev`&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;profile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;[&lt;/span&gt;&lt;span class="n"&gt;unoptimized&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;debuginfo&lt;/span&gt;&lt;span class="err"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produces &lt;code&gt;z80demo.llbc&lt;/code&gt;, a 107 KB JSON file containing the type declarations, function bodies, and trait implementations in Charon's intermediate format.&lt;/p&gt;
&lt;p&gt;If Charon fails, the error usually points to an unsupported Rust feature. The fix is almost always to simplify the Rust code: replace iterators with explicit loops, avoid const generics, use concrete types instead of generics where possible.&lt;/p&gt;
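&lt;p&gt;As a sketch of the "replace iterators with explicit loops" advice, here are two equivalent ways to sum a 12-entry table. Both function names are hypothetical and not part of the demo. Whether a given Charon or Eurydice version accepts the iterator form is not something this sketch establishes; it only shows what the rewrite looks like.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```rust
/// Iterator-based sum: idiomatic Rust, but it relies on iterator and trait machinery.
pub fn sum_with_iterator(table: [u16; 12]) -> u16 {
    table.iter().sum()
}

/// Explicit-loop rewrite: only indexing, addition, and a counter.
pub fn sum_with_loop(table: [u16; 12]) -> u16 {
    let mut total: u16 = 0;
    let mut i: usize = 0;
    while i != table.len() {
        total += table[i];
        i += 1;
    }
    total
}
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The loop version uses nothing beyond the constructs already present in &lt;code&gt;gcd&lt;/code&gt; and &lt;code&gt;fib_lookup&lt;/code&gt;, which is the property the simplification is after.&lt;/p&gt;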
&lt;h4&gt;Step 3: Transpile to C with Eurydice&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;nix&lt;span class="w"&gt; &lt;/span&gt;--extra-experimental-features&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nix-command flakes"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'github:aeneasverif/eurydice'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--&lt;span class="w"&gt; &lt;/span&gt;z80demo.llbc
&lt;/pre&gt;&lt;/div&gt;

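&lt;p&gt;Eurydice processes the LLBC through roughly 30 optimization passes and emits two files, &lt;code&gt;z80demo.c&lt;/code&gt; and &lt;code&gt;z80demo.h&lt;/code&gt;. The console output looks like this:&lt;/p&gt;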
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;1&lt;/span&gt;&lt;span class="err"&gt;️⃣&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LLBC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;➡️&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;AST&lt;/span&gt;
&lt;span class="mf"&gt;2&lt;/span&gt;&lt;span class="err"&gt;️⃣&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Cleanup&lt;/span&gt;
&lt;span class="mf"&gt;3&lt;/span&gt;&lt;span class="err"&gt;️⃣&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Monomorphization&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;data&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;
&lt;span class="err"&gt;✅&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here's the actual generated &lt;code&gt;z80demo.c&lt;/code&gt; (comments and headers trimmed for clarity):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;"z80demo.h"&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt;
&lt;span class="n"&gt;Eurydice_arr_f5&lt;/span&gt;
&lt;span class="n"&gt;z80demo_FIBONACCI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;55U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;89U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;12U&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**&lt;/span&gt;
&lt;span class="cm"&gt; Simple GCD computation — the kind of algorithm&lt;/span&gt;
&lt;span class="cm"&gt; you'd actually want on constrained hardware.&lt;/span&gt;
&lt;span class="cm"&gt;*/&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**&lt;/span&gt;
&lt;span class="cm"&gt; Compute LCM using GCD&lt;/span&gt;
&lt;span class="cm"&gt;*/&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;uu____0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0U&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And the generated &lt;code&gt;z80demo.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;"eurydice_glue.h"&lt;/span&gt;

&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Eurydice_arr_f5_s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;12U&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Eurydice_arr_f5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Eurydice_arr_f5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_FIBONACCI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A few things to notice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rust doc comments are preserved&lt;/strong&gt; as C comments. That's a nice touch.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arrays are wrapped in structs&lt;/strong&gt; (&lt;code&gt;Eurydice_arr_f5&lt;/code&gt;). This gives C arrays value semantics: you can return and assign them, matching Rust's behavior. The tradeoff is that array access goes through &lt;code&gt;.data[n]&lt;/code&gt; instead of &lt;code&gt;[n]&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arithmetic is widened to &lt;code&gt;uint32_t&lt;/code&gt;&lt;/strong&gt;. Eurydice casts both operands of &lt;code&gt;u16 % u16&lt;/code&gt; in &lt;code&gt;gcd&lt;/code&gt;, and of the division and multiplication in &lt;code&gt;lcm&lt;/code&gt;, to &lt;code&gt;uint32_t&lt;/code&gt; to avoid C's integer promotion pitfalls. On a Z80, this means 32-bit math library calls. SDCC handles this, but it's heavier than necessary; a hand-tuned version would keep the modulo at 16 bits.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;assert_eq!&lt;/code&gt; becomes &lt;code&gt;EURYDICE_ASSERT&lt;/code&gt;&lt;/strong&gt; with a pair struct. The generated &lt;code&gt;main()&lt;/code&gt; (which I'm omitting here) creates &lt;code&gt;const_uint16_t__x2&lt;/code&gt; structs to hold the two comparison operands. It works, but it's verbose compared to a simple &lt;code&gt;==&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Control flow is preserved&lt;/strong&gt;. The &lt;code&gt;if/else&lt;/code&gt; in &lt;code&gt;fib_lookup&lt;/code&gt; and the &lt;code&gt;while&lt;/code&gt; loop in &lt;code&gt;gcd&lt;/code&gt; are structurally identical to the Rust original.&lt;/li&gt;
&lt;/ul&gt;
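&lt;p&gt;On the widening point: for unsigned operands, the remainder is the same whether it is computed at 16 or 32 bits, which is why a hand-tuned version could drop the casts. Below is a small Rust sketch of that claim. The function names are hypothetical; &lt;code&gt;rem_widened&lt;/code&gt; mirrors what the generated C does (widen, operate, narrow) and &lt;code&gt;rem_native&lt;/code&gt; is the 16-bit form. It is a spot check on a few pairs, not a proof.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```rust
/// Remainder the way the generated C computes it: widen to 32 bits, then narrow.
pub fn rem_widened(a: u16, b: u16) -> u16 {
    (u32::from(a) % u32::from(b)) as u16
}

/// Remainder kept at 16 bits, as a hand-tuned version might do.
pub fn rem_native(a: u16, b: u16) -> u16 {
    a % b
}

/// Spot-check that both forms agree on a few pairs, including range extremes.
pub fn forms_agree() -> bool {
    let samples = [(252u16, 105u16), (105, 42), (65535, 1), (65535, 65534), (1, 65535)];
    for (a, b) in samples {
        if rem_widened(a, b) != rem_native(a, b) {
            return false;
        }
    }
    true
}
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The same reasoning applies to the unsigned division in &lt;code&gt;lcm&lt;/code&gt;: the quotient of two 16-bit values always fits in 16 bits, so narrowing loses nothing.&lt;/p&gt;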
&lt;h4&gt;Step 4: Adapt for Bare-Metal Z80&lt;/h4&gt;
&lt;p&gt;Here's where things get practical. Eurydice's &lt;code&gt;eurydice_glue.h&lt;/code&gt; includes &lt;code&gt;&amp;lt;stdio.h&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;stdlib.h&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;string.h&amp;gt;&lt;/code&gt;, and KaRaMeL headers, none of which exist on a bare-metal Z80. We need a minimal replacement that provides only what the generated code actually uses.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;eurydice_glue.h&lt;/code&gt; in the project directory:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cm"&gt;/*&lt;/span&gt;
&lt;span class="cm"&gt; * Minimal eurydice_glue.h for bare-metal Z80 via SDCC.&lt;/span&gt;
&lt;span class="cm"&gt; * Replaces the full Eurydice glue header with only what z80demo needs.&lt;/span&gt;
&lt;span class="cm"&gt; */&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef EURYDICE_GLUE_H&lt;/span&gt;
&lt;span class="cp"&gt;#define EURYDICE_GLUE_H&lt;/span&gt;

&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;

&lt;span class="cm"&gt;/* SDCC Z80: size_t is 16-bit */&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef _SIZE_T_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cp"&gt;#define _SIZE_T_DEFINED&lt;/span&gt;
&lt;span class="cp"&gt;#endif&lt;/span&gt;

&lt;span class="cm"&gt;/* On bare metal, assertions just halt the CPU */&lt;/span&gt;
&lt;span class="cp"&gt;#define EURYDICE_ASSERT(test, msg)  \&lt;/span&gt;
&lt;span class="cp"&gt;  do {                              \&lt;/span&gt;
&lt;span class="cp"&gt;    if (!(test)) {                  \&lt;/span&gt;
&lt;span class="cp"&gt;      __asm                         \&lt;/span&gt;
&lt;span class="cp"&gt;        halt                        \&lt;/span&gt;
&lt;span class="cp"&gt;      __endasm;                     \&lt;/span&gt;
&lt;span class="cp"&gt;    }                               \&lt;/span&gt;
&lt;span class="cp"&gt;  } while (0)&lt;/span&gt;

&lt;span class="cp"&gt;#endif &lt;/span&gt;&lt;span class="cm"&gt;/* EURYDICE_GLUE_H */&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the key insight for using Eurydice on constrained targets: the glue header is a compatibility layer, not a fundamental dependency. For any specific program, you can replace it with a minimal shim that provides only what that program's generated code actually references.&lt;/p&gt;
&lt;p&gt;Now create &lt;code&gt;z80_main.c&lt;/code&gt;, our bare-metal wrapper with serial I/O:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;

&lt;span class="cm"&gt;/* Import Eurydice-generated functions */&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="cm"&gt;/* Z80 serial output via port 0x01 (e.g., MC6850 ACIA) */&lt;/span&gt;
&lt;span class="n"&gt;__sfr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;__at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x01&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;serial_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;serial_data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;'0'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putchar_z80&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_gcd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;252&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;105&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GCD(252,105) = "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_lcm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LCM(12,18) = "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z80demo_fib_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Fib(10) = "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print_u16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kr"&gt;__asm&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;halt&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;__endasm&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
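
&lt;p&gt;Before flashing anything, it helps to know what the serial port should print. The following is a minimal host-side Rust sketch rather than the exact &lt;code&gt;z80demo&lt;/code&gt; source: the function bodies, the 12-entry table contents, and the &lt;code&gt;expected_output&lt;/code&gt; helper are assumptions chosen to match the three calls in &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;

```rust
// Host-side reference for the values the Z80 wrapper prints.
// Assumption: these bodies mirror the demo's gcd/lcm/fib_lookup.

// Assumed table: the first 12 Fibonacci numbers, starting at 0.
const FIBONACCI: [u16; 12] = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89];

// Euclid's algorithm on 16-bit unsigned values.
pub fn gcd(mut a: u16, mut b: u16) -> u16 {
    while b != 0 {
        let t = b;
        b = a % b;
        a = t;
    }
    a
}

// Divide before multiplying to keep the intermediate value small.
pub fn lcm(a: u16, b: u16) -> u16 {
    if a == 0 || b == 0 {
        return 0;
    }
    (a / gcd(a, b)) * b
}

// Out-of-range indices return 0, like the bounds check in the C output.
pub fn fib_lookup(n: u8) -> u16 {
    FIBONACCI.get(n as usize).copied().unwrap_or(0)
}

// Text the wrapper should emit for the three calls made in z80_main.c.
pub fn expected_output() -> String {
    format!(
        "GCD(252,105) = {}\r\nLCM(12,18) = {}\r\nFib(10) = {}\r\n",
        gcd(252, 105),
        lcm(12, 18),
        fib_lookup(10)
    )
}
```

&lt;p&gt;Under these assumptions the three results are 21, 36, and 55, so a terminal attached to the Z80 should show those values if the transpiled code behaves like the Rust original.&lt;/p&gt;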

&lt;h4&gt;Step 5: Compile with SDCC&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Compile the Eurydice-generated code&lt;/span&gt;
sdcc&lt;span class="w"&gt; &lt;/span&gt;-mz80&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;--std-c11&lt;span class="w"&gt; &lt;/span&gt;-I.&lt;span class="w"&gt; &lt;/span&gt;z80demo.c

&lt;span class="c1"&gt;# Compile our Z80 wrapper&lt;/span&gt;
sdcc&lt;span class="w"&gt; &lt;/span&gt;-mz80&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;--std-c11&lt;span class="w"&gt; &lt;/span&gt;z80_main.c

&lt;span class="c1"&gt;# Link — code at 0x0000, data at 0x8000&lt;/span&gt;
sdcc&lt;span class="w"&gt; &lt;/span&gt;-mz80&lt;span class="w"&gt; &lt;/span&gt;--code-loc&lt;span class="w"&gt; &lt;/span&gt;0x0000&lt;span class="w"&gt; &lt;/span&gt;--data-loc&lt;span class="w"&gt; &lt;/span&gt;0x8000&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;     &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;z80demo.ihx&lt;span class="w"&gt; &lt;/span&gt;z80_main.rel&lt;span class="w"&gt; &lt;/span&gt;z80demo.rel

&lt;span class="c1"&gt;# Convert to raw binary&lt;/span&gt;
makebin&lt;span class="w"&gt; &lt;/span&gt;-s&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;32768&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;z80demo.ihx&lt;span class="w"&gt; &lt;/span&gt;z80demo.bin
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both compilation steps complete with zero warnings. After linking, &lt;code&gt;makebin&lt;/code&gt; converts the Intel HEX output into a 32KB ROM image. According to the memory map, the &lt;code&gt;_CODE&lt;/code&gt; segment is 717 bytes: our Rust-originated logic plus I/O wrappers and SDCC's runtime support for 32-bit division.&lt;/p&gt;
&lt;h4&gt;What the Z80 Assembly Looks Like&lt;/h4&gt;
&lt;p&gt;Here's the GCD function as SDCC compiled it, straight from the Eurydice-generated C:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;_z80demo_gcd:&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; while (b != 0U)&lt;/span&gt;
&lt;span class="err"&gt;00101&lt;/span&gt;&lt;span class="nl"&gt;$:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;d&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;or&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;e&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00103&lt;/span&gt;&lt;span class="no"&gt;$&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; b = (uint32_t)a % (uint32_t)t;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;__modsint&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; a = t;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;00101&lt;/span&gt;&lt;span class="no"&gt;$&lt;/span&gt;
&lt;span class="err"&gt;00103&lt;/span&gt;&lt;span class="nl"&gt;$:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; return a;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ex&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;SDCC's register-based calling convention (&lt;code&gt;sdcccall 1&lt;/code&gt;) passes the first 16-bit argument in &lt;code&gt;HL&lt;/code&gt; and the second in &lt;code&gt;DE&lt;/code&gt;, returning results in &lt;code&gt;DE&lt;/code&gt;. The GCD loop is tight: test for zero, call the modulo library routine, swap, repeat. The &lt;code&gt;__modsint&lt;/code&gt; call is where the &lt;code&gt;uint32_t&lt;/code&gt; widening lands; SDCC promotes to 32-bit for the modulo, which adds overhead but ensures correctness.&lt;/p&gt;
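&lt;p&gt;The effect of the widening can be checked on a host. This is a minimal Rust sketch, assuming the generated C performs the modulo on &lt;code&gt;uint32_t&lt;/code&gt; operands as the comment in the listing shows; the function names are illustrative and are not Eurydice output.&lt;/p&gt;

```rust
// Model of the widened modulo from the generated C:
//   b = (uint32_t)a % (uint32_t)t;
// Function names here are illustrative, not Eurydice output.

// 16-bit modulo, as it would be written on u16 values (assumed).
pub fn rem_narrow(a: u16, b: u16) -> u16 {
    a % b
}

// Same operation after widening both operands to 32 bits.
pub fn rem_wide(a: u16, b: u16) -> u16 {
    ((a as u32) % (b as u32)) as u16
}

// Count mismatches over a coarse sample of the 16-bit range.
pub fn count_mismatches() -> u32 {
    let mut mismatches = 0;
    let mut a: u32 = 0;
    while 65_536 > a {
        // Start at 1: the GCD loop guard never divides by zero.
        let mut b: u32 = 1;
        while 65_536 > b {
            if rem_narrow(a as u16, b as u16) != rem_wide(a as u16, b as u16) {
                mismatches += 1;
            }
            b += 127;
        }
        a += 251;
    }
    mismatches
}
```

&lt;p&gt;For unsigned 16-bit operands the widened form yields the same remainder as the narrow one, so the cost of the cast is in code size and cycles rather than in the result.&lt;/p&gt;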
&lt;p&gt;The Fibonacci lookup is even cleaner:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;_z80demo_fib_lookup:&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; if ((size_t)n &amp;lt; (size_t)12U)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;#0&lt;/span&gt;&lt;span class="no"&gt;x00&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;c&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;#0&lt;/span&gt;&lt;span class="no"&gt;x0c&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;jr&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;NC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;00102&lt;/span&gt;&lt;span class="no"&gt;$&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; return z80demo_FIBONACCI.data[(size_t)n]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;#_z80demo_FIBONACCI+0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;c&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;b&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; n * 2 (16-bit entries)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; base + offset&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;hl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;span class="err"&gt;00102&lt;/span&gt;&lt;span class="nl"&gt;$:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; return 0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ld&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;de&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;#0&lt;/span&gt;&lt;span class="no"&gt;x0000&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The bounds check compiles to a single &lt;code&gt;SUB&lt;/code&gt;/&lt;code&gt;JR NC&lt;/code&gt; pair. The array lookup uses &lt;code&gt;ADD HL,HL&lt;/code&gt; to compute the 16-bit element offset, exactly what you'd write by hand.&lt;/p&gt;
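&lt;p&gt;The same address arithmetic can be modelled on a host. This is a minimal Rust sketch, assuming the table holds the first 12 Fibonacci numbers stored low byte first; &lt;code&gt;table_bytes&lt;/code&gt; and &lt;code&gt;fib_lookup_model&lt;/code&gt; are hypothetical names used only for illustration.&lt;/p&gt;

```rust
// Model of the table lookup in the Z80 listing.
// Assumption: the table holds the first 12 Fibonacci numbers.
const FIBONACCI: [u16; 12] = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89];

// Lay the table out as bytes, low byte first, as it sits in Z80 memory.
pub fn table_bytes() -> [u8; 24] {
    let mut bytes = [0u8; 24];
    let mut i = 0;
    while i != FIBONACCI.len() {
        let pair = FIBONACCI[i].to_le_bytes();
        bytes[2 * i] = pair[0];
        bytes[2 * i + 1] = pair[1];
        i += 1;
    }
    bytes
}

pub fn fib_lookup_model(n: u8) -> u16 {
    // SUB 0x0c / JR NC: any index of 12 or more takes the "return 0" path.
    if n >= 12 {
        return 0;
    }
    let mem = table_bytes();
    // ADD HL,HL: two bytes per entry. Indexing into `mem` plays the
    // role of ADD HL,DE (base + offset).
    let offset = (n as usize) * 2;
    let lo = mem[offset]; // LD E,(HL)
    let hi = mem[offset + 1]; // INC HL, then LD D,(HL)
    u16::from_le_bytes([lo, hi])
}
```

&lt;p&gt;Calling &lt;code&gt;fib_lookup_model(10)&lt;/code&gt; reads bytes 20 and 21 of the table and returns 55; an index of 12 or more returns 0 without touching memory.&lt;/p&gt;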
&lt;h4&gt;What Just Happened&lt;/h4&gt;
&lt;p&gt;We took Rust source code, ran two commands (Charon, then Eurydice), got readable C, wrote a 25-line glue header, and compiled for the Z80 with SDCC. Total code size: 717 bytes. No LLVM fork. No TableGen. No hours or days of debugging register allocation.&lt;/p&gt;
&lt;p&gt;The entire Eurydice pipeline (from Rust to C) preserves the structure of the original code. The SDCC step is standard Z80 C compilation, unchanged from what you'd do with hand-written C. The main adaptation work is replacing the glue header, which took about five minutes once I understood what the generated code actually referenced.&lt;/p&gt;
&lt;h3&gt;Comparing the Three Paths&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;LLVM Backend&lt;/th&gt;
&lt;th&gt;Eurydice → C&lt;/th&gt;
&lt;th&gt;Manual FFI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rust coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full &lt;code&gt;no_std&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Subset (no &lt;code&gt;dyn&lt;/code&gt;, limited generics)&lt;/td&gt;
&lt;td&gt;None (development aid only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native optimized&lt;/td&gt;
&lt;td&gt;Depends on C compiler&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Track LLVM upstream&lt;/td&gt;
&lt;td&gt;Track Eurydice + Charon&lt;/td&gt;
&lt;td&gt;Manual sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Automation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full pipeline&lt;/td&gt;
&lt;td&gt;Full pipeline&lt;/td&gt;
&lt;td&gt;Manual porting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLVM expertise&lt;/td&gt;
&lt;td&gt;Nix or OCaml&lt;/td&gt;
&lt;td&gt;Basic C/Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target reuse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All LLVM frontends&lt;/td&gt;
&lt;td&gt;C-only output&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The right choice depends on your timeline and ambitions. If you're building a serious toolchain for a custom CPU (something you'll maintain for years), the LLVM backend is worth the investment. If you need Rust on a platform that already has a C compiler and you're working with a constrained subset of the language, Eurydice is a compelling shortcut.&lt;/p&gt;
&lt;h3&gt;The Elephant in the Room&lt;/h3&gt;
&lt;p&gt;Eurydice works best for small, self-contained programs that avoid complex Rust features. Its primary limitation is Charon, the MIR extractor, which is "routinely foiled by more recent Rust features" according to the &lt;a href="https://baud.rs/LqCiem"&gt;LWN article&lt;/a&gt; that prompted this exploration. Const generics, complex trait bounds, and advanced pattern matching can all cause extraction failures.&lt;/p&gt;
&lt;p&gt;For embedded and retro targets, this might actually be fine. The Rust code you'd write for a Z80 (&lt;code&gt;no_std&lt;/code&gt;, no allocator, fixed-size buffers, simple arithmetic) is exactly the subset that Eurydice handles well. You're not going to &lt;code&gt;impl Iterator&lt;/code&gt; your way through 64KB of address space.&lt;/p&gt;
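&lt;p&gt;For a sense of that subset, here is a minimal sketch of the style: a fixed-size buffer, an index loop, and wrapping 8-bit arithmetic. The &lt;code&gt;checksum&lt;/code&gt; function and &lt;code&gt;BUF_LEN&lt;/code&gt; constant are hypothetical, and I have not run this exact code through Charon or Eurydice, so treat it as an illustration of the shape rather than a tested input.&lt;/p&gt;

```rust
// Illustrative only: not verified against Charon or Eurydice.
// No allocator, no traits, no iterators: a fixed-size buffer and
// wrapping 8-bit arithmetic.

pub const BUF_LEN: usize = 8;

// Sum all bytes modulo 256.
pub fn checksum(buf: [u8; BUF_LEN]) -> u8 {
    let mut sum: u8 = 0;
    let mut i: usize = 0;
    while i != BUF_LEN {
        sum = sum.wrapping_add(buf[i]);
        i += 1;
    }
    sum
}
```

&lt;p&gt;Nothing here needs the standard library, so the same function body would sit unchanged in a &lt;code&gt;no_std&lt;/code&gt; crate.&lt;/p&gt;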
&lt;p&gt;But if your Rust code is complex enough to genuinely benefit from Rust's type system (generics, trait objects, complex lifetime management), you've probably outgrown what Eurydice can transpile. At that point, you need an &lt;a href="https://baud.rs/Jy0EBX"&gt;LLVM backend&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Eurydice team is actively working on expanding coverage. Dynamic dispatch via vtables is the next major feature. Broader standard library support is an ambitious goal for 2026. The project is dual-licensed under Apache 2.0 and MIT, and accepts outside contributions.&lt;/p&gt;
&lt;h3&gt;Where This Leaves Us&lt;/h3&gt;
&lt;p&gt;For my own projects, the LLVM backends for Z80 and Sampo remain the right choice; they support the full &lt;code&gt;no_std&lt;/code&gt; Rust language and produce optimized native code. But if someone asked me "how do I get started running Rust on my &lt;a href="https://baud.rs/build-z80-ciarcia"&gt;retro hardware&lt;/a&gt; &lt;em&gt;this weekend&lt;/em&gt;," I'd point them at Eurydice and SDCC. The barrier to entry dropped from "understand GlobalISel" to "install Nix and run two commands."&lt;/p&gt;
&lt;p&gt;That's genuine progress. The path from Rust to weird hardware just got shorter.&lt;/p&gt;</description><category>compilers</category><category>eurydice</category><category>llvm</category><category>retrocomputing</category><category>rust</category><category>sampo</category><category>sdcc</category><category>transpiler</category><category>z80</category><guid>https://tinycomputers.io/posts/three-paths-to-rust-on-custom-hardware.html</guid><pubDate>Fri, 06 Feb 2026 18:00:00 GMT</pubDate></item><item><title>Part 3: Building an LLVM Backend for Sampo - Rust Runs on a Custom 16-bit RISC CPU</title><link>https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/sampo-llvm-backend-rust-compiler.mp3" type="audio/mpeg"&gt;
Your browser does not support the audio element.
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;14:58 · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Part 1&lt;/a&gt;, we designed the Sampo 16-bit RISC architecture from scratch. In &lt;a href="https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html"&gt;Part 2&lt;/a&gt;, we brought it to life on an FPGA (sort of). Now, in Part 3, we tackle arguably the most ambitious goal of the project: &lt;strong&gt;making Rust compile for Sampo&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This isn't just about having a working assembler and emulator. It's about integrating a custom CPU architecture into one of the most sophisticated compiler infrastructures in existence (&lt;a href="https://baud.rs/ZLCbHI"&gt;LLVM&lt;/a&gt;) and then building Rust's standard library for a 16-bit target that has never existed before.&lt;/p&gt;
&lt;p&gt;The result? A complete toolchain where you can write:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#![no_std]&lt;/span&gt;
&lt;span class="cp"&gt;#![no_main]&lt;/span&gt;

&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"C"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cp"&gt;#[no_mangle]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"C"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;unsafe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;b'H'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;b'i'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;b'!'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;loop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And it compiles to native Sampo assembly that runs on our emulator:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Sampo Emulator - Loaded 310 bytes
Starting execution at 0x0100

Hi!

CPU halted at 0x0122
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This article documents the journey: the architecture of an LLVM backend, the challenges of targeting a 16-bit architecture with modern compiler infrastructure, and how AI-assisted development with Claude Code made this ambitious project achievable.&lt;/p&gt;
&lt;h3&gt;Why LLVM?&lt;/h3&gt;
&lt;p&gt;Before diving into implementation details, it's worth asking: why LLVM at all? We already have a working assembler (&lt;code&gt;sasm&lt;/code&gt;) written in Rust. Why not just write a simple C compiler that targets that assembler directly?&lt;/p&gt;
&lt;p&gt;The answer is leverage. LLVM is used by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/zotdzv"&gt;Rust&lt;/a&gt;&lt;/strong&gt; (via &lt;code&gt;rustc&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/Zb46XW"&gt;Clang&lt;/a&gt;&lt;/strong&gt; (C/C++/Objective-C)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/GMibYa"&gt;Swift&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/Whdc21"&gt;Julia&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/UlR4Sx"&gt;Zig&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;And dozens of other languages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By implementing a single LLVM backend, Sampo gains access to &lt;em&gt;all&lt;/em&gt; of these languages. More importantly, we get decades of optimization research (constant folding, dead code elimination, loop unrolling, register allocation) for free. A hand-written C compiler would take years to reach the same quality.&lt;/p&gt;
&lt;p&gt;The tradeoff is complexity. LLVM is a massive codebase (~30 million lines of C++) with a steep learning curve. But with modern AI-assisted development tools, that complexity becomes manageable.&lt;/p&gt;
&lt;h3&gt;Prior Art: LLVM on the Z80&lt;/h3&gt;
&lt;p&gt;This isn't our first attempt at bringing LLVM to unconventional hardware. Before Sampo, we tackled an even more constrained target: the &lt;a href="https://tinycomputers.io/posts/rust-on-z80-an-llvm-backend-odyssey.html"&gt;Zilog Z80&lt;/a&gt;, an 8-bit processor from 1976.&lt;/p&gt;
&lt;p&gt;The Z80 project was, in many ways, a proving ground. We learned:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GlobalISel is the right choice for new backends.&lt;/strong&gt; The older SelectionDAG framework is battle-tested but harder to debug. GlobalISel's modular design made iterative development practical.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Type legalization is where 90% of the work lives.&lt;/strong&gt; An 8-bit processor running code written with 64-bit assumptions requires extensive transformation rules.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI-assisted development actually works for compilers.&lt;/strong&gt; The Z80 backend was our first serious test of using Claude Code for systems programming. The collaboration model we developed there (human direction, AI implementation, iterative refinement) carried directly into Sampo.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Z80 experience also revealed the limits of targeting truly minimal hardware. With only 64KB of address space, no hardware multiply, and registers measured in single bytes, many Rust abstractions simply couldn't fit. The &lt;a href="https://tinycomputers.io/posts/rust-on-z80-an-llvm-backend-odyssey.html"&gt;full write-up&lt;/a&gt; documents both the successes and the fundamental constraints we hit.&lt;/p&gt;
&lt;p&gt;Sampo, as a 16-bit architecture with hardware multiply/divide and a cleaner register file, sidesteps many of those limitations. The Z80 taught us &lt;em&gt;how&lt;/em&gt; to build LLVM backends; Sampo let us build one that actually works well.&lt;/p&gt;
&lt;h3&gt;The Role of Claude Code&lt;/h3&gt;
&lt;p&gt;This project would not have been feasible without extensive use of &lt;a href="https://baud.rs/iO989C"&gt;Claude Code&lt;/a&gt;, Anthropic's AI-powered coding assistant. I want to be explicit about this: implementing an LLVM backend is traditionally a multi-month effort requiring deep expertise in compiler internals. With Claude Code, the core implementation was completed in intensive sessions over a few days.&lt;/p&gt;
&lt;p&gt;Here's how Claude Code contributed:&lt;/p&gt;
&lt;h4&gt;1. Scaffolding the Backend Structure&lt;/h4&gt;
&lt;p&gt;LLVM backends follow a specific structure with dozens of interrelated files: &lt;code&gt;SampoTargetMachine.cpp&lt;/code&gt;, &lt;code&gt;SampoInstrInfo.td&lt;/code&gt;, &lt;code&gt;SampoRegisterInfo.td&lt;/code&gt;, &lt;code&gt;SampoCallingConv.td&lt;/code&gt;, and many more. Claude Code generated the initial scaffolding based on patterns from existing backends (RISC-V, MSP430, AVR), then systematically customized each file for Sampo's specific requirements.&lt;/p&gt;
&lt;h4&gt;2. Debugging Cryptic LLVM Errors&lt;/h4&gt;
&lt;p&gt;LLVM's error messages can be... opaque. Messages like "unable to legalize instruction: G_TRUNC s12 = G_TRUNC s32" or "SmallVector capacity overflow" don't immediately point to solutions. Claude Code could analyze stack traces, cross-reference them with LLVM's source code, and identify the root causes, often obscure interactions between type legalization rules.&lt;/p&gt;
&lt;h4&gt;3. Iterative Refinement&lt;/h4&gt;
&lt;p&gt;The development process was highly iterative. We'd attempt to compile a test case, hit an error, fix it, and discover the next issue. Claude Code maintained context across hundreds of these iterations, remembering what had been tried, what worked, and what the current state of each file was.&lt;/p&gt;
&lt;h4&gt;4. Understanding LLVM Internals&lt;/h4&gt;
&lt;p&gt;LLVM has two instruction selection frameworks: SelectionDAG (legacy) and GlobalISel (newer, recommended for new backends). Claude Code explained the tradeoffs, recommended GlobalISel for Sampo, and then implemented the required components: &lt;code&gt;SampoLegalizerInfo&lt;/code&gt;, &lt;code&gt;SampoRegisterBankInfo&lt;/code&gt;, and &lt;code&gt;SampoInstructionSelector&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This isn't to diminish the human element; architectural decisions, design philosophy, and validation all required human judgment. But the mechanical work of writing hundreds of lines of boilerplate C++, TableGen definitions, and CMake configurations was dramatically accelerated.&lt;/p&gt;
&lt;h3&gt;LLVM Backend Architecture&lt;/h3&gt;
&lt;p&gt;An LLVM backend transforms LLVM Intermediate Representation (IR) into target-specific machine code. For Sampo, this involves several stages:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Rust Source Code
       ↓
   rustc frontend
       ↓
    LLVM IR
       ↓
  Instruction Selection (GlobalISel)
       ↓
  Register Allocation
       ↓
  Prologue/Epilogue Insertion
       ↓
  MC Layer (Machine Code)
       ↓
  Sampo Assembly (.s file)
       ↓
  sasm Assembler
       ↓
  Binary (.bin file)
       ↓
  semu Emulator
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's examine the key components we implemented.&lt;/p&gt;
&lt;h4&gt;File Structure&lt;/h4&gt;
&lt;p&gt;A complete LLVM backend typically requires a few dozen files. Here's the structure for Sampo:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CMakeLists&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;td&lt;/span&gt;&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Top&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TableGen&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoAsmPrinter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Assembly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoCallingConv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;td&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Calling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;convention&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoFrameLowering&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Stack&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;handling&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoFrameLowering&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoInstrFormats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;td&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Instruction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoInstrInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Instruction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;utilities&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoInstrInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoInstrInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;td&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Instruction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;definitions&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoISelLowering&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;DAG&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lowering&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoISelLowering&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoMCInstLower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MachineInstr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MCInst&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoMCInstLower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoRegisterInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Register&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;handling&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoRegisterInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoRegisterInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;td&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Register&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;definitions&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoSubtarget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoSubtarget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoTargetMachine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Entry&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoTargetMachine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GISel&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoCallLowering&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GlobalISel&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoCallLowering&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoInstructionSelector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoLegalizerInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;legalization&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoLegalizerInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoRegisterBankInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoRegisterBankInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MCTargetDesc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoAsmBackend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoELFObjectWriter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoInstPrinter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Assembly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;printing&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoMCAsmInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoMCCodeEmitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoMCTargetDesc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;
&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TargetInfo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoTargetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;registration&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each file has a specific role. The TableGen files (&lt;code&gt;.td&lt;/code&gt;) are processed at build time to generate C++ code for instruction encoding, assembly printing, and more. The &lt;code&gt;GISel/&lt;/code&gt; directory contains GlobalISel-specific components; this is where most of the interesting logic lives.&lt;/p&gt;
&lt;h4&gt;Target Description (TableGen)&lt;/h4&gt;
&lt;p&gt;LLVM uses &lt;a href="https://baud.rs/k04R4l"&gt;TableGen&lt;/a&gt;, a domain-specific language, to describe target architectures declaratively. For Sampo, we defined:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Registers&lt;/strong&gt; (&lt;code&gt;SampoRegisterInfo.td&lt;/code&gt;):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R0&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoReg&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s"&gt;"R0"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c c-SingleLine"&gt;// Zero register&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R1&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoReg&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s"&gt;"R1"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c c-SingleLine"&gt;// Return address&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R2&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SampoReg&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s"&gt;"R2"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c c-SingleLine"&gt;// Stack pointer&lt;/span&gt;
&lt;span class="c c-SingleLine"&gt;// ... R3-R15&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RegisterClass&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="s"&gt;"Sampo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i16&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"R%u"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&amp;gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Instructions&lt;/strong&gt; (&lt;code&gt;SampoInstrInfo.td&lt;/code&gt;):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FormatR&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mh"&gt;0x0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rd&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ins&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rs1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rs2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="s"&gt;"ADD&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s"&gt;$rd, $rs1, $rs2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rs1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rs2&lt;/span&gt;&lt;span class="p"&gt;))]&amp;gt;;&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FormatXNoRs&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mh"&gt;0x8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rd&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ins&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;imm16&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$imm&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;                      &lt;/span&gt;&lt;span class="s"&gt;"LIX&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s"&gt;$rd, $imm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                      &lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$rd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;imm16&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$imm&lt;/span&gt;&lt;span class="p"&gt;)]&amp;gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Calling Convention&lt;/strong&gt; (&lt;code&gt;SampoCallingConv.td&lt;/code&gt;):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CC_Sampo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CallingConv&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;[&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c c-SingleLine"&gt;// First 4 arguments in R4-R7&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;CCIfType&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;[&lt;/span&gt;&lt;span class="n"&gt;i16&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CCAssignToReg&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;[&lt;/span&gt;&lt;span class="n"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;]&amp;gt;&amp;gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c c-SingleLine"&gt;// Additional arguments on stack&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;CCIfType&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;[&lt;/span&gt;&lt;span class="n"&gt;i16&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CCAssignToStack&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;]&amp;gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These declarative definitions generate thousands of lines of C++ code automatically.&lt;/p&gt;
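&lt;p&gt;To make the calling-convention rules concrete, here is a minimal Rust sketch of how they assign locations to a run of &lt;code&gt;i16&lt;/code&gt; arguments. It models only the two rules shown in &lt;code&gt;CC_Sampo&lt;/code&gt;. The names &lt;code&gt;ArgLoc&lt;/code&gt;, &lt;code&gt;ARG_REGS&lt;/code&gt;, &lt;code&gt;SLOT_BYTES&lt;/code&gt;, and &lt;code&gt;arg_location&lt;/code&gt; are hypothetical and are not part of LLVM or the Sampo backend, and wider types (which would presumably be split into 16-bit pieces first) are not covered.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;// Hypothetical model of the CC_Sampo rules shown above; the names ArgLoc,
// ARG_REGS, SLOT_BYTES and arg_location are illustrative, not LLVM APIs.

/// Where a 16-bit argument ends up.
#[derive(Debug, PartialEq)]
enum ArgLoc {
    /// Passed in general-purpose register R{n}.
    Reg(u8),
    /// Passed on the stack at this byte offset from the first stack slot.
    Stack(usize),
}

/// Registers used for the first four i16 arguments (R4-R7).
const ARG_REGS: [u8; 4] = [4, 5, 6, 7];

/// Each stack slot is 2 bytes, 2-byte aligned (CCAssignToStack with 2, 2).
const SLOT_BYTES: usize = 2;

/// Location of the `index`-th i16 argument (0-based).
fn arg_location(index: usize) -> ArgLoc {
    match ARG_REGS.get(index) {
        // Rule 1: the first four arguments go in R4-R7.
        Some(reg) => ArgLoc::Reg(*reg),
        // Rule 2: everything after that gets consecutive 2-byte stack slots.
        None => ArgLoc::Stack((index - ARG_REGS.len()) * SLOT_BYTES),
    }
}

fn main() {
    // A call with six i16 arguments: four in registers, two on the stack.
    for i in 0..6 {
        println!("arg{} -> {:?}", i, arg_location(i));
    }
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For a six-argument call, this places the first four arguments in R4 through R7 and the remaining two at stack offsets 0 and 2.&lt;/p&gt;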
&lt;h4&gt;GlobalISel: The Modern Instruction Selector&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://baud.rs/69RpUC"&gt;GlobalISel&lt;/a&gt; is LLVM's newer instruction selection framework, designed to be more modular and easier to target than the legacy SelectionDAG approach. It works in phases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;IRTranslator&lt;/strong&gt;: Converts LLVM IR to Generic Machine IR (GMIR)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Legalizer&lt;/strong&gt;: Transforms illegal operations into legal ones&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RegBankSelect&lt;/strong&gt;: Assigns operands to register banks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;InstructionSelect&lt;/strong&gt;: Maps GMIR to target instructions&lt;/li&gt;
&lt;/ol&gt;
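&lt;p&gt;As an illustration of what step 2 can mean on a narrow machine, the following Rust sketch adds two 32-bit values using only 16-bit operations, which is the kind of rewrite a legalizer performs when it narrows a wide scalar. This is a conceptual model under stated assumptions, not the backend's code: &lt;code&gt;add32_via_16&lt;/code&gt; is a hypothetical name, and the exact instruction sequence Sampo uses to propagate the carry is not specified here.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;// Hypothetical sketch of narrowing: a 32-bit add expressed with 16-bit
// operations only. add32_via_16 is an illustrative name, not an LLVM API.

/// Adds two 32-bit values, each given as (low, high) 16-bit halves.
/// The result wraps on overflow, like a fixed-width hardware add.
fn add32_via_16(a: (u16, u16), b: (u16, u16)) -> (u16, u16) {
    // Low halves: a 16-bit add that also reports the carry-out.
    let (lo, carry) = a.0.overflowing_add(b.0);
    // High halves: a 16-bit add plus the carry from the low half.
    let hi = a.1.wrapping_add(b.1).wrapping_add(carry as u16);
    (lo, hi)
}

fn main() {
    // 0x0001_FFFF + 0x0000_0001 = 0x0002_0000
    let sum = add32_via_16((0xFFFF, 0x0001), (0x0001, 0x0000));
    println!("high = {:#06x}, low = {:#06x}", sum.1, sum.0);
    assert_eq!(sum, (0x0000, 0x0002));
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The same splitting idea extends to &lt;code&gt;i64&lt;/code&gt; and &lt;code&gt;i128&lt;/code&gt; by chaining the carry through more 16-bit pieces.&lt;/p&gt;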
&lt;p&gt;For a 16-bit architecture like Sampo, the &lt;strong&gt;Legalizer&lt;/strong&gt; is where most of the complexity lives. LLVM IR freely uses types like &lt;code&gt;i32&lt;/code&gt;, &lt;code&gt;i64&lt;/code&gt;, and even &lt;code&gt;i128&lt;/code&gt;, but Sampo's ALU operates only on 16-bit values. The legalizer must transform these wider operations into ones the hardware supports:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// In SampoLegalizerInfo.cpp&lt;/span&gt;
&lt;span class="n"&gt;getActionDefinitionsBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G_ADD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legalFor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;s16&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="c1"&gt;// i16 add is native&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clampScalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Clamp to 16-bit&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;widenScalarToNextPow2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// Widen smaller types&lt;/span&gt;

&lt;span class="n"&gt;getActionDefinitionsBuilder&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;G_SDIV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;G_UDIV&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legalFor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;s16&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;libcallFor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;s32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s64&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// Use libcalls for larger types&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clampScalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This tells LLVM: "16-bit addition is a single instruction. 32-bit addition needs to be broken into multiple 16-bit operations. 32-bit and 64-bit division should call a library function."&lt;/p&gt;
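&lt;p&gt;As a rough mental model of the &lt;code&gt;G_ADD&lt;/code&gt; rules, the decision can be sketched in Rust as a function of the operand width. This is only an illustration: &lt;code&gt;AddAction&lt;/code&gt; and &lt;code&gt;add_action&lt;/code&gt; are hypothetical names, not LLVM types, and the sketch assumes a non-zero width and ignores how LLVM sequences the intermediate steps.&lt;/p&gt;

```rust
/// What the legalizer is asked to do with an integer add of a given width.
/// Hypothetical illustration only; this is not an LLVM type.
#[derive(Debug, PartialEq)]
enum AddAction {
    Legal,      // exactly 16 bits: maps to one native ADD
    WidenTo16,  // narrower than 16 bits: extend to 16 first
    NarrowTo16, // wider than 16 bits: split into 16-bit pieces
}

/// Models the effect of legalFor({s16}) plus clampScalar(0, s16, s16).
/// Assumes `bits` is at least 1.
fn add_action(bits: u32) -> AddAction {
    match bits {
        16 => AddAction::Legal,
        1..=15 => AddAction::WidenTo16,
        _ => AddAction::NarrowTo16,
    }
}
```

&lt;p&gt;Under this model, an &lt;code&gt;i8&lt;/code&gt; add is widened, an &lt;code&gt;i16&lt;/code&gt; add is left alone, and &lt;code&gt;i32&lt;/code&gt; or &lt;code&gt;i64&lt;/code&gt; adds are narrowed.&lt;/p&gt;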
&lt;h4&gt;Debugging the Legalizer: A Case Study&lt;/h4&gt;
&lt;p&gt;One particularly memorable debugging session illustrates the challenges of LLVM development. When first attempting to compile Rust's &lt;code&gt;libcore&lt;/code&gt;, the compiler crashed with:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Assertion failed: (idx &amp;lt; size()), function operator[], file SmallVector.h, line 301
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This cryptic error (an out-of-bounds index into a SmallVector deep in LLVM's internals) gave no indication of what was wrong. The stack trace pointed to &lt;code&gt;SampoInstPrinter::printOperand&lt;/code&gt;, which prints assembly operands.&lt;/p&gt;
&lt;p&gt;Working with Claude Code, we traced the issue through multiple layers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The crash occurred when printing a &lt;code&gt;JALR&lt;/code&gt; (indirect call) instruction&lt;/li&gt;
&lt;li&gt;&lt;code&gt;JALR&lt;/code&gt; is defined in TableGen as &lt;code&gt;JALR $rd, $rs1&lt;/code&gt; (two operands)&lt;/li&gt;
&lt;li&gt;Our call lowering code was only providing one operand (the target register)&lt;/li&gt;
&lt;li&gt;The printer tried to access operand index 1, which didn't exist&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The fix was a single-line change, adding the return address destination register:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// Before (broken):&lt;/span&gt;
&lt;span class="n"&gt;MIRBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buildInstr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;JALR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addReg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Callee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getReg&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// After (fixed):&lt;/span&gt;
&lt;span class="n"&gt;MIRBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buildInstr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;JALR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addDef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;R1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Return address destination&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addReg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Callee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getReg&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This pattern repeated throughout development: an opaque error, careful tracing through LLVM's layers, and ultimately a small fix. Without Claude Code's ability to quickly navigate LLVM's massive codebase and maintain context across debugging sessions, each of these issues could have taken days to resolve.&lt;/p&gt;
&lt;h4&gt;The 16-bit Challenge: Type Legalization&lt;/h4&gt;
&lt;p&gt;The most significant technical challenge was handling non-16-bit types. Consider what happens when Rust code uses a &lt;code&gt;u32&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x12345678&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Sampo has no 32-bit registers. LLVM must:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Split the 32-bit value across two 16-bit registers (for example, R4:R5)&lt;/li&gt;
&lt;li&gt;Implement addition with carry propagation&lt;/li&gt;
&lt;li&gt;Track both halves through register allocation&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The legalizer handles this through "narrowing" actions:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;getActionDefinitionsBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G_ADD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legalFor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;s16&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;narrowScalarFor&lt;/span&gt;&lt;span class="p"&gt;({{&lt;/span&gt;&lt;span class="n"&gt;s32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;s16&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Narrow s32 to s16 pairs&lt;/span&gt;
&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LegalityQuery&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;make_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LLT&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;scalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We also encountered issues with unusual type sizes. LLVM's intermediate stages sometimes create types like &lt;code&gt;s12&lt;/code&gt; or &lt;code&gt;s24&lt;/code&gt; (12-bit and 24-bit integers). These aren't power-of-two sizes, which caused crashes in the type legalization framework:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;LLVM ERROR: unable to legalize instruction: %1:_(s12) = G_TRUNC %0:_(s32)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The fix required careful specification of widening rules:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;getActionDefinitionsBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G_TRUNC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;widenScalarIf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LegalityQuery&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Types&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;getSizeInBits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;isPowerOf2_32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Size&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Non-power-of-2?&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LegalityQuery&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Types&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;getSizeInBits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;NewSize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PowerOf2Ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Size&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;make_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LLT&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;scalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NewSize&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legalIf&lt;/span&gt;&lt;span class="p"&gt;([](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LegalityQuery&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Types&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;getSizeInBits&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;
&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Types&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;getSizeInBits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This tells LLVM: "If you see a non-power-of-2 type, round it up to the next power of 2 first, then proceed with normal legalization."&lt;/p&gt;
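&lt;p&gt;The arithmetic behind that rule is small enough to sketch in Rust. The function name &lt;code&gt;widened_bits&lt;/code&gt; is hypothetical; it mirrors the predicate (&lt;code&gt;isPowerOf2_32&lt;/code&gt;) and the mutation (&lt;code&gt;PowerOf2Ceil&lt;/code&gt;) from the C++ above using the equivalent Rust standard library methods, and assumes a non-zero width.&lt;/p&gt;

```rust
/// Returns the bit width a scalar ends up with after the widening rule.
/// Hypothetical helper for illustration; assumes `bits` is at least 1.
fn widened_bits(bits: u32) -> u32 {
    if bits.is_power_of_two() {
        // Already a power of two: the predicate does not fire.
        bits
    } else {
        // Round up to the next power of two (12 becomes 16, 24 becomes 32).
        bits.next_power_of_two()
    }
}
```

&lt;p&gt;So an &lt;code&gt;s12&lt;/code&gt; is treated as an &lt;code&gt;s16&lt;/code&gt; and an &lt;code&gt;s24&lt;/code&gt; as an &lt;code&gt;s32&lt;/code&gt;, after which the ordinary 16-bit rules apply.&lt;/p&gt;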
&lt;h4&gt;Multi-Word Arithmetic&lt;/h4&gt;
&lt;p&gt;When Rust code uses 32-bit or 64-bit integers, Sampo must synthesize these operations from 16-bit primitives. Consider a simple 32-bit addition:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x12340000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;u32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x00005678&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// 0x12345678&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This compiles to a sequence that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Adds the low 16-bit halves&lt;/li&gt;
&lt;li&gt;Adds the high 16-bit halves with carry propagation&lt;/li&gt;
&lt;li&gt;Manages results across register pairs&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The generated assembly looks like:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;; R4:R5 = first operand (low:high)&lt;/span&gt;
&lt;span class="c1"&gt;; R6:R7 = second operand (low:high)&lt;/span&gt;
&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;R8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;; Add low halves&lt;/span&gt;
&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="c1"&gt;; Prepare carry&lt;/span&gt;
&lt;span class="c1"&gt;; (carry detection logic)&lt;/span&gt;
&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;R10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R7&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;; Add high halves&lt;/span&gt;
&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;R10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R9&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; Add carry&lt;/span&gt;
&lt;span class="c1"&gt;; Result in R8:R10&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;LLVM's legalizer generates this multi-instruction sequence automatically through "narrowing" rules. We didn't write this expansion manually; we just told LLVM that 32-bit operations should be narrowed to 16-bit pairs.&lt;/p&gt;
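&lt;p&gt;To make the carry step concrete, here is a minimal Rust sketch of what the narrowed sequence computes. The helper names (&lt;code&gt;split&lt;/code&gt;, &lt;code&gt;join&lt;/code&gt;, &lt;code&gt;add32_via_u16&lt;/code&gt;) are hypothetical, and the sketch models the arithmetic only, not the instructions LLVM actually selects for Sampo.&lt;/p&gt;

```rust
/// Splits a 32-bit value into (low, high) 16-bit halves.
fn split(x: u32) -> (u16, u16) {
    ((x % 0x1_0000) as u16, (x / 0x1_0000) as u16)
}

/// Rebuilds a 32-bit value from (low, high) 16-bit halves.
fn join(halves: (u16, u16)) -> u32 {
    u32::from(halves.1) * 0x1_0000 + u32::from(halves.0)
}

/// Adds two 32-bit values using only 16-bit operations.
/// Each operand and the result are (low, high) pairs.
fn add32_via_u16(a: (u16, u16), b: (u16, u16)) -> (u16, u16) {
    // Add the low halves; `carry` is true if the sum wrapped past 0xFFFF.
    let (lo, carry) = a.0.overflowing_add(b.0);
    // Add the high halves, then fold in the carry from the low half.
    let hi = a.1.wrapping_add(b.1).wrapping_add(u16::from(carry));
    (lo, hi)
}
```

&lt;p&gt;For the example above, &lt;code&gt;join(add32_via_u16(split(0x12340000), split(0x00005678)))&lt;/code&gt; gives &lt;code&gt;0x12345678&lt;/code&gt;. A case such as &lt;code&gt;0x0000FFFF + 1&lt;/code&gt; exercises the carry path and gives &lt;code&gt;0x00010000&lt;/code&gt;.&lt;/p&gt;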
&lt;h4&gt;Function Calling Convention&lt;/h4&gt;
&lt;p&gt;Getting function calls right was crucial. Sampo uses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;R4-R7&lt;/strong&gt;: First four arguments (caller-saved)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R1&lt;/strong&gt;: Return address&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R2&lt;/strong&gt;: Stack pointer&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R8-R11&lt;/strong&gt;: Temporaries (caller-saved)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R12-R15&lt;/strong&gt;: Saved registers (callee-saved)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;SampoCallLowering.cpp&lt;/code&gt; file implements this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;SampoCallLowering::lowerCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MachineIRBuilder&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;MIRBuilder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                                   &lt;/span&gt;&lt;span class="n"&gt;CallLoweringInfo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Copy arguments to their designated registers&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MCPhysReg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ArgRegs&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                                       &lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OrigArgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;MIRBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buildCopy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ArgRegs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OrigArgs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;Regs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Spill to stack&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Build the call instruction&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Callee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isReg&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// Indirect call: JALR R1, Rs  (save return addr to R1, jump to Rs)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;MIRBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buildInstr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;JALR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addDef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;R1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addReg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Callee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getReg&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// Direct call: JALX symbol&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;MIRBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buildInstr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;JALX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Callee&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

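&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Mark caller-saved registers as clobbered&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// ... implicit defs for R4-R11&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Report that the call was lowered&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;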
&lt;/pre&gt;&lt;/div&gt;
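&lt;p&gt;The argument placement rule itself can be sketched in a few lines of Rust. &lt;code&gt;ArgLoc&lt;/code&gt; and &lt;code&gt;arg_location&lt;/code&gt; are hypothetical names. The sketch assumes every argument is a single 16-bit value, and that stack arguments occupy consecutive 2-byte slots starting at offset 0 of the outgoing-argument area, which is how the &lt;code&gt;CCAssignToStack&lt;/code&gt; entry in the TableGen definition reads.&lt;/p&gt;

```rust
/// Where a 16-bit argument is passed. Hypothetical illustration only.
#[derive(Debug, PartialEq)]
enum ArgLoc {
    Reg(u8),    // register number, e.g. 4 for R4
    Stack(u16), // byte offset within the outgoing-argument area
}

/// Maps a zero-based argument index to its location:
/// the first four go in R4 through R7, the rest in 2-byte stack slots.
fn arg_location(index: usize) -> ArgLoc {
    match index {
        0..=3 => ArgLoc::Reg(4 + index as u8),
        _ => ArgLoc::Stack(((index - 4) * 2) as u16),
    }
}
```

&lt;p&gt;With these assumptions, the fifth argument lands at stack offset 0 and the sixth at offset 2. Wider arguments would first be split into 16-bit pieces, which this sketch does not model.&lt;/p&gt;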

&lt;p&gt;One subtle bug took hours to track down: the &lt;code&gt;JALR&lt;/code&gt; instruction (indirect call) expects two operands, the destination register for the return address (R1) and the source register containing the jump target. Initially, we only provided one operand, causing a crash deep in the assembly printer when it tried to access the non-existent second operand. The only diagnostic was the SmallVector bounds assertion described earlier, which is not exactly illuminating without context.&lt;/p&gt;
&lt;h4&gt;The Assembly Printer Layer&lt;/h4&gt;
&lt;p&gt;The final stage of code generation converts LLVM's internal machine instructions to textual assembly. This involves two components:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MCInstLower&lt;/strong&gt; converts MachineInstr (high-level) to MCInst (low-level):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;SampoMCInstLower::Lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MachineInstr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MCInst&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;OutMI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;OutMI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setOpcode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MI&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;getOpcode&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MachineOperand&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;MO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MI&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;operands&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;MCOperand&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MCOp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LowerOperand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MO&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MCOp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isValid&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Skip implicit operands&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;OutMI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;addOperand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MCOp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;InstPrinter&lt;/strong&gt; converts MCInst to assembly text:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;SampoInstPrinter::printOperand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MCInst&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OpNo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                                    &lt;/span&gt;&lt;span class="n"&gt;raw_ostream&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MCOperand&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MI&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;getOperand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OpNo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isReg&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;printRegName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getReg&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isImm&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getImm&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isExpr&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;MAI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;printExpr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getExpr&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;TableGen generates most of the printer code automatically from instruction definitions. The pattern &lt;code&gt;"ADD\t$rd, $rs1, $rs2"&lt;/code&gt; in the TableGen file directly produces the assembly format.&lt;/p&gt;
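&lt;p&gt;As a rough illustration of what the generated printer does with that pattern, the sketch below substitutes operands into the same assembly string. It is a toy model in Rust, not the C++ that TableGen emits; &lt;code&gt;reg_name&lt;/code&gt; and &lt;code&gt;print_add&lt;/code&gt; are hypothetical helpers, and the &lt;code&gt;R&lt;/code&gt;-prefixed register spelling is assumed from the assembly listings elsewhere in this post.&lt;/p&gt;

```rust
/// Formats a register number the way the assembly listings spell it (R0, R1, ...).
/// Hypothetical helper, for illustration only.
fn reg_name(index: u8) -> String {
    format!("R{}", index)
}

/// Expands the ADD asm string by replacing each $operand placeholder.
/// The template is the same string used in the instruction definition.
fn print_add(rd: u8, rs1: u8, rs2: u8) -> String {
    "ADD\t$rd, $rs1, $rs2"
        .replace("$rd", reg_name(rd).as_str())
        .replace("$rs1", reg_name(rs1).as_str())
        .replace("$rs2", reg_name(rs2).as_str())
}
```

&lt;p&gt;Calling &lt;code&gt;print_add(1, 2, 3)&lt;/code&gt; yields &lt;code&gt;ADD&lt;/code&gt;, a tab, then &lt;code&gt;R1, R2, R3&lt;/code&gt;.&lt;/p&gt;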
&lt;h3&gt;Building Rust's Standard Library&lt;/h3&gt;
&lt;p&gt;With the LLVM backend working, the next step was teaching Rust about Sampo. This required:&lt;/p&gt;
&lt;h4&gt;1. Adding the Target Triple&lt;/h4&gt;
&lt;p&gt;In Rust's &lt;code&gt;rustc_target&lt;/code&gt; crate, we added &lt;code&gt;sampo-unknown-none&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// compiler/rustc_target/src/spec/targets/sampo_unknown_none.rs&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;crate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;target&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;data_layout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"e-m:e-p:16:16-i8:8-i16:16-i32:16-n16-S16"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;llvm_target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"sampo-unknown-none"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;pointer_width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;arch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Arch&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Sampo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;TargetOptions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;panic_strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;PanicStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Abort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;atomic_cas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;max_atomic_width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;c_int_width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nb"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;data_layout&lt;/code&gt; string is critical: it tells LLVM the pointer width (16 bits), the alignment of each integer type, and the native integer width. Getting it wrong causes subtle miscompilations.&lt;/p&gt;
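&lt;p&gt;To make the string less opaque, here is a sketch that spells out each component as a plain Rust struct and then uses one of them to compute a field offset. &lt;code&gt;LayoutSummary&lt;/code&gt;, &lt;code&gt;sampo_layout&lt;/code&gt;, &lt;code&gt;align_up&lt;/code&gt;, and &lt;code&gt;offset_after_u8&lt;/code&gt; are hypothetical names for illustration only; neither rustc nor LLVM exposes the layout this way.&lt;/p&gt;

```rust
/// Components of "e-m:e-p:16:16-i8:8-i16:16-i32:16-n16-S16", one field per piece.
/// Hypothetical type, for illustration only.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct LayoutSummary {
    little_endian: bool,     // "e"
    elf_mangling: bool,      // "m:e"
    pointer_bits: u32,       // "p:16:16", size
    pointer_align_bits: u32, // "p:16:16", ABI alignment
    i8_align_bits: u32,      // "i8:8"
    i16_align_bits: u32,     // "i16:16"
    i32_align_bits: u32,     // "i32:16", 32-bit values need only 16-bit alignment
    native_int_bits: u32,    // "n16"
    stack_align_bits: u32,   // "S16"
}

/// The values encoded by the Sampo data layout string.
fn sampo_layout() -> LayoutSummary {
    LayoutSummary {
        little_endian: true,
        elf_mangling: true,
        pointer_bits: 16,
        pointer_align_bits: 16,
        i8_align_bits: 8,
        i16_align_bits: 16,
        i32_align_bits: 16,
        native_int_bits: 16,
        stack_align_bits: 16,
    }
}

/// Rounds `offset` (in bytes) up to the next multiple of `align_bytes`.
fn align_up(offset: u32, align_bytes: u32) -> u32 {
    (offset + align_bytes - 1) / align_bytes * align_bytes
}

/// Byte offset of a 32-bit integer field placed right after a single u8 field.
fn offset_after_u8(i32_align_bits: u32) -> u32 {
    align_up(1, i32_align_bits / 8)
}
```

&lt;p&gt;With the Sampo layout, &lt;code&gt;offset_after_u8(sampo_layout().i32_align_bits)&lt;/code&gt; is 2; with an &lt;code&gt;i32:32&lt;/code&gt; entry it would be 4. A wrong alignment entry shifts field offsets like this, which is one way a bad layout string can turn into a subtle miscompilation.&lt;/p&gt;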
&lt;h4&gt;2. Registering the Target in Rust&lt;/h4&gt;
&lt;p&gt;Rust's build system needs to know about new targets in multiple places:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;// compiler/rustc_target/src/spec/mod.rs&lt;/span&gt;
&lt;span class="n"&gt;supported_targets&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... existing targets ...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sampo-unknown-none"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sampo_unknown_none&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// compiler/rustc_span/src/symbol.rs&lt;/span&gt;
&lt;span class="n"&gt;Symbols&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... existing symbols ...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;sampo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;Arch&lt;/code&gt; enum in &lt;code&gt;rustc_target&lt;/code&gt; also needed a new variant. These changes propagate through Rust's bootstrap system, eventually producing a compiler that recognizes &lt;code&gt;--target sampo-unknown-none&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;3. Building Core Libraries&lt;/h4&gt;
&lt;p&gt;Rust's &lt;code&gt;#![no_std]&lt;/code&gt; programs still need &lt;code&gt;libcore&lt;/code&gt; (the dependency-free foundation) and &lt;code&gt;compiler_builtins&lt;/code&gt; (intrinsics for operations the hardware doesn't support natively). Building these required:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Point Rust at our custom LLVM&lt;/span&gt;
&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;LLVM_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/llvm-sampo/build/bin/llvm-config

&lt;span class="c1"&gt;# Build stage 1 compiler&lt;/span&gt;
./x.py&lt;span class="w"&gt; &lt;/span&gt;build&lt;span class="w"&gt; &lt;/span&gt;--stage&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;

&lt;span class="c1"&gt;# Build libraries for Sampo&lt;/span&gt;
./x.py&lt;span class="w"&gt; &lt;/span&gt;build&lt;span class="w"&gt; &lt;/span&gt;--stage&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;library&lt;span class="w"&gt; &lt;/span&gt;--target&lt;span class="w"&gt; &lt;/span&gt;sampo-unknown-none
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This compiles approximately 50,000 lines of Rust into Sampo assembly, a significant stress test of the backend. The resulting libraries:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;libcore&lt;/code&gt;: 1.1 MB (Rust's core library)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;liballoc&lt;/code&gt;: 211 KB (heap allocation)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;libcompiler_builtins&lt;/code&gt;: 2.3 MB (soft-float, 64-bit arithmetic, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;4. Handling Missing Features&lt;/h4&gt;
&lt;p&gt;A 16-bit CPU without atomic operations or floating-point hardware needs careful configuration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;atomic_cas: false&lt;/code&gt;: No compare-and-swap&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_atomic_width: Some(0)&lt;/code&gt;: No atomic operations at all&lt;/li&gt;
&lt;li&gt;&lt;code&gt;panic_strategy: PanicStrategy::Abort&lt;/code&gt;: No unwinding&lt;/li&gt;
&lt;/ul&gt;
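&lt;p&gt;Rust handles these gracefully. The atomic types in &lt;code&gt;core::sync::atomic&lt;/code&gt; are only defined for widths the target supports, so code that requires atomics simply won't compile for Sampo, with clear error messages.&lt;/p&gt;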
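&lt;p&gt;The mechanism behind this is conditional compilation on the target's atomic support. As a sketch (the &lt;code&gt;counter&lt;/code&gt; module and &lt;code&gt;tick&lt;/code&gt; function are hypothetical, and the fallback assumes a single core with no interrupt handler touching the counter), a crate can provide both variants and let &lt;code&gt;cfg(target_has_atomic)&lt;/code&gt; pick one:&lt;/p&gt;

```rust
// Compiled only when the target has pointer-width atomics with compare-and-swap.
#[cfg(target_has_atomic = "ptr")]
mod counter {
    use core::sync::atomic::{AtomicUsize, Ordering};

    static TICKS: AtomicUsize = AtomicUsize::new(0);

    /// Increments the shared counter and returns the new value.
    pub fn tick() -> usize {
        TICKS.fetch_add(1, Ordering::Relaxed).wrapping_add(1)
    }
}

// Compiled instead on targets without them, such as a spec with
// `atomic_cas: false` and `max_atomic_width: Some(0)`.
#[cfg(not(target_has_atomic = "ptr"))]
mod counter {
    static mut TICKS: usize = 0;

    /// Increments the counter and returns the new value.
    pub fn tick() -> usize {
        // SAFETY (assumption): one core, and no interrupt handler calls `tick`.
        unsafe {
            TICKS = TICKS.wrapping_add(1);
            TICKS
        }
    }
}
```

&lt;p&gt;Callers use &lt;code&gt;counter::tick()&lt;/code&gt; either way; only the variant that matches the target is ever compiled.&lt;/p&gt;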
&lt;h3&gt;The Complete Pipeline&lt;/h3&gt;
&lt;p&gt;Let's trace through what happens when compiling our "Hi!" program:&lt;/p&gt;
&lt;h4&gt;Stage 1: Rust to LLVM IR&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;b'H'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Becomes:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="vg"&gt;@putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;zeroext&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;72&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
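&lt;p&gt;The &lt;code&gt;zeroext&lt;/code&gt; attribute says the 8-bit argument is widened with its upper bits cleared. A small sketch of that widening into a 16-bit register value, with sign extension alongside for contrast (both function names are hypothetical):&lt;/p&gt;

```rust
/// Widens an unsigned byte to a 16-bit register value; the upper byte is zero.
fn zeroext_to_reg(arg: u8) -> u16 {
    u16::from(arg)
}

/// Widens a signed byte to a 16-bit register value; the sign bit fills the upper byte.
fn signext_to_reg(arg: i8) -> u16 {
    arg as i16 as u16
}
```

&lt;p&gt;&lt;code&gt;zeroext_to_reg(b'H')&lt;/code&gt; is 72 (0x0048), the constant that appears in the following stages, while &lt;code&gt;signext_to_reg(-1)&lt;/code&gt; is 0xFFFF.&lt;/p&gt;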

&lt;h4&gt;Stage 2: LLVM IR to Generic Machine IR&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;%0:gpr = G_CONSTANT i16 72&lt;/span&gt;
$&lt;span class="n"&gt;r4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;COPY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;%0&lt;/span&gt;
&lt;span class="n"&gt;JALX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;@putc,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;implicit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;$r4,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;implicit-def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;$r1,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Stage 3: Instruction Selection&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;%0:gpr = LIX 72&lt;/span&gt;
$&lt;span class="n"&gt;r4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;COPY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;%0&lt;/span&gt;
&lt;span class="n"&gt;JALX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;@putc,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Stage 4: Register Allocation&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;r4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;72&lt;/span&gt;
&lt;span class="n"&gt;JALX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;@putc&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Stage 5: Assembly Output&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;72&lt;/span&gt;
&lt;span class="nf"&gt;JALX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;putc&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Stage 6: Binary&lt;/h4&gt;
&lt;p&gt;Our &lt;code&gt;sasm&lt;/code&gt; assembler produces the final binary, which runs on &lt;code&gt;semu&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;The Development Process: Iterating with AI&lt;/h3&gt;
&lt;p&gt;Traditional compiler development follows a deliberate pace: study the codebase for weeks, implement a small feature, spend days debugging, repeat. With Claude Code, this cycle compressed dramatically.&lt;/p&gt;
&lt;p&gt;A typical session looked like:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Describe the goal&lt;/strong&gt;: "I need to implement call lowering for indirect function calls"&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Receive implementation&lt;/strong&gt;: Claude Code generates &lt;code&gt;SampoCallLowering.cpp&lt;/code&gt; with appropriate patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test&lt;/strong&gt;: Compile a test case, observe failure&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Debug together&lt;/strong&gt;: Share the error, get analysis and fixes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Iterate&lt;/strong&gt;: Sometimes 10-20 cycles for a single feature&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The key insight is that Claude Code doesn't just generate code; it explains &lt;em&gt;why&lt;/em&gt; that code is correct (or incorrect). When the call lowering crashed, Claude Code walked through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How MachineInstrs represent instructions&lt;/li&gt;
&lt;li&gt;The difference between explicit and implicit operands&lt;/li&gt;
&lt;li&gt;Why the TableGen definition expected two operands&lt;/li&gt;
&lt;li&gt;What the MCInstLower layer does with each operand type&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This contextual understanding accelerates learning far beyond copy-paste coding.&lt;/p&gt;
&lt;h4&gt;Code Quality Considerations&lt;/h4&gt;
&lt;p&gt;AI-generated code requires the same scrutiny as human-written code. During this project, we found:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Things Claude Code did well:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Boilerplate that follows established patterns&lt;/li&gt;
&lt;li&gt;TableGen definitions (highly formulaic)&lt;/li&gt;
&lt;li&gt;Explaining LLVM concepts and architecture&lt;/li&gt;
&lt;li&gt;Debugging from error messages and stack traces&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Things requiring human judgment:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Architectural decisions (GlobalISel vs SelectionDAG)&lt;/li&gt;
&lt;li&gt;Performance tradeoffs in instruction selection&lt;/li&gt;
&lt;li&gt;Edge cases in type legalization&lt;/li&gt;
&lt;li&gt;Testing strategy and coverage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The final codebase reflects this collaboration. Claude Code generated perhaps 80% of the initial code, but human review and iteration refined it into something production-quality.&lt;/p&gt;
&lt;h3&gt;Lessons Learned&lt;/h3&gt;
&lt;h4&gt;1. Start with GlobalISel&lt;/h4&gt;
&lt;p&gt;For new backends, GlobalISel is significantly easier to work with than SelectionDAG. The modular design means you can implement and test each phase independently.&lt;/p&gt;
&lt;h4&gt;2. Type Legalization is the Hard Part&lt;/h4&gt;
&lt;p&gt;For non-standard word sizes (16-bit, 8-bit), most complexity lives in the legalizer. Plan to spend 60%+ of your effort here.&lt;/p&gt;
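&lt;p&gt;A concrete case: a 32-bit add has no single instruction on a 16-bit machine, so it has to be narrowed into 16-bit pieces. The sketch below shows the arithmetic such a narrowing has to preserve, written as ordinary Rust rather than as LLVM legalizer rules; &lt;code&gt;split&lt;/code&gt;, &lt;code&gt;join&lt;/code&gt;, and &lt;code&gt;add32_via_16&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```rust
/// Splits a 32-bit value into (low, high) 16-bit halves.
fn split(value: u32) -> (u16, u16) {
    (value as u16, (value >> 16) as u16)
}

/// Joins (low, high) 16-bit halves back into a 32-bit value.
fn join(lo: u16, hi: u16) -> u32 {
    u32::from(hi) * 0x1_0000 + u32::from(lo)
}

/// Adds two 32-bit values using only 16-bit additions:
/// low halves first, then high halves plus the carry out of the low add.
fn add32_via_16(a: u32, b: u32) -> u32 {
    let (a_lo, a_hi) = split(a);
    let (b_lo, b_hi) = split(b);
    let (lo, carry) = a_lo.overflowing_add(b_lo);
    let hi = a_hi.wrapping_add(b_hi).wrapping_add(u16::from(carry));
    join(lo, hi)
}
```

&lt;p&gt;For example, &lt;code&gt;add32_via_16(0x0000_FFFF, 1)&lt;/code&gt; carries into the high half and returns &lt;code&gt;0x0001_0000&lt;/code&gt;. Every wider operation (multiply, shift, compare) needs a similar decomposition, which is where the effort goes.&lt;/p&gt;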
&lt;h4&gt;3. Test Early and Often&lt;/h4&gt;
&lt;p&gt;We maintained a suite of LLVM IR test files that exercised specific features:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;; test_call.ll - Function calling&lt;/span&gt;
&lt;span class="k"&gt;define&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="vg"&gt;@_start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="vg"&gt;@putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;72&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c"&gt;; 'H'&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;ret&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each bug fix was validated against this suite before proceeding.&lt;/p&gt;
&lt;h4&gt;4. AI-Assisted Development Changes Everything&lt;/h4&gt;
&lt;p&gt;Traditional LLVM backend development requires months of ramp-up time just to understand the codebase. Claude Code's ability to explain concepts, generate boilerplate, and debug issues compressed this dramatically. The key is knowing what questions to ask and validating the outputs.&lt;/p&gt;
&lt;h4&gt;5. LLVM's Abstractions Are Worth It&lt;/h4&gt;
&lt;p&gt;Despite the complexity, LLVM's abstractions pay dividends. Register allocation, instruction scheduling, and numerous optimizations come for free. A hand-written code generator would take years to match this quality.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;With Rust compiling for Sampo, several exciting possibilities open up:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Operating System Development&lt;/strong&gt;: Sampo now has enough tooling to write a simple operating system. A minimal kernel with task switching, memory management, and device drivers becomes feasible. Rust's ownership model could make this a particularly safe OS, even on a minimal 16-bit platform.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Language Ports&lt;/strong&gt;: Since we implemented an LLVM backend (not just Rust support), Clang should work with minimal additional effort. C and C++ for Sampo would enable porting existing retrocomputing software. Imagine &lt;a href="https://baud.rs/3YiduS"&gt;CP/M&lt;/a&gt; utilities or classic games recompiled for modern Sampo hardware.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hardware Verification&lt;/strong&gt;: Running Rust-generated code on the FPGA implementation will provide end-to-end validation of both the hardware and software toolchains. Any discrepancy between the emulator and hardware would become immediately visible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Educational Materials&lt;/strong&gt;: A complete, working compiler toolchain for a simple CPU is valuable for teaching. Students can trace code from high-level Rust through every compilation stage to final execution. The relative simplicity of a 16-bit architecture makes the concepts accessible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Performance Optimization&lt;/strong&gt;: The current backend generates correct code, but there's room for improvement. Instruction scheduling, better register allocation hints, and peephole optimizations could improve code density and speed.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Building an LLVM backend for a custom CPU is one of those projects that sounds impossible until you're in the middle of it, then sounds impossible again when you hit your third cryptic linker error at 2 AM. But it's achievable, especially with modern AI-assisted development tools.&lt;/p&gt;
&lt;p&gt;The Sampo project now spans:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Architecture design&lt;/strong&gt;: A clean 16-bit RISC with Z80-inspired features&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hardware implementation&lt;/strong&gt;: Verilog RTL running on an ECP5 FPGA (need to order hardware first!)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Assembler and emulator&lt;/strong&gt;: Written in Rust, fully functional&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLVM backend&lt;/strong&gt;: Complete GlobalISel-based code generator&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rust support&lt;/strong&gt;: &lt;code&gt;libcore&lt;/code&gt;, &lt;code&gt;liballoc&lt;/code&gt;, and &lt;code&gt;compiler_builtins&lt;/code&gt; for &lt;code&gt;sampo-unknown-none&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In Finnish mythology, the &lt;a href="https://baud.rs/GbaMVL"&gt;Sampo&lt;/a&gt; was a magical mill that produced endless riches. Our Sampo is more modest; it just produces machine code. But there's something magical about typing &lt;code&gt;cargo build --target sampo-unknown-none&lt;/code&gt; and watching a high-level language compile down to instructions for a CPU that didn't exist a few months ago.&lt;/p&gt;
&lt;p&gt;The complete source code is available on GitHub:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/GCQDRa"&gt;llvm-sampo&lt;/a&gt;&lt;/strong&gt; - The LLVM backend and Rust target specification&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/r74wA8"&gt;sampo&lt;/a&gt;&lt;/strong&gt; - CPU architecture, assembler, emulator, and FPGA RTL&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether you're interested in compiler development, CPU design, or just want to see how deep the rabbit hole goes, I hope this series has been illuminating.&lt;/p&gt;
&lt;h3&gt;Recommended Books&lt;/h3&gt;
&lt;p&gt;If you're interested in learning more about LLVM, Rust, or computer architecture, these books are excellent resources:&lt;/p&gt;
&lt;h4&gt;LLVM &amp;amp; Compiler Development&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/cpTnhU"&gt;Learn LLVM 17&lt;/a&gt; by Kai Nacke - Comprehensive guide to LLVM internals, including backend development&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/N610Db"&gt;LLVM Techniques, Tips, and Best Practices&lt;/a&gt; by Min-Yih Hsu - Practical patterns for working with LLVM&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/YtLncr"&gt;LLVM Code Generation&lt;/a&gt; - Focused coverage of code generation, instruction selection, and register allocation&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Rust Programming&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/kAPJDa"&gt;&lt;em&gt;The Rust Programming Language, 3rd Edition&lt;/em&gt;&lt;/a&gt; by Steve Klabnik &amp;amp; Carol Nichols - The definitive Rust guide, updated for 2024&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/R1fDfb"&gt;&lt;em&gt;Programming Rust, 2nd Edition&lt;/em&gt;&lt;/a&gt; by Jim Blandy et al. - Deep dive into Rust's systems programming capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Computer Architecture&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/Y0TnVh"&gt;Computer Architecture: A Quantitative Approach&lt;/a&gt; by Hennessy &amp;amp; Patterson - The classic text on CPU design&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/UUtBki"&gt;Digital Design and Computer Architecture&lt;/a&gt; by Harris &amp;amp; Harris - From gates to processors, excellent for CPU design&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/pbDcC6"&gt;The RISC-V Reader&lt;/a&gt; - Modern RISC architecture principles (many Sampo design decisions were informed by RISC-V)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Source Code&lt;/h3&gt;
&lt;p&gt;All code is available under open source licenses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/GCQDRa"&gt;github.com/ajokela/llvm-sampo&lt;/a&gt;&lt;/strong&gt; - LLVM backend (Apache 2.0 + LLVM Exception)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://baud.rs/r74wA8"&gt;github.com/ajokela/sampo&lt;/a&gt;&lt;/strong&gt; - Assembler, emulator, FPGA RTL&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This project wouldn't have been possible without the LLVM community's extensive documentation and the examples provided by existing backends. The &lt;a href="https://baud.rs/s53YsX"&gt;MSP430&lt;/a&gt;, &lt;a href="https://baud.rs/ASLcbC"&gt;AVR&lt;/a&gt;, and &lt;a href="https://baud.rs/1SpI9N"&gt;RISC-V&lt;/a&gt; backends were particularly useful references for handling small word sizes.&lt;/p&gt;
&lt;p&gt;Claude Code, developed by Anthropic, was instrumental in navigating LLVM's complexity. While AI-assisted development is sometimes viewed skeptically, this project demonstrates its potential for tackling genuinely difficult engineering challenges. The key is treating AI as a collaborator rather than a replacement; it accelerates the mechanical aspects while humans provide direction and judgment.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This is Part 3 of the Sampo series. &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Part 1&lt;/a&gt; covers the architecture design, and &lt;a href="https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html"&gt;Part 2&lt;/a&gt; covers the FPGA implementation.&lt;/em&gt;&lt;/p&gt;</description><category>ai-assisted development</category><category>claude code</category><category>code generation</category><category>compiler</category><category>globalisel</category><category>llvm</category><category>retrocomputing</category><category>risc</category><category>rust</category><category>sampo</category><guid>https://tinycomputers.io/posts/sampo-llvm-backend-rust-compiler.html</guid><pubDate>Wed, 04 Feb 2026 16:00:00 GMT</pubDate></item><item><title>Part 2: Implementing Sampo on the ULX3S FPGA</title><link>https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;p&gt;After designing the &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Sampo RISC architecture&lt;/a&gt; on paper (complete with a working assembler and emulator) it's time to bring it to life in silicon. Or at least, in programmable logic. This post documents the hardware selection and implementation planning for synthesizing Sampo on an FPGA.&lt;/p&gt;
&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/sampo-fpga-implementation-ulx3s_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;7 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;The Story So Far&lt;/h3&gt;
&lt;p&gt;If you haven't read &lt;a href="https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html"&gt;Part 1 of this series&lt;/a&gt;, here's the quick version: Sampo is a 16-bit RISC CPU designed to bridge the gap between clean RISC design principles and Z80-friendly features. It has 16 general-purpose registers, ~66 instructions, port-based I/O, block operations (LDIR, LDDR), alternate registers for fast interrupt handling, and hardware multiply/divide.&lt;/p&gt;
&lt;p&gt;The project already includes working tools written in Rust:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;sasm&lt;/strong&gt; - A full assembler&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;semu&lt;/strong&gt; - An emulator with TUI debugger (step, breakpoints, memory inspection)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And for hardware implementation, we now have two complete RTL implementations:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amaranth HDL&lt;/strong&gt; (&lt;code&gt;/rtl/&lt;/code&gt;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cpu.py&lt;/code&gt;, &lt;code&gt;alu.py&lt;/code&gt;, &lt;code&gt;decode.py&lt;/code&gt;, &lt;code&gt;regfile.py&lt;/code&gt;, &lt;code&gt;soc.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Python-based, excellent for rapid iteration&lt;/li&gt;
&lt;li&gt;Generates Verilog for synthesis&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;AI-Assisted Hand-Written Verilog&lt;/strong&gt; (&lt;code&gt;/verilog/rtl/&lt;/code&gt;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cpu.v&lt;/code&gt;, &lt;code&gt;alu.v&lt;/code&gt;, &lt;code&gt;decode.v&lt;/code&gt;, &lt;code&gt;regfile.v&lt;/code&gt;, &lt;code&gt;shifter.v&lt;/code&gt;, &lt;code&gt;uart.v&lt;/code&gt;, &lt;code&gt;ram.v&lt;/code&gt;, &lt;code&gt;soc.v&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Readable, portable, works with any toolchain&lt;/li&gt;
&lt;li&gt;Includes testbenches for Icarus Verilog and Verilator&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now it's time to synthesize it to real hardware.&lt;/p&gt;
&lt;h3&gt;Choosing an FPGA Platform&lt;/h3&gt;
&lt;p&gt;The FPGA world is split between proprietary toolchains (Xilinx Vivado, Intel Quartus) and the growing open source ecosystem. For a project like Sampo, where understanding every layer of the stack matters, open source tooling is the clear choice.&lt;/p&gt;
&lt;h4&gt;Open Source FPGA Options&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;FPGA Family&lt;/th&gt;
&lt;th&gt;Capacity&lt;/th&gt;
&lt;th&gt;Toolchain&lt;/th&gt;
&lt;th&gt;Maturity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gowin GW1N/GW2A&lt;/td&gt;
&lt;td&gt;1K-55K LUTs&lt;/td&gt;
&lt;td&gt;Project Apicula&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lattice iCE40&lt;/td&gt;
&lt;td&gt;1K-8K LUTs&lt;/td&gt;
&lt;td&gt;Project IceStorm&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lattice ECP5&lt;/td&gt;
&lt;td&gt;12K-85K LUTs&lt;/td&gt;
&lt;td&gt;Project Trellis&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Xilinx 7-series&lt;/td&gt;
&lt;td&gt;10K-200K+ LUTs&lt;/td&gt;
&lt;td&gt;Project X-Ray (partial)&lt;/td&gt;
&lt;td&gt;Experimental&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Sampo's basic CPU core is estimated at &lt;strong&gt;~1,500-2,500 LUTs&lt;/strong&gt;, so even the smaller FPGAs have more than enough capacity. But if we want room to grow (adding caches, more peripherals, maybe even multi-core experiments), a larger device makes sense.&lt;/p&gt;
&lt;h3&gt;The ULX3S Board&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://baud.rs/Ij7oaR"&gt;ULX3S&lt;/a&gt; is an open hardware development board built around the ECP5 FPGA. It's designed by &lt;a href="https://baud.rs/v9aiPd"&gt;Radiona.org&lt;/a&gt; and has become a popular board for open source FPGA development.&lt;/p&gt;
&lt;h4&gt;Specifications&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FPGA&lt;/td&gt;
&lt;td&gt;Lattice ECP5 (LFE5U-85F/45F/12F-6BG381C)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LUTs&lt;/td&gt;
&lt;td&gt;12K / 44K / 84K (depending on variant)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;USB&lt;/td&gt;
&lt;td&gt;FTDI FT231XS (500 kbit JTAG, 3 Mbit serial)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPIO&lt;/td&gt;
&lt;td&gt;56 pins (28 differential pairs), PMOD-compatible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM&lt;/td&gt;
&lt;td&gt;32 MB SDRAM @ 166 MHz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flash&lt;/td&gt;
&lt;td&gt;4-16 MB Quad-SPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;microSD slot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LEDs&lt;/td&gt;
&lt;td&gt;11 total (8 user, 2 USB, 1 WiFi)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Buttons&lt;/td&gt;
&lt;td&gt;7 (4 direction, 2 fire, 1 power)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio&lt;/td&gt;
&lt;td&gt;3.5mm jack (stereo + digital/composite)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Video&lt;/td&gt;
&lt;td&gt;GPDI (HDMI-compatible) with level shifter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Display&lt;/td&gt;
&lt;td&gt;Header for 0.96" SPI OLED (SSD1331)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wireless&lt;/td&gt;
&lt;td&gt;ESP32-WROOM-32 (WiFi/Bluetooth, standalone JTAG)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADC&lt;/td&gt;
&lt;td&gt;8 channels, 12-bit, 1 MS/s (MAX11125)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clock&lt;/td&gt;
&lt;td&gt;25 MHz onboard, differential input available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Power&lt;/td&gt;
&lt;td&gt;3 switching regulators (1.1V, 2.5V, 3.3V)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sleep&lt;/td&gt;
&lt;td&gt;5 µA standby, RTC wake-up with battery backup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dimensions&lt;/td&gt;
&lt;td&gt;94mm × 51mm&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;Why ULX3S for Sampo&lt;/h4&gt;
&lt;p&gt;The ULX3S isn't just an FPGA breakout board; it's a complete system:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;32MB SDRAM&lt;/strong&gt;: Real memory, not just block RAM. Essential for running actual programs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HDMI output&lt;/strong&gt;: Video terminal without external hardware.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;microSD slot&lt;/strong&gt;: Load programs, implement a filesystem.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ESP32 co-processor&lt;/strong&gt;: WiFi-based JTAG debugging from any device.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Buttons and LEDs&lt;/strong&gt;: Instant I/O for testing without wiring anything.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Audio output&lt;/strong&gt;: Even supports composite video through the audio jack.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Budget Alternative: Tang Nano 9K&lt;/h3&gt;
&lt;p&gt;Before committing to the ULX3S, it's worth mentioning a much cheaper option. The &lt;strong&gt;Tang Nano 9K&lt;/strong&gt; (~$15 on AliExpress) uses a Gowin GW1NR-9 FPGA, which is more than enough for Sampo:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;8,640 LUTs&lt;/li&gt;
&lt;li&gt;64Mbit (8 MB) PSRAM, enough to back the full 64KB address space many times over&lt;/li&gt;
&lt;li&gt;HDMI output for a video terminal&lt;/li&gt;
&lt;li&gt;USB-C programming&lt;/li&gt;
&lt;li&gt;Fully supported by open-source toolchain (Yosys + nextpnr-gowin)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For initial development and testing, the Tang Nano 9K is hard to beat on price. But the ULX3S offers more I/O, more RAM, and a richer peripheral set, making it the better choice for a more complete Sampo system.&lt;/p&gt;
&lt;h3&gt;LUT Budget Planning&lt;/h3&gt;
&lt;p&gt;The Sampo RTL implementation is designed to be compact. Here's the resource breakdown:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Estimated LUTs (FFs where noted)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;16 × 16-bit registers&lt;/td&gt;
&lt;td&gt;~256 FFs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ALU (16-bit)&lt;/td&gt;
&lt;td&gt;200 - 400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control logic&lt;/td&gt;
&lt;td&gt;500 - 1,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instruction decode&lt;/td&gt;
&lt;td&gt;300 - 500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sampo CPU core&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~1,500 - 2,500&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UART (115200 baud)&lt;/td&gt;
&lt;td&gt;200 - 300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SPI controller (SD card)&lt;/td&gt;
&lt;td&gt;300 - 500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPIO controller&lt;/td&gt;
&lt;td&gt;200 - 400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Basic system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2,500 - 4,000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDRAM controller&lt;/td&gt;
&lt;td&gt;500 - 1,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instruction cache&lt;/td&gt;
&lt;td&gt;1,000 - 2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data cache&lt;/td&gt;
&lt;td&gt;1,000 - 2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~6,000 - 10,000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These estimates are based on typical RISC CPU implementations. The actual numbers will depend on optimization choices and synthesis settings.&lt;/p&gt;
&lt;h4&gt;Variant Recommendations&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;12K LUTs&lt;/strong&gt; (ULX3S-12F): Plenty for basic Sampo + peripherals, tight for caches.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;44K LUTs&lt;/strong&gt; (ULX3S-45F): Comfortable. Full CPU with cache, room for experiments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;84K LUTs&lt;/strong&gt; (ULX3S-85F): Luxurious. Multi-core experiments, extensive peripherals.&lt;/li&gt;
&lt;/ul&gt;
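&lt;p&gt;To make these recommendations concrete, here is a minimal Python sketch that turns the subtotal rows of the budget table into utilization percentages per ULX3S variant. The names &lt;code&gt;BUDGET&lt;/code&gt;, &lt;code&gt;VARIANTS&lt;/code&gt;, &lt;code&gt;utilization&lt;/code&gt;, and &lt;code&gt;report&lt;/code&gt; are illustrative and not part of the Sampo repository, and the capacities are the nominal figures from the specification table rather than measured synthesis results.&lt;/p&gt;

```python
# Subtotal rows copied from the LUT budget table: (low, high) estimates.
BUDGET = {
    "Sampo CPU core": (1_500, 2_500),
    "Basic system": (2_500, 4_000),
    "Full system": (6_000, 10_000),
}

# Nominal LUT counts from the ULX3S specification table (assumed usable in full).
VARIANTS = {"12F": 12_000, "45F": 44_000, "85F": 84_000}


def utilization(luts, capacity):
    """Return LUT utilization as a percentage, rounded to one decimal."""
    return round(100 * luts / capacity, 1)


def report():
    """Build one line per (variant, configuration) with low/high utilization."""
    lines = []
    for variant, capacity in VARIANTS.items():
        for name, (low, high) in BUDGET.items():
            lines.append(
                f"{variant} {name}: {utilization(low, capacity)}%"
                f" to {utilization(high, capacity)}%"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

&lt;p&gt;At the upper estimate, the full system would occupy roughly 83% of the 12F, which is why that variant is described as tight once caches are added. Real utilization depends on synthesis settings, so treat these figures as planning numbers only.&lt;/p&gt;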
&lt;h3&gt;Toolchain Setup&lt;/h3&gt;
&lt;p&gt;The ECP5 toolchain is fully open source:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# macOS (Homebrew)&lt;/span&gt;
brew&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;yosys&lt;span class="w"&gt; &lt;/span&gt;nextpnr-ecp5&lt;span class="w"&gt; &lt;/span&gt;ecpprog&lt;span class="w"&gt; &lt;/span&gt;fujprog

&lt;span class="c1"&gt;# Ubuntu/Debian&lt;/span&gt;
apt&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;yosys&lt;span class="w"&gt; &lt;/span&gt;nextpnr-ecp5&lt;span class="w"&gt; &lt;/span&gt;ecpprog

&lt;span class="c1"&gt;# Amaranth HDL (for our existing RTL)&lt;/span&gt;
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;amaranth&lt;span class="w"&gt; &lt;/span&gt;amaranth-boards

&lt;span class="c1"&gt;# Or build FPGA tools from source for latest features&lt;/span&gt;
git&lt;span class="w"&gt; &lt;/span&gt;clone&lt;span class="w"&gt; &lt;/span&gt;https://github.com/YosysHQ/yosys
git&lt;span class="w"&gt; &lt;/span&gt;clone&lt;span class="w"&gt; &lt;/span&gt;https://github.com/YosysHQ/nextpnr
git&lt;span class="w"&gt; &lt;/span&gt;clone&lt;span class="w"&gt; &lt;/span&gt;https://github.com/YosysHQ/prjtrellis
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Tool Roles&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amaranth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python-based HDL (generates Verilog)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yosys&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Verilog synthesis (RTL → netlist)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;nextpnr-ecp5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Place and route (netlist → textual config)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Project Trellis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ECP5 bitstream documentation and tools, including &lt;code&gt;ecppack&lt;/code&gt; (textual config → bitstream)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ecpprog/fujprog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Upload bitstream to board&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;Amaranth Build Flow&lt;/h4&gt;
&lt;p&gt;For the Amaranth version of Sampo's RTL, the build flow starts with Python:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Generate Verilog from Amaranth&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rtl/
python&lt;span class="w"&gt; &lt;/span&gt;-m&lt;span class="w"&gt; &lt;/span&gt;amaranth&lt;span class="w"&gt; &lt;/span&gt;generate&lt;span class="w"&gt; &lt;/span&gt;soc.py&lt;span class="w"&gt; &lt;/span&gt;&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;sampo.v

&lt;span class="c1"&gt;# Then synthesize with standard tools&lt;/span&gt;
yosys&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"synth_ecp5 -top sampo_soc -json sampo.json"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sampo.v
nextpnr-ecp5&lt;span class="w"&gt; &lt;/span&gt;--85k&lt;span class="w"&gt; &lt;/span&gt;--package&lt;span class="w"&gt; &lt;/span&gt;CABGA381&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;--lpf&lt;span class="w"&gt; &lt;/span&gt;ulx3s.lpf&lt;span class="w"&gt; &lt;/span&gt;--json&lt;span class="w"&gt; &lt;/span&gt;sampo.json&lt;span class="w"&gt; &lt;/span&gt;--textcfg&lt;span class="w"&gt; &lt;/span&gt;sampo.config
ecppack&lt;span class="w"&gt; &lt;/span&gt;sampo.config&lt;span class="w"&gt; &lt;/span&gt;sampo.bit

&lt;span class="c1"&gt;# Program the board&lt;/span&gt;
fujprog&lt;span class="w"&gt; &lt;/span&gt;sampo.bit
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Hand-Written Verilog Implementation&lt;/h4&gt;
&lt;p&gt;In addition to the Amaranth RTL, we now have a complete AI-assisted, hand-written Verilog implementation at &lt;code&gt;/verilog/&lt;/code&gt;. While Amaranth can generate Verilog, the auto-generated output isn't particularly readable. The hand-written version is designed for clarity and portability:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;verilog&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rtl&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sampo_pkg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vh&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;# Opcodes, constants, state definitions&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;# 16-bit ALU with all operations&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shifter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;# Barrel shifter (1/4/8-bit shifts, rotates)&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;regfile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;# 16 registers + alternate set (EXX)&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;# Instruction decoder&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;# FSM-based CPU core (8 states)&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ram&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;# 64KB synchronous RAM&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uart&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;# Simple UART for serial I/O&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;soc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;# Top-level SoC integration&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tb&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alu_tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;# ALU unit tests&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;regfile_tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;# Register file tests&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sampo_tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;# Full system testbench&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;programs&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;# Test program in Verilog hex format&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Makefile&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="c1"&gt;# Build automation&lt;/span&gt;
&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bin2hex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;# Convert sasm output to Verilog $readmemh format&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
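&lt;p&gt;The &lt;code&gt;bin2hex.py&lt;/code&gt; step in the tree above converts assembler output into text that &lt;code&gt;$readmemh&lt;/code&gt; can load. The repository's script is not reproduced here; the following is only a rough Python sketch of what such a conversion could look like. It assumes the RAM is initialized as 16-bit words, one word per line, and that the binary image is little-endian. Both are assumptions, and the function name &lt;code&gt;to_readmemh&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def to_readmemh(data, little_endian=True):
    """Convert raw bytes into $readmemh text, one 16-bit word per line.

    The word width and byte order are assumptions for illustration; check
    them against the actual RAM module before relying on this.
    """
    if len(data) % 2 != 0:
        data = data + b"\x00"  # pad an odd-length image with a zero byte
    order = "little" if little_endian else "big"
    words = [
        int.from_bytes(data[i:i + 2], order)
        for i in range(0, len(data), 2)
    ]
    # Four lowercase hex digits per word, newline-terminated.
    return "\n".join(f"{word:04x}" for word in words) + "\n"


if __name__ == "__main__":
    # Three bytes in, two words out (the last one zero-padded).
    print(to_readmemh(b"\x34\x12\xff"), end="")
```

&lt;p&gt;If the memory turns out to be byte-wide or big-endian, only the slicing width and the &lt;code&gt;order&lt;/code&gt; argument would need to change.&lt;/p&gt;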

&lt;p&gt;The Verilog implementation uses an 8-state FSM for the CPU: RESET → FETCH → FETCH_EXT → DECODE → EXECUTE → MEMORY → WRITEBACK → HALTED. This makes timing predictable and debugging straightforward.&lt;/p&gt;
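&lt;p&gt;As a way to reason about that state sequence, here is a simplified Python model of the FSM. The eight state names come from the post; the transition conditions are assumptions made for illustration: that &lt;code&gt;FETCH_EXT&lt;/code&gt; is only visited for instructions with an extension word, that &lt;code&gt;MEMORY&lt;/code&gt; is only visited for loads and stores, and that a halt instruction enters &lt;code&gt;HALTED&lt;/code&gt; from &lt;code&gt;DECODE&lt;/code&gt;. The real &lt;code&gt;cpu.v&lt;/code&gt; may differ on all three points.&lt;/p&gt;

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    FETCH = auto()
    FETCH_EXT = auto()
    DECODE = auto()
    EXECUTE = auto()
    MEMORY = auto()
    WRITEBACK = auto()
    HALTED = auto()


def next_state(state, is_extended=False, uses_memory=False, is_halt=False):
    """Return the state that follows `state` in this simplified model."""
    if state is State.RESET:
        return State.FETCH
    if state is State.FETCH:
        return State.FETCH_EXT if is_extended else State.DECODE
    if state is State.FETCH_EXT:
        return State.DECODE
    if state is State.DECODE:
        return State.HALTED if is_halt else State.EXECUTE
    if state is State.EXECUTE:
        return State.MEMORY if uses_memory else State.WRITEBACK
    if state is State.MEMORY:
        return State.WRITEBACK
    if state is State.WRITEBACK:
        return State.FETCH
    return State.HALTED  # HALTED is terminal until reset


def trace(**flags):
    """List the states visited by one instruction, starting at FETCH."""
    states = [State.FETCH]
    while True:
        following = next_state(states[-1], **flags)
        if following is State.HALTED:
            return states + [State.HALTED]
        if following is State.FETCH:
            return states  # the next instruction begins here
        states.append(following)


if __name__ == "__main__":
    print([s.name for s in trace()])
    print([s.name for s in trace(is_extended=True, uses_memory=True)])
```

&lt;p&gt;Under these assumptions a simple register-to-register instruction visits four states and an extended load or store visits six, which is the kind of fixed, countable timing an FSM-based core provides.&lt;/p&gt;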
&lt;h4&gt;Simulation with Icarus Verilog&lt;/h4&gt;
&lt;p&gt;The Verilog implementation includes a complete Makefile for testing:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;verilog/

&lt;span class="c1"&gt;# Run the main simulation (hello world)&lt;/span&gt;
make&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;

&lt;span class="c1"&gt;# Run ALU unit tests&lt;/span&gt;
make&lt;span class="w"&gt; &lt;/span&gt;test-alu

&lt;span class="c1"&gt;# Run register file tests&lt;/span&gt;
make&lt;span class="w"&gt; &lt;/span&gt;test-regfile

&lt;span class="c1"&gt;# Build with Verilator (faster simulation)&lt;/span&gt;
make&lt;span class="w"&gt; &lt;/span&gt;verilate

&lt;span class="c1"&gt;# View waveforms in GTKWave&lt;/span&gt;
make&lt;span class="w"&gt; &lt;/span&gt;wave
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Sample output from &lt;code&gt;make test&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;=== Sampo CPU Testbench ===&lt;/span&gt;
&lt;span class="c"&gt;RAM init file: &lt;/span&gt;&lt;span class="nt"&gt;..&lt;/span&gt;&lt;span class="c"&gt;/programs/hello&lt;/span&gt;&lt;span class="nt"&gt;.&lt;/span&gt;&lt;span class="c"&gt;hex&lt;/span&gt;

&lt;span class="c"&gt;CPU started at PC=0x0100&lt;/span&gt;
&lt;span class="c"&gt;UART output:&lt;/span&gt;
&lt;span class="nb"&gt;----------------------------------------&lt;/span&gt;
&lt;span class="c"&gt;Hello&lt;/span&gt;&lt;span class="nt"&gt;,&lt;/span&gt;&lt;span class="c"&gt; Sampo!&lt;/span&gt;
&lt;span class="nb"&gt;----------------------------------------&lt;/span&gt;

&lt;span class="c"&gt;Simulation complete:&lt;/span&gt;
&lt;span class="c"&gt;  Final PC:    0x011E&lt;/span&gt;
&lt;span class="c"&gt;  Cycles:      847&lt;/span&gt;
&lt;span class="c"&gt;  UART chars:  14&lt;/span&gt;
&lt;span class="c"&gt;  Status:      HALTED&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Verilog version is portable to any FPGA toolchain (Xilinx, Intel, Lattice, Gowin) without requiring Amaranth or Python in the build chain.&lt;/p&gt;
&lt;h3&gt;Implementation Roadmap&lt;/h3&gt;
&lt;p&gt;With both Amaranth and Verilog implementations complete and tested in simulation, the roadmap is now about bringing them up on hardware.&lt;/p&gt;
&lt;h4&gt;Phase 1: Core Bring-up ✓ (Complete)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;✓ Instruction fetch and decode&lt;/li&gt;
&lt;li&gt;✓ ALU operations (all 16 operations)&lt;/li&gt;
&lt;li&gt;✓ Barrel shifter (1/4/8-bit shifts, rotates, RCL/RCR)&lt;/li&gt;
&lt;li&gt;✓ Register file with alternate set (EXX)&lt;/li&gt;
&lt;li&gt;✓ FSM-based CPU core (8 states)&lt;/li&gt;
&lt;li&gt;✓ RAM interface (64KB)&lt;/li&gt;
&lt;li&gt;✓ UART for serial I/O&lt;/li&gt;
&lt;li&gt;✓ SoC integration&lt;/li&gt;
&lt;li&gt;✓ Testbenches passing (ALU, regfile, full system)&lt;/li&gt;
&lt;li&gt;✓ Hello World runs in simulation&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Phase 1.5: FPGA Bring-up (Current)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;○ ULX3S pin constraints (.lpf file)&lt;/li&gt;
&lt;li&gt;○ Clock setup (PLL from 25MHz)&lt;/li&gt;
&lt;li&gt;○ Map UART to FTDI&lt;/li&gt;
&lt;li&gt;○ LED heartbeat / debug outputs&lt;/li&gt;
&lt;/ul&gt;
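&lt;p&gt;One small calculation behind the clock and UART items is the baud-rate divider. The sketch below assumes the UART uses a plain integer clock divider and is clocked directly from the 25 MHz oscillator; if a PLL output is used instead, substitute that frequency. The function name &lt;code&gt;baud_divisor&lt;/code&gt; is hypothetical and does not come from &lt;code&gt;uart.v&lt;/code&gt;.&lt;/p&gt;

```python
def baud_divisor(clock_hz, baud):
    """Return (divisor, actual_baud, error_percent) for an integer divider."""
    divisor = round(clock_hz / baud)  # nearest whole number of clocks per bit
    actual = clock_hz / divisor       # baud rate the divider really produces
    error_percent = 100 * (actual - baud) / baud
    return divisor, actual, error_percent


if __name__ == "__main__":
    divisor, actual, error = baud_divisor(25_000_000, 115_200)
    print(f"divisor={divisor} actual={actual:.1f} error={error:.4f}%")
```

&lt;p&gt;For 25 MHz and 115200 baud the divisor comes out at 217, with an error well under 0.01%, so no fractional divider is needed at this clock rate.&lt;/p&gt;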
&lt;h4&gt;Phase 2: Memory System&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;SDRAM controller for 32MB RAM&lt;/li&gt;
&lt;li&gt;Instruction cache (optional, but helps hide SDRAM access latency)&lt;/li&gt;
&lt;li&gt;Basic interrupt handling&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Phase 3: Peripherals&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;SPI controller for SD card boot&lt;/li&gt;
&lt;li&gt;GPIO controller (buttons, LEDs)&lt;/li&gt;
&lt;li&gt;Timer/counter module&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Phase 4: Advanced Features&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Data cache&lt;/li&gt;
&lt;li&gt;MMU for memory protection&lt;/li&gt;
&lt;li&gt;HDMI text console (VGA timing → GPDI)&lt;/li&gt;
&lt;li&gt;ESP32 WiFi integration for wireless debugging&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Recommended Tools &amp;amp; Books&lt;/h3&gt;
&lt;h4&gt;Hardware&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/HBq3zf"&gt;Tang Nano 9K FPGA&lt;/a&gt; - Budget-friendly FPGA board (~$25 on Amazon, ~$15 on AliExpress)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/BYIR58"&gt;USB Logic Analyzer&lt;/a&gt; - Essential for debugging signals (24MHz, 8 channels)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Books&lt;/h4&gt;
&lt;p&gt;If you're new to Verilog or FPGA development, these are excellent starting points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/RGjpAj"&gt;&lt;em&gt;Getting Started with FPGAs&lt;/em&gt;&lt;/a&gt; by Russell Merrick - Beginner-friendly with Verilog and VHDL examples&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/tEyX95"&gt;&lt;em&gt;Programming FPGAs: Getting Started with Verilog&lt;/em&gt;&lt;/a&gt; by Simon Monk - Practical hands-on guide&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/6qfzvC"&gt;&lt;em&gt;Verilog by Example&lt;/em&gt;&lt;/a&gt; by Blaine Readler - Concise reference for working engineers&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/VQxLTd"&gt;Sampo on GitHub&lt;/a&gt; - Full source including assembler, emulator, and RTL&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/JUjA8C"&gt;ULX3S GitHub&lt;/a&gt; - Schematics, examples, documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/JLKZBr"&gt;Project Trellis&lt;/a&gt; - ECP5 bitstream documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/0QCVAC"&gt;Amaranth HDL&lt;/a&gt; - Python-based hardware description&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/xlX31y"&gt;nextpnr&lt;/a&gt; - Place and route tool&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/LZdP4F"&gt;Yosys&lt;/a&gt; - Verilog synthesis&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Where to Buy&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;ULX3S:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/NClAGd"&gt;AliExpress&lt;/a&gt; - ~$100-150 depending on variant&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/AQB0Xg"&gt;Mouser&lt;/a&gt; - Official distribution&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/0gTuW6"&gt;CrowdSupply&lt;/a&gt; - Original campaign page&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tang Nano 9K (budget alternative):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/HBq3zf"&gt;Amazon&lt;/a&gt; - ~$25, faster shipping&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/9G7KR0"&gt;AliExpress&lt;/a&gt; - ~$15, slower shipping&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;Next up: Getting our first instructions executing on real hardware. Both the Amaranth and Verilog implementations are ready and tested; Hello World runs in simulation and the testbenches pass. Now it's a matter of pin constraints, clock domains, and debugging the inevitable timing issues.&lt;/p&gt;</description><category>amaranth</category><category>cpu design</category><category>ecp5</category><category>fpga</category><category>hardware</category><category>lattice</category><category>open-source</category><category>risc</category><category>sampo</category><category>ulx3s</category><category>verilog</category><guid>https://tinycomputers.io/posts/sampo-fpga-implementation-ulx3s.html</guid><pubDate>Mon, 02 Feb 2026 18:00:00 GMT</pubDate></item><item><title>Sampo: Designing a 16-bit RISC CPU from Scratch - Part 1: Theory and Architecture</title><link>https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;p&gt;In Finnish mythology, the &lt;strong&gt;Sampo&lt;/strong&gt; is a magical artifact from the epic poem &lt;a href="https://baud.rs/ZEIjqv"&gt;&lt;em&gt;Kalevala&lt;/em&gt;&lt;/a&gt;, compiled by Elias Lönnrot in 1835. According to legend, the Sampo was forged by Ilmarinen, a legendary blacksmith and sky god, from a swan's feather, a grain of barley, a ball of wool, a drop of milk, and a shaft of a distaff. The resulting creation took the form of a magical mill that could produce flour, salt, and gold endlessly, bringing riches and good fortune to its holder.&lt;/p&gt;
&lt;p&gt;The exact nature of the Sampo has been debated by scholars since 1818, with over 30 theories proposed, ranging from a world pillar to an astrolabe to a decorated shield. This mystery makes it a fitting namesake for a CPU architecture: something that transforms simple inputs into useful outputs, whose inner workings invite exploration and understanding.&lt;/p&gt;
&lt;p&gt;This is the first part of a series exploring the Sampo CPU architecture. In this article, we'll dive deep into the theory, design philosophy, and architectural decisions that shaped Sampo. In Part 2, we'll get our hands dirty with an actual FPGA implementation using Amaranth HDL, bringing this processor to life on real silicon.&lt;/p&gt;
&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/sampo-16-bit-risc-cpu-part-1_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;16 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;The Problem Space: Why Another CPU?&lt;/h3&gt;
&lt;p&gt;Before diving into Sampo's architecture, it's worth asking: why design a new CPU at all? The retrocomputing community has no shortage of classic processors to explore (the &lt;a href="https://baud.rs/Ch4htI"&gt;Z80&lt;/a&gt;, &lt;a href="https://baud.rs/CrdDFR"&gt;6502&lt;/a&gt;, 68000), and modern RISC architectures like &lt;a href="https://baud.rs/q9i1Th"&gt;RISC-V&lt;/a&gt; offer clean, well-documented designs for educational purposes.&lt;/p&gt;
&lt;p&gt;The answer lies in a specific niche that existing architectures don't quite fill. Consider the typical workloads of classic 8-bit systems: interpreters for languages like BASIC and Forth, operating systems like CP/M, text editors, and simple games. These workloads have distinct characteristics:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Heavy use of memory operations&lt;/strong&gt;: Block copies, string manipulation, memory fills&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Port-based I/O&lt;/strong&gt;: Serial terminals, disk controllers, sound chips accessed via dedicated I/O instructions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context switching&lt;/strong&gt;: Interrupt handlers that need to save and restore register state quickly&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BCD arithmetic&lt;/strong&gt;: Calculator applications, financial software&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;a href="https://baud.rs/n39HUo"&gt;Z80 excels&lt;/a&gt; at these tasks through specialized instructions (LDIR, LDDR, IN, OUT) and its alternate register set. But the Z80 is an 8-bit CISC processor with irregular encoding, complex addressing modes, and over 300 instruction variants. This makes it challenging to implement efficiently in modern hardware or to target with optimizing compilers.&lt;/p&gt;
&lt;p&gt;Modern RISC architectures like RISC-V take the opposite approach: clean, orthogonal instruction sets optimized for pipelining and compiler code generation. But they typically use memory-mapped I/O (no dedicated I/O instructions), lack block operations, and provide no alternate register sets for fast context switching.&lt;/p&gt;
&lt;p&gt;Sampo occupies the middle ground, a "Z80 programmer's RISC" that combines the regularity and simplicity of RISC design with the specialized capabilities that made the Z80 so effective for its target workloads.&lt;/p&gt;
&lt;h3&gt;Design Goals&lt;/h3&gt;
&lt;p&gt;Sampo was designed with five primary goals:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;RISC-inspired instruction set&lt;/strong&gt;: Clean, orthogonal design with predictable encoding&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;16-bit native word size&lt;/strong&gt;: Registers, ALU, and memory addressing all 16-bit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Efficient for interpreters and compilers&lt;/strong&gt;: Stack operations, indirect addressing, hardware multiply/divide&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simple to implement&lt;/strong&gt;: Suitable for &lt;a href="https://baud.rs/kTjxm9"&gt;FPGA synthesis&lt;/a&gt; or software emulation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Z80-workload compatible&lt;/strong&gt;: Port-based I/O, BCD support, block operations, alternate registers&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These goals create natural tensions. RISC purity would eliminate block operations and port-based I/O. Maximum Z80 compatibility would preserve its irregular encoding. Sampo resolves these tensions by borrowing selectively from multiple architectural traditions.&lt;/p&gt;
&lt;h3&gt;Architectural Lineage&lt;/h3&gt;
&lt;p&gt;Sampo's design draws from four distinct sources, each contributing specific elements:&lt;/p&gt;
&lt;h4&gt;From RISC-V&lt;/h4&gt;
&lt;p&gt;RISC-V's influence is most visible in Sampo's register conventions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Zero register (R0)&lt;/strong&gt;: A register that always reads as zero and ignores writes. This eliminates the need for separate "clear" or "load zero" instructions: &lt;code&gt;ADD R4, R0, R0&lt;/code&gt; clears R4, &lt;code&gt;ADD R4, R5, R0&lt;/code&gt; copies R5 to R4.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Register naming conventions&lt;/strong&gt;: Return address (RA), stack pointer (SP), global pointer (GP), argument registers (A0-A3), temporaries (T0-T3), and saved registers (S0-S3).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Load/store architecture&lt;/strong&gt;: Only load and store instructions access memory; all computation occurs between registers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;From &lt;a href="https://baud.rs/RGYV4g"&gt;MIPS&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;MIPS contributed Sampo's approach to instruction encoding:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Simple, orthogonal formats&lt;/strong&gt;: A small number of instruction formats (R, I, S, B, J) with consistent field positions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;4-bit primary opcode&lt;/strong&gt;: Sixteen instruction categories, each with function codes for variants&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PC-relative branching&lt;/strong&gt;: Branch targets specified as signed offsets from the program counter&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;From &lt;a href="https://baud.rs/p4IsMu"&gt;ARM Thumb/Thumb-2&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;ARM's Thumb instruction set inspired Sampo's hybrid encoding strategy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;16-bit base instruction width&lt;/strong&gt;: Most common operations fit in 16 bits for improved code density&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;32-bit extended forms&lt;/strong&gt;: Operations requiring larger immediates use a two-word format&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefix-based extension&lt;/strong&gt;: The 0xF opcode prefix indicates a 32-bit instruction, simplifying decode&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;From the Z80&lt;/h4&gt;
&lt;p&gt;The Z80 provides Sampo's "personality": the features that make it feel familiar to retrocomputing enthusiasts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Port-based I/O&lt;/strong&gt;: IN and OUT instructions with 8-bit port addresses, separate from the memory address space&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alternate register set&lt;/strong&gt;: The EXX instruction swaps working registers with shadow copies for fast interrupt handling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Block operations&lt;/strong&gt;: LDIR, LDDR, FILL, and CPIR for efficient memory manipulation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BCD support&lt;/strong&gt;: The DAA (Decimal Adjust Accumulator) instruction for binary-coded decimal arithmetic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;64KB address space&lt;/strong&gt;: 16-bit addresses, matching the Z80's memory model&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;The Register File&lt;/h3&gt;
&lt;p&gt;Sampo provides 16 general-purpose 16-bit registers, organized with RISC-V-style conventions:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Register&lt;/th&gt;
&lt;th&gt;Alias&lt;/th&gt;
&lt;th&gt;Convention&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;R0&lt;/td&gt;
&lt;td&gt;ZERO&lt;/td&gt;
&lt;td&gt;Always reads as 0, writes ignored&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R1&lt;/td&gt;
&lt;td&gt;RA&lt;/td&gt;
&lt;td&gt;Return address (saved by caller)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;SP&lt;/td&gt;
&lt;td&gt;Stack pointer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R3&lt;/td&gt;
&lt;td&gt;GP&lt;/td&gt;
&lt;td&gt;Global pointer (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R4-R7&lt;/td&gt;
&lt;td&gt;A0-A3&lt;/td&gt;
&lt;td&gt;Arguments / Return values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R8-R11&lt;/td&gt;
&lt;td&gt;T0-T3&lt;/td&gt;
&lt;td&gt;Temporaries (caller-saved)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R12-R15&lt;/td&gt;
&lt;td&gt;S0-S3&lt;/td&gt;
&lt;td&gt;Saved registers (callee-saved)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The zero register deserves special attention. Having a register that always contains zero eliminates entire classes of instructions found in other architectures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MOV Rd, Rs&lt;/strong&gt; becomes &lt;code&gt;ADD Rd, Rs, R0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CLR Rd&lt;/strong&gt; becomes &lt;code&gt;ADD Rd, R0, R0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NEG Rd, Rs&lt;/strong&gt; can be written as &lt;code&gt;SUB Rd, R0, Rs&lt;/code&gt;, with R0 as the implicit minuend&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CMP Rs, #0&lt;/strong&gt; becomes &lt;code&gt;SUB R0, Rs, R0&lt;/code&gt; (result discarded, flags set)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This technique, popularized by MIPS and carried forward by RISC-V, simplifies the instruction set without sacrificing expressiveness.&lt;/p&gt;
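&lt;p&gt;The substitutions in the list above are mechanical enough that an assembler could apply them as pseudo-instructions. The following is a minimal sketch of that idea. The helper name &lt;code&gt;expand_pseudo&lt;/code&gt; is hypothetical (nothing here says the Sampo assembler works this way), and the NEG expansion assumes SUB computes Rs1 minus Rs2:&lt;/p&gt;

```python
def expand_pseudo(mnemonic, *operands):
    """Rewrite a pseudo-instruction as a real one that leans on R0.

    Hypothetical helper for illustration; not part of any Sampo tool.
    """
    if mnemonic == "MOV":
        rd, rs = operands
        return f"ADD {rd}, {rs}, R0"      # Rd = Rs + 0
    if mnemonic == "CLR":
        (rd,) = operands
        return f"ADD {rd}, R0, R0"        # Rd = 0 + 0
    if mnemonic == "NEG":
        rd, rs = operands
        return f"SUB {rd}, R0, {rs}"      # Rd = 0 - Rs (assumed operand order)
    if mnemonic == "CMP" and operands[1] == "#0":
        return f"SUB R0, {operands[0]}, R0"  # result discarded, flags set
    raise ValueError(f"no zero-register expansion for {mnemonic}")


print(expand_pseudo("MOV", "R4", "R5"))
```

&lt;p&gt;Because every expansion is an ordinary ADD or SUB, the decoder never needs to know these pseudo-instructions exist.&lt;/p&gt;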
&lt;h4&gt;Alternate Registers&lt;/h4&gt;
&lt;p&gt;Unlike the Z80, whose EXX swaps BC, DE, and HL as a group (with AF exchanged separately by EX AF,AF'), Sampo is selective. Only registers R4-R11 (the arguments and temporaries) have shadow copies. The critical system registers are never swapped: R0 (zero), R1 (return address), R2 (stack pointer), R3 (global pointer), and R12-R15 (saved registers).&lt;/p&gt;
&lt;p&gt;This design decision serves interrupt handling. When an interrupt occurs, the handler can execute EXX to gain a fresh set of working registers without corrupting the interrupted code's arguments or temporaries. The stack pointer remains valid (no need to establish a new stack), and the return address register can be used to save the interrupted PC.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nl"&gt;irq_handler:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;EXX&lt;/span&gt;&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="c1"&gt;; Swap to alternate R4-R11&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; ... handle interrupt using R4'-R11' ...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; Primary registers preserved automatically&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;EXX&lt;/span&gt;&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="c1"&gt;; Swap back&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;RETI&lt;/span&gt;&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="c1"&gt;; Return from interrupt&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
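&lt;p&gt;For readers who think better in a high-level language, here is a small Python model of the register file behavior described so far: R0 hard-wired to zero, and EXX exchanging only R4-R11. The class and method names are made up for this sketch, and it says nothing about how the FPGA implementation stores the shadow bank:&lt;/p&gt;

```python
class RegisterFile:
    """Toy model of Sampo's 16 registers plus shadow copies of R4-R11."""

    SHADOWED = range(4, 12)  # only R4-R11 take part in EXX

    def __init__(self):
        self.regs = [0] * 16
        self.shadow = [0] * 8  # R4'-R11'

    def read(self, n):
        return 0 if n == 0 else self.regs[n]  # R0 always reads as zero

    def write(self, n, value):
        if n != 0:                            # writes to R0 are ignored
            self.regs[n] = value % 0x10000    # keep the low 16 bits

    def exx(self):
        """Exchange R4-R11 with their shadow copies."""
        for i, n in enumerate(self.SHADOWED):
            self.regs[n], self.shadow[i] = self.shadow[i], self.regs[n]


rf = RegisterFile()
rf.write(2, 0xFF00)   # SP, not shadowed
rf.write(4, 0x1111)   # A0 in the interrupted code
rf.exx()              # handler entry
rf.write(4, 0x2222)   # handler scribbles on its own A0
rf.exx()              # handler exit
print(hex(rf.read(4)), hex(rf.read(2)))
```

&lt;p&gt;After the second &lt;code&gt;exx()&lt;/code&gt; the interrupted code's A0 is intact and SP never moved, which is the property the handler above relies on.&lt;/p&gt;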

&lt;h3&gt;The Flags Register&lt;/h3&gt;
&lt;p&gt;Sampo uses an 8-bit flags register with six defined flags:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bit&lt;/th&gt;
&lt;th&gt;Flag&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;Negative&lt;/td&gt;
&lt;td&gt;Sign bit of result (bit 15)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Z&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;td&gt;Result is zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;Carry&lt;/td&gt;
&lt;td&gt;Unsigned overflow / borrow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;V&lt;/td&gt;
&lt;td&gt;Overflow&lt;/td&gt;
&lt;td&gt;Signed overflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;Half-carry&lt;/td&gt;
&lt;td&gt;Carry from bit 3 to 4 (for BCD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;I&lt;/td&gt;
&lt;td&gt;Interrupt&lt;/td&gt;
&lt;td&gt;Interrupt enable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The N, Z, C, and V flags follow standard conventions and support the full range of conditional branches. The H (half-carry) flag exists specifically for the DAA instruction, enabling correct BCD arithmetic. The I flag controls interrupt recognition.&lt;/p&gt;
&lt;p&gt;Notably, Sampo provides explicit GETF and SETF instructions to read and write the flags register, unlike many RISC architectures that treat flags as implicit state. This supports context switching and debugging.&lt;/p&gt;
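&lt;p&gt;To make the flag definitions concrete, the sketch below computes the flags byte for a 16-bit addition using the bit positions from the table. It is an illustration under assumptions: the function name is invented, the I flag is left untouched, and which instructions actually update H on real Sampo hardware is not specified here:&lt;/p&gt;

```python
# Bit masks taken from the flags table (bit 7 down to bit 2).
FLAG_N, FLAG_Z, FLAG_C, FLAG_V, FLAG_H, FLAG_I = 0x80, 0x40, 0x20, 0x10, 0x08, 0x04


def add16(a, b):
    """Return (result, flags) for a 16-bit ADD of two values in 0..0xFFFF."""
    total = a + b
    result = total % 0x10000
    flags = 0
    if result >= 0x8000:
        flags |= FLAG_N            # bit 15 of the result is set
    if result == 0:
        flags |= FLAG_Z
    if total > 0xFFFF:
        flags |= FLAG_C            # unsigned overflow
    a_neg, b_neg, r_neg = a >= 0x8000, b >= 0x8000, result >= 0x8000
    if a_neg == b_neg and r_neg != a_neg:
        flags |= FLAG_V            # same-sign operands, different-sign result
    if a % 16 + b % 16 > 15:
        flags |= FLAG_H            # carry out of bit 3 into bit 4
    return result, flags


result, flags = add16(0x7FFF, 0x0001)
print(hex(result), hex(flags))
```

&lt;p&gt;Adding 1 to 0x7FFF is a handy test case because it sets N, V, and H at once while leaving C and Z clear.&lt;/p&gt;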
&lt;h3&gt;Memory Model&lt;/h3&gt;
&lt;p&gt;Sampo uses a straightforward memory model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Address space&lt;/strong&gt;: 64KB (16-bit addresses)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Byte-addressable&lt;/strong&gt;: Individual bytes can be loaded and stored&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Little-endian&lt;/strong&gt;: Multi-byte values stored with LSB at lower address&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Word alignment&lt;/strong&gt;: 16-bit words should be aligned on even addresses (optional enforcement)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A suggested memory map divides the 64KB space:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;x0000&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;x00FF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nb"&gt;Int&lt;/span&gt;&lt;span class="n"&gt;errupt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;
&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;x0100&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;x7FFF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Program&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="mf"&gt;32&lt;/span&gt;&lt;span class="n"&gt;KB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;x8000&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;xFEFF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;RAM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="mf"&gt;32&lt;/span&gt;&lt;span class="n"&gt;KB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;xFF00&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="n"&gt;xFFFF&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mapped&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;O&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;256&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This layout provides a clean separation between code, data, and I/O while leaving room for customization. As on the Z80, execution begins at the bottom of memory: the reset vector sits at 0x0000, with the interrupt vector at 0x0004.&lt;/p&gt;
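&lt;p&gt;An emulator might represent the suggested map and the little-endian word rule roughly as follows. This is a sketch, not code from the Sampo emulator: the region names and helper functions are assumptions chosen for the example, and alignment is not enforced:&lt;/p&gt;

```python
# (first address, last address, label) for the suggested memory map.
REGIONS = [
    (0x0000, 0x00FF, "vectors"),
    (0x0100, 0x7FFF, "rom"),
    (0x8000, 0xFEFF, "ram"),
    (0xFF00, 0xFFFF, "mmio"),
]


def region_of(addr):
    """Name the region that a 16-bit address falls into."""
    for first, last, name in REGIONS:
        if addr in range(first, last + 1):
            return name
    raise ValueError(f"address out of range: {addr}")


def write_word(mem, addr, value):
    """Store a 16-bit word little-endian: LSB at addr, MSB at addr + 1."""
    mem[addr:addr + 2] = (value % 0x10000).to_bytes(2, "little")


def read_word(mem, addr):
    """Load a 16-bit little-endian word."""
    return int.from_bytes(mem[addr:addr + 2], "little")


mem = bytearray(0x10000)
write_word(mem, 0x8000, 0x1234)
print(region_of(0x8000), hex(mem[0x8000]), hex(mem[0x8001]))
```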
&lt;h4&gt;Port-Based I/O&lt;/h4&gt;
&lt;p&gt;In addition to memory, Sampo provides a separate 256-port I/O address space accessed via IN and OUT instructions. This design directly mirrors the Z80 and enables straightforward porting of code that interacts with serial ports, disk controllers, sound chips, and other peripherals.&lt;/p&gt;
&lt;p&gt;The I/O instructions come in two forms:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nf"&gt;INI&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x80&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; Read from port 0x80 (immediate port number)&lt;/span&gt;
&lt;span class="nf"&gt;IN&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; Read from port specified in R5&lt;/span&gt;
&lt;span class="nf"&gt;OUTI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x81&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; Write R4 to port 0x81 (immediate)&lt;/span&gt;
&lt;span class="nf"&gt;OUT&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; Write R4 to port specified in R5&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Extended 32-bit forms (INX, OUTX) allow the full 8-bit port range to be specified in immediate form.&lt;/p&gt;
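&lt;p&gt;The key property of port-based I/O is that port 0x81 and memory address 0x0081 are unrelated. A minimal Python model of that separation is sketched below. The &lt;code&gt;PortBus&lt;/code&gt; class, the byte-wide data path, and the 0xFF returned for unmapped ports are all assumptions for illustration:&lt;/p&gt;

```python
class PortBus:
    """256-entry port space, kept apart from the 64KB memory array."""

    def __init__(self):
        self.inputs = {}    # port number mapped to a zero-argument reader
        self.outputs = {}   # port number mapped to a one-argument writer

    def port_in(self, port):
        reader = self.inputs.get(port % 256)
        # 0xFF for an unmapped port is an assumption, not documented behavior.
        return reader() % 256 if reader else 0xFF

    def port_out(self, port, value):
        writer = self.outputs.get(port % 256)
        if writer:
            writer(value % 256)


memory = bytearray(0x10000)
sent = []
bus = PortBus()
bus.inputs[0x80] = lambda: 0x02   # status port that always reports ready
bus.outputs[0x81] = sent.append   # data port collects written bytes

bus.port_out(0x81, ord("A"))      # analogous to OUTI 0x81, R4
print(sent, memory[0x81])
```

&lt;p&gt;The write lands in &lt;code&gt;sent&lt;/code&gt; while &lt;code&gt;memory[0x81]&lt;/code&gt; stays zero, which is the behavior that lets Z80-style driver code carry over without reserving memory addresses for peripherals.&lt;/p&gt;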
&lt;h3&gt;Instruction Encoding&lt;/h3&gt;
&lt;p&gt;Sampo uses a clean, regular encoding scheme with 16-bit base instructions and 32-bit extended forms. The 4-bit primary opcode in bits 15:12 determines the instruction category:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Opcode&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0x0&lt;/td&gt;
&lt;td&gt;ADD&lt;/td&gt;
&lt;td&gt;Register addition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x1&lt;/td&gt;
&lt;td&gt;SUB&lt;/td&gt;
&lt;td&gt;Register subtraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x2&lt;/td&gt;
&lt;td&gt;AND&lt;/td&gt;
&lt;td&gt;Bitwise AND&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x3&lt;/td&gt;
&lt;td&gt;OR&lt;/td&gt;
&lt;td&gt;Bitwise OR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x4&lt;/td&gt;
&lt;td&gt;XOR&lt;/td&gt;
&lt;td&gt;Bitwise XOR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x5&lt;/td&gt;
&lt;td&gt;ADDI&lt;/td&gt;
&lt;td&gt;Add immediate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x6&lt;/td&gt;
&lt;td&gt;LOAD&lt;/td&gt;
&lt;td&gt;Load from memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x7&lt;/td&gt;
&lt;td&gt;STORE&lt;/td&gt;
&lt;td&gt;Store to memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x8&lt;/td&gt;
&lt;td&gt;BRANCH&lt;/td&gt;
&lt;td&gt;Conditional branch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0x9&lt;/td&gt;
&lt;td&gt;JUMP&lt;/td&gt;
&lt;td&gt;Unconditional jump/call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0xA&lt;/td&gt;
&lt;td&gt;SHIFT&lt;/td&gt;
&lt;td&gt;Shift and rotate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0xB&lt;/td&gt;
&lt;td&gt;MULDIV&lt;/td&gt;
&lt;td&gt;Multiply/divide/BCD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0xC&lt;/td&gt;
&lt;td&gt;MISC&lt;/td&gt;
&lt;td&gt;Stack, block ops, compare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0xD&lt;/td&gt;
&lt;td&gt;I/O&lt;/td&gt;
&lt;td&gt;Port input/output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0xE&lt;/td&gt;
&lt;td&gt;SYSTEM&lt;/td&gt;
&lt;td&gt;NOP, HALT, interrupts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0xF&lt;/td&gt;
&lt;td&gt;EXTENDED&lt;/td&gt;
&lt;td&gt;32-bit instructions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
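&lt;p&gt;Because the category lives in the top nibble, the first step of decode is a single table lookup. The sketch below mirrors the table above in Python; the function names are invented for this example:&lt;/p&gt;

```python
# Index = primary opcode (bits 15:12), value = category from the table.
CATEGORIES = [
    "ADD", "SUB", "AND", "OR", "XOR", "ADDI", "LOAD", "STORE",
    "BRANCH", "JUMP", "SHIFT", "MULDIV", "MISC", "I/O", "SYSTEM", "EXTENDED",
]


def category(word):
    """Return the instruction category for the first 16-bit word."""
    return CATEGORIES[(word % 0x10000) >> 12]


def length_in_bytes(word):
    """Opcode 0xF marks a two-word (32-bit) instruction; all others are 16-bit."""
    return 4 if category(word) == "EXTENDED" else 2


print(category(0x0456), length_in_bytes(0x0456))
print(category(0xF407), length_in_bytes(0xF407))
```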
&lt;h4&gt;Instruction Formats&lt;/h4&gt;
&lt;p&gt;Six formats cover all instruction types:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Format R (Register-Register)&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;15       12 11     8 7      4 3      0&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+--------+--------+&lt;/span&gt;
&lt;span class="c"&gt;|  opcode  |   Rd   |  Rs1   |  Rs2   |&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+--------+--------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Used for three-register operations like &lt;code&gt;ADD R4, R5, R6&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Format I (Immediate)&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;15       12 11     8 7                0&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+------------------+&lt;/span&gt;
&lt;span class="c"&gt;|  opcode  |   Rd   |      imm8        |&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+------------------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Used for operations with 8-bit immediates like &lt;code&gt;ADDI R4, 42&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Format S (Store)&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;15       12 11     8 7      4 3      0&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+--------+--------+&lt;/span&gt;
&lt;span class="c"&gt;|  opcode  |  imm4  |  Rs1   |  Rs2   |&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+--------+--------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Used for stores: the field that holds Rd in Format R carries a 4-bit offset (imm4) instead.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Format B (Branch)&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;15       12 11     8 7                0&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+------------------+&lt;/span&gt;
&lt;span class="c"&gt;|  opcode  |  cond  |     offset8      |&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+------------------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Used for conditional branches with PC-relative offsets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Format J (Jump)&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;15       12 11                       0&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------------------------+&lt;/span&gt;
&lt;span class="c"&gt;|  opcode  |        offset12          |&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------------------------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Used for unconditional jumps with 12-bit PC-relative offsets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Format X (Extended)&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c"&gt;Word 0:&lt;/span&gt;
&lt;span class="c"&gt;15       12 11     8 7      4 3      0&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+--------+--------+&lt;/span&gt;
&lt;span class="c"&gt;|   0xF    |   Rd   |  Rs1   |  sub   |&lt;/span&gt;
&lt;span class="nb"&gt;+----------+--------+--------+--------+&lt;/span&gt;

&lt;span class="c"&gt;Word 1:&lt;/span&gt;
&lt;span class="c"&gt;15                                   0&lt;/span&gt;
&lt;span class="nb"&gt;+-------------------------------------+&lt;/span&gt;
&lt;span class="c"&gt;|              imm16                  |&lt;/span&gt;
&lt;span class="nb"&gt;+-------------------------------------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Used for operations requiring 16-bit immediates or absolute addresses.&lt;/p&gt;
&lt;h4&gt;Encoding Examples&lt;/h4&gt;
&lt;p&gt;To illustrate the encoding scheme, let's examine several instructions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ADD R4, R5, R6&lt;/strong&gt; (R4 = R5 + R6):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Opcode = 0x0, Rd = 4, Rs1 = 5, Rs2 = 6
Binary: 0000 0100 0101 0110 = 0x0456
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;ADDI R4, 10&lt;/strong&gt; (R4 = R4 + 10):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Opcode = 0x5, Rd = 4, imm8 = 10
Binary: 0101 0100 0000 1010 = 0x540A
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;BEQ +8&lt;/strong&gt; (branch forward 8 bytes if equal):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Opcode = 0x8, cond = 0 (BEQ), offset = 4 words
Binary: 1000 0000 0000 0100 = 0x8004
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;LIX R4, 0x1234&lt;/strong&gt; (load 16-bit immediate):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Word 0: 0xF (extended), Rd = 4, Rs = 0, sub = 7 (LIX)
Word 1: 0x1234
Binary: 1111 0100 0000 0111 0001 0010 0011 0100 = 0xF407 0x1234
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The regularity of this encoding makes instruction decode straightforward: the first nibble determines the instruction category, and subsequent fields are in consistent positions across formats.&lt;/p&gt;
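&lt;p&gt;The four worked examples can be reproduced with a few lines of Python. This is a sketch rather than the project's assembler: the function names are hypothetical, immediates are assumed to be stored in two's complement, and the branch offset is taken in words as in the BEQ example:&lt;/p&gt;

```python
def pack_nibbles(a, b, c, d):
    """Pack four 4-bit fields into one 16-bit word, most significant first."""
    return ((a * 16 + b) * 16 + c) * 16 + d


def encode_r(opcode, rd, rs1, rs2):
    """Format R: opcode | Rd | Rs1 | Rs2."""
    return pack_nibbles(opcode, rd, rs1, rs2)


def encode_i(opcode, rd, imm8):
    """Format I: opcode | Rd | imm8 (low 8 bits of the immediate)."""
    return pack_nibbles(opcode, rd, 0, 0) + imm8 % 256


def encode_b(cond, offset_words):
    """Format B: opcode 0x8 | cond | offset8."""
    return encode_i(0x8, cond, offset_words)


def encode_x(rd, rs1, sub, imm16):
    """Format X: two words, 0xF | Rd | Rs1 | sub followed by imm16."""
    return [pack_nibbles(0xF, rd, rs1, sub), imm16 % 0x10000]


print(hex(encode_r(0x0, 4, 5, 6)))                   # ADD R4, R5, R6
print(hex(encode_i(0x5, 4, 10)))                     # ADDI R4, 10
print(hex(encode_b(0, 4)))                           # BEQ +8
print([hex(w) for w in encode_x(4, 0, 7, 0x1234)])   # LIX R4, 0x1234
```

&lt;p&gt;Each format is the same nibble-packing step with different field meanings, which is what makes the decoder small.&lt;/p&gt;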
&lt;h3&gt;The Instruction Set&lt;/h3&gt;
&lt;p&gt;Sampo provides approximately 66 distinct instructions, organized into ten categories.&lt;/p&gt;
&lt;h4&gt;Arithmetic (15 instructions)&lt;/h4&gt;
&lt;p&gt;The arithmetic category includes standard operations (ADD, SUB, ADDI) plus multiply/divide support:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MUL&lt;/strong&gt;: 16×16 multiplication, low 16 bits of result&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MULH/MULHU&lt;/strong&gt;: High 16 bits of 32-bit product (signed/unsigned)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DIV/DIVU&lt;/strong&gt;: Integer division (signed/unsigned)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;REM/REMU&lt;/strong&gt;: Remainder (signed/unsigned)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DAA&lt;/strong&gt;: Decimal adjust for BCD arithmetic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NEG&lt;/strong&gt;: Two's complement negation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CMP&lt;/strong&gt;: Compare (subtract without storing result)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hardware multiply and divide are essential for interpreter performance; dividing a 32-bit value by 10 for number formatting would be prohibitively slow without hardware support.&lt;/p&gt;
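&lt;p&gt;The split between MUL, MULH, and MULHU is easiest to see numerically. The Python sketch below models the three results and then shows the divide-by-10 loop that number formatting relies on. It assumes MULH treats both operands as signed (as RISC-V does), and the function names are illustrative:&lt;/p&gt;

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x %= 0x10000
    return x - 0x10000 if x >= 0x8000 else x


def mul(a, b):
    """MUL: low 16 bits of the product (same for signed and unsigned)."""
    return (a * b) % 0x10000


def mulhu(a, b):
    """MULHU: high 16 bits of the unsigned 32-bit product."""
    return ((a % 0x10000) * (b % 0x10000)) >> 16


def mulh(a, b):
    """MULH: high 16 bits of the signed 32-bit product (assumed semantics)."""
    return ((to_signed16(a) * to_signed16(b)) >> 16) % 0x10000


def to_decimal(value):
    """Format an unsigned value; each digit costs one divide and one remainder."""
    digits = []
    while True:
        value, remainder = divmod(value, 10)
        digits.append(chr(ord("0") + remainder))
        if value == 0:
            break
    return "".join(reversed(digits))


print(hex(mul(0xFFFF, 2)), hex(mulhu(0xFFFF, 2)), hex(mulh(0xFFFF, 2)))
print(to_decimal(0xFFFF))
```

&lt;p&gt;With 0xFFFF as an operand the unsigned high half is 1, while the signed high half is 0xFFFF because the operand is read as -1.&lt;/p&gt;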
&lt;h4&gt;Logic (6 instructions)&lt;/h4&gt;
&lt;p&gt;Standard bitwise operations: AND, OR, XOR, NOT, plus immediate forms ANDI and ORI.&lt;/p&gt;
&lt;h4&gt;Shift and Rotate (16 variants)&lt;/h4&gt;
&lt;p&gt;Sampo provides an unusually rich set of shift operations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;SLL/SRL/SRA&lt;/strong&gt;: Shift left/right logical/arithmetic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROL/ROR&lt;/strong&gt;: Rotate left/right&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RCL/RCR&lt;/strong&gt;: Rotate through carry (17-bit rotation)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SWAP&lt;/strong&gt;: Swap high and low bytes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each shift type comes in three shift amounts: 1, 4, and 8 bits. The 4-bit shift is particularly useful for hexadecimal digit extraction and insertion. Variable shifts use the extended format with the shift amount in the second register or immediate field.&lt;/p&gt;
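&lt;p&gt;A few of these operations are worth modelling to see what they buy. The sketch below shows SWAP, the single-bit rotates through carry, and hexadecimal digit extraction built from 4-bit shifts. The Python function names are invented, and the carry handling follows the usual definition of rotate-through-carry rather than a statement from the Sampo specification:&lt;/p&gt;

```python
def swap(x):
    """SWAP: exchange the high and low bytes of a 16-bit value."""
    high, low = divmod(x % 0x10000, 0x100)
    return low * 0x100 + high


def rcl1(x, carry):
    """Rotate left by one through carry; returns (result, new_carry)."""
    wide = (x % 0x10000) * 2 + (1 if carry else 0)  # 17-bit value
    new_carry, result = divmod(wide, 0x10000)
    return result, new_carry


def rcr1(x, carry):
    """Rotate right by one through carry; returns (result, new_carry)."""
    x %= 0x10000
    return (x >> 1) + (0x8000 if carry else 0), x % 2


def hex_digits(x):
    """Peel off four hex digits using repeated 4-bit right shifts."""
    out = []
    for _ in range(4):
        out.append("0123456789ABCDEF"[x % 16])
        x >>= 4
    return "".join(reversed(out))


print(hex(swap(0x1234)), rcl1(0x8001, 0), hex_digits(0x1A2F))
```

&lt;p&gt;Because the carry flag acts as a seventeenth bit, applying &lt;code&gt;rcl1&lt;/code&gt; seventeen times returns both the register and the carry to their starting values.&lt;/p&gt;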
&lt;h4&gt;Load/Store (6 instructions)&lt;/h4&gt;
&lt;p&gt;Memory access instructions include word and byte loads (with sign or zero extension), word and byte stores, and LUI (Load Upper Immediate) for constructing 16-bit constants:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nf"&gt;LUI&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x12&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; R4 = 0x1200&lt;/span&gt;
&lt;span class="nf"&gt;ORI&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x34&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;; R4 = 0x1234&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
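&lt;p&gt;The LUI/ORI pair generalizes to any 16-bit constant by splitting it into bytes. The sketch below models that split in Python. It assumes ORI zero-extends its 8-bit immediate (a sign-extending ORI would corrupt the upper byte whenever the low byte is 0x80 or above), and the helper names are hypothetical:&lt;/p&gt;

```python
def split_constant(value):
    """Return (upper byte, lower byte) of a 16-bit constant."""
    return divmod(value % 0x10000, 0x100)


def lui(imm8):
    """LUI: place imm8 in the upper byte and clear the lower byte."""
    return (imm8 % 256) * 0x100


def ori(value, imm8):
    """ORI with a zero-extended 8-bit immediate (assumed)."""
    return value | (imm8 % 256)


upper, lower = split_constant(0x1234)
r4 = ori(lui(upper), lower)
print(hex(upper), hex(lower), hex(r4))
```

&lt;p&gt;The two-instruction sequence occupies the same four bytes as a single LIX, so the choice between them is mostly a matter of what the assembler or compiler finds convenient.&lt;/p&gt;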

&lt;h4&gt;Branch (16 conditions)&lt;/h4&gt;
&lt;p&gt;Sampo supports a comprehensive set of branch conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;BEQ/BNE&lt;/strong&gt;: Equal/not equal&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BLT/BGE/BGT/BLE&lt;/strong&gt;: Signed comparisons&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BLTU/BGEU/BHI/BLS&lt;/strong&gt;: Unsigned comparisons&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BMI/BPL&lt;/strong&gt;: Negative/non-negative (N flag set/clear)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BVS/BVC&lt;/strong&gt;: Overflow set/clear&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BCS/BCC&lt;/strong&gt;: Carry set/clear&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This covers all reasonable comparison outcomes for both signed and unsigned arithmetic.&lt;/p&gt;
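&lt;p&gt;One plausible mapping from these mnemonics to the N, Z, C, and V flags is sketched below. Treat it as an assumption-laden illustration: it uses the conventional signed tests (N compared with V) and assumes C is set on borrow after a subtraction, as the flags table suggests. The exact condition encodings in Sampo are not given here:&lt;/p&gt;

```python
def signed16(x):
    return x - 0x10000 if x >= 0x8000 else x


def cmp_flags(a, b):
    """Flags (n, z, c, v) after CMP a, b for values in 0..0xFFFF."""
    diff = (a - b) % 0x10000
    n = diff >= 0x8000
    z = diff == 0
    c = b > a                                          # borrow (assumed convention)
    v = signed16(a) - signed16(b) != signed16(diff)    # signed result did not fit
    return n, z, c, v


CONDITIONS = {
    "BEQ": lambda n, z, c, v: z,
    "BNE": lambda n, z, c, v: not z,
    "BLT": lambda n, z, c, v: n != v,
    "BGE": lambda n, z, c, v: n == v,
    "BGT": lambda n, z, c, v: not z and n == v,
    "BLE": lambda n, z, c, v: z or n != v,
    "BLTU": lambda n, z, c, v: c,
    "BGEU": lambda n, z, c, v: not c,
    "BHI": lambda n, z, c, v: not c and not z,
    "BLS": lambda n, z, c, v: c or z,
    "BMI": lambda n, z, c, v: n,
    "BPL": lambda n, z, c, v: not n,
    "BVS": lambda n, z, c, v: v,
    "BVC": lambda n, z, c, v: not v,
    "BCS": lambda n, z, c, v: c,
    "BCC": lambda n, z, c, v: not c,
}


def taken(mnemonic, a, b):
    """Would the branch be taken immediately after CMP a, b?"""
    return bool(CONDITIONS[mnemonic](*cmp_flags(a, b)))


print(taken("BLT", 0xFFFF, 0x0001), taken("BLTU", 0xFFFF, 0x0001))
```

&lt;p&gt;The example compares 0xFFFF with 1: as a signed value 0xFFFF is -1, so BLT is taken, while the unsigned BLTU is not.&lt;/p&gt;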
&lt;h4&gt;Jump/Call (4 instructions)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;J&lt;/strong&gt;: PC-relative unconditional jump&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JAL&lt;/strong&gt;: Jump and link (save return address in RA)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JR&lt;/strong&gt;: Jump to address in register&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JALR&lt;/strong&gt;: Jump and link to register address&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Block Operations (6 instructions)&lt;/h4&gt;
&lt;p&gt;The block operations use a fixed register convention (R4=count, R5=source, R6=destination):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;LDI/LDD&lt;/strong&gt;: Copy a single byte from source to destination, then increment (LDI) or decrement (LDD) both pointers and decrement the count&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LDIR/LDDR&lt;/strong&gt;: Repeat LDI/LDD until the count reaches zero&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FILL&lt;/strong&gt;: Fill memory region with value&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPIR&lt;/strong&gt;: Compare and search forward&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These instructions are decidedly un-RISC; they're multi-cycle operations that modify multiple registers. But they're implemented with predictable behavior (always the same registers, always the same algorithm) and provide enormous speedups for common memory operations.&lt;/p&gt;
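&lt;p&gt;The "always the same algorithm" point can be shown with a short Python model of LDI and LDIR using the R4/R5/R6 convention. This is a sketch under assumptions: a count of zero is treated as "copy nothing" (the behavior of real Sampo hardware in that case is not stated here), and flag updates are omitted:&lt;/p&gt;

```python
COUNT, SRC, DST = 4, 5, 6   # register numbers from the fixed convention


def ldi(regs, mem):
    """One LDI step: copy a byte, advance both pointers, decrement the count."""
    mem[regs[DST]] = mem[regs[SRC]]
    regs[SRC] = (regs[SRC] + 1) % 0x10000
    regs[DST] = (regs[DST] + 1) % 0x10000
    regs[COUNT] = (regs[COUNT] - 1) % 0x10000


def ldir(regs, mem):
    """LDIR: repeat LDI until the count reaches zero."""
    while regs[COUNT] != 0:
        ldi(regs, mem)


mem = bytearray(0x10000)
mem[0x8000:0x8005] = b"HELLO"
regs = [0] * 16
regs[COUNT], regs[SRC], regs[DST] = 5, 0x8000, 0x9000
ldir(regs, mem)
print(bytes(mem[0x9000:0x9005]), regs[COUNT], hex(regs[SRC]), hex(regs[DST]))
```

&lt;p&gt;When the regions overlap and the destination lies above the source, a forward copy like this overwrites bytes before it reads them. That is the case the decrementing form, LDDR, exists to handle.&lt;/p&gt;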
&lt;h4&gt;Stack (4 instructions)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PUSH/POP&lt;/strong&gt;: Single register push/pop&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PUSHM/POPM&lt;/strong&gt;: Push/pop multiple registers (via bitmask)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;I/O (4 instructions)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INI/OUTI&lt;/strong&gt;: Immediate port address&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IN/OUT&lt;/strong&gt;: Register port address&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;System (11 instructions)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;NOP&lt;/strong&gt;: No operation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HALT&lt;/strong&gt;: Stop processor&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DI/EI&lt;/strong&gt;: Disable/enable interrupts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;EXX&lt;/strong&gt;: Exchange alternate registers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RETI&lt;/strong&gt;: Return from interrupt&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SWI&lt;/strong&gt;: Software interrupt&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SCF/CCF&lt;/strong&gt;: Set/complement carry flag&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GETF/SETF&lt;/strong&gt;: Read/write flags register&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Comparison with Other Architectures&lt;/h3&gt;
&lt;p&gt;To put Sampo in context, consider how it compares with related processors:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Z80&lt;/th&gt;
&lt;th&gt;MIPS&lt;/th&gt;
&lt;th&gt;RISC-V&lt;/th&gt;
&lt;th&gt;Sampo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Word size&lt;/td&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;32-bit&lt;/td&gt;
&lt;td&gt;32/64-bit&lt;/td&gt;
&lt;td&gt;16-bit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instruction width&lt;/td&gt;
&lt;td&gt;1-4 bytes&lt;/td&gt;
&lt;td&gt;4 bytes&lt;/td&gt;
&lt;td&gt;2/4 bytes&lt;/td&gt;
&lt;td&gt;2/4 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Registers&lt;/td&gt;
&lt;td&gt;8 + alternates&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;16 + alternates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero register&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;$zero&lt;/td&gt;
&lt;td&gt;x0&lt;/td&gt;
&lt;td&gt;R0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I/O model&lt;/td&gt;
&lt;td&gt;Port-based&lt;/td&gt;
&lt;td&gt;Memory-mapped&lt;/td&gt;
&lt;td&gt;Memory-mapped&lt;/td&gt;
&lt;td&gt;Port-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block operations&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instruction count&lt;/td&gt;
&lt;td&gt;~300+&lt;/td&gt;
&lt;td&gt;~60&lt;/td&gt;
&lt;td&gt;~50 base&lt;/td&gt;
&lt;td&gt;~66&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Sampo sits in an interesting position: more regular than the Z80 but with Z80-friendly features, smaller and simpler than 32-bit RISC but still cleanly orthogonal.&lt;/p&gt;
&lt;h3&gt;Code Examples&lt;/h3&gt;
&lt;p&gt;To demonstrate how Sampo assembly looks in practice, here's a "Hello World" program that outputs text via a serial port:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;.org&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x0100&lt;/span&gt;

&lt;span class="na"&gt;.equ&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;ACIA_STATUS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0x80&lt;/span&gt;
&lt;span class="na"&gt;.equ&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;ACIA_DATA&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;0x81&lt;/span&gt;
&lt;span class="na"&gt;.equ&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;TX_READY&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;0x02&lt;/span&gt;

&lt;span class="nl"&gt;start:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;LIX&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;message&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;; Load address of string&lt;/span&gt;

&lt;span class="nl"&gt;loop:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;LBU&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="c1"&gt;; Load byte from string&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;CMP&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R0&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;; Compare with zero&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;BEQ&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;done&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="c1"&gt;; If null terminator, done&lt;/span&gt;

&lt;span class="nl"&gt;wait_tx:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;INI&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_STATUS&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;; Read serial status port&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ANDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;TX_READY&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;; Check transmit ready bit&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;BEQ&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;wait_tx&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;; Wait if not ready&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;OUTI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;ACIA_DATA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;; Write character to data port&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="c1"&gt;; Next character&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;J&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="no"&gt;loop&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="c1"&gt;; Back to top of loop&lt;/span&gt;
&lt;span class="nl"&gt;done:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;HALT&lt;/span&gt;

&lt;span class="nl"&gt;message:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;.asciz&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Hello, Sampo!\n"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
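
&lt;p&gt;For readers who think more easily in a high-level language, the polling pattern in this listing can be modeled in a few lines of Python. This is only a sketch: the FakeAcia class, its busy_polls parameter, and the print_string helper are hypothetical names invented for illustration, and the "busy for two polls after each write" behavior is an assumption, not a description of a real ACIA or of the Sampo emulator. Only the port numbers and the TX_READY mask are taken from the listing.&lt;/p&gt;

```python
from enum import IntFlag

# Port numbers and the ready bit are copied from the assembly listing.
ACIA_STATUS = 0x80
ACIA_DATA = 0x81


class Status(IntFlag):
    TX_READY = 0x02


class FakeAcia:
    """Hypothetical serial device: busy for a fixed number of status polls after each write."""

    def __init__(self, busy_polls=2):
        self.busy_polls = busy_polls  # assumption, chosen only to make the wait loop visible
        self.countdown = 0
        self.sent = []

    def port_in(self, port):
        # Models INI: only the status port is readable in this sketch.
        if port != ACIA_STATUS:
            raise ValueError(f"unmapped input port {port:#x}")
        if self.countdown:
            self.countdown -= 1
            return 0
        return int(Status.TX_READY)

    def port_out(self, port, value):
        # Models OUTI: only the data port is writable in this sketch.
        if port != ACIA_DATA:
            raise ValueError(f"unmapped output port {port:#x}")
        self.sent.append(value)
        self.countdown = self.busy_polls


def print_string(dev, data):
    """Mirror the loop / wait_tx / done structure of the assembly; return the busy-poll count."""
    polls = 0
    for byte in data:
        if byte == 0:  # CMP R5, R0 / BEQ done
            break
        # INI + ANDI + BEQ wait_tx: spin until the ready bit is set.
        while Status.TX_READY not in Status(dev.port_in(ACIA_STATUS)):
            polls += 1
        dev.port_out(ACIA_DATA, byte)  # OUTI ACIA_DATA, R5
    return polls
```

&lt;p&gt;Calling print_string(FakeAcia(), b"Hi\x00") leaves the two bytes of "Hi" in the device's sent list and returns the number of busy polls, which makes the cost of busy-waiting on a slow peripheral easy to see.&lt;/p&gt;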

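&lt;p&gt;And here's an iterative Fibonacci function demonstrating the calling convention: the argument arrives in R4 (A0), the result is returned in the same register, and control returns to the caller through RA:&lt;/p&gt;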
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;; fib(n) - compute nth Fibonacci number&lt;/span&gt;
&lt;span class="c1"&gt;; Input: R4 (A0) = n&lt;/span&gt;
&lt;span class="c1"&gt;; Output: R4 (A0) = fib(n)&lt;/span&gt;

&lt;span class="nl"&gt;fib:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;; a = 0&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;; b = 1&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;CMP&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R0&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;; n == 0?&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;BEQ&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;fib_done&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; If so, return a = 0&lt;/span&gt;

&lt;span class="nl"&gt;fib_loop:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;; temp = a + b&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R0&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;; a = b&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R0&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;; b = temp&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADDI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;; n--&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;BNE&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;fib_loop&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;; Loop while n != 0&lt;/span&gt;

&lt;span class="nl"&gt;fib_done:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;ADD&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="no"&gt;R4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;R0&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;; return a&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nf"&gt;JR&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="no"&gt;RA&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;; Return to caller&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The code reads naturally to anyone familiar with RISC assembly, while the I/O instructions and register conventions provide the Z80-like feel that makes porting classic software straightforward.&lt;/p&gt;
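&lt;p&gt;One way to sanity-check the Fibonacci routine above before running it in an emulator is to mirror its register moves in a short reference model. The sketch below is an illustration, not part of the Sampo toolchain: fib_reference is a hypothetical name, and it assumes that registers are 16 bits wide, that ADD and ADDI wrap modulo 2**16, and that ADDI updates the zero flag that BNE tests.&lt;/p&gt;

```python
WORD = 2 ** 16  # assumption: 16-bit registers, arithmetic wraps modulo 2**16


def fib_reference(n):
    """Mirror the assembly: R5 = a, R6 = b, R7 = temp, R4 = n."""
    a, b = 0, 1                  # ADDI R5, R0, 0 / ADDI R6, R0, 1
    n %= WORD
    while n != 0:                # CMP R4, R0 / BEQ fib_done, then BNE fib_loop
        temp = (a + b) % WORD    # ADD R7, R5, R6
        a = b                    # ADD R5, R6, R0
        b = temp                 # ADD R6, R7, R0
        n = (n - 1) % WORD       # ADDI R4, R4, -1
    return a                     # ADD R4, R5, R0


if __name__ == "__main__":
    print([fib_reference(i) for i in range(11)])
    print(fib_reference(24), fib_reference(25))
```

&lt;p&gt;Under these assumptions fib_reference(24) returns 46368, the largest Fibonacci number that fits in 16 bits, while fib_reference(25) returns 9489 instead of 75025 because the sum wraps. The assembly version would be expected to show the same wraparound, which is worth keeping in mind when writing test cases.&lt;/p&gt;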
&lt;h3&gt;Looking Ahead: FPGA Implementation&lt;/h3&gt;
&lt;p&gt;With the architecture defined, the next step is implementation. In Part 2 of this series, we'll build a working Sampo processor using Amaranth HDL, a modern Python-based hardware description language. We'll cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The ALU module&lt;/strong&gt;: Implementing all arithmetic and logic operations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The register file&lt;/strong&gt;: Including the alternate register set and zero register&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The instruction decoder&lt;/strong&gt;: Parsing the various instruction formats&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The control unit&lt;/strong&gt;: Managing the fetch-decode-execute cycle&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The memory interface&lt;/strong&gt;: Connecting to block RAM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The I/O subsystem&lt;/strong&gt;: Implementing the port-based I/O model&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration&lt;/strong&gt;: Putting it all together into a working system-on-chip&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We'll synthesize the design for an affordable &lt;a href="https://baud.rs/OE4qHU"&gt;FPGA board&lt;/a&gt; and run actual Sampo programs, demonstrating that this architecture isn't just a paper exercise but a real, working processor.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://baud.rs/r74wA8"&gt;Sampo project on GitHub&lt;/a&gt; includes a complete &lt;a href="https://baud.rs/gSnSwR"&gt;Rust-based&lt;/a&gt; assembler (sasm) and emulator (semu) with a TUI debugger, so you can start writing and testing Sampo programs today. The FPGA implementation will let you run those same programs on real hardware, completing the journey from mythological artifact to silicon reality.&lt;/p&gt;
&lt;p&gt;Stay tuned for Part 2, where we'll forge our own Sampo, not from swan feathers and barley, but from lookup tables and flip-flops.&lt;/p&gt;</description><category>computer architecture</category><category>cpu design</category><category>retrocomputing</category><category>risc</category><category>sampo</category><category>z80</category><guid>https://tinycomputers.io/posts/sampo-16-bit-risc-cpu-part-1.html</guid><pubDate>Tue, 30 Dec 2025 15:00:00 GMT</pubDate></item></channel></rss>