This is the story of attempting something probably inadvisable: compiling Rust for the Zilog Z80, an 8-bit processor from 1976. It's also a story about using AI as a genuine collaborator on deep systems programming work, and what happens when modern software abstractions collide with hardware constraints from an era when 64 kilobytes was considered generous.
Transparency: Claude Code as Collaborator
I want to be upfront about something: significant portions of this compiler backend were developed in collaboration with Claude Code, Anthropic's AI coding assistant. This isn't a case of "AI wrote the code and I took credit" — it's more nuanced than that. Claude served as an unusually patient pair programmer who happens to have read every LLVM tutorial ever written.
Here's what that collaboration actually looked like:
I would describe a problem: "The instruction selector is failing with cannot select: G_SADDO for signed addition with overflow detection." Claude would analyze the GlobalISel pipeline, identify that the Z80's ADC instruction sets the P/V flag for signed overflow, and propose an implementation. I would review, test, discover edge cases, and we'd iterate.
The debugging sessions were particularly valuable. When compilation hung for seven hours on what should have been a two-minute build, Claude helped trace the issue to an accidental infinite recursion — a replace_all refactoring had changed RBI.constrainGenericRegister(...) to constrainOrSetRegClass(...) inside the constrainOrSetRegClass helper function itself. The function was calling itself forever. Finding that bug manually would have taken hours of printf debugging; with Claude analyzing the code structure, we found it in minutes.
This is what AI-assisted development actually looks like in 2025: not magic code generation, but accelerated iteration with a collaborator who never gets frustrated when you ask "wait, explain register allocation to me again."
Why Z80? Why Rust?
The Z80 powered the TRS-80, ZX Spectrum, MSX computers, and countless embedded systems. It's still manufactured today — you can buy new Z80 chips. I actually did just that: I bought a handful of vintage ceramic Z80 chips on eBay. There's something appealing about running modern language constructs on hardware designed when ABBA topped the charts.
More practically, I've been building Z80-based projects on the RetroShield platform, which lets you run vintage processors on Arduino-compatible hardware. Having a modern compiler toolchain opens possibilities that hand-written assembly doesn't.
But Rust specifically? Rust's ownership model and zero-cost abstractions are theoretically perfect for resource-constrained systems. The language was designed for systems programming. The question is whether "systems" can stretch back nearly 50 years.
Building LLVM for the Z80
The first step was getting LLVM itself to build with Z80 support. This meant:
- Adding Z80 to the list of supported targets in the build system
- Creating the target description files (registers, instruction formats, calling conventions)
- Implementing the GlobalISel pipeline components
- Wiring everything together so `llc -mtriple=z80-unknown-unknown` actually works
The target description files alone span thousands of lines. Here's what defining just the basic registers looks like:
```tablegen
def A : Z80Reg<0, "a">;
def B : Z80Reg<1, "b">;
def C : Z80Reg<2, "c">;
def D : Z80Reg<3, "d">;
def E : Z80Reg<4, "e">;
def H : Z80Reg<5, "h">;
def L : Z80Reg<6, "l">;

// 16-bit register pairs
def BC : Z80RegWithSub<7, "bc", [B, C]>;
def DE : Z80RegWithSub<8, "de", [D, E]>;
def HL : Z80RegWithSub<9, "hl", [H, L]>;
```
Every instruction needs similar treatment. The Z80 has over 700 documented instruction variants when you count all the addressing modes. Not all are needed for a basic backend, but getting basic arithmetic, loads, stores, branches, and calls working required implementing dozens of instruction patterns.
The build process itself was surprisingly manageable — LLVM's build system is well-designed. A complete build with the Z80 target takes about 20 minutes on modern hardware. The iteration cycle during development was typically: change a few files, rebuild (30 seconds to 2 minutes depending on what changed), test with llc, fix, repeat.
The LLVM Approach
LLVM provides a framework for building compiler backends. You describe your target's registers, instruction set, and calling conventions; LLVM handles optimization, instruction selection, and register allocation. In theory, adding a new target is "just" filling in these descriptions.
In practice, LLVM assumes certain things about targets. It assumes you have a reasonable number of general-purpose registers. It assumes arithmetic operations work on values that fit in registers. It assumes function calls follow conventions that modern ABIs have standardized.
The Z80 violates all of these assumptions.
The Register Poverty Problem
The Z80 has seven 8-bit registers: A, B, C, D, E, H, and L. Some can be paired into 16-bit registers: BC, DE, HL. That's it. Modern architectures have 16 or 32 general-purpose registers; the Z80 has seven that aren't even all general-purpose — A is the accumulator with special arithmetic privileges, HL is the primary memory pointer.
LLVM's register allocator expects to juggle many virtual registers across many physical registers. When you have more virtual registers than physical registers, it spills values to memory. On the Z80, you're spilling constantly. Every 32-bit operation requires careful choreography of the few registers available.
Here's what a simple 16-bit addition looks like in our backend:
```llvm
define i16 @add16(i16 %a, i16 %b) {
  %result = add i16 %a, %b
  ret i16 %result
}
```
This compiles to:
```asm
add16:
    add hl,de
    ret
```
That's clean because we designed the calling convention to pass arguments in HL and DE. The backend recognizes that the inputs are already where they need to be and emits just the ADD instruction.
But 32-bit addition? That becomes a multi-instruction sequence juggling values through the stack because we can't hold four 16-bit values in registers simultaneously.
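Written out in Rust, the legalized operation computes something like this (a conceptual sketch of the lowering, not the backend's literal output):

```rust
// 32-bit add on a 16-bit machine: add the low halves, then fold the
// carry into the sum of the high halves.
fn add32(a: u32, b: u32) -> u32 {
    let (lo, carry) = (a as u16).overflowing_add(b as u16);
    let hi = ((a >> 16) as u16)
        .wrapping_add((b >> 16) as u16)
        .wrapping_add(carry as u16);
    ((hi as u32) << 16) | (lo as u32)
}
```

On the Z80, each of those halves competes for the same three register pairs, which is where the stack juggling comes from.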
The Width Problem
The Z80 is fundamentally an 8-bit processor with 16-bit addressing. Rust's standard library uses usize for indexing, which on most platforms is 32 or 64 bits. The Z80 cannot directly perform 32-bit arithmetic. Every u32 operation expands into multiple 8-bit or 16-bit operations.
Consider multiplication. The Z80 has no multiply instruction at all. To multiply two 16-bit numbers, we emit a call to a runtime library function (__mulhi3) that implements multiplication through shifts and adds. 32-bit multiplication requires calling a function that orchestrates four 16-bit multiplications with proper carry handling.
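For a sense of what such a helper does, here's the classic shift-and-add loop sketched in Rust (illustrative; not the actual library code):

```rust
// Shift-and-add multiplication: the technique a __mulhi3-style helper
// relies on when the hardware has no multiply instruction.
fn mul16(mut a: u16, mut b: u16) -> u16 {
    let mut product: u16 = 0;
    while b != 0 {
        if b & 1 != 0 {
            product = product.wrapping_add(a); // accumulate when the low multiplier bit is set
        }
        a = a.wrapping_shl(1); // double the multiplicand
        b >>= 1;               // consume one multiplier bit
    }
    product
}
```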
Division is worse. Iterative division algorithms on 8-bit hardware are slow. Floating-point arithmetic doesn't exist in hardware — every floating-point operation becomes a library call to software implementations.
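The iterative algorithm in question looks roughly like this, sketched in Rust (restoring division; illustrative, not the actual library routine):

```rust
// Bit-serial restoring division: one compare-and-subtract per bit.
// The remainder is kept one bit wider than the operands so the shift
// can't overflow; the same width problem, one level down.
fn udivmod16(dividend: u16, divisor: u16) -> (u16, u16) {
    debug_assert!(divisor != 0);
    let mut quotient: u16 = 0;
    let mut remainder: u32 = 0;
    for i in (0..16).rev() {
        remainder = (remainder << 1) | ((dividend >> i) & 1) as u32; // bring down the next bit
        if remainder >= divisor as u32 {
            remainder -= divisor as u32; // trial subtraction succeeds
            quotient |= 1 << i;
        }
    }
    (quotient, remainder as u16)
}
```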
GlobalISel: The Modern Approach
We're using LLVM's GlobalISel framework rather than the older SelectionDAG. GlobalISel provides finer control over instruction selection through explicit lowering steps:
- IRTranslator: Converts LLVM IR to generic machine instructions (G_ADD, G_LOAD, etc.)
- Legalizer: Transforms operations the target can't handle into sequences it can
- RegBankSelect: Assigns register banks (8-bit vs 16-bit on Z80)
- InstructionSelector: Converts generic instructions to target-specific instructions
Each step presented challenges. The Legalizer needed custom rules to break 32-bit operations into 16-bit pieces. RegBankSelect needed to understand that some Z80 instructions only work with specific register pairs. The InstructionSelector needed patterns for every Z80 instruction variant.
One particularly tricky issue: LLVM's overflow-detecting arithmetic. Instructions like G_SADDO (signed add with overflow) return both a result and an overflow flag. The Z80's ADC instruction sets the P/V flag on signed overflow, but capturing that flag to a register requires careful instruction sequencing — you can't just read the flag register arbitrarily.
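In Rust terms, G_SADDO has to produce exactly the pair that overflowing_add reports for signed integers:

```rust
fn main() {
    // A wrapped result plus a signed-overflow flag: the pair G_SADDO produces.
    let (sum, overflow) = 30_000i16.overflowing_add(10_000);
    assert_eq!(sum, -25_536); // 40,000 wrapped into i16
    assert!(overflow);        // the bit the Z80's P/V flag has to supply
}
```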
The Bug That Cost Seven Hours
During development, we hit a bug that perfectly illustrates the challenges of compiler work. After implementing a helper function to handle register class assignment, compilation started hanging. Not crashing — hanging. A simple three-function test file that should compile in milliseconds ran for over seven hours before I killed it.
The issue? During a refactoring pass, we used a global search-and-replace to change all calls from RBI.constrainGenericRegister(...) to our new constrainOrSetRegClass(...) helper. But the helper function itself contained a call to RBI.constrainGenericRegister() as its fallback case. The replace-all changed that too:
```cpp
// Before (correct):
bool constrainOrSetRegClass(Register Reg, ...) {
  if (!MRI.getRegClassOrNull(Reg)) {
    MRI.setRegClass(Reg, &RC);
    return true;
  }
  return RBI.constrainGenericRegister(Reg, RC, MRI); // Fallback
}

// After (infinite recursion):
bool constrainOrSetRegClass(Register Reg, ...) {
  if (!MRI.getRegClassOrNull(Reg)) {
    MRI.setRegClass(Reg, &RC);
    return true;
  }
  return constrainOrSetRegClass(Reg, RC, MRI); // Calls itself forever!
}
```
The function was calling itself instead of the underlying LLVM function. Every attempt to compile anything would recurse until the stack overflowed or the heat death of the universe, whichever came first.
This is the kind of bug that's obvious in hindsight but insidious during development. There were no compiler errors, no warnings, no crashes with helpful stack traces. Just silence as the process spun forever.
Finding it required adding debug output at each step of the instruction selector, rebuilding, and watching where the output stopped. Claude helped immensely here — recognizing the pattern of "output stops here" and immediately checking what that code path did.
The Calling Convention
We designed a Z80-specific calling convention optimized for the hardware's constraints:
- First 16-bit argument: HL register pair
- Second 16-bit argument: DE register pair
- Return value: HL register pair
- Additional arguments: Stack
- Caller-saved: All registers (callee can clobber anything)
- Callee-saved: None
This convention minimizes register shuffling for simple functions. A function taking two 16-bit values and returning one doesn't need any register setup at all — the arguments arrive exactly where the ADD instruction expects them.
For 8-bit arguments, values arrive in the low byte of HL (L register) or DE (E register). This wastes the high byte but simplifies the calling convention.
This is radically different from typical calling conventions. Modern ABIs specify precise preservation rules, stack alignment requirements, and argument passing in specific registers. On the Z80, with so few registers, we had to make pragmatic choices. Every function saves and restores what it needs; there's no concept of "preserved across calls."
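To make the convention concrete, here's a hypothetical three-argument function (illustrative; assuming the extern "C" signature lowers through the convention above):

```rust
// Hypothetical: a arrives in HL, b in DE; c has no register pair left,
// so it travels on the stack. The result returns in HL.
#[no_mangle]
pub extern "C" fn madd16(a: u16, b: u16, c: u16) -> u16 {
    a.wrapping_add(b).wrapping_add(c)
}
```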
A Working Example
Here's LLVM IR that our backend compiles successfully:
```llvm
target datalayout = "e-m:e-p:16:8-i16:8-i32:8-i64:8-n8:16"
target triple = "z80-unknown-unknown"

define i16 @add16(i16 %a, i16 %b) {
  %result = add i16 %a, %b
  ret i16 %result
}

define i16 @sub16(i16 %a, i16 %b) {
  %result = sub i16 %a, %b
  ret i16 %result
}

define i8 @add8(i8 %a, i8 %b) {
  %result = add i8 %a, %b
  ret i8 %result
}
```
Compiled output:
```asm
    .text
    .globl add16
add16:
    add hl,de
    ret

    .globl sub16
sub16:
    and a       ; clear carry
    sbc hl,de
    ret

    .globl add8
add8:
    ld c,l
    ld b,c
    add a,b
    ret
```
The 16-bit operations are efficient. The 8-bit addition shows the register shuffling required when values aren't in the accumulator — we have to move values through available registers to get them where the ADD instruction expects.
Compilation time for these three functions: 0.01 seconds. The backend works.
Where We Are Now
The backend compiles simple LLVM IR to working Z80 assembly. Integer arithmetic, control flow, function calls, memory access — the fundamentals work. We've implemented handlers for dozens of generic machine instructions and their various edge cases.
Attempting to compile Rust's core library has been... educational. The core library is massive. It includes:
- All the formatting infrastructure (`Display`, `Debug`, `write!` macros)
- Iterator implementations and adaptors
- Option, Result, and their many combinator methods
- Slice operations, sorting algorithms
- Panic handling infrastructure
- Unicode handling
Each of these generates significant code. The formatting system alone probably exceeds the entire memory capacity of a typical Z80 system.
Current status: compilation of core starts, processes thousands of functions, but eventually hits edge cases we haven't handled yet. The most recent error involves register class assignment in the floating-point decimal formatting code — ironic since the Z80 has no floating-point hardware.
Connecting Rust to the Z80 Backend
Getting Rust to use our LLVM backend required modifying the Rust compiler itself. This involved:
- Adding a target specification: Defining `z80-unknown-none-elf` in Rust's target database with the appropriate data layout, pointer width, and feature flags.
- Pointing Rust at our LLVM: Rust can use an external LLVM rather than its bundled version. We configured the build to use our Z80-enabled LLVM.
- Disabling C compiler-builtins: Rust's standard library includes some C code from compiler-rt for low-level operations. There's no Z80 C compiler readily available, so we had to disable these and rely on pure Rust implementations.
- Setting panic=abort: The Z80 can't reasonably support stack unwinding for panic handling.
The Rust target specification looks like this:
```rust
Target {
    arch: Arch::Z80,
    data_layout: "e-m:e-p:16:8-i16:8-i32:8-i64:8-n8:16".into(),
    llvm_target: "z80-unknown-unknown".into(),
    pointer_width: 16,
    options: TargetOptions {
        c_int_width: 16,
        panic_strategy: PanicStrategy::Abort,
        max_atomic_width: Some(0), // No atomics
        atomic_cas: false,
        singlethread: true,
        no_builtins: true, // No C runtime
        ..TargetOptions::default()
    },
}
```
The pointer_width: 16 is crucial — this is a 16-bit architecture. The max_atomic_width: Some(0) tells Rust that atomic operations aren't available at all, since the Z80 has no atomic instructions.
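One concrete consequence, sketched as a hypothetical check (only meaningful when compiled for this target): pointer-sized types shrink to two bytes, while wider integers survive only through emulation.

```rust
// On a 16-bit target like this, usize follows pointer_width.
fn main() {
    assert_eq!(core::mem::size_of::<usize>(), 2); // 16-bit pointers
    assert_eq!(core::mem::size_of::<u32>(), 4);   // still four bytes, but every op is emulated
}
```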
When Rust tries to compile core, it invokes rustc, which invokes LLVM, which invokes our Z80 backend. Each function in core goes through this pipeline. The sheer volume is staggering — core contains thousands of generic functions that get monomorphized for every type they're used with.
The Honest Assessment
Will Rust's standard library ever practically run on a Z80? Almost certainly not. The core library alone, compiled for Z80, would likely exceed a megabyte — far beyond the 64KB address space. Even if you could page-swap the code, the runtime overhead of software floating-point, 32-bit arithmetic emulation, and iterator abstractions would make execution glacially slow.
What might actually work:
- `#![no_std]`/`#![no_core]` programs: Bare-metal Rust with a tiny custom runtime, no standard library, hand-optimized for the hardware. A few kilobytes of carefully written Rust that compiles to tight Z80 assembly (see the sketch after this list).
- Code generation experiments: Using the LLVM backend to study how modern language constructs map to constrained hardware, even if the results aren't practical to run.
- Educational purposes: Understanding compiler internals by working with hardware simple enough to reason about completely.
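Here's the shape of such a bare-metal program (a minimal sketch, assuming panic=abort and no OS; the function is illustrative):

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

// With panic=abort and no operating system, the panic handler can only halt.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

// A leaf function the backend already handles well: 16-bit arithmetic
// that maps directly onto register-pair instructions.
#[no_mangle]
pub extern "C" fn add16(a: u16, b: u16) -> u16 {
    a.wrapping_add(b)
}
```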
The value isn't in running production Rust on Z80s. It's in the journey — understanding LLVM's internals, grappling with register allocation on a machine that predates the concept (and me, albeit by only a few years), and seeing how far modern tooling can stretch.
Conclusion
Compiling Rust for the Z80 is somewhere between ambitious and absurd. The hardware constraints are genuinely incompatible with modern language expectations. But the attempt has been valuable — understanding LLVM deeply, exploring what "resource-constrained" really means, and discovering that AI collaboration can work effectively on low-level systems programming.
The Z80 was designed for a world where programmers counted bytes. Rust was designed for a world where programmers trust the compiler to manage complexity. Making them meet is an exercise in translation across decades of computing evolution.
