
In Part 1, we designed the Sampo 16-bit RISC architecture from scratch. In Part 2, we brought it to life on an FPGA (sort of). Now, in Part 3, we tackle arguably the most ambitious goal of the project: making Rust compile for Sampo.

This isn't just about having a working assembler and emulator. It's about integrating a custom CPU architecture into one of the most sophisticated compiler infrastructures in existence—LLVM—and then building Rust's standard library for a 16-bit target that has never existed before.

The result? A complete toolchain where you can write:

#![no_std]
#![no_main]

extern "C" { fn putc(c: u8); }

#[no_mangle]
pub extern "C" fn _start() {
    unsafe {
        putc(b'H');
        putc(b'i');
        putc(b'!');
    }
    loop {}
}

And it compiles to native Sampo assembly that runs on our emulator:

Sampo Emulator - Loaded 310 bytes
Starting execution at 0x0100

Hi!

CPU halted at 0x0122

This article documents the journey—the architecture of an LLVM backend, the challenges of targeting a 16-bit architecture with modern compiler infrastructure, and how AI-assisted development with Claude Code made this ambitious project achievable.

Why LLVM?

Before diving into implementation details, it's worth asking: why LLVM at all? We already have a working assembler (sasm) written in Rust. Why not just write a simple C compiler that targets that assembler directly?

The answer is leverage. LLVM is the code-generation backbone for an entire ecosystem of languages, including:

  • C and C++ (via Clang)
  • Rust
  • Swift
  • Julia
  • Zig

By implementing a single LLVM backend, Sampo gains access to all of these languages. More importantly, we get decades of optimization research—constant folding, dead code elimination, loop unrolling, register allocation—for free. A hand-written C compiler would take years to reach the same quality.

The tradeoff is complexity. LLVM is a massive codebase (~30 million lines of C++) with a steep learning curve. But with modern AI-assisted development tools, that complexity becomes manageable.

Prior Art: LLVM on the Z80

This isn't our first attempt at bringing LLVM to unconventional hardware. Before Sampo, we tackled an even more constrained target: the Zilog Z80, an 8-bit processor from 1976.

The Z80 project was, in many ways, a proving ground. We learned:

  • GlobalISel is the right choice for new backends — The older SelectionDAG framework is battle-tested but harder to debug. GlobalISel's modular design made iterative development practical.
  • Type legalization is where 90% of the work lives — An 8-bit processor running code written with 64-bit assumptions requires extensive transformation rules.
  • AI-assisted development actually works for compilers — The Z80 backend was our first serious test of using Claude Code for systems programming. The collaboration model we developed there—human direction, AI implementation, iterative refinement—carried directly into Sampo.

The Z80 experience also revealed the limits of targeting truly minimal hardware. With only 64KB of address space, no hardware multiply, and registers measured in single bytes, many Rust abstractions simply couldn't fit. The full write-up documents both the successes and the fundamental constraints we hit.

Sampo, as a 16-bit architecture with hardware multiply/divide and a cleaner register file, sidesteps many of those limitations. The Z80 taught us how to build LLVM backends; Sampo let us build one that actually works well.

The Role of Claude Code

This project would not have been feasible without extensive use of Claude Code, Anthropic's AI-powered coding assistant. I want to be explicit about this: implementing an LLVM backend is traditionally a multi-month effort requiring deep expertise in compiler internals. With Claude Code, the core implementation was completed in intensive sessions over a few days.

Here's how Claude Code contributed:

1. Scaffolding the Backend Structure

LLVM backends follow a specific structure with dozens of interrelated files: SampoTargetMachine.cpp, SampoInstrInfo.td, SampoRegisterInfo.td, SampoCallingConv.td, and many more. Claude Code generated the initial scaffolding based on patterns from existing backends (RISC-V, MSP430, AVR), then systematically customized each file for Sampo's specific requirements.

2. Debugging Cryptic LLVM Errors

LLVM's error messages can be... opaque. Messages like "unable to legalize instruction: %1:_(s12) = G_TRUNC %0:_(s32)" or "SmallVector capacity overflow" don't immediately point to solutions. Claude Code could analyze stack traces, cross-reference them with LLVM's source code, and identify the root causes—often obscure interactions between type legalization rules.

3. Iterative Refinement

The development process was highly iterative. We'd attempt to compile a test case, hit an error, fix it, and discover the next issue. Claude Code maintained context across hundreds of these iterations, remembering what had been tried, what worked, and what the current state of each file was.

4. Understanding LLVM Internals

LLVM has two instruction selection frameworks: SelectionDAG (legacy) and GlobalISel (newer, recommended for new backends). Claude Code explained the tradeoffs, recommended GlobalISel for Sampo, and then implemented the required components: SampoLegalizerInfo, SampoRegisterBankInfo, and SampoInstructionSelector.

This isn't to diminish the human element—architectural decisions, design philosophy, and validation all required human judgment. But the mechanical work of writing hundreds of lines of boilerplate C++, TableGen definitions, and CMake configurations was dramatically accelerated.

LLVM Backend Architecture

An LLVM backend transforms LLVM Intermediate Representation (IR) into target-specific machine code. For Sampo, this involves several stages:

Rust Source Code
       ↓
   rustc frontend
       ↓
    LLVM IR
       ↓
  Instruction Selection (GlobalISel)
       ↓
  Register Allocation
       ↓
  Prologue/Epilogue Insertion
       ↓
  MC Layer (Machine Code)
       ↓
  Sampo Assembly (.s file)
       ↓
  sasm Assembler
       ↓
  Binary (.bin file)
       ↓
  semu Emulator

Let's examine the key components we implemented.

File Structure

A complete LLVM backend spans a few dozen files. Here's the structure for Sampo:

llvm/lib/Target/Sampo/
├── CMakeLists.txt
├── Sampo.h
├── Sampo.td                    # Top-level TableGen
├── SampoAsmPrinter.cpp         # Assembly generation
├── SampoCallingConv.td         # Calling convention
├── SampoFrameLowering.cpp      # Stack frame handling
├── SampoFrameLowering.h
├── SampoInstrFormats.td        # Instruction encoding
├── SampoInstrInfo.cpp          # Instruction utilities
├── SampoInstrInfo.h
├── SampoInstrInfo.td           # Instruction definitions
├── SampoISelLowering.cpp       # DAG lowering (minimal)
├── SampoISelLowering.h
├── SampoMCInstLower.cpp        # MachineInstr → MCInst
├── SampoMCInstLower.h
├── SampoRegisterInfo.cpp       # Register handling
├── SampoRegisterInfo.h
├── SampoRegisterInfo.td        # Register definitions
├── SampoSubtarget.cpp          # Target features
├── SampoSubtarget.h
├── SampoTargetMachine.cpp      # Entry point
├── SampoTargetMachine.h
├── GISel/
│   ├── SampoCallLowering.cpp   # GlobalISel calls
│   ├── SampoCallLowering.h
│   ├── SampoInstructionSelector.cpp
│   ├── SampoLegalizerInfo.cpp  # Type legalization
│   ├── SampoLegalizerInfo.h
│   ├── SampoRegisterBankInfo.cpp
│   └── SampoRegisterBankInfo.h
├── MCTargetDesc/
│   ├── SampoAsmBackend.cpp     # Object file generation
│   ├── SampoELFObjectWriter.cpp
│   ├── SampoInstPrinter.cpp    # Assembly printing
│   ├── SampoMCAsmInfo.cpp
│   ├── SampoMCCodeEmitter.cpp
│   └── SampoMCTargetDesc.cpp
└── TargetInfo/
    └── SampoTargetInfo.cpp     # Target registration

Each file has a specific role. The TableGen files (.td) are processed at build time to generate C++ code for instruction encoding, assembly printing, and more. The GISel/ directory contains GlobalISel-specific components—this is where most of the interesting logic lives.

Target Description (TableGen)

LLVM uses TableGen, a domain-specific language, to describe target architectures declaratively. For Sampo, we defined:

Registers (SampoRegisterInfo.td):

def R0  : SampoReg<0,  "R0">;   // Zero register
def R1  : SampoReg<1,  "R1">;   // Return address
def R2  : SampoReg<2,  "R2">;   // Stack pointer
// ... R3-R15

def GPR : RegisterClass<"Sampo", [i16], 16, (sequence "R%u", 0, 15)>;

Instructions (SampoInstrInfo.td):

def ADD : FormatR<0x0, 0x0, (outs GPR:$rd), (ins GPR:$rs1, GPR:$rs2),
                  "ADD\t$rd, $rs1, $rs2",
                  [(set GPR:$rd, (add GPR:$rs1, GPR:$rs2))]>;

def LIX : FormatXNoRs<0x8, (outs GPR:$rd), (ins imm16:$imm),
                      "LIX\t$rd, $imm",
                      [(set GPR:$rd, imm16:$imm)]>;

Calling Convention (SampoCallingConv.td):

def CC_Sampo : CallingConv<[
  // First 4 arguments in R4-R7
  CCIfType<[i16], CCAssignToReg<[R4, R5, R6, R7]>>,
  // Additional arguments on stack
  CCIfType<[i16], CCAssignToStack<2, 2>>
]>;

These declarative definitions generate thousands of lines of C++ code automatically.

GlobalISel: The Modern Instruction Selector

GlobalISel is LLVM's newer instruction selection framework, designed to be more modular and easier to target than the legacy SelectionDAG approach. It works in phases:

  1. IRTranslator: Converts LLVM IR to Generic Machine IR (GMIR)
  2. Legalizer: Transforms illegal operations into legal ones
  3. RegBankSelect: Assigns operands to register banks
  4. InstructionSelect: Maps GMIR to target instructions

For a 16-bit architecture like Sampo, the Legalizer is where most complexity lives. LLVM IR freely uses types like i32, i64, and even i128. Sampo's ALU only operates on 16-bit values. The legalizer must transform these:

// In SampoLegalizerInfo.cpp
getActionDefinitionsBuilder(G_ADD)
    .legalFor({s16})           // i16 add is native
    .clampScalar(0, s16, s16)  // Clamp to 16-bit
    .widenScalarToNextPow2(0); // Widen smaller types

getActionDefinitionsBuilder({G_SDIV, G_UDIV})
    .legalFor({s16})
    .libcallFor({s32, s64})    // Use libcalls for larger types
    .clampScalar(0, s16, s64);

This tells LLVM: "16-bit addition is a single instruction. 32-bit addition needs to be broken into multiple 16-bit operations. 64-bit division should call a library function."
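
To make those three rules concrete, here is how they surface at the Rust source level. This is an illustrative sketch of our own (the function names are ours); the libcall mentioned in the comment is the conventional compiler_builtins symbol and may differ in the actual build.

// Source-level view of the three legalization outcomes described above.

// s16 is legal: compiles to a single native ADD instruction.
pub fn add16(a: u16, b: u16) -> u16 {
    a.wrapping_add(b)
}

// s32 is clamped and narrowed: expanded into two 16-bit adds with carry propagation.
pub fn add32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

// s32/s64 division is a libcall: lowered to a call into compiler_builtins
// (something along the lines of __udivdi3) instead of inline code.
pub fn div64(a: u64, b: u64) -> u64 {
    a / b
}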

Debugging the Legalizer: A Case Study

One particularly memorable debugging session illustrates the challenges of LLVM development. When first attempting to compile Rust's libcore, the compiler crashed with:

Assertion failed: (idx < size()), function operator[], file SmallVector.h, line 301

This cryptic error—a SmallVector bounds overflow deep in LLVM's internals—gave no indication of what was wrong. The stack trace pointed to SampoInstPrinter::printOperand, which prints assembly operands.

Working with Claude Code, we traced the issue through multiple layers:

  1. The crash occurred when printing a JALR (indirect call) instruction
  2. JALR is defined in TableGen as JALR $rd, $rs1 (two operands)
  3. Our call lowering code was only providing one operand (the target register)
  4. The printer tried to access operand index 1, which didn't exist

The fix was a single line change—adding the return address destination register:

// Before (broken):
MIRBuilder.buildInstr(Sampo::JALR)
    .addReg(Info.Callee.getReg());

// After (fixed):
MIRBuilder.buildInstr(Sampo::JALR)
    .addDef(Sampo::R1)  // Return address destination
    .addReg(Info.Callee.getReg());

This pattern repeated throughout development: an opaque error, careful tracing through LLVM's layers, and ultimately a small fix. Without Claude Code's ability to quickly navigate LLVM's massive codebase and maintain context across debugging sessions, each of these issues could have taken days to resolve.

The 16-bit Challenge: Type Legalization

The most significant technical challenge was handling non-16-bit types. Consider what happens when Rust code uses a u32:

let x: u32 = 0x12345678;
let y: u32 = x + 1;

Sampo has no 32-bit registers. LLVM must:

  1. Split the 32-bit value across two 16-bit registers (R4:R5)
  2. Implement addition with carry propagation
  3. Track both halves through register allocation

The legalizer handles this through "narrowing" actions:

getActionDefinitionsBuilder(G_ADD)
    .legalFor({s16})
    .narrowScalarFor({{s32, s16}},  // Narrow s32 to s16 pairs
                     [](const LegalityQuery &Query) {
                       return std::make_pair(0, LLT::scalar(16));
                     });

We also encountered issues with unusual type sizes. LLVM's intermediate stages sometimes create types like s12 or s24 (12-bit and 24-bit integers). These aren't power-of-two sizes, which caused crashes in the type legalization framework:

LLVM ERROR: unable to legalize instruction: %1:_(s12) = G_TRUNC %0:_(s32)

The fix required careful specification of widening rules:

getActionDefinitionsBuilder(G_TRUNC)
    .widenScalarIf(
        [](const LegalityQuery &Query) {
          unsigned Size = Query.Types[1].getSizeInBits();
          return !llvm::isPowerOf2_32(Size);  // Non-power-of-2?
        },
        [](const LegalityQuery &Query) {
          unsigned Size = Query.Types[1].getSizeInBits();
          unsigned NewSize = llvm::PowerOf2Ceil(Size);
          return std::make_pair(1, LLT::scalar(NewSize));
        })
    .legalIf([](const LegalityQuery &Query) {
      return Query.Types[0].getSizeInBits() <=
             Query.Types[1].getSizeInBits();
    });

This tells LLVM: "If you see a non-power-of-2 type, round it up to the next power of 2 first, then proceed with normal legalization."
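
Numerically, the widening rule is just "round up to the next power of two." A throwaway Rust snippet of our own (not backend code) that prints the mapping it produces:

fn main() {
    // Mirrors llvm::PowerOf2Ceil in the C++ above: odd-sized types are
    // widened to the next power-of-two width before normal legalization.
    for bits in [12u32, 24, 48, 17] {
        println!("s{bits} -> s{}", bits.next_power_of_two());
    }
    // Output: s12 -> s16, s24 -> s32, s48 -> s64, s17 -> s32
}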

Multi-Word Arithmetic

When Rust code uses 32-bit or 64-bit integers, Sampo must synthesize these operations from 16-bit primitives. Consider a simple 32-bit addition:

let a: u32 = 0x12340000;
let b: u32 = 0x00005678;
let c = a + b;  // 0x12345678

This compiles to a sequence that:

  1. Adds the low 16-bit halves
  2. Adds the high 16-bit halves with carry propagation
  3. Manages results across register pairs

The generated assembly looks like:

; R4:R5 = first operand (low:high)
; R6:R7 = second operand (low:high)
ADD   R8, R4, R6      ; Add low halves
LIX   R9, 0           ; Prepare carry
; (carry detection logic)
ADD   R10, R5, R7     ; Add high halves
ADD   R10, R10, R9    ; Add carry
; Result in R8:R10

LLVM's legalizer generates this multi-instruction sequence automatically through "narrowing" rules. We didn't write this expansion manually—we just told LLVM that 32-bit operations should be narrowed to 16-bit pairs.
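
For intuition, here is the same expansion written by hand in Rust: a sketch of our own showing 32-bit addition built from 16-bit halves, not the code LLVM actually emits.

/// 32-bit addition synthesized from 16-bit operations, mirroring what the
/// narrowing rules generate for Sampo (illustrative only).
fn add_u32_via_u16(a: u32, b: u32) -> u32 {
    let (a_lo, a_hi) = (a as u16, (a >> 16) as u16);
    let (b_lo, b_hi) = (b as u16, (b >> 16) as u16);

    // Add the low halves; `carry` is the bit that must propagate upward.
    let (lo, carry) = a_lo.overflowing_add(b_lo);
    // Add the high halves, then fold in the carry.
    let hi = a_hi.wrapping_add(b_hi).wrapping_add(carry as u16);

    ((hi as u32) << 16) | lo as u32
}

fn main() {
    assert_eq!(add_u32_via_u16(0x1234_0000, 0x0000_5678), 0x1234_5678);
}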

Function Calling Convention

Getting function calls right was crucial. Sampo uses:

  • R4-R7: First four arguments (caller-saved)
  • R1: Return address
  • R2: Stack pointer
  • R8-R11: Temporaries (caller-saved)
  • R12-R15: Saved registers (callee-saved)

The SampoCallLowering.cpp file implements this:

bool SampoCallLowering::lowerCall(MachineIRBuilder &MIRBuilder,
                                   CallLoweringInfo &Info) const {
  // Copy arguments to their designated registers
  static const MCPhysReg ArgRegs[] = {Sampo::R4, Sampo::R5,
                                       Sampo::R6, Sampo::R7};

  for (unsigned i = 0; i < Info.OrigArgs.size(); i++) {
    if (i < 4) {
      MIRBuilder.buildCopy(Register(ArgRegs[i]), Info.OrigArgs[i].Regs[0]);
    } else {
      // Spill to stack
    }
  }

  // Build the call instruction
  if (Info.Callee.isReg()) {
    // Indirect call: JALR R1, Rs  (save return addr to R1, jump to Rs)
    MIRBuilder.buildInstr(Sampo::JALR)
        .addDef(Sampo::R1)
        .addReg(Info.Callee.getReg());
  } else {
    // Direct call: JALX symbol
    MIRBuilder.buildInstr(Sampo::JALX)
        .add(Info.Callee);
  }

  // Mark caller-saved registers as clobbered
  // ... implicit defs for R4-R11

  return true;
}

This was the same subtle bug walked through in the case study above: the JALR instruction (indirect call) expects two operands, the destination register for the return address (R1) and the source register containing the jump target. Our initial lowering supplied only one, so the assembly printer crashed when it tried to access the non-existent second operand, reporting nothing more helpful than "SmallVector capacity overflow".

The Assembly Printer Layer

The final stage of code generation converts LLVM's internal machine instructions to textual assembly. This involves two components:

MCInstLower converts MachineInstr (high-level) to MCInst (low-level):

void SampoMCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
  OutMI.setOpcode(MI->getOpcode());

  for (const MachineOperand &MO : MI->operands()) {
    MCOperand MCOp = LowerOperand(MO);
    if (MCOp.isValid())  // Skip implicit operands
      OutMI.addOperand(MCOp);
  }
}

InstPrinter converts MCInst to assembly text:

void SampoInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
                                    raw_ostream &O) {
  const MCOperand &Op = MI->getOperand(OpNo);
  if (Op.isReg())
    printRegName(O, Op.getReg());
  else if (Op.isImm())
    O << Op.getImm();
  else if (Op.isExpr())
    MAI.printExpr(O, *Op.getExpr());
}

TableGen generates most of the printer code automatically from instruction definitions. The pattern "ADD\t$rd, $rs1, $rs2" in the TableGen file directly produces the assembly format.

Building Rust's Standard Library

With the LLVM backend working, the next step was teaching Rust about Sampo. This required:

1. Adding the Target Triple

In Rust's rustc_target crate, we added sampo-unknown-none:

// compiler/rustc_target/src/spec/targets/sampo_unknown_none.rs
pub(crate) fn target() -> Target {
    Target {
        data_layout: "e-m:e-p:16:16-i8:8-i16:16-i32:16-n16-S16".into(),
        llvm_target: "sampo-unknown-none".into(),
        pointer_width: 16,
        arch: Arch::Sampo,
        options: TargetOptions {
            panic_strategy: PanicStrategy::Abort,
            atomic_cas: false,
            max_atomic_width: Some(0),
            c_int_width: 16,
            ..Default::default()
        },
    }
}

The data_layout string is critical—it tells LLVM that pointers are 16 bits, alignment requirements, and native integer sizes. Getting this wrong causes subtle miscompilations.
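
One way to see the data layout reflected on the Rust side is a handful of compile-time assertions. This is a sketch we would build with --target sampo-unknown-none, assuming the data_layout string above; the checks fail to compile if the layout and the target disagree.

#![no_std]
// Size/alignment sanity checks for the 16-bit target, matching the
// data_layout string: 16-bit pointers, and i32 aligned to only 16 bits.
#[cfg(target_pointer_width = "16")]
mod layout_checks {
    use core::mem::{align_of, size_of};

    const _: () = assert!(size_of::<usize>() == 2);       // p:16:16
    const _: () = assert!(size_of::<*const u8>() == 2);   // pointers are one word
    const _: () = assert!(align_of::<u16>() == 2);        // i16:16
    const _: () = assert!(align_of::<u32>() == 2);        // i32:16 (aligned below its size)
}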

2. Registering the Target in Rust

Rust's build system needs to know about new targets in multiple places:

// compiler/rustc_target/src/spec/mod.rs
supported_targets! {
    // ... existing targets ...
    ("sampo-unknown-none", sampo_unknown_none),
}

// compiler/rustc_span/src/symbol.rs
Symbols {
    // ... existing symbols ...
    sampo,
}

The Arch enum in rustc_target also needed a new variant. These changes propagate through Rust's bootstrap system, eventually producing a compiler that recognizes --target sampo-unknown-none.

3. Building Core Libraries

Rust's #![no_std] programs still need libcore (the dependency-free foundation) and compiler_builtins (intrinsics for operations the hardware doesn't support natively). Building these required:

# Point Rust at our custom LLVM
export LLVM_CONFIG=/path/to/llvm-sampo/build/bin/llvm-config

# Build stage 1 compiler
./x.py build --stage 1

# Build libraries for Sampo
./x.py build --stage 1 library --target sampo-unknown-none

This compiles approximately 50,000 lines of Rust into Sampo assembly—a significant stress test of the backend. The resulting libraries:

  • libcore: 1.1 MB (Rust's core library)
  • liballoc: 211 KB (heap allocation)
  • libcompiler_builtins: 2.3 MB (soft-float, 64-bit arithmetic, etc.)

4. Handling Missing Features

A 16-bit CPU without atomic operations or floating-point hardware needs careful configuration:

  • atomic_cas: false — No compare-and-swap
  • max_atomic_width: Some(0) — No atomic operations at all
  • panic_strategy: PanicStrategy::Abort — No unwinding

Rust's type system handles these gracefully. Code that requires atomics simply won't compile for Sampo, with clear error messages.
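
In practice this pushes portable code toward cfg-gating. Here's a sketch of the usual pattern, falling back to a plain Cell on targets like Sampo that report no atomic support (a real single-core port would also have to think about interrupt handlers):

#![no_std]
// With max_atomic_width = Some(0), core's atomic types simply don't exist for
// sampo-unknown-none, so shared code selects an implementation per target.

#[cfg(target_has_atomic = "16")]
pub fn bump(counter: &core::sync::atomic::AtomicU16) -> u16 {
    use core::sync::atomic::Ordering;
    counter.fetch_add(1, Ordering::Relaxed)
}

#[cfg(not(target_has_atomic = "16"))]
pub fn bump(counter: &core::cell::Cell<u16>) -> u16 {
    // No atomics on this target: a Cell is fine on a single core as long as
    // the value isn't also touched from an interrupt handler.
    let previous = counter.get();
    counter.set(previous.wrapping_add(1));
    previous
}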

The Complete Pipeline

Let's trace through what happens when compiling our "Hi!" program:

Stage 1: Rust to LLVM IR

putc(b'H');

Becomes:

call void @putc(i8 zeroext 72)

Stage 2: LLVM IR to Generic Machine IR

%0:gpr = G_CONSTANT i16 72
$r4 = COPY %0
JALX @putc, implicit $r4, implicit-def $r1, ...

Stage 3: Instruction Selection

%0:gpr = LIX 72
$r4 = COPY %0
JALX @putc, ...

Stage 4: Register Allocation

$r4 = LIX 72
JALX @putc

Stage 5: Assembly Output

LIX  R4, 72
JALX putc

Stage 6: Binary

Our sasm assembler produces the final binary, which runs on semu.
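
End to end, the whole flow is three tool invocations. Here is a small Rust driver sketch that chains them; the tool paths and the sasm/semu command-line arguments are our assumptions, not documented interfaces.

// Sketch of a build-and-run driver for the pipeline above. Assumes a stage-1
// rustc with the Sampo target on PATH, and that `sasm`/`semu` take simple
// positional and `-o` arguments (their real CLIs may differ).
use std::process::Command;

fn run(cmd: &mut Command) {
    let status = cmd.status().expect("failed to spawn tool");
    assert!(status.success(), "{cmd:?} exited with an error");
}

fn main() {
    // Rust -> Sampo assembly
    run(Command::new("rustc")
        .args(["--target", "sampo-unknown-none", "--emit=asm", "-O"])
        .args(["hi.rs", "-o", "hi.s"]));

    // Assembly -> binary
    run(Command::new("sasm").args(["hi.s", "-o", "hi.bin"]));

    // Binary -> execution on the emulator
    run(Command::new("semu").arg("hi.bin"));
}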

The Development Process: Iterating with AI

Traditional compiler development follows a deliberate pace: study the codebase for weeks, implement a small feature, spend days debugging, repeat. With Claude Code, this cycle compressed dramatically.

A typical session looked like:

  1. Describe the goal: "I need to implement call lowering for indirect function calls"
  2. Receive implementation: Claude Code generates SampoCallLowering.cpp with appropriate patterns
  3. Test: Compile a test case, observe failure
  4. Debug together: Share the error, get analysis and fixes
  5. Iterate: Sometimes 10-20 cycles for a single feature

The key insight is that Claude Code doesn't just generate code—it explains why that code is correct (or incorrect). When the call lowering crashed, Claude Code walked through:

  • How MachineInstrs represent instructions
  • The difference between explicit and implicit operands
  • Why the TableGen definition expected two operands
  • What the MCInstLower layer does with each operand type

This contextual understanding accelerates learning far beyond copy-paste coding.

Code Quality Considerations

AI-generated code requires the same scrutiny as human-written code. During this project, we found:

Things Claude Code did well:

  • Boilerplate that follows established patterns
  • TableGen definitions (highly formulaic)
  • Explaining LLVM concepts and architecture
  • Debugging from error messages and stack traces

Things requiring human judgment:

  • Architectural decisions (GlobalISel vs SelectionDAG)
  • Performance tradeoffs in instruction selection
  • Edge cases in type legalization
  • Testing strategy and coverage

The final codebase reflects this collaboration—Claude Code generated perhaps 80% of the initial code, but human review and iteration refined it into something production-quality.

Lessons Learned

1. Start with GlobalISel

For new backends, GlobalISel is significantly easier to work with than SelectionDAG. The modular design means you can implement and test each phase independently.

2. Type Legalization is the Hard Part

For non-standard word sizes (16-bit, 8-bit), most complexity lives in the legalizer. Plan to spend 60%+ of your effort here.

3. Test Early and Often

We maintained a suite of LLVM IR test files that exercised specific features:

; test_call.ll - Function calling
declare void @putc(i8)

define void @_start() {
  call void @putc(i8 72)  ; 'H'
  ret void
}

Each bug fix was validated against this suite before proceeding.
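
The suite itself needs little more than a loop over llc. Here is a sketch of that kind of runner; the llc path, the flags, and the tests/ directory layout are our assumptions rather than project-documented interfaces.

// Minimal regression runner: feed every .ll test through llc with the Sampo
// triple and GlobalISel, and fail loudly on the first error.
use std::{fs, process::Command};

fn main() {
    for entry in fs::read_dir("tests").expect("no tests/ directory") {
        let path = entry.expect("unreadable directory entry").path();
        if path.extension().and_then(|e| e.to_str()) != Some("ll") {
            continue;
        }
        let output = Command::new("../llvm-sampo/build/bin/llc") // hypothetical path
            .args(["-mtriple=sampo-unknown-none", "-global-isel", "-o", "-"])
            .arg(&path)
            .output()
            .expect("failed to run llc");
        assert!(output.status.success(), "llc failed on {}", path.display());
        println!("ok: {}", path.display());
    }
}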

4. AI-Assisted Development Changes Everything

Traditional LLVM backend development requires months of ramp-up time just to understand the codebase. Claude Code's ability to explain concepts, generate boilerplate, and debug issues compressed this dramatically. The key is knowing what questions to ask and validating the outputs.

5. LLVM's Abstractions Are Worth It

Despite the complexity, LLVM's abstractions pay dividends. Register allocation, instruction scheduling, and numerous optimizations come for free. A hand-written code generator would take years to match this quality.

What's Next

With Rust compiling for Sampo, several exciting possibilities open up:

Operating System Development: Sampo now has enough tooling to write a simple operating system. A minimal kernel with task switching, memory management, and device drivers becomes feasible. Rust's ownership model could make this a particularly safe OS, even on a minimal 16-bit platform.

Language Ports: Since we implemented an LLVM backend (not just Rust support), Clang should work with minimal additional effort. C and C++ for Sampo would enable porting existing retrocomputing software—imagine CP/M utilities or classic games recompiled for modern Sampo hardware.

Hardware Verification: Running Rust-generated code on the FPGA implementation will provide end-to-end validation of both the hardware and software toolchains. Any discrepancy between the emulator and hardware would become immediately visible.

Educational Materials: A complete, working compiler toolchain for a simple CPU is valuable for teaching. Students can trace code from high-level Rust through every compilation stage to final execution. The relative simplicity of a 16-bit architecture makes the concepts accessible.

Performance Optimization: The current backend generates correct code, but there's room for improvement. Instruction scheduling, better register allocation hints, and peephole optimizations could improve code density and speed.

Conclusion

Building an LLVM backend for a custom CPU is one of those projects that sounds impossible until you're in the middle of it, then sounds impossible again when you hit your third cryptic linker error at 2 AM. But it's achievable—especially with modern AI-assisted development tools.

The Sampo project now spans:

  • Architecture design: A clean 16-bit RISC with Z80-inspired features
  • Hardware implementation: Verilog RTL running on an ECP5 FPGA (need to order hardware first!)
  • Assembler and emulator: Written in Rust, fully functional
  • LLVM backend: Complete GlobalISel-based code generator
  • Rust support: libcore, liballoc, and compiler_builtins for sampo-unknown-none

From Finnish mythology, the Sampo was a magical mill that produced endless riches. Our Sampo is more modest—it just produces machine code. But there's something magical about typing cargo build --target sampo-unknown-none and watching a high-level language compile down to instructions for a CPU that didn't exist a few months ago.

The complete source code is available on GitHub:

  • llvm-sampo - The LLVM backend and Rust target specification
  • sampo - CPU architecture, assembler, emulator, and FPGA RTL

Whether you're interested in compiler development, CPU design, or just want to see how deep the rabbit hole goes, I hope this series has been illuminating.

Acknowledgments

This project wouldn't have been possible without the LLVM community's extensive documentation and the examples provided by existing backends. The MSP430, AVR, and RISC-V backends were particularly useful references for handling small word sizes.

Claude Code, developed by Anthropic, was instrumental in navigating LLVM's complexity. While AI-assisted development is sometimes viewed skeptically, this project demonstrates its potential for tackling genuinely difficult engineering challenges. The key is treating AI as a collaborator rather than a replacement—it accelerates the mechanical aspects while humans provide direction and judgment.

This is Part 3 of the Sampo series. Part 1 covers the architecture design, and Part 2 covers the FPGA implementation.
