In Finnish mythology, the Sampo is a magical artifact from the epic poem Kalevala, compiled by Elias Lönnrot in 1835. According to legend, the Sampo was forged by Ilmarinen, a legendary blacksmith and sky god, from a swan's feather, a grain of barley, a ball of wool, a drop of milk, and a shaft of a distaff. The resulting creation took the form of a magical mill that could produce flour, salt, and gold endlessly—bringing riches and good fortune to its holder.

The exact nature of the Sampo has been debated by scholars since 1818, with over 30 theories proposed—ranging from a world pillar to an astrolabe to a decorated shield. This mystery makes it a fitting namesake for a CPU architecture: something that transforms simple inputs into useful outputs, whose inner workings invite exploration and understanding.

This is the first part of a two-part series exploring the Sampo CPU architecture. In this article, we'll dive deep into the theory, design philosophy, and architectural decisions that shaped Sampo. In Part 2, we'll get our hands dirty with an actual FPGA implementation using Amaranth HDL, bringing this processor to life on real silicon.

The Problem Space: Why Another CPU?

Before diving into Sampo's architecture, it's worth asking: why design a new CPU at all? The retrocomputing community has no shortage of classic processors to explore—the Z80, 6502, 68000—and modern RISC architectures like RISC-V offer clean, well-documented designs for educational purposes.

The answer lies in a specific niche that existing architectures don't quite fill. Consider the typical workloads of classic 8-bit systems: interpreters for languages like BASIC and Forth, operating systems like CP/M, text editors, and simple games. These workloads have distinct characteristics:

  1. Heavy use of memory operations: Block copies, string manipulation, memory fills
  2. Port-based I/O: Serial terminals, disk controllers, sound chips accessed via dedicated I/O instructions
  3. Context switching: Interrupt handlers that need to save and restore register state quickly
  4. BCD arithmetic: Calculator applications, financial software

The Z80 excels at these tasks through specialized instructions (LDIR, LDDR, IN, OUT) and its alternate register set. But the Z80 is an 8-bit CISC processor with irregular encoding, complex addressing modes, and over 300 instruction variants. This makes it challenging to implement efficiently in modern hardware or to target with optimizing compilers.

Modern RISC architectures like RISC-V take the opposite approach: clean, orthogonal instruction sets optimized for pipelining and compiler code generation. But they typically use memory-mapped I/O (no dedicated I/O instructions), lack block operations, and provide no alternate register sets for fast context switching.

Sampo occupies the middle ground—a "Z80 programmer's RISC" that combines the regularity and simplicity of RISC design with the specialized capabilities that made the Z80 so effective for its target workloads.

Design Goals

Sampo was designed with five primary goals:

  1. RISC-inspired instruction set: Clean, orthogonal design with predictable encoding
  2. 16-bit native word size: Registers, ALU, and memory addressing all 16-bit
  3. Efficient for interpreters and compilers: Stack operations, indirect addressing, hardware multiply/divide
  4. Simple to implement: Suitable for FPGA synthesis or software emulation
  5. Z80-workload compatible: Port-based I/O, BCD support, block operations, alternate registers

These goals create natural tensions. RISC purity would eliminate block operations and port-based I/O. Maximum Z80 compatibility would preserve its irregular encoding. Sampo resolves these tensions by borrowing selectively from multiple architectural traditions.

Architectural Lineage

Sampo's design draws from four distinct sources, each contributing specific elements:

From RISC-V

RISC-V's influence is most visible in Sampo's register conventions:

  • Zero register (R0): A register that always reads as zero and ignores writes. This eliminates the need for separate "clear" or "load zero" instructions—ADD R4, R0, R0 clears R4, ADD R4, R5, R0 copies R5 to R4.
  • Register naming conventions: Return address (RA), stack pointer (SP), global pointer (GP), argument registers (A0-A3), temporaries (T0-T3), and saved registers (S0-S3).
  • Load/store architecture: Only load and store instructions access memory; all computation occurs between registers.

From MIPS

MIPS contributed Sampo's approach to instruction encoding:

  • Simple, orthogonal formats: A small number of instruction formats (R, I, S, B, J, plus the extended X format) with consistent field positions
  • 4-bit primary opcode: Sixteen instruction categories, each with function codes for variants
  • PC-relative branching: Branch targets specified as signed offsets from the program counter

From ARM Thumb/Thumb-2

ARM's Thumb instruction set inspired Sampo's hybrid encoding strategy:

  • 16-bit base instruction width: Most common operations fit in 16 bits for improved code density
  • 32-bit extended forms: Operations requiring larger immediates use a two-word format
  • Prefix-based extension: The 0xF opcode prefix indicates a 32-bit instruction, simplifying decode

From the Z80

The Z80 provides Sampo's "personality"—the features that make it feel familiar to retrocomputing enthusiasts:

  • Port-based I/O: IN and OUT instructions with 8-bit port addresses, separate from the memory address space
  • Alternate register set: The EXX instruction swaps working registers with shadow copies for fast interrupt handling
  • Block operations: LDIR, LDDR, and CPIR, plus a FILL instruction in the same style (the Z80 itself has no FILL), for efficient memory manipulation
  • BCD support: The DAA (Decimal Adjust Accumulator) instruction for binary-coded decimal arithmetic
  • 64KB address space: 16-bit addresses, matching the Z80's memory model

The Register File

Sampo provides 16 general-purpose 16-bit registers, organized with RISC-V-style conventions:

Register   Alias    Convention
R0         ZERO     Always reads as 0, writes ignored
R1         RA       Return address (saved by caller)
R2         SP       Stack pointer
R3         GP       Global pointer (optional)
R4-R7      A0-A3    Arguments / Return values
R8-R11     T0-T3    Temporaries (caller-saved)
R12-R15    S0-S3    Saved registers (callee-saved)

The zero register deserves special attention. Having a register that always contains zero eliminates entire classes of instructions found in other architectures:

  • MOV Rd, Rs becomes ADD Rd, Rs, R0
  • CLR Rd becomes ADD Rd, R0, R0
  • NEG Rd, Rs can use R0 as the implicit minuend
  • CMP Rs, #0 becomes SUB R0, Rs, R0 (result discarded, flags set)

This technique, popularized by MIPS and carried forward by RISC-V, simplifies the instruction set without giving up expressiveness.
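
To make the substitutions above concrete, here is a minimal Python sketch of a register file with a hard-wired zero register. The class and helper names (RegisterFile, add, sub) are illustrative assumptions, not taken from the Sampo emulator, and flag updates are left out for brevity.

```python
class RegisterFile:
    """Sixteen 16-bit registers; R0 is hard-wired to zero."""

    def __init__(self):
        self._regs = [0] * 16

    def read(self, index):
        # R0 always reads as zero, whatever was written to it.
        return 0 if index == 0 else self._regs[index]

    def write(self, index, value):
        if index != 0:  # writes to R0 are silently dropped
            self._regs[index] = value & 0xFFFF


def add(rf, rd, rs1, rs2):
    """ADD Rd, Rs1, Rs2 (flags omitted in this sketch)."""
    rf.write(rd, rf.read(rs1) + rf.read(rs2))


def sub(rf, rd, rs1, rs2):
    """SUB Rd, Rs1, Rs2 (flags omitted in this sketch)."""
    rf.write(rd, rf.read(rs1) - rf.read(rs2))


rf = RegisterFile()
rf.write(5, 0x1234)

add(rf, 4, 5, 0)          # MOV R4, R5  ==  ADD R4, R5, R0
mov_result = rf.read(4)

add(rf, 4, 0, 0)          # CLR R4      ==  ADD R4, R0, R0
clr_result = rf.read(4)

sub(rf, 6, 0, 5)          # NEG R6, R5  ==  SUB R6, R0, R5
neg_result = rf.read(6)

rf.write(0, 0xBEEF)       # ignored
zero_result = rf.read(0)
```

The only special case lives in read and write; the ALU operations themselves need no knowledge of R0.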

Alternate Registers

On the Z80, EXX swaps the BC, DE, and HL register pairs with their shadow copies (AF is exchanged separately by EX AF,AF'). Sampo's EXX is similarly selective: only registers R4-R11 (the arguments and temporaries) have shadow copies. The critical system registers, R0 (zero), R1 (return address), R2 (stack pointer), R3 (global pointer), and R12-R15 (saved registers), are never swapped.

This design decision serves interrupt handling. When an interrupt occurs, the handler can execute EXX to gain a fresh set of working registers without corrupting the interrupted code's arguments or temporaries. The stack pointer remains valid (no need to establish a new stack), and the return address register can be used to save the interrupted PC.

irq_handler:
    EXX                     ; Swap to alternate R4-R11
    ; ... handle interrupt using R4'-R11' ...
    ; Primary registers preserved automatically
    EXX                     ; Swap back
    RETI                    ; Return from interrupt
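
The effect of EXX on the register file can be modeled in a few lines of Python. This is a sketch under the assumption that EXX is a plain exchange of R4-R11 with eight shadow registers; the BankedRegisters class is a hypothetical name used only for illustration.

```python
SWAPPED = range(4, 12)  # R4-R11 are the only registers with shadow copies


class BankedRegisters:
    def __init__(self):
        self.regs = [0] * 16    # R0-R15 as seen by the running code
        self.shadow = [0] * 8   # R4'-R11'

    def exx(self):
        """Exchange R4-R11 with their shadow copies; everything else stays put."""
        for i, r in enumerate(SWAPPED):
            self.regs[r], self.shadow[i] = self.shadow[i], self.regs[r]


cpu = BankedRegisters()
cpu.regs[2] = 0xFE00    # SP
cpu.regs[4] = 0x1111    # A0 of the interrupted code
cpu.regs[12] = 0x2222   # S0

cpu.exx()               # handler entry
cpu.regs[4] = 0xAAAA    # handler uses its own R4
in_handler = (cpu.regs[2], cpu.regs[12])
cpu.exx()               # handler exit
after = (cpu.regs[2], cpu.regs[4], cpu.regs[12])
```

Because the operation is an exchange, a second EXX restores the interrupted code's view exactly, while SP and the saved registers are visible and unchanged throughout.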

The Flags Register

Sampo uses an 8-bit flags register with six defined flags:

Bit   Flag   Name         Description
7     N      Negative     Sign bit of result (bit 15)
6     Z      Zero         Result is zero
5     C      Carry        Unsigned overflow / borrow
4     V      Overflow     Signed overflow
3     H      Half-carry   Carry from bit 3 to 4 (for BCD)
2     I      Interrupt    Interrupt enable

The N, Z, C, and V flags follow standard conventions and support the full range of conditional branches. The H (half-carry) flag exists specifically for the DAA instruction, enabling correct BCD arithmetic. The I flag controls interrupt recognition.

Notably, Sampo provides explicit GETF and SETF instructions to read and write the flags register, rather than leaving the flags as implicit state that software cannot reach directly. This supports context switching and debugging.
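
As a rough illustration of how the flag bits fit together, the following Python sketch computes the flags byte for a 16-bit addition using the bit positions from the table. It assumes that ADD updates N, Z, C, V, and H together; the article does not spell out which instructions touch which flags, so treat that as an assumption. The I flag is left at zero here.

```python
FLAG_N = 0x80  # bit 7
FLAG_Z = 0x40  # bit 6
FLAG_C = 0x20  # bit 5
FLAG_V = 0x10  # bit 4
FLAG_H = 0x08  # bit 3
FLAG_I = 0x04  # bit 2 (not modified by arithmetic in this sketch)


def add16_flags(a, b):
    """Return (result, flags) for a 16-bit add of a and b."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    flags = 0
    if result & 0x8000:
        flags |= FLAG_N
    if result == 0:
        flags |= FLAG_Z
    if total > 0xFFFF:
        flags |= FLAG_C
    # Signed overflow: both operands share a sign that the result lacks.
    if ~(a ^ b) & (a ^ result) & 0x8000:
        flags |= FLAG_V
    # Half-carry: carry out of bit 3 into bit 4.
    if (a & 0xF) + (b & 0xF) > 0xF:
        flags |= FLAG_H
    return result, flags
```

For example, 0x7FFF + 1 wraps into the negative range, so N, V, and H come out set while C stays clear.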

Memory Model

Sampo uses a straightforward memory model:

  • Address space: 64KB (16-bit addresses)
  • Byte-addressable: Individual bytes can be loaded and stored
  • Little-endian: Multi-byte values stored with LSB at lower address
  • Word alignment: 16-bit words should be aligned on even addresses (optional enforcement)

A suggested memory map divides the 64KB space:

0x0000-0x00FF   Interrupt vectors / Reset
0x0100-0x7FFF   Program ROM (~32KB)
0x8000-0xFEFF   RAM (~32KB)
0xFF00-0xFFFF   Memory-mapped I/O (256 bytes)

This layout provides a clean separation between code, data, and I/O while leaving room for customization. As on the Z80, execution begins at 0x0000 after reset; Sampo places its interrupt vector at 0x0004, inside the same vector area at the bottom of memory.
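
The suggested map and the little-endian rule can be expressed as a short Python sketch. The region names and helper functions are illustrative assumptions; only the address ranges and byte order come from the description above.

```python
REGIONS = [
    (0x0000, 0x00FF, "vectors"),
    (0x0100, 0x7FFF, "rom"),
    (0x8000, 0xFEFF, "ram"),
    (0xFF00, 0xFFFF, "mmio"),
]


def region_of(addr):
    """Name the region of the suggested memory map that contains addr."""
    addr &= 0xFFFF
    for lo, hi, name in REGIONS:
        if lo <= addr <= hi:
            return name
    raise ValueError(addr)  # unreachable: the regions tile the 64KB space


def write_word(mem, addr, value):
    """Little-endian 16-bit store: low byte at addr, high byte at addr + 1."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


def read_word(mem, addr):
    """Little-endian 16-bit load."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


mem = bytearray(0x10000)
write_word(mem, 0x8000, 0x1234)
```

After the store, the byte at 0x8000 holds 0x34 and the byte at 0x8001 holds 0x12.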

Port-Based I/O

In addition to memory, Sampo provides a separate 256-port I/O address space accessed via IN and OUT instructions. This design directly mirrors the Z80 and enables straightforward porting of code that interacts with serial ports, disk controllers, sound chips, and other peripherals.

The I/O instructions come in two forms:

INI  R4, 0x80       ; Read from port 0x80 (immediate port number)
IN   R4, (R5)       ; Read from port specified in R5
OUTI 0x81, R4       ; Write R4 to port 0x81 (immediate)
OUT  (R5), R4       ; Write R4 to port specified in R5

Extended 32-bit forms (INX, OUTX) allow the full 8-bit port range to be specified in immediate form.
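
One way to picture the separate port space is as a small table of device handlers that is entirely independent of the memory array. The sketch below is a hypothetical model: the PortBus class, the 8-bit data width, and the 0xFF value returned for unmapped ports are all assumptions rather than documented Sampo behavior. The port numbers 0x80 and 0x81 match the serial example used later in this article.

```python
class PortBus:
    """256 I/O ports, addressed independently of the 64KB memory space."""

    def __init__(self):
        self.readers = {}   # port number -> callable returning a byte
        self.writers = {}   # port number -> callable accepting a byte

    def port_in(self, port):
        handler = self.readers.get(port & 0xFF)
        # Assumption: an unmapped port reads back as 0xFF.
        return (handler() & 0xFF) if handler else 0xFF

    def port_out(self, port, value):
        handler = self.writers.get(port & 0xFF)
        if handler:
            handler(value & 0xFF)


sent = []
bus = PortBus()
bus.readers[0x80] = lambda: 0x02   # status port: transmitter always ready
bus.writers[0x81] = sent.append    # data port: collect transmitted bytes

for ch in b"Hi":
    while not (bus.port_in(0x80) & 0x02):   # poll TX-ready, as INI + ANDI would
        pass
    bus.port_out(0x81, ch)                  # OUTI 0x81, Rn
```

Port 0x80 here has nothing to do with memory address 0x0080, which is the point of keeping the two spaces apart.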

Instruction Encoding

Sampo uses a clean, regular encoding scheme with 16-bit base instructions and 32-bit extended forms. The 4-bit primary opcode in bits 15:12 determines the instruction category:

Opcode   Category   Description
0x0      ADD        Register addition
0x1      SUB        Register subtraction
0x2      AND        Bitwise AND
0x3      OR         Bitwise OR
0x4      XOR        Bitwise XOR
0x5      ADDI       Add immediate
0x6      LOAD       Load from memory
0x7      STORE      Store to memory
0x8      BRANCH     Conditional branch
0x9      JUMP       Unconditional jump/call
0xA      SHIFT      Shift and rotate
0xB      MULDIV     Multiply/divide/BCD
0xC      MISC       Stack, block ops, compare
0xD      I/O        Port input/output
0xE      SYSTEM     NOP, HALT, interrupts
0xF      EXTENDED   32-bit instructions

Instruction Formats

Six formats cover all instruction types:

Format R (Register-Register):

15       12 11     8 7      4 3      0
+----------+--------+--------+--------+
|  opcode  |   Rd   |  Rs1   |  Rs2   |
+----------+--------+--------+--------+

Used for three-register operations like ADD R4, R5, R6.

Format I (Immediate):

15       12 11     8 7                0
+----------+--------+------------------+
|  opcode  |   Rd   |      imm8        |
+----------+--------+------------------+

Used for operations with 8-bit immediates like ADDI R4, 42.

Format S (Store):

15       12 11     8 7      4 3      0
+----------+--------+--------+--------+
|  opcode  |  imm4  |  Rs1   |  Rs2   |
+----------+--------+--------+--------+

Used for stores where the destination register field holds an offset.

Format B (Branch):

15       12 11     8 7                0
+----------+--------+------------------+
|  opcode  |  cond  |     offset8      |
+----------+--------+------------------+

Used for conditional branches with PC-relative offsets.

Format J (Jump):

15       12 11                       0
+----------+--------------------------+
|  opcode  |        offset12          |
+----------+--------------------------+

Used for unconditional jumps with 12-bit PC-relative offsets.

Format X (Extended):

Word 0:
15       12 11     8 7      4 3      0
+----------+--------+--------+--------+
|   0xF    |   Rd   |  Rs1   |  sub   |
+----------+--------+--------+--------+

Word 1:
15                                   0
+-------------------------------------+
|              imm16                  |
+-------------------------------------+

Used for operations requiring 16-bit immediates or absolute addresses.
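
Because every format keeps its fields on nibble boundaries, a first-stage decoder can slice all candidate fields unconditionally and let the opcode decide which ones matter. The following Python sketch shows that idea; the function names and dictionary keys are illustrative assumptions, and the real decoder in the emulator may be organized differently.

```python
def decode(word0, word1=None):
    """Split an instruction into raw fields; the opcode selects which apply."""
    opcode = (word0 >> 12) & 0xF
    fields = {
        "opcode": opcode,
        "rd": (word0 >> 8) & 0xF,     # Rd, imm4, or cond, depending on format
        "rs1": (word0 >> 4) & 0xF,
        "rs2": word0 & 0xF,           # Rs2, or sub-opcode in Format X
        "imm8": word0 & 0xFF,         # Formats I and B
        "offset12": word0 & 0xFFF,    # Format J
        "length": 2,                  # bytes
    }
    if opcode == 0xF:                 # extended prefix: a second word follows
        fields["length"] = 4
        fields["imm16"] = word1
    return fields


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign
```

Checking only the top nibble for 0xF is enough to know whether a second word must be fetched, which is what keeps the fetch logic simple.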

Encoding Examples

To illustrate the encoding scheme, let's examine several instructions:

ADD R4, R5, R6 (R4 = R5 + R6):

Opcode = 0x0, Rd = 4, Rs1 = 5, Rs2 = 6
Binary: 0000 0100 0101 0110 = 0x0456

ADDI R4, 10 (R4 = R4 + 10):

Opcode = 0x5, Rd = 4, imm8 = 10
Binary: 0101 0100 0000 1010 = 0x540A

BEQ +8 (branch forward 8 bytes if equal):

Opcode = 0x8, cond = 0 (BEQ), offset = 4 words
Binary: 1000 0000 0000 0100 = 0x8004

LIX R4, 0x1234 (load 16-bit immediate):

Word 0: 0xF (extended), Rd = 4, Rs = 0, sub = 7 (LIX)
Word 1: 0x1234
Binary: 1111 0100 0000 0111 0001 0010 0011 0100 = 0xF407 0x1234

The regularity of this encoding makes instruction decode straightforward—the first nibble determines the instruction category, and subsequent fields are in consistent positions across formats.
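
The worked examples above can be reproduced with a handful of bit-packing helpers. This Python sketch uses hypothetical function names (enc_r, enc_i, enc_b, enc_x), and it assumes, as the BEQ example suggests, that the branch offset field counts 16-bit words rather than bytes.

```python
def enc_r(opcode, rd, rs1, rs2):
    """Format R: opcode | Rd | Rs1 | Rs2."""
    return (opcode << 12) | (rd << 8) | (rs1 << 4) | rs2


def enc_i(opcode, rd, imm8):
    """Format I: opcode | Rd | imm8."""
    return (opcode << 12) | (rd << 8) | (imm8 & 0xFF)


def enc_b(cond, byte_offset):
    """Format B: opcode 0x8 | cond | offset8, with the offset stored in words."""
    return (0x8 << 12) | (cond << 8) | ((byte_offset // 2) & 0xFF)


def enc_x(rd, rs1, sub, imm16):
    """Format X: a 0xF-prefixed word followed by a 16-bit immediate word."""
    return enc_r(0xF, rd, rs1, sub), imm16 & 0xFFFF


add_word = enc_r(0x0, 4, 5, 6)        # ADD  R4, R5, R6
addi_word = enc_i(0x5, 4, 10)         # ADDI R4, 10
beq_word = enc_b(0, 8)                # BEQ  +8
lix_words = enc_x(4, 0, 7, 0x1234)    # LIX  R4, 0x1234
```

Each helper is a direct transcription of one format diagram, so adding a new instruction to an assembler is mostly a matter of picking the right helper and opcode.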

The Instruction Set

Sampo provides approximately 66 distinct instructions, organized into ten categories.

Arithmetic (15 instructions)

The arithmetic category includes standard operations (ADD, SUB, ADDI) plus multiply/divide support:

  • MUL: 16×16 multiplication, low 16 bits of result
  • MULH/MULHU: High 16 bits of 32-bit product (signed/unsigned)
  • DIV/DIVU: Integer division (signed/unsigned)
  • REM/REMU: Remainder (signed/unsigned)
  • DAA: Decimal adjust for BCD arithmetic
  • NEG: Two's complement negation
  • CMP: Compare (subtract without storing result)

Hardware multiply and divide matter for interpreter performance: formatting a number for output means dividing by 10 once per digit, which is slow when every division has to run as a software loop.
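
The split between MUL and MULH/MULHU, and the way DIVU/REMU serve number formatting, can be sketched in Python as follows. The function names mirror the mnemonics, but the exact Sampo semantics (for instance division by zero) are not covered by this article, so this is a model under assumptions rather than a specification.

```python
def to_signed(x):
    """Reinterpret a 16-bit pattern as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mul(a, b):
    """MUL: low 16 bits of the product (same for signed and unsigned)."""
    return (a * b) & 0xFFFF


def mulhu(a, b):
    """MULHU: high 16 bits of the unsigned 32-bit product."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16


def mulh(a, b):
    """MULH: high 16 bits of the signed 32-bit product."""
    return ((to_signed(a) * to_signed(b)) >> 16) & 0xFFFF


def to_decimal(value):
    """Format an unsigned 16-bit value: one DIVU and one REMU per digit."""
    value &= 0xFFFF
    digits = []
    while True:
        value, rem = value // 10, value % 10
        digits.append(chr(ord("0") + rem))
        if value == 0:
            break
    return "".join(reversed(digits))
```

A 16-bit value needs at most five trips through the loop, so the cost of printing a number is dominated by how fast a single divide runs.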

Logic (6 instructions)

Standard bitwise operations: AND, OR, XOR, NOT, plus immediate forms ANDI and ORI.

Shift and Rotate (16 variants)

Sampo provides an unusually rich set of shift operations:

  • SLL/SRL/SRA: Shift left/right logical/arithmetic
  • ROL/ROR: Rotate left/right
  • RCL/RCR: Rotate through carry (17-bit rotation)
  • SWAP: Swap high and low bytes

Each shift type comes in three shift amounts: 1, 4, and 8 bits. The 4-bit shift is particularly useful for hexadecimal digit extraction and insertion. Variable shifts use the extended format with the shift amount in the second register or immediate field.
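
Two of these operations are easy to get subtly wrong, so here is a small Python model of them: a one-bit rotate left through carry, and hex digit extraction using the 4-bit logical shift. The helper names are assumptions, and flag side effects other than carry are ignored.

```python
def rcl1(value, carry):
    """RCL by 1: rotate the 17-bit quantity (C, value) left by one bit.

    Returns (new_value, new_carry).
    """
    wide = ((carry & 1) << 16) | (value & 0xFFFF)
    wide = ((wide << 1) | (wide >> 16)) & 0x1FFFF
    return wide & 0xFFFF, wide >> 16


def srl4(value):
    """SRL by 4: logical shift right by one hex digit."""
    return (value & 0xFFFF) >> 4


def hex_digits(value):
    """Peel off nibbles low to high with AND 0xF followed by SRL by 4."""
    out = []
    for _ in range(4):
        out.append("0123456789ABCDEF"[value & 0xF])
        value = srl4(value)
    return "".join(reversed(out))
```

Since the carry participates as a seventeenth bit, it takes 17 single-bit RCL steps, not 16, to bring a value back to where it started.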

Load/Store (6 instructions)

Memory access instructions include word and byte loads (with sign or zero extension), word and byte stores, and LUI (Load Upper Immediate) for constructing 16-bit constants:

LUI  R4, 0x12       ; R4 = 0x1200
ORI  R4, R4, 0x34   ; R4 = 0x1234

Branch (16 conditions)

Sampo supports a comprehensive set of branch conditions:

  • BEQ/BNE: Equal/not equal
  • BLT/BGE/BGT/BLE: Signed comparisons
  • BLTU/BGEU/BHI/BLS: Unsigned comparisons
  • BMI/BPL: Negative/positive
  • BVS/BVC: Overflow set/clear
  • BCS/BCC: Carry set/clear

This covers all reasonable comparison outcomes for both signed and unsigned arithmetic.
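
For readers who want to see how the sixteen conditions map onto the flags, the following Python sketch evaluates each mnemonic from N, Z, C, and V. It assumes that after a compare the C flag means "borrow" (set when the first operand is lower as an unsigned number), which is how the flags table describes it; an implementation using the inverted carry convention would swap the unsigned tests.

```python
def cmp_flags(a, b):
    """Return (N, Z, C, V) after a 16-bit a - b, with C meaning borrow."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    n = bool(result & 0x8000)
    z = result == 0
    c = a < b
    # Signed overflow: operands differ in sign and the result's sign differs from a.
    v = bool((a ^ b) & (a ^ result) & 0x8000)
    return n, z, c, v


def condition_taken(cond, n, z, c, v):
    """Evaluate a branch mnemonic against the four arithmetic flags."""
    table = {
        "BEQ": z,                    "BNE": not z,
        "BLT": n != v,               "BGE": n == v,
        "BGT": (not z) and n == v,   "BLE": z or n != v,
        "BLTU": c,                   "BGEU": not c,
        "BHI": (not c) and (not z),  "BLS": c or z,
        "BMI": n,                    "BPL": not n,
        "BVS": v,                    "BVC": not v,
        "BCS": c,                    "BCC": not c,
    }
    return bool(table[cond])
```

Comparing 0xFFFF with 1 shows why both families are needed: as signed values, -1 is less than 1, so BLT is taken, while as unsigned values 65535 is higher than 1, so BHI is taken and BLTU is not.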

Jump/Call (4 instructions)

  • J: PC-relative unconditional jump
  • JAL: Jump and link (save return address in RA)
  • JR: Jump to address in register
  • JALR: Jump and link to register address

Block Operations (6 instructions)

The block operations use a fixed register convention (R4=count, R5=source, R6=destination):

  • LDI/LDD: Copy a single byte from source to destination, then increment (LDI) or decrement (LDD) both pointers and decrement the count
  • LDIR/LDDR: Repeat until count reaches zero
  • FILL: Fill memory region with value
  • CPIR: Compare and search forward

These instructions are decidedly un-RISC—they're multi-cycle operations that modify multiple registers. But they're implemented with predictable behavior (always the same registers, always the same algorithm) and provide enormous speedups for common memory operations.
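
As an illustration of what a single LDIR does, here is a Python sketch using the fixed register convention above (R4 = count, R5 = source, R6 = destination). It assumes a count of zero copies nothing; the Z80's LDIR instead treats a zero count as 65536, and this article does not say which behavior Sampo follows.

```python
def ldir(mem, regs):
    """Copy regs[4] bytes from address regs[5] to address regs[6], ascending."""
    while regs[4] != 0:                       # assumption: zero count is a no-op
        mem[regs[6]] = mem[regs[5]]
        regs[5] = (regs[5] + 1) & 0xFFFF      # source pointer
        regs[6] = (regs[6] + 1) & 0xFFFF      # destination pointer
        regs[4] = (regs[4] - 1) & 0xFFFF      # remaining count


mem = bytearray(0x10000)
mem[0x8000:0x8005] = b"Sampo"

regs = [0] * 16
regs[4], regs[5], regs[6] = 5, 0x8000, 0x9000
ldir(mem, regs)
```

When the instruction finishes, the count is zero and both pointers sit just past the copied region, which makes it easy to chain a second block operation.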

Stack (4 instructions)

  • PUSH/POP: Single register push/pop
  • PUSHM/POPM: Push/pop multiple registers (via bitmask)

I/O (4 instructions)

  • INI/OUTI: Immediate port address
  • IN/OUT: Register port address

System (9 instructions)

  • NOP: No operation
  • HALT: Stop processor
  • DI/EI: Disable/enable interrupts
  • EXX: Exchange alternate registers
  • RETI: Return from interrupt
  • SWI: Software interrupt
  • SCF/CCF: Set/complement carry flag
  • GETF/SETF: Read/write flags register

Comparison with Other Architectures

To put Sampo in context, consider how it compares with related processors:

Aspect              Z80              MIPS            RISC-V          Sampo
Word size           8-bit            32-bit          32/64-bit       16-bit
Instruction width   1-4 bytes        4 bytes         2/4 bytes       2/4 bytes
Registers           8 + alternates   32              32              16 + alternates
Zero register       No               $zero           x0              R0
I/O model           Port-based       Memory-mapped   Memory-mapped   Port-based
Block operations    Yes              No              No              Yes
Instruction count   ~300+            ~60             ~50 base        ~66

Sampo sits in an interesting position: more regular than the Z80 but with Z80-friendly features, smaller and simpler than 32-bit RISC but still cleanly orthogonal.

Code Examples

To demonstrate how Sampo assembly looks in practice, here's a "Hello World" program that outputs text via a serial port:

        .org 0x0100

.equ    ACIA_STATUS 0x80
.equ    ACIA_DATA   0x81
.equ    TX_READY    0x02

start:
        LIX  R4, message        ; Load address of string

loop:
        LBU  R5, (R4)           ; Load byte from string
        CMP  R5, R0             ; Compare with zero
        BEQ  done               ; If null terminator, done

wait_tx:
        INI  R6, ACIA_STATUS    ; Read serial status port
        ANDI R6, R6, TX_READY   ; Check transmit ready bit
        BEQ  wait_tx            ; Wait if not ready

        OUTI ACIA_DATA, R5      ; Write character to data port
        ADDI R4, 1              ; Next character
        J    loop
done:
        HALT

message:
        .asciz "Hello, Sampo!\n"

And here's a Fibonacci function demonstrating the calling convention:

; fib(n) - compute nth Fibonacci number
; Input: R4 (A0) = n
; Output: R4 (A0) = fib(n)

fib:
        ADD  R5, R0, R0     ; a = 0
        ADD  R6, R0, R0
        ADDI R6, 1          ; b = 1 (ADDI is two-operand: Rd = Rd + imm8)
        CMP  R4, R0
        BEQ  fib_done

fib_loop:
        ADD  R7, R5, R6     ; temp = a + b
        ADD  R5, R6, R0     ; a = b
        ADD  R6, R7, R0     ; b = temp
        ADDI R4, -1         ; n--
        BNE  fib_loop

fib_done:
        ADD  R4, R5, R0     ; return a
        JR   RA
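
For checking an emulator run against known values, the same loop can be written as a Python reference. This sketch keeps the 16-bit wraparound that the registers impose; the function name fib16 is an illustrative choice.

```python
def fib16(n):
    """Iterative Fibonacci with 16-bit wraparound, mirroring the assembly loop."""
    a, b = 0, 1
    n &= 0xFFFF
    while n != 0:
        a, b = b, (a + b) & 0xFFFF   # temp = a + b; a = b; b = temp
        n -= 1
    return a
```

The largest Fibonacci number that fits in 16 bits is fib(24) = 46368; from fib(25) onward the result wraps modulo 65536.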

The code reads naturally to anyone familiar with RISC assembly, while the I/O instructions and register conventions provide the Z80-like feel that makes porting classic software straightforward.

Looking Ahead: FPGA Implementation

With the architecture defined, the next step is implementation. In Part 2 of this series, we'll build a working Sampo processor using Amaranth HDL, a modern Python-based hardware description language. We'll cover:

  • The ALU module: Implementing all arithmetic and logic operations
  • The register file: Including the alternate register set and zero register
  • The instruction decoder: Parsing the various instruction formats
  • The control unit: Managing the fetch-decode-execute cycle
  • The memory interface: Connecting to block RAM
  • The I/O subsystem: Implementing the port-based I/O model
  • Integration: Putting it all together into a working system-on-chip

We'll synthesize the design for an affordable FPGA board and run actual Sampo programs, demonstrating that this architecture isn't just a paper exercise but a real, working processor.

The Sampo project on GitHub includes a complete Rust-based assembler (sasm) and emulator (semu) with a TUI debugger, so you can start writing and testing Sampo programs today. The FPGA implementation will let you run those same programs on real hardware, completing the journey from mythological artifact to silicon reality.

Stay tuned for Part 2, where we'll forge our own Sampo—not from swan feathers and barley, but from lookup tables and flip-flops.