
A friend recently pointed me to a project called SectorC, which demonstrates something remarkable: a C compiler that fits in a 512-byte x86-16 boot sector. It compiles a substantial subset of C (variables, functions, if/while, 14 binary operators, pointer dereference, inline assembly) in less space than most error messages.
I wanted to see if the same idea could work on the Z80.
The Z80 is a fundamentally different machine from x86-16. It has no one-byte store-and-advance instruction like stosw (its block instructions, such as LDIR, only copy memory to memory) and no segment registers that double as a free 64K hash table. It's an 8-bit processor pretending to be 16-bit through register pairs. Operations that x86 does in one instruction often take two or three on the Z80. So the question wasn't whether a Z80 version would be bigger (it obviously would) but whether it could stay small enough to be interesting.
The answer is 733 bytes.
What It Compiles
SectorZ implements the same "Barely C" language as SectorC. All tokens must be separated by spaces, which eliminates the need for a real tokenizer. You write C, but with mandatory whitespace:

    int c ;
    int x ;

    void putch ( ) {
      asm ( 58 98 209 ) ;
      asm ( 211 129 ) ;
    }

    void main ( ) {
      x = 72 ;
      c = x ;
      putch ( ) ;
    }

The supported feature set:
- Global variable declarations (`int name ;`)
- Function definitions (`void name ( ) { ... }`)
- Assignment (`x = expr ;`)
- Function calls (`func ( ) ;`)
- If statements (`if ( expr ) { ... }`)
- While loops (`while ( expr ) { ... }`)
- 14 binary operators: `+ - * & | ^ << >> == != < > <= >=`
- Pointer dereference for read (`* expr`) and write (`* expr = expr ;`)
- Address-of operator (`& var`)
- Inline machine code (`asm ( byte byte ... ) ;`)
- Parenthesized subexpressions

No function arguments, no local variables, no return values, no preprocessor, no error checking. The programmer is trusted completely, in the grand tradition of 1970s C.
The Architecture Tax
SectorC fits in 512 bytes partly because x86-16 is dense. The stosw instruction stores AX to [ES:DI] and advances DI by two, all in a single byte. On the Z80, the equivalent operation (store a 16-bit value and advance the pointer) takes four bytes with HL as the pointer: LD (HL), E / INC HL / LD (HL), D / INC HL. SectorC also uses segment registers to create a free 64K lookup table for variable and function hashing. The Z80 has no segments.
This is the fundamental challenge: the Z80 instruction set is more orthogonal and regular than x86, but it pays for that regularity with verbosity. A simple "emit a 3-byte instruction" helper that writes an opcode and a 16-bit address costs 7 bytes. SectorC does the same thing with stosw and a single mov, effectively 4 bytes.
The result is that SectorZ is larger than its x86 counterpart. But 733 bytes for a self-contained C compiler on an 8-bit processor from 1976 still feels pretty good.
Memory Layout
The compiler loads at address $0000 and uses the upper portion of the Z80's 64K address space for its data structures:
| Address | Size | Purpose |
|---|---|---|
| `$0000` | 733 bytes | Compiler code |
| `$D000` | 256 bytes | Function trampoline table (64 entries x 4 bytes) |
| `$D100` | 256 bytes | Variable storage (128 entries x 2 bytes) |
| `$D200` | 3 bytes | Tokenizer state (semicolon buffer, number flag, EOF flag) |
| `$D300`+ | ~11.5K | Generated code output |

The compiler reads source code character by character from the MC6850 ACIA serial port, compiles it to Z80 machine code starting at $D300, and then calls main() through the trampoline table. The entire process (read, compile, execute) happens without ever touching a disk.
Key Design Decisions
HL as the Code Pointer
The most important register allocation decision in the whole compiler is using HL as the output pointer. Z80's LD (HL), n instruction stores an immediate byte to the address in HL in just 2 bytes. The alternative, using DE with LD A, n / LD (DE), A, costs 3 bytes per emit site. Since the compiler emits bytes constantly, this saves roughly 25 bytes across all the emit sequences. It does mean HL is permanently occupied, so the tokenizer has to push/pop HL around every call, but the trade-off is clearly worth it.
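To make the byte counts concrete, here is a small Python sketch that spells out both encodings for a single emit site, including the pointer increment. The opcode values are the standard Z80 encodings; the function names are hypothetical and exist only for this illustration, not in the SectorZ source.

```python
# Standard Z80 opcodes for the two ways of emitting one constant byte.
LD_HL_IND_N = 0x36   # LD (HL), n
LD_A_N      = 0x3E   # LD A, n
LD_DE_IND_A = 0x12   # LD (DE), A
INC_HL      = 0x23   # INC HL
INC_DE      = 0x13   # INC DE

def emit_site_via_hl(n):
    """Bytes the compiler itself spends to emit constant n through HL."""
    return bytes([LD_HL_IND_N, n, INC_HL])

def emit_site_via_de(n):
    """Bytes the compiler would spend doing the same through DE."""
    return bytes([LD_A_N, n, LD_DE_IND_A, INC_DE])

hl_site = emit_site_via_hl(0xCD)
de_site = emit_site_via_de(0xCD)
print(len(hl_site), len(de_site))   # 3 4
```

The difference is one byte per emit site either way, so a saving of roughly 25 bytes corresponds to about 25 such sites.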
The atoi Hash Trick
This is borrowed directly from SectorC, and it's the single cleverest idea in the whole design. The tokenizer hashes every identifier using the same algorithm as atoi: hash = hash * 10 + char. For numeric tokens, it subtracts '0' from each character first, so the hash is the actual integer value. For identifiers, the raw ASCII values are accumulated.
The key insight is that the hash doubles as a lookup key. Variable names hash to 16-bit values; the low byte (masked to even alignment) indexes into the variable table at $D100. Function names hash similarly, with the low byte (masked to 4-byte alignment) indexing into the trampoline table at $D000. Keywords like if, while, and void hash to fixed values that the compiler checks directly.
No symbol table. No string comparison. Just arithmetic.
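The hashing scheme is easy to model in Python. This is a sketch of the idea rather than a transcription of the assembly: it assumes a token counts as numeric when its first character is a digit, and that the hash wraps at 16 bits. The function names are made up for the example; the table addresses come from the memory layout above.

```python
VAR_TABLE = 0xD100    # variable storage base
FUNC_TABLE = 0xD000   # trampoline table base

def atoi_hash(token):
    """hash = hash * 10 + char, truncated to 16 bits (assumed width)."""
    numeric = token[0].isdigit()      # assumption: the first character decides
    h = 0
    for ch in token:
        value = ord(ch) - ord("0") if numeric else ord(ch)
        h = (h * 10 + value) & 0xFFFF
    return h

def var_address(name):
    """Low byte of the hash, masked to even alignment, indexes the table."""
    return VAR_TABLE + (atoi_hash(name) & 0xFE)

def func_slot(name):
    """Low byte of the hash, masked to 4-byte alignment, picks a trampoline."""
    return FUNC_TABLE + (atoi_hash(name) & 0xFC)

print(atoi_hash("72"))          # 72: a number hashes to its own value
print(atoi_hash("if"))          # 1152: 105 * 10 + 102
print(hex(var_address("c")))    # 0xd162
```

One consequence of the masking is that distinct names can share a slot: under this model the variables b and c both land at $D162, so programs have to pick names that don't collide.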
The cp_de_imm Trick
Comparing a 16-bit register pair against a constant is expensive on Z80. The naive approach (LD A, E / CP low / JR NZ, skip / LD A, D / CP high) costs 8 bytes, and the compiler does this check constantly (for every keyword and punctuation token). SectorZ uses an inline-constant trick instead:

    cp_de_imm:
        ex (sp), hl       ; Swap HL with return address on stack
        ld a, (hl)        ; Load low byte of constant
        inc hl
        cp e              ; Compare with E
        jr nz, cp_de_ne
        ld a, (hl)        ; Load high byte of constant
        cp d              ; Compare with D
    cp_de_ne:
        inc hl            ; Skip past constant regardless
        ex (sp), hl       ; Restore HL, fix return address
        ret

The 16-bit constant is embedded directly after the CALL, as a DW:

    call cp_de_imm
    dw tok_if         ; The constant to compare against
    jr z, do_if       ; Branch if DE == tok_if

The function reads the constant from the return address, advances the return address past it, and restores everything. Each comparison site costs just 5 bytes (3 for the call, 2 for the constant) instead of 8. With 15+ comparison sites in the compiler, the 3 bytes saved per site comfortably outweigh the 11-byte routine, for a net saving of around 30 bytes.
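For readers who don't think in Z80, the control flow of cp_de_imm can be modeled in a few lines of Python. This is only a behavioral sketch: memory is a bytearray, the stack is a list, and the returned boolean stands in for the Z flag. The addresses and the value used for tok_if (the atoi-style hash of "if") are assumptions chosen for the demonstration.

```python
def cp_de_imm(memory, stack, de):
    """Compare DE with the 16-bit constant stored at the return address,
    then make the return address skip that constant.
    Returns the Z flag: True when DE equals the constant."""
    ret = stack.pop()                        # ex (sp), hl : fetch return address
    zero = memory[ret] == (de & 0xFF)        # ld a, (hl) / cp e
    ret += 1                                 # inc hl
    if zero:                                 # jr nz, cp_de_ne
        zero = memory[ret] == (de >> 8)      # ld a, (hl) / cp d
    ret += 1                                 # cp_de_ne: inc hl
    stack.append(ret)                        # ex (sp), hl : store fixed address
    return zero

TOK_IF = 1152                                # assumed value: 105 * 10 + 102
memory = bytearray(0x200)
memory[0x100:0x103] = bytes([0xCD, 0x00, 0x00])        # call cp_de_imm (placeholder target)
memory[0x103:0x105] = TOK_IF.to_bytes(2, "little")     # dw tok_if
stack = [0x103]                              # return address pushed by the CALL

z = cp_de_imm(memory, stack, TOK_IF)
print(z, hex(stack[-1]))                     # True 0x105
```

Whether or not the comparison matches, execution resumes at $0105, just past the embedded constant, which is where the JR Z sits in the real code.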
The EX (SP), HL instruction is the hero here. It atomically swaps HL with the top of the stack, which is exactly what we need: get the return address into HL for reading, then put the updated address back. This instruction doesn't exist on x86 (SectorC uses lodsw with a different approach), and it's one of the few places where Z80 is genuinely more elegant.
Runtime Helpers for Binary Operations
SectorC generates inline code for binary operators. The x86 ADD AX, BX is just 2 bytes, so inlining is cheap. On Z80, a 16-bit add is ADD HL, DE (1 byte), but subtraction requires OR A / SBC HL, DE (3 bytes), and multiplication doesn't exist as a single instruction at all.
SectorZ moves all binary operations into runtime helper functions. The generated code for any binary expression follows the same pattern: push left operand, evaluate right operand, pop left into DE, swap, call helper. This costs 6 bytes per operator use in the generated code (1 push + 1 pop + 1 ex + 3 call), but the compiler only needs to emit a uniform sequence, which keeps the compiler itself small.
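The uniform sequence can be sketched as a Python byte emitter. The opcodes are standard Z80 encodings, but several details are assumptions: that expression results live in HL in the generated code (which the "pop left into DE, swap" description suggests), the helper address $0250, and the function names, all of which are hypothetical.

```python
PUSH_HL, POP_DE, EX_DE_HL, CALL = 0xE5, 0xD1, 0xEB, 0xCD
LD_HL_NN = 0x21                              # LD HL, nn

def emit_constant(code, value):
    """Right operand for the demo: load a 16-bit constant into HL."""
    code.append(LD_HL_NN)
    code.extend(value.to_bytes(2, "little"))

def emit_binary_op(code, emit_right_operand, helper_addr):
    code.append(PUSH_HL)                     # save the left operand
    emit_right_operand(code)                 # right operand ends up in HL (assumed)
    code.append(POP_DE)                      # DE = left operand
    code.append(EX_DE_HL)                    # swap: HL = left, DE = right
    code.append(CALL)                        # call the runtime helper
    code.extend(helper_addr.to_bytes(2, "little"))

code = bytearray()
emit_binary_op(code, lambda c: emit_constant(c, 4), helper_addr=0x0250)
print(code.hex(" "))                         # e5 21 04 00 d1 eb cd 50 02
```

Of the nine bytes emitted here, three belong to the right operand itself and six are the fixed per-operator overhead.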
The 14 runtime helpers add 109 bytes to the compiler. The multiply, shift, and comparison helpers would be prohibitively large to inline (the multiply routine alone is 25 bytes). By centralizing them, the compiler trades generated code density for compiler code density, which is the right call when you're trying to minimize the compiler.
Function Trampolines
Functions are called through a trampoline table at $D000. When the compiler encounters a function definition, it writes a 3-byte JP actual_address instruction into the trampoline slot. When generated code calls a function, it calls the trampoline, which jumps to the real code.
This eliminates forward-reference problems entirely. Functions can be called before they're defined (as long as the caller executes after the definition has been compiled). The trampoline table has 64 entries, which means the low 8 bits of a function name's hash, masked to 4-byte alignment, must be unique across all functions in a program. For typical small programs, this works fine.
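A Python model of the trampoline table shows why forward references come for free: the slot address depends only on the function's name, so a caller can be compiled before the callee exists. The hash follows the atoi-style scheme described earlier (16-bit wrap assumed), and the helper names and the code address $D320 are invented for this sketch.

```python
FUNC_TABLE = 0xD000
JP = 0xC3                                    # JP nn

def name_hash(name):
    h = 0
    for ch in name:
        h = (h * 10 + ord(ch)) & 0xFFFF      # 16-bit wrap is an assumption
    return h

def slot_address(name):
    return FUNC_TABLE + (name_hash(name) & 0xFC)

def define_function(memory, name, code_addr):
    """Write 'JP code_addr' into the function's 4-byte trampoline slot."""
    slot = slot_address(name)
    memory[slot] = JP
    memory[slot + 1:slot + 3] = code_addr.to_bytes(2, "little")
    return slot

def call_target(memory, name):
    """Follow a CALL through the trampoline to the real code address."""
    slot = slot_address(name)
    assert memory[slot] == JP, "called before the definition was compiled"
    return int.from_bytes(memory[slot + 1:slot + 3], "little")

memory = bytearray(0x10000)
call_operand = slot_address("putch")         # caller compiled first: address known from the name
define_function(memory, "putch", 0xD320)     # definition arrives later
print(hex(call_operand), hex(call_target(memory, "putch")))
print(slot_address("a") == slot_address("b"))   # True: these two names collide
```

The last line illustrates the uniqueness requirement: under this model, functions named a and b would overwrite each other's trampoline.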
The Semicolon Hack
One of the trickier parsing problems in Barely C is the semicolon. Consider x = 3 + 4 ;. The expression parser reads tokens until it hits something that isn't an operator. When it reads ;, it doesn't match any operator, so it returns. But it has already consumed the semicolon. The statement parser needs that semicolon to know the statement is complete.
SectorC's solution, which SectorZ copies, is a one-character pushback buffer. The tokenizer treats semicolons specially: if it encounters a ; while accumulating a token, it saves a flag and returns the current token. The next call to tok_next checks the flag first and returns a synthetic semicolon token without reading any input.
These 15 bytes of code avoid the need for a much more complex token lookahead mechanism.
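The pushback flag is simple enough to model in Python. This sketch follows the description above; how the real tokenizer treats a semicolon that stands alone is an assumption here (it is returned directly), and the class and method names are hypothetical.

```python
class Tokenizer:
    def __init__(self, source):
        self.chars = iter(source)
        self.pending_semicolon = False       # the one-character pushback flag

    def next_token(self):
        if self.pending_semicolon:           # checked before reading any input
            self.pending_semicolon = False
            return ";"                       # synthetic semicolon token
        token = ""
        for ch in self.chars:
            if ch == ";":
                if not token:                # free-standing semicolon (assumed behavior)
                    return ";"
                self.pending_semicolon = True
                return token                 # finish the current token first
            if ch.isspace():
                if token:
                    return token
                continue                     # skip leading whitespace
            token += ch
        return token or None                 # None signals end of input

def tokens(source):
    tk = Tokenizer(source)
    out = []
    while (tok := tk.next_token()) is not None:
        out.append(tok)
    return out

print(tokens("x = 3 + 4;"))   # ['x', '=', '3', '+', '4', ';']
```

Both `x = 3 + 4 ;` and `x = 3 + 4;` produce the same token stream, with the semicolon delivered as its own token in each case.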
A Real Program: Prime Sieve
Hello World with asm() blocks is a legitimate test, but it doesn't really exercise the compiler. Here's a Sieve of Eratosthenes that finds all primes below 100:

    int c ; int x ; int d ; int i ; int j ; int p ; int n ; int s ; int f ;

    void putch ( ) {
      asm ( 58 98 209 ) ;
      asm ( 211 129 ) ;
    }

    void printnum ( ) {
      f = 0 ;
      d = 0 ;
      while ( x >= 100 ) { x = x - 100 ; d = d + 1 ; }
      if ( d ) { c = d + 48 ; putch ( ) ; f = 1 ; }
      d = 0 ;
      while ( x >= 10 ) { x = x - 10 ; d = d + 1 ; }
      if ( d + f ) { c = d + 48 ; putch ( ) ; }
      c = x + 48 ;
      putch ( ) ;
    }

    void main ( ) {
      s = 57344 ;
      n = 100 ;
      i = 2 ;
      while ( i < n ) { * ( s + i + i ) = 1 ; i = i + 1 ; }
      p = 2 ;
      while ( p * p < n ) {
        if ( * ( s + p + p ) ) {
          j = p + p ;
          while ( j < n ) { * ( s + j + j ) = 0 ; j = j + p ; }
        }
        p = p + 1 ;
      }
      i = 2 ;
      while ( i < n ) {
        if ( * ( s + i + i ) ) { x = i ; printnum ( ) ; c = 32 ; putch ( ) ; }
        i = i + 1 ;
      }
      c = 10 ;
      putch ( ) ;
    }

This program demonstrates several things that aren't obvious from the language description.
Pointer arithmetic as arrays. Barely C has no array type, but * ( s + i + i ) reads a 16-bit value from address s + 2*i, effectively treating a block of memory as an integer array. The sieve stores its flags at address 57344 ($E000), well above both the compiler and the generated code.
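To see the array-by-pointer-arithmetic idea in isolation, here is the same sieve transcribed into Python against a flat 64K bytearray. The read16 and write16 helpers are stand-ins for the 16-bit dereference (the Z80 is little-endian); their names are invented for this sketch.

```python
memory = bytearray(0x10000)                  # the Z80's 64K address space

def read16(addr):
    return memory[addr] | (memory[addr + 1] << 8)

def write16(addr, value):
    memory[addr] = value & 0xFF
    memory[addr + 1] = (value >> 8) & 0xFF

def sieve(s=57344, n=100):
    i = 2
    while i < n:
        write16(s + i + i, 1)                # * ( s + i + i ) = 1
        i += 1
    p = 2
    while p * p < n:
        if read16(s + p + p):                # if ( * ( s + p + p ) )
            j = p + p
            while j < n:
                write16(s + j + j, 0)        # * ( s + j + j ) = 0
                j += p
        p += 1
    return [i for i in range(2, n) if read16(s + i + i)]

primes = sieve()
print(len(primes), primes[-1])               # 25 97
```

Each flag occupies two bytes, so the 100 entries span $E000 through $E0C7.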
Decimal output without division. The language has no division or modulo operators, so printnum extracts digits via repeated subtraction. The f flag tracks whether a hundreds digit was printed, ensuring proper formatting of numbers like 2 (just "2") vs 103 ("103", not "13").
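The digit extraction translates line for line into Python, which makes the role of the f flag easy to check. This version returns the characters as a string instead of calling putch; otherwise it mirrors the Barely C routine.

```python
def printnum(x):
    out = []
    f = 0
    d = 0
    while x >= 100:                          # count hundreds by subtraction
        x -= 100
        d += 1
    if d:
        out.append(chr(d + 48))
        f = 1                                # remember a hundreds digit was printed
    d = 0
    while x >= 10:                           # count tens by subtraction
        x -= 10
        d += 1
    if d + f:                                # print a zero tens digit only after hundreds
        out.append(chr(d + 48))
    out.append(chr(x + 48))                  # ones digit is always printed
    return "".join(out)

print(printnum(2), printnum(40), printnum(103))   # 2 40 103
```

The routine only handles values up to 999; anything larger would push the hundreds count past a single digit.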
The putch function. This is the I/O bridge between Barely C and the hardware. The asm statement emits raw Z80 opcodes: 58 98 209 is LD A, ($D162) (load the low byte of variable c), and 211 129 is OUT ($81), A (send it to the ACIA serial port). The programmer has to compute the variable's memory address from its hash (c hashes to 99, masking to even alignment gives 98, or $62, and $D100 + $62 = $D162), which is admittedly inconvenient but functional.
Running it through the emulator:

    $ (cat primes.bc; printf '\x1a') | retroshield sectorz.bin
    2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

All 25 primes below 100, computed and printed by 733 bytes of compiler generating Z80 machine code on the fly.
Size Breakdown
Where do the 733 bytes go?
| Component | Bytes | % |
|---|---|---|
| Entry point | 13 | 1.8% |
| Top-level parser (`compile`) | 60 | 8.2% |
| Statement dispatch (`compile_stmts`) | 51 | 7.0% |
| Assignment and calls | 30 | 4.1% |
| `asm` statement | 23 | 3.1% |
| Control flow (`if`, `while`) | 63 | 8.6% |
| Deref assign | 27 | 3.7% |
| Expression parser | 59 | 8.0% |
| Unary expressions | 73 | 10.0% |
| Helpers (`emit_var`, `emit3`, `func_addr`, `emit_test`) | 32 | 4.4% |
| `cp_de_imm` | 11 | 1.5% |
| Tokenizer | 98 | 13.4% |
| `getch` (serial I/O) | 26 | 3.5% |
| Operator table | 58 | 7.9% |
| Runtime helpers | 109 | 14.9% |
The runtime helpers are the largest single component at 15% of the binary. The multiply routine alone is 25 bytes. If the Z80 had a hardware multiply instruction, the compiler would be noticeably smaller. The tokenizer at 13% is the next largest piece, driven primarily by the multiply-by-10 hash accumulation loop, which requires several register shuffles because the Z80 has no 16-bit multiply.
The operator table is pure data: 14 entries of 4 bytes each (token hash + helper address) plus a 2-byte sentinel. It's an unavoidable cost of supporting 14 operators, but the table-driven approach keeps the expression parser compact at 59 bytes.
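The table layout can be sketched with Python's struct module. The entry format (2-byte token hash followed by a 2-byte helper address, little-endian) matches the description above; the helper addresses, the zero sentinel value, and the function names are assumptions made for the illustration.

```python
import struct

OPERATORS = ["+", "-", "*", "&", "|", "^", "<<", ">>",
             "==", "!=", "<", ">", "<=", ">="]

def token_hash(tok):
    h = 0
    for ch in tok:
        h = (h * 10 + ord(ch)) & 0xFFFF      # same atoi-style hash as identifiers
    return h

def build_table(helper_addrs):
    table = bytearray()
    for op in OPERATORS:
        table += struct.pack("<HH", token_hash(op), helper_addrs[op])
    table += struct.pack("<H", 0)            # sentinel (value assumed)
    return bytes(table)

def lookup(table, tok_hash):
    """Linear scan, as a table-driven expression parser would do."""
    for off in range(0, len(table) - 2, 4):
        h, addr = struct.unpack_from("<HH", table, off)
        if h == tok_hash:
            return addr
    return None                              # not an operator: the expression ends

# Hypothetical helper addresses, 8 bytes apart.
helpers = {op: 0x0200 + 8 * i for i, op in enumerate(OPERATORS)}
table = build_table(helpers)
print(len(table))                            # 58
print(hex(lookup(table, token_hash("<<"))), lookup(table, token_hash(";")))
```

A token such as `;` that misses every entry is what tells the expression parser to stop and return.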
SectorC vs. SectorZ
| | SectorC (x86-16) | SectorZ (Z80) |
|---|---|---|
| Size | 512 bytes | 733 bytes |
| Target | x86-16 real mode | Z80 bare metal |
| I/O | VGA memory, INT 16h | MC6850 ACIA serial |
| Variables | 64K segment | 256-byte table |
| Functions | Direct call | JP trampoline table |
| Binary ops | Inline generated code | CALL to runtime helpers |
| Code emit | `stosw` (1 byte) | `LD (HL),n` / `INC HL` (3 bytes) |
| Token compare | `lodsw` / `cmp` | `EX (SP),HL` inline trick |

The 221-byte difference comes down to instruction set density. The x86 has a rich CISC heritage (string instructions, memory-to-register operations, implicit operand encoding) that makes tiny programs disproportionately easy. The Z80 is capable but verbose. Every extra byte in the instruction encoding cascades across every emit site, every comparison, every helper function.
That said, the Z80 has a few tricks of its own. EX (SP), HL is a single-byte instruction that enables the inline constant comparison technique. The ADD HL, DE instruction does 16-bit addition in one byte. And EX DE, HL swaps two register pairs in one byte, which is essential for getting operands into the right positions cheaply.
Running It
SectorZ runs on the retro-z80-emulator, a Rust-based Z80 emulator that connects stdin/stdout to an emulated MC6850 ACIA serial port. It also runs on real hardware via the RetroShield Z80. The compiler loads at address $0000, reads source from serial, compiles and executes.

    $ z80asm -o sectorz.bin sectorz.asm
    $ (cat examples/primes.bc; printf '\x1a') | retroshield sectorz.bin

The \x1a (Ctrl-Z) at the end signals EOF to the compiler. The emulator's serial implementation silently drops null bytes, so a $00 terminator never reaches the compiler; Ctrl-Z ($1A), the traditional CP/M end-of-file marker, gets through intact. This was a minor debugging adventure that reinforced the value of reading the emulator source code before assuming how it handles edge cases.
What's Missing
Quite a lot, obviously. No function arguments means all communication happens through global variables. No local scope means recursive functions can't maintain independent state. No else clause. No for loop. No return statement (functions run to the closing brace and always return). No character or string literals. No preprocessor.
But these are the same limitations as SectorC. The point was never to build a production compiler. It's a demonstration that a meaningful C compiler can exist in a space that most programmers would consider insufficient for anything useful. Seven hundred thirty-three bytes is less than a single TCP packet. It's smaller than most compiler error messages. And yet it reads C source code, performs lexical analysis, parses expressions with arbitrary nesting, generates native machine code with forward-patched control flow, and executes the result, all on a processor designed in 1976.
The source code is available on GitHub.
If you're interested in Z80 development, Design a Z80 Computer is a great hands-on guide, and Learn Multiplatform Assembly Programming with ChibiAkumas covers Z80 assembly alongside other architectures.