After designing the Sampo RISC architecture on paper—complete with a working assembler and emulator—it's time to bring it to life in silicon. Or at least, in programmable logic. This post documents the hardware selection and implementation planning for synthesizing Sampo on an FPGA.
The Story So Far
If you haven't read Part 1 of this series, here's the quick version: Sampo is a 16-bit RISC CPU designed to bridge the gap between clean RISC design principles and Z80-friendly features. It has 16 general-purpose registers, ~66 instructions, port-based I/O, block operations (LDIR, LDDR), alternate registers for fast interrupt handling, and hardware multiply/divide.
The project already includes working tools written in Rust:
- sasm - A full assembler
- semu - An emulator with TUI debugger (step, breakpoints, memory inspection)
And for hardware implementation, we now have two complete RTL implementations:
Amaranth HDL (/rtl/):
- cpu.py, alu.py, decode.py, regfile.py, soc.py - Python-based, excellent for rapid iteration
- Generates Verilog for synthesis
AI-assisted hand-written Verilog (/verilog/rtl/):
- cpu.v, alu.v, decode.v, regfile.v, shifter.v, uart.v, ram.v, soc.v - Readable, portable, works with any toolchain
- Includes testbenches for Icarus Verilog and Verilator
Now it's time to synthesize it to real hardware.
Choosing an FPGA Platform
The FPGA world is split between proprietary toolchains (Xilinx Vivado, Intel Quartus) and the growing open source ecosystem. For a project like Sampo, where understanding every layer of the stack matters, open source tooling is the clear choice.
Open Source FPGA Options
| FPGA Family | Capacity | Toolchain | Maturity |
|---|---|---|---|
| Gowin GW1N/GW2A | 1K-55K LUTs | Project Apicula | Good |
| Lattice iCE40 | 1K-8K LUTs | Project IceStorm | Excellent |
| Lattice ECP5 | 12K-85K LUTs | Project Trellis | Excellent |
| Xilinx 7-series | 10K-200K+ LUTs | Project X-Ray (partial) | Experimental |
For Sampo, which is estimated at ~1,500-2,500 LUTs for the basic CPU, even the smaller FPGAs have more than enough capacity. But if we want room to grow (adding caches, more peripherals, maybe even multi-core experiments), a larger device makes sense.
The ULX3S Board
The ULX3S is an open hardware development board built around the ECP5 FPGA. It's designed by Radiona.org and has become one of the most widely used boards for open source FPGA development.
Specifications
| Component | Specification |
|---|---|
| FPGA | Lattice ECP5 (LFE5U-12F/45F/85F-6BG381C) |
| LUTs | 12K / 44K / 84K (depending on variant) |
| USB | FTDI FT231XS (500 kbit JTAG, 3 Mbit serial) |
| GPIO | 56 pins (28 differential pairs), PMOD-compatible |
| RAM | 32 MB SDRAM @ 166 MHz |
| Flash | 4-16 MB Quad-SPI |
| Storage | microSD slot |
| LEDs | 11 total (8 user, 2 USB, 1 WiFi) |
| Buttons | 7 (4 direction, 2 fire, 1 power) |
| Audio | 3.5mm jack (stereo + digital/composite) |
| Video | GPDI (HDMI-compatible) with level shifter |
| Display | Header for 0.96" SPI OLED (SSD1331) |
| Wireless | ESP32-WROOM-32 (WiFi/Bluetooth, standalone JTAG) |
| ADC | 8 channels, 12-bit, 1 MS/s (MAX11125) |
| Clock | 25 MHz onboard, differential input available |
| Power | 3 switching regulators (1.1V, 2.5V, 3.3V) |
| Sleep | 5 µA standby, RTC wake-up with battery backup |
| Dimensions | 94mm × 51mm |
Why ULX3S for Sampo
The ULX3S isn't just an FPGA breakout board—it's a complete system:
- 32MB SDRAM: Real memory, not just block RAM. Essential for running actual programs.
- HDMI output: Video terminal without external hardware.
- microSD slot: Load programs, implement a filesystem.
- ESP32 co-processor: WiFi-based JTAG debugging from any device.
- Buttons and LEDs: Instant I/O for testing without wiring anything.
- Audio output: Even supports composite video through the audio jack.
Budget Alternative: Tang Nano 9K
Before we dive into the ULX3S, it's worth mentioning a much cheaper option. The Tang Nano 9K (~$15 on AliExpress) uses a Gowin GW1NR-9 FPGA with 8,640 LUTs—more than enough for Sampo:
- 8,640 LUTs
- 64 Mbit (8 MB) PSRAM (enough to back the full 64KB address space many times over)
- HDMI output for a video terminal
- USB-C programming
- Fully supported by open-source toolchain (Yosys + nextpnr-gowin)
For initial development and testing, the Tang Nano 9K is hard to beat on price. But the ULX3S offers more I/O, more RAM, and a richer peripheral set—making it the better choice for a more complete Sampo system.
LUT Budget Planning
The Sampo RTL implementation is designed to be compact. Here's the resource breakdown:
| Component | Estimated LUTs (unless noted) |
|---|---|
| 16 × 16-bit registers | ~256 FFs |
| ALU (16-bit) | 200 - 400 |
| Control logic | 500 - 1,000 |
| Instruction decode | 300 - 500 |
| Sampo CPU core | ~1,500 - 2,500 |
| UART (115200 baud) | 200 - 300 |
| SPI controller (SD card) | 300 - 500 |
| GPIO controller | 200 - 400 |
| Basic system | ~2,500 - 4,000 |
| SDRAM controller | 500 - 1,000 |
| Instruction cache | 1,000 - 2,000 |
| Data cache | 1,000 - 2,000 |
| Full system | ~6,000 - 10,000 |
These estimates are based on typical RISC CPU implementations. The actual numbers will depend on optimization choices and synthesis settings.
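As a quick sanity check, the per-component ranges can be summed and compared against the three ULX3S variants. The sketch below is only arithmetic over the figures in the table; the names are made up for illustration. The raw sums come out somewhat below the table's "Basic system" and "Full system" rows, which presumably include some margin.

```python
# Sum the estimated LUT ranges from the table and compare them with the
# ECP5 variants. All figures are this post's estimates; names are illustrative.

CPU_CORE = (1_500, 2_500)  # "Sampo CPU core" row

BASIC_PERIPHERALS = {
    "uart": (200, 300),
    "spi": (300, 500),
    "gpio": (200, 400),
}

MEMORY_SYSTEM = {
    "sdram_controller": (500, 1_000),
    "instruction_cache": (1_000, 2_000),
    "data_cache": (1_000, 2_000),
}

DEVICE_LUTS = {"12F": 12_000, "45F": 44_000, "85F": 84_000}


def add_ranges(*ranges):
    """Add (low, high) ranges component-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


def utilization(luts, device):
    """Return the percentage of a device's LUTs that `luts` would occupy."""
    return 100.0 * luts / DEVICE_LUTS[device]


basic = add_ranges(CPU_CORE, *BASIC_PERIPHERALS.values())
full = add_ranges(basic, *MEMORY_SYSTEM.values())

if __name__ == "__main__":
    print(f"basic system: {basic[0]}-{basic[1]} LUTs")
    print(f"full system:  {full[0]}-{full[1]} LUTs")
    for device in DEVICE_LUTS:
        worst = utilization(full[1], device)
        print(f"{device}: worst-case full system uses {worst:.1f}% of LUTs")
```

Even on these optimistic raw sums, the upper end of the full system takes well over half of the 12F, which is why caches are a tight fit on that variant.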
Variant Recommendations
- ULX3S-12F (12K LUTs): Plenty for basic Sampo + peripherals, tight for caches.
- ULX3S-45F (44K LUTs): Comfortable. Full CPU with cache, room for experiments.
- ULX3S-85F (84K LUTs): Luxurious. Multi-core experiments, extensive peripherals.
Toolchain Setup
The ECP5 toolchain is fully open source:

    # macOS (Homebrew)
    brew install yosys nextpnr-ecp5 ecpprog fujprog

    # Ubuntu/Debian
    apt install yosys nextpnr-ecp5 ecpprog

    # Amaranth HDL (for our existing RTL)
    pip install amaranth amaranth-boards

    # Or build FPGA tools from source for latest features
    git clone https://github.com/YosysHQ/yosys
    git clone https://github.com/YosysHQ/nextpnr
    git clone https://github.com/YosysHQ/prjtrellis

Tool Roles
| Tool | Purpose |
|---|---|
| Amaranth | Python-based HDL (generates Verilog) |
| Yosys | Verilog synthesis (RTL → netlist) |
| nextpnr-ecp5 | Place and route (netlist → textual config) |
| Project Trellis | ECP5 bitstream documentation; provides ecppack (config → bitstream) |
| ecpprog/fujprog | Upload bitstream to board |
Amaranth Build Flow
For the Amaranth RTL, the build flow starts with Python:

    # Generate Verilog from Amaranth
    cd rtl/
    python -m amaranth generate soc.py > sampo.v

    # Then synthesize with standard tools
    yosys -p "synth_ecp5 -top sampo_soc -json sampo.json" sampo.v
    nextpnr-ecp5 --85k --package CABGA381 \
        --lpf ulx3s.lpf --json sampo.json --textcfg sampo.config
    ecppack sampo.config sampo.bit

    # Program the board
    fujprog sampo.bit

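If you rebuild often, it can help to wrap the synthesis, place-and-route, packing, and programming steps in a small script. The following is a minimal sketch, not part of the Sampo repository: `build_commands` and `run_build` are hypothetical names, and the flags and file names are simply the ones from the commands above. It starts from an existing sampo.v and defaults to a dry run that only prints the commands.

```python
# Assemble the yosys -> nextpnr-ecp5 -> ecppack -> fujprog command lines.
# Function and parameter names are hypothetical; flags come from this post.
import shlex
import subprocess


def build_commands(verilog="sampo.v", top="sampo_soc", lpf="ulx3s.lpf",
                   device="85k", package="CABGA381", name="sampo"):
    """Return the argv list for each build stage, in execution order."""
    json_file = f"{name}.json"
    config_file = f"{name}.config"
    bit_file = f"{name}.bit"
    return [
        ["yosys", "-p", f"synth_ecp5 -top {top} -json {json_file}", verilog],
        ["nextpnr-ecp5", f"--{device}", "--package", package,
         "--lpf", lpf, "--json", json_file, "--textcfg", config_file],
        ["ecppack", config_file, bit_file],
        ["fujprog", bit_file],
    ]


def run_build(dry_run=True, **kwargs):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_commands(**kwargs):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing stage


if __name__ == "__main__":
    run_build(dry_run=True)
```

Targeting a 45F board would then be a matter of passing `device="45k"`, which selects nextpnr-ecp5's `--45k` option.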
Hand-Written Verilog Implementation
In addition to the Amaranth RTL, we now have a complete AI-assisted hand-written Verilog implementation at /verilog/. While Amaranth can generate Verilog, the auto-generated output isn't particularly readable. The hand-written version is designed for clarity and portability:

    verilog/
    ├── rtl/
    │   ├── sampo_pkg.vh    # Opcodes, constants, state definitions
    │   ├── alu.v           # 16-bit ALU with all operations
    │   ├── shifter.v       # Barrel shifter (1/4/8-bit shifts, rotates)
    │   ├── regfile.v       # 16 registers + alternate set (EXX)
    │   ├── decode.v        # Instruction decoder
    │   ├── cpu.v           # FSM-based CPU core (8 states)
    │   ├── ram.v           # 64KB synchronous RAM
    │   ├── uart.v          # Simple UART for serial I/O
    │   └── soc.v           # Top-level SoC integration
    ├── tb/
    │   ├── alu_tb.v        # ALU unit tests
    │   ├── regfile_tb.v    # Register file tests
    │   └── sampo_tb.v      # Full system testbench
    ├── programs/
    │   └── hello.hex       # Test program in Verilog hex format
    ├── Makefile            # Build automation
    └── bin2hex.py          # Convert sasm output to Verilog $readmemh format

The Verilog implementation uses an 8-state FSM for the CPU: RESET → FETCH → FETCH_EXT → DECODE → EXECUTE → MEMORY → WRITEBACK → HALTED. This makes timing predictable and debugging straightforward.
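To make the state sequence concrete, here is a small Python model of the FSM. It is not a transcription of cpu.v: the state names come from the list above, but the transition conditions (when FETCH_EXT is entered, which instructions visit MEMORY, how HALTED is reached) are assumptions chosen for illustration.

```python
# Toy model of the 8-state sequence. Only the state names come from the post;
# every transition condition below is an assumption, not taken from cpu.v.
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    FETCH = auto()
    FETCH_EXT = auto()
    DECODE = auto()
    EXECUTE = auto()
    MEMORY = auto()
    WRITEBACK = auto()
    HALTED = auto()


def next_state(state, *, has_ext_word=False, is_mem_op=False, is_halt=False):
    """Return the next state given a hypothetical instruction's properties."""
    if state is State.RESET:
        return State.FETCH
    if state is State.FETCH:
        # Assumed: instructions with an extension word need a second fetch.
        return State.FETCH_EXT if has_ext_word else State.DECODE
    if state is State.FETCH_EXT:
        return State.DECODE
    if state is State.DECODE:
        return State.EXECUTE
    if state is State.EXECUTE:
        if is_halt:
            return State.HALTED
        # Assumed: only loads and stores visit MEMORY.
        return State.MEMORY if is_mem_op else State.WRITEBACK
    if state is State.MEMORY:
        return State.WRITEBACK
    if state is State.WRITEBACK:
        return State.FETCH
    return State.HALTED  # HALTED stays put until reset


def trace(**instr):
    """List the states visited by one instruction, starting at FETCH."""
    states = [State.FETCH]
    while True:
        nxt = next_state(states[-1], **instr)
        if nxt is State.FETCH or nxt is states[-1]:
            return states
        states.append(nxt)
```

Under these assumptions, `trace()` for a register-to-register instruction visits four states, and `trace(has_ext_word=True, is_mem_op=True)` visits six. The real per-instruction cycle counts depend on cpu.v.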
Simulation with Icarus Verilog
The Verilog implementation includes a complete Makefile for testing:

    cd verilog/

    # Run the main simulation (hello world)
    make test

    # Run ALU unit tests
    make test-alu

    # Run register file tests
    make test-regfile

    # Build with Verilator (faster simulation)
    make verilate

    # View waveforms in GTKWave
    make wave

Sample output from make test:

    === Sampo CPU Testbench ===
    RAM init file: ../programs/hello.hex
    CPU started at PC=0x0100
    UART output:
    ----------------------------------------
    Hello, Sampo!
    ----------------------------------------
    Simulation complete:
    Final PC: 0x011E
    Cycles: 847
    UART chars: 14
    Status: HALTED

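The testbench loads hello.hex as its RAM init file. Verilog's $readmemh expects whitespace-separated hex values, one per memory word. The repository's bin2hex.py is not reproduced here; the sketch below is a stand-in that assumes a 16-bit-wide RAM, little-endian byte order in the assembler output, and zero padding for an odd trailing byte. Any of these could differ in the real script.

```python
# Hypothetical stand-in for bin2hex.py: raw binary -> $readmemh text.
# Assumes 16-bit memory words and little-endian input; verify against ram.v.


def bin_to_readmemh(data: bytes) -> str:
    """Return one 4-digit hex word per line, as $readmemh can read."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length image with a zero byte
    words = (
        data[i] | (data[i + 1] << 8)  # little-endian: low byte comes first
        for i in range(0, len(data), 2)
    )
    return "".join(f"{word:04x}\n" for word in words)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a binary image from src_path and write the hex text to dst_path."""
    with open(src_path, "rb") as src:
        text = bin_to_readmemh(src.read())
    with open(dst_path, "w") as dst:
        dst.write(text)
```

For example, the three bytes `34 12 ff` become the two lines `1234` and `00ff`. If the RAM turned out to be byte-wide, the same idea applies with one two-digit value per line.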
The Verilog version is portable to any FPGA toolchain—Xilinx, Intel, Lattice, Gowin—without requiring Amaranth or Python in the build chain.
Implementation Roadmap
With both Amaranth and Verilog implementations complete and tested in simulation, the roadmap is now about bringing them up on hardware.
Phase 1: Core Bring-up ✓ (Complete)
- ✓ Instruction fetch and decode
- ✓ ALU operations (all 16 operations)
- ✓ Barrel shifter (1/4/8-bit shifts, rotates, RCL/RCR)
- ✓ Register file with alternate set (EXX)
- ✓ FSM-based CPU core (8 states)
- ✓ RAM interface (64KB)
- ✓ UART for serial I/O
- ✓ SoC integration
- ✓ Testbenches passing (ALU, regfile, full system)
- ✓ Hello World runs in simulation
Phase 1.5: FPGA Bring-up (Current)
- ○ ULX3S pin constraints (.lpf file)
- ○ Clock setup (PLL from 25MHz)
- ○ Map UART to FTDI
- ○ LED heartbeat / debug outputs
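For the UART step, the baud-rate divider has to be derived from whatever clock the UART runs on. A quick calculation, assuming the UART is clocked directly from the 25 MHz oscillator and uses a plain integer divider (uart.v may well do this differently):

```python
# Baud divider arithmetic for a simple integer-divider UART.
# Assumes the UART runs straight off the 25 MHz board clock; if a PLL
# raises the system clock, pass that frequency instead.


def baud_divider(clk_hz: int, baud: int):
    """Return (divider, actual_baud, error_percent) for the nearest divider."""
    divider = round(clk_hz / baud)
    actual = clk_hz / divider
    error_pct = 100.0 * (actual - baud) / baud
    return divider, actual, error_pct


if __name__ == "__main__":
    div, actual, err = baud_divider(25_000_000, 115_200)
    print(f"divider={div} actual={actual:.1f} baud error={err:+.4f}%")
```

At 25 MHz and 115200 baud the nearest divider is 217, which leaves an error of well under 0.01%, so the onboard clock is usable for the UART without a PLL.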
Phase 2: Memory System
- SDRAM controller for 32MB RAM
- Instruction cache (optional but helps timing)
- Basic interrupt handling
Phase 3: Peripherals
- SPI controller for SD card boot
- GPIO controller (buttons, LEDs)
- Timer/counter module
Phase 4: Advanced Features
- Data cache
- MMU for memory protection
- HDMI text console (VGA timing → GPDI)
- ESP32 WiFi integration for wireless debugging
Recommended Tools & Books
Hardware
- Tang Nano 9K FPGA - Budget-friendly FPGA board (~$25 on Amazon, ~$15 on AliExpress)
- USB Logic Analyzer - Essential for debugging signals (24MHz, 8 channels)
Books
If you're new to Verilog or FPGA development, these are excellent starting points:
- Getting Started with FPGAs by Russell Merrick - Beginner-friendly with Verilog and VHDL examples
- Programming FPGAs: Getting Started with Verilog by Simon Monk - Practical hands-on guide
- Verilog by Example by Blaine Readler - Concise reference for working engineers
Resources
- Sampo on GitHub - Full source including assembler, emulator, and RTL
- ULX3S GitHub - Schematics, examples, documentation
- Project Trellis - ECP5 bitstream documentation
- Amaranth HDL - Python-based hardware description
- nextpnr - Place and route tool
- Yosys - Verilog synthesis
Where to Buy
ULX3S:
- AliExpress - ~$100-150 depending on variant
- Mouser - Official distribution
- CrowdSupply - Original campaign page
Tang Nano 9K (budget alternative):
- Amazon - ~$25, faster shipping
- AliExpress - ~$15, slower shipping
Next up: Getting our first instructions executing on real hardware. Both the Amaranth and Verilog implementations are ready and tested—Hello World runs in simulation and the testbenches pass. Now it's a matter of pin constraints, clock domains, and debugging the inevitable timing issues.