<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io (Posts about virtual machine)</title><link>https://tinycomputers.io/</link><description></description><atom:link href="https://tinycomputers.io/categories/virtual-machine.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
&lt;!-- div style="width: 100%" --&gt;
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /&gt; Creative Commons Attribution-ShareAlike&lt;/a&gt;&amp;nbsp;|&amp;nbsp;
&lt;!-- /div --&gt;
</copyright><lastBuildDate>Mon, 06 Apr 2026 22:12:57 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>A Stack-Based Bytecode VM for Lattice: 100 Opcodes, Serialization, and a Self-Hosted Compiler</title><link>https://tinycomputers.io/posts/a-stack-based-bytecode-vm-for-lattice.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/a-stack-based-bytecode-vm-for-lattice_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;29 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;When I &lt;a href="https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html"&gt;first wrote about&lt;/a&gt; Lattice's move from a tree-walking interpreter to a bytecode VM, the instruction set had 62 opcodes, concurrency primitives still delegated to the tree-walker, and programs couldn't be serialized. The VM was a foundation, correct and complete enough to become the default, but clearly a starting point.&lt;/p&gt;
&lt;p&gt;That was ten versions ago. The bytecode VM now has 100 opcodes, compiles concurrency primitives into standalone sub-chunks with zero AST dependency at runtime, ships a binary serialization format for ahead-of-time compilation, includes an ephemeral bump arena for short-lived string temporaries, and (perhaps most satisfyingly) has a self-hosted compiler written entirely in Lattice that produces the same &lt;code&gt;.latc&lt;/code&gt; bytecode files as the C implementation.&lt;/p&gt;
&lt;p&gt;This post walks through what changed and why. The full technical treatment is available as a &lt;a href="https://tinycomputers.io/papers/lattice_vm.pdf"&gt;research paper&lt;/a&gt;; this is the practitioner's version.&lt;/p&gt;
&lt;h3&gt;Why Keep Going&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html"&gt;original bytecode VM&lt;/a&gt; solved the immediate problems: it eliminated recursive AST dispatch overhead and gave Lattice a single execution path for file execution, the REPL, and the WASM playground. But three issues remained.&lt;/p&gt;
&lt;p&gt;First, &lt;code&gt;OP_SCOPE&lt;/code&gt; and &lt;code&gt;OP_SELECT&lt;/code&gt; (Lattice's structured concurrency opcodes) still stored AST node pointers in the constant pool and dropped into the tree-walking evaluator at runtime. This meant the AST had to stay alive during concurrent execution, which defeated one of the main motivations for having a bytecode VM in the first place.&lt;/p&gt;
&lt;p&gt;Second, the AST dependency made serialization impossible. You can serialize bytecode to a file, but you can't easily serialize an arbitrary C pointer to an AST node. Programs had to be parsed and compiled on every run.&lt;/p&gt;
&lt;p&gt;Third, the dispatch loop used a plain &lt;code&gt;switch&lt;/code&gt; statement. Not a crisis, but computed goto dispatch is a well-known improvement for bytecode interpreters, and leaving it on the table felt unnecessary.&lt;/p&gt;
&lt;p&gt;All three problems are solved now. Let me start with the instruction set, since everything else builds on it.&lt;/p&gt;
&lt;h3&gt;100 Opcodes&lt;/h3&gt;
&lt;p&gt;The instruction set grew from 62 to 100 opcodes, organized into 17 functional categories:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Representative opcodes&lt;/th&gt;
&lt;th style="text-align: right;"&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stack manipulation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONSTANT&lt;/code&gt;, &lt;code&gt;NIL&lt;/code&gt;, &lt;code&gt;TRUE&lt;/code&gt;, &lt;code&gt;FALSE&lt;/code&gt;, &lt;code&gt;UNIT&lt;/code&gt;, &lt;code&gt;POP&lt;/code&gt;, &lt;code&gt;DUP&lt;/code&gt;, &lt;code&gt;SWAP&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arithmetic/logical&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ADD&lt;/code&gt;, &lt;code&gt;SUB&lt;/code&gt;, &lt;code&gt;MUL&lt;/code&gt;, &lt;code&gt;DIV&lt;/code&gt;, &lt;code&gt;MOD&lt;/code&gt;, &lt;code&gt;NEG&lt;/code&gt;, &lt;code&gt;NOT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bitwise&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BIT_AND&lt;/code&gt;, &lt;code&gt;BIT_OR&lt;/code&gt;, &lt;code&gt;BIT_XOR&lt;/code&gt;, &lt;code&gt;BIT_NOT&lt;/code&gt;, &lt;code&gt;LSHIFT&lt;/code&gt;, &lt;code&gt;RSHIFT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparison&lt;/td&gt;
&lt;td&gt;&lt;code&gt;EQ&lt;/code&gt;, &lt;code&gt;NEQ&lt;/code&gt;, &lt;code&gt;LT&lt;/code&gt;, &lt;code&gt;GT&lt;/code&gt;, &lt;code&gt;LTEQ&lt;/code&gt;, &lt;code&gt;GTEQ&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONCAT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variables&lt;/td&gt;
&lt;td&gt;&lt;code&gt;GET/SET_LOCAL&lt;/code&gt;, &lt;code&gt;GET/SET/DEFINE_GLOBAL&lt;/code&gt;, &lt;code&gt;GET/SET_UPVALUE&lt;/code&gt;, &lt;code&gt;CLOSE_UPVALUE&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control flow&lt;/td&gt;
&lt;td&gt;&lt;code&gt;JUMP&lt;/code&gt;, &lt;code&gt;JUMP_IF_FALSE&lt;/code&gt;, &lt;code&gt;JUMP_IF_TRUE&lt;/code&gt;, &lt;code&gt;JUMP_IF_NOT_NIL&lt;/code&gt;, &lt;code&gt;LOOP&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Functions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CALL&lt;/code&gt;, &lt;code&gt;CLOSURE&lt;/code&gt;, &lt;code&gt;RETURN&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iterators&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ITER_INIT&lt;/code&gt;, &lt;code&gt;ITER_NEXT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data structures&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BUILD_ARRAY&lt;/code&gt;, &lt;code&gt;INDEX&lt;/code&gt;, &lt;code&gt;SET_INDEX&lt;/code&gt;, &lt;code&gt;GET_FIELD&lt;/code&gt;, &lt;code&gt;INVOKE&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exceptions/defer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PUSH_EXCEPTION_HANDLER&lt;/code&gt;, &lt;code&gt;THROW&lt;/code&gt;, &lt;code&gt;DEFER_PUSH&lt;/code&gt;, &lt;code&gt;DEFER_RUN&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase system&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FREEZE&lt;/code&gt;, &lt;code&gt;THAW&lt;/code&gt;, &lt;code&gt;CLONE&lt;/code&gt;, &lt;code&gt;MARK_FLUID&lt;/code&gt;, &lt;code&gt;REACT&lt;/code&gt;, &lt;code&gt;BOND&lt;/code&gt;, &lt;code&gt;SEED&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Builtins/modules&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PRINT&lt;/code&gt;, &lt;code&gt;IMPORT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SCOPE&lt;/code&gt;, &lt;code&gt;SELECT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integer fast paths&lt;/td&gt;
&lt;td&gt;&lt;code&gt;INC_LOCAL&lt;/code&gt;, &lt;code&gt;DEC_LOCAL&lt;/code&gt;, &lt;code&gt;ADD_INT&lt;/code&gt;, &lt;code&gt;SUB_INT&lt;/code&gt;, &lt;code&gt;LOAD_INT8&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td style="text-align: right;"&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wide variants&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONSTANT_16&lt;/code&gt;, &lt;code&gt;GET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;SET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;DEFINE_GLOBAL_16&lt;/code&gt;, &lt;code&gt;CLOSURE_16&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Special&lt;/td&gt;
&lt;td&gt;&lt;code&gt;RESET_EPHEMERAL&lt;/code&gt;, &lt;code&gt;HALT&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td style="text-align: right;"&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Most of the growth came from three directions: the integer fast-path opcodes (8 new), the wide constant variants (5 new), and the opcodes supporting pre-compiled concurrency and the ephemeral arena. The first two are covered below; concurrency and the arena get their own sections.&lt;/p&gt;
&lt;h4&gt;Integer Fast Paths&lt;/h4&gt;
&lt;p&gt;Tight loops like &lt;code&gt;for i in 0..1000&lt;/code&gt; spend most of their time incrementing a counter and comparing it to a bound. The generic &lt;code&gt;OP_ADD&lt;/code&gt; has to check whether its operands are integers, floats, or strings (for concatenation), which adds branching overhead on every iteration.&lt;/p&gt;
&lt;p&gt;The integer fast-path opcodes (&lt;code&gt;OP_ADD_INT&lt;/code&gt;, &lt;code&gt;OP_SUB_INT&lt;/code&gt;, &lt;code&gt;OP_MUL_INT&lt;/code&gt;, &lt;code&gt;OP_LT_INT&lt;/code&gt;, &lt;code&gt;OP_LTEQ_INT&lt;/code&gt;) skip the type check entirely and operate directly on &lt;code&gt;int64_t&lt;/code&gt; values. &lt;code&gt;OP_INC_LOCAL&lt;/code&gt; and &lt;code&gt;OP_DEC_LOCAL&lt;/code&gt; handle the &lt;code&gt;i += 1&lt;/code&gt; and &lt;code&gt;i -= 1&lt;/code&gt; patterns as single-byte instructions that modify the stack slot in place, no push or pop required.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_LOAD_INT8&lt;/code&gt; encodes a signed byte directly in the instruction stream. The integer &lt;code&gt;42&lt;/code&gt; becomes two bytes (&lt;code&gt;OP_LOAD_INT8&lt;/code&gt;, &lt;code&gt;0x2A&lt;/code&gt;) instead of a three-byte &lt;code&gt;OP_CONSTANT&lt;/code&gt; plus an eight-byte constant pool entry. Any integer in [-128, 127] gets this treatment.&lt;/p&gt;
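&lt;p&gt;To make the size win concrete, here is a small Python sketch of how a compiler might choose between the two encodings. The opcode numbers and the &lt;code&gt;emit_int&lt;/code&gt; helper are illustrative stand-ins, not Lattice's actual values, and the constant-pool path is simplified to a one-byte index:&lt;/p&gt;

```python
# Hypothetical opcode numbers for illustration; the real ones live in the C headers.
OP_CONSTANT = 0
OP_LOAD_INT8 = 1

def emit_int(value, constants):
    """Emit bytecode for an integer literal, preferring the short inline form."""
    if value in range(-128, 128):
        # Two bytes total: the opcode plus the signed value itself.
        return bytes([OP_LOAD_INT8, value % 256])
    # Otherwise fall back to a constant-pool load (one-byte index in this sketch),
    # which also costs an 8-byte pool entry for the value.
    constants.append(value)
    return bytes([OP_CONSTANT, len(constants) - 1])

pool = []
assert emit_int(42, pool) == bytes([OP_LOAD_INT8, 0x2A])
assert emit_int(-1, pool) == bytes([OP_LOAD_INT8, 0xFF])
assert emit_int(1000, pool) == bytes([OP_CONSTANT, 0]) and pool == [1000]
```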
&lt;h4&gt;Wide Constant Variants&lt;/h4&gt;
&lt;p&gt;The original instruction set used a single byte for constant pool indices, limiting each chunk to 256 constants. This is fine for most functions, but the self-hosted compiler (a 2,000-line Lattice program compiled as a single top-level script) blows past that limit easily.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_CONSTANT_16&lt;/code&gt;, &lt;code&gt;OP_GET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;OP_SET_GLOBAL_16&lt;/code&gt;, &lt;code&gt;OP_DEFINE_GLOBAL_16&lt;/code&gt;, and &lt;code&gt;OP_CLOSURE_16&lt;/code&gt; use two-byte big-endian indices, supporting up to 65,536 constants per chunk. The compiler automatically switches to wide variants when an index exceeds 255.&lt;/p&gt;
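&lt;p&gt;The selection logic is tiny. A Python sketch of the narrow/wide choice, with made-up opcode numbers:&lt;/p&gt;

```python
OP_CONSTANT = 0       # hypothetical numeric values for illustration
OP_CONSTANT_16 = 90

def emit_constant(index):
    """Choose the narrow or wide form based on the constant-pool index."""
    if index > 255:
        # Two-byte big-endian operand: high byte first, supporting up to 65,536 constants.
        return bytes([OP_CONSTANT_16, index // 256, index % 256])
    return bytes([OP_CONSTANT, index])

assert emit_constant(7) == bytes([OP_CONSTANT, 7])
assert emit_constant(300) == bytes([OP_CONSTANT_16, 1, 44])
```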
&lt;h3&gt;The Compiler&lt;/h3&gt;
&lt;p&gt;The bytecode compiler performs a single-pass walk over the AST. It maintains a chain of &lt;code&gt;Compiler&lt;/code&gt; structs linked via &lt;code&gt;enclosing&lt;/code&gt; pointers, one per function being compiled. Variable references resolve through three tiers: local (scan the current compiler's locals array), upvalue (recursively check enclosing compilers), and global (fall through to &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Three compilation modes handle different use cases. &lt;code&gt;compile()&lt;/code&gt; is the standard file mode: it compiles all declarations and emits an implicit call to &lt;code&gt;main()&lt;/code&gt; if one is defined. &lt;code&gt;compile_module()&lt;/code&gt; is for imports, identical to &lt;code&gt;compile()&lt;/code&gt; but skips the auto-call. &lt;code&gt;compile_repl()&lt;/code&gt; preserves the last expression on the stack as the iteration's return value (displayed with &lt;code&gt;=&amp;gt;&lt;/code&gt; prefix) and keeps the known-enum table alive across REPL iterations so enum declarations persist.&lt;/p&gt;
&lt;p&gt;The compiler implements several optimizations during code generation. Binary operations on literal operands are folded at compile time: &lt;code&gt;3 + 4&lt;/code&gt; emits a single &lt;code&gt;OP_LOAD_INT8 7&lt;/code&gt; rather than two loads and an &lt;code&gt;OP_ADD&lt;/code&gt;. The pattern &lt;code&gt;x += 1&lt;/code&gt; is detected and emitted as the single-byte &lt;code&gt;OP_INC_LOCAL&lt;/code&gt;, which modifies the stack slot in place. And every statement is wrapped by &lt;code&gt;compile_stmt_reset()&lt;/code&gt;, which appends &lt;code&gt;OP_RESET_EPHEMERAL&lt;/code&gt; to trigger the ephemeral arena cleanup.&lt;/p&gt;
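&lt;p&gt;The folding step itself fits in a few lines. This Python sketch is an illustrative stand-in for the C implementation, not its actual code; &lt;code&gt;fold&lt;/code&gt; and the operator table are names I made up:&lt;/p&gt;

```python
# Operators the sketch knows how to fold over integer literals.
FOLDABLE = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def fold(op, lhs, rhs):
    """Fold a binary op over two integer literals at compile time, if possible."""
    if op in FOLDABLE and isinstance(lhs, int) and isinstance(rhs, int):
        return FOLDABLE[op](lhs, rhs)
    return None  # not foldable: the compiler emits two loads plus the generic opcode

assert fold("+", 3, 4) == 7    # becomes a single OP_LOAD_INT8 7
assert fold("*", 6, 7) == 42
assert fold("+", "x", 4) is None
```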
&lt;h3&gt;Computed Goto Dispatch&lt;/h3&gt;
&lt;p&gt;The dispatch loop now uses GCC/Clang's labels-as-values extension for computed goto:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#ifdef VM_USE_COMPUTED_GOTO&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dispatch_table&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;OP_CONSTANT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lbl_OP_CONSTANT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;OP_NIL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lbl_OP_NIL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... all 100 entries&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="cp"&gt;#endif&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(;;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="cp"&gt;#ifdef VM_USE_COMPUTED_GOTO&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dispatch_table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="cp"&gt;#endif&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each opcode handler ends with a &lt;code&gt;goto *dispatch_table[READ_BYTE()]&lt;/code&gt; rather than breaking back to the top of the loop. This eliminates the switch statement's bounds check and branch table indirection, replacing them with a single indirect jump. The CPU's branch predictor sees different jump sites for different opcodes, which improves prediction accuracy compared to a single switch that all opcodes funnel through.&lt;/p&gt;
&lt;p&gt;On platforms without the extension, it falls back to a standard switch. The VM works correctly either way.&lt;/p&gt;
&lt;h3&gt;Pre-Compiled Concurrency&lt;/h3&gt;
&lt;p&gt;This is the change I'm most pleased with, because it solves the problem cleanly.&lt;/p&gt;
&lt;p&gt;Lattice has three concurrency primitives: &lt;code&gt;scope&lt;/code&gt; defines a concurrent region, &lt;code&gt;spawn&lt;/code&gt; launches a task within that region, and &lt;code&gt;select&lt;/code&gt; multiplexes over channels. In the tree-walker, these work by passing AST node pointers to spawned threads, which then evaluate the subtrees independently. The bytecode VM's original implementation did the same thing: &lt;code&gt;OP_SCOPE&lt;/code&gt; stored an &lt;code&gt;Expr*&lt;/code&gt; pointer in the constant pool and called the tree-walking evaluator at runtime.&lt;/p&gt;
&lt;p&gt;The solution is to compile each concurrent body into a standalone &lt;code&gt;Chunk&lt;/code&gt; at compile time. The compiler provides two helpers: &lt;code&gt;compile_sub_body()&lt;/code&gt; for statement blocks and &lt;code&gt;compile_sub_expr()&lt;/code&gt; for expressions. Each creates a fresh &lt;code&gt;Compiler&lt;/code&gt;, compiles the code into a new chunk, emits &lt;code&gt;OP_HALT&lt;/code&gt;, and stores the resulting chunk in the parent's constant pool as a &lt;code&gt;VAL_CLOSURE&lt;/code&gt; constant.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_SCOPE&lt;/code&gt; uses variable-length encoding: a spawn count, a sync body chunk index, and one chunk index per spawn body. At runtime, the VM:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Exports locals&lt;/strong&gt; to the global environment using the &lt;code&gt;local_names&lt;/code&gt; debug table, so sub-chunks can access parent variables via &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runs the sync body&lt;/strong&gt; (if present) via a recursive &lt;code&gt;vm_run()&lt;/code&gt; call&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spawns threads&lt;/strong&gt; for each spawn body, each running on a cloned VM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Joins&lt;/strong&gt; all threads and propagates errors&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;OP_SELECT&lt;/code&gt; similarly encodes per-arm metadata: flags, channel expression chunk index, body chunk index, and binding name index. The VM evaluates channel expressions, polls for readiness, and executes the winning arm.&lt;/p&gt;
&lt;p&gt;The key insight is that sub-chunks run as &lt;code&gt;FUNC_SCRIPT&lt;/code&gt; without lexical access to the parent's locals. Since they can't use upvalues to reach into the parent frame, the VM exports the parent's live locals into the global environment before running any sub-chunk, using a pushed scope that gets popped after all sub-chunks complete. This is slightly more expensive than true lexical capture, but it keeps the sub-chunks completely self-contained: no AST, no parent frame dependency, fully serializable.&lt;/p&gt;
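&lt;p&gt;The export-run-restore dance can be sketched with plain Python dicts standing in for the VM's global environment. The real VM runs each spawn body on a cloned VM in its own thread; this sketch runs them sequentially to show only the scoping behavior, and all the names are mine:&lt;/p&gt;

```python
def run_scope(globals_env, local_names, local_values, spawn_bodies):
    """Export parent locals as globals, run each spawn body, then pop the scope."""
    saved = {}
    for name, value in zip(local_names, local_values):
        if name in globals_env:
            saved[name] = globals_env[name]   # remember shadowed globals
        globals_env[name] = value             # sub-chunks read these via GET_GLOBAL
    try:
        # Sequential here for clarity; the real VM spawns a thread per body.
        results = [body(globals_env) for body in spawn_bodies]
    finally:
        for name in local_names:              # pop the pushed scope
            if name in saved:
                globals_env[name] = saved[name]
            else:
                del globals_env[name]
    return results

env = {"x": 1}
out = run_scope(env, ["n"], [10], [lambda g: g["n"] * 2])
assert out == [20] and "n" not in env and env["x"] == 1
```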
&lt;h3&gt;Bytecode Serialization&lt;/h3&gt;
&lt;p&gt;With AST dependency eliminated, serialization becomes straightforward. The &lt;code&gt;.latc&lt;/code&gt; binary format starts with an 8-byte header:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;[4C 41 54 43]  magic: "LATC"
[01 00]        format version: 1
[00 00]        reserved
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The rest is a recursive chunk encoding: code length + bytecode bytes, line numbers for source mapping, typed constants (with a one-byte type tag for each), and local name debug info. Constants use seven type tags:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: right;"&gt;Tag&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Encoding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;0&lt;/td&gt;
&lt;td&gt;Int&lt;/td&gt;
&lt;td&gt;8-byte signed LE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;1&lt;/td&gt;
&lt;td&gt;Float&lt;/td&gt;
&lt;td&gt;8-byte IEEE 754&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;2&lt;/td&gt;
&lt;td&gt;Bool&lt;/td&gt;
&lt;td&gt;1 byte&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;3&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;length-prefixed (u32 + bytes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;4&lt;/td&gt;
&lt;td&gt;Nil&lt;/td&gt;
&lt;td&gt;no payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;5&lt;/td&gt;
&lt;td&gt;Unit&lt;/td&gt;
&lt;td&gt;no payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: right;"&gt;6&lt;/td&gt;
&lt;td&gt;Closure&lt;/td&gt;
&lt;td&gt;param count + variadic flag + recursive sub-chunk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;Closure&lt;/code&gt; tag is what makes this recursive: a function constant contains its parameter metadata followed by a complete serialized sub-chunk. Nested functions serialize naturally to arbitrary depth.&lt;/p&gt;
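&lt;p&gt;A Python sketch of the tag-plus-payload encoding for four of the simpler tags. The tag numbers follow the table above; the little-endian byte order on the string length prefix is an assumption on my part, since the post only pins down the integer encoding:&lt;/p&gt;

```python
TAG_INT, TAG_BOOL, TAG_STRING, TAG_NIL = 0, 2, 3, 4

def write_constant(value):
    """Serialize one constant as a one-byte type tag followed by its payload."""
    if value is None:
        return bytes([TAG_NIL])                    # no payload
    if isinstance(value, bool):                    # check bool before int in Python
        return bytes([TAG_BOOL, int(value)])       # 1-byte payload
    if isinstance(value, int):
        # 8-byte signed little-endian, matching the table.
        return bytes([TAG_INT]) + value.to_bytes(8, "little", signed=True)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        # u32 length prefix (little-endian assumed) followed by the raw bytes.
        return bytes([TAG_STRING]) + len(raw).to_bytes(4, "little") + raw
    raise TypeError("tag not modeled in this sketch")

assert write_constant(None) == bytes([TAG_NIL])
assert write_constant(True) == bytes([TAG_BOOL, 1])
assert write_constant(7) == bytes([TAG_INT, 7, 0, 0, 0, 0, 0, 0, 0])
assert write_constant("hi") == bytes([TAG_STRING, 2, 0, 0, 0]) + b"hi"
```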
&lt;p&gt;The CLI integrates this cleanly:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Compile to .latc&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;compile&lt;span class="w"&gt; &lt;/span&gt;input.lat&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;output.latc

&lt;span class="c1"&gt;# Run pre-compiled bytecode (auto-detects .latc suffix)&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;output.latc

&lt;span class="c1"&gt;# Or compile and run in one step (the default)&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;input.lat
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Loading validates magic bytes, checks the format version, and uses a bounds-checking &lt;code&gt;ByteReader&lt;/code&gt; that produces descriptive error messages for truncated or malformed inputs.&lt;/p&gt;
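&lt;p&gt;A sketch of the magic and version check, assuming the 8-byte header layout shown above and a little-endian version field (the post shows the bytes but not their byte order):&lt;/p&gt;

```python
MAGIC = b"LATC"

def check_header(data):
    """Validate the 8-byte .latc header: magic bytes, then the format version."""
    if len(data) >= 8 and data[:4] == MAGIC:
        version = int.from_bytes(data[4:6], "little")  # byte order assumed
        if version == 1:
            return True
    return False  # truncated input, wrong magic, or unknown version

assert check_header(b"LATC" + bytes([1, 0, 0, 0]))
assert not check_header(b"ELF!" + bytes([1, 0, 0, 0]))
assert not check_header(b"LATC")  # truncated: shorter than the 8-byte header
```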
&lt;h3&gt;The Ephemeral Bump Arena&lt;/h3&gt;
&lt;p&gt;String concatenation is a common source of short-lived allocations. An expression like &lt;code&gt;"hello " + name + "!"&lt;/code&gt; creates intermediate strings that are immediately consumed and discarded. In a language with deep-clone-on-read semantics, these temporaries add up.&lt;/p&gt;
&lt;p&gt;The ephemeral bump arena is a simple optimization: string concatenation in &lt;code&gt;OP_ADD&lt;/code&gt; and &lt;code&gt;OP_CONCAT&lt;/code&gt; allocates into a bump arena (&lt;code&gt;vm-&amp;gt;ephemeral&lt;/code&gt;) instead of the general-purpose heap. These allocations are tagged with &lt;code&gt;REGION_EPHEMERAL&lt;/code&gt;, and &lt;code&gt;OP_RESET_EPHEMERAL&lt;/code&gt; (emitted by the compiler at every statement boundary) resets the arena in O(1), reclaiming all temporary strings at once.&lt;/p&gt;
&lt;p&gt;The tricky part is escape analysis. If a temporary string gets assigned to a global variable, stored in an array, or passed to a compiled closure, it needs to be promoted out of the ephemeral arena before the arena is reset. The VM handles this at specific escape points: &lt;code&gt;OP_DEFINE_GLOBAL&lt;/code&gt;, &lt;code&gt;OP_CALL&lt;/code&gt; (for compiled closures), &lt;code&gt;array.push&lt;/code&gt;, and &lt;code&gt;OP_SET_INDEX_LOCAL&lt;/code&gt;. Each of these calls &lt;code&gt;vm_promote_value()&lt;/code&gt;, which deep-clones the string to the regular heap if its region is ephemeral.&lt;/p&gt;
&lt;p&gt;The arena uses a page-based allocator with 4 KB pages. Resetting doesn't free pages; it just moves the bump pointer back to zero, so subsequent allocations reuse the same memory without any &lt;code&gt;malloc&lt;/code&gt;/&lt;code&gt;free&lt;/code&gt; overhead. The full design and safety proof are covered in a &lt;a href="https://tinycomputers.io/papers/lattice_arena_safety.pdf"&gt;companion paper&lt;/a&gt;.&lt;/p&gt;
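&lt;p&gt;The core mechanism fits in a toy Python class. This sketch keeps a single page for brevity; the real arena chains 4 KB pages and tags allocations with &lt;code&gt;REGION_EPHEMERAL&lt;/code&gt;:&lt;/p&gt;

```python
PAGE_SIZE = 4096

class BumpArena:
    def __init__(self):
        self.page = bytearray(PAGE_SIZE)
        self.offset = 0

    def alloc(self, size):
        """Hand out the next `size` bytes by bumping a pointer."""
        if self.offset + size > PAGE_SIZE:
            raise MemoryError("page full; the real arena chains additional pages")
        start = self.offset
        self.offset += size
        return memoryview(self.page)[start:start + size]

    def reset(self):
        # O(1): rewind the bump pointer; pages are kept and reused, never freed.
        self.offset = 0

arena = BumpArena()
arena.alloc(100)
arena.alloc(200)
assert arena.offset == 300
arena.reset()                 # reclaims every temporary at once
assert arena.offset == 0
```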
&lt;h3&gt;Closures and the Storage Hack&lt;/h3&gt;
&lt;p&gt;The upvalue system hasn't changed architecturally since the &lt;a href="https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html"&gt;first VM post&lt;/a&gt;; it's still the Lua-inspired open/closed model where &lt;code&gt;ObjUpvalue&lt;/code&gt; structs start pointing into the stack and get closed (deep-cloned to the heap) when variables go out of scope. But the encoding grew to accommodate the wider instruction set.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;OP_CLOSURE&lt;/code&gt; uses variable-length encoding: a constant pool index for the function's compiled chunk, an upvalue count, and then &lt;code&gt;[is_local, index]&lt;/code&gt; byte pairs for each captured variable. &lt;code&gt;OP_CLOSURE_16&lt;/code&gt; uses a two-byte big-endian function index for chunks with more than 256 constants.&lt;/p&gt;
&lt;p&gt;The storage hack remains unchanged: &lt;code&gt;closure.body&lt;/code&gt; is set to NULL, &lt;code&gt;closure.native_fn&lt;/code&gt; is repurposed to hold the &lt;code&gt;Chunk&lt;/code&gt; pointer, &lt;code&gt;closure.captured_env&lt;/code&gt; is cast to &lt;code&gt;ObjUpvalue**&lt;/code&gt;, and &lt;code&gt;region_id&lt;/code&gt; stores the upvalue count. A sentinel value &lt;code&gt;VM_NATIVE_MARKER&lt;/code&gt; distinguishes C-native functions from compiled closures:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#define VM_NATIVE_MARKER ((struct Expr **)(uintptr_t)0x1)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A closure with &lt;code&gt;body == NULL&lt;/code&gt; and &lt;code&gt;native_fn != NULL&lt;/code&gt; is either a C native (if &lt;code&gt;default_values == VM_NATIVE_MARKER&lt;/code&gt;) or a compiled bytecode function (otherwise). This avoids adding VM-specific fields to the &lt;code&gt;LatValue&lt;/code&gt; union, which matters when values are deep-cloned frequently.&lt;/p&gt;
&lt;h3&gt;The Self-Hosted Compiler&lt;/h3&gt;
&lt;p&gt;The file &lt;code&gt;compiler/latc.lat&lt;/code&gt; is a bytecode compiler written entirely in Lattice, approximately 2,060 lines that read &lt;code&gt;.lat&lt;/code&gt; source, produce bytecode, and write &lt;code&gt;.latc&lt;/code&gt; files using the same binary format as the C implementation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Use the self-hosted compiler&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;compiler/latc.lat&lt;span class="w"&gt; &lt;/span&gt;input.lat&lt;span class="w"&gt; &lt;/span&gt;output.latc

&lt;span class="c1"&gt;# Run the result&lt;/span&gt;
clat&lt;span class="w"&gt; &lt;/span&gt;output.latc
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The architecture mirrors the C compiler: lexing via the built-in &lt;code&gt;tokenize()&lt;/code&gt; function, a recursive-descent parser, single-pass code emission, and scope management with upvalue resolution. But Lattice's value semantics required some creative workarounds.&lt;/p&gt;
&lt;p&gt;The biggest constraint is that structs and maps are pass-by-value. In C, the compiler uses a &lt;code&gt;Compiler&lt;/code&gt; struct with mutable fields: local arrays, scope depth, a chunk pointer. In Lattice, passing a struct to a function creates a copy, so mutations in the callee don't propagate back. The self-hosted compiler works around this with parallel global arrays: &lt;code&gt;code&lt;/code&gt;, &lt;code&gt;constants&lt;/code&gt;, &lt;code&gt;c_lines&lt;/code&gt;, &lt;code&gt;local_names&lt;/code&gt;, &lt;code&gt;local_depths&lt;/code&gt;, &lt;code&gt;local_captured&lt;/code&gt;. Since array mutations via &lt;code&gt;.push()&lt;/code&gt; and index assignment are in-place (via &lt;code&gt;resolve_lvalue&lt;/code&gt;), global arrays work where structs don't.&lt;/p&gt;
&lt;p&gt;Nested function compilation uses explicit &lt;code&gt;save_compiler()&lt;/code&gt; / &lt;code&gt;restore_compiler()&lt;/code&gt; functions that copy all global arrays to local temporaries and back. It's verbose but correct. The Buffer type (used for serialization output) is also pass-by-value, so a global &lt;code&gt;ser_buf&lt;/code&gt; accumulates serialized bytes across function calls.&lt;/p&gt;
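&lt;p&gt;In Python terms the save/restore pattern looks roughly like this, with lists standing in for the parallel global arrays. The function names mirror the ones above, but the bodies are a guess at the shape, not the actual Lattice code:&lt;/p&gt;

```python
# Parallel global arrays, as in the self-hosted compiler's workaround.
code, constants, local_names = [], [], []

def save_compiler():
    """Snapshot the global arrays before compiling a nested function."""
    return (list(code), list(constants), list(local_names))

def restore_compiler(saved):
    """Copy the snapshot back in place once the nested chunk is finished."""
    code[:], constants[:], local_names[:] = saved

code.append("OP_NIL")
saved = save_compiler()
code.clear(); constants.clear(); local_names.clear()  # fresh state for the nested fn
code.append("OP_RETURN")
nested_chunk = list(code)
restore_compiler(saved)
assert code == ["OP_NIL"]
assert nested_chunk == ["OP_RETURN"]
```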
&lt;p&gt;Other language constraints: no &lt;code&gt;else if&lt;/code&gt; (requires &lt;code&gt;else { if ... }&lt;/code&gt; or &lt;code&gt;match&lt;/code&gt;), mandatory type annotations on function parameters (&lt;code&gt;fn foo(a: any)&lt;/code&gt;), and &lt;code&gt;test&lt;/code&gt; is a keyword so you can't use it as an identifier.&lt;/p&gt;
&lt;p&gt;The self-hosted compiler currently handles expressions, variables, functions with closures, control flow (if/else, while, loop, for, break, continue, match), structs, enums, exceptions, defer, string interpolation, and imports. Not yet implemented: concurrency primitives and advanced phase operations (react, bond, seed). The bootstrapping chain is:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;latc.lat → [C VM interprets] → output.latc → [C VM executes]
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Full self-hosting (where &lt;code&gt;latc.lat&lt;/code&gt; compiles itself) requires adding concurrency support and closing the remaining feature gaps.&lt;/p&gt;
&lt;h3&gt;The VM Execution Engine&lt;/h3&gt;
&lt;p&gt;The VM maintains a 4,096-slot value stack, a 256-frame call stack, an exception handler stack (64 entries), a defer stack (256 entries), a global environment, the open upvalue linked list, the ephemeral arena, and a module cache. A pre-allocated &lt;code&gt;fast_args[16]&lt;/code&gt; buffer avoids heap allocation for most native function calls.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;OP_CALL&lt;/code&gt; instruction discriminates three callee types. Native C functions (marked with &lt;code&gt;VM_NATIVE_MARKER&lt;/code&gt;) get the fast path: arguments are popped into &lt;code&gt;fast_args&lt;/code&gt;, the C function pointer is invoked, and the return value is pushed. No call frame allocated. Compiled closures get the full treatment: the VM promotes ephemeral values in the current frame (so the callee's &lt;code&gt;OP_RESET_EPHEMERAL&lt;/code&gt; doesn't invalidate the caller's temporaries), then pushes a new &lt;code&gt;CallFrame&lt;/code&gt; with the instruction pointer at byte 0 of the callee's chunk. Callable structs look up a constructor-named field and dispatch accordingly.&lt;/p&gt;
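&lt;p&gt;The discrimination can be sketched as the Python below, with dicts standing in for closure objects; the callable-struct case is omitted and every name is illustrative:&lt;/p&gt;

```python
NATIVE, COMPILED = "native", "compiled"

def call_value(callee, args, frames):
    """Dispatch a call: natives run inline, compiled closures push a frame."""
    if callee["kind"] == NATIVE:
        return callee["fn"](*args)        # fast path: no call frame allocated
    if callee["kind"] == COMPILED:
        # Full treatment: a new frame with the instruction pointer at byte 0.
        frames.append({"chunk": callee["chunk"], "ip": 0, "args": args})
        return None                        # result arrives when the frame returns
    raise TypeError("callable structs are omitted from this sketch")

frames = []
double = {"kind": NATIVE, "fn": lambda x: x * 2}
assert call_value(double, [21], frames) == 42 and frames == []

fn = {"kind": COMPILED, "chunk": b"...", "ip": 0}
call_value(fn, [1], frames)
assert len(frames) == 1 and frames[0]["ip"] == 0
```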
&lt;p&gt;Exception handling uses a handler stack. &lt;code&gt;OP_PUSH_EXCEPTION_HANDLER&lt;/code&gt; records the current IP, chunk, call frame index, and stack top. When &lt;code&gt;OP_THROW&lt;/code&gt; executes, the nearest handler is popped, the call frame and value stacks are unwound, the error value is pushed, and execution resumes at the handler's saved IP. Deferred blocks interact correctly: &lt;code&gt;OP_DEFER_RUN&lt;/code&gt; executes all defer entries registered at or above the current frame before the frame is popped by &lt;code&gt;OP_RETURN&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Iterators avoid closure allocation entirely. &lt;code&gt;OP_ITER_INIT&lt;/code&gt; converts a range or array into an internal iterator occupying two stack slots (collection + cursor index). &lt;code&gt;OP_ITER_NEXT&lt;/code&gt; advances the cursor, pushes the next element, or jumps to a specified offset when exhausted. The tree-walker used closure-based iterators for &lt;code&gt;for&lt;/code&gt; loops; the bytecode version is simpler and avoids the allocation.&lt;/p&gt;
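&lt;p&gt;Conceptually, the two-slot scheme reduces to a collection plus a cursor, with &lt;code&gt;OP_ITER_NEXT&lt;/code&gt; either producing an element or signaling the jump. A toy model (illustrative names, array case only):&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the two-slot internal iterator: one slot holds the
 * collection, the other the cursor index. Returns 0 when exhausted,
 * which in the real VM triggers the jump to the loop-exit offset. */
static int iter_next(const int *arr, size_t len, size_t *cursor, int *out) {
    if (*cursor >= len) return 0;   /* exhausted */
    *out = arr[(*cursor)++];        /* push next element, advance cursor */
    return 1;
}

static long demo_iter_sum(void) {
    int arr[] = {1, 2, 3, 4};
    size_t cursor = 0;
    long sum = 0;
    int v;
    while (iter_next(arr, 4, &cursor, &v))  /* for v in arr { ... } */
        sum += v;
    return sum;
}
```

No closure, no heap allocation: the loop's entire state lives in the two slots.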
&lt;h3&gt;Ref&amp;lt;T&amp;gt;: The Escape Hatch from Value Semantics&lt;/h3&gt;
&lt;p&gt;Everything described so far operates in a world where values are deep-cloned on every read. Maps are pass-by-value. Structs are pass-by-value. Pass a collection to a function and the function gets its own copy; mutations don't propagate back. This is correct and eliminates aliasing bugs, but it creates a real problem: how do you share mutable state when you actually need to?&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Ref&amp;lt;T&amp;gt;&lt;/code&gt; is the answer. It's a reference-counted shared mutable wrapper, the one type in Lattice that deliberately breaks value semantics:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;LatRef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// the wrapped inner value&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;refcount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// reference count&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When a &lt;code&gt;Ref&lt;/code&gt; is cloned (which happens on every variable read, like everything else), the VM bumps the refcount and copies the pointer. It does &lt;em&gt;not&lt;/em&gt; deep-clone the inner value. Multiple copies of a &lt;code&gt;Ref&lt;/code&gt; share the same underlying &lt;code&gt;LatRef&lt;/code&gt;, so mutations through one are visible through all others. This is the explicit opt-in to reference semantics that the rest of the language avoids.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;let r = Ref::new([1, 2, 3])
let r2 = r              // shallow copy — same LatRef
r.push(4)
print(r2.get())          // [1, 2, 3, 4] — shared state
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The VM provides transparent proxying: &lt;code&gt;OP_INDEX&lt;/code&gt;, &lt;code&gt;OP_SET_INDEX&lt;/code&gt;, and &lt;code&gt;OP_INVOKE&lt;/code&gt; all check for &lt;code&gt;VAL_REF&lt;/code&gt; and delegate to the inner value. Indexing into a &lt;code&gt;Ref&amp;lt;Array&amp;gt;&lt;/code&gt; indexes the inner array. Calling &lt;code&gt;.push()&lt;/code&gt; on a &lt;code&gt;Ref&amp;lt;Array&amp;gt;&lt;/code&gt; mutates the inner array directly. At the language level, a Ref mostly behaves like the value it wraps; you just get shared mutation instead of isolated copies.&lt;/p&gt;
&lt;p&gt;Ref has its own methods (&lt;code&gt;get()&lt;/code&gt;/&lt;code&gt;deref()&lt;/code&gt; to clone the inner value out, &lt;code&gt;set(v)&lt;/code&gt; to replace it, &lt;code&gt;inner_type()&lt;/code&gt; to inspect the wrapped type) plus proxied methods for whatever the inner value supports (map &lt;code&gt;set&lt;/code&gt;/&lt;code&gt;get&lt;/code&gt;/&lt;code&gt;keys&lt;/code&gt;, array &lt;code&gt;push&lt;/code&gt;/&lt;code&gt;pop&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;p&gt;The phase system applies to Refs too. Freezing a Ref blocks all mutation: &lt;code&gt;set()&lt;/code&gt;, &lt;code&gt;push()&lt;/code&gt;, index assignment all check &lt;code&gt;obj-&amp;gt;phase == VTAG_CRYSTAL&lt;/code&gt; and error with "cannot set on a frozen Ref." This makes frozen Refs safe to share across concurrent boundaries; they're immutable handles to immutable data.&lt;/p&gt;
&lt;p&gt;This introduces a third memory management strategy alongside the dual-heap (mark-and-sweep for fluid values, arenas for crystal values) and the ephemeral bump arena. Refs use reference counting: &lt;code&gt;ref_retain()&lt;/code&gt; on clone, &lt;code&gt;ref_release()&lt;/code&gt; on free, with the inner value freed when the count hits zero. It's a deliberate trade-off: reference counting is simple and deterministic, and since Refs are the uncommon case (most Lattice code uses value semantics), the lack of cycle collection hasn't been an issue in practice.&lt;/p&gt;
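&lt;p&gt;The retain/release discipline is the standard one. A toy version, with an &lt;code&gt;int&lt;/code&gt; standing in for the wrapped value and hypothetical names, shows both the shared mutation and the free-at-zero behavior:&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int    value;     /* stand-in for the wrapped inner LatValue */
    size_t refcount;
} LatRefSketch;

static LatRefSketch *ref_new(int v) {
    LatRefSketch *r = malloc(sizeof *r);
    r->value = v;
    r->refcount = 1;
    return r;
}

/* Cloning a Ref copies the pointer and bumps the count; it does not
 * deep-clone the inner value. */
static LatRefSketch *ref_retain(LatRefSketch *r) { r->refcount++; return r; }

static size_t ref_release(LatRefSketch *r) {
    if (--r->refcount == 0) { free(r); return 0; }  /* inner value freed */
    return r->refcount;
}

static int demo_shared_mutation(void) {
    LatRefSketch *a = ref_new(3);
    LatRefSketch *b = ref_retain(a);  /* "clone" shares the same LatRef */
    a->value = 7;                     /* mutation visible through b     */
    int seen = b->value;
    ref_release(b);
    ref_release(a);                   /* count hits zero, storage freed */
    return seen;
}
```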
&lt;h3&gt;Validation&lt;/h3&gt;
&lt;p&gt;The VM is validated by &lt;strong&gt;815 tests&lt;/strong&gt; covering every feature: arithmetic, closures, upvalues, phase transitions, exception handling, defer, iterators, data structures, concurrency, modules, bytecode serialization, and the self-hosted compiler.&lt;/p&gt;
&lt;p&gt;All 815 tests pass under both normal compilation and AddressSanitizer builds (&lt;code&gt;make asan&lt;/code&gt;), which dynamically checks for heap buffer overflows, use-after-free, stack buffer overflows, and memory leaks. For a VM with manual memory management, upvalue lifetime tracking, and an ephemeral arena that reclaims memory at statement boundaries, ASan validation is essential.&lt;/p&gt;
&lt;p&gt;Both execution modes, the bytecode VM (the default) and the tree-walker (&lt;code&gt;--tree-walk&lt;/code&gt;), share the same test suite and produce identical results:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;make&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;# bytecode VM: 815 passed&lt;/span&gt;
make&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;TREE_WALK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;# tree-walker: 815 passed&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Feature parity is complete:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th style="text-align: center;"&gt;Tree-walker&lt;/th&gt;
&lt;th style="text-align: center;"&gt;Bytecode VM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phase system (freeze/thaw/clone/forge)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Closures with upvalues&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exception handling (try/catch/throw)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defer blocks&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pattern matching&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structs with methods&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enums with payloads&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arrays, maps, tuples, sets, buffers&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iterators (for-in, ranges)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Module imports&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency (scope/spawn/select)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase reactions/bonds/seeds&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contracts (require/ensure)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variable tracking (history)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bytecode serialization (.latc)&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Computed goto dispatch&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ephemeral bump arena&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialized integer ops&lt;/td&gt;
&lt;td style="text-align: center;"&gt;---&lt;/td&gt;
&lt;td style="text-align: center;"&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The last four rows are VM-only features that have no tree-walker equivalent.&lt;/p&gt;
&lt;h3&gt;What's Next&lt;/h3&gt;
&lt;p&gt;The VM is feature-complete but not performance-optimized. The obvious next steps are register allocation to reduce stack traffic, type-specialized dispatch paths guided by runtime profiling, tail call optimization for recursive patterns, and constant pool deduplication across compilation units. Further out, the bytecode provides a natural intermediate representation for JIT compilation.&lt;/p&gt;
&lt;p&gt;On the self-hosting front, adding concurrency primitives to &lt;code&gt;latc.lat&lt;/code&gt; would close the gap to full self-compilation, where the Lattice compiler compiles itself, producing a &lt;code&gt;.latc&lt;/code&gt; file that can then compile other programs without the C implementation in the loop.&lt;/p&gt;
&lt;p&gt;The full technical details (including encoding diagrams, the complete opcode listing, compilation walkthroughs, and references to related work in Lua, CPython, YARV, and WebAssembly) are in the &lt;a href="https://tinycomputers.io/papers/lattice_vm.pdf"&gt;research paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The source code is at &lt;a href="https://baud.rs/fIe3gx"&gt;github.com/ajokela/lattice&lt;/a&gt;, and the project site is at &lt;a href="https://baud.rs/bwvnYT"&gt;lattice-lang.org&lt;/a&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;git clone https://github.com/ajokela/lattice.git
cd lattice &amp;amp;&amp;amp; make
./clat
&lt;/pre&gt;&lt;/div&gt;</description><category>bytecode</category><category>c</category><category>closures</category><category>compilers</category><category>concurrency</category><category>interpreters</category><category>language design</category><category>lattice</category><category>phase system</category><category>programming languages</category><category>self-hosting</category><category>serialization</category><category>upvalues</category><category>virtual machine</category><guid>https://tinycomputers.io/posts/a-stack-based-bytecode-vm-for-lattice.html</guid><pubDate>Fri, 20 Feb 2026 18:00:00 GMT</pubDate></item><item><title>From Tree-Walker to Bytecode VM: Compiling Lattice</title><link>https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/from-tree-walker-to-bytecode-vm-compiling-lattice_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;16 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Lattice is a programming language built around a &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;crystallization-based phase system&lt;/a&gt;: values start as mutable "flux" and can be frozen into immutable "fix," with the runtime enforcing the transition and providing &lt;a href="https://tinycomputers.io/posts/mutability-as-a-first-class-concept-the-lattice-phase-system.html"&gt;reactions, bonds, contracts, and temporal tracking&lt;/a&gt; around it. It's implemented in C with no external dependencies.&lt;/p&gt;
&lt;p&gt;When I started building &lt;a href="https://baud.rs/q5yFwI"&gt;Lattice&lt;/a&gt;, a tree-walking interpreter was the obvious first move. You parse source into an AST, walk the nodes recursively, and evaluate as you go. It's straightforward, easy to debug, and lets you iterate on language semantics quickly without worrying about a second representation. &lt;a href="https://baud.rs/crafting-interpreters"&gt;&lt;em&gt;Crafting Interpreters&lt;/em&gt;&lt;/a&gt; calls this approach "the simplest way to build an interpreter," and it's right.&lt;/p&gt;
&lt;p&gt;But tree-walkers have well-known limitations. Every expression evaluation descends through function calls: &lt;code&gt;eval_expr&lt;/code&gt; calling &lt;code&gt;eval_binary&lt;/code&gt; calling &lt;code&gt;eval_expr&lt;/code&gt; twice more. The overhead compounds. You're chasing pointers through heap-allocated AST nodes with poor cache locality. And the call stack of the host language (C, in Lattice's case) becomes tangled with the call stack of the guest language, making it harder to implement features like error recovery and coroutines cleanly.&lt;/p&gt;
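&lt;p&gt;A toy evaluator makes the shape of that overhead concrete. This is a generic sketch, not Lattice's actual &lt;code&gt;eval_expr&lt;/code&gt;: every node costs a function call and a pointer chase through a heap-allocated tree.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/* Minimal tree-walking evaluator: eval_expr recurses into itself once
 * per operand, so a three-node expression costs five calls. */
typedef enum { N_NUM, N_ADD, N_MUL } NodeTag;

typedef struct Node {
    NodeTag tag;
    double  num;              /* N_NUM */
    struct Node *lhs, *rhs;   /* N_ADD / N_MUL */
} Node;

static double eval_expr(const Node *n) {
    switch (n->tag) {
    case N_NUM: return n->num;
    case N_ADD: return eval_expr(n->lhs) + eval_expr(n->rhs);
    case N_MUL: return eval_expr(n->lhs) * eval_expr(n->rhs);
    }
    return 0;
}

static double demo_eval(void) {   /* evaluates (2 + 3) * 4 */
    Node two   = {N_NUM, 2, 0, 0};
    Node three = {N_NUM, 3, 0, 0};
    Node four  = {N_NUM, 4, 0, 0};
    Node add   = {N_ADD, 0, &two, &three};
    Node mul   = {N_MUL, 0, &add, &four};
    return eval_expr(&mul);
}
```

A bytecode compiler flattens the same tree into a linear sequence (push 2, push 3, add, push 4, mul), trading the recursion for a tight dispatch loop over contiguous memory.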
&lt;p&gt;Lattice v0.3.0 shipped a bytecode compiler and stack-based virtual machine alongside the tree-walker. In v0.3.1, the bytecode VM became the default for file execution, the interactive REPL, and the browser-based playground. The tree-walker is still available via &lt;code&gt;--tree-walk&lt;/code&gt;, but the VM now handles everything. This post walks through the architecture of that VM, some design decisions that turned out to matter, and a mutation bug that only surfaces when you combine deep-clone-on-read semantics with in-place method dispatch.&lt;/p&gt;
&lt;h3&gt;Architecture Overview&lt;/h3&gt;
&lt;p&gt;The bytecode pipeline has three stages: lexing and parsing (shared with the tree-walker), compilation from AST to bytecode chunks, and execution on a stack-based VM. The compiler and VM together add about 8,200 lines of C to the codebase, bringing the total to around 33,000 lines.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;Chunk&lt;/code&gt; is the compilation unit: a dynamic array of bytecode instructions, a constant pool, and debug metadata mapping instructions back to source line numbers. The compiler walks the AST and emits bytes into a chunk. The VM reads bytes from the chunk and executes them against a value stack.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;// bytecode array&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;// constant pool&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;const_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;const_cap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;// source line per instruction&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;local_names&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// slot → variable name (debug)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;local_name_cap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The VM itself is a &lt;code&gt;for(;;)&lt;/code&gt; loop with a &lt;code&gt;switch&lt;/code&gt; on the current opcode byte, the textbook approach. No computed gotos, no threaded dispatch, no JIT. Just a switch. On modern hardware with branch prediction, a well-organized switch over 62 opcodes is fast enough that the overhead is negligible compared to the cost of actual operations (string allocation, hash table lookups, deep cloning).&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(;;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_CONSTANT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_ADD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_CALL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// 59 more cases&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The value stack holds 4,096 slots. The call frame stack holds 256 frames. Each &lt;code&gt;CallFrame&lt;/code&gt; tracks its own instruction pointer, a base pointer into the value stack for its local variables, and an array of captured upvalues for closures. When you call a function, the VM pushes a new frame pointing at the callee's chunk. When the function returns, the frame pops and execution resumes in the caller.&lt;/p&gt;
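&lt;p&gt;The frame discipline can be sketched in miniature (illustrative names, fields trimmed to the two described above):&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES_MAX 256

typedef struct {
    size_t ip;    /* instruction pointer into the callee's chunk      */
    size_t base;  /* base of this frame's locals on the value stack   */
} FrameSketch;

static FrameSketch frames[FRAMES_MAX];
static size_t frame_top;

static int push_frame(size_t stack_base) {
    if (frame_top >= FRAMES_MAX) return 0;  /* frame stack overflow */
    frames[frame_top].ip = 0;               /* execution starts at byte 0 */
    frames[frame_top].base = stack_base;
    frame_top++;
    return 1;
}

static void pop_frame(void) { frame_top--; }

static size_t demo_frames(void) {
    frame_top = 0;
    push_frame(0);   /* top-level script frame                         */
    push_frame(5);   /* call: callee's locals begin at stack slot 5    */
    pop_frame();     /* return: execution resumes in the caller        */
    return frame_top;
}
```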
&lt;h3&gt;The Instruction Set&lt;/h3&gt;
&lt;p&gt;Lattice's instruction set has 62 opcodes. Some are standard (&lt;code&gt;OP_ADD&lt;/code&gt;, &lt;code&gt;OP_JUMP_IF_FALSE&lt;/code&gt;, &lt;code&gt;OP_RETURN&lt;/code&gt;). Others exist because of Lattice-specific semantics.&lt;/p&gt;
&lt;p&gt;The phase system needs dedicated opcodes. &lt;code&gt;OP_FREEZE&lt;/code&gt; pops a value, deep-clones it into a crystal region with &lt;code&gt;VTAG_CRYSTAL&lt;/code&gt; tags, and pushes the frozen result. &lt;code&gt;OP_THAW&lt;/code&gt; does the reverse. &lt;code&gt;OP_MARK_FLUID&lt;/code&gt; sets the phase tag to &lt;code&gt;VTAG_FLUID&lt;/code&gt;; this is what &lt;code&gt;flux&lt;/code&gt; bindings emit after their initializer. &lt;code&gt;OP_FREEZE_VAR&lt;/code&gt; and &lt;code&gt;OP_THAW_VAR&lt;/code&gt; handle the case where &lt;code&gt;freeze(x)&lt;/code&gt; targets a named variable and needs to write back the result, carrying extra operands to identify the variable's location (local slot, upvalue, or global name).&lt;/p&gt;
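&lt;p&gt;In miniature, the freeze semantics amount to tagging a clone as crystal and refusing subsequent writes. This sketch compresses the real deep-clone into a shallow copy and uses stand-in types:&lt;/p&gt;

```c
#include <assert.h>

typedef enum { VTAG_FLUID, VTAG_CRYSTAL } PhaseTag;

typedef struct {
    int      data;   /* stand-in for the real value payload */
    PhaseTag phase;
} PhasedVal;

/* OP_FREEZE sketch: produce a clone tagged VTAG_CRYSTAL. The real
 * opcode deep-clones the value into a crystal region. */
static PhasedVal freeze_val(PhasedVal v) {
    PhasedVal frozen = v;
    frozen.phase = VTAG_CRYSTAL;
    return frozen;
}

/* Mutation checks the phase tag and refuses writes to crystal values. */
static int set_val(PhasedVal *v, int x) {
    if (v->phase == VTAG_CRYSTAL) return 0;  /* mutation refused */
    v->data = x;
    return 1;
}

static int demo_phase(void) {
    PhasedVal v = {1, VTAG_FLUID};
    PhasedVal f = freeze_val(v);
    if (set_val(&f, 9)) return -1;   /* frozen clone must refuse    */
    if (!set_val(&v, 9)) return -2;  /* fluid original still mutable */
    return v.data;
}
```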
&lt;p&gt;Phase reactions and bonds each have their own opcodes: &lt;code&gt;OP_REACT&lt;/code&gt;, &lt;code&gt;OP_UNREACT&lt;/code&gt;, &lt;code&gt;OP_BOND&lt;/code&gt;, &lt;code&gt;OP_UNBOND&lt;/code&gt;, &lt;code&gt;OP_SEED&lt;/code&gt;, &lt;code&gt;OP_UNSEED&lt;/code&gt;. These could theoretically be implemented as native function calls, but making them opcodes lets the compiler emit the variable name as a constant operand. The VM needs that name to look up the correct reaction/bond registration in its tracking tables, and encoding it in the bytecode avoids a runtime string lookup.&lt;/p&gt;
&lt;p&gt;Structured concurrency uses an interesting hybrid. &lt;code&gt;OP_SCOPE&lt;/code&gt; and &lt;code&gt;OP_SELECT&lt;/code&gt; each carry a constant-pool index that stores a pointer to the original AST &lt;code&gt;Expr*&lt;/code&gt; node. When the VM hits one of these opcodes, it invokes the tree-walking evaluator on that subtree. This is a deliberate design choice; the concurrency primitives involve spawning threads and managing channels, which requires the evaluator's full environment machinery. Rather than reimplement all of that in the VM, the bytecode compiler punts to the tree-walker for these specific constructs. The rest of the program runs on the VM; only &lt;code&gt;scope&lt;/code&gt; and &lt;code&gt;select&lt;/code&gt; blocks briefly drop into interpretation.&lt;/p&gt;
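&lt;p&gt;The mechanism reduces to stashing an opaque pointer in the constant pool and handing it back to the evaluator at runtime. A sketch with hypothetical names (the real pool entries, &lt;code&gt;Expr&lt;/code&gt; layout, and evaluator signature differ):&lt;/p&gt;

```c
#include <assert.h>

/* Stand-in AST node; the real Expr carries the scope/select subtree. */
typedef struct Expr { int payload; } Expr;

/* Constant pool entries normally hold values; for OP_SCOPE/OP_SELECT
 * the entry holds a raw pointer to the AST node. */
typedef union { double num; void *ptr; } Const;

/* Stand-in for the tree-walking evaluator being delegated to. */
static int eval_scope_via_treewalker(const Expr *e) {
    return e->payload * 2;
}

static int demo_hybrid(void) {
    Expr scope_node = { 21 };
    Const pool[4];
    pool[0].ptr = &scope_node;                  /* compiler stores Expr*  */
    const Expr *e = (const Expr *)pool[0].ptr;  /* VM reads it back at
                                                   OP_SCOPE and delegates */
    return eval_scope_via_treewalker(e);
}
```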
&lt;h3&gt;Closures and Upvalues&lt;/h3&gt;
&lt;p&gt;Closures are where bytecode VMs get interesting, and Lattice follows the upvalue model that Lua pioneered and Crafting Interpreters popularized.&lt;/p&gt;
&lt;p&gt;When a function is defined inside another function and references variables from the enclosing scope, those variables need to outlive their original stack frame. The solution is upvalues, indirection objects that start pointing into the stack and get "closed over" when the variable goes out of scope.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;ObjUpvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// points to stack slot or &amp;amp;closed&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;closed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;// holds value after scope exit&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;ObjUpvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;// linked list for open upvalues&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ObjUpvalue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;While the enclosing function is still executing, &lt;code&gt;location&lt;/code&gt; points directly at the stack slot. When the enclosing function returns, &lt;code&gt;OP_CLOSE_UPVALUE&lt;/code&gt; copies the stack value into the &lt;code&gt;closed&lt;/code&gt; field and repoints &lt;code&gt;location&lt;/code&gt; to &lt;code&gt;&amp;amp;closed&lt;/code&gt;. The closure never notices the repointing; it always dereferences &lt;code&gt;location&lt;/code&gt;. This is why upvalues work: they're a level of indirection that transparently survives stack frame destruction.&lt;/p&gt;
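&lt;p&gt;The mechanism is easy to demonstrate in isolation. This sketch (not Lattice's actual &lt;code&gt;ObjUpvalue&lt;/code&gt;) shows a closed-over value surviving reuse of its stack slot:&lt;/p&gt;

```c
#include <assert.h>

typedef struct {
    double *location;  /* points at a stack slot, or at &closed */
    double  closed;    /* holds the value after scope exit      */
} UpvalSketch;

/* OP_CLOSE_UPVALUE: copy the stack value in, repoint location. */
static void close_upvalue(UpvalSketch *uv) {
    uv->closed = *uv->location;
    uv->location = &uv->closed;
}

static double demo_upvalue(void) {
    double stack_slot = 42;               /* local in the enclosing frame */
    UpvalSketch uv = { &stack_slot, 0 };  /* open upvalue                 */
    *uv.location += 1;                    /* closure mutates through it   */
    close_upvalue(&uv);                   /* enclosing function returns   */
    stack_slot = -999;                    /* slot gets reused             */
    return *uv.location;                  /* closed copy is unaffected    */
}
```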
&lt;p&gt;The compiler resolves variable references in three stages: first it checks local scope (&lt;code&gt;resolve_local&lt;/code&gt;), then upvalues (&lt;code&gt;resolve_upvalue&lt;/code&gt;, which walks the compiler chain recursively), then falls back to globals via &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;. The &lt;code&gt;OP_CLOSURE&lt;/code&gt; instruction is followed by a series of &lt;code&gt;(is_local, index)&lt;/code&gt; byte pairs, one per upvalue, telling the VM whether to capture from the current frame's stack or from the parent frame's upvalue array.&lt;/p&gt;
&lt;p&gt;A concrete example makes this clearer. Consider a counter factory:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn make_counter() {
    flux count = 0
    return |n| { count += n; count }
}

let c = make_counter()
print(c(5))   // 5
print(c(3))   // 8
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When &lt;code&gt;make_counter&lt;/code&gt; returns, its stack frame is destroyed, but &lt;code&gt;count&lt;/code&gt; needs to survive, because the returned closure references it. During compilation, the compiler sees that the closure's body references &lt;code&gt;count&lt;/code&gt;, which is local to the enclosing &lt;code&gt;make_counter&lt;/code&gt;. It emits an &lt;code&gt;(is_local=true, index=1)&lt;/code&gt; upvalue descriptor. At runtime, &lt;code&gt;OP_CLOSURE&lt;/code&gt; calls &lt;code&gt;capture_upvalue()&lt;/code&gt;, which either reuses an existing &lt;code&gt;ObjUpvalue&lt;/code&gt; pointing at that stack slot or creates a new one. When &lt;code&gt;make_counter&lt;/code&gt; returns, &lt;code&gt;OP_CLOSE_UPVALUE&lt;/code&gt; copies the stack value of &lt;code&gt;count&lt;/code&gt; into the upvalue's &lt;code&gt;closed&lt;/code&gt; field and repoints &lt;code&gt;location&lt;/code&gt;. The closure keeps working, oblivious to the frame being gone.&lt;/p&gt;
&lt;p&gt;One implementation detail worth noting: Lattice stores the upvalue array by repurposing the closure's &lt;code&gt;captured_env&lt;/code&gt; field (normally an &lt;code&gt;Env*&lt;/code&gt; in the tree-walker) and the upvalue count in the &lt;code&gt;region_id&lt;/code&gt; field. This avoids adding new fields to the &lt;code&gt;LatValue&lt;/code&gt; union, which matters when values are deep-cloned frequently, since every field adds to the clone cost.&lt;/p&gt;
&lt;h3&gt;Compiling for the REPL&lt;/h3&gt;
&lt;p&gt;A REPL that runs on a bytecode VM needs different compilation from file execution. The difference is small but important.&lt;/p&gt;
&lt;p&gt;In file mode, &lt;code&gt;compile_module()&lt;/code&gt; compiles a complete program and terminates with &lt;code&gt;OP_UNIT; OP_RETURN&lt;/code&gt;; the module returns unit, and any expression results along the way are discarded with &lt;code&gt;OP_POP&lt;/code&gt;. This is the right behavior for scripts: you don't want every intermediate expression to accumulate on the stack.&lt;/p&gt;
&lt;p&gt;In REPL mode, &lt;code&gt;compile_repl()&lt;/code&gt; needs the opposite behavior for the last expression. When you type &lt;code&gt;42&lt;/code&gt; at the REPL prompt, you want to see &lt;code&gt;=&amp;gt; 42&lt;/code&gt;. So if the last item in the compiled chunk is a bare expression statement, &lt;code&gt;compile_repl()&lt;/code&gt; compiles the expression but &lt;em&gt;skips the &lt;code&gt;OP_POP&lt;/code&gt;&lt;/em&gt;, leaving the value on the stack. Then it emits &lt;code&gt;OP_RETURN&lt;/code&gt;, and the VM receives the value as the chunk's return value.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;last_is_expr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ITEM_STMT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;STMT_EXPR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_is_expr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;emit_byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OP_RETURN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;// value already on stack&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;emit_byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OP_UNIT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;// no expression — return unit&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;emit_byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OP_RETURN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For function, struct, and enum definitions, the result is unit, and the REPL silently suppresses the &lt;code&gt;=&amp;gt;&lt;/code&gt; output. This matches user expectations: defining a function shouldn't print anything. The effect in practice:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"hello"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;" world"&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"hello world"&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;lattice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;49&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each line is independently compiled and executed on the persistent VM. Globals defined in one line (&lt;code&gt;flux x = 10&lt;/code&gt;) are visible in subsequent lines because they're stored in the VM's environment, which persists across iterations. The &lt;code&gt;Chunk&lt;/code&gt; for each line is freed after execution; constants that matter (like global variable values) have already been deep-cloned into the environment.&lt;/p&gt;
&lt;p&gt;The other critical difference is enum persistence. &lt;code&gt;compile_module()&lt;/code&gt; frees its known-enum registry after compilation, because the compiler is done. &lt;code&gt;compile_repl()&lt;/code&gt; must not, because enums defined in REPL iteration N need to be visible in iteration N+1. The REPL calls &lt;code&gt;compiler_free_known_enums()&lt;/code&gt; only on exit. The same lifetime concern applies to parsed programs; struct and function declarations store &lt;code&gt;Expr*&lt;/code&gt; pointers that compiled chunks reference at runtime. The REPL accumulates all parsed programs in a dynamic array and frees them only when the session ends.&lt;/p&gt;
&lt;h3&gt;The Global Mutation Bug&lt;/h3&gt;
&lt;p&gt;This is the story I find most instructive, because it reveals a subtle interaction between two independently reasonable design decisions.&lt;/p&gt;
&lt;p&gt;Lattice has &lt;strong&gt;deep-clone-on-read&lt;/strong&gt; semantics. When you access a variable, the environment doesn't hand you a reference to the stored value; it hands you a fresh deep clone. This eliminates aliasing entirely: two variables never share underlying memory, passing a map to a function gives the function its own copy, and there's no way to create spooky action at a distance through shared mutable state.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;env_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lat_map_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;value_deep_clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// always a fresh copy&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is expensive but correct. It gives Lattice pure value semantics without needing a borrow checker or persistent data structures.&lt;/p&gt;
&lt;p&gt;The tree-walking evaluator handles in-place mutation (like &lt;code&gt;array.push()&lt;/code&gt;) with a separate &lt;code&gt;resolve_lvalue()&lt;/code&gt; mechanism that obtains a direct mutable pointer into the environment's storage, bypassing the deep clone. Push, pop, index assignment: these all go through &lt;code&gt;resolve_lvalue&lt;/code&gt; and mutate the stored value directly.&lt;/p&gt;
&lt;p&gt;The bytecode VM needed the same distinction. For local variables, this is straightforward: locals live on the value stack, and the VM has a direct pointer to them via &lt;code&gt;frame-&amp;gt;slots[slot]&lt;/code&gt;. I added &lt;code&gt;OP_INVOKE_LOCAL&lt;/code&gt;, which takes a stack slot index as an operand and passes a pointer to &lt;code&gt;vm_invoke_builtin()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_INVOKE_LOCAL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;slots&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// direct pointer&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm_invoke_builtin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_var_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// builtin mutated obj in-place — mutation persists&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... fall through to closure/method dispatch&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When &lt;code&gt;.push()&lt;/code&gt; grows the array by reallocating &lt;code&gt;obj-&amp;gt;as.array.elems&lt;/code&gt; and incrementing &lt;code&gt;obj-&amp;gt;as.array.len&lt;/code&gt;, it's directly modifying the stack slot. The mutation persists because &lt;code&gt;obj&lt;/code&gt; &lt;em&gt;is&lt;/em&gt; the variable.&lt;/p&gt;
&lt;p&gt;For globals, the situation is different. Globals live in the environment (a scope-chain of hash maps), and &lt;code&gt;env_get()&lt;/code&gt; deep-clones. The generic &lt;code&gt;OP_INVOKE&lt;/code&gt; opcode works by evaluating the receiver expression onto the stack (which, for a global variable, means emitting &lt;code&gt;OP_GET_GLOBAL&lt;/code&gt;, which calls &lt;code&gt;env_get()&lt;/code&gt;, which deep-clones) and then dispatching the method on the cloned value. After the builtin mutates the clone, &lt;code&gt;OP_INVOKE&lt;/code&gt; pops and &lt;em&gt;frees&lt;/em&gt; it. The mutation vanishes.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux nums = [1, 2, 3]
nums.push(4)
print(nums)  // still [1, 2, 3] — the push mutated a clone
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the kind of bug that's obvious in retrospect but invisible when you're implementing things one piece at a time. &lt;code&gt;env_get()&lt;/code&gt; deep-cloning is correct. &lt;code&gt;OP_INVOKE&lt;/code&gt; popping the receiver after dispatch is correct. Each piece behaves correctly in isolation. The bug emerges from their composition.&lt;/p&gt;
&lt;p&gt;The fix is &lt;code&gt;OP_INVOKE_GLOBAL&lt;/code&gt;, a new opcode that knows the receiver is a global variable and writes back after mutation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;OP_INVOKE_GLOBAL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;READ_BYTE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;method_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;LatValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;env_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;VM_ERROR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"undefined variable '%s'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm_invoke_builtin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;arg_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* handle error */&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// Write back the mutated clone to the environment&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;env_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vm&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;global_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj_val&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... fall through for non-builtin methods&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The compiler emits &lt;code&gt;OP_INVOKE_GLOBAL&lt;/code&gt; when it sees a method call on an identifier that isn't a local variable or an upvalue:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;EXPR_METHOD_CALL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXPR_IDENT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;resolve_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;// ... emit OP_INVOKE_LOCAL&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;upvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;resolve_upvalue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upvalue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;// Not local, not upvalue — must be global&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;// ... emit OP_INVOKE_GLOBAL&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... fall through to generic OP_INVOKE&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This gives us three tiers of method dispatch: &lt;code&gt;OP_INVOKE_LOCAL&lt;/code&gt; for locals (direct pointer, no clone), &lt;code&gt;OP_INVOKE_GLOBAL&lt;/code&gt; for globals (clone + write-back), and &lt;code&gt;OP_INVOKE&lt;/code&gt; for everything else (computed receivers like &lt;code&gt;get_array().push(x)&lt;/code&gt;, where there's nothing to write back to). With the fix:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux nums = [1, 2, 3]
nums.push(4)
nums.push(5)
print(nums)  // [1, 2, 3, 4, 5] — mutations persist
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All mutating builtins (&lt;code&gt;push&lt;/code&gt;, &lt;code&gt;pop&lt;/code&gt;, &lt;code&gt;set&lt;/code&gt;, &lt;code&gt;remove&lt;/code&gt;, &lt;code&gt;insert&lt;/code&gt;, &lt;code&gt;remove_at&lt;/code&gt;) now work correctly on global variables. The same pattern applies to maps, sets, and any other type with in-place methods.&lt;/p&gt;
&lt;p&gt;The broader lesson is that deep-clone-on-read semantics create an impedance mismatch with in-place mutation. In a reference-based language, &lt;code&gt;obj.push(x)&lt;/code&gt; just works; &lt;code&gt;obj&lt;/code&gt; is a reference, and the mutation happens wherever the reference points. In a value-based language, you need to explicitly handle the write-back for every level of variable storage. The tree-walker's &lt;code&gt;resolve_lvalue&lt;/code&gt; is one solution. The VM's tiered invoke opcodes are another. Both exist because of the same underlying tension.&lt;/p&gt;
&lt;h3&gt;The WASM Playground&lt;/h3&gt;
&lt;p&gt;Lattice's browser-based &lt;a href="https://baud.rs/odS816"&gt;playground&lt;/a&gt; compiles the entire VM to WebAssembly via Emscripten. The WASM API exposes four functions: &lt;code&gt;lat_init()&lt;/code&gt;, &lt;code&gt;lat_run_line()&lt;/code&gt;, &lt;code&gt;lat_is_complete()&lt;/code&gt;, and &lt;code&gt;lat_destroy()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The playground runs the same bytecode VM as the native binary. Each line of input goes through the same pipeline: lex, parse, &lt;code&gt;compile_repl()&lt;/code&gt;, &lt;code&gt;vm_run()&lt;/code&gt;. The &lt;code&gt;lat_is_complete()&lt;/code&gt; function checks bracket depth to determine whether the user is mid-expression, enabling multi-line input by waiting for balanced braces before compiling.&lt;/p&gt;
&lt;p&gt;Previously the playground used the tree-walking evaluator, which meant code could behave differently in the browser than on the command line. Switching the WASM build to the bytecode VM eliminates that inconsistency; the playground, the REPL, and file execution all use the same compilation and execution path.&lt;/p&gt;
&lt;h3&gt;What Didn't Change&lt;/h3&gt;
&lt;p&gt;It's worth noting what the bytecode VM &lt;em&gt;doesn't&lt;/em&gt; change about Lattice.&lt;/p&gt;
&lt;p&gt;The value representation is identical. A &lt;code&gt;LatValue&lt;/code&gt; is still a tagged union with a type tag, phase tag, and payload. Phase transitions still deep-clone data across heap regions. The dual-heap architecture (mark-and-sweep for fluid data, arena-based regions for crystal data) is unchanged. Global variables still live in a scope-chain environment.&lt;/p&gt;
&lt;p&gt;The parser and AST are completely shared. The compiler reads the same &lt;code&gt;Program&lt;/code&gt; structure that the tree-walker reads. A single set of test programs validates both execution paths, and all 771 tests pass on both.&lt;/p&gt;
&lt;p&gt;The phase system compiles one-to-one. &lt;code&gt;freeze()&lt;/code&gt; becomes &lt;code&gt;OP_FREEZE&lt;/code&gt;. &lt;code&gt;thaw()&lt;/code&gt; becomes &lt;code&gt;OP_THAW&lt;/code&gt;. Bonds, reactions, seeds, pressure constraints: each has a corresponding opcode that does exactly what the tree-walker's evaluator function did, just driven by bytecode dispatch instead of recursive AST traversal.&lt;/p&gt;
&lt;h3&gt;Performance&lt;/h3&gt;
&lt;p&gt;I haven't done rigorous benchmarking, and I'm deliberately not making performance claims. The motivation for the bytecode VM wasn't speed; it was consistency (one execution path everywhere) and architectural cleanliness (the VM is easier to extend than the tree-walker's deeply nested switch statements).&lt;/p&gt;
&lt;p&gt;That said, bytecode VMs are generally faster than tree-walkers for the structural reasons mentioned earlier: better cache locality (sequential byte array vs. pointer-chasing through AST nodes), less call overhead (one switch dispatch vs. recursive function calls), and a compact representation that fits more of the program in cache. Whether this matters for Lattice programs depends on the workload. For a language whose core runtime cost is dominated by deep cloning, the dispatch overhead is rarely the bottleneck.&lt;/p&gt;
&lt;h3&gt;Looking Forward&lt;/h3&gt;
&lt;p&gt;The VM is feature-complete but not optimized. There's no constant folding, no dead code elimination, no register allocation (it's a pure stack machine). The &lt;code&gt;OP_SCOPE&lt;/code&gt; and &lt;code&gt;OP_SELECT&lt;/code&gt; concurrency opcodes still delegate to the tree-walker. The dispatch loop is a plain switch rather than computed gotos.&lt;/p&gt;
&lt;p&gt;These are all well-understood optimizations with clear implementation paths. The point of v0.3.1 is that the bytecode VM is now the default, passes all tests, and handles the full language surface including the phase system. Optimization is a separate project.&lt;/p&gt;
&lt;p&gt;The source code is at &lt;a href="https://baud.rs/fIe3gx"&gt;github.com/ajokela/lattice&lt;/a&gt;, and you can try it in the browser at &lt;a href="https://baud.rs/bwvnYT"&gt;lattice-lang.web.app&lt;/a&gt;. The bytecode VM, compiler, REPL, and all 100 opcodes are in four files: &lt;code&gt;compiler.c&lt;/code&gt;, &lt;code&gt;vm.c&lt;/code&gt;, &lt;code&gt;chunk.c&lt;/code&gt;, and &lt;code&gt;opcode.c&lt;/code&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;git clone https://github.com/ajokela/lattice.git
cd lattice &amp;amp;&amp;amp; make
./clat
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Recommended Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/crafting-interpreters"&gt;&lt;em&gt;Crafting Interpreters&lt;/em&gt;&lt;/a&gt; by Robert Nystrom - The definitive guide to building interpreters and bytecode VMs, and a major influence on Lattice's upvalue implementation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/P6ofTE"&gt;&lt;em&gt;Writing A Compiler In Go&lt;/em&gt;&lt;/a&gt; by Thorsten Ball - Practical companion covering bytecode compilation and stack-based VMs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/BSTqlt"&gt;&lt;em&gt;Engineering a Compiler&lt;/em&gt;&lt;/a&gt; by Cooper &amp;amp; Torczon - Comprehensive treatment of compiler internals from front-end to optimization&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/JhMFPU"&gt;&lt;em&gt;Compilers: Principles, Techniques, and Tools&lt;/em&gt;&lt;/a&gt; by Aho, Lam, Sethi, Ullman - The classic &lt;em&gt;Dragon Book&lt;/em&gt; covering parsing, code generation, and optimization theory&lt;/li&gt;
&lt;/ul&gt;</description><category>bytecode</category><category>c</category><category>compilers</category><category>interpreters</category><category>language design</category><category>lattice</category><category>programming languages</category><category>virtual machine</category><guid>https://tinycomputers.io/posts/from-tree-walker-to-bytecode-vm-compiling-lattice.html</guid><pubDate>Tue, 17 Feb 2026 18:00:00 GMT</pubDate></item></channel></rss>