Lattice is a programming language I designed. Its central feature is the phase system: every runtime value carries a mutability tag that transitions between states the way matter moves between liquid and solid. You declare a variable with flux (mutable) or fix (immutable). You freeze a value to make it immutable, thaw it to get a mutable copy, and sublimate it to make it permanently frozen. forge blocks let you build something mutably and have the result exit as immutable. None of this exists in any other language.
Lattice does not appear in Claude's training data. I designed the language after the knowledge cutoff. There is no Lattice source code on GitHub (other than my own repository). There are no Stack Overflow answers. There is no tutorial ecosystem, no community blog posts, no textbook chapters. The only documentation that exists is the code itself, a 38-chapter handbook I wrote, and three blog posts on this site.
Claude writes Lattice fluently. It writes correct programs using the phase system, the concurrency primitives, the module system, and the trait/impl pattern. It writes struct definitions with per-field phase annotations. It uses forge blocks and anneal expressions correctly. And it wrote a 4,955-line self-hosted compiler in Lattice, for Lattice: a complete tokenizer, parser, and bytecode code generator that reads .lat source files and emits .latc bytecode binaries.
The question is how any of this is possible when the model has never seen the language before.
The Rust Smell
The answer starts with syntax. Here is a Lattice function:
fn greet(name: String) -> String { return "Hello, ${name}!" }
And here is the Rust equivalent:
fn greet(name: &str) -> String { format!("Hello, {name}!") }
The fn keyword, the colon-separated type annotations, the -> return type, the curly braces: Claude has seen these patterns millions of times in Rust code. When it encounters them in Lattice, it doesn't need to learn a new syntax. It needs to recognize a familiar one.
This extends deep into the language. Lattice structs look like Rust structs:
struct Point {
x: Float,
y: Float
}
Lattice enums look like Rust enums:
enum Shape { Circle(Float), Rectangle(Float, Float) }
Lattice match expressions look like Rust match expressions:
match shape {
Shape::Circle(r) => pi() * r * r,
Shape::Rectangle(w, h) => w * h,
_ => 0.0
}
Lattice traits and impl blocks look like Rust traits and impl blocks:
trait Printable {
    fn display(self: any) -> String
}

impl Printable for Point {
    fn display(self: any) -> String {
        return "(${self.x}, ${self.y})"
    }
}
Closures use the same |params| body syntax. The .. range operator works the same way. The ? postfix operator propagates errors. for item in collection iterates. let binds variables. The structural similarity is pervasive enough that a model trained on Rust can parse and generate Lattice code without any Lattice-specific training.
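To see how much of the surface is borrowed, here is a small snippet assembled entirely from the constructs just listed. This is illustrative rather than a quote from the handbook; print and the string interpolation form appear in examples elsewhere in this post.

```
fix double = |x| x * 2    // Rust-style closure syntax
for i in 0..5 {           // Rust-style range and for loop
    print("${i} doubled is ${double(i)}")
}
```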
I did not design Lattice to be AI-friendly. I designed it because Rust's syntax is good and I wanted to use it for a language with different semantics. But the side effect is that Claude can write Lattice from day one because the syntax activates the same neural pathways that Rust does. The model doesn't know it's writing a different language. It knows it's writing code that looks like Rust, and the structural patterns transfer.
The Phase System: Where Familiarity Ends
The Rust resemblance carries Claude through basic Lattice programs without difficulty. Where it gets interesting is the phase system, because this is where Lattice has no analog in any language Claude has seen.
In Rust, mutability is a static property: let mut x = 5; or let x = 5;. You decide at declaration time and the compiler enforces it. In Lattice, mutability is a runtime state that values transition through:
flux counter = 0          // mutable
counter = counter + 1     // allowed: counter is fluid
freeze(counter)           // transition: fluid → crystal
counter = counter + 1     // runtime error: counter is crystal
flux copy = thaw(counter) // get a mutable copy
copy = copy + 1           // allowed: copy is fluid
Claude handles this correctly. When I describe the phase system and provide examples, Claude generates code that uses flux and fix declarations appropriately, calls freeze() at the right points, and avoids mutating crystal values. The model maps flux to "mutable variable" and fix to "immutable variable" in its internal representation, and the transition functions (freeze, thaw) become explicit state changes that it tracks through the program.
The harder constructs are the ones with no familiar analog.
forge blocks are mutable construction zones whose output exits as immutable:
fix config = forge {
flux c = {}
c.host = "localhost"
c.port = 8080
c.debug = false
c // exits the forge block as crystal
}
// config is now crystal; cannot be modified
Claude gets this right because the pattern (build something mutably, freeze the result) maps to the builder pattern in Rust and other languages. The syntax is novel but the concept isn't.
anneal is harder. It temporarily thaws a crystal value into a mutable binding for the duration of a block, then re-freezes it:
fix settings = forge { flux s = {}; s.theme = "dark"; s }
anneal(settings) |s| {
s.theme = "light" // temporarily mutable
}
// settings is crystal again, with theme = "light"
Claude produces correct anneal code when given the semantics, but it occasionally generates patterns that would work in Rust (taking a &mut reference) but don't apply in Lattice (where anneal is the only way to modify a crystal value in place). The model's Rust intuitions are strong enough to produce syntactically valid Lattice but sometimes semantically incorrect programs, because it defaults to Rust's mutation model when the Lattice-specific construct is unfamiliar.
The reactive phase system is where Claude needs the most guidance. react, bond, and seed have no precedent in any mainstream language:
flux temperature = 72.0
react("temperature", fn(name, old_phase, new_phase) {
    print("${name} changed from ${old_phase} to ${new_phase}")
})
freeze(temperature) // triggers the reaction callback
flux primary = "active"
flux mirror = "active"
bond("mirror", "primary", "sync") // when primary changes phase, mirror follows
freeze(primary) // mirror also freezes
Claude can produce these patterns when given the API, but it doesn't intuit them. It never suggests react or bond unprompted, because there's nothing in its training data that would trigger the association. These constructs must be taught explicitly. The Rust smell gets Claude through 80% of Lattice. The last 20% requires actual specification.
The Spectrum of Difficulty
Working with Claude on Lattice code over several months has revealed a clear gradient of difficulty:
Trivial (Rust transfer): Functions, structs, enums, match expressions, closures, for loops, string interpolation, module imports, error propagation with ?. Claude writes these correctly on the first attempt because they're syntactically identical to Rust.
Easy (new vocabulary, familiar concept): flux/fix declarations, freeze()/thaw() calls, basic phase checking. Claude maps these to mutable/immutable patterns it already knows. The vocabulary is new; the concept isn't.
Moderate (new pattern, teachable): forge blocks, anneal expressions, crystallize blocks, struct field-level phase annotations (alloy structs). These require explanation, but once Claude sees one or two examples, it generalizes correctly. The builder pattern and block-scoped mutation are close enough to existing patterns that the model bridges the gap.
Hard (no analog, requires specification): Reactive phase operations (react, bond, seed), phase pattern matching (fluid val =>, crystal val =>), the concurrency constraint that only crystal values can be sent on channels, strict mode's consumption semantics for freeze. Claude can use these but never invents them. They must be explicitly described.
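For concreteness, phase pattern matching combines the Rust-style match shown earlier with the fluid/crystal patterns. This is a sketch built from the syntax fragments in this post, not a quote from the handbook; the arm bodies are illustrative.

```
flux reading = 72.0
match reading {
    fluid v => print("still mutable: ${v}"),
    crystal v => print("frozen at ${v}")
}
freeze(reading) // after this, the crystal arm would match instead
```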
The concurrency constraint is a good example of the "hard" category. In Lattice, data sent on a channel must be crystal:
let ch = Channel::new()
flux data = "mutable"
// ch.send(data) // runtime error: cannot send fluid value
freeze(data)
ch.send(data) // works: data is now crystal
This rule exists because crystal values are deeply immutable: they can't be modified by the sender after transmission, which eliminates data races structurally. Claude understands the concept (Rust has Send and Sync traits that serve a similar purpose), but it doesn't automatically apply Lattice's specific rule without being told. Left to its own devices, Claude will try to send fluid values on channels, because that's what you'd do in Go or Python. The constraint must be stated.
Strict mode (#mode strict at the top of a file) is another case where Claude needs explicit guidance. In strict mode, let is banned (you must use flux or fix), freeze() consumes the original binding (Rust-like move semantics), and assignment to a crystal binding is rejected outright rather than merely failing at runtime. Claude can write strict-mode Lattice, but it defaults to casual-mode patterns unless reminded. The model's prior is "permissive runtime" because that's what most dynamic languages are.
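A minimal sketch of the difference, using only the rules described above (the comments paraphrase the rules; the exact diagnostics are illustrative):

```
#mode strict

flux count = 0     // let is banned here; flux or fix required
count = count + 1  // fine: count is fluid
freeze(count)      // consumes the binding (Rust-like move semantics)
// count = 1       // rejected outright: the binding no longer exists
```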
The gradient correlates exactly with how much the construct resembles something in Rust or another mainstream language. When the syntax is familiar, Claude's transfer learning handles it. When the concept is familiar but the syntax is new, one or two examples are enough. When both the syntax and the concept are novel, Claude needs the specification.
The Self-Hosted Compiler
The strongest evidence that Claude can deeply understand a language it was never trained on is latc.lat: a 4,955-line self-hosted compiler written in Lattice, for Lattice.
The compiler reads .lat source files and emits .latc bytecode binaries. It has twelve sections:
- Opcode constant definitions (mapping all 100+ VM opcodes to integers)
- Token stream and cursor helpers (peek, advance, expect, match_tok)
- Compiler state management (save/restore for nested compilation)
- Error reporting
- Bytecode emit helpers (emit_byte, emit_jump, patch_jump, emit_loop)
- Constant pool management (integers, floats, strings, closures)
- Scope and variable resolution (begin_scope, end_scope, resolve_local, upvalue tracking)
- Expression parsing (precedence climbing, binary/unary ops, calls, field access)
- Statement compilation (let/flux/fix, if/while/for, return, match, try/catch)
- Declaration compilation (functions, structs, enums, traits, impl blocks)
- Binary serialization (writing the LATC file format with magic bytes, version header, chunk data)
- Main entry point
Claude wrote this. Not "Claude assisted with this" or "Claude generated boilerplate for this." Claude wrote a recursive descent parser for Lattice's grammar, a bytecode compiler that emits correct opcodes for the phase system, and a binary serializer that produces files the C runtime can load and execute. The compiler bootstraps: you run it with the C-based clat interpreter, and it produces bytecode that the same interpreter executes.
The compiler itself uses Lattice's phase system for its own internal state. The compiler's mutable working data (the bytecode buffer, the constant pool, the local variable tracking arrays) is declared with flux:
flux code = []
flux c_lines = []
flux constants = []
flux local_name_arr = []
flux local_depth_arr = []
flux local_captured_arr = []
flux local_count = 0
This is the compiler eating its own dogfood. The mutable state that the compiler needs to build bytecode is declared using the same phase system that the compiler is compiling. The phase keywords aren't decorative here; they're structurally necessary because the compiler modifies these arrays on every opcode emission and scope transition.
The compiler has 118 functions across 12 sections, with 554 opcode references. It handles every construct in the language: flux/fix declarations, forge blocks, freeze/thaw/sublimate calls, anneal and crystallize expressions, struct and enum definitions with phase annotations, trait/impl blocks, match expressions with phase-aware pattern matching, structured concurrency with scope/spawn, channel operations, try/catch, defer, and the complete expression grammar with correct operator precedence.
Writing a self-hosted compiler requires understanding the language at every level simultaneously. The tokenizer must know every keyword, operator, and delimiter. The parser must handle every grammatical production, including the phase-specific constructs (forge, anneal, crystallize) that exist nowhere in Claude's training data. The code generator must emit the correct opcodes for phase transitions, reactive bindings, and structured concurrency. And the whole thing must be written in the language being compiled, which means Claude is writing Lattice to compile Lattice, using constructs it learned from examples rather than training data.
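As a flavor of the emit-and-patch machinery, here is a hedged reconstruction of what helpers like emit_jump and patch_jump plausibly do. The bodies are my guesses from the function names and the flux globals shown above, not quotes from latc.lat; len(), index assignment, and the byte-splitting arithmetic are all assumptions.

```
// Emit a jump opcode with a two-byte placeholder offset;
// return the placeholder's position for later backpatching.
fn emit_jump(op: any) -> any {
    emit_byte(op)
    emit_byte(255) // placeholder, patched once the target is known
    emit_byte(255)
    return len(code) - 2
}

// Once the jump target is known, overwrite the placeholder
// with the real forward distance.
fn patch_jump(offset: any) {
    flux distance = len(code) - offset - 2
    code[offset] = distance / 256
    code[offset + 1] = distance % 256
}
```

This emit-placeholder-then-backpatch pattern is the standard way a single-pass bytecode compiler handles forward jumps in if/while/match lowering.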
The compiler's serialization section writes the LATC binary format byte by byte:
fn serialize_latc(ch: any) {
ser_buf = Buffer::new(0)
// Header: "LATC" + version(1) + reserved(0)
write_u8(76) // 'L'
write_u8(65) // 'A'
write_u8(84) // 'T'
write_u8(67) // 'C'
write_u16_le(1) // format version
write_u16_le(0) // reserved
serialize_chunk(ch)
return ser_buf
}
This is not pattern matching against compiler source code from the training data. No Lattice compiler exists in the training data. Claude wrote a compiler for a language that has no prior art, in a language that has no prior art, producing a binary format that has no prior art. Every decision (the magic bytes, the chunk serialization order, the upvalue encoding) came from understanding the specification I provided and the runtime behavior of the C-based interpreter.
What I Actually Gave Claude
The teaching process was less structured than you might expect. There was no formal curriculum, no staged introduction of concepts, no carefully sequenced lesson plan. And I should be honest about the recursive nature of what happened: Claude Code was the primary tool for building Lattice itself. The language, the C implementation, the grammar, the runtime, the test suite, the handbook: all of it was built with Claude Code. I designed the language and directed the implementation, but Claude wrote the C, the LaTeX, and the example programs.
So the situation is: Claude wrote Lattice (the implementation), and then Claude wrote in Lattice (the programs and the self-hosted compiler). The model built the language and then learned the language it built. The "teaching material" that Claude uses to write Lattice code is documentation and examples that Claude itself produced in earlier sessions.
The artifacts:
- The C implementation: ~80 source files, the parser, the VM, the phase system runtime. Built with Claude Code from my architectural direction.
- A handbook: 38 chapters covering every feature, with worked examples. Written in LaTeX with Claude Code. This lives in a repository that Claude can read in subsequent sessions.
- Example programs (examples/phase_demo.lat, examples/sorting.lat, examples/state_machine.lat) that demonstrate idiomatic Lattice. Written by Claude Code.
- 815 test files under AddressSanitizer that exercise every construct. Written by Claude Code.
- An EBNF grammar reference as an appendix to the handbook.
When I work with Claude on Lattice code, I don't paste the entire handbook into the context window. Claude has access to the project directory. It reads files as needed. If I ask it to write a function that uses forge, it reads examples/phase_demo.lat or chapters/ch11-phases-explained.tex to see how forge works. If I ask it to add an opcode to the compiler, it reads include/stackopcode.h and src/stackvm.c to understand the existing instruction set.
The key insight: Claude doesn't need to be trained on a language to write it. It needs access to the specification and examples at inference time. And in this case, those specifications and examples were produced by Claude itself in prior sessions. The model's understanding is constructed on the fly from documentation in its context, not retrieved from weights. This is why the Rust resemblance matters so much: the syntax gives Claude a structural scaffold, and the specification (which Claude wrote) fills in the semantics.
This is also why the self-hosted compiler was possible. By the time Claude wrote latc.lat, it had already written the entire language implementation, the handbook, the test suite, and hundreds of example programs. The language had moved from "novel" to "familiar" through accumulated context, not through training. Each session built on the last. Each example reinforced the phase system's rules. By the time the compiler was attempted, Claude's working understanding of Lattice (constructed from its own prior output) was deep enough to write a 5,000-line program that correctly compiles the language. The model taught itself a language by building the language first.
Why Syntax Matters More Than Semantics
The Lattice experience suggests something counterintuitive about how LLMs interact with programming languages: syntax transfer is more powerful than semantic understanding.
Claude can write correct Lattice because Lattice looks like Rust. The semantic differences (phase system vs. ownership, runtime type checking vs. compile-time guarantees, garbage collection vs. RAII) are significant, but they don't prevent Claude from producing working code. The model generates syntactically valid Lattice from Rust patterns and then adjusts the semantics when corrected.
This has implications for language design. If you want AI tooling to support your language from day one, without waiting for it to appear in training data, design your syntax to rhyme with something popular. Lattice's resemblance to Rust wasn't designed for AI, but it is the reason AI can write it. A language with a radically different syntax (APL, Forth, J) would be much harder for Claude to learn from examples alone, even if the semantics were simpler.
The reverse is also true: a language with familiar syntax but deeply unfamiliar semantics (like Lattice's reactive phase system) will produce code that looks correct but occasionally behaves wrong. Claude's Rust intuitions are strong enough to generate valid-looking phase code, but the model sometimes falls back to Rust's mutation model when the Lattice-specific behavior is more constrained. The syntax transfers perfectly. The semantics require teaching.
Implications for Language Designers
If you're designing a new programming language in 2026, the AI tooling question is unavoidable. Your language won't have IDE plugins, autocompleters, or AI coding assistants on day one. The community doesn't exist yet. The training data doesn't include your language. Every other language your users work with has Copilot or Claude support. Yours doesn't.
Lattice suggests a strategy: make your syntax rhyme with something an LLM already knows.
This isn't about copying Rust. Lattice has genuinely novel semantics. The phase system, the reactive bindings, the alloy structs with per-field phase annotations: none of these exist in Rust. But they're expressed through syntax (keywords, braces, type annotations, block expressions) that maps directly to Rust's structural patterns. Claude can parse the syntax without help and learn the semantics from examples.
The alternative is designing a syntax so novel that LLMs can't bootstrap from existing knowledge. This is a legitimate design choice; some ideas genuinely need new notation. But the cost is high: your users won't get AI assistance until your language appears in training data, which requires the language to become popular first, which is harder without AI assistance. It's a chicken-and-egg problem that familiar syntax sidesteps.
The practical recommendation: novel semantics, familiar syntax. Invent the ideas. Borrow the notation. Let the LLM cross the bridge on syntax and learn the semantics on the other side.
What This Means for the "AI Writes Code" Conversation
The Lattice case study complicates the popular narrative about AI code generation in both directions.
For the optimists who say AI can learn anything: Claude cannot invent the reactive phase system. It cannot propose bond or seed or anneal without being told they exist. The novel constructs, the ones that make Lattice a genuinely different language rather than a Rust reskin, are invisible to the model until explicitly specified. AI transfer learning has limits, and those limits are at the boundaries of what the training data contains.
For the pessimists who say AI can only regurgitate training data: Claude wrote a 5,000-line self-hosted compiler for a language it has never seen. That is not regurgitation. The compiler produces correct bytecode for constructs (phase transitions, reactive bonds, per-field phase annotations) that exist in no other language. The model assembled knowledge from its understanding of compilers generally, Rust syntax specifically, and the Lattice specification I provided, and produced something genuinely new. Antirez called this "assembling knowledge" when he observed the same phenomenon with his Z80 emulator project. I think that's the right term.
The truth is somewhere that neither camp wants to occupy. LLMs can go far beyond their training data when the new territory is structurally adjacent to something they know. They cannot go beyond their training data when the new territory is structurally novel. The boundary between "adjacent" and "novel" is syntax. Familiar syntax is a bridge. Novel syntax is a wall. Novel semantics behind familiar syntax is a trap: the model crosses the bridge confidently and then occasionally falls.
Lattice exists in all three zones simultaneously. Its Rust-like surface lets Claude cross the bridge. Its phase system is the novel semantics behind familiar syntax. And the self-hosted compiler is proof that the bridge, once crossed, supports weight that no one expected.
I didn't set out to test the limits of LLM language understanding when I designed Lattice. I set out to build a programming language with a novel approach to mutability. The AI dimension was a side effect: I used Claude Code as my development tool because I use Claude Code for everything, and the language happened to be learnable because it happened to look like Rust. But the result is one of the more complete demonstrations of LLM transfer learning applied to a genuinely novel domain: not just writing programs in an unfamiliar language, but writing a compiler for that language, in that language, from a specification that exists nowhere in the training data.
The 4,955 lines of latc.lat are the proof that LLMs can go further than their training data when the conditions are right. The conditions are: familiar syntax, clear specification, accessible examples, and a human who knows when the model is wrong. Remove any one of those and the compiler doesn't get written. But with all four in place, the model produces something that works, that compiles, and that no human typed by hand.