<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io (Posts about Machine Learning)</title><link>https://tinycomputers.io/</link><description></description><atom:link href="https://tinycomputers.io/categories/cat_machine-learning.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /&gt; Creative Commons Attribution-ShareAlike&lt;/a&gt;&amp;nbsp;|&amp;nbsp;
</copyright><lastBuildDate>Mon, 06 Apr 2026 22:12:59 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Teaching an LLM a Language It Has Never Seen</title><link>https://tinycomputers.io/posts/teaching-llms-languages-theyve-never-seen.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/teaching-llms-languages-theyve-never-seen_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;33 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;Lattice&lt;/a&gt; is a programming language I designed. Its central feature is the phase system: every runtime value carries a mutability tag that transitions between states the way matter moves between liquid and solid. You declare a variable with &lt;code&gt;flux&lt;/code&gt; (mutable) or &lt;code&gt;fix&lt;/code&gt; (immutable). You &lt;code&gt;freeze&lt;/code&gt; a value to make it immutable, &lt;code&gt;thaw&lt;/code&gt; it to get a mutable copy, and &lt;code&gt;sublimate&lt;/code&gt; it to make it permanently frozen. &lt;code&gt;forge&lt;/code&gt; blocks let you build something mutably and have the result exit as immutable. None of this exists in any other language.&lt;/p&gt;
&lt;p&gt;Lattice does not appear in Claude's training data. I designed the language after the knowledge cutoff. There is no Lattice source code on GitHub (other than my own repository). There are no Stack Overflow answers. There is no tutorial ecosystem, no community blog posts, no textbook chapters. The only documentation that exists is the code itself, a 38-chapter handbook I wrote, and three blog posts on this site.&lt;/p&gt;
&lt;p&gt;Claude writes Lattice fluently. It writes correct programs using the phase system, the concurrency primitives, the module system, and the trait/impl pattern. It writes struct definitions with per-field phase annotations. It uses &lt;code&gt;forge&lt;/code&gt; blocks and &lt;code&gt;anneal&lt;/code&gt; expressions correctly. And it wrote a 4,955-line self-hosted compiler in Lattice, for Lattice: a complete tokenizer, parser, and bytecode generator that reads &lt;code&gt;.lat&lt;/code&gt; source files and emits &lt;code&gt;.latc&lt;/code&gt; bytecode binaries.&lt;/p&gt;
&lt;p&gt;The question is how any of this is possible when the model has never seen the language before.&lt;/p&gt;
&lt;h3&gt;The Rust Smell&lt;/h3&gt;
&lt;p&gt;The answer starts with syntax. Here is a Lattice function:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn&lt;span class="w"&gt; &lt;/span&gt;greet(name:&lt;span class="w"&gt; &lt;/span&gt;String)&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;String&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;return&lt;span class="w"&gt; &lt;/span&gt;"Hello,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;!"
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And here is the Rust equivalent:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kp"&gt;&amp;amp;&lt;/span&gt;&lt;span class="kt"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="fm"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, {name}!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;fn&lt;/code&gt; keyword, the colon-separated type annotations, the &lt;code&gt;-&amp;gt;&lt;/code&gt; return type, the curly braces: Claude has seen these patterns millions of times in Rust code. When it encounters them in Lattice, it doesn't need to learn a new syntax. It needs to recognize a familiar one.&lt;/p&gt;
&lt;p&gt;This extends deep into the language. Lattice structs look like Rust structs:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;struct Point {
    x: Float,
    y: Float
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Lattice enums look like Rust enums:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;enum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Shape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Circle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;Rectangle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Lattice match expressions look like Rust match expressions:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;match shape {
    Shape::Circle(r) =&amp;gt; pi() * r * r,
    Shape::Rectangle(w, h) =&amp;gt; w * h,
    _ =&amp;gt; 0.0
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Lattice traits and impl blocks look like Rust traits and impl blocks:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;trait&lt;span class="w"&gt; &lt;/span&gt;Printable&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;fn&lt;span class="w"&gt; &lt;/span&gt;display(self:&lt;span class="w"&gt; &lt;/span&gt;any)&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;String
}

impl&lt;span class="w"&gt; &lt;/span&gt;Printable&lt;span class="w"&gt; &lt;/span&gt;for&lt;span class="w"&gt; &lt;/span&gt;Point&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;fn&lt;span class="w"&gt; &lt;/span&gt;display(self:&lt;span class="w"&gt; &lt;/span&gt;any)&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;String&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;        &lt;/span&gt;return&lt;span class="w"&gt; &lt;/span&gt;"(&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;)"
&lt;span class="w"&gt;    &lt;/span&gt;}
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Closures use the same &lt;code&gt;|params| body&lt;/code&gt; syntax. The &lt;code&gt;..&lt;/code&gt; range operator works the same way. The &lt;code&gt;?&lt;/code&gt; postfix operator propagates errors. &lt;code&gt;for item in collection&lt;/code&gt; iterates. &lt;code&gt;let&lt;/code&gt; binds variables. The structural similarity is pervasive enough that a model trained on Rust can parse and generate Lattice code without any Lattice-specific training.&lt;/p&gt;
&lt;p&gt;I did not design Lattice to be AI-friendly. I designed it because Rust's syntax is good and I wanted to use it for a language with different semantics. But the side effect is that Claude can write Lattice from day one because the syntax activates the same neural pathways that Rust does. The model doesn't know it's writing a different language. It knows it's writing code that looks like Rust, and the structural patterns transfer.&lt;/p&gt;
&lt;h3&gt;The Phase System: Where Familiarity Ends&lt;/h3&gt;
&lt;p&gt;The Rust resemblance carries Claude through basic Lattice programs without difficulty. Where it gets interesting is the phase system, because this is where Lattice has no analog in any language Claude has seen.&lt;/p&gt;
&lt;p&gt;In Rust, mutability is a static property: &lt;code&gt;let mut x = 5;&lt;/code&gt; or &lt;code&gt;let x = 5;&lt;/code&gt;. You decide at declaration time and the compiler enforces it. In Lattice, mutability is a runtime state that values transition through:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux counter = 0          // mutable
counter = counter + 1     // allowed: counter is fluid

freeze(counter)           // transition: fluid → crystal
counter = counter + 1     // runtime error: counter is crystal

flux copy = thaw(counter) // get a mutable copy
copy = copy + 1           // allowed: copy is fluid
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude handles this correctly. When I describe the phase system and provide examples, Claude generates code that uses &lt;code&gt;flux&lt;/code&gt; and &lt;code&gt;fix&lt;/code&gt; declarations appropriately, calls &lt;code&gt;freeze()&lt;/code&gt; at the right points, and avoids mutating crystal values. The model maps &lt;code&gt;flux&lt;/code&gt; to "mutable variable" and &lt;code&gt;fix&lt;/code&gt; to "immutable variable" in its internal representation, and the transition functions (&lt;code&gt;freeze&lt;/code&gt;, &lt;code&gt;thaw&lt;/code&gt;) become explicit state changes that it tracks through the program.&lt;/p&gt;
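&lt;p&gt;To make that runtime model concrete, here is a toy sketch in Python of how a phase-tagged value behaves. This is my illustration of the semantics described above, not Lattice's actual C runtime; the class and method names are invented for the example.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Toy model of Lattice's phase system (illustrative only, not the real runtime).
class PhaseError(Exception):
    pass

class PhaseValue:
    def __init__(self, value, phase="fluid"):
        self.value = value
        self.phase = phase           # "fluid" (mutable) or "crystal" (immutable)

    def set(self, new_value):
        if self.phase == "crystal":
            raise PhaseError("cannot mutate a crystal value")
        self.value = new_value

    def freeze(self):
        self.phase = "crystal"       # fluid -&gt; crystal transition

    def thaw(self):
        return PhaseValue(self.value, "fluid")   # mutable copy; original stays crystal

counter = PhaseValue(0)
counter.set(1)         # allowed: counter is fluid
counter.freeze()
# counter.set(2)       # would raise PhaseError: counter is crystal
copy = counter.thaw()
copy.set(2)            # allowed: copy is fluid
&lt;/pre&gt;&lt;/div&gt;
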
&lt;p&gt;The harder constructs are the ones with no familiar analog.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;forge&lt;/code&gt; blocks are mutable construction zones whose output exits as immutable:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fix config = forge {
    flux c = {}
    c.host = "localhost"
    c.port = 8080
    c.debug = false
    c   // exits the forge block as crystal
}
// config is now crystal; cannot be modified
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude gets this right because the pattern (build something mutably, freeze the result) maps to the builder pattern in Rust and other languages. The syntax is novel but the concept isn't.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;anneal&lt;/code&gt; is harder. It temporarily thaws a crystal value into a mutable binding for the duration of a block, then re-freezes it:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fix settings = forge { flux s = {}; s.theme = "dark"; s }

anneal(settings) |s| {
    s.theme = "light"   // temporarily mutable
}
// settings is crystal again, with theme = "light"
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude produces correct &lt;code&gt;anneal&lt;/code&gt; code when given the semantics, but it occasionally generates patterns that would work in Rust (taking a &lt;code&gt;&amp;amp;mut&lt;/code&gt; reference) but don't apply in Lattice (where &lt;code&gt;anneal&lt;/code&gt; is the only way to modify a crystal value in place). The model's Rust intuitions are strong enough to produce syntactically valid Lattice but sometimes semantically incorrect programs, because it defaults to Rust's mutation model when the Lattice-specific construct is unfamiliar.&lt;/p&gt;
&lt;p&gt;The reactive phase system is where Claude needs the most guidance. &lt;code&gt;react&lt;/code&gt;, &lt;code&gt;bond&lt;/code&gt;, and &lt;code&gt;seed&lt;/code&gt; have no precedent in any mainstream language:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux&lt;span class="w"&gt; &lt;/span&gt;temperature&lt;span class="w"&gt; &lt;/span&gt;=&lt;span class="w"&gt; &lt;/span&gt;72.0

react("temperature",&lt;span class="w"&gt; &lt;/span&gt;fn(name,&lt;span class="w"&gt; &lt;/span&gt;old_phase,&lt;span class="w"&gt; &lt;/span&gt;new_phase)&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;print("&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;changed&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;old_phase&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;new_phase&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;")
})

freeze(temperature)&lt;span class="w"&gt;  &lt;/span&gt;//&lt;span class="w"&gt; &lt;/span&gt;triggers&lt;span class="w"&gt; &lt;/span&gt;the&lt;span class="w"&gt; &lt;/span&gt;reaction&lt;span class="w"&gt; &lt;/span&gt;callback
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;flux primary = "active"
flux mirror = "active"

bond("mirror", "primary", "sync")  // when primary changes phase, mirror follows

freeze(primary)  // mirror also freezes
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Claude can produce these patterns when given the API, but it doesn't intuit them. It never suggests &lt;code&gt;react&lt;/code&gt; or &lt;code&gt;bond&lt;/code&gt; unprompted, because there's nothing in its training data that would trigger the association. These constructs must be taught explicitly. The Rust smell gets Claude through 80% of Lattice. The last 20% requires actual specification.&lt;/p&gt;
&lt;h3&gt;The Spectrum of Difficulty&lt;/h3&gt;
&lt;p&gt;Working with Claude on Lattice code over several months has revealed a clear gradient of difficulty:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Trivial (Rust transfer):&lt;/strong&gt; Functions, structs, enums, match expressions, closures, for loops, string interpolation, module imports, error propagation with &lt;code&gt;?&lt;/code&gt;. Claude writes these correctly on the first attempt because they're syntactically identical to Rust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Easy (new vocabulary, familiar concept):&lt;/strong&gt; &lt;code&gt;flux&lt;/code&gt;/&lt;code&gt;fix&lt;/code&gt; declarations, &lt;code&gt;freeze()&lt;/code&gt;/&lt;code&gt;thaw()&lt;/code&gt; calls, basic phase checking. Claude maps these to mutable/immutable patterns it already knows. The vocabulary is new; the concept isn't.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Moderate (new pattern, teachable):&lt;/strong&gt; &lt;code&gt;forge&lt;/code&gt; blocks, &lt;code&gt;anneal&lt;/code&gt; expressions, &lt;code&gt;crystallize&lt;/code&gt; blocks, struct field-level phase annotations (alloy structs). These require explanation, but once Claude sees one or two examples, it generalizes correctly. The builder pattern and block-scoped mutation are close enough to existing patterns that the model bridges the gap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hard (no analog, requires specification):&lt;/strong&gt; Reactive phase operations (&lt;code&gt;react&lt;/code&gt;, &lt;code&gt;bond&lt;/code&gt;, &lt;code&gt;seed&lt;/code&gt;), phase pattern matching (&lt;code&gt;fluid val =&amp;gt;&lt;/code&gt;, &lt;code&gt;crystal val =&amp;gt;&lt;/code&gt;), the concurrency constraint that only crystal values can be sent on channels, strict mode's consumption semantics for &lt;code&gt;freeze&lt;/code&gt;. Claude can use these but never invents them. They must be explicitly described.&lt;/p&gt;
&lt;p&gt;The concurrency constraint is a good example of the "hard" category. In Lattice, data sent on a channel must be crystal:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;let ch = Channel::new()
flux data = "mutable"

// ch.send(data)  // runtime error: cannot send fluid value

freeze(data)
ch.send(data)     // works: data is now crystal
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This rule exists because crystal values are deeply immutable: they can't be modified by the sender after transmission, which eliminates data races structurally. Claude understands the concept (Rust has &lt;code&gt;Send&lt;/code&gt; and &lt;code&gt;Sync&lt;/code&gt; traits that serve a similar purpose), but it doesn't automatically apply Lattice's specific rule without being told. Left to its own devices, Claude will try to send fluid values on channels, because that's what you'd do in Go or Python. The constraint must be stated.&lt;/p&gt;
&lt;p&gt;Strict mode (&lt;code&gt;#mode strict&lt;/code&gt; at the top of a file) is another case where Claude needs explicit guidance. In strict mode, &lt;code&gt;let&lt;/code&gt; is banned (you must use &lt;code&gt;flux&lt;/code&gt; or &lt;code&gt;fix&lt;/code&gt;), &lt;code&gt;freeze()&lt;/code&gt; consumes the original binding (Rust-like move semantics), and assignment to a crystal binding is rejected outright rather than merely raising a runtime error. Claude can write strict-mode Lattice, but it defaults to casual-mode patterns unless reminded. The model's prior is "permissive runtime" because that's what most dynamic languages are.&lt;/p&gt;
&lt;p&gt;The gradient correlates exactly with how much the construct resembles something in Rust or another mainstream language. When the syntax is familiar, Claude's transfer learning handles it. When the concept is familiar but the syntax is new, one or two examples are enough. When both the syntax and the concept are novel, Claude needs the specification.&lt;/p&gt;
&lt;h3&gt;The Self-Hosted Compiler&lt;/h3&gt;
&lt;p&gt;The strongest evidence that Claude can deeply understand a language it was never trained on is &lt;code&gt;latc.lat&lt;/code&gt;: a &lt;a href="https://tinycomputers.io/posts/a-stack-based-bytecode-vm-for-lattice.html"&gt;4,955-line self-hosted compiler&lt;/a&gt; written in Lattice, for Lattice.&lt;/p&gt;
&lt;p&gt;The compiler reads &lt;code&gt;.lat&lt;/code&gt; source files and emits &lt;code&gt;.latc&lt;/code&gt; bytecode binaries. It has twelve sections:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Opcode constant definitions (mapping all 100+ VM opcodes to integers)&lt;/li&gt;
&lt;li&gt;Token stream and cursor helpers (&lt;code&gt;peek&lt;/code&gt;, &lt;code&gt;advance&lt;/code&gt;, &lt;code&gt;expect&lt;/code&gt;, &lt;code&gt;match_tok&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Compiler state management (save/restore for nested compilation)&lt;/li&gt;
&lt;li&gt;Error reporting&lt;/li&gt;
&lt;li&gt;Bytecode emit helpers (&lt;code&gt;emit_byte&lt;/code&gt;, &lt;code&gt;emit_jump&lt;/code&gt;, &lt;code&gt;patch_jump&lt;/code&gt;, &lt;code&gt;emit_loop&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Constant pool management (integers, floats, strings, closures)&lt;/li&gt;
&lt;li&gt;Scope and variable resolution (&lt;code&gt;begin_scope&lt;/code&gt;, &lt;code&gt;end_scope&lt;/code&gt;, &lt;code&gt;resolve_local&lt;/code&gt;, upvalue tracking)&lt;/li&gt;
&lt;li&gt;Expression parsing (precedence climbing, binary/unary ops, calls, field access)&lt;/li&gt;
&lt;li&gt;Statement compilation (let/flux/fix, if/while/for, return, match, try/catch)&lt;/li&gt;
&lt;li&gt;Declaration compilation (functions, structs, enums, traits, impl blocks)&lt;/li&gt;
&lt;li&gt;Binary serialization (writing the LATC file format with magic bytes, version header, chunk data)&lt;/li&gt;
&lt;li&gt;Main entry point&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Claude wrote this. Not "Claude assisted with this" or "Claude generated boilerplate for this." Claude wrote a recursive descent parser for Lattice's grammar, a bytecode compiler that emits correct opcodes for the phase system, and a binary serializer that produces files the C runtime can load and execute. The compiler bootstraps: you run it with the C-based &lt;code&gt;clat&lt;/code&gt; interpreter, and it produces bytecode that the same interpreter executes.&lt;/p&gt;
&lt;p&gt;The compiler itself uses Lattice's phase system for its own internal state. The compiler's mutable working data (the bytecode buffer, the constant pool, the local variable tracking arrays) is declared with &lt;code&gt;flux&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c_lines&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;constants&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_name_arr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_depth_arr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_captured_arr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;flux&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;local_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the compiler eating its own dogfood. The mutable state that the compiler needs to build bytecode is declared using the same phase system that the compiler is compiling. The phase keywords aren't decorative here; they're structurally necessary because the compiler modifies these arrays on every opcode emission and scope transition.&lt;/p&gt;
&lt;p&gt;The compiler has 118 functions across 12 sections, with 554 opcode references. It handles every construct in the language: &lt;code&gt;flux&lt;/code&gt;/&lt;code&gt;fix&lt;/code&gt; declarations, &lt;code&gt;forge&lt;/code&gt; blocks, &lt;code&gt;freeze&lt;/code&gt;/&lt;code&gt;thaw&lt;/code&gt;/&lt;code&gt;sublimate&lt;/code&gt; calls, &lt;code&gt;anneal&lt;/code&gt; and &lt;code&gt;crystallize&lt;/code&gt; expressions, struct and enum definitions with phase annotations, trait/impl blocks, match expressions with phase-aware pattern matching, structured concurrency with &lt;code&gt;scope&lt;/code&gt;/&lt;code&gt;spawn&lt;/code&gt;, channel operations, &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;catch&lt;/code&gt;, &lt;code&gt;defer&lt;/code&gt;, and the complete expression grammar with correct operator precedence.&lt;/p&gt;
&lt;p&gt;Writing a self-hosted compiler requires understanding the language at every level simultaneously. The tokenizer must know every keyword, operator, and delimiter. The parser must handle every grammatical production, including the phase-specific constructs (&lt;code&gt;forge&lt;/code&gt;, &lt;code&gt;anneal&lt;/code&gt;, &lt;code&gt;crystallize&lt;/code&gt;) that exist nowhere in Claude's training data. The code generator must emit the correct opcodes for phase transitions, reactive bindings, and structured concurrency. And the whole thing must be written in the language being compiled, which means Claude is writing Lattice to compile Lattice, using constructs it learned from examples rather than training data.&lt;/p&gt;
&lt;p&gt;The compiler's serialization section writes the LATC binary format byte by byte:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;fn serialize_latc(ch: any) {
    ser_buf = Buffer::new(0)

    // Header: "LATC" + version(1) + reserved(0)
    write_u8(76)    // 'L'
    write_u8(65)    // 'A'
    write_u8(84)    // 'T'
    write_u8(67)    // 'C'
    write_u16_le(1) // format version
    write_u16_le(0) // reserved

    serialize_chunk(ch)
    return ser_buf
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is not pattern matching against compiler source code from the training data. No Lattice compiler exists in the training data. Claude wrote a compiler for a language that has no prior art, in a language that has no prior art, producing a binary format that has no prior art. Every decision (the magic bytes, the chunk serialization order, the upvalue encoding) came from understanding the specification I provided and the runtime behavior of the C-based interpreter.&lt;/p&gt;
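&lt;p&gt;For a concrete sense of that format, here is a small Python sketch that validates the header the serializer above writes: four magic bytes, then a little-endian u16 version and a u16 reserved field. The field layout comes straight from &lt;code&gt;serialize_latc&lt;/code&gt;; the reader function itself is my sketch, not part of the Lattice toolchain.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import struct

# Validate the 8-byte LATC header emitted by serialize_latc above.
# Layout: "LATC" magic, u16 LE format version, u16 LE reserved field.
def read_latc_header(path):
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) &lt; 8 or header[:4] != b"LATC":
        raise ValueError("not a LATC file: bad magic bytes")
    version, reserved = struct.unpack_from("&lt;HH", header, 4)
    return version, reserved

# version, reserved = read_latc_header("program.latc")
&lt;/pre&gt;&lt;/div&gt;
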
&lt;h3&gt;What I Actually Gave Claude&lt;/h3&gt;
&lt;p&gt;The teaching process was less structured than you might expect. There was no formal curriculum, no staged introduction of concepts, no carefully sequenced lesson plan. And I should be honest about the recursive nature of what happened: Claude Code was the primary tool for building Lattice itself. The language, the C implementation, the grammar, the runtime, the test suite, the handbook: all of it was built with Claude Code. I designed the language and directed the implementation, but Claude wrote the C, the LaTeX, and the example programs.&lt;/p&gt;
&lt;p&gt;So the situation is: Claude wrote Lattice (the implementation), and then Claude wrote in Lattice (the programs and the self-hosted compiler). The model built the language and then learned the language it built. The "teaching material" that Claude uses to write Lattice code is documentation and examples that Claude itself produced in earlier sessions.&lt;/p&gt;
&lt;p&gt;The artifacts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The C implementation: ~80 source files, the parser, the VM, the phase system runtime. Built with Claude Code from my architectural direction.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://tinycomputers.io/posts/introducing-lattice-a-crystallization-based-programming-language.html"&gt;handbook&lt;/a&gt;: 38 chapters covering every feature, with worked examples. Written in LaTeX with Claude Code. This lives in a repository that Claude can read in subsequent sessions.&lt;/li&gt;
&lt;li&gt;Example programs (&lt;code&gt;examples/phase_demo.lat&lt;/code&gt;, &lt;code&gt;examples/sorting.lat&lt;/code&gt;, &lt;code&gt;examples/state_machine.lat&lt;/code&gt;) that demonstrate idiomatic Lattice. Written by Claude Code.&lt;/li&gt;
&lt;li&gt;815 test files, run under AddressSanitizer, that exercise every construct. Written by Claude Code.&lt;/li&gt;
&lt;li&gt;An EBNF grammar reference as an appendix to the handbook.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When I work with Claude on Lattice code, I don't paste the entire handbook into the context window. Claude has access to the project directory. It reads files as needed. If I ask it to write a function that uses &lt;code&gt;forge&lt;/code&gt;, it reads &lt;code&gt;examples/phase_demo.lat&lt;/code&gt; or &lt;code&gt;chapters/ch11-phases-explained.tex&lt;/code&gt; to see how &lt;code&gt;forge&lt;/code&gt; works. If I ask it to add an opcode to the compiler, it reads &lt;code&gt;include/stackopcode.h&lt;/code&gt; and &lt;code&gt;src/stackvm.c&lt;/code&gt; to understand the existing instruction set.&lt;/p&gt;
&lt;p&gt;The key insight: Claude doesn't need to be trained on a language to write it. It needs access to the specification and examples at inference time. And in this case, those specifications and examples were produced by Claude itself in prior sessions. The model's understanding is constructed on the fly from documentation in its context, not retrieved from weights. This is why the Rust resemblance matters so much: the syntax gives Claude a structural scaffold, and the specification (which Claude wrote) fills in the semantics.&lt;/p&gt;
&lt;p&gt;This is also why the self-hosted compiler was possible. By the time Claude wrote &lt;code&gt;latc.lat&lt;/code&gt;, it had already written the entire language implementation, the handbook, the test suite, and hundreds of example programs. The language had moved from "novel" to "familiar" through accumulated context, not through training. Each session built on the last. Each example reinforced the phase system's rules. By the time the compiler was attempted, Claude's working understanding of Lattice (constructed from its own prior output) was deep enough to write a 5,000-line program that correctly compiles the language. The model taught itself a language by building the language first.&lt;/p&gt;
&lt;h3&gt;Why Syntax Matters More Than Semantics&lt;/h3&gt;
&lt;p&gt;The Lattice experience suggests something counterintuitive about how LLMs interact with programming languages: syntax transfer is more powerful than semantic understanding.&lt;/p&gt;
&lt;p&gt;Claude can write correct Lattice because Lattice looks like Rust. The semantic differences (phase system vs. ownership, runtime type checking vs. compile-time guarantees, garbage collection vs. RAII) are significant, but they don't prevent Claude from producing working code. The model generates syntactically valid Lattice from Rust patterns and then adjusts the semantics when corrected.&lt;/p&gt;
&lt;p&gt;This has implications for language design. If you want AI tooling to support your language from day one, without waiting for it to appear in training data, design your syntax to rhyme with something popular. Lattice's resemblance to Rust wasn't designed for AI, but it is the reason AI can write it. A language with a radically different syntax (APL, Forth, J) would be much harder for Claude to learn from examples alone, even if the semantics were simpler.&lt;/p&gt;
&lt;p&gt;The reverse is also true: a language with familiar syntax but deeply unfamiliar semantics (like Lattice's reactive phase system) will produce code that looks correct but occasionally behaves wrong. Claude's Rust intuitions are strong enough to generate valid-looking phase code, but the model sometimes falls back to Rust's mutation model when the Lattice-specific behavior is more constrained. The syntax transfers perfectly. The semantics require teaching.&lt;/p&gt;
&lt;h3&gt;Implications for Language Designers&lt;/h3&gt;
&lt;p&gt;If you're designing a new programming language in 2026, the AI tooling question is unavoidable. Your language won't have IDE plugins, autocompleters, or AI coding assistants on day one. The community doesn't exist yet. The training data doesn't include your language. Every other language your users work with has Copilot or Claude support. Yours doesn't.&lt;/p&gt;
&lt;p&gt;Lattice suggests a strategy: make your syntax rhyme with something an LLM already knows.&lt;/p&gt;
&lt;p&gt;This isn't about copying Rust. Lattice has genuinely novel semantics. The phase system, the reactive bindings, the alloy structs with per-field phase annotations: none of these exist in Rust. But they're expressed through syntax (keywords, braces, type annotations, block expressions) that maps directly to Rust's structural patterns. Claude can parse the syntax without help and learn the semantics from examples.&lt;/p&gt;
&lt;p&gt;The alternative is designing a syntax so novel that LLMs can't bootstrap from existing knowledge. This is a legitimate design choice; some ideas genuinely need new notation. But the cost is high: your users won't get AI assistance until your language appears in training data, which requires the language to become popular first, which is harder without AI assistance. It's a chicken-and-egg problem that familiar syntax sidesteps.&lt;/p&gt;
&lt;p&gt;The practical recommendation: novel semantics, familiar syntax. Invent the ideas. Borrow the notation. Let the LLM cross the bridge on syntax and learn the semantics on the other side.&lt;/p&gt;
&lt;h3&gt;What This Means for the "AI Writes Code" Conversation&lt;/h3&gt;
&lt;p&gt;The Lattice case study complicates the popular narrative about AI code generation in both directions.&lt;/p&gt;
&lt;p&gt;For the optimists who say AI can learn anything: Claude cannot invent the reactive phase system. It cannot propose &lt;code&gt;bond&lt;/code&gt; or &lt;code&gt;seed&lt;/code&gt; or &lt;code&gt;anneal&lt;/code&gt; without being told they exist. The novel constructs, the ones that make Lattice a genuinely different language rather than a Rust reskin, are invisible to the model until explicitly specified. AI transfer learning has limits, and those limits are at the boundaries of what the training data contains.&lt;/p&gt;
&lt;p&gt;For the pessimists who say AI can only regurgitate training data: Claude wrote a 5,000-line self-hosted compiler for a language it has never seen. That is not regurgitation. The compiler produces correct bytecode for constructs (phase transitions, reactive bonds, per-field phase annotations) that exist in no other language. The model assembled knowledge from its understanding of compilers generally, Rust syntax specifically, and the Lattice specification I provided, and produced something genuinely new. Antirez called this "assembling knowledge" when he observed the same phenomenon with his &lt;a href="https://baud.rs/KJoorR"&gt;Z80 emulator project&lt;/a&gt;. I think that's the right term.&lt;/p&gt;
&lt;p&gt;The truth is somewhere that neither camp wants to occupy. LLMs can go far beyond their training data when the new territory is structurally adjacent to something they know. They cannot go beyond their training data when the new territory is structurally novel. The boundary between "adjacent" and "novel" is syntax. Familiar syntax is a bridge. Novel syntax is a wall. Novel semantics behind familiar syntax is a trap: the model crosses the bridge confidently and then occasionally falls.&lt;/p&gt;
&lt;p&gt;Lattice exists in all three zones simultaneously. Its Rust-like surface lets Claude cross the bridge. Its phase system is the novel semantics behind familiar syntax. And the self-hosted compiler is proof that the bridge, once crossed, supports weight that no one expected.&lt;/p&gt;
&lt;p&gt;I didn't set out to test the limits of LLM language understanding when I designed Lattice. I set out to build a programming language with a novel approach to mutability. The AI dimension was a side effect: I used Claude Code as my development tool because I use Claude Code for everything, and the language happened to be learnable because it happened to look like Rust. But the result is one of the more complete demonstrations of LLM transfer learning applied to a genuinely novel domain: not just writing programs in an unfamiliar language, but writing a compiler for that language, in that language, from a specification that exists nowhere in the training data.&lt;/p&gt;
&lt;p&gt;The 4,955 lines of &lt;code&gt;latc.lat&lt;/code&gt; are the proof that LLMs can go further than their training data when the conditions are right. The conditions are: familiar syntax, clear specification, accessible examples, and a human who knows when the model is wrong. Remove any one of those and the compiler doesn't get written. But with all four in place, the model produces something that works, that compiles, and that no human typed by hand.&lt;/p&gt;</description><category>ai</category><category>claude</category><category>compilers</category><category>language design</category><category>lattice</category><category>llm</category><category>phase system</category><category>programming languages</category><category>rust</category><category>self-hosting</category><guid>https://tinycomputers.io/posts/teaching-llms-languages-theyve-never-seen.html</guid><pubDate>Thu, 02 Apr 2026 13:00:00 GMT</pubDate></item><item><title>Distilled Reasoning on Strix Halo: Running a Claude-Trained Thinking Model Locally</title><link>https://tinycomputers.io/posts/distilled-reasoning-on-strix-halo-qwen35-claude-thinking.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/distilled-reasoning-on-strix-halo-qwen35-claude-thinking_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;27 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There is a specific moment in the open-source LLM ecosystem that keeps recurring: someone takes a frontier model's outputs, uses them as training data for a smaller model, and publishes the result. The technique is called distillation, and it has been applied to coding ability, instruction following, and general knowledge. What is newer is distilling &lt;em&gt;reasoning&lt;/em&gt;—the step-by-step chain-of-thought process that models like Claude use internally when working through complex problems.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF"&gt;Jackrong's Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled&lt;/a&gt; is one of the more interesting examples. It takes the Qwen3.5-27B base model and fine-tunes it on thousands of reasoning trajectories extracted from Claude 4.6 Opus. The result is a model that exposes its thinking process through &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags before delivering a final answer, mimicking the extended thinking behavior that Anthropic built into Claude natively. In 4-bit quantization, the entire model fits in about sixteen gigabytes.&lt;/p&gt;
&lt;p&gt;I wanted to know two things. First, whether this kind of distilled reasoning actually works—whether a 27B model can meaningfully replicate the structured thinking of a model orders of magnitude larger. Second, whether the AMD Strix Halo APU, with its unified memory architecture and integrated RDNA 3.5 GPU, could run it at useful speeds. The answer to both turned out to be more nuanced than a simple yes or no.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;The machine is the same &lt;a href="https://tinycomputers.io/posts/amd-ai-max+-395-system-review-a-comprehensive-analysis.html"&gt;AMD Ryzen AI MAX+ 395&lt;/a&gt; that has appeared in several previous posts. It is an APU: CPU and GPU on the same die, sharing the same pool of LPDDR5X memory. There is no PCIe bus between the processor and the graphics engine. There is no dedicated VRAM to fill up. The GPU sees roughly 65GB of addressable memory out of the system's 122GB total, which means a 16GB quantized model loads without any of the memory pressure games you play on discrete GPU setups.&lt;/p&gt;
&lt;p&gt;This matters for local LLM inference because the bottleneck for most language models is memory bandwidth, not compute. Tokens are generated one at a time, each requiring a full pass through the model's weights. The faster you can stream those weights from memory to the processing units, the faster you generate tokens. The Strix Halo's LPDDR5X provides roughly 256 GB/s of theoretical bandwidth to the unified memory pool. A discrete GPU like the RTX 4090 has roughly 1 TB/s of bandwidth to its dedicated VRAM, but the Strix Halo never has to copy weights across a PCIe bus. For models that fit entirely in the GPU's addressable space, the unified architecture eliminates an entire class of overhead.&lt;/p&gt;
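&lt;p&gt;A back-of-envelope calculation shows why bandwidth is the ceiling. Each generated token requires streaming essentially all of the model's weights once, so the upper bound on tokens per second is bandwidth divided by model size. The figures below are the ones used in this post; treating the gap to the ceiling as an efficiency factor is my own framing.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Rough upper bound on generation speed for a memory-bandwidth-bound model.
bandwidth_gb_s = 256.0    # Strix Halo LPDDR5X, theoretical peak
model_size_gb = 15.4      # Q4_K_M file size

ceiling = bandwidth_gb_s / model_size_gb
print(f"theoretical ceiling: {ceiling:.1f} tok/s")    # ~16.6 tok/s

# The ~10.3 tok/s measured later in this post is roughly 60% of that
# ceiling, a plausible efficiency for real-world inference.
print(f"observed 10.3 tok/s = {10.3 / ceiling:.0%} of ceiling")
&lt;/pre&gt;&lt;/div&gt;
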
&lt;p&gt;The system runs Ollama 0.17.6, which wraps llama.cpp and provides model management and an HTTP inference API. ROCm 7.2 handles the GPU compute layer, though Ollama's GGUF inference path is primarily CPU-based with GPU offloading for specific operations. The &lt;code&gt;gfx1151&lt;/code&gt; GPU target is not yet in the mainline PyTorch or llama.cpp kernel prebuilds, so &lt;code&gt;HSA_OVERRIDE_GFX_VERSION=11.0.0&lt;/code&gt; remains necessary to map it to the closest supported target (gfx1100, Navi 31).&lt;/p&gt;
&lt;h3&gt;The Model&lt;/h3&gt;
&lt;p&gt;The model's architecture is straightforward: Qwen3.5-27B, a 27 billion parameter transformer, fine-tuned via supervised learning on structured reasoning data. What makes it interesting is the training data. The creator assembled three datasets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered"&gt;Opus-4.6-Reasoning-3000x-filtered&lt;/a&gt;&lt;/strong&gt;: Three thousand reasoning trajectories extracted from Claude 4.6 Opus, filtered for quality.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x"&gt;claude-4.5-opus-high-reasoning-250x&lt;/a&gt;&lt;/strong&gt;: Two hundred and fifty examples of high-intensity structured reasoning from an earlier Claude version.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x"&gt;Qwen3.5-reasoning-700x&lt;/a&gt;&lt;/strong&gt;: Seven hundred step-by-step problem-solving examples.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The combined training signal teaches the model to produce output in a specific format: a &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; block containing the reasoning process, followed by a clean final answer. This is architecturally similar to what Anthropic does with Claude's extended thinking, except that Claude's thinking is a native capability of the model's training and architecture, while this is a behavior pattern learned through supervised fine-tuning on examples of that behavior.&lt;/p&gt;
&lt;p&gt;The distinction matters, and I will come back to it.&lt;/p&gt;
&lt;p&gt;The model is distributed in GGUF format, which is the standard for llama.cpp and Ollama. I used the Q4_K_M quantization, which compresses the model's weights from 16-bit floats to 4-bit integers with a mixed precision scheme that preserves more information in attention layers. The file is 15.4GB on disk. The &lt;a href="https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF"&gt;model card&lt;/a&gt; reports 29-35 tokens per second on an RTX 3090; I was curious what the Strix Halo would deliver.&lt;/p&gt;
&lt;h3&gt;Setting It Up&lt;/h3&gt;
&lt;p&gt;Getting the model running took less than ten minutes. Download the GGUF file from HuggingFace:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;mkdir&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;~/models/qwen35-reasoning
curl&lt;span class="w"&gt; &lt;/span&gt;-L&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;~/models/qwen35-reasoning/model.gguf&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s1"&gt;'https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.5-27B.Q4_K_M.gguf'&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note the filename. The HuggingFace repo is named &lt;code&gt;Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF&lt;/code&gt;, but the actual GGUF files inside follow a simpler naming scheme: &lt;code&gt;Qwen3.5-27B.Q4_K_M.gguf&lt;/code&gt;. I wasted time trying to guess the full distilled name before checking the API.&lt;/p&gt;
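&lt;p&gt;If you would rather check than guess, the &lt;code&gt;huggingface_hub&lt;/code&gt; package can list a repository's files directly. A minimal sketch, assuming the package is installed:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;from huggingface_hub import list_repo_files

# Print the GGUF filenames in the repo to find the exact quantization names.
repo = "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF"
for name in list_repo_files(repo):
    if name.endswith(".gguf"):
        print(name)
&lt;/pre&gt;&lt;/div&gt;
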
&lt;p&gt;Create an Ollama Modelfile that imports the local GGUF and sets inference parameters:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;FROM&lt;span class="w"&gt; &lt;/span&gt;/home/alex/models/qwen35-reasoning/model.gguf

PARAMETER&lt;span class="w"&gt; &lt;/span&gt;temperature&lt;span class="w"&gt; &lt;/span&gt;0.6
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;top_p&lt;span class="w"&gt; &lt;/span&gt;0.95
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;num_ctx&lt;span class="w"&gt; &lt;/span&gt;8192
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;repeat_penalty&lt;span class="w"&gt; &lt;/span&gt;1.2
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;stop&lt;span class="w"&gt; &lt;/span&gt;"&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;|endoftext|&amp;gt;"
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;stop&lt;span class="w"&gt; &lt;/span&gt;"&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;|im_end|&amp;gt;"
PARAMETER&lt;span class="w"&gt; &lt;/span&gt;stop&lt;span class="w"&gt; &lt;/span&gt;"&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;|eot_id|&amp;gt;"

SYSTEM&lt;span class="w"&gt; &lt;/span&gt;"You&lt;span class="w"&gt; &lt;/span&gt;are&lt;span class="w"&gt; &lt;/span&gt;a&lt;span class="w"&gt; &lt;/span&gt;deep-thinking&lt;span class="w"&gt; &lt;/span&gt;AI&lt;span class="w"&gt; &lt;/span&gt;assistant.&lt;span class="w"&gt; &lt;/span&gt;For&lt;span class="w"&gt; &lt;/span&gt;complex&lt;span class="w"&gt; &lt;/span&gt;questions,
use&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;think&amp;gt;&lt;/span&gt;...&lt;span class="nt"&gt;&amp;lt;/think&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;tags&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;show&lt;span class="w"&gt; &lt;/span&gt;your&lt;span class="w"&gt; &lt;/span&gt;reasoning&lt;span class="w"&gt; &lt;/span&gt;process&lt;span class="w"&gt; &lt;/span&gt;before
providing&lt;span class="w"&gt; &lt;/span&gt;the&lt;span class="w"&gt; &lt;/span&gt;final&lt;span class="w"&gt; &lt;/span&gt;answer."
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;ollama&lt;span class="w"&gt; &lt;/span&gt;create&lt;span class="w"&gt; &lt;/span&gt;qwen35-reasoning&lt;span class="w"&gt; &lt;/span&gt;-f&lt;span class="w"&gt; &lt;/span&gt;Modelfile
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Ollama copies the GGUF into its own blob store, parses the architecture metadata, and registers it as a runnable model. The whole process takes about a minute on local storage.&lt;/p&gt;
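&lt;p&gt;Once registered, the model is reachable through Ollama's local HTTP API as well as the CLI. A minimal Python sketch, assuming Ollama's default port of 11434 and the &lt;code&gt;requests&lt;/code&gt; library:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import requests

# Send a single non-streaming generation request to the local Ollama server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen35-reasoning",
        "prompt": "Find the derivative of x^3 * sin(x).",
        "stream": False,
    },
    timeout=600,  # reasoning output at ~10 tok/s can take a while
)
resp.raise_for_status()
print(resp.json()["response"])
&lt;/pre&gt;&lt;/div&gt;
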
&lt;h3&gt;The Stop Token Problem&lt;/h3&gt;
&lt;p&gt;The first run produced correct output followed by infinite repetition. The model answered a calculus question perfectly, then appended "This gives us the final answer:" and repeated the entire solution, over and over, until it hit the context window limit. The &lt;a href="https://www.marktechpost.com/2026/03/26/a-coding-implementation-to-run-qwen3-5-reasoning-models-distilled-with-claude-style-thinking-using-gguf-and-4-bit-quantization/"&gt;MarkTechPost&lt;/a&gt; article that inspired this experiment did not mention this issue, likely because their test prompts were short enough that the repetition was not obvious.&lt;/p&gt;
&lt;p&gt;The fix is explicit stop tokens in the Modelfile. Without them, the model does not know when to stop generating. This is a common issue with GGUF models imported into Ollama without a proper chat template: the model's native end-of-sequence tokens are not being interpreted by the inference engine. Adding &lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;|im_end|&amp;gt;&lt;/code&gt;, and &lt;code&gt;&amp;lt;|eot_id|&amp;gt;&lt;/code&gt; as stop parameters catches the three most common EOS tokens used by Qwen and Llama-family models.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;repeat_penalty&lt;/code&gt; of 1.2 provides a second layer of defense by penalizing the model for reusing recent tokens. This helps but is not sufficient on its own. Without the stop tokens, the model can produce novel-but-meaningless text that avoids exact repetition while still degenerating into nonsense. More on this shortly.&lt;/p&gt;
&lt;h3&gt;Where It Works: Structured Problems&lt;/h3&gt;
&lt;p&gt;With the stop tokens in place, the model performs well on structured mathematical and analytical problems. I gave it a calculus question: find the derivative of x³sin(x) using the product rule.&lt;/p&gt;
&lt;p&gt;The response was genuinely good. The model opened a &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; block, identified the two component functions, recalled the product rule formula, computed each derivative, and applied the rule. Then it closed the think block and produced a clean, well-formatted answer with LaTeX notation, step-by-step derivation, and a factored final form. The thinking trace was coherent and tracked the actual reasoning process. It was not filler; each line in the trace corresponded to a meaningful step.&lt;/p&gt;
&lt;p&gt;Generation speed on the Strix Halo: &lt;strong&gt;10.3 tokens per second&lt;/strong&gt;. Not fast by cloud standards, but responsive enough for interactive use. You see the thinking appear in real time, which is surprisingly useful: you can watch the model work through the problem and catch errors before it commits to a final answer.&lt;/p&gt;
&lt;p&gt;For structured problems—mathematics, code analysis, formal logic—the distilled reasoning is genuinely functional. The model identifies subproblems, works through them sequentially, and arrives at correct answers. The think tags provide transparency into the process that you do not get from a standard instruction-tuned model.&lt;/p&gt;
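&lt;p&gt;Because the model brackets its reasoning in think tags, separating the trace from the final answer is mechanical. A small helper sketch (my own code, not part of Ollama or the model):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import re

# Split a distilled-reasoning response into (thinking trace, final answer).
def split_thinking(text):
    match = re.search(r"&lt;think&gt;(.*?)&lt;/think&gt;", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()    # model skipped the think block
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer
&lt;/pre&gt;&lt;/div&gt;
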
&lt;h3&gt;Where It Falls Apart: The River Crossing&lt;/h3&gt;
&lt;p&gt;I ran the classic &lt;a href="https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem"&gt;wolf-goat-cabbage river crossing&lt;/a&gt; puzzle as a comparison test, the same prompt on both the distilled Qwen model and Claude Haiku 4.5 via the Anthropic API.&lt;/p&gt;
&lt;p&gt;Claude Haiku returned a perfect, concise seven-step solution in 2.9 seconds. Two hundred and twenty-three tokens. The answer identified the critical insight (bring the goat back on one return trip), laid out the sequence clearly, and stopped.&lt;/p&gt;
&lt;p&gt;The Qwen model started well. It correctly identified that the goat must go first, recognized the wolf-goat conflict at the destination, and identified the need to bring the goat back. Then, around step three of the solution, the model began editorializing. "Oh joy what fun times ahead us humans truly enjoy sometimes huh?!" it wrote, mid-solution. Within a few more sentences, the output had degenerated into an unbroken stream-of-consciousness rant that cascaded into a wall of increasingly disconnected words. Not repeated words—the repeat penalty prevented that—but a firehose of unique, semantically null text that continued until it filled the entire 8,192-token context window.&lt;/p&gt;
&lt;p&gt;The output was, to use a technical term, unhinged. The model went from a correct partial solution to word salad in about two hundred tokens, and there was no recovery. The stop tokens could not save it because the model was not producing any end-of-sequence markers. It had entered a mode where it was generating fluent English syntax with zero semantic content, which is exactly the kind of failure that stop tokens and repeat penalties cannot catch.&lt;/p&gt;
&lt;h3&gt;What the Comparison Reveals&lt;/h3&gt;
&lt;p&gt;The numbers tell the story concisely:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude Haiku 4.5&lt;/th&gt;
&lt;th&gt;Qwen3.5-27B (Strix Halo)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.9 seconds&lt;/td&gt;
&lt;td&gt;Hit 8K context limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;75.9 tok/s&lt;/td&gt;
&lt;td&gt;~10 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;223 tokens, correct&lt;/td&gt;
&lt;td&gt;Thousands of tokens, degenerated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0009&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;But the comparison is not really about speed or cost. It is about the difference between native reasoning and distilled reasoning.&lt;/p&gt;
&lt;p&gt;Claude's extended thinking is a capability that emerges from the model's architecture and training at scale. The model has internalized what it means to reason through a problem, including knowing when to stop, when a line of reasoning is unproductive, and when to switch strategies. These are meta-cognitive skills that are extremely difficult to distill.&lt;/p&gt;
&lt;p&gt;The Qwen model learned the &lt;em&gt;format&lt;/em&gt; of reasoning—the think tags, the step-by-step structure, the pattern of stating subproblems and working through them—from three thousand examples. What it did not learn, and arguably cannot learn from supervised fine-tuning alone, is the judgment about when reasoning is going off the rails. A model that has truly internalized reasoning has implicit quality checks: it recognizes incoherence in its own output and corrects course. A model that has learned to &lt;em&gt;mimic&lt;/em&gt; reasoning produces the surface pattern without the underlying self-monitoring.&lt;/p&gt;
&lt;p&gt;This is visible in the failure mode. The model did not produce wrong reasoning. It produced &lt;em&gt;no&lt;/em&gt; reasoning. It exited the reasoning pattern entirely and entered a generation mode that had nothing to do with the problem. A model with genuine reasoning capability would have recognized the incoherence and either corrected or terminated. The distilled model had no such circuit breaker.&lt;/p&gt;
&lt;h3&gt;The Economics&lt;/h3&gt;
&lt;p&gt;The cost comparison deserves its own section because it is often cited as the primary motivation for running local models.&lt;/p&gt;
&lt;p&gt;The Claude Haiku API call cost nine-hundredths of a cent. If you ran a thousand similar queries per day, you would spend about ninety cents a day, or roughly twenty-seven dollars a month. The Strix Halo draws roughly 65 watts at idle and 150 watts under GPU inference load. At Minnesota's residential electricity rate of around twelve cents per kilowatt-hour, running inference eight hours a day costs about fourteen cents, so on electricity alone the local machine wins. But the hardware itself cost north of two thousand dollars. You would need to amortize that over thousands of hours of inference to reach cost parity with the API, and only if you value your debugging time at zero.&lt;/p&gt;
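&lt;p&gt;To make the arithmetic concrete, here is the back-of-the-envelope version in a few lines of Python. Every constant is an approximation quoted above, not a measurement:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Break-even sketch; all constants are approximations from the text.
api_cost_per_query = 0.0009                  # USD, Claude Haiku, this prompt size
queries_per_day = 1000
api_per_day = api_cost_per_query * queries_per_day            # $0.90/day

load_watts = 150                             # Strix Halo under GPU inference
hours_per_day = 8
kwh_rate = 0.12                              # USD/kWh, MN residential
power_per_day = load_watts / 1000 * hours_per_day * kwh_rate  # ~$0.14/day

hardware_cost = 2000.0                       # USD, approximate
days_to_parity = hardware_cost / (api_per_day - power_per_day)
print(f"~{days_to_parity:,.0f} days (~{days_to_parity * hours_per_day:,.0f} "
      f"inference hours) to break even")
&lt;/pre&gt;&lt;/div&gt;
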
&lt;p&gt;The economic case for local inference is not about per-query cost. It is about use cases where you need unlimited queries without metering, where data cannot leave your network, or where you want to experiment with model behavior without worrying about a bill. If you are evaluating a model's failure modes by running hundreds of adversarial prompts—which is exactly what I was doing—the local model is the right tool because you are not optimizing for answer quality. You are optimizing for the freedom to explore.&lt;/p&gt;
&lt;h3&gt;The Strix Halo as an Inference Platform&lt;/h3&gt;
&lt;p&gt;Ten tokens per second for a 27B Q4 model is respectable for an APU. It is not competitive with a discrete GPU: an RTX 3090 delivers 29-35 tokens per second on the same model, roughly three times faster. But the Strix Halo was not designed to compete with discrete GPUs on raw throughput.&lt;/p&gt;
&lt;p&gt;What it offers instead is capacity. The unified memory pool means you can load models that would not fit on most consumer GPUs. A Q8_0 quantization of this same model would be 28.6GB, which exceeds the VRAM of an RTX 4090 (24GB) but fits comfortably in the Strix Halo's addressable space. You could load a 70B Q4 model (roughly 40GB) without any of the layer-splitting gymnastics required on multi-GPU setups. I have run Llama 3.1 70B Q4 on this machine, and while the generation speed drops to about 4-5 tokens per second, it runs without errors or memory pressure.&lt;/p&gt;
&lt;p&gt;For a machine that also serves as a daily desktop, development workstation, and &lt;a href="https://tinycomputers.io/posts/ltx-api.html"&gt;video generation server&lt;/a&gt; (it runs LTX-2.3 on the same hardware), the ability to casually load and test a 27B reasoning model without dedicated GPU infrastructure is the actual value proposition. You do not plan a session. You do not allocate resources. You type &lt;code&gt;ollama run qwen35-reasoning&lt;/code&gt; and it works.&lt;/p&gt;
&lt;h3&gt;Lessons for the Blog Post Reader&lt;/h3&gt;
&lt;p&gt;If you want to replicate this setup, here is what I would emphasize:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The stop tokens are non-negotiable.&lt;/strong&gt; Without explicit &lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;|im_end|&amp;gt;&lt;/code&gt;, and &lt;code&gt;&amp;lt;|eot_id|&amp;gt;&lt;/code&gt; stop parameters in your Modelfile, the model will produce infinite output on many prompts. This is not documented in the model card and is not mentioned in the MarkTechPost article that covers this implementation. It is the single most important configuration detail.&lt;/p&gt;
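&lt;p&gt;For reference, a minimal Modelfile of the shape I am describing. The GGUF filename is illustrative; the three &lt;code&gt;stop&lt;/code&gt; lines are the ones that matter:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;FROM ./qwen35-27b-reasoning-q4_k_m.gguf
PARAMETER stop "&lt;|endoftext|&gt;"
PARAMETER stop "&lt;|im_end|&gt;"
PARAMETER stop "&lt;|eot_id|&gt;"
PARAMETER repeat_penalty 1.2
PARAMETER num_ctx 8192
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Build it with &lt;code&gt;ollama create qwen35-reasoning -f Modelfile&lt;/code&gt;, and &lt;code&gt;ollama run qwen35-reasoning&lt;/code&gt; picks up the parameters from there.&lt;/p&gt;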
&lt;p&gt;&lt;strong&gt;The model is good at structured problems and bad at open-ended ones.&lt;/strong&gt; Mathematics, code analysis, formal logic—anything where the reasoning has a clear structure and a definitive endpoint—works well. Open-ended problems, creative tasks, or anything requiring sustained coherent narrative are risky. The model can degenerate catastrophically and without warning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A repeat penalty helps but does not solve the fundamental issue.&lt;/strong&gt; Setting &lt;code&gt;repeat_penalty&lt;/code&gt; to 1.2 prevents exact repetition loops but does not prevent the semantic degeneration I observed on the river crossing problem. The model simply produces unique garbage instead of repeated garbage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Distillation captures form, not judgment.&lt;/strong&gt; The think tags are real and useful. The step-by-step reasoning format works. What is missing is the implicit self-monitoring that frontier models have: the ability to recognize when their own output has become incoherent and to course-correct. This is probably the hardest thing to distill, because it is not present in the training examples. The examples show successful reasoning. They do not show the model catching and recovering from failed reasoning, because Claude's failed reasoning attempts are filtered out before the training data is assembled.&lt;/p&gt;
&lt;h3&gt;Where This Goes&lt;/h3&gt;
&lt;p&gt;The distilled reasoning model is, despite its failure modes, genuinely interesting. The &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags provide a form of transparency that standard instruction-tuned models lack. When the model is working correctly—which is most of the time on appropriate tasks—you get a window into the reasoning process that helps you evaluate the answer's quality before you act on it.&lt;/p&gt;
&lt;p&gt;The failure mode is also instructive. It demonstrates, concretely, the gap between learning a behavior pattern and internalizing the capability that produces that pattern. Supervised fine-tuning on reasoning trajectories can teach a model to produce reasoning-shaped output, but it cannot, from three thousand examples, teach the model to actually reason in the way the source model does. That requires either far more training data, a different training methodology (reinforcement learning from reasoning feedback, perhaps), or simply a larger model with more capacity to internalize the underlying patterns.&lt;/p&gt;
&lt;p&gt;For now, the practical advice is: use these models for what they are good at, know their failure modes, and do not trust the output on open-ended problems without reading the thinking trace. The trace is the feature. If the trace is coherent, the answer is probably good. If the trace starts to wander, stop reading and retry.&lt;/p&gt;
&lt;p&gt;The model runs on my desk, generates ten tokens per second, costs nothing per query, and shows its work. For a sixteen-gigabyte download and ten minutes of setup time, that is a reasonable deal—as long as you know what you are buying.&lt;/p&gt;</description><category>amd</category><category>chain-of-thought</category><category>claude</category><category>distillation</category><category>gguf</category><category>inference</category><category>llm</category><category>ollama</category><category>open-source</category><category>quantization</category><category>qwen</category><category>reasoning</category><category>strix halo</category><guid>https://tinycomputers.io/posts/distilled-reasoning-on-strix-halo-qwen35-claude-thinking.html</guid><pubDate>Sun, 29 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Running a 22B Video Model on Four Tesla P40s</title><link>https://tinycomputers.io/posts/running-ltx-video-on-four-tesla-p40s.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/running-ltx-video-on-four-tesla-p40s_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;22 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;LTX-Video 2.3 is a 22 billion parameter model that generates video from text prompts. It was designed for modern hardware: GPUs with bfloat16 support, high-bandwidth memory, and enough VRAM to hold the full model on one or two cards. The &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;Tesla P40&lt;/a&gt; has none of these things. It is a Pascal-generation GPU from 2016, with 24GB of GDDR5 per card, no native bfloat16, no Tensor Cores, and a PCIe 3.0 bus. It was built for data center inference workloads that no longer exist.&lt;/p&gt;
&lt;p&gt;I have four of them in a rack-mount server in an unheated shop building in Minnesota. Together they provide 96GB of VRAM. The question was whether that 96GB, spread across four old cards, could run a model that was never meant to run on any of them.&lt;/p&gt;
&lt;p&gt;The answer is yes, with significant caveats and a substantial amount of code to work around hardware limitations that the model's authors never anticipated.&lt;/p&gt;
&lt;h3&gt;The Problem&lt;/h3&gt;
&lt;p&gt;LTX-Video 2.3's transformer has 48 blocks. At fp16 precision, the model weights alone consume roughly 44GB. With the Gemma text encoder, the video VAE encoder/decoder, the spatial upsampler, and the audio components, the full pipeline needs more memory than any single P40 can provide. The model doesn't fit on one card. It doesn't fit on two. It barely fits on three, with no room for activations during inference.&lt;/p&gt;
&lt;p&gt;Four cards at 24GB each gives 96GB total, which is enough for the weights with room for intermediate activations. But CUDA doesn't automatically spread a model across multiple GPUs. You have to tell it how.&lt;/p&gt;
&lt;p&gt;The standard approach for multi-GPU inference is &lt;code&gt;accelerate&lt;/code&gt;'s &lt;code&gt;dispatch_model&lt;/code&gt;, which automatically distributes model layers across available GPUs based on memory constraints. This works for the Gemma text encoder, which is a straightforward transformer. For the LTX transformer, it doesn't work, because the model has a custom forward pass with audio-video cross-attention that &lt;code&gt;accelerate&lt;/code&gt;'s automatic dispatch can't handle correctly. The model needs to move data between GPUs at specific points in the forward pass, and &lt;code&gt;accelerate&lt;/code&gt; doesn't know where those points are.&lt;/p&gt;
&lt;p&gt;The solution was manual pipeline parallelism: split the 48 transformer blocks evenly across four GPUs (12 blocks per card), keep the shared components (patchify projections, normalization, output projections) on GPU 0, and write a custom forward pass that moves tensors between devices at block boundaries.&lt;/p&gt;
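&lt;p&gt;In outline, the split is a dozen lines, assuming the transformer is already loaded as &lt;code&gt;ltx&lt;/code&gt;. The &lt;code&gt;block_devices&lt;/code&gt; map is what the custom forward pass consults at each boundary; the shared-component attribute names are illustrative, not the model's actual ones:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import torch

# Split the 48 transformer blocks 12-per-card across the four P40s.
devices = [torch.device(f"cuda:{i}") for i in range(4)]
per_gpu = len(ltx.transformer_blocks) // len(devices)   # 48 // 4 = 12

block_devices = {}
for i, block in enumerate(ltx.transformer_blocks):
    dev = devices[i // per_gpu]
    block.to(dev)
    block_devices[i] = dev

# Shared components stay on GPU 0, alongside blocks 0-11.
# (patchify_proj, norm_out, proj_out are placeholder names.)
for shared in (ltx.patchify_proj, ltx.norm_out, ltx.proj_out):
    shared.to(devices[0])
&lt;/pre&gt;&lt;/div&gt;
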
&lt;h3&gt;The Precision Problem&lt;/h3&gt;
&lt;p&gt;Even with the model split across four cards, nothing worked on the first attempt. Or the fifth. Getting LTX-Video running on Pascal hardware was an iterative process, with Claude Code generating solutions and me testing them against the actual hardware. Each failure revealed another assumption the model made about the GPU it would run on. The feedback loop was brutal: load a 22B model across four GPUs, wait eight minutes for a test generation, get a black frame or a NaN error, diagnose which precision boundary caused it, generate a fix, and try again.&lt;/p&gt;
&lt;p&gt;The first problem was bfloat16. The model weights are stored in bf16 format. Pascal GPUs cannot compute in bf16. PyTorch handles this silently for some operations by promoting to fp32, but other operations fail or produce garbage. The initial approach was the obvious one: monkey-patch &lt;code&gt;torch.bfloat16&lt;/code&gt; to redirect to &lt;code&gt;torch.float16&lt;/code&gt;. This seemed to work at load time. The model loaded, the weights populated, no errors. Then the first forward pass produced NaN everywhere. The monkey-patch had corrupted the safetensors weight loading. The weights loaded as fp16 bit patterns interpreted as bf16 values, which is not the same thing. A bf16 value of 1.0 has a different bit pattern than an fp16 value of 1.0. Reinterpret one as the other and you get a number that's either wildly wrong or NaN.&lt;/p&gt;
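&lt;p&gt;The mismatch is easy to demonstrate. Reinterpreting the bits of a bf16 tensor as fp16, which is effectively what the corrupted load path did, gives a different number than converting it:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import torch

x = torch.tensor([1.0], dtype=torch.bfloat16)
print(x.view(torch.float16))  # tensor([1.8750], dtype=torch.float16): same bits, wrong value
print(x.to(torch.float16))    # tensor([1.], dtype=torch.float16): actual conversion
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;view&lt;/code&gt; keeps the bits and changes the interpretation; &lt;code&gt;to&lt;/code&gt; converts the value. The monkey-patch produced the former where the loader needed the latter.&lt;/p&gt;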
&lt;p&gt;The second attempt tried running everything in fp16 natively, converting weights properly during load. This got further: the model produced output that wasn't NaN. But the output was a solid green frame. The intermediate activations in the transformer blocks were overflowing fp16 range. Values above 65,504 become infinity in fp16, and the model's internal representations regularly exceed that during the attention and feedforward passes. The green frame was the model's attempt to decode latents that had been clipped to infinity at some point in the pipeline.&lt;/p&gt;
&lt;p&gt;The working solution was to let the model builder properly convert weights from bf16 to fp16 on load, then run the entire computation pipeline in float32. The weights sit in memory as fp16 (saving space), but every computation promotes to fp32 before executing. This required patching &lt;code&gt;F.linear&lt;/code&gt; to handle mixed dtype inputs:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;_orig_linear&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linear&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_mixed_linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_orig_linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linear&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mixed_linear&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The same pattern extends to every normalization function and every convolution operation. Layer norm, group norm, RMS norm, conv1d through conv_transpose3d: all patched to handle mixed dtypes and accumulate in float32. Without these patches, intermediate values overflow fp16 range and the output is a black frame.&lt;/p&gt;
&lt;h3&gt;The Gemma Problem&lt;/h3&gt;
&lt;p&gt;The text encoder is Google's Gemma 3, a separate model that converts text prompts into embeddings the video transformer can condition on. Gemma's attention mechanism overflows when run in fp16 on Pascal hardware. The attention scores grow large enough to exceed fp16 range, producing NaN values that propagate through the rest of the pipeline.&lt;/p&gt;
&lt;p&gt;The fix was running the entire Gemma encoder in float32. This uses more memory, but the text encoder only runs once per generation (to encode the prompt), and its weights can be freed from GPU memory before the transformer starts. The sequence is: load Gemma across all four GPUs using &lt;code&gt;accelerate&lt;/code&gt;, encode the prompt in float32, delete the encoder, free the memory, then load the video transformer.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;encode_prompt_float32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_ledger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model_ledger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;
    &lt;span class="n"&gt;te&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_ledger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_encoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Dispatch across all 4 GPUs for memory&lt;/span&gt;
    &lt;span class="n"&gt;max_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_balanced_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;te&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"22GiB"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="n"&gt;no_split_module_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Gemma3DecoderLayer"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;te&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dispatch_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;te&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;hidden_states&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;te&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Free GPU memory before transformer loads&lt;/span&gt;
    &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;te&lt;/span&gt;
    &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty_cache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This load-encode-delete cycle is ugly but necessary. There isn't enough total memory to hold both Gemma and the video transformer simultaneously, even across four cards. The sequential approach works because each component only needs to exist during its phase of the pipeline.&lt;/p&gt;
&lt;h3&gt;The Pipeline&lt;/h3&gt;
&lt;p&gt;The generation runs in two stages, matching LTX-Video's distilled inference schedule.&lt;/p&gt;
&lt;p&gt;Stage 1 generates a half-resolution latent video (e.g., 256x384) through 8 denoising steps. Each step runs the full 48-block transformer, with data moving across all four GPUs:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;patched_process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perturbations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ltx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transformer_blocks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block_devices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                             &lt;span class="n"&gt;perturbations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;perturbations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;move_args_to_device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Every GPU boundary involves a tensor transfer across PCIe 3.0. With 12 blocks per GPU, there are 3 boundary crossings per denoising step (GPU 0 to 1, 1 to 2, 2 to 3), plus a final transfer back to GPU 0. With 8 denoising steps, that's 32 cross-device transfers per stage, each moving both video and audio state tensors. PCIe 3.0 x16 has a theoretical bandwidth of ~16 GB/s. The tensors being transferred are small relative to the bandwidth (attention states and activations, not full weight matrices), so the overhead is manageable. But it adds up.&lt;/p&gt;
&lt;p&gt;Stage 1 takes roughly 4 minutes for 241 frames at 24 fps (a 10-second clip). The spatial upsampler then doubles the resolution. Stage 2 runs 3 more denoising steps at full resolution (512x768), taking roughly 6.5 minutes. The VAE decoder converts latents to pixels and generates the audio track in another 40 seconds.&lt;/p&gt;
&lt;p&gt;Total generation time for a 10-second, 512x768 video with audio: approximately 18.5 minutes. For a 1-second clip (25 frames): about 8 minutes. For a 4-second clip (97 frames): about 10.5 minutes.&lt;/p&gt;
&lt;h3&gt;The Memory Layout&lt;/h3&gt;
&lt;p&gt;During inference, the four GPUs aren't loaded equally. GPU 0 carries extra weight because it hosts all the shared components (patchify projections, normalization layers, output projections) plus its 12 transformer blocks. The actual memory distribution:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;VRAM Used&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;10.8 GB&lt;/td&gt;
&lt;td&gt;Shared components + blocks 0-11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;9.3 GB&lt;/td&gt;
&lt;td&gt;Blocks 12-23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;9.3 GB&lt;/td&gt;
&lt;td&gt;Blocks 24-35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;9.3 GB&lt;/td&gt;
&lt;td&gt;Blocks 36-47&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;That's 38.7 GB of the available 96 GB. The remaining 57 GB provides headroom for activations, KV cache growth, and the VAE decoder. There's enough margin that generation never OOMs, even at 241 frames.&lt;/p&gt;
&lt;h3&gt;The API&lt;/h3&gt;
&lt;p&gt;Running inference from the command line is fine for testing, but generating videos for blog content requires something more practical. I wrapped the generation script in a FastAPI server with an async job queue:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Submit a text-to-video job&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;-X&lt;span class="w"&gt; &lt;/span&gt;POST&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt=A cinematic flyover of a Zilog Z80 processor on a PCB"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"duration=10"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"seed=42"&lt;/span&gt;

&lt;span class="c1"&gt;# Submit an image-to-video job&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;-X&lt;span class="w"&gt; &lt;/span&gt;POST&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt=A fluffy orange cat dancing"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"duration=4"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-F&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image=@cat.jpg"&lt;/span&gt;

&lt;span class="c1"&gt;# Check status&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs/07420abb6d82

&lt;span class="c1"&gt;# Download result&lt;/span&gt;
curl&lt;span class="w"&gt; &lt;/span&gt;http://10.1.1.24:8585/jobs/07420abb6d82/video&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;output.mp4
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Jobs queue and execute sequentially. The four GPUs form a single pipeline that can only handle one generation at a time, and the load-encode-delete cycle for Gemma means there's significant setup overhead per job. The API spawns each job as a subprocess, which gives clean GPU memory cleanup between runs. If a generation crashes (which happened frequently during development), the next job starts fresh.&lt;/p&gt;
&lt;p&gt;The server supports both text-to-video and image-to-video. Image conditioning locks the first frame to a provided image and generates subsequent frames from it, which produces more controllable results for specific visual subjects. In practice, image-to-video is the more useful mode. Text-to-video gives the model complete creative freedom, which means the output is unpredictable. You might ask for a Z80 processor and get something that looks like a generic IC, or something that looks like a Z80, depending on the seed. Image-to-video lets you provide the exact first frame you want and the model animates from there. For blog content where visual accuracy matters, starting from a real photograph or a specific reference image gives consistently better results.&lt;/p&gt;
&lt;h3&gt;What the Output Looks Like&lt;/h3&gt;
&lt;p&gt;The video quality is genuinely good. LTX-Video 2.3 produces coherent motion, reasonable physics, and detailed textures. Here are three examples, generated entirely on the P40 server:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Text-to-video: "A cinematic flyover of a Zilog Z80 processor on a printed circuit board" (10 seconds, 18.5 minutes to generate)&lt;/strong&gt;&lt;/p&gt;
&lt;video controls preload="metadata" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/ltx-z80-flyover.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;&lt;strong&gt;Image-to-video: "A fluffy orange cat with a hat dancing" (4 seconds, 10.5 minutes to generate)&lt;/strong&gt;&lt;/p&gt;
&lt;video controls preload="metadata" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/ltx-cat-dancing.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;&lt;strong&gt;Text-to-video: "A cat sitting on a windowsill, sunlight streaming in" (1 second, 8 minutes to generate)&lt;/strong&gt;&lt;/p&gt;
&lt;video controls preload="metadata" style="max-width: 100%; border-radius: 6px; box-shadow: 0 10px 20px rgba(0,0,0,.1); margin: 1em 0;"&gt;
&lt;source src="https://tinycomputers.io/ltx-cat-windowsill.mp4" type="video/mp4"&gt;
&lt;/source&gt;&lt;/video&gt;

&lt;p&gt;The model understands object permanence, lighting consistency, and basic spatial relationships. The Z80 flyover produces a recognizable IC package with surrounding components, proper lighting, and smooth camera movement.&lt;/p&gt;
&lt;p&gt;The audio is a different story. LTX-Video 2.3 generates an audio track alongside the video, but the results are inconsistent. Prompts describing characters speaking produce odd ambient music instead of voices. Prompts describing environments produce vaguely appropriate soundscapes. The audio pipeline works mechanically (it generates real audio waveforms via a separate VAE decoder and vocoder), but the semantic connection between prompt and audio output is weak. For blog content, I'd likely strip the generated audio and add narration or music separately.&lt;/p&gt;
&lt;p&gt;The 512x768 resolution at 24fps is usable for web content. It's not 4K. It's not going to replace stock footage for production video. But for blog hero images in motion, visual demonstrations, or supplementary content alongside text, it works.&lt;/p&gt;
&lt;h3&gt;What This Cost&lt;/h3&gt;
&lt;p&gt;The incremental hardware cost is zero. The four P40s and the server already existed for &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;LLM inference&lt;/a&gt;. LTX-Video is an additional workload on the same hardware.&lt;/p&gt;
&lt;p&gt;The electricity cost is modest. The server draws roughly 500W under full GPU load. An 18.5-minute generation (10-second video at full resolution) consumes about 0.15 kWh, roughly $0.024 at Minnesota residential rates. You could generate forty 10-second clips for a dollar.&lt;/p&gt;
&lt;p&gt;The real cost was development time. Getting from "model downloaded" to "working generation pipeline" took many iterations across multiple sessions with Claude Code. Each precision-related failure mode (bf16 corruption, fp16 overflow, mixed-dtype kernel errors, NaN propagation through attention) required diagnosis, a hypothesis, a code change, and a test cycle that involved loading a 22B model across four GPUs. The feedback loop was slow. A single test takes 8 to 18 minutes to confirm whether a change worked. Many didn't.&lt;/p&gt;
&lt;h3&gt;The Broader Point&lt;/h3&gt;
&lt;p&gt;A 22 billion parameter video generation model was not designed to run on 2016 hardware. The authors assumed bf16, assumed modern attention kernels, assumed enough memory on one or two cards. None of those assumptions hold on the P40.&lt;/p&gt;
&lt;p&gt;But the model runs anyway, because the underlying math doesn't actually require any of those features. Bfloat16 is a convenience, not a requirement; float32 computes the same function. Flash attention is an optimization, not a necessity; standard attention produces identical results. And 96GB across four cards is 96GB, regardless of whether it's cutting-edge HBM3 or decade-old GDDR5.&lt;/p&gt;
&lt;p&gt;The generation is slow. Eighteen minutes for ten seconds of video is not competitive with a single A100, which would finish the same job in under two minutes. The float32 computation pipeline roughly doubles the FLOPS required compared to the bf16 path the model was designed for, and the PCIe 3.0 transfers between four separate memory pools add latency that a single modern GPU with unified HBM would never incur. But competitive wasn't the point. The point was that four GPUs I bought on eBay for a thousand dollars total, sitting in a server in a shop building, can run a model that was released this month. The gap between "latest model" and "latest hardware" is not as wide as the spec sheets suggest, as long as you're willing to write the code that bridges it.&lt;/p&gt;
&lt;p&gt;The P40 server was already paying for itself on &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;LLM inference&lt;/a&gt; and &lt;a href="https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html"&gt;TTS generation&lt;/a&gt;. Video generation is one more workload on a machine that I own, running models that I choose, on a schedule that I control. The 18-minute wait is the price of not asking anyone's permission.&lt;/p&gt;</description><category>ai</category><category>cuda</category><category>gpu</category><category>home lab</category><category>inference</category><category>ltx video</category><category>multi-gpu</category><category>pascal</category><category>pipeline parallelism</category><category>tesla p40</category><category>video generation</category><guid>https://tinycomputers.io/posts/running-ltx-video-on-four-tesla-p40s.html</guid><pubDate>Fri, 20 Mar 2026 13:00:00 GMT</pubDate></item><item><title>Processing 51,000 Photos with AI on AMD Strix Halo</title><link>https://tinycomputers.io/posts/processing-51000-photos-with-ai-on-amd-strix-halo.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/processing-51000-photos-with-ai-on-amd-strix-halo_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;I have roughly 20 years of photos sitting on a home fileserver. They span 2001 to 2020, shot on everything from a &lt;a href="https://baud.rs/StrgMz"&gt;Minolta&lt;/a&gt; DiMAGE F100 to a &lt;a href="https://baud.rs/qJQjcb"&gt;Nikon D5100&lt;/a&gt; to various iPhones over the years. A mix of 21,554 JPEGs and 29,860 Nikon RAW files (51,414 images total) organized in a &lt;a href="https://amzn.to/4lwULpW"&gt;Lightroom&lt;/a&gt; backup directory by year, month, and date. Most were shot handheld, many in a hurry. The kind of archive that accumulates when you take photos for two decades without ever going back to curate them.&lt;/p&gt;
&lt;p&gt;The Lightroom catalog that once made sense of all this was long gone, lost to a drive migration somewhere around 2018. What remained was a directory tree of raw files with no organization beyond the date folders. No star ratings, no keywords, no collections. Just files. Thousands of them, some sideways, some crooked, all unlabeled.&lt;/p&gt;
&lt;p&gt;I wanted to fix that. Not manually (I don't have a month to spend in Lightroom) but programmatically. The goals were straightforward: correct orientation issues, straighten crooked horizons, generate AI descriptions of every photo's content, and catalog the whole archive in a queryable database. The kind of batch processing job that would have been impractical five years ago but is now entirely doable with the right hardware and a weekend of scripting.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;Two machines on the local network, each with a distinct role:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Key Specs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fileserver&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NAS / photo storage&lt;/td&gt;
&lt;td&gt;28TB RAID (&lt;code&gt;/md0&lt;/code&gt;), 125GB RAM, NFS exports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU workstation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ML inference&lt;/td&gt;
&lt;td&gt;&lt;a href="https://baud.rs/6jjmD9"&gt;AMD Ryzen AI Max+ 395&lt;/a&gt;, Radeon 8060S, 121GB RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The fileserver is a straightforward storage box. The interesting machine is the GPU workstation running an AMD Strix Halo APU, specifically the AI Max+ 395 with its integrated Radeon 8060S. I've written about this chip &lt;a href="https://tinycomputers.io/posts/amd-ai-max+-395-system-review-a-comprehensive-analysis.html"&gt;before&lt;/a&gt;, and it continues to impress for inference workloads. The RDNA 3.5 integrated GPU shares system memory, giving it access to 65.2 GB of VRAM without the typical constraints of a discrete card. For a model like BLIP that needs maybe 2 GB, that's absurdly generous, but it means you never have to think about VRAM budgets, which is a luxury when you're iterating on a processing pipeline.&lt;/p&gt;
&lt;p&gt;The fileserver already had NFS configured, exporting &lt;code&gt;/md0&lt;/code&gt; to the local subnet. One mount command on the GPU workstation and both machines could see the same filesystem:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;mount&lt;span class="w"&gt; &lt;/span&gt;-t&lt;span class="w"&gt; &lt;/span&gt;nfs&lt;span class="w"&gt; &lt;/span&gt;fileserver.localnet:/md0&lt;span class="w"&gt; &lt;/span&gt;/md0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;No file copying, no rsync scripts, no staging directories. The photos live on the NAS and get processed in-place over the network. Gigabit Ethernet introduces some I/O overhead (each 25 MB NEF file takes 200–300ms to read across the wire), but for an overnight batch job, the simplicity of a single shared filesystem is worth the throughput trade-off. If this were a recurring workflow, I'd invest in 10GbE, but for a one-time archive processing run, gigabit got it done.&lt;/p&gt;
&lt;h3&gt;The Software Stack&lt;/h3&gt;
&lt;p&gt;Everything runs in a Python virtual environment on the GPU workstation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PyTorch 2.9.1+rocm6.3&lt;/strong&gt;: ML framework with AMD ROCm backend&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BLIP&lt;/strong&gt; (&lt;a href="https://huggingface.co/Salesforce/blip-image-captioning-large"&gt;&lt;code&gt;Salesforce/blip-image-captioning-large&lt;/code&gt;&lt;/a&gt;): vision-language model for image captioning&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenCV 4.13&lt;/strong&gt;: horizon detection via Canny edge detection and Hough transforms&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;rawpy 0.26.1&lt;/strong&gt;: Nikon NEF/NRW decoding (wraps LibRaw)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;piexif&lt;/strong&gt;: EXIF metadata extraction for JPEGs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;exiftool&lt;/strong&gt;: EXIF extraction for RAW files (called as a subprocess)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SQLite&lt;/strong&gt;: metadata and results database&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;The gfx1151 Situation&lt;/h4&gt;
&lt;p&gt;If you've followed my &lt;a href="https://tinycomputers.io/posts/getting-pytorch-working-with-amd-radeon-pro-w7900-max+-395-a-comprehensive-guide.html"&gt;previous posts on Strix Halo&lt;/a&gt;, you know the drill. The Radeon 8060S reports as &lt;code&gt;gfx1151&lt;/code&gt; in ROCm, which is newer than what PyTorch's ROCm wheels officially target. The fix is the same environment variable override:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;HSA_OVERRIDE_GFX_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;11&lt;/span&gt;.0.0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This maps the GPU to a generic gfx11 target. In practice, it works without issues, with no compute errors and no performance penalties. ROCm 6.16 on this machine also reports &lt;code&gt;amdgcn-amd-amdhsa--gfx11-generic&lt;/code&gt; as a supported ISA, which is likely why the override works cleanly. I've been running production workloads with this flag for months now without incident.&lt;/p&gt;
&lt;h3&gt;The Processing Pipeline&lt;/h3&gt;
&lt;p&gt;Each photo passes through five stages: EXIF extraction, orientation correction, horizon detection and straightening, AI captioning, and finally saving the corrected image and cataloging everything in SQLite.&lt;/p&gt;
&lt;h4&gt;EXIF Metadata Extraction&lt;/h4&gt;
&lt;p&gt;For JPEGs, &lt;code&gt;piexif&lt;/code&gt; reads the embedded EXIF data directly; it's a pure Python library that parses the binary EXIF structure without needing any external dependencies. For NEF/NRW files, piexif can't handle Nikon's proprietary container format, so I shell out to &lt;code&gt;exiftool&lt;/code&gt; with JSON output (&lt;code&gt;exiftool -json -n &amp;lt;file&amp;gt;&lt;/code&gt;). The &lt;code&gt;-n&lt;/code&gt; flag is important; it returns numeric values instead of human-readable strings, which makes downstream processing much cleaner.&lt;/p&gt;
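&lt;p&gt;The RAW branch is a thin subprocess wrapper; a minimal sketch:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import json
import subprocess

def read_raw_exif(path):
    """EXIF for NEF/NRW via exiftool: -json emits a one-element array, -n keeps values numeric."""
    result = subprocess.run(
        ["exiftool", "-json", "-n", str(path)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)[0]
&lt;/pre&gt;&lt;/div&gt;
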
&lt;p&gt;The extracted fields cover the full gamut: camera make and model, lens, dates, exposure settings (shutter speed, aperture, ISO, focal length), flash, white balance, metering mode, GPS coordinates, and the original orientation tag.&lt;/p&gt;
&lt;p&gt;EXIF data is notoriously inconsistent across two decades of cameras. I'll come back to this; it became a debugging story of its own.&lt;/p&gt;
&lt;h4&gt;Orientation Correction&lt;/h4&gt;
&lt;p&gt;The EXIF orientation tag (values 1 through 8) encodes how the camera was held when the photo was taken. A value of 1 means the image is right-side up. A value of 6 means the camera was rotated 90 degrees clockwise. Value 3 means 180 degrees. Some values encode horizontal or vertical flips. The full matrix looks like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_LEFT_RIGHT&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_180&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_TOP_BOTTOM&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_LEFT_RIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_270&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_270&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FLIP_LEFT_RIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_90&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ROTATE_90&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
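
&lt;p&gt;Applying the table is a short loop over the operations for the tag value; tag 1 and missing tags fall through untouched:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;def apply_orientation(img, orientation):
    """Bake the EXIF orientation into the pixel data using the ops table above."""
    for op in ops.get(orientation, []):
        img = img.transpose(op)
    return img
&lt;/pre&gt;&lt;/div&gt;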

&lt;p&gt;Out of the 51,411 successfully processed photos, &lt;strong&gt;8,797 (17.1%) needed orientation correction&lt;/strong&gt;. The majority came from the Nikon D5100 and iPhone 4, both of which set the orientation tag but don't bake the rotation into the pixel data itself. Without this correction, roughly one in six photos would display sideways or upside-down in any viewer that doesn't respect EXIF orientation.&lt;/p&gt;
&lt;p&gt;Here's what that looks like in practice. The raw pixel data from this iPhone photo is stored sideways; the camera recorded an EXIF orientation tag of 6, meaning "rotate 90 degrees clockwise to display correctly." Any viewer that ignores that tag renders the image on its side:&lt;/p&gt;
&lt;div style="display: flex; gap: 10px; margin: 20px 0;"&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-dog-before.jpg" alt="Dog photo with incorrect orientation - displayed sideways" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;Before: raw pixel data (EXIF orientation 6, displayed sideways)&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-dog-after.jpg" alt="Dog photo after EXIF orientation correction - displayed upright" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;After: orientation corrected based on EXIF tag&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;h4&gt;Horizon Detection and Straightening&lt;/h4&gt;
&lt;p&gt;The first version of this stage used classical computer vision, no neural network needed. The approach (a code sketch follows the list):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Downscale the image to 1200px on the long side for speed&lt;/li&gt;
&lt;li&gt;Convert to grayscale, apply Gaussian blur&lt;/li&gt;
&lt;li&gt;Run Canny edge detection&lt;/li&gt;
&lt;li&gt;Crop to the vertical middle 50%, since the horizon is rarely at the extreme top or bottom of a frame&lt;/li&gt;
&lt;li&gt;Apply the Hough Line Transform to find line segments, requiring a minimum length of one-quarter the image width&lt;/li&gt;
&lt;li&gt;Filter to near-horizontal lines (within 20 degrees of level)&lt;/li&gt;
&lt;li&gt;Compute a weighted average of the detected angles, weighted by line length&lt;/li&gt;
&lt;/ol&gt;
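&lt;p&gt;A condensed sketch of that first pass (parameter values are representative, not the exact ones I shipped):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import cv2
import numpy as np

def hough_horizon_angle(img):
    """First-pass tilt estimate: Canny edges + probabilistic Hough transform."""
    scale = min(1.0, 1200 / max(img.shape[:2]))
    small = cv2.resize(img, None, fx=scale, fy=scale)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

    h, w = edges.shape
    band = edges[h // 4 : 3 * h // 4]        # middle 50% of the frame
    lines = cv2.HoughLinesP(band, 1, np.pi / 180, threshold=100,
                            minLineLength=w // 4, maxLineGap=10)
    if lines is None:
        return None

    angles, weights = [], []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) &lt;= 20:                 # near-horizontal lines only
            angles.append(angle)
            weights.append(np.hypot(x2 - x1, y2 - y1))
    return float(np.average(angles, weights=weights)) if angles else None
&lt;/pre&gt;&lt;/div&gt;
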
&lt;p&gt;The key is the threshold window. If the detected angle is less than 0.5 degrees, it's not worth correcting, since you'd introduce interpolation artifacts for no visible benefit. If it's greater than 15 degrees, it's probably not a tilted horizon at all; it's either intentional composition or the algorithm latching onto a staircase railing. The correction itself uses &lt;code&gt;cv2.warpAffine&lt;/code&gt; with Lanczos interpolation and a reflective border mode, followed by an inward crop to eliminate any border artifacts:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;crop_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The initial implementation used Canny edge detection and Hough line transforms, classical CV techniques from the 1980s. Fast, deterministic, 100ms per image. But it had a fatal flaw: it couldn't distinguish between a tilted horizon and a roofline receding toward a vanishing point. Architecture, roads, staircases, any strong line in the middle band of the image would register as a "tilted horizon," and the algorithm would dutifully rotate the image to "correct" it. In practice, this meant a significant number of photos were being made &lt;em&gt;worse&lt;/em&gt;, not better.&lt;/p&gt;
&lt;p&gt;The fix was to replace Hough line detection with semantic segmentation. SegFormer (&lt;code&gt;nvidia/segformer-b2-finetuned-ade-512-512&lt;/code&gt;), trained on the ADE20K dataset, segments each image into 150 classes, including sky. The approach is simple: find the sky pixels, trace the bottom edge of the sky region, fit a line to that boundary, and measure its angle. If there's no sky (less than 5% of the image), or the sky boundary is too fragmented (fewer than 20 points), skip the correction entirely.&lt;/p&gt;
&lt;p&gt;This eliminates false positives on indoor shots, close-ups, architecture, and anything without a visible sky. SegFormer runs on CPU at about 0.4 seconds per image; the model is only 25M parameters, so it doesn't need the GPU. The GPU stays dedicated to BLIP captioning.&lt;/p&gt;
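&lt;p&gt;A sketch of the replacement, assuming you already have SegFormer's per-pixel label map as a NumPy array (in ADE20K's 150-class indexing, sky is class 2):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

SKY_CLASS = 2  # ADE20K label index for "sky"

def sky_horizon_angle(labels):
    """Tilt of the sky's bottom edge, or None if the correction should be skipped."""
    sky = labels == SKY_CLASS
    if sky.mean() &lt; 0.05:            # less than 5% sky: skip
        return None
    xs, ys = [], []
    for x in range(sky.shape[1]):
        rows = np.flatnonzero(sky[:, x])
        if rows.size:
            xs.append(x)
            ys.append(rows.max())    # lowest sky pixel in this column
    if len(xs) &lt; 20:                 # boundary too fragmented: skip
        return None
    slope, _ = np.polyfit(xs, ys, 1)
    return float(np.degrees(np.arctan(slope)))
&lt;/pre&gt;&lt;/div&gt;
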
&lt;p&gt;Two examples from the corrected archive. This bridge over a river had a 2.68-degree clockwise tilt, and the bridge deck and far shore are visibly leveled:&lt;/p&gt;
&lt;div style="display: flex; gap: 10px; margin: 20px 0;"&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-river-before.jpg" alt="Bridge over river with tilted horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;Before: 2.68° clockwise tilt&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-river-after.jpg" alt="Bridge over river with corrected horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;After: horizon straightened via sky boundary detection&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This rocky Lake Superior shore had a 3.85-degree clockwise tilt, and the far horizon is leveled:&lt;/p&gt;
&lt;div style="display: flex; gap: 10px; margin: 20px 0;"&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-shore-before.jpg" alt="Rocky lakeshore with tilted horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;Before: 3.85° clockwise tilt&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div style="flex: 1; text-align: center;"&gt;
&lt;img src="https://tinycomputers.io/images/photo-proc-shore-after.jpg" alt="Rocky lakeshore with corrected horizon" style="max-width: 100%; box-shadow: 2px 2px 6px rgba(0,0,0,0.3);"&gt;
&lt;p&gt;&lt;em&gt;After: horizon straightened&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;h4&gt;AI Captioning with BLIP&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;Salesforce/blip-image-captioning-large&lt;/code&gt; model generates natural language descriptions of each photo. It runs in float16 on the Radeon 8060S. Each image is resized to a maximum of 1024px before inference. Beam search with 5 beams and a 75-token limit generates the caption:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;output_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_beams&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;early_stopping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
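
&lt;p&gt;For context, the load and pre/post-processing around that call look roughly like this; the fp16 &lt;code&gt;.to()&lt;/code&gt; pattern follows the model card, and on ROCm the &lt;code&gt;cuda&lt;/code&gt; device string targets the Radeon:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

model_id = "Salesforce/blip-image-captioning-large"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg").convert("RGB")
image.thumbnail((1024, 1024))  # cap the long side before inference
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=75, num_beams=5,
                            early_stopping=True)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
&lt;/pre&gt;&lt;/div&gt;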

&lt;p&gt;Caption inference takes about 0.5–0.7 seconds per image, consistent regardless of whether the input was a JPEG or a decoded NEF. The model handles a wide variety of subjects surprisingly well. Some examples from the archive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"a brown and white dog standing next to a blue chair"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"two silos sitting in the middle of a field"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a bird sitting on a branch of a tree"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a wooden sign that says hoban road in front of some trees"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a blurry photo of a car driving down a snowy road"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"a dog being groomed by a woman in a salon"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The captions tend toward a "there is a..." pattern, and they occasionally get details wrong (BLIP once described a photo of my living room as "a hotel lobby," which is generous). But for searchability and cataloging purposes, they're remarkably useful. Being able to query &lt;code&gt;WHERE caption LIKE '%dog%'&lt;/code&gt; across 51,000 photos and get meaningful results is something that would have required manual tagging before models like BLIP existed. For an archive this size, "good enough" captions on every photo are vastly more useful than perfect captions on none of them.&lt;/p&gt;
&lt;h4&gt;Save and Catalog&lt;/h4&gt;
&lt;p&gt;Corrected images are saved as high-quality JPEGs (quality 92) to &lt;code&gt;/md0/photos_processed/images/&lt;/code&gt;, mirroring the original directory structure. NEF and NRW files are converted to JPEG in the process; the corrected archive is a uniform format. All metadata flows into a SQLite database with WAL journaling, tracking 40+ fields per photo: every piece of EXIF data, processing flags (was orientation corrected? was the horizon straightened? by how many degrees?), the AI caption, file hashes, dimensions, and processing timestamps.&lt;/p&gt;
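&lt;p&gt;As a rough sketch of the catalog setup (the column list here is abbreviated and illustrative; the real schema carries the full 40+ fields):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import sqlite3

conn = sqlite3.connect("/md0/photos_processed/photos.db")
conn.execute("PRAGMA journal_mode=WAL")  # write-ahead logging
conn.execute("""
CREATE TABLE IF NOT EXISTS photos (
    id INTEGER PRIMARY KEY,
    source_path TEXT UNIQUE,
    camera_model TEXT,
    date_taken TEXT,
    gps_latitude REAL,
    gps_longitude REAL,
    orientation_corrected INTEGER,
    horizon_corrected INTEGER,
    horizon_angle_degrees REAL,
    caption TEXT,
    error TEXT
    -- ...plus the remaining EXIF, hash, dimension, and timestamp fields
)
""")
&lt;/pre&gt;&lt;/div&gt;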
&lt;p&gt;The database makes the archive queryable in ways that were never possible before:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;-- What cameras did I use, and when?&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_taken&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_taken&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;photos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Photos with GPS data&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gps_latitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gps_longitude&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;photos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gps_latitude&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;IS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- How crooked were my photos, by camera?&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ABS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;horizon_angle_degrees&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;avg_tilt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;photos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;horizon_corrected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;camera_model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;avg_tilt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;The EXIF Tuple Bug&lt;/h3&gt;
&lt;p&gt;The first processing pass completed 51,414 photos, but with 2,146 errors. All of them were &lt;code&gt;TypeError: type tuple doesn't define __round__ method&lt;/code&gt;. For a pipeline that had been running cleanly on thousands of Nikon D5100 and D60 photos, this was unexpected.&lt;/p&gt;
&lt;p&gt;The root cause turned out to be a two-part problem with how certain budget cameras from the 2008–2012 era write EXIF rational numbers.&lt;/p&gt;
&lt;h4&gt;Part 1: Malformed Tuples&lt;/h4&gt;
&lt;p&gt;The EXIF standard stores rational numbers as &lt;code&gt;(numerator, denominator)&lt;/code&gt; pairs. Most cameras follow this. But some, particularly a batch of older point-and-shoots, wrote the &lt;code&gt;ExposureBiasValue&lt;/code&gt; field as a 4-element tuple like &lt;code&gt;(36, 0, 18, 0)&lt;/code&gt; instead of the expected 2-element &lt;code&gt;(36, 0)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;My &lt;code&gt;_rational_to_float&lt;/code&gt; helper only handled 2-tuples:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_rational_to_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;  &lt;span class="c1"&gt;# passes through 4-tuples as raw tuples&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When a 4-tuple fell through, the downstream &lt;code&gt;round()&lt;/code&gt; call choked on it. The fix was simple: return &lt;code&gt;None&lt;/code&gt; for any tuple that isn't a standard rational pair.&lt;/p&gt;
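&lt;p&gt;The corrected helper, sketched from that description (not necessarily the production code verbatim):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;def _rational_to_float(val):
    if isinstance(val, tuple):
        # Anything other than a (numerator, denominator) pair is malformed,
        # and a zero denominator is unusable either way.
        if len(val) != 2 or val[1] == 0:
            return None
        return val[0] / val[1]
    return val  # plain numbers pass through unchanged
&lt;/pre&gt;&lt;/div&gt;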
&lt;h4&gt;Part 2: None Propagation&lt;/h4&gt;
&lt;p&gt;Even after fixing Part 1, many of these same cameras had written &lt;code&gt;(36, 0)&lt;/code&gt;, a rational with a zero denominator. The function correctly returned &lt;code&gt;None&lt;/code&gt; for division by zero, but the calling code then did &lt;code&gt;round(None, 2)&lt;/code&gt;, triggering the same &lt;code&gt;TypeError&lt;/code&gt; with a slightly different message.&lt;/p&gt;
&lt;p&gt;The fix was a &lt;code&gt;_safe_round&lt;/code&gt; wrapper:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_safe_round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After both fixes, the second pass recovered all 2,143 photos. The remaining 3 errors were genuine file corruption: a truncated JPEG, a NEF that LibRaw couldn't parse, and a NEF with filesystem-level I/O errors. Probably bad sectors on the source drive. Those can't be fixed in code.&lt;/p&gt;
&lt;p&gt;This is one of those bugs that only surfaces at scale. Run the pipeline on a hundred Nikon photos and everything works perfectly. Run it on 51,000 photos spanning 15 different camera models over 20 years, and every edge case in the EXIF spec comes out to play. The lesson, which I should have internalized long ago: never trust external data formats at scale without defensive parsing on every field. The EXIF spec is a suggestion, not a contract, and camera manufacturers have been interpreting it creatively since the early 2000s.&lt;/p&gt;
&lt;h3&gt;Resumability&lt;/h3&gt;
&lt;p&gt;A 15-hour batch job will inevitably need to be restarted: bugs, system updates, a random hound disconnecting the MagSafe power cord from my MacBook Pro. The script tracks progress in SQLite and skips completed files on restart:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;is_already_processed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;"SELECT id FROM photos WHERE source_path = ? AND error IS NULL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_path&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Photos that failed with errors are intentionally &lt;em&gt;not&lt;/em&gt; skipped, so fixing a bug and re-running automatically retries them. This made the EXIF debugging cycle painless: fix the parser, clear the failed rows from the database, relaunch, and only the 2,143 affected photos get reprocessed.&lt;/p&gt;
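&lt;p&gt;"Clearing the failed rows" is itself a one-liner against the same table; a sketch, reusing the &lt;code&gt;error&lt;/code&gt; column from the skip query above:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import sqlite3

conn = sqlite3.connect("/md0/photos_processed/photos.db")
# Drop only the rows that errored; successful rows keep their skip status.
conn.execute("DELETE FROM photos WHERE error IS NOT NULL")
conn.commit()
&lt;/pre&gt;&lt;/div&gt;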
&lt;h3&gt;Performance&lt;/h3&gt;
&lt;p&gt;The pipeline sustained &lt;strong&gt;1.0–1.8 photos per second&lt;/strong&gt;, depending on file format:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Time per Photo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JPEG load&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NEF decode (rawpy)&lt;/td&gt;
&lt;td&gt;~400ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MD5 hash&lt;/td&gt;
&lt;td&gt;~5ms (JPEG), ~100ms (NEF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Horizon detection&lt;/td&gt;
&lt;td&gt;~100ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BLIP inference&lt;/td&gt;
&lt;td&gt;~500–700ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JPEG save&lt;/td&gt;
&lt;td&gt;~50ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;BLIP inference dominates the runtime. NEF decoding is the second bottleneck; each RAW file is 20–30 MB and requires full demosaicing through LibRaw. The NFS overhead for reading large NEFs over gigabit Ethernet is noticeable but not the primary constraint.&lt;/p&gt;
&lt;p&gt;Total wall time: &lt;strong&gt;15.5 hours&lt;/strong&gt; across two passes for 51,414 photos. The BLIP model uses roughly 2 GB of the 65.2 GB available VRAM on the Strix Halo. Memory was never a concern.&lt;/p&gt;
&lt;h3&gt;Final Results&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total photos&lt;/td&gt;
&lt;td&gt;51,414&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Successfully processed&lt;/td&gt;
&lt;td&gt;51,411&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orientation corrected&lt;/td&gt;
&lt;td&gt;8,797&lt;/td&gt;
&lt;td&gt;17.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Horizon straightened&lt;/td&gt;
&lt;td&gt;15,251&lt;/td&gt;
&lt;td&gt;29.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI captioned&lt;/td&gt;
&lt;td&gt;51,411&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unrecoverable errors&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0.006%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The top cameras in the archive tell the story of 20 years of gear:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Camera&lt;/th&gt;
&lt;th&gt;Photos&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/qJQjcb"&gt;Nikon D5100&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;24,073&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/mwoMko"&gt;Nikon D60&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8,734&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iPhone 4&lt;/td&gt;
&lt;td&gt;2,664&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/ACrtrD"&gt;Nikon D3100&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1,698&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/jxhHU5"&gt;Panasonic DMC-FX07&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;975&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/StrgMz"&gt;Minolta DiMAGE F100&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;870&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iPad&lt;/td&gt;
&lt;td&gt;803&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iPhone 5s&lt;/td&gt;
&lt;td&gt;698&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://baud.rs/10it3U"&gt;Samsung SCH-I500&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;645&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The output lives on the NAS:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Corrected images&lt;/strong&gt;: &lt;code&gt;/md0/photos_processed/images/&lt;/code&gt;, 51,411 JPEGs preserving the original year/month/date folder structure, all NEFs converted, all orientation and horizon corrections applied.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SQLite database&lt;/strong&gt;: &lt;code&gt;/md0/photos_processed/photos.db&lt;/code&gt;, 40+ fields per photo with full EXIF metadata, processing results, and AI-generated captions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Processing log&lt;/strong&gt;: &lt;code&gt;/md0/photos_processed/processing.log&lt;/code&gt;, timestamped record of the entire run.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Takeaways&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;AMD's Strix Halo continues to earn its keep for ML inference.&lt;/strong&gt; The &lt;code&gt;HSA_OVERRIDE_GFX_VERSION=11.0.0&lt;/code&gt; workaround remains necessary, but once set, PyTorch and ROCm run without complaints. The 65 GB shared VRAM pool means you can load models without thinking about memory budgets, a workflow advantage that's easy to underestimate until you've experienced it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Classical computer vision still has its place.&lt;/strong&gt; The horizon detection pipeline uses Canny edge detection and Hough transforms, techniques from the 1980s. No training data, no GPU needed, deterministic results, and the whole thing runs in 100ms per image. For geometric corrections on photographic images, you don't need a neural network. You need line detection.&lt;/p&gt;
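&lt;p&gt;The detection code isn't reprinted in this post, but the recipe named above looks roughly like this in OpenCV; the Canny thresholds and the ±10° near-horizontal filter are illustrative choices, not the pipeline's tuned values:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import math

import cv2
import numpy as np

def estimate_horizon_angle(gray):
    """Return the dominant near-horizontal line angle in degrees, or None."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=gray.shape[1] // 4, maxLineGap=20)
    if lines is None:
        return None
    angles = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        if abs(angle) &amp;lt; 10:  # keep only near-horizontal candidates
            angles.append(angle)
    return float(np.median(angles)) if angles else None
&lt;/pre&gt;&lt;/div&gt;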
&lt;p&gt;&lt;strong&gt;EXIF is a minefield.&lt;/strong&gt; Twenty years of cameras from different manufacturers means every edge case in the spec gets exercised. Tuple lengths vary, denominators are zero, fields are missing or repurposed. If you're parsing EXIF at scale, assume nothing about the data's shape and validate everything.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Resumability is non-negotiable for long-running jobs.&lt;/strong&gt; Tracking progress in the database and skipping completed work made it trivial to iterate on bugs. Without this, every fix would mean reprocessing 51,000 photos from scratch.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NFS over gigabit is fine for batch processing.&lt;/strong&gt; Not optimal, but for an overnight job, the network overhead from NAS-attached storage is acceptable. The real bottleneck was ML inference at 0.6 seconds per photo. If I were doing this regularly, 10GbE would be worth the upgrade, but for a one-time archive processing run, gigabit got the job done.&lt;/p&gt;
&lt;p&gt;The whole project, from first SSH to final database entry, took about a day of wall time, most of which was unattended processing. The scripting itself was maybe three hours of work. Twenty years of photos, cataloged and corrected overnight. Not bad for a Strix Halo and some Python. The full source is available on &lt;a href="https://github.com/ajokela/photo-processor"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What I didn't expect was how useful the database would be after the fact. Being able to ask "show me every photo I took with the D5100 at ISO 3200 or higher" or "find photos with GPS data from 2015" turns a pile of files into something that actually tells a story. The AI captions add another dimension; I can now search my own photo archive by content, not just metadata. It's the kind of capability that makes you wonder why photo management software hasn't done this for years. The models have been available. The hardware has been affordable. Someone just needed to wire it together.&lt;/p&gt;</description><category>ai max+ 395</category><category>amd</category><category>blip</category><category>computer vision</category><category>exif</category><category>image captioning</category><category>machine learning</category><category>nef</category><category>nikon</category><category>opencv</category><category>photography</category><category>pytorch</category><category>rocm</category><category>sqlite</category><category>strix halo</category><guid>https://tinycomputers.io/posts/processing-51000-photos-with-ai-on-amd-strix-halo.html</guid><pubDate>Sat, 14 Mar 2026 17:00:00 GMT</pubDate></item><item><title>The Real Cost of Running Qwen TTS Locally: Three Machines Compared</title><link>https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/the-real-cost-of-running-qwen-tts-locally-three-machines-compared_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;17 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/qwen-tts-benchmark/p40-server-shop.jpg" alt="The Tesla P40 server standing on its side in an unheated Minnesota shop building, one of three machines benchmarked for local TTS generation" style="float: right; max-width: 40%; margin: 0 0 1em 1.5em; border-radius: 4px; box-shadow: 0 30px 40px rgba(0,0,0,.1);"&gt;&lt;/p&gt;
&lt;p&gt;Every post on this site has an audio version. A small player at the top, a few minutes of narration, generated entirely on local hardware. No cloud API, no per-character fees, no data leaving the network. I wrote about &lt;a href="https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html"&gt;setting up the pipeline on AMD Strix Halo&lt;/a&gt; earlier this year, and the system has been running in production since, generating narrations for new posts, regenerating old ones when I revise them, and occasionally processing long-form content that would cost real money through Google Cloud TTS or ElevenLabs.&lt;/p&gt;
&lt;p&gt;But I now have three machines capable of running Qwen3-TTS, and they could not be more different from each other. An Apple M3 Max laptop. An AMD Ryzen AI MAX+ 395 mini desktop with integrated Radeon graphics. And a &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;four-GPU Tesla P40 server&lt;/a&gt; built from decade-old enterprise hardware bought on eBay. Three different silicon vendors, three different compute backends (MPS, ROCm, and CUDA) running the same model on the same text.&lt;/p&gt;
&lt;p&gt;The question I wanted to answer is simple: how do they actually compare? Not on paper. Not in theoretical FLOPS. In wall-clock time, generating real audio from a real blog post.&lt;/p&gt;
&lt;p&gt;The answer turned out to be more interesting than I expected, because the numbers tell a story about hardware architecture that raw specifications completely miss.&lt;/p&gt;
&lt;h3&gt;The Setup&lt;/h3&gt;
&lt;p&gt;The model is &lt;a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"&gt;Qwen3-TTS-12Hz-1.7B-CustomVoice&lt;/a&gt;, a 1.7 billion parameter autoregressive text-to-speech model from Alibaba's Qwen team. It generates natural-sounding speech with multiple speaker voices. I use the Eric voice for all blog narrations: clear, professional, well-paced for technical content.&lt;/p&gt;
&lt;p&gt;The three machines:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Apple M3 Max&lt;/strong&gt;, a &lt;a href="https://amzn.to/4rwlTa6"&gt;MacBook Pro&lt;/a&gt; with Apple's M3 Max chip. 14 CPU cores, 30 GPU cores, 64GB unified memory. The GPU runs through PyTorch's MPS (Metal Performance Shaders) backend. This is my daily driver laptop, and it generates TTS when I am writing and editing posts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AMD Radeon 8060S&lt;/strong&gt;, a Bosgame M5 mini desktop running &lt;a href="https://amzn.to/4bv5CMG"&gt;AMD's Ryzen AI MAX+ 395&lt;/a&gt;. This is a Strix Halo APU with integrated RDNA 3.5 graphics, not a discrete GPU. It shares 128GB of DDR5 system memory with the CPU, with roughly 96GB addressable as VRAM. The GPU runs through ROCm 7.2 with PyTorch 2.9.1. The gfx1151 architecture requires specific PyTorch wheels from AMD's pre-release index and several environment variable overrides to function. I wrote a &lt;a href="https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html"&gt;full setup guide&lt;/a&gt; for this machine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA Tesla P40&lt;/strong&gt;, a 2U rack-mount server with four &lt;a href="https://www.ebay.com/itm/306087510352?_skw=nvidia+tesla+p40+24gb+gpu&amp;amp;epid=27032254618&amp;amp;itmmeta=01KKJEGQKSK110HNM6214EB0TT&amp;amp;hash=item47443cc150:g:qAwAAOSwy0toUHXh&amp;amp;itmprp=enc%3AAQALAAABAGfYFPkwiKCW4ZNSs2u11xAq6UjArKrgnuEyMVTZhAZhOSUGYags6TsDJvvCEOa51UH2r%2BRe%2F182ah6rgiTIAIRULQNEL9rbiinCXMor%2FBNNZk0GaNKqTWkq9pLWGoRBM8NL%2BjC1aSA63XPe4YsFHjQkb%2Fmup21S3UM7oqwBrW%2BHep1E07lnrt2vzkljSA4xg7SnrA%2BFDtOdqvDwO4tpgB0t%2BtCv9%2BlXoh%2BeoEgpJqXgaaM0ad48OfmgKB13PF9RIPXLNI6z4SjV2O%2FXOk6nYPyD9Eg5wbzdmsXfNRhwitz7HEZ1bTRUnRmvKzQrw4B3r3LAag5f8%2B8CcCWfCRAkkG8%3D%7Ctkp%3ABk9SR4j6ws6cZw&amp;amp;mkcid=1&amp;amp;mkrid=711-53200-19255-0&amp;amp;siteid=0&amp;amp;campid=5338960379&amp;amp;customid=&amp;amp;toolid=10001&amp;amp;mkevt=1"&gt;Tesla P40 GPUs&lt;/a&gt;, each with 24GB of GDDR5X. Pascal architecture from 2016. Compute capability 6.1. No Tensor Cores, no native bfloat16 support. The benchmark uses a single P40, since Qwen TTS runs on one GPU. This machine lives in an unheated shop building in Minnesota and &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;screams through the winter&lt;/a&gt; when the BMC misinterprets sub-zero ambient temperatures as a hardware malfunction.&lt;/p&gt;
&lt;p&gt;All three machines run the same model checkpoint, the same text input, and the same speaker voice. The only differences are the silicon and the compute backend.&lt;/p&gt;
&lt;h3&gt;The Benchmark&lt;/h3&gt;
&lt;p&gt;I used a standardized 2,411-character passage, five paragraphs on the Jevons Paradox, dense enough to exercise the model's prosody and pacing on real written content. Each machine ran three consecutive generations from the same loaded model, producing roughly three minutes of audio per run. The first run includes kernel compilation and cache warmup; subsequent runs reflect steady-state performance.&lt;/p&gt;
&lt;p&gt;The metric that matters is Real-Time Factor (RTF): how many seconds of wall-clock time it takes to generate one second of audio. An RTF of 1.0 means the model generates audio at exactly real-time speed. Below 1.0 is faster than real-time. Above 1.0 means you are waiting.&lt;/p&gt;
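&lt;p&gt;In code, RTF is just a ratio of two wall-clock quantities. A hypothetical helper (the &lt;code&gt;generate_fn&lt;/code&gt; return signature is an assumption, not the actual benchmark script):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import time

def measure_rtf(generate_fn, text):
    """Time one TTS call; generate_fn is assumed to return (samples, rate)."""
    start = time.perf_counter()
    samples, sample_rate = generate_fn(text)
    generation_s = time.perf_counter() - start
    audio_s = len(samples) / sample_rate
    return generation_s, audio_s, generation_s / audio_s
&lt;/pre&gt;&lt;/div&gt;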
&lt;h4&gt;Individual Runs&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Apple M3 Max (MPS)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;698.5s&lt;/td&gt;
&lt;td&gt;197.7s&lt;/td&gt;
&lt;td&gt;3.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;533.1s&lt;/td&gt;
&lt;td&gt;184.2s&lt;/td&gt;
&lt;td&gt;2.89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;447.8s&lt;/td&gt;
&lt;td&gt;179.2s&lt;/td&gt;
&lt;td&gt;2.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;559.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;187.0s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.97&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;AMD Radeon 8060S (ROCm)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;729.2s&lt;/td&gt;
&lt;td&gt;173.6s&lt;/td&gt;
&lt;td&gt;4.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;460.0s&lt;/td&gt;
&lt;td&gt;204.8s&lt;/td&gt;
&lt;td&gt;2.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;548.2s&lt;/td&gt;
&lt;td&gt;214.2s&lt;/td&gt;
&lt;td&gt;2.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;579.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;197.5s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA Tesla P40 (CUDA)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1511.4s&lt;/td&gt;
&lt;td&gt;204.1s&lt;/td&gt;
&lt;td&gt;7.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1225.7s&lt;/td&gt;
&lt;td&gt;171.6s&lt;/td&gt;
&lt;td&gt;7.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1537.2s&lt;/td&gt;
&lt;td&gt;206.7s&lt;/td&gt;
&lt;td&gt;7.44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1424.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;194.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.33&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;Summary&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Avg RTF&lt;/th&gt;
&lt;th&gt;Best RTF&lt;/th&gt;
&lt;th&gt;Avg Gen Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro&lt;/td&gt;
&lt;td&gt;M3 Max (MPS)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.97&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;559.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bosgame M5&lt;/td&gt;
&lt;td&gt;Radeon 8060S (ROCm)&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;2.25&lt;/td&gt;
&lt;td&gt;579.1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Penguin 2U&lt;/td&gt;
&lt;td&gt;Tesla P40 (CUDA)&lt;/td&gt;
&lt;td&gt;7.33&lt;/td&gt;
&lt;td&gt;7.14&lt;/td&gt;
&lt;td&gt;1424.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;What the Numbers Mean&lt;/h3&gt;
&lt;p&gt;The headline result is that the M3 Max and Radeon 8060S are essentially tied, and the Tesla P40 is roughly 2.4 times slower than both. But that summary hides the interesting details.&lt;/p&gt;
&lt;h4&gt;The Warmup Effect Is Massive&lt;/h4&gt;
&lt;p&gt;On both the M3 Max and the Radeon 8060S, the first run is dramatically slower than subsequent runs. The M3 Max goes from RTF 3.53 on run 1 to RTF 2.50 on run 3, a 29% improvement. The AMD shows an even larger swing: RTF 4.20 on run 1 dropping to RTF 2.25 on run 2, a 46% improvement.&lt;/p&gt;
&lt;p&gt;This is kernel compilation. Both MPS and ROCm compile GPU kernels on first use and cache them for subsequent calls. The Qwen TTS model hits a wide variety of kernel shapes during autoregressive generation (different sequence lengths, different attention patterns) and each new shape triggers a compilation on the first encounter. By run 2, most of the common shapes are cached, and performance stabilizes.&lt;/p&gt;
&lt;p&gt;The P40 shows almost no warmup effect. RTF 7.41 on run 1, 7.14 on run 2, 7.44 on run 3. CUDA's kernel compilation is faster and more mature, so the overhead is absorbed within the first few seconds rather than spread across the entire run. But this maturity does not translate into faster inference; CUDA compiles faster, but the P40's hardware is fundamentally slower at the operations this model requires.&lt;/p&gt;
&lt;p&gt;This has a practical implication that matters: &lt;strong&gt;short benchmarks on MPS and ROCm are misleading.&lt;/strong&gt; I initially ran a quick 276-character test on all three machines before doing the full benchmark. The short test showed the AMD at RTF 9.20, almost identical to the P40's RTF 10.01, and far behind the M3 Max's RTF 2.84. That result nearly led me to conclude the AMD was performing as poorly as decade-old hardware. The longer benchmark, with its warmup effect amortized across more generation, revealed the truth: the AMD is just as fast as the M3 Max once the kernels are cached. If I had stopped at the short test, I would have drawn exactly the wrong conclusion.&lt;/p&gt;
&lt;h4&gt;Why the P40 Is So Slow&lt;/h4&gt;
&lt;p&gt;The Tesla P40 is a Pascal-generation GPU from 2016. It has 3,840 CUDA cores and 24GB of GDDR5X memory. On paper, it should be competitive; 12 TFLOPS of FP32 compute is not trivial. And for LLM inference through Ollama, the P40 &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;performs remarkably well&lt;/a&gt;, outperforming quad T4 instances on models up to 8B parameters.&lt;/p&gt;
&lt;p&gt;TTS is a different workload. Qwen3-TTS is an autoregressive transformer that generates audio tokens one at a time, each conditioned on all previous tokens. This means the inference is heavily memory-bandwidth bound during the decoding phase, and compute-bound during the attention and feedforward passes. The model is distributed in bfloat16 precision, which the P40 cannot compute natively; Pascal predates bfloat16 support entirely. PyTorch silently promotes bf16 operations to fp32 on the P40, roughly doubling the computation per operation and halving the effective throughput.&lt;/p&gt;
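&lt;p&gt;You can confirm the capability gap directly from PyTorch. A quick check (the fp32 fallback at the end is my suggestion for Pascal, which also lacks fast fp16, not something the benchmark script does):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import torch

device = torch.device("cuda:0")
major, minor = torch.cuda.get_device_capability(device)
print(f"compute capability: {major}.{minor}")             # 6.1 on the P40
print("bf16 supported:", torch.cuda.is_bf16_supported())  # False on Pascal

# On hardware without native bf16, loading the checkpoint in fp32 avoids
# relying on silent promotion during inference.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32
&lt;/pre&gt;&lt;/div&gt;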
&lt;p&gt;The P40 also lacks the SDPA (Scaled Dot-Product Attention) hardware acceleration that newer architectures provide. On the M3 Max, MPS routes attention through Metal's optimized primitives. On the AMD, ROCm's AOTriton provides experimental flash attention support. On the P40, attention runs through standard CUDA kernels without any of these accelerations. For a model that generates thousands of autoregressive steps per audio clip, each involving a full attention pass over the growing sequence, this compounds dramatically.&lt;/p&gt;
&lt;p&gt;The P40 is not bad hardware. It is excellent hardware for the workloads it was designed for: batch inference on quantized LLMs where its 24GB of VRAM per card creates a memory advantage. But autoregressive TTS in bfloat16 hits every one of its architectural weaknesses simultaneously.&lt;/p&gt;
&lt;h4&gt;Unified Memory Wins This Workload&lt;/h4&gt;
&lt;p&gt;Both the M3 Max and the Radeon 8060S use unified memory architectures, where the CPU and GPU share the same physical memory pool. The M3 Max has 64GB of unified LPDDR5. The Radeon 8060S shares 128GB of DDR5 with the CPU, with roughly 96GB addressable as VRAM.&lt;/p&gt;
&lt;p&gt;For a 1.7B parameter model in bf16, the weights occupy roughly 3.4GB. The model fits comfortably on all three machines. But the autoregressive generation pattern creates a stream of intermediate activations (KV cache entries, attention scores, feedforward intermediates) that grow with the sequence length. On a unified memory architecture, these intermediates exist in the same memory space as the model weights, avoiding any PCIe transfer overhead. On the P40, every interaction between CPU and GPU crosses a PCIe 3.0 bus.&lt;/p&gt;
&lt;p&gt;For LLM inference, where the bottleneck is token generation throughput and the KV cache fits in VRAM, the P40's discrete memory is fine. For TTS, where the model generates hundreds of audio tokens per second of speech and the attention window grows continuously, the memory access pattern favors unified architectures.&lt;/p&gt;
&lt;p&gt;This is not a universal statement about unified versus discrete memory. A modern discrete GPU with HBM2e or GDDR6X and PCIe 4.0 or 5.0 would likely outperform both the M3 Max and the Radeon 8060S on this workload. The P40's problem is not that its memory is discrete; it is that its memory is slow and its bus is narrow by 2026 standards.&lt;/p&gt;
&lt;h3&gt;The Model Architecture Question&lt;/h3&gt;
&lt;p&gt;While benchmarking Qwen TTS, I also ran a quick comparison with &lt;a href="https://huggingface.co/SWivid/F5-TTS"&gt;F5-TTS&lt;/a&gt; on the AMD machine to sanity-check the results. F5-TTS is a flow-matching model, fundamentally different from Qwen's autoregressive approach. Where Qwen generates audio tokens sequentially, each conditioned on all previous tokens, F5 generates audio in parallel through an iterative refinement process.&lt;/p&gt;
&lt;p&gt;The difference is stark. On the same Radeon 8060S, the same text, the same hardware:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;th&gt;Audio Length&lt;/th&gt;
&lt;th&gt;RTF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-TTS&lt;/td&gt;
&lt;td&gt;579.1s (avg)&lt;/td&gt;
&lt;td&gt;197.5s&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F5-TTS&lt;/td&gt;
&lt;td&gt;17.4s&lt;/td&gt;
&lt;td&gt;27.2s&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;F5-TTS is faster than real-time. Qwen3-TTS takes three times longer than the audio it produces. In normalized terms, F5 is roughly five times faster than Qwen at steady state, and the gap widens on shorter content, where Qwen's warmup overhead is proportionally larger.&lt;/p&gt;
&lt;p&gt;This is not an apples-to-apples quality comparison. Qwen3-TTS generally produces more natural prosody, better handling of complex sentence structures, and more consistent speaker identity across long passages. F5-TTS is excellent but can occasionally drift in voice character or pacing on very long content. For blog narration, both are well above the threshold of "good enough," and the quality difference is smaller than you might expect given the architectural gap.&lt;/p&gt;
&lt;p&gt;The point is that hardware is only half the story. The choice of model architecture can matter more than the choice of GPU. A flow-matching model on integrated AMD graphics outperforms an autoregressive model on Apple's best laptop silicon by a wide margin. If generation speed is the constraint, switching models gains more than switching hardware.&lt;/p&gt;
&lt;h3&gt;What This Costs in Practice&lt;/h3&gt;
&lt;p&gt;The abstract benchmark numbers translate into concrete time and electricity costs when you are generating audio for a library of blog posts.&lt;/p&gt;
&lt;p&gt;A typical TinyComputers post runs 3,000 to 5,000 words, producing 15 to 25 minutes of narrated audio. At steady-state RTF:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;15 min audio&lt;/th&gt;
&lt;th&gt;25 min audio&lt;/th&gt;
&lt;th&gt;System Power&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;M3 Max&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;td&gt;~63 min&lt;/td&gt;
&lt;td&gt;~50W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Radeon 8060S&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;td&gt;~63 min&lt;/td&gt;
&lt;td&gt;~100W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tesla P40&lt;/td&gt;
&lt;td&gt;~110 min&lt;/td&gt;
&lt;td&gt;~183 min&lt;/td&gt;
&lt;td&gt;~400W&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The M3 Max and Radeon 8060S are tied on generation time, but the M3 Max draws roughly half the system power. For a single post, the electricity cost difference is negligible, a fraction of a cent. For batch processing a backlog of thirty posts, the M3 Max costs about $0.18 in electricity versus $0.36 for the AMD and $3.50 for the P40.&lt;/p&gt;
&lt;p&gt;None of these numbers are alarming. Even the P40, at nearly two and a half hours per post and 400 watts from the wall, costs under fifteen cents in electricity per narration at Minnesota residential rates. The equivalent Google Cloud TTS job would cost $4 to $16 per post depending on the voice quality tier.&lt;/p&gt;
&lt;p&gt;To put cloud costs in perspective: I recently ran a fiction novel through Google's Chirp3-HD voice: 82,000 words, roughly 500,000 characters of text plus SSML markup. The bill came to $17.25 at Google's rate of $30 per million characters. That is not unreasonable for a one-off project, but it adds up quickly if you are generating audio regularly. The entire library of TinyComputers narrations (dozens of posts, hours of audio) has cost me nothing beyond the electricity to run the machines I already own. The economics of local TTS are favorable on every machine in the comparison.&lt;/p&gt;
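&lt;p&gt;The arithmetic is easy to sanity-check against the stated rate:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rate = 30 / 1_000_000      # $30 per million characters
prose_chars = 500_000      # the novel's prose alone
print(prose_chars * rate)  # 15.0 -- so the $17.25 bill implies roughly
                           # 575,000 billed characters once SSML markup counts
&lt;/pre&gt;&lt;/div&gt;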
&lt;p&gt;The real cost is time. If I am generating audio for a single new post, I start it on whichever machine is idle and check back in an hour. If I am regenerating audio for twenty posts after changing the speaker voice or updating the pipeline, the M3 Max or AMD will finish overnight. The P40 would take most of a weekend.&lt;/p&gt;
&lt;h3&gt;The Right Machine for the Job&lt;/h3&gt;
&lt;p&gt;After running these benchmarks, my workflow has shifted. The M3 Max is the default for new post narration; it is fast, quiet, and I am usually sitting in front of it when I finish writing. The AMD handles batch jobs and overnight processing, where its slightly higher power draw does not matter and its equivalent speed makes it interchangeable with the Mac. The P40 server is reserved for what it does best: &lt;a href="https://tinycomputers.io/posts/repurposing-enterprise-gpus-the-tesla-p40-home-lab-story.html"&gt;running large language models&lt;/a&gt; through Ollama, where its 96GB of aggregate VRAM gives it an advantage that neither the Mac nor the AMD can match.&lt;/p&gt;
&lt;p&gt;The P40 can still generate TTS in a pinch, and it does; when both other machines are occupied, I will queue a job on the P40 and accept the longer wait. But for a workload that is inherently autoregressive, memory-bandwidth sensitive, and dependent on bf16 precision, a ten-year-old Pascal GPU is the wrong tool.&lt;/p&gt;
&lt;p&gt;What surprised me most is how well the AMD performs. The Radeon 8060S is an integrated GPU sharing system memory with the CPU. It has no HBM, no dedicated VRAM, no NVLink. Its ROCm software stack requires environment variable hacks, pre-release PyTorch wheels, and a GFX version override to function at all. And yet, once the kernels warm up, it matches Apple's best laptop silicon stride for stride. The raw hardware is there: 40 RDNA 3.5 compute units with access to a deep pool of DDR5 memory. The software just needs to get out of the way, and on run 2 and beyond, it does.&lt;/p&gt;
&lt;h3&gt;Lessons&lt;/h3&gt;
&lt;p&gt;Three takeaways from this exercise that generalize beyond TTS:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Short benchmarks lie.&lt;/strong&gt; Kernel compilation overhead on MPS and ROCm is large enough to dominate a short test. If you are evaluating a new model on non-CUDA hardware, run it at least twice before drawing conclusions. The first run is measuring the software stack, not the hardware.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Architecture matters more than clock speed.&lt;/strong&gt; The P40 has more raw FLOPS than the Radeon 8060S. It does not matter. The P40 lacks native bf16, lacks efficient attention primitives, and sits behind a PCIe 3.0 bus. The Radeon has all three, and ties a chip designed by Apple's custom silicon team. For autoregressive models, the architectural fit between model and hardware dominates everything else.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Model choice can outweigh hardware choice.&lt;/strong&gt; F5-TTS running on the weakest GPU in this comparison is five times faster than Qwen3-TTS running on the strongest. If your constraint is generation speed and you can accept a modest quality trade-off, switching to a flow-matching architecture gains more than any hardware upgrade short of a data center GPU.&lt;/p&gt;
&lt;p&gt;The audio player at the top of each post on this site represents a few minutes of machine time on one of these three machines. Which machine generated it depends on the day, the workload, and what else is running. The listener cannot tell the difference. The audio sounds the same regardless of whether it was generated on a laptop, a mini desktop, or a rack-mount server in a cold Minnesota shop. That is the real benchmark: not which machine is fastest, but that all three are fast enough.&lt;/p&gt;</description><category>amd</category><category>apple silicon</category><category>audio</category><category>benchmarks</category><category>cuda</category><category>gpu</category><category>inference</category><category>m3 max</category><category>machine learning</category><category>mps</category><category>nvidia</category><category>qwen</category><category>rocm</category><category>strix halo</category><category>tesla p40</category><category>text-to-speech</category><category>tts</category><guid>https://tinycomputers.io/posts/the-real-cost-of-running-qwen-tts-locally-three-machines-compared.html</guid><pubDate>Thu, 12 Mar 2026 14:00:00 GMT</pubDate></item><item><title>Discretizing Continuous ML Models: Offline Ballistic Coefficient Corrections via Lookup Table Approximation</title><link>https://tinycomputers.io/posts/discretizing-continuous-ml-models-offline-ballistic-coefficient-corrections.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/discretizing-bc_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;38 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="https://tinycomputers.io/images/bc5d-5d-architecture.png" alt="BC5D 5-Dimensional Lookup Table Architecture" style="width: 100%; max-width: 700px; display: block; margin: 20px auto; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;h3&gt;Abstract&lt;/h3&gt;
&lt;p&gt;Machine learning models for ballistic coefficient (BC) correction have demonstrated significant improvements in trajectory prediction accuracy by capturing velocity-dependent drag variations that traditional constant-BC assumptions cannot model. However, deploying such models in field conditions presents challenges: network connectivity requirements, latency constraints, and computational overhead on resource-limited devices. This paper presents a methodology for discretizing continuous ML models into offline lookup tables, specifically addressing the problem of ballistic coefficient corrections across the flight envelope. We construct caliber-specific 5-dimensional lookup tables (BC5D) indexed by bullet weight, base BC, muzzle velocity, instantaneous velocity, and drag model type. Our approach samples the continuous ML function at fixed intervals and relies on piecewise-linear interpolation for queries between sample points. Empirical evaluation demonstrates that this discretization achieves velocity predictions within 5% of the continuous ML model through supersonic and early transonic regimes, with predictable divergence of 10-15% in deep transonic regions (Mach 0.8-1.2) where the underlying physics exhibit pronounced non-linearities. We argue that this accuracy-connectivity trade-off represents a practical compromise for field deployment, analogous to the relationship between analog signals and digital sampling in audio engineering.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;1. Introduction and Thesis&lt;/h3&gt;
&lt;p&gt;The ballistic coefficient (BC) serves as the primary aerodynamic descriptor for projectile flight, encoding the bullet's ability to overcome air resistance into a single dimensionless quantity. Traditionally, manufacturers publish BC values measured under specific conditions (typically referenced to standard atmospheric density at sea level) and these values are treated as constants throughout the trajectory calculation. This simplification, while computationally convenient, ignores a well-documented physical reality: drag characteristics vary substantially with velocity, particularly as projectiles decelerate through transonic regimes where the relationship between Mach number and drag coefficient undergoes rapid, non-linear transitions [1, 2].&lt;/p&gt;
&lt;p&gt;Machine learning approaches have emerged as a promising solution to this limitation. By training models on empirical drag data (obtained through Doppler radar tracking, spark range measurements, or computational fluid dynamics simulations) researchers can capture the complex, velocity-dependent nature of aerodynamic drag with greater fidelity than constant-BC assumptions permit [3, 4]. These ML models accept multiple input parameters (bullet geometry, muzzle velocity, current velocity, atmospheric conditions) and output a correction factor that adjusts the published BC to reflect instantaneous flight conditions.&lt;/p&gt;
&lt;p&gt;However, ML model deployment introduces practical constraints that conflict with many real-world use cases. Precision shooting applications often occur in environments lacking reliable network connectivity. Mobile devices and embedded systems may lack the computational resources for real-time model inference. Latency requirements for interactive ballistics calculators may preclude round-trip API calls to remote servers. These constraints motivate investigation into methods for deploying ML-derived insights without the ML infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thesis:&lt;/strong&gt; Continuous machine learning models for ballistic coefficient correction can be effectively discretized into offline lookup tables that preserve the essential predictive improvements while eliminating connectivity and computational dependencies. The discretization introduces a piecewise-linear approximation that follows the general trend of the continuous model but exhibits stair-step behavior at sample boundaries, a trade-off analogous to digital audio sampling, where sufficiently fine discretization renders the steps imperceptible for practical applications.&lt;/p&gt;
&lt;p&gt;This paper makes three primary contributions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A methodology for constructing caliber-specific 5-dimensional BC correction tables from continuous ML models&lt;/li&gt;
&lt;li&gt;Empirical analysis of approximation fidelity across the velocity envelope, with particular attention to transonic degradation&lt;/li&gt;
&lt;li&gt;A practical deployment architecture enabling offline operation while maintaining compatibility with online systems&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h3&gt;2. Background and Related Work&lt;/h3&gt;
&lt;h4&gt;2.1 Ballistic Coefficient Fundamentals&lt;/h4&gt;
&lt;p&gt;The ballistic coefficient, as formalized by Ingalls and later refined by the Sporting Arms and Ammunition Manufacturers' Institute (SAAMI), relates a projectile's drag characteristics to a standard reference projectile [5]. The G1 and G7 drag models, representing flat-base and boat-tail projectile shapes respectively, define these reference functions. A projectile's BC expresses the ratio of its sectional density to its form factor relative to the standard:&lt;/p&gt;
&lt;p&gt;$$BC = \frac{SD}{i} = \frac{m/d^2}{C_D/C_{D_{ref}}}$$&lt;/p&gt;
&lt;p&gt;where $m$ is mass, $d$ is diameter, $C_D$ is the projectile's drag coefficient, and $C_{D_{ref}}$ is the reference projectile's drag coefficient at the same Mach number [6].&lt;/p&gt;
&lt;p&gt;The critical insight motivating this work is that the form factor $i$ is not constant; it varies with Mach number, particularly in the transonic regime (Mach 0.8-1.2) where shock wave formation and boundary layer interactions produce complex aerodynamic effects [7]. Modern Doppler radar measurements have quantified these variations, revealing that effective BC can change by 20-40% between supersonic cruise and transonic deceleration [8].&lt;/p&gt;
&lt;h4&gt;2.2 Model Compression and Quantization&lt;/h4&gt;
&lt;p&gt;The challenge of deploying complex models in resource-constrained environments has driven extensive research in model compression techniques. Neural network quantization reduces model precision from 32-bit floating point to lower bit widths (16-bit, 8-bit, or even binary), achieving 4-32x compression with modest accuracy degradation [9, 10]. Knowledge distillation trains smaller "student" models to mimic larger "teacher" models, transferring predictive capability without the full parameter count [11].&lt;/p&gt;
&lt;p&gt;Lookup table (LUT) approximation represents an extreme form of model compression: rather than deploying a parameterized model, we pre-compute outputs for a grid of input values and interpolate between them. This approach has deep roots in computer graphics (texture mapping, color correction) [12], signal processing (trigonometric function evaluation) [13], and embedded systems (sensor linearization) [14].&lt;/p&gt;
&lt;p&gt;The key insight from this literature is that LUT approximation quality depends on three factors: (1) the smoothness of the underlying function, (2) the density of the sampling grid, and (3) the interpolation scheme employed. For sufficiently smooth functions, linear interpolation over a fine grid achieves arbitrarily low approximation error. Non-linearities and discontinuities require finer sampling in affected regions or higher-order interpolation schemes.&lt;/p&gt;
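&lt;p&gt;The first two factors admit a quantitative statement: for a twice-differentiable function $f$ sampled on a uniform grid of spacing $h$, piecewise-linear interpolation satisfies the classical worst-case bound&lt;/p&gt;
&lt;p&gt;$$|f(x) - \hat{f}(x)| \leq \frac{h^2}{8} \max_{\xi} |f''(\xi)|$$&lt;/p&gt;
&lt;p&gt;so halving the grid spacing quarters the error wherever $f''$ is bounded. Near transonic non-linearities, where the effective $|f''|$ spikes, the same spacing buys correspondingly less accuracy.&lt;/p&gt;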
&lt;h4&gt;2.3 Lookup Tables in Physics Simulation&lt;/h4&gt;
&lt;p&gt;Lookup table approaches have a long history in physics simulation, particularly for computationally expensive functions that must be evaluated repeatedly. Atmospheric models commonly employ tabulated thermodynamic properties, interpolating between pre-computed values for temperature, pressure, and density [15]. Real-time graphics engines use LUTs for physically-based rendering calculations, trading memory for computation [16].&lt;/p&gt;
&lt;p&gt;In ballistics specifically, tabulated drag functions have been standard since the 19th century. The original Ingalls tables provided drag coefficient values at discrete Mach numbers, with interpolation for intermediate velocities [17]. Modern implementations like JBM Ballistics and Applied Ballistics continue this tradition, albeit with finer discretization and more sophisticated interpolation [18].&lt;/p&gt;
&lt;p&gt;Our contribution extends this paradigm by tabulating not the drag function itself but the &lt;em&gt;correction&lt;/em&gt; to a drag function: the multiplicative factor that transforms a published BC into an effective BC accounting for velocity-dependent variations captured by ML models.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;3. Methodology&lt;/h3&gt;
&lt;h4&gt;3.1 BC5D Table Architecture&lt;/h4&gt;
&lt;p&gt;We construct lookup tables spanning five dimensions, hence the designation "BC5D":&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Bullet weight&lt;/strong&gt; (grains): Captures mass-dependent momentum retention characteristics&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Base BC&lt;/strong&gt; (dimensionless): The manufacturer-published ballistic coefficient&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Muzzle velocity&lt;/strong&gt; (fps): Initial conditions affecting Reynolds number and flight regime&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Current velocity&lt;/strong&gt; (fps): Instantaneous velocity determining Mach-dependent drag&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Drag model type&lt;/strong&gt; (categorical): G1, G7, or custom drag functions&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This 5-dimensional parameterization follows from the input signature of our continuous ML correction model, which accepts these parameters and returns a multiplicative correction factor in the range [0.5, 1.5]. A correction of 1.0 indicates no adjustment; values below 1.0 indicate reduced effective drag (higher effective BC), while values above 1.0 indicate increased drag.&lt;/p&gt;
&lt;h4&gt;3.2 Caliber-Specific Tables&lt;/h4&gt;
&lt;p&gt;Rather than constructing a single monolithic table covering all calibers, we generate separate tables for each bullet diameter: .224 (5.56mm), .243 (6mm), .264 (6.5mm), .277 (6.8mm), .284 (7mm), .308 (7.62mm), and .338 (8.6mm). This caliber-specific approach offers several advantages:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reduced file size:&lt;/strong&gt; Each table covers only the weight and BC ranges relevant to that caliber. A .224 table need not include entries for 300-grain bullets, nor does a .338 table require entries for 55-grain bullets. Typical table sizes range from 1.0-1.5 MB per caliber.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Targeted accuracy:&lt;/strong&gt; Bin boundaries can be optimized for each caliber's typical parameter ranges. The .224 table uses weight bins from 50-90 grains, while the .308 table spans 125-220 grains.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Independent updates:&lt;/strong&gt; Refinements to one caliber's model can be deployed without forcing users to re-download tables for calibers they don't use.&lt;/p&gt;
&lt;h4&gt;3.3 Sampling and Bin Definition&lt;/h4&gt;
&lt;p&gt;For each dimension, we define discrete bins that balance granularity against storage requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Weight:&lt;/strong&gt; 12 bins spanning caliber-appropriate range (e.g., 125-220 gr for .308)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Base BC:&lt;/strong&gt; 16 bins from 0.200 to 0.800&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Muzzle velocity:&lt;/strong&gt; 10 bins from 1800 to 3500 fps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Current velocity:&lt;/strong&gt; 20 bins from 600 to 3200 fps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Drag model:&lt;/strong&gt; 3 values (G1, G7, G8)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The current velocity dimension receives the finest discretization because BC corrections vary most rapidly with instantaneous velocity, particularly in transonic regimes. The resulting 5D grid contains approximately 115,000 cells per drag model type, yielding total table sizes of 1.0-1.5 MB depending on caliber-specific range spans.&lt;/p&gt;
&lt;h4&gt;3.4 Table Generation Process&lt;/h4&gt;
&lt;p&gt;Table generation proceeds by exhaustively querying the continuous ML model at each grid point:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;for each drag_model in [G1, G7, G8]:
    for each weight_bin in weight_bins:
        for each bc_bin in bc_bins:
            for each mv_bin in muzzle_velocity_bins:
                for each cv_bin in current_velocity_bins:
                    correction = ml_model.predict(
                        weight=weight_bin,
                        bc=bc_bin,
                        muzzle_velocity=mv_bin,
                        current_velocity=cv_bin,
                        drag_model=drag_model
                    )
                    store(correction)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The resulting values are stored in a binary format with an 80-byte header containing metadata (version, caliber, dimensions, timestamp, CRC32 checksum) followed by float32 correction values in row-major order.&lt;/p&gt;
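&lt;p&gt;For concreteness, a minimal writer in this shape might look as follows. The 80-byte header, the metadata fields listed above, and the row-major float32 payload are as documented; the specific field order, struct layout, and &lt;code&gt;BC5D&lt;/code&gt; magic bytes are our own illustrative choices, not the deployed format:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import struct
import time
import zlib

import numpy as np

def write_bc5d(path, caliber, dims, values):
    """Write an 80-byte metadata header followed by float32 values.

    Field order, types, and the magic bytes below are illustrative
    guesses; only the overall shape follows the documented format.
    """
    payload = np.asarray(values, dtype=np.float32).tobytes()  # row-major ("C") order
    header = struct.pack(
        "&amp;lt;4sI16s5IIQ",   # magic, version, caliber, 5 dim sizes, CRC32, timestamp
        b"BC5D", 1, caliber.encode(), *dims,
        zlib.crc32(payload), int(time.time()),
    ).ljust(80, b"\x00")    # pad to the documented 80-byte header
    with open(path, "wb") as f:
        f.write(header + payload)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On load, recomputing &lt;code&gt;zlib.crc32&lt;/code&gt; over the payload and comparing it with the stored header value provides the integrity check described in Section 5.1.&lt;/p&gt;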
&lt;h4&gt;3.5 Runtime Interpolation&lt;/h4&gt;
&lt;p&gt;At query time, the lookup procedure locates the surrounding grid points in each dimension and performs multi-linear interpolation. For a 5D query point, this involves identifying 32 surrounding vertices (2^5) and computing the weighted average based on the query point's position within the hypercube.&lt;/p&gt;
&lt;p&gt;For efficiency, the implementation uses vectorized operations where possible, pre-computes dimension strides for direct array indexing, and caches recently accessed tables to avoid repeated disk I/O.&lt;/p&gt;
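&lt;p&gt;A minimal sketch of the interpolation step, assuming the table has been loaded into a NumPy array with one sorted array of bin centers per dimension (illustrative, not the production implementation):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import itertools

import numpy as np

def multilinear_interp(table, axes, query):
    """Interpolate an N-D regular-grid table at a single query point.

    table: N-D array of correction values; axes: one sorted 1-D array
    of bin centers per dimension; query: sequence of N floats.
    """
    idx, frac = [], []
    for ax, q in zip(axes, query):
        # Lower grid index in this dimension, clamped at the table edges
        # (queries outside the grid extrapolate linearly).
        i = int(np.clip(np.searchsorted(ax, q) - 1, 0, len(ax) - 2))
        idx.append(i)
        frac.append((q - ax[i]) / (ax[i + 1] - ax[i]))
    # Visit the 2^N surrounding vertices (32 when N = 5), weighting each
    # by the query point's fractional position within the hypercube.
    result = 0.0
    for bits in itertools.product((0, 1), repeat=len(axes)):
        weight = 1.0
        for d, bit in enumerate(bits):
            weight *= frac[d] if bit else 1.0 - frac[d]
        result += weight * table[tuple(i + b for i, b in zip(idx, bits))]
    return result
&lt;/pre&gt;&lt;/div&gt;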
&lt;hr&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/bc5d-stair-step-vs-smooth.png" alt="Stair-Step vs Smooth Curve Approximation" style="width: 100%; max-width: 750px; display: block; margin: 30px auto; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p style="text-align: center; font-style: italic; color: #666; margin-top: -15px;"&gt;Figure 1: The continuous ML model (red) produces smooth BC corrections across the velocity range. The discretized lookup table (blue) samples at fixed intervals, creating a stair-step approximation. Note the increased correction factors in the transonic region (900-1300 fps).&lt;/p&gt;

&lt;h3&gt;4. Results and Analysis&lt;/h3&gt;
&lt;h4&gt;4.1 Approximation Fidelity&lt;/h4&gt;
&lt;p&gt;We evaluated the BC5D lookup tables against the continuous ML model on a representative test case: a 168-grain .308 projectile with a G1 BC of 0.475, fired at 2700 fps muzzle velocity. Table 1 presents velocity predictions at distances from 200 to 1000 yards.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table 1: Remaining Velocity Comparison (fps)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Range&lt;/th&gt;
&lt;th&gt;Physics Only&lt;/th&gt;
&lt;th&gt;BC5D Lookup&lt;/th&gt;
&lt;th&gt;Online ML&lt;/th&gt;
&lt;th&gt;Δ (Lookup vs ML)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;200 yd&lt;/td&gt;
&lt;td&gt;2334&lt;/td&gt;
&lt;td&gt;2330&lt;/td&gt;
&lt;td&gt;2298&lt;/td&gt;
&lt;td&gt;+32 fps (+1.4%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;400 yd&lt;/td&gt;
&lt;td&gt;2002&lt;/td&gt;
&lt;td&gt;1994&lt;/td&gt;
&lt;td&gt;1951&lt;/td&gt;
&lt;td&gt;+43 fps (+2.2%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;600 yd&lt;/td&gt;
&lt;td&gt;1703&lt;/td&gt;
&lt;td&gt;1688&lt;/td&gt;
&lt;td&gt;1642&lt;/td&gt;
&lt;td&gt;+46 fps (+2.8%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;800 yd&lt;/td&gt;
&lt;td&gt;1444&lt;/td&gt;
&lt;td&gt;1416&lt;/td&gt;
&lt;td&gt;1364&lt;/td&gt;
&lt;td&gt;+52 fps (+3.8%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000 yd&lt;/td&gt;
&lt;td&gt;1198&lt;/td&gt;
&lt;td&gt;1154&lt;/td&gt;
&lt;td&gt;1008&lt;/td&gt;
&lt;td&gt;+146 fps (+14.5%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Several patterns emerge from this comparison. First, both BC5D lookup and online ML show substantially more velocity decay than physics-only calculations using constant BC, validating that both approaches capture drag enhancement effects invisible to traditional methods. Second, the lookup tables track the ML model within 3-4% through 800 yards, representing the supersonic and early transonic portions of the flight. Third, significant divergence appears at 1000 yards (+14.5%), where the projectile has decelerated deep into the transonic regime.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/bc5d-velocity-comparison.png" alt="Velocity Predictions Comparison" style="width: 100%; max-width: 750px; display: block; margin: 30px auto; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;h4&gt;4.2 Energy Predictions&lt;/h4&gt;
&lt;p&gt;Table 2 presents the same comparison for remaining kinetic energy, which exhibits squared sensitivity to velocity errors.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table 2: Remaining Energy Comparison (ft-lb)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Range&lt;/th&gt;
&lt;th&gt;Physics Only&lt;/th&gt;
&lt;th&gt;BC5D Lookup&lt;/th&gt;
&lt;th&gt;Online ML&lt;/th&gt;
&lt;th&gt;Δ (Lookup vs ML)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;200 yd&lt;/td&gt;
&lt;td&gt;2033&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;1970&lt;/td&gt;
&lt;td&gt;+54 ft-lb (+2.7%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;400 yd&lt;/td&gt;
&lt;td&gt;1495&lt;/td&gt;
&lt;td&gt;1483&lt;/td&gt;
&lt;td&gt;1420&lt;/td&gt;
&lt;td&gt;+63 ft-lb (+4.4%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;600 yd&lt;/td&gt;
&lt;td&gt;1081&lt;/td&gt;
&lt;td&gt;1062&lt;/td&gt;
&lt;td&gt;1005&lt;/td&gt;
&lt;td&gt;+57 ft-lb (+5.7%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;800 yd&lt;/td&gt;
&lt;td&gt;778&lt;/td&gt;
&lt;td&gt;748&lt;/td&gt;
&lt;td&gt;694&lt;/td&gt;
&lt;td&gt;+54 ft-lb (+7.8%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000 yd&lt;/td&gt;
&lt;td&gt;535&lt;/td&gt;
&lt;td&gt;497&lt;/td&gt;
&lt;td&gt;379&lt;/td&gt;
&lt;td&gt;+118 ft-lb (+31.1%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Energy predictions show proportionally larger deviations due to the v² relationship, reaching 31% at 1000 yards. However, for practical shooting applications, the 800-yard accuracy of 7.8% remains within acceptable bounds for most use cases.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/bc5d-energy-comparison.png" alt="Energy Predictions Comparison" style="width: 100%; max-width: 750px; display: block; margin: 30px auto; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/bc5d-deviation-analysis.png" alt="BC5D Deviation Analysis" style="width: 100%; max-width: 750px; display: block; margin: 30px auto; box-shadow: 0 4px 12px rgba(0,0,0,0.15);"&gt;&lt;/p&gt;
&lt;p style="text-align: center; font-style: italic; color: #666; margin-top: -15px;"&gt;Figure 2: Deviation of BC5D lookup table predictions from the continuous ML model. Note that velocity deviations remain under 5% through 800 yards, with pronounced divergence at 1000 yards where transonic effects dominate.&lt;/p&gt;

&lt;h4&gt;4.3 Transonic Degradation Analysis&lt;/h4&gt;
&lt;p&gt;The pronounced divergence at 1000 yards reflects a fundamental characteristic of our discretization approach: piecewise-linear interpolation cannot faithfully reproduce the rapid, non-linear BC variations occurring in transonic flow. Between Mach 1.2 and Mach 0.8 (approximately 1300-900 fps at sea level), shock wave formation and detachment produce drag coefficient changes that defy smooth approximation.&lt;/p&gt;
&lt;p&gt;The continuous ML model, trained on Doppler-derived measurements through this regime, captures these non-linearities through its learned function representation. The lookup table, sampling at fixed velocity intervals, necessarily smooths over rapid transitions between samples. This smoothing introduces systematic bias: the lookup table predicts more gradual drag increases than actually occur, resulting in optimistic velocity and energy predictions.&lt;/p&gt;
&lt;p&gt;Three potential mitigations exist for this transonic fidelity gap:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Finer sampling:&lt;/strong&gt; Reducing velocity bin spacing in the transonic region (e.g., 25 fps instead of 100 fps) would capture more of the non-linear structure, at the cost of increased table size; a sketch of such non-uniform binning follows this list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Non-linear interpolation:&lt;/strong&gt; Cubic or spline interpolation could better approximate curved function behavior between samples, with increased computational cost.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hybrid approaches:&lt;/strong&gt; Using lookup tables for supersonic flight and falling back to simplified analytical transonic models could bound worst-case errors without requiring connectivity.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
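&lt;p&gt;As a concrete illustration of the first mitigation, the following sketch builds a velocity axis that is denser inside the transonic band than elsewhere. The band edges and spacings are the example values from above; the function itself is ours, not part of the deployed generator:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

def adaptive_velocity_bins(lo=600, hi=3200, coarse=100, fine=25, band=(900, 1300)):
    """Velocity bin edges, 4x denser inside the transonic band."""
    return np.concatenate([
        np.arange(lo, band[0], coarse),       # below the band: coarse spacing
        np.arange(band[0], band[1], fine),    # transonic band: fine spacing
        np.arange(band[1], hi + 1, coarse),   # above the band: coarse spacing
    ])

bins = adaptive_velocity_bins()
print(len(bins))  # 39 bins vs. 27 uniform bins at 100 fps over the same range
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because the interpolation sketch in Section 3.5 locates cells with a sorted search rather than a fixed stride, it works unchanged on such non-uniform axes.&lt;/p&gt;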
&lt;h4&gt;4.4 Stair-Step vs. Smooth Curve Analogy&lt;/h4&gt;
&lt;p&gt;The relationship between continuous ML and discretized lookup tables parallels the distinction between analog and digital signals in audio engineering. The ML model evaluates its learned function continuously: every input maps to a precisely computed output through the model's parameter space, drawing a smooth curve through the correction landscape. The lookup table samples this smooth curve at fixed intervals, storing discrete values that are linearly interpolated at query time.&lt;/p&gt;
&lt;p&gt;Consider a CD's 44.1 kHz sampling rate: by capturing 44,100 amplitude values per second, digital audio achieves perceptual equivalence to the analog source because the samples are dense enough that interpolation artifacts fall below human hearing thresholds. The same principle applies here; our velocity bins are fine enough (typically 100 fps spacing) that for most of the flight envelope, the stair-step approximation is imperceptible in practical shooting applications.&lt;/p&gt;
&lt;p&gt;The transonic regime represents our "high-frequency content": rapid changes that require proportionally finer sampling to capture faithfully. Just as audio systems may exhibit aliasing when sampling signals containing frequencies above the Nyquist limit, our lookup tables exhibit approximation error when the underlying function changes faster than our sampling density can track.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;5. Discussion&lt;/h3&gt;
&lt;h4&gt;5.1 Practical Deployment Considerations&lt;/h4&gt;
&lt;p&gt;The BC5D tables have been deployed via a content delivery network with caliber-specific downloads. Users retrieve only the tables for calibers they actually shoot, with typical total downloads of 3-5 MB for a two-caliber configuration. Tables are cached locally with CRC32 validation ensuring data integrity after download.&lt;/p&gt;
&lt;p&gt;The command-line interface supports three operational modes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Online ML:&lt;/strong&gt; Direct API queries for maximum accuracy (requires connectivity)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Offline BC5D:&lt;/strong&gt; Lookup table interpolation (no connectivity required)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Physics only:&lt;/strong&gt; Traditional constant-BC calculation (baseline fallback)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This tiered approach allows users to select the accuracy-connectivity trade-off appropriate to their situation: competitive shooters may prefer online ML for load development, while field use may necessitate offline tables.&lt;/p&gt;
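&lt;p&gt;A minimal sketch of the fallback chain (the function names are hypothetical stand-ins for the three modes, not the CLI's actual internals):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;def bc_correction(params):
    """Resolve a BC correction factor with graceful degradation.

    query_online_ml and bc5d_lookup are hypothetical stand-ins for the
    online API client and the local table interpolator, respectively.
    """
    try:
        return query_online_ml(params)    # mode 1: online ML, maximum accuracy
    except ConnectionError:
        pass                              # no connectivity; fall through
    try:
        return bc5d_lookup(params)        # mode 2: offline BC5D table
    except FileNotFoundError:
        return 1.0                        # mode 3: physics only (no correction)
&lt;/pre&gt;&lt;/div&gt;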
&lt;h4&gt;5.2 Comparison to Related Approaches&lt;/h4&gt;
&lt;p&gt;Our work relates to several established techniques in the model compression literature. Unlike neural network quantization, which reduces precision of model parameters, we compute exact outputs at sample points and interpolate between them; the stored values are full-precision, only the input space is discretized. Unlike knowledge distillation, we make no attempt to train a smaller model; the "student" is simply a lookup table with no learned parameters.&lt;/p&gt;
&lt;p&gt;The closest analogue is the function tabulation commonly employed in embedded systems and real-time simulation. Our contribution extends this paradigm to ML model outputs, demonstrating that the technique transfers effectively to learned functions trained on empirical data rather than analytical expressions.&lt;/p&gt;
&lt;h4&gt;5.3 Limitations and Future Work&lt;/h4&gt;
&lt;p&gt;Several limitations merit acknowledgment. First, the tables capture only the correction function learned by our specific ML model; improvements to the model require regenerating all tables. Second, atmospheric variations (temperature, pressure, humidity) are not currently parameterized; tables assume standard conditions, with atmospheric corrections applied as separate multiplicative factors. Third, the 14% transonic deviation may be unacceptable for applications requiring high precision at extreme range.&lt;/p&gt;
&lt;p&gt;Future work may address these limitations through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Finer transonic sampling with adaptive bin spacing&lt;/li&gt;
&lt;li&gt;Additional dimensions for atmospheric parameters&lt;/li&gt;
&lt;li&gt;Version 2 tables with drag-model-specific optimization&lt;/li&gt;
&lt;li&gt;Exploration of non-linear interpolation schemes&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;6. Conclusion&lt;/h3&gt;
&lt;p&gt;This paper has presented a methodology for discretizing continuous machine learning models into offline lookup tables, specifically addressing ballistic coefficient corrections for trajectory prediction. The BC5D table architecture spans five dimensions (weight, BC, muzzle velocity, current velocity, drag model) with caliber-specific instantiation, achieving file sizes of 1.0-1.5 MB per caliber.&lt;/p&gt;
&lt;p&gt;Empirical evaluation demonstrates that piecewise-linear interpolation over this discretized space achieves velocity predictions within 5% of the continuous ML model through supersonic and early transonic flight regimes, with predictable degradation to 14% deviation in deep transonic regions where non-linear drag variations exceed the approximation capacity of fixed-interval sampling.&lt;/p&gt;
&lt;p&gt;We have argued that this accuracy-connectivity trade-off represents a practical compromise for field deployment, drawing analogy to digital audio sampling where sufficiently fine discretization renders quantization artifacts imperceptible for typical use cases. The transonic regime, exhibiting rapid non-linearities analogous to high-frequency audio content, requires proportionally finer sampling to capture faithfully, a trade-off that can be addressed through adaptive bin spacing in future table versions.&lt;/p&gt;
&lt;p&gt;The broader contribution of this work lies in demonstrating that ML model outputs can be effectively tabulated for offline deployment without resorting to model compression techniques that sacrifice learned representations. For application domains where the input space is bounded and query patterns are predictable, lookup table approximation offers a deployment pathway that preserves ML-derived insights while eliminating infrastructure dependencies.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;[1] McCoy, R. L. (1999). &lt;em&gt;Modern Exterior Ballistics: The Launch and Flight Dynamics of Symmetric Projectiles&lt;/em&gt;. Schiffer Publishing.&lt;/p&gt;
&lt;p&gt;[2] Carlucci, D. E., &amp;amp; Jacobson, S. S. (2018). &lt;em&gt;Ballistics: Theory and Design of Guns and Ammunition&lt;/em&gt; (3rd ed.). CRC Press.&lt;/p&gt;
&lt;p&gt;[3] Weinacht, P., Cooper, G. R., &amp;amp; Newill, J. F. (2005). "Analytical Prediction of Projectile Flight." Army Research Laboratory Technical Report ARL-TR-3567.&lt;/p&gt;
&lt;p&gt;[4] Silton, S. I. (2005). "Navier-Stokes Computations for a Spinning Projectile from Subsonic to Supersonic Speeds." Journal of Spacecraft and Rockets, 42(2), 223-231.&lt;/p&gt;
&lt;p&gt;[5] Litz, B. (2015). &lt;em&gt;Applied Ballistics for Long Range Shooting&lt;/em&gt; (3rd ed.). Applied Ballistics LLC.&lt;/p&gt;
&lt;p&gt;[6] SAAMI (2015). "Voluntary Industry Performance Standards for Pressure and Velocity of Centerfire Rifle Sporting Ammunition." Sporting Arms and Ammunition Manufacturers' Institute.&lt;/p&gt;
&lt;p&gt;[7] Anderson, J. D. (2017). &lt;em&gt;Fundamentals of Aerodynamics&lt;/em&gt; (6th ed.). McGraw-Hill Education.&lt;/p&gt;
&lt;p&gt;[8] Courtney, M., &amp;amp; Courtney, A. (2012). "Experimental Tests of the Litz Model for Ballistic Coefficient Variation with Velocity." arXiv:1201.3621.&lt;/p&gt;
&lt;p&gt;[9] Han, S., Mao, H., &amp;amp; Dally, W. J. (2016). "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." ICLR 2016.&lt;/p&gt;
&lt;p&gt;[10] Jacob, B., et al. (2018). "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference." CVPR 2018.&lt;/p&gt;
&lt;p&gt;[11] Hinton, G., Vinyals, O., &amp;amp; Dean, J. (2015). "Distilling the Knowledge in a Neural Network." arXiv:1503.02531.&lt;/p&gt;
&lt;p&gt;[12] Heckbert, P. S. (1986). "Survey of Texture Mapping." IEEE Computer Graphics and Applications, 6(11), 56-67.&lt;/p&gt;
&lt;p&gt;[13] Jeong, K., &amp;amp; Kim, S. (2003). "Lookup Table-Based FPGA Implementation of Trigonometric Functions." Journal of the Korean Physical Society, 43, 843-847.&lt;/p&gt;
&lt;p&gt;[14] Fraden, J. (2016). &lt;em&gt;Handbook of Modern Sensors: Physics, Designs, and Applications&lt;/em&gt; (5th ed.). Springer.&lt;/p&gt;
&lt;p&gt;[15] Rienecker, M. M., et al. (2011). "MERRA: NASA's Modern-Era Retrospective Analysis for Research and Applications." Journal of Climate, 24(14), 3624-3648.&lt;/p&gt;
&lt;p&gt;[16] Karis, B. (2013). "Real Shading in Unreal Engine 4." SIGGRAPH 2013 Course Notes.&lt;/p&gt;
&lt;p&gt;[17] Ingalls, J. M. (1893). &lt;em&gt;Exterior Ballistics in the Plane of Fire&lt;/em&gt;. D. Van Nostrand Company.&lt;/p&gt;
&lt;p&gt;[18] Litz, B. (2011). "Ballistic Coefficient Testing of the .308 175gr Sierra Matchking." Applied Ballistics Technical Note.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;The author develops ballistics simulation software and maintains the trajectory prediction API at ballistics.7.62x51mm.sh. Source code for the BC5D table generator is available at github.com/ajokela/ballistics-engine.&lt;/em&gt;&lt;/p&gt;</description><category>aerodynamics</category><category>approximation theory</category><category>ballistics</category><category>caliber-specific models</category><category>drag coefficients</category><category>edge computing</category><category>embedded systems</category><category>interpolation</category><category>lookup tables</category><category>machine learning</category><category>model compression</category><category>model deployment</category><category>neural networks</category><category>numerical methods</category><category>offline computing</category><category>physics simulation</category><category>piecewise linear approximation</category><category>quantization</category><category>scientific computing</category><category>trajectory calculation</category><guid>https://tinycomputers.io/posts/discretizing-continuous-ml-models-offline-ballistic-coefficient-corrections.html</guid><pubDate>Wed, 28 Jan 2026 14:30:00 GMT</pubDate></item><item><title>Running Qwen TTS on AMD Strix Halo: A Complete Guide to Local Text-to-Speech</title><link>https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;p&gt;The rise of high-quality text-to-speech models has opened new possibilities for content creators, accessibility advocates, and developers alike. Qwen3-TTS, developed by Alibaba's Qwen team, represents a significant leap forward in neural TTS technology, offering natural-sounding speech synthesis with multiple speaker voices. In this guide, we'll walk through setting up Qwen3-TTS on AMD's Strix Halo platform (specifically the AI Max+ 395 with its integrated Radeon 8060S graphics) and demonstrate how we use it to generate audio narrations for blog posts right here on TinyComputers.&lt;/p&gt;
&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/qwen-tts-on-amd-strix-halo_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;16 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Why Qwen3-TTS?&lt;/h3&gt;
&lt;p&gt;The text-to-speech landscape has evolved dramatically over the past few years. While cloud-based services like Amazon Polly, Google Cloud TTS, and ElevenLabs offer impressive quality, they come with ongoing costs, privacy considerations, and internet dependency. Local TTS solutions have historically lagged behind in quality, often producing robotic or unnatural speech.&lt;/p&gt;
&lt;p&gt;Qwen3-TTS changes this equation. The model produces remarkably natural speech with proper intonation, pacing, and emphasis. It supports multiple pre-trained speaker voices (including Eric, Aiden, Dylan, Serena, and others), each with distinct characteristics suited to different content types. For technical content like our blog posts, the Eric voice provides clear, professional narration that listeners find easy to follow.&lt;/p&gt;
&lt;p&gt;The model we're using, &lt;code&gt;Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice&lt;/code&gt;, weighs in at 1.7 billion parameters. While not small, this is manageable on modern hardware and runs efficiently on GPU. The 12Hz designation refers to the audio frame rate used during generation, balancing quality with computational requirements.&lt;/p&gt;
&lt;h3&gt;The Hardware: AMD AI Max+ 395&lt;/h3&gt;
&lt;p&gt;AMD's Strix Halo architecture represents their latest push into the high-performance APU market, combining powerful CPU cores with substantial integrated graphics. Our test system features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CPU&lt;/strong&gt;: AMD Ryzen AI Max+ 395 with 16 Zen 5 cores (32 threads)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU&lt;/strong&gt;: Integrated Radeon 8060S (RDNA 3.5 architecture)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt;: 128GB unified DDR5, configured with 96GB VRAM and 32GB system RAM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute Units&lt;/strong&gt;: 40 CUs dedicated to graphics/compute workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our test system is the Bosgame M5 AI Mini Desktop, one of the first mini PCs to ship with AMD's Strix Halo silicon. The &lt;a href="https://baud.rs/gmVPEI"&gt;GMKtec EVO-X2&lt;/a&gt; is an extremely similar system if you're looking to replicate this setup. The unified memory architecture is particularly relevant for machine learning workloads. Unlike discrete GPUs with their own VRAM, the Radeon 8060S shares system memory with the CPU. This means no PCIe bottleneck for data transfers, and with 96GB allocated as VRAM, even large models fit comfortably.&lt;/p&gt;
&lt;p&gt;For our TTS workload, the 8060S provides adequate performance. The 1.7B parameter model fits comfortably in memory, and inference runs entirely on GPU once loaded. We see 100% GPU utilization during speech synthesis, indicating the hardware is being fully leveraged.&lt;/p&gt;
&lt;h3&gt;Setting Up the Environment&lt;/h3&gt;
&lt;p&gt;The first challenge with AMD GPUs is getting PyTorch working correctly with ROCm, AMD's open-source GPU compute stack. The Strix Halo uses a newer GPU architecture (gfx1151) that requires ROCm 6.x and some environment variable overrides.&lt;/p&gt;
&lt;h4&gt;Step 1: Create a Python Virtual Environment&lt;/h4&gt;
&lt;p&gt;We'll use a dedicated virtual environment to isolate our TTS dependencies:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;mkdir&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;~/qwen-tts
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;~/qwen-tts
python3&lt;span class="w"&gt; &lt;/span&gt;-m&lt;span class="w"&gt; &lt;/span&gt;venv&lt;span class="w"&gt; &lt;/span&gt;venv
&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;venv/bin/activate
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Step 2: Install PyTorch with ROCm Support&lt;/h4&gt;
&lt;p&gt;The standard PyTorch installation won't work; we need the ROCm-enabled build. As of this writing, ROCm 6.4 is the latest stable release:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;torch&lt;span class="w"&gt; &lt;/span&gt;torchvision&lt;span class="w"&gt; &lt;/span&gt;torchaudio&lt;span class="w"&gt; &lt;/span&gt;--index-url&lt;span class="w"&gt; &lt;/span&gt;https://download.pytorch.org/whl/rocm6.4
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This downloads PyTorch builds compiled specifically for AMD GPUs. The installation is larger than the standard CUDA builds due to the different compute libraries involved.&lt;/p&gt;
&lt;h4&gt;Step 3: Install Qwen-TTS&lt;/h4&gt;
&lt;p&gt;With PyTorch in place, install the Qwen TTS package:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;qwen-tts&lt;span class="w"&gt; &lt;/span&gt;soundfile&lt;span class="w"&gt; &lt;/span&gt;numpy
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;soundfile&lt;/code&gt; library handles WAV file I/O, while &lt;code&gt;numpy&lt;/code&gt; is needed for audio array manipulation.&lt;/p&gt;
&lt;h4&gt;Step 4: Install xformers for ROCm (Optional but Recommended)&lt;/h4&gt;
&lt;p&gt;The xformers library provides optimized attention implementations that can improve performance:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;xformers&lt;span class="w"&gt; &lt;/span&gt;--index-url&lt;span class="w"&gt; &lt;/span&gt;https://download.pytorch.org/whl/rocm6.4
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;While Qwen-TTS will work without xformers, having it available enables more efficient memory-attention patterns during inference.&lt;/p&gt;
&lt;h4&gt;Step 5: Configure Environment Variables&lt;/h4&gt;
&lt;p&gt;The Strix Halo's gfx1151 architecture isn't explicitly recognized by all ROCm components yet. We need to tell the system to treat it as a compatible architecture:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;HSA_OVERRIDE_GFX_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;11&lt;/span&gt;.0.0
&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;GPU_MAX_ALLOC_PERCENT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;
&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;GPU_MAX_HEAP_SIZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;
&lt;span class="nb"&gt;export&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's break down what these do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HSA_OVERRIDE_GFX_VERSION=11.0.0&lt;/strong&gt;: Tells the HSA runtime to report the GPU as gfx1100, which has broader library support&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU_MAX_ALLOC_PERCENT=100&lt;/strong&gt;: Allows the GPU to use up to 100% of available memory for allocations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU_MAX_HEAP_SIZE=100&lt;/strong&gt;: Similar memory allocation setting for heap operations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1&lt;/strong&gt;: Enables experimental efficient attention implementations for AMD GPUs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Add these to your &lt;code&gt;.bashrc&lt;/code&gt; or create an activation script for convenience.&lt;/p&gt;
&lt;h4&gt;Step 6: Verify GPU Detection&lt;/h4&gt;
&lt;p&gt;Before proceeding, confirm PyTorch can see your GPU:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"CUDA available: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Device count: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Device name: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_device_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You should see output like:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;CUDA available: True
Device count: 1
Device name: AMD Radeon 8060S
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note that PyTorch uses "CUDA" terminology even for AMD GPUs when using ROCm; this is for API compatibility.&lt;/p&gt;
&lt;h3&gt;Basic TTS Usage&lt;/h3&gt;
&lt;p&gt;With the environment configured, let's test basic speech synthesis:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;qwen_tts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;soundfile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sf&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;

&lt;span class="c1"&gt;# Load model on GPU with bfloat16 precision&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attn_implementation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sdpa'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'cuda:0'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check available speakers&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Available speakers: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_supported_speakers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate speech&lt;/span&gt;
&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Hello, and welcome to TinyComputers. Today we're exploring text-to-speech on AMD hardware."&lt;/span&gt;
&lt;span class="n"&gt;audios&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_custom_voice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'eric'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save to file&lt;/span&gt;
&lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'output.wav'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audios&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Saved audio at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;Hz"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A few important notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We use &lt;code&gt;attn_implementation='sdpa'&lt;/code&gt; for scaled dot-product attention, which works on ROCm&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;device_map='cuda:0'&lt;/code&gt; explicitly places the model on the GPU&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;dtype=torch.bfloat16&lt;/code&gt; reduces memory usage while maintaining quality&lt;/li&gt;
&lt;li&gt;The language parameter must be the full word &lt;code&gt;'english'&lt;/code&gt;, not the abbreviation &lt;code&gt;'en'&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Building a Blog-to-Speech Pipeline&lt;/h3&gt;
&lt;p&gt;For our use case (generating audio versions of blog posts), we need more than basic TTS. Blog posts contain markdown formatting, code blocks, images, and other elements that shouldn't be read aloud. We built a complete pipeline that handles these challenges.&lt;/p&gt;
&lt;h4&gt;The Blog Cleaner&lt;/h4&gt;
&lt;p&gt;Our cleaning process strips out non-spoken content while preserving the narrative flow:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;clean_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Remove YAML frontmatter&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'---'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'---'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

    &lt;span class="c1"&gt;# Strip HTML tags (audio, video, images)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;audio[^&amp;gt;]*&amp;gt;[\s\S]*?&amp;lt;/audio&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;video[^&amp;gt;]*&amp;gt;[\s\S]*?&amp;lt;/video&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;img[^&amp;gt;]*/?&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;[^&amp;gt;]+&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Remove markdown images and convert links to just text&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'!\[[^\]]*\]\([^)]+\)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\[([^\]]+)\]\([^)]+\)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Remove code blocks&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'```[\s\S]*?```'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'`[^`]+`'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert headers to sentences&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'^(#{1,6})\s+(.+)$'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\2.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MULTILINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Remove emphasis markers&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\*\*([^*]+)\*\*'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\*([^*]+)\*'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Unit Conversion for Speech&lt;/h4&gt;
&lt;p&gt;Technical content often includes abbreviations that sound awkward when read literally. We convert common units to their spoken forms:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;convert_units_for_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(\d+)\s*GB\b'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1 gigabytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(\d+)\s*MB\b'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1 megabytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(\d+)\s*GHz\b'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1 gigahertz'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(\d+)\s*MHz\b'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1 megahertz'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(\d+)\s*KB\b'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'\1 kilobytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
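
&lt;p&gt;A quick sanity check of the conversion:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;print(convert_units_for_speech("The box has 128GB of RAM and a 5.1 GHz boost clock."))
# The box has 128 gigabytes of RAM and a 5.1 gigahertz boost clock.
&lt;/pre&gt;&lt;/div&gt;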

&lt;h4&gt;Chunking Long Content&lt;/h4&gt;
&lt;p&gt;TTS models work best with moderate-length inputs. Very long passages can cause quality degradation or memory issues. We split content into chunks at sentence boundaries:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(?&amp;lt;=[.!?])\s+'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s2"&gt;" "&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s2"&gt;" "&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;The Complete Script&lt;/h4&gt;
&lt;p&gt;Putting it all together, here's our &lt;code&gt;blog_to_speech.py&lt;/code&gt; script:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="ch"&gt;#!/usr/bin/env python3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;argparse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;qwen_tts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;soundfile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sf&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;clean_blog_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Apply cleaning functions...&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cleaned_text&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;synthesize_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eric"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s1"&gt;'Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;attn_implementation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sdpa'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'cuda:0'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;all_audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Processing chunk &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audios&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_custom_voice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;all_audio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audios&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_audio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Saved &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;.1f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s audio to: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'source'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Blog post markdown file'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'-o'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'--output'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'output.wav'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'--speaker'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'eric'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clean_blog_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;synthesize_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
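
&lt;p&gt;The same entry points compose from other Python code as well; a hypothetical usage sketch (the file names here are placeholders):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;from blog_to_speech import clean_blog_post, synthesize_speech

text = clean_blog_post("drafts/new-post.md")
synthesize_speech(text, "new-post_tts.wav", speaker="eric")
&lt;/pre&gt;&lt;/div&gt;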

&lt;h3&gt;Choosing the Right Speaker Voice&lt;/h3&gt;
&lt;p&gt;Qwen3-TTS ships with nine pre-trained speaker voices, each with distinct characteristics:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Speaker&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Eric&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clear, professional male voice with measured pacing&lt;/td&gt;
&lt;td&gt;Technical content, tutorials, documentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aiden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Younger male voice, slightly more casual&lt;/td&gt;
&lt;td&gt;Blog posts, conversational content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dylan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deeper male voice with authoritative tone&lt;/td&gt;
&lt;td&gt;Formal presentations, announcements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ryan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Energetic male voice&lt;/td&gt;
&lt;td&gt;Marketing content, product demos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serena&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clear female voice, professional&lt;/td&gt;
&lt;td&gt;Corporate content, tutorials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vivian&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Warm female voice&lt;/td&gt;
&lt;td&gt;Storytelling, narrative content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ono Anna&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Female voice with distinct character&lt;/td&gt;
&lt;td&gt;Creative content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sohee&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Female voice, versatile&lt;/td&gt;
&lt;td&gt;General purpose&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uncle Fu&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Character voice&lt;/td&gt;
&lt;td&gt;Specialized applications&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For our technical blog content, we primarily use Eric. His clear enunciation and measured pacing work well for complex technical explanations. The voice handles acronyms, numbers, and technical terminology naturally, making it ideal for content about hardware, programming, and system administration.&lt;/p&gt;
&lt;p&gt;You can easily switch voices by changing the &lt;code&gt;speaker&lt;/code&gt; parameter:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;audios&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_custom_voice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'serena'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Try different voices&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Consider matching voice characteristics to content type. A hardware review might work better with Eric's measured, professional tone, while a personal essay might benefit from Aiden's more conversational style.&lt;/p&gt;
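&lt;p&gt;One way to encode that matching in a pipeline is a small lookup keyed on post category. This is entirely hypothetical; the category names and mapping below are ours, not part of the Qwen API:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Hypothetical mapping from post category to stock speaker
SPEAKER_BY_CATEGORY = {
    "hardware": "eric",
    "tutorial": "serena",
    "essay": "aiden",
}

def pick_speaker(category):
    return SPEAKER_BY_CATEGORY.get(category, "eric")
&lt;/pre&gt;&lt;/div&gt;
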
&lt;h3&gt;Comparing TTS Options&lt;/h3&gt;
&lt;p&gt;Before settling on Qwen3-TTS, we evaluated several alternatives. Here's how they compare for our use case:&lt;/p&gt;
&lt;h4&gt;Cloud Services&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Amazon Polly&lt;/strong&gt; and &lt;strong&gt;Google Cloud TTS&lt;/strong&gt; offer excellent quality with minimal setup. However, costs accumulate quickly for long-form content. At roughly $4-16 per million characters (depending on voice quality), a 3000-word blog post costs $0.10-0.40 per generation. For a site with dozens of posts requiring periodic regeneration, this adds up.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ElevenLabs&lt;/strong&gt; produces arguably the most natural voices available, with impressive emotional range. But their pricing model (based on character quotas) makes it expensive for regular content generation. The quality is exceptional, but overkill for straightforward narration.&lt;/p&gt;
&lt;h4&gt;Local Alternatives&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Coqui TTS&lt;/strong&gt; (now deprecated) was a popular open-source option but development has stalled. &lt;strong&gt;Bark&lt;/strong&gt; from Suno produces impressive results but runs slowly and lacks fine-grained control. &lt;strong&gt;XTTS&lt;/strong&gt; offers voice cloning but requires more setup and compute resources.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Piper&lt;/strong&gt; deserves special mention as a lightweight option. It runs quickly even on CPU and produces acceptable quality for many applications. However, the voices sound noticeably synthetic compared to Qwen3-TTS: fine for notifications or short snippets, but fatiguing for 30-minute narrations.&lt;/p&gt;
&lt;p&gt;Qwen3-TTS hits a sweet spot: quality approaching cloud services, reasonable compute requirements, and fully local operation. The 1.7B parameter model is large enough for natural prosody but small enough to run on consumer hardware.&lt;/p&gt;
&lt;h3&gt;Batch Processing for Multiple Posts&lt;/h3&gt;
&lt;p&gt;When generating audio for multiple blog posts, efficiency matters. Loading the model takes 15-30 seconds, so we keep it loaded while processing multiple files:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="ch"&gt;#!/usr/bin/env python3&lt;/span&gt;
&lt;span class="sd"&gt;"""Batch TTS processing for multiple blog posts"""&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;qwen_tts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;soundfile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sf&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Load model once&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Loading model..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attn_implementation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sdpa'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'cuda:0'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'post1.md'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'post2.md'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'post3.md'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'='&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Processing: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'='&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clean_blog_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stem&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_tts.wav"&lt;/span&gt;

    &lt;span class="c1"&gt;# Process chunks&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;all_audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"  Chunk &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audios&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_custom_voice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'eric'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;all_audio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audios&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_audio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Saved: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This approach processes our five-post backlog overnight, with results ready for review in the morning.&lt;/p&gt;
&lt;h3&gt;Performance Characteristics&lt;/h3&gt;
&lt;p&gt;On the AI Max+ 395, speech synthesis runs at roughly real-time to 0.5x real-time speed, meaning a 30-minute audio file takes 30-60 minutes to generate. This is slower than high-end discrete GPUs but perfectly acceptable for batch processing.&lt;/p&gt;
&lt;p&gt;For reference, here's how different content lengths performed in our testing:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;Characters&lt;/th&gt;
&lt;th&gt;Chunks&lt;/th&gt;
&lt;th&gt;Audio Duration&lt;/th&gt;
&lt;th&gt;Generation Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Short post&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;~5 min&lt;/td&gt;
&lt;td&gt;~15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium post&lt;/td&gt;
&lt;td&gt;15,000&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;~15 min&lt;/td&gt;
&lt;td&gt;~45 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long post&lt;/td&gt;
&lt;td&gt;25,000&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;~27 min&lt;/td&gt;
&lt;td&gt;~90 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very long&lt;/td&gt;
&lt;td&gt;40,000&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;~45 min&lt;/td&gt;
&lt;td&gt;~150 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The relationship between content length and generation time is roughly linear after the initial model warmup.&lt;/p&gt;
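&lt;p&gt;As a rough planning heuristic (a fit to the four rows above, not a benchmark), throughput works out to a bit over three minutes of generation per thousand characters:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

# Rows from the table above: characters vs. generation minutes
chars = np.array([5_000, 15_000, 25_000, 40_000])
minutes = np.array([15, 45, 90, 150])

rate = (minutes / chars * 1000).mean()
print(f"~{rate:.1f} min per 1,000 characters")        # ~3.3

# Estimate for a hypothetical 20,000-character post
print(f"20k chars: ~{20_000 / 1000 * rate:.0f} min")  # ~67 min
&lt;/pre&gt;&lt;/div&gt;
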
&lt;p&gt;Some observations from our testing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;First chunk latency&lt;/strong&gt;: The first chunk takes longer due to GPU kernel compilation and caching&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory usage&lt;/strong&gt;: Peak usage around 8-10GB during inference&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU utilization&lt;/strong&gt;: Consistent 100% during active synthesis&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quality&lt;/strong&gt;: Indistinguishable from cloud TTS services for most content&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The MIOpen library sometimes logs workspace warnings during execution. These don't affect output quality and can be safely ignored:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;MIOpen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HIP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Warning&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IsEnoughWorkspace&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Solver&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;GemmFwdRest&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;103133184&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
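
&lt;p&gt;If the noise bothers you, MIOpen's logging verbosity is controlled by an environment variable. A sketch, assuming your build honors &lt;code&gt;MIOPEN_LOG_LEVEL&lt;/code&gt; (warnings are level 4; 3 shows errors only):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import os

# Assumption: this build honors MIOPEN_LOG_LEVEL.
# Must be set before the model (and thus MIOpen) is first loaded.
os.environ["MIOPEN_LOG_LEVEL"] = "3"  # errors only; warnings are level 4
&lt;/pre&gt;&lt;/div&gt;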

&lt;h3&gt;Integrating Audio into Blog Posts&lt;/h3&gt;
&lt;p&gt;Once we have the WAV file, we convert to MP3 for web delivery and embed an HTML5 audio player:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;ffmpeg&lt;span class="w"&gt; &lt;/span&gt;-i&lt;span class="w"&gt; &lt;/span&gt;blog_post.wav&lt;span class="w"&gt; &lt;/span&gt;-codec:a&lt;span class="w"&gt; &lt;/span&gt;libmp3lame&lt;span class="w"&gt; &lt;/span&gt;-qscale:a&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;blog_post.mp3
&lt;/pre&gt;&lt;/div&gt;
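
&lt;p&gt;To fold the conversion into the batch loop, a small &lt;code&gt;subprocess&lt;/code&gt; wrapper around the same ffmpeg invocation works (assuming &lt;code&gt;ffmpeg&lt;/code&gt; is on the PATH):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import subprocess
from pathlib import Path

def wav_to_mp3(wav_path):
    """Convert a WAV to MP3 with the same settings as the command above."""
    mp3_path = Path(wav_path).with_suffix(".mp3")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(wav_path),
         "-codec:a", "libmp3lame", "-qscale:a", "2", str(mp3_path)],
        check=True,
    )
    return mp3_path
&lt;/pre&gt;&lt;/div&gt;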

&lt;p&gt;For reviewing TTS output quality, we recommend using &lt;a href="https://baud.rs/tn2v8w"&gt;studio monitor headphones&lt;/a&gt; that reveal any artifacts or unnatural tones in the generated speech.&lt;/p&gt;
&lt;p&gt;The player HTML is straightforward:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"background: #f8f9fa; border: 1px solid #e9ecef;&lt;/span&gt;
&lt;span class="s"&gt;            border-radius: 8px; padding: 16px 20px; margin: 20px 0;"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"audio-widget-header"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"audio-widget-icon"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;🎧&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"color: #495057; font-weight: 600;"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Listen to this article&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;audio&lt;/span&gt; &lt;span class="na"&gt;controls&lt;/span&gt; &lt;span class="na"&gt;preload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"width: 100%;"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;source&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"/audio/blog_post.mp3"&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"audio/mpeg"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"audio-widget-footer"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    27 min · AI-generated narration
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
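
&lt;p&gt;The duration label in the footer comes straight from the synthesized audio. A small helper to generate it, assuming the WAV is still on disk:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import soundfile as sf

# Build the footer text from the actual audio duration
info = sf.info("blog_post.wav")
print(f"{round(info.duration / 60)} min · AI-generated narration")
&lt;/pre&gt;&lt;/div&gt;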

&lt;h3&gt;Why We're Doing This&lt;/h3&gt;
&lt;p&gt;Adding audio narration to blog posts serves multiple purposes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Accessibility&lt;/strong&gt;: Readers with visual impairments or reading difficulties can consume content aurally&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Convenience&lt;/strong&gt;: Listeners can enjoy posts during commutes, workouts, or other activities&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Engagement&lt;/strong&gt;: Audio content creates a more personal connection with the audience&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reach&lt;/strong&gt;: Some audiences prefer audio format, expanding our potential readership&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Running TTS locally rather than using cloud services gives us:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cost control&lt;/strong&gt;: No per-character or per-minute fees&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Privacy&lt;/strong&gt;: Content never leaves our infrastructure&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistency&lt;/strong&gt;: Same voice and quality across all posts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: Full control over processing pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Troubleshooting Common Issues&lt;/h3&gt;
&lt;h4&gt;"CUDA not available" despite GPU present&lt;/h4&gt;
&lt;p&gt;Ensure you've installed the ROCm version of PyTorch, not the standard build:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pip&lt;span class="w"&gt; &lt;/span&gt;uninstall&lt;span class="w"&gt; &lt;/span&gt;torch&lt;span class="w"&gt; &lt;/span&gt;torchvision&lt;span class="w"&gt; &lt;/span&gt;torchaudio
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;torch&lt;span class="w"&gt; &lt;/span&gt;torchvision&lt;span class="w"&gt; &lt;/span&gt;torchaudio&lt;span class="w"&gt; &lt;/span&gt;--index-url&lt;span class="w"&gt; &lt;/span&gt;https://download.pytorch.org/whl/rocm6.4
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Model runs on CPU instead of GPU&lt;/h4&gt;
&lt;p&gt;Check that &lt;code&gt;device_map='cuda:0'&lt;/code&gt; is specified when loading the model. Also verify the environment variables are set before starting Python.&lt;/p&gt;
&lt;h4&gt;"Unsupported language 'en'"&lt;/h4&gt;
&lt;p&gt;Use the full language name: &lt;code&gt;language='english'&lt;/code&gt; not &lt;code&gt;language='en'&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;Out of memory errors&lt;/h4&gt;
&lt;p&gt;Try reducing chunk size or using a smaller batch. The model should fit in 16GB, but very long chunks can spike memory usage.&lt;/p&gt;
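&lt;p&gt;Shrinking the chunks is the first lever to pull; a sketch, assuming &lt;code&gt;max_chars&lt;/code&gt; is exposed as a keyword argument of &lt;code&gt;chunk_text&lt;/code&gt; as in the definition earlier:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;from blog_to_speech import clean_blog_post, chunk_text

text = clean_blog_post("drafts/new-post.md")
# Smaller chunks trade a little prosody continuity for lower peak memory
chunks = chunk_text(text, max_chars=300)
&lt;/pre&gt;&lt;/div&gt;
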
&lt;h4&gt;Slow first chunk&lt;/h4&gt;
&lt;p&gt;This is normal; ROCm compiles GPU kernels on first use. Subsequent chunks process faster.&lt;/p&gt;
&lt;h3&gt;Future Improvements&lt;/h3&gt;
&lt;p&gt;Our current pipeline works well but has room for enhancement. Some improvements we're considering:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Voice cloning&lt;/strong&gt;: Qwen3-TTS supports custom voice training. With sufficient audio samples, we could create a unique voice for TinyComputers rather than using the stock speakers. This would provide brand consistency and differentiation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automatic post detection&lt;/strong&gt;: Currently we manually select posts for TTS generation. A CI/CD integration could automatically generate audio for new posts when they're published, keeping the audio library current without manual intervention.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chapter markers&lt;/strong&gt;: For longer posts, embedding chapter markers in the audio file would allow listeners to skip to specific sections. This requires parsing the markdown headers and mapping them to audio timestamps.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multiple format export&lt;/strong&gt;: Beyond MP3, offering Opus or AAC formats could reduce file sizes while maintaining quality, benefiting listeners on metered connections.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Speed adjustment&lt;/strong&gt;: Some listeners prefer 1.25x or 1.5x playback speed. Pre-generating speed-adjusted versions could provide better quality than real-time speed adjustment in the browser.&lt;/p&gt;
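&lt;p&gt;For the speed variants, ffmpeg's &lt;code&gt;atempo&lt;/code&gt; filter changes tempo without shifting pitch; a sketch of the pre-generation step (older ffmpeg builds limit &lt;code&gt;atempo&lt;/code&gt; to 0.5-2.0 per pass):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import subprocess

def make_speed_variant(mp3_in, mp3_out, speed=1.25):
    """Pre-render a faster narration with pitch preserved."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", mp3_in,
         "-filter:a", f"atempo={speed}", mp3_out],
        check=True,
    )

make_speed_variant("blog_post.mp3", "blog_post_1.25x.mp3")
&lt;/pre&gt;&lt;/div&gt;
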
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Running Qwen3-TTS on AMD's Strix Halo platform demonstrates that high-quality local TTS is now accessible beyond NVIDIA hardware. While setup requires some ROCm-specific configuration, the results are impressive: natural-sounding narration suitable for professional content.&lt;/p&gt;
&lt;p&gt;The democratization of AI capabilities continues apace. What once required expensive cloud subscriptions or high-end NVIDIA GPUs now runs on integrated graphics. The AI Max+ 395's Radeon 8060S, primarily designed for gaming and general compute tasks, handles a 1.7-billion parameter language model without breaking a sweat.&lt;/p&gt;
&lt;p&gt;We're actively using this pipeline to generate audio versions of posts across TinyComputers, making our technical content more accessible and convenient for our readers. As of this writing, we've processed our retrocomputing series, hardware reviews, and technical tutorials: dozens of hours of content generated entirely on local hardware.&lt;/p&gt;
&lt;p&gt;The combination of AMD's capable integrated graphics and Qwen's excellent TTS model proves that you don't need expensive discrete GPUs or cloud subscriptions to achieve broadcast-quality speech synthesis. For content creators, educators, and accessibility advocates, this opens new possibilities for enriching written content with audio without ongoing service costs.&lt;/p&gt;
&lt;p&gt;If you're running AMD hardware and want to add audio narration to your own content, this guide should get you started. The initial setup investment pays dividends in ongoing cost savings and the satisfaction of running capable AI models entirely on your own infrastructure. And if you encounter issues along the way, the troubleshooting section above addresses the most common pitfalls we discovered during our own setup process.&lt;/p&gt;
&lt;p&gt;The audio player at the top of many TinyComputers posts now represents a small but meaningful step toward making technical content more accessible. Every post you can listen to while commuting, exercising, or doing dishes is content that might otherwise go unread. That's the real value of local TTS: not just cost savings, but expanded reach for the ideas we share.&lt;/p&gt;</description><category>ai max+ 395</category><category>amd</category><category>audio</category><category>machine learning</category><category>pytorch</category><category>qwen</category><category>rocm</category><category>strix halo</category><category>text-to-speech</category><category>tts</category><guid>https://tinycomputers.io/posts/qwen-tts-on-amd-strix-halo.html</guid><pubDate>Sat, 24 Jan 2026 18:00:00 GMT</pubDate></item><item><title>Getting YOLOv8 Training Working on AMD Ryzen™ AI Max+ 395</title><link>https://tinycomputers.io/posts/getting-yolov8-training-working-on-amd-ryzentm-al-max%2B-395.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/getting-yolov8-training-working-on-amd-ryzentm-al-max+-395_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;20 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;Machine learning on AMD GPUs has always been... interesting. With NVIDIA's CUDA dominating the landscape, AMD's ROCm platform remains the underdog: powerful, but often requiring patience and persistence to get working properly. This is the story of how I got YOLOv8 object detection training working on an AMD Radeon 8060S integrated GPU (gfx1151) in the AMD RYZEN AI MAX+ 395 after encountering batch normalization failures, version mismatches, and a critical bug in MIOpen.&lt;/p&gt;
&lt;p&gt;The goal was simple: train a bullet hole detection model for a ballistics application using YOLOv8. The journey? Anything but simple.&lt;/p&gt;
&lt;h3&gt;The Hardware&lt;/h3&gt;
&lt;p&gt;System Specifications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: AMD RYZEN AI MAX+ 395&lt;/li&gt;
&lt;li&gt;GPU: AMD Radeon 8060S (integrated, RDNA 3.5 architecture, gfx1151)&lt;/li&gt;
&lt;li&gt;VRAM: 96GB shared system memory&lt;/li&gt;
&lt;li&gt;ROCm Version: 7.0.2&lt;/li&gt;
&lt;li&gt;ROCk module: 6.14.14&lt;/li&gt;
&lt;li&gt;PyTorch: 2.8.0+rocm7.0.0.git64359f59&lt;/li&gt;
&lt;li&gt;MIOpen: Initially 3.0.5.1 (version code 3005001), later custom build&lt;/li&gt;
&lt;li&gt;OS: Linux (conda environment: pt2.8-rocm7)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The AMD Radeon 8060S is an integrated GPU in the AMD RYZEN AI MAX+ 395 based on AMD's RDNA 3.5 architecture (gfx1151). What makes this system particularly interesting for machine learning is the massive 96GB of shared system memory available to the GPU, far more VRAM than typical consumer discrete GPUs. While machine learning support on RDNA 3.5 is still maturing compared to older RDNA 2 architectures, the memory capacity makes it compelling for AI workloads.&lt;/p&gt;
&lt;p&gt;But for about $1,699, you can get up to 96GB of VRAM in a &lt;a href="https://baud.rs/r4rMKO"&gt;whisper-quiet form factor&lt;/a&gt;. This setup beats the pants off my &lt;a href="https://tinycomputers.io/posts/eights-years-on-the-NVIDIA-tesla-p100-still-delivers-for-budget-artificial-intelligence-work.html"&gt;old GPU rig&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Why YOLOv8 and Ultralytics?&lt;/h3&gt;
&lt;p&gt;Before diving into the technical challenges, it's worth explaining why we chose YOLOv8 from &lt;a href="https://baud.rs/jf4gLA"&gt;Ultralytics&lt;/a&gt; for this project.&lt;/p&gt;
&lt;p&gt;YOLOv8 (You Only Look Once, version 8) is the latest iteration of one of the most popular object detection architectures. Developed and maintained by Ultralytics, it offers several advantages:&lt;/p&gt;
&lt;h4&gt;Why Ultralytics YOLOv8?&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;State-of-the-art Accuracy: YOLOv8 achieves excellent detection accuracy while maintaining real-time inference speeds, critical for practical applications.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ease of Use: Ultralytics provides a clean, well-documented Python API that makes training custom models remarkably straightforward:&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;ultralytics&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"yolov8n.pt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"dataset.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Active Development: Ultralytics is actively maintained with frequent updates, bug fixes, and community support. This proved invaluable during debugging.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Model Variants: YOLOv8 comes in multiple sizes (nano, small, medium, large, extra-large), allowing us to balance accuracy vs. speed for our specific use case.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Built-in Data Augmentation: The framework includes extensive data augmentation capabilities out of the box, essential for training robust detection models with limited training data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PyTorch Native: Being built on PyTorch meant it should work with ROCm (AMD's CUDA equivalent)... in theory.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For our bullet hole detection application, YOLOv8's ability to accurately detect small objects (bullet holes in paper targets) while training efficiently made it the obvious choice. Little did I know that "training efficiently" would require a week-long debugging odyssey.&lt;/p&gt;
&lt;h3&gt;The Initial Setup (ROCm 7.0.0)&lt;/h3&gt;
&lt;p&gt;I started with ROCm 7.0.0, following AMD's official installation guide. Everything installed cleanly:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;python&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"import torch; print(torch.cuda.is_available())"&lt;/span&gt;
True

$&lt;span class="w"&gt; &lt;/span&gt;python&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"import torch; print(torch.cuda.get_device_name(0))"&lt;/span&gt;
AMD&lt;span class="w"&gt; &lt;/span&gt;Radeon&lt;span class="w"&gt; &lt;/span&gt;Graphics
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Perfect! PyTorch recognized the GPU. Time to train some models, right?&lt;/p&gt;
&lt;h3&gt;The First Failure: Batch Normalization&lt;/h3&gt;
&lt;p&gt;I loaded a simple YOLOv8 nano model and kicked off training:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;ultralytics&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"yolov8n.pt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"data/bullet_hole_dataset_combined/data.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;imgsz&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;416&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"cuda:0"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Within seconds, the training crashed:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;RuntimeError&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;miopenStatusUnknownError&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The error was cryptic, but digging deeper revealed the real issue: MIOpen was failing to compile batch normalization kernels with inline assembly errors:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&amp;lt;inline asm&amp;gt;:14:20: error: not a valid operand.
v_add_f32 v4 v4 v4 row_bcast:15 row_mask:0xa
                   ^
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Batch normalization. The most common operation in modern deep learning, and it was failing spectacularly on gfx1151. The inline assembly instructions (&lt;code&gt;row_bcast&lt;/code&gt; and &lt;code&gt;row_mask&lt;/code&gt;) appeared incompatible with the RDNA 3.5 architecture.&lt;/p&gt;
&lt;h4&gt;What is Batch Normalization?&lt;/h4&gt;
&lt;p&gt;Batch normalization (BatchNorm) is a technique that normalizes layer inputs across a mini-batch, helping neural networks train faster and more stably. It's used in virtually every modern CNN architecture, including YOLO.&lt;/p&gt;
&lt;p&gt;The error message pointed to &lt;code&gt;MIOpen&lt;/code&gt;, AMD's equivalent of NVIDIA's cuDNN, a library of optimized deep learning primitives.&lt;/p&gt;
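&lt;p&gt;For intuition, this is all BatchNorm does per channel. A standalone PyTorch sketch, separate from the training code:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import torch
import torch.nn as nn

x = torch.randn(16, 64, 52, 52)  # (batch, channels, height, width)
bn = nn.BatchNorm2d(64)          # one learnable scale/shift per channel

# The same computation by hand: normalize over batch and spatial dims
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(bn(x), manual, atol=1e-5))  # True (gamma=1, beta=0 at init)
&lt;/pre&gt;&lt;/div&gt;
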
&lt;h3&gt;Attempt 1: Upgrade to ROCm 7.0.2&lt;/h3&gt;
&lt;p&gt;My first instinct was to upgrade ROCm. Version 7.0.0 was relatively new, and perhaps 7.0.2 had fixed the batch normalization issues.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Upgraded PyTorch to ROCm 7.0.2&lt;/span&gt;
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;--upgrade&lt;span class="w"&gt; &lt;/span&gt;torch&lt;span class="w"&gt; &lt;/span&gt;--index-url&lt;span class="w"&gt; &lt;/span&gt;https://download.pytorch.org/whl/rocm7.0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Result? Same error. Batch normalization still failed.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;RuntimeError&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;miopenStatusUnknownError&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With the same inline assembly compilation errors about invalid &lt;code&gt;row_bcast&lt;/code&gt; and &lt;code&gt;row_mask&lt;/code&gt; operands. At this point, I realized this wasn't a simple version mismatch; there was something fundamentally broken with MIOpen's batch normalization implementation for the gfx1151 architecture.&lt;/p&gt;
&lt;h3&gt;The Revelation: It's MIOpen, Not ROCm&lt;/h3&gt;
&lt;p&gt;After hours of testing different PyTorch versions, driver configurations, and kernel parameters, I turned to the ROCm community for help.&lt;/p&gt;
&lt;p&gt;I posted my issue on &lt;a href="https://baud.rs/N50zpY"&gt;Reddit's r/ROCm subreddit&lt;/a&gt;, describing the inline assembly compilation failures and &lt;code&gt;miopenStatusUnknownError&lt;/code&gt; on gfx1151. Within a few hours, a knowledgeable Redditor responded with a crucial piece of information:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;"There's a known issue with MIOpen 3.0.x and gfx1151 batch normalization. The inline assembly instructions use operands that aren't compatible with RDNA 3. A fix was recently merged into the develop branch. Try using a nightly build of MIOpen or build from source."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was the breakthrough I needed. The issue wasn't with ROCm itself or PyTorch; it was specifically MIOpen version 3.0.5.1 that shipped with ROCm 7.0.x. The maintainers had already fixed the gfx1151 batch normalization bug in a recent pull request, but it hadn't made it into a stable release yet.&lt;/p&gt;
&lt;p&gt;The Reddit user suggested two options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use a nightly Docker container with the latest MIOpen build&lt;/li&gt;
&lt;li&gt;Build MIOpen 3.5.1 from source using the develop branch&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Testing the Theory: Docker Nightly Builds&lt;/h3&gt;
&lt;p&gt;Before committing to building from source, I wanted to verify that a newer MIOpen would actually fix the problem. AMD provides nightly Docker images with bleeding-edge ROCm builds:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;docker&lt;span class="w"&gt; &lt;/span&gt;pull&lt;span class="w"&gt; &lt;/span&gt;rocm/pytorch-nightly:latest

docker&lt;span class="w"&gt; &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;--rm&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;--device&lt;span class="o"&gt;=&lt;/span&gt;/dev/kfd&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;--device&lt;span class="o"&gt;=&lt;/span&gt;/dev/dri&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;--group-add&lt;span class="w"&gt; &lt;/span&gt;video&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-v&lt;span class="w"&gt; &lt;/span&gt;~/ballistics_training:/workspace&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-w&lt;span class="w"&gt; &lt;/span&gt;/workspace&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;rocm/pytorch-nightly:latest&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;bash&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'pip install ultralytics &amp;amp;&amp;amp; python3 test_yolo.py'&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The nightly container included MIOpen 3.5.1 from the develop branch.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# test_yolo.py&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;ultralytics&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"PyTorch: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"CUDA available: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Device: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_device_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"yolov8n.pt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"data_docker.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;imgsz&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;416&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"cuda:0"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;✅ SUCCESS! Nightly build FIXES gfx1151 batch normalization!
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It worked! The &lt;code&gt;miopenStatusUnknownError&lt;/code&gt; was gone, no more inline assembly compilation failures. Training completed successfully with MIOpen 3.5.1 from the develop branch. The newer version had updated the batch normalization kernels to use instructions compatible with RDNA 3.5's gfx1151 architecture.&lt;/p&gt;
&lt;p&gt;This confirmed the Reddit user's tip: the fix was indeed in the newer MIOpen code that hadn't been released in a stable version yet.&lt;/p&gt;
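&lt;p&gt;It's worth confirming which MIOpen a given PyTorch build actually loads. On ROCm builds the cuDNN interface is backed by MIOpen, so its version code should surface here (this is how codes like 3005001 map back to 3.0.5.1):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import torch

print(torch.version.hip)               # ROCm/HIP version of this PyTorch build
print(torch.backends.cudnn.version())  # MIOpen version code, e.g. 3005001
&lt;/pre&gt;&lt;/div&gt;
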
&lt;h3&gt;The Solution: Building MIOpen from Source&lt;/h3&gt;
&lt;p&gt;Docker was great for testing, but I needed a permanent solution for my native conda environment. That meant building MIOpen 3.5.1 from source.&lt;/p&gt;
&lt;h4&gt;Step 1: Clone the Repository&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;~/ballistics_training
git&lt;span class="w"&gt; &lt;/span&gt;clone&lt;span class="w"&gt; &lt;/span&gt;https://github.com/ROCm/MIOpen.git&lt;span class="w"&gt; &lt;/span&gt;rocm-libraries/projects/miopen
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rocm-libraries/projects/miopen
git&lt;span class="w"&gt; &lt;/span&gt;checkout&lt;span class="w"&gt; &lt;/span&gt;develop&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;# Latest development branch with gfx1151 fixes&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Step 2: Build MIOpen&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;mkdir&lt;span class="w"&gt; &lt;/span&gt;build&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;build

cmake&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-DCMAKE_PREFIX_PATH&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/opt/rocm"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-DCMAKE_INSTALL_PREFIX&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/ballistics_training/rocm-libraries/projects/miopen/build"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-DMIOPEN_BACKEND&lt;span class="o"&gt;=&lt;/span&gt;HIP&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;-DCMAKE_BUILD_TYPE&lt;span class="o"&gt;=&lt;/span&gt;Release&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;..

make&lt;span class="w"&gt; &lt;/span&gt;-j&lt;span class="k"&gt;$(&lt;/span&gt;nproc&lt;span class="k"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;98&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Building&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CXX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;CMakeFiles&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;MIOpen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;softmax_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cpp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;o&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Linking&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CXX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;shared&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;library&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;libMIOpen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;so&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Built&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;MIOpen&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Success! MIOpen 3.5.1 was built from source.&lt;/p&gt;
&lt;h4&gt;Step 3: Install Custom MIOpen to Conda Environment&lt;/h4&gt;
&lt;p&gt;Now came the tricky part: replacing the system MIOpen (version 3.0.5.1) with my custom-built version 3.5.1.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nv"&gt;CONDA_LIB&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;~/anaconda3/envs/pt2.8-rocm7/lib

&lt;span class="c1"&gt;# Backup the original MIOpen&lt;/span&gt;
cp&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$CONDA_LIB&lt;/span&gt;/libMIOpen.so.1.0&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$CONDA_LIB&lt;/span&gt;/libMIOpen.so.1.0.backup_system

&lt;span class="c1"&gt;# Install custom MIOpen&lt;/span&gt;
cp&lt;span class="w"&gt; &lt;/span&gt;~/ballistics_training/rocm-libraries/projects/miopen/build/lib/libMIOpen.so.1.0&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$CONDA_LIB&lt;/span&gt;/

&lt;span class="c1"&gt;# Update symlinks&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$CONDA_LIB&lt;/span&gt;
ln&lt;span class="w"&gt; &lt;/span&gt;-sf&lt;span class="w"&gt; &lt;/span&gt;libMIOpen.so.1.0&lt;span class="w"&gt; &lt;/span&gt;libMIOpen.so.1
ln&lt;span class="w"&gt; &lt;/span&gt;-sf&lt;span class="w"&gt; &lt;/span&gt;libMIOpen.so.1&lt;span class="w"&gt; &lt;/span&gt;libMIOpen.so
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Step 4: Verify the Installation&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;conda&lt;span class="w"&gt; &lt;/span&gt;activate&lt;span class="w"&gt; &lt;/span&gt;pt2.8-rocm7
python&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"import torch; print(f'MIOpen version: {torch.backends.cudnn.version()}')"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;MIOpen version: 3005001
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Wait, &lt;code&gt;3005001&lt;/code&gt;? That's version 3.5.1! (MIOpen uses an integer versioning scheme: major × 1,000,000 + minor × 1,000 + patch.)&lt;/p&gt;
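&lt;p&gt;To spare you the mental arithmetic, here's a tiny decoder (a quick sketch, not part of MIOpen or PyTorch) for the packed integer:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;def decode_miopen_version(packed: int) -&gt; str:
    """Unpack MIOpen's integer version: major*1_000_000 + minor*1_000 + patch."""
    major, rest = divmod(packed, 1_000_000)
    minor, patch = divmod(rest, 1_000)
    return f"{major}.{minor}.{patch}"

print(decode_miopen_version(3005001))  # 3.5.1
&lt;/pre&gt;&lt;/div&gt;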
&lt;p&gt;The custom MIOpen was successfully loaded.&lt;/p&gt;
&lt;h3&gt;The Final Test: YOLOv8 Training&lt;/h3&gt;
&lt;p&gt;Time for the moment of truth. Could I finally train YOLOv8 on my AMD GPU?&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;ultralytics&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"="&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Testing YOLOv8 Training with Custom MIOpen 3.5.1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"="&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"PyTorch: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"CUDA available: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"MIOpen version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backends&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cudnn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"yolov8n.pt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Starting training..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"data/bullet_hole_dataset_combined/data.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;imgsz&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;416&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"cuda:0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bullet_hole_detector"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;============================================================
Testing YOLOv8 Training with Custom MIOpen 3.5.1
============================================================
PyTorch: 2.8.0+rocm7.0.0.git64359f59
CUDA available: True
MIOpen version: 3005001

Starting training...

Ultralytics 8.3.217 🚀 Python-3.12.11 torch-2.8.0+rocm7.0.0 CUDA:0 (AMD Radeon Graphics, 98304MiB)

Model summary: 129 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed ✅

Starting training for 1 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/1     0.172G      3.022      3.775      1.215         29        416
        1/1     0.174G      2.961      4.034      1.147         46        416
        1/1     0.203G      3.133       4.08      1.251         36        416
        1/1     0.205G       3.14      4.266       1.25         60        416
        1/1     0.205G      3.028      4.194      1.237         18        416
        1/1     0.205G      2.995      4.114      1.235         28        416
        1/1     0.205G      3.029      4.118      1.226         41        416
        1/1     0.205G      2.961      4.031      1.209         26        416
        1/1     0.205G      2.888      3.998      1.193         22        416
        1/1     0.205G      2.861      3.823      1.185         49        416
        1/1     0.205G      2.812      3.657      1.169         46        416
        1/1     0.205G      2.821      3.459      1.149         78        416
        1/1     0.205G      2.776      3.253      1.134         26        416
        1/1     0.217G      2.784      3.207      1.131        122        416
        1/1     0.217G      2.772      3.074      1.121         40        416
        1/1     0.217G      2.774       2.98      1.114         13        416
        1/1     0.217G      2.763      2.914      1.118         37        416
        1/1     0.217G       2.75      2.876      1.113         81        416
        1/1     0.217G      2.731      2.799      1.104         31        416
        1/1     0.217G      2.736      2.732      1.101         30        416: 100% 14.8it/s

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all         60        733      0.653      0.473       0.53      0.191

1 epochs completed in 0.002 hours.

==============================================================
✅ SUCCESS! Training completed without errors!
==============================================================

Speed: 0.0ms preprocess, 1.9ms inference, 0.0ms loss, 0.5ms postprocess per image
Results saved to runs/detect/bullet_hole_detector/
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It worked! Batch normalization executed flawlessly. The training progressed smoothly from epoch to epoch, with GPU utilization staying high, memory management remaining stable, and losses converging as expected. The model achieved 53.0% mAP50 and trained without a single error.&lt;/p&gt;
&lt;p&gt;After a week of debugging, version wrangling, and source code compilation, I finally had GPU-accelerated YOLOv8 training working on my AMD RDNA 3.5 GPU. The custom MIOpen 3.5.1 build resolved the inline assembly compatibility issues, and training now runs as smoothly on gfx1151 as it would on any other supported GPU.&lt;/p&gt;
&lt;h3&gt;Performance Notes&lt;/h3&gt;
&lt;p&gt;With the custom MIOpen build, training performance was excellent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training Speed: 70.5 images/second (batch size 16, 416×416 images)&lt;/li&gt;
&lt;li&gt;Training Time: 32.6 seconds for 10 epochs (2,300 images processed in total)&lt;/li&gt;
&lt;li&gt;Throughput: 9.7-9.9 iterations/second&lt;/li&gt;
&lt;li&gt;GPU Utilization: ~95% during training with no throttling&lt;/li&gt;
&lt;li&gt;Memory Usage: ~1.2 GB VRAM for YOLOv8n with batch size 16&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The GPU utilization stayed consistently high with no performance degradation across epochs, and each epoch averaged approximately 3.3 seconds with little variance. For comparison, CPU-only training on the same dataset would be roughly 15-20x slower. The GPU acceleration was well worth the effort.&lt;/p&gt;
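&lt;p&gt;These figures hang together; a quick back-of-the-envelope check (assuming, as the numbers suggest, that 2,300 counts every image processed across the 10 epochs):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;images_processed = 2300   # total across the 10-epoch run
wall_seconds = 32.6

print(images_processed / wall_seconds)  # ~70.6 images/second
print(wall_seconds / 10)                # ~3.3 seconds per epoch
&lt;/pre&gt;&lt;/div&gt;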
&lt;h3&gt;Lessons Learned&lt;/h3&gt;
&lt;p&gt;This debugging journey taught me several valuable lessons:&lt;/p&gt;
&lt;h4&gt;1. The ROCm Community is Invaluable&lt;/h4&gt;
&lt;p&gt;The Reddit r/ROCm community proved to be the key to solving this issue. When official documentation fails, community knowledge fills the gap. Don't hesitate to ask for help; chances are someone has encountered your exact issue before.&lt;/p&gt;
&lt;h4&gt;2. MIOpen ≠ ROCm&lt;/h4&gt;
&lt;p&gt;I initially assumed upgrading ROCm would fix the problem. In reality, MIOpen (the deep learning library) had a separate bug that was independent of the ROCm platform version. Understanding the component architecture of ROCm saved hours of debugging time.&lt;/p&gt;
&lt;h4&gt;3. RDNA 3.5 (gfx1151) Support is Still Maturing&lt;/h4&gt;
&lt;p&gt;AMD's latest integrated GPU architecture is powerful, but ML support lags behind older architectures like RDNA 2 (gfx1030) and Vega. If you're doing serious ML work on AMD, consider that newer hardware may require more troubleshooting.&lt;/p&gt;
&lt;h4&gt;4. Nightly Builds Can Be Production-Ready&lt;/h4&gt;
&lt;p&gt;There's often hesitation to use nightly/development builds in production. However, in this case, the develop branch of MIOpen was actually more stable than the official release for my specific GPU. Sometimes bleeding-edge code is exactly what you need.&lt;/p&gt;
&lt;h4&gt;5. Docker is Great for Testing&lt;/h4&gt;
&lt;p&gt;The ROCm nightly Docker containers were instrumental in proving my hypothesis. Being able to test a newer MIOpen version without committing to a full rebuild saved significant time.&lt;/p&gt;
&lt;h4&gt;6. Source Builds Give You Control&lt;/h4&gt;
&lt;p&gt;Building from source is time-consuming and requires understanding the build system, but it gives you complete control over your environment. When binary distributions fail, source builds are your safety net.&lt;/p&gt;
&lt;h3&gt;Tips for AMD GPU Machine Learning&lt;/h3&gt;
&lt;p&gt;If you're attempting to do machine learning on AMD GPUs, here are some recommendations:&lt;/p&gt;
&lt;h4&gt;Environment Setup&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Use conda/virtualenv: Isolate your Python environment to avoid system package conflicts&lt;/li&gt;
&lt;li&gt;Pin your versions: Lock PyTorch, ROCm, and MIOpen versions once you have a working setup&lt;/li&gt;
&lt;li&gt;Keep backups: Always backup working library files before swapping them out&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Debugging Strategy&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;Verify GPU detection first: Ensure &lt;code&gt;torch.cuda.is_available()&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Test simple operations: Try basic tensor operations before complex models (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Check MIOpen version: &lt;code&gt;torch.backends.cudnn.version()&lt;/code&gt; can reveal version mismatches&lt;/li&gt;
&lt;li&gt;Monitor logs: ROCm logs (&lt;code&gt;MIOPEN_ENABLE_LOGGING=1&lt;/code&gt;) provide valuable debugging info&lt;/li&gt;
&lt;li&gt;Try Docker first: Test potential fixes in Docker before modifying your system&lt;/li&gt;
&lt;/ol&gt;
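&lt;p&gt;For step 2, a minimal smoke test along these lines (a sketch, not the exact script I ran) exercises the convolution and batch normalization paths that MIOpen handles, so it fails fast if the library is broken:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import torch
import torch.nn as nn

device = "cuda:0"  # ROCm exposes the GPU through the CUDA API
x = torch.randn(2, 3, 64, 64, device=device)

# Matrix multiply exercises rocBLAS; conv + batchnorm exercise MIOpen
y = x @ x.transpose(-1, -2)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
bn = nn.BatchNorm2d(8).to(device)

out = bn(conv(x))
out.sum().backward()  # the backward pass hits MIOpen's training kernels
print("OK:", out.shape)
&lt;/pre&gt;&lt;/div&gt;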
&lt;h4&gt;Hardware Considerations&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;RDNA 2 (gfx1030) is more mature than RDNA 3.5 (gfx1151) for ML workloads&lt;/li&gt;
&lt;li&gt;Server GPUs (MI series) have better ROCm support than consumer cards&lt;/li&gt;
&lt;li&gt;Integrated GPUs with large shared memory (like the Radeon 8060S with 96GB) offer unique advantages for ML&lt;/li&gt;
&lt;li&gt;Check compatibility: Always verify your specific GPU (gfx code) is supported before purchasing&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Getting YOLOv8 training working on an AMD RDNA 3.5 GPU wasn't easy, but it was achievable. The combination of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Community support from r/ROCm pointing me to the right solution&lt;/li&gt;
&lt;li&gt;Docker testing to verify the fix&lt;/li&gt;
&lt;li&gt;Building MIOpen 3.5.1 from source&lt;/li&gt;
&lt;li&gt;Carefully replacing system libraries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;...resulted in a fully functional GPU-accelerated machine learning training environment.&lt;/p&gt;
&lt;p&gt;AMD's ROCm platform still has rough edges compared to NVIDIA's CUDA ecosystem, but it's improving rapidly. With some patience, persistence, and willingness to dig into source code, AMD GPUs can absolutely be viable for machine learning workloads.&lt;/p&gt;
&lt;p&gt;The bullet hole detection model trained successfully, achieved excellent accuracy, and now runs in production. Sometimes the journey is as valuable as the destination; I learned more about ROCm internals, library dependencies, and GPU computing in this week than I would have in months of smooth sailing.&lt;/p&gt;
&lt;p&gt;If you're facing similar issues with AMD GPUs and ROCm, I hope this guide helps. And remember: when in doubt, check r/ROCm. The community might just have the answer you're looking for.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;System Details (for reference):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: AMD RYZEN AI MAX+ 395&lt;/li&gt;
&lt;li&gt;GPU: AMD Radeon 8060S (integrated, gfx1151)&lt;/li&gt;
&lt;li&gt;VRAM: 96GB shared system memory&lt;/li&gt;
&lt;li&gt;ROCm: 7.0.2&lt;/li&gt;
&lt;li&gt;ROCk module: 6.14.14&lt;/li&gt;
&lt;li&gt;PyTorch: 2.8.0+rocm7.0.0.git64359f59&lt;/li&gt;
&lt;li&gt;MIOpen: 3.5.1 (custom build from develop branch)&lt;/li&gt;
&lt;li&gt;Conda Environment: pt2.8-rocm7&lt;/li&gt;
&lt;li&gt;YOLOv8: Ultralytics 8.3.217&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Key Files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MIOpen source: https://github.com/ROCm/MIOpen&lt;/li&gt;
&lt;li&gt;Ultralytics YOLOv8: https://github.com/ultralytics/ultralytics&lt;/li&gt;
&lt;li&gt;ROCm installation: https://rocm.docs.amd.com/&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Special thanks to the r/ROCm community for pointing me toward the MIOpen develop branch fix!&lt;/p&gt;</description><category>amd gpu</category><category>batch normalization</category><category>debugging</category><category>deep learning</category><category>gpu training</category><category>machine learning</category><category>miopen</category><category>object detection</category><category>pytorch</category><category>rdna 3</category><category>rocm</category><category>ultralytics</category><category>yolov8</category><guid>https://tinycomputers.io/posts/getting-yolov8-training-working-on-amd-ryzentm-al-max%2B-395.html</guid><pubDate>Wed, 22 Oct 2025 14:54:43 GMT</pubDate></item><item><title>Transfer Learning for Predictive Custom Drag Modeling: Automated Generation of Drag Coefficient Curves Using Multi-Modal AI</title><link>https://tinycomputers.io/posts/transfer-learning-for-predictive-custom-drag-modeling-automated-generation-of-drag-coefficient-curves-using-multi-modal-ai.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/transfer-learning-for-predictive-custom-drag-modeling-automated-generation-of-drag-coefficient-curves-using-multi-modal-ai_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;15 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;TL;DR&lt;/h3&gt;
&lt;p&gt;We built a neural network that predicts full drag coefficient curves (41 Mach points from 0.5 to 4.5) for rifle bullets using only basic specifications like weight, caliber, and ballistic coefficient. The system achieves 3.15% mean absolute error and has been serving predictions in production since September 2025. This post walks through the technical implementation details, architecture decisions, and lessons learned building a real-world ML system for ballistic physics.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Read the full whitepaper: &lt;a href="https://tinycomputers.io/data/cdm_transfer_learning.pdf"&gt;Transfer Learning for Predictive Custom Drag Modeling&lt;/a&gt; (17 pages)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h3&gt;The Problem: Drag Curves Are Scarce, But Critical&lt;/h3&gt;
&lt;p&gt;If you've ever built a ballistic calculator, you know the challenge: accurate drag modeling is everything. Standard drag models (G1, G7, G8) work okay for "average" bullets, but modern precision shooting demands better. Custom Drag Models (CDMs), full drag coefficient curves measured with Doppler radar, are the gold standard. They capture the unique aerodynamic signature of each bullet design.&lt;/p&gt;
&lt;p&gt;The catch? Getting a CDM requires:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Access to a Doppler radar range (≈$500K+ equipment)&lt;/li&gt;
&lt;li&gt;Firing 50-100 rounds at various velocities&lt;/li&gt;
&lt;li&gt;Expert analysis to process the raw data&lt;/li&gt;
&lt;li&gt;Cost: $5,000-$15,000 per bullet&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For manufacturers like Hornady and Lapua, this is routine. For smaller manufacturers or custom bullet makers? Not happening. We had 641 bullets with real radar-measured CDMs and thousands of bullets with only basic specs. Could we use machine learning to bridge the gap?&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;The Vision: Transfer Learning from Radar Data&lt;/h3&gt;
&lt;p&gt;The core insight: bullets with similar physical characteristics have similar drag curves. A 168gr .308 boattail match bullet from Manufacturer A will drag similarly to one from Manufacturer B. We could train a neural network on our 641 radar-measured bullets and use transfer learning to predict CDMs for bullets we've never measured.&lt;/p&gt;
&lt;p&gt;But we faced an immediate data problem: 641 samples isn't much for deep learning. Enter synthetic data augmentation.&lt;/p&gt;
&lt;h3&gt;Part 1: Automating Data Extraction with Claude Vision&lt;/h3&gt;
&lt;p&gt;Applied Ballistics publishes ballistic data for 704+ bullets as JPEG images. Manual data entry would take 1,408 hours (704 bullets × 2 hours each). We needed automation.&lt;/p&gt;
&lt;h4&gt;The Vision Processing Pipeline&lt;/h4&gt;
&lt;p&gt;We built an extraction pipeline using Claude 3.5 Sonnet's vision capabilities:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;extract_bullet_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Extract bullet specifications from AB datasheet JPEG."""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ANTHROPIC_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Load and encode image&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"rb"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;standard_b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"utf-8"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Vision extraction prompt&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"claude-3-5-sonnet-20241022"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="s2"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="s2"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s2"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="s2"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"base64"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s2"&gt;"media_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"image/jpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s2"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="s2"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"""Extract the following from this Applied Ballistics bullet datasheet:&lt;/span&gt;
&lt;span class="s2"&gt;                    - Caliber (inches, decimal format)&lt;/span&gt;
&lt;span class="s2"&gt;                    - Bullet weight (grains)&lt;/span&gt;
&lt;span class="s2"&gt;                    - G1 Ballistic Coefficient&lt;/span&gt;
&lt;span class="s2"&gt;                    - G7 Ballistic Coefficient&lt;/span&gt;
&lt;span class="s2"&gt;                    - Bullet length (inches, if visible)&lt;/span&gt;
&lt;span class="s2"&gt;                    - Ogive radius (calibers, if visible)&lt;/span&gt;

&lt;span class="s2"&gt;                    Return as JSON with keys: caliber, weight_gr, bc_g1, bc_g7, length_in, ogive_radius_cal"""&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse response&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Physics validation&lt;/span&gt;
    &lt;span class="n"&gt;validate_bullet_physics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;validate_bullet_physics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Sanity checks for extracted data."""&lt;/span&gt;
    &lt;span class="n"&gt;caliber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'caliber'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'weight_gr'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Caliber bounds&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="mf"&gt;0.172&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;caliber&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Invalid caliber: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;caliber&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Weight-to-caliber ratio (sectional density proxy)&lt;/span&gt;
    &lt;span class="n"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caliber&lt;/span&gt;  &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Implausible weight for caliber: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;gr @ &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;caliber&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;in"&lt;/span&gt;

    &lt;span class="c1"&gt;# BC sanity&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'bc_g1'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Invalid G1 BC: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'bc_g1'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'bc_g7'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"Invalid G7 BC: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'bc_g7'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
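&lt;p&gt;A driver loop along these lines (the &lt;code&gt;datasheets/&lt;/code&gt; directory layout and failure handling are illustrative) processes the whole collection and quarantines anything the physics checks reject:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;results, failures = [], []
for image_path in sorted(Path("datasheets").glob("*.jpg")):
    try:
        results.append(extract_bullet_data(str(image_path)))
    except (AssertionError, json.JSONDecodeError) as exc:
        # Validation or JSON parsing failed; set aside for manual review
        failures.append((image_path.name, str(exc)))

print(f"Extracted {len(results)} bullets; {len(failures)} need manual review")
&lt;/pre&gt;&lt;/div&gt;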

&lt;p&gt;&lt;img alt="Vision Processing Pipeline" src="https://tinycomputers.io/images/vision_pipeline.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Claude Vision extraction pipeline - from JPEG datasheets to structured bullet specifications&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;704/704 successful extractions (100% success rate)&lt;/li&gt;
&lt;li&gt;2.3 seconds per bullet (average)&lt;/li&gt;
&lt;li&gt;27 minutes total vs. 1,408 hours manual&lt;/li&gt;
&lt;li&gt;99.97% time savings&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We validated against a manually-verified subset of 50 bullets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;100% match on caliber&lt;/li&gt;
&lt;li&gt;98% match on weight (±0.5 grain tolerance)&lt;/li&gt;
&lt;li&gt;96% match on BC values (±0.002 tolerance)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The vision model occasionally struggled with hand-drawn or low-quality scans, but the physics validation caught these errors before they corrupted our dataset.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Part 2: Generating Synthetic CDM Curves&lt;/h3&gt;
&lt;p&gt;Now we had 704 bullets with BC values but no full CDM curves. We needed to synthesize them.&lt;/p&gt;
&lt;h4&gt;The BC-to-CDM Transformation Algorithm&lt;/h4&gt;
&lt;p&gt;The relationship between ballistic coefficient and drag coefficient is straightforward:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;BC = m / (C_d × d²)

Rearranging:
C_d(M) = m / (BC(M) × d²)
&lt;/pre&gt;&lt;/div&gt;
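&lt;p&gt;Plugging in numbers makes the relation concrete. For a hypothetical 168 gr, .308-caliber bullet with a G7 BC of 0.223 (mass in pounds is grains/7000, so BC carries units of lb/in²), the formula gives:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Worked example of the relation above (illustrative numbers)
m = 168 / 7000     # bullet mass in lb (168 grains)
d = 0.308          # bullet diameter in inches
bc_g7 = 0.223      # assumed G7 ballistic coefficient

c_d = m / (bc_g7 * d ** 2)
print(round(c_d, 3))  # 1.134 -- effectively a form-factor-like scale on the reference curve
&lt;/pre&gt;&lt;/div&gt;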

&lt;p&gt;But BC values are typically single scalars, not curves. We developed a 5-step hybrid algorithm combining standard drag model references with BC-derived corrections:&lt;/p&gt;
&lt;h5&gt;Step 1: Base Reference Curve&lt;/h5&gt;
&lt;p&gt;Start with the G7 standard drag curve as a baseline (better for modern boattail bullets than G1):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_g7_reference_curve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach_points&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""G7 standard drag curve from McCoy (1999)."""&lt;/span&gt;
    &lt;span class="c1"&gt;# Precomputed G7 curve at 41 Mach points&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;interpolate_standard_curve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"G7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mach_points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h5&gt;Step 2: BC-Based Scaling&lt;/h5&gt;
&lt;p&gt;Scale the reference curve using extracted BC values:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;scale_by_bc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bc_actual&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bc_reference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.221&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Scale drag curve to match actual BC.&lt;/span&gt;

&lt;span class="sd"&gt;    BC_G7_ref = 0.221 (G7 standard projectile)&lt;/span&gt;
&lt;span class="sd"&gt;    """&lt;/span&gt;
    &lt;span class="n"&gt;scaling_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bc_reference&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;bc_actual&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cd_base&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;scaling_factor&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h5&gt;Step 3: Multi-Regime Interpolation&lt;/h5&gt;
&lt;p&gt;When both G1 and G7 BCs are available, blend them based on Mach regime:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;blend_drag_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_g1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_g7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Blend G1 and G7 curves based on flight regime.&lt;/span&gt;

&lt;span class="sd"&gt;    - Supersonic (M &amp;gt; 1.2): Use G1 (better for shock wave region)&lt;/span&gt;
&lt;span class="sd"&gt;    - Transonic (0.8 &amp;lt; M &amp;lt; 1.2): Cubic spline interpolation&lt;/span&gt;
&lt;span class="sd"&gt;    - Subsonic (M &amp;lt; 0.8): Use G7 (better for low-speed)&lt;/span&gt;
&lt;span class="sd"&gt;    """&lt;/span&gt;
    &lt;span class="n"&gt;cd_blended&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Supersonic: G1 better captures shock effects&lt;/span&gt;
            &lt;span class="n"&gt;cd_blended&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_g1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Subsonic: G7 better for boattail bullets&lt;/span&gt;
            &lt;span class="n"&gt;cd_blended&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_g7&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Transonic: smooth interpolation&lt;/span&gt;
            &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;  &lt;span class="c1"&gt;# Normalize to [0, 1]&lt;/span&gt;
            &lt;span class="n"&gt;cd_blended&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cubic_interpolate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_g7&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cd_g1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cd_blended&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h5&gt;Step 4: Transonic Peak Generation&lt;/h5&gt;
&lt;p&gt;Model the transonic drag spike using a Gaussian kernel:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;add_transonic_peak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;bc_g1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bc_g7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Add realistic transonic drag spike.&lt;/span&gt;

&lt;span class="sd"&gt;    Peak amplitude calibrated from BC ratio (G1 worse than G7 in transonic).&lt;/span&gt;
&lt;span class="sd"&gt;    """&lt;/span&gt;
    &lt;span class="c1"&gt;# Estimate peak amplitude from BC discrepancy&lt;/span&gt;
    &lt;span class="n"&gt;bc_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bc_g1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;bc_g7&lt;/span&gt;
    &lt;span class="n"&gt;peak_amplitude&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bc_ratio&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Empirically tuned&lt;/span&gt;

    &lt;span class="c1"&gt;# Gaussian centered at critical Mach&lt;/span&gt;
    &lt;span class="n"&gt;M_crit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
    &lt;span class="n"&gt;sigma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;

    &lt;span class="n"&gt;transonic_spike&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;peak_amplitude&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;M_crit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;  &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cd_base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;transonic_spike&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h5&gt;Step 5: Monotonicity Enforcement&lt;/h5&gt;
&lt;p&gt;Apply Savitzky-Golay smoothing to prevent unphysical oscillations:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;scipy.signal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;savgol_filter&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;enforce_smoothness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_curve&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;polyorder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Smooth drag curve while preserving transonic peak.&lt;/span&gt;

&lt;span class="sd"&gt;    Savitzky-Golay filter preserves peak shape better than moving average.&lt;/span&gt;
&lt;span class="sd"&gt;    """&lt;/span&gt;
    &lt;span class="c1"&gt;# Must have odd window length&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;window_length&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;window_length&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;savgol_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_curve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;polyorder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'nearest'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
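&lt;p&gt;Putting the five steps together, the synthesis pipeline reads roughly like this (a sketch built from the functions above; &lt;code&gt;get_g1_reference_curve&lt;/code&gt; and its reference BC are hypothetical analogues of the G7 helper):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

G1_REF_BC = 0.5  # placeholder anchor; calibrate the way 0.221 was for G7

def synthesize_cdm(bc_g1: float, bc_g7: float) -&gt; np.ndarray:
    """Synthesize a 41-point CDM curve from scalar BCs (steps 1-5)."""
    mach = np.linspace(0.5, 4.5, 41)

    # Steps 1-2: reference curves scaled to the bullet's actual BCs
    cd_g7 = scale_by_bc(get_g7_reference_curve(mach), bc_g7)
    cd_g1 = scale_by_bc(get_g1_reference_curve(mach), bc_g1,
                        bc_reference=G1_REF_BC)

    # Step 3: regime-aware blend of the two scaled curves
    cd = blend_drag_models(mach, cd_g1, cd_g7)

    # Step 4: transonic drag spike calibrated from the BC ratio
    cd = add_transonic_peak(cd, mach, bc_g1, bc_g7)

    # Step 5: smooth away unphysical oscillations
    return enforce_smoothness(cd)
&lt;/pre&gt;&lt;/div&gt;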

&lt;h4&gt;Validation Against Ground Truth&lt;/h4&gt;
&lt;p&gt;We validated synthetic curves against 127 bullets where both BC values and full CDM curves were available:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mean Absolute Error&lt;/td&gt;
&lt;td&gt;3.2%&lt;/td&gt;
&lt;td&gt;Across all Mach points&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transonic Error&lt;/td&gt;
&lt;td&gt;4.8%&lt;/td&gt;
&lt;td&gt;Mach 0.8-1.2 (most challenging)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supersonic Error&lt;/td&gt;
&lt;td&gt;2.1%&lt;/td&gt;
&lt;td&gt;Mach 1.5-3.0 (best performance)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shape Correlation&lt;/td&gt;
&lt;td&gt;r = 0.984&lt;/td&gt;
&lt;td&gt;Pearson correlation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The synthetic curves satisfied all physics constraints:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monotonic decrease in supersonic regime&lt;/li&gt;
&lt;li&gt;Realistic transonic peaks (1.3-2.0× baseline)&lt;/li&gt;
&lt;li&gt;Smooth transitions between regimes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="Physics Validation" src="https://tinycomputers.io/images/physics_validation.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Validation of synthetic CDM curves against ground truth radar measurements&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Total training data: 1,345 bullets (704 synthetic + 641 real), 2.1x data augmentation.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Part 3: Architecture Exploration&lt;/h3&gt;
&lt;p&gt;With data ready, we explored four neural architectures:&lt;/p&gt;
&lt;h4&gt;1. Multi-Layer Perceptron (Baseline)&lt;/h4&gt;
&lt;p&gt;Simple feedforward network:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch.nn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;nn&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;CDMPredictor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""MLP for CDM prediction: 13 features → 41 Cd values."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Output: 41 Mach points&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Input Features (13 total):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'caliber'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# inches&lt;/span&gt;
    &lt;span class="s1"&gt;'weight_gr'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# grains&lt;/span&gt;
    &lt;span class="s1"&gt;'bc_g1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;# G1 ballistic coefficient&lt;/span&gt;
    &lt;span class="s1"&gt;'bc_g7'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;# G7 ballistic coefficient&lt;/span&gt;
    &lt;span class="s1"&gt;'length_in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# bullet length (imputed if missing)&lt;/span&gt;
    &lt;span class="s1"&gt;'ogive_radius_cal'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ogive radius in calibers&lt;/span&gt;
    &lt;span class="s1"&gt;'meplat_diam_in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# meplat diameter&lt;/span&gt;
    &lt;span class="s1"&gt;'boat_tail_angle'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# boattail angle (degrees)&lt;/span&gt;
    &lt;span class="s1"&gt;'bearing_length'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# bearing surface length&lt;/span&gt;
    &lt;span class="s1"&gt;'sectional_density'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# weight / caliber²&lt;/span&gt;
    &lt;span class="s1"&gt;'form_factor_g1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# i / BC_G1&lt;/span&gt;
    &lt;span class="s1"&gt;'form_factor_g7'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# i / BC_G7&lt;/span&gt;
    &lt;span class="s1"&gt;'length_to_diameter'&lt;/span&gt; &lt;span class="c1"&gt;# L/D ratio&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
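
&lt;p&gt;Four of these features are derived rather than measured, and the standard ballistics relations are one-liners. A sketch (&lt;code&gt;derived_features&lt;/code&gt; is a hypothetical helper, not part of the pipeline code):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;def derived_features(weight_gr, caliber, length_in, bc_g1, bc_g7):
    """Derived inputs: sectional density, form factors, L/D ratio."""
    sd = (weight_gr / 7000.0) / caliber ** 2  # sectional density, lb/in^2
    return {
        'sectional_density': sd,
        'form_factor_g1': sd / bc_g1,   # form factor i = SD / BC
        'form_factor_g7': sd / bc_g7,
        'length_to_diameter': length_in / caliber,
    }
&lt;/pre&gt;&lt;/div&gt;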

&lt;p&gt;&lt;img alt="Network Architecture" src="https://tinycomputers.io/images/network_architecture.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 4: MLP architecture - 13 input features through 4 hidden layers to 41 output Mach points&lt;/em&gt;&lt;/p&gt;
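&lt;p&gt;A quick shape check confirms the wiring (a usage sketch):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;model = CDMPredictor(dropout=0.2)
x = torch.randn(8, 13)        # dummy batch of 8 feature vectors
cd = model(x)
assert cd.shape == (8, 41)    # one Cd value per Mach point
&lt;/pre&gt;&lt;/div&gt;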
&lt;h4&gt;2. Physics-Informed Neural Network (PINN)&lt;/h4&gt;
&lt;p&gt;Added physics loss term enforcing drag model constraints:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;PINN_CDMPredictor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Physics-Informed NN with drag equation constraints."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# Same architecture as MLP&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;build_mlp_network&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;physics_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Enforce physics constraints on predictions.&lt;/span&gt;

&lt;span class="sd"&gt;        Constraints:&lt;/span&gt;
&lt;span class="sd"&gt;        1. Drag increases with Mach in subsonic&lt;/span&gt;
&lt;span class="sd"&gt;        2. Transonic peak exists near M=1&lt;/span&gt;
&lt;span class="sd"&gt;        3. Monotonic decrease in supersonic&lt;/span&gt;
&lt;span class="sd"&gt;        """&lt;/span&gt;
        &lt;span class="c1"&gt;# Constraint 1: Subsonic gradient&lt;/span&gt;
        &lt;span class="n"&gt;subsonic_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mach&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;
        &lt;span class="n"&gt;subsonic_cd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;subsonic_mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;subsonic_grad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subsonic_cd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;subsonic_violation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;subsonic_grad&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Penalize decreases&lt;/span&gt;

        &lt;span class="c1"&gt;# Constraint 2: Transonic peak&lt;/span&gt;
        &lt;span class="n"&gt;transonic_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mach&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;transonic_cd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;transonic_mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;peak_violation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;transonic_cd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Must exceed 1.1&lt;/span&gt;

        &lt;span class="c1"&gt;# Constraint 3: Supersonic monotonicity&lt;/span&gt;
        &lt;span class="n"&gt;supersonic_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mach&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;
        &lt;span class="n"&gt;supersonic_cd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;supersonic_mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;supersonic_grad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;supersonic_cd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;supersonic_violation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;supersonic_grad&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Penalize increases&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;subsonic_violation&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;peak_violation&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;supersonic_violation&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;total_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lambda_physics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Combined data + physics loss."""&lt;/span&gt;
    &lt;span class="n"&gt;data_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MSELoss&lt;/span&gt;&lt;span class="p"&gt;()(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;physics_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;physics_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mach&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data_loss&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;lambda_physics&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;physics_loss&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
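
&lt;p&gt;For completeness, the combined objective drops into an ordinary training step like this (a sketch; the loader, the Mach grid, the optimizer settings, and the &lt;code&gt;build_mlp_network&lt;/code&gt; helper referenced above are assumptions):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;mach = torch.linspace(0.0, 4.0, 41)  # assumed 41-point grid
model = PINN_CDMPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for features, cd_true in train_loader:
    optimizer.zero_grad()
    cd_pred = model.network(features)
    loss = total_loss(model, cd_pred, cd_true, features, mach, lambda_physics=0.1)
    loss.backward()
    optimizer.step()
&lt;/pre&gt;&lt;/div&gt;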

&lt;p&gt;Result: over-regularization. The physics penalty was too strict, preventing the model from learning subtle per-bullet variation, and performance degraded to 4.86% MAE.&lt;/p&gt;
&lt;h4&gt;3. Transformer Architecture&lt;/h4&gt;
&lt;p&gt;Treated the 41 Mach points as a sequence:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;TransformerCDM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Transformer encoder for sequence-to-sequence CDM prediction."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nhead&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_layers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;encoder_layer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TransformerEncoderLayer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;nhead&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nhead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dim_feedforward&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transformer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TransformerEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoder_layer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_layers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_layers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_head&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# x: [batch, 13]&lt;/span&gt;
        &lt;span class="n"&gt;embedded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [batch, d_model]&lt;/span&gt;
        &lt;span class="n"&gt;embedded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedded&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [batch, 41, d_model]&lt;/span&gt;

        &lt;span class="n"&gt;transformed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [batch, 41, d_model]&lt;/span&gt;

        &lt;span class="n"&gt;cd_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="p"&gt;:])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [batch, 41]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Result: a mismatch between architecture and problem. CDM prediction isn't a sequence-modeling task: given the bullet features, each Mach point's Cd is predicted directly, so self-attention has little structure to exploit. Performance: 6.05% MAE.&lt;/p&gt;
&lt;h4&gt;4. Neural ODE&lt;/h4&gt;
&lt;p&gt;Attempted to model drag as a continuous ODE:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torchdiffeq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;odeint&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;DragODE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Neural ODE for continuous drag modeling."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# Mach + features&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tanh&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tanh&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# dCd/dM&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# t: current Mach number&lt;/span&gt;
        &lt;span class="c1"&gt;# state: [Cd, features...]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;predict_cdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mach_points&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Integrate ODE to get Cd curve."""&lt;/span&gt;
    &lt;span class="n"&gt;initial_cd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# Initial guess&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;initial_cd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;solution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;odeint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ode_func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mach_points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;solution&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Extract Cd values&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Result: failed to converge, plagued by dimension-mismatch errors and extreme sensitivity to initial conditions. Abandoned after two days of debugging.&lt;/p&gt;
&lt;h4&gt;Architecture Comparison Results&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;MAE&lt;/th&gt;
&lt;th&gt;Smoothness&lt;/th&gt;
&lt;th&gt;Shape Correlation&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MLP Baseline&lt;/td&gt;
&lt;td&gt;3.66%&lt;/td&gt;
&lt;td&gt;90.05%&lt;/td&gt;
&lt;td&gt;0.9380&lt;/td&gt;
&lt;td&gt;✅ Best&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physics-Informed NN&lt;/td&gt;
&lt;td&gt;4.86%&lt;/td&gt;
&lt;td&gt;64.02%&lt;/td&gt;
&lt;td&gt;0.8234&lt;/td&gt;
&lt;td&gt;❌ Over-regularized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transformer&lt;/td&gt;
&lt;td&gt;6.05%&lt;/td&gt;
&lt;td&gt;56.83%&lt;/td&gt;
&lt;td&gt;0.7891&lt;/td&gt;
&lt;td&gt;❌ Poor fit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Neural ODE&lt;/td&gt;
&lt;td&gt;---&lt;/td&gt;
&lt;td&gt;---&lt;/td&gt;
&lt;td&gt;---&lt;/td&gt;
&lt;td&gt;❌ Failed to converge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;img alt="Architecture Comparison" src="https://tinycomputers.io/images/architecture_comparison.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 5: Performance comparison across four neural architectures - MLP baseline wins&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Key insight: a simple MLP with dropout outperformed the more complex physics-constrained models. The training data already carried enough physics signal; explicit constraints only hurt generalization.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Part 4: Production System Design&lt;/h3&gt;
&lt;p&gt;The POC model (3.66% MAE) validated the approach. Now we needed production hardening.&lt;/p&gt;
&lt;h4&gt;Training Pipeline Improvements&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pytorch_lightning&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pl&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch.utils.data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TensorDataset&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;ProductionCDMModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LightningModule&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Production-ready CDM predictor with monitoring."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save_hyperparameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CDMPredictor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;learning_rate&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight_decay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weight_decay&lt;/span&gt;

        &lt;span class="c1"&gt;# Metrics tracking&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train_mae&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;val_mae&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;training_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;cd_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Weighted MSE loss (emphasize transonic region)&lt;/span&gt;
        &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_get_mach_weights&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Metrics&lt;/span&gt;
        &lt;span class="n"&gt;mae&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'train_loss'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'train_mae'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mae&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;validation_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;cd_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MSELoss&lt;/span&gt;&lt;span class="p"&gt;()(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;mae&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cd_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'val_mae'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mae&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Physics validation&lt;/span&gt;
        &lt;span class="n"&gt;smoothness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_calculate_smoothness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;transonic_quality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_check_transonic_peak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'smoothness'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;smoothness&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'transonic_quality'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transonic_quality&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;configure_optimizers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AdamW&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight_decay&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;scheduler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lr_scheduler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReduceLROnPlateau&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'min'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;patience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s1"&gt;'optimizer'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'lr_scheduler'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'monitor'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'val_loss'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_get_mach_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Weight transonic region more heavily."""&lt;/span&gt;
        &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;transonic_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mach_points&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mach_points&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;transonic_indices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;  &lt;span class="c1"&gt;# 2x weight in transonic&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_calculate_smoothness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Measure curve smoothness (low = better)."""&lt;/span&gt;
        &lt;span class="n"&gt;second_derivative&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;second_derivative&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_check_transonic_peak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Verify transonic peak exists and is realistic."""&lt;/span&gt;
        &lt;span class="n"&gt;transonic_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mach_points&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mach_points&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;peak_cd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;transonic_mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;baseline_cd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Subsonic baseline&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;peak_cd&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;baseline_cd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Should be &amp;gt; 1.0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Training Configuration&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# Data preparation&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prepare_features&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# 1,039 → 831 / 104 / 104&lt;/span&gt;
&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prepare_targets&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;train_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TensorDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;val_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TensorDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;val_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Model training&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ProductionCDMModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EarlyStopping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'min'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ModelCheckpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'val_mae'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'min'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;save_top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LearningRateMonitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logging_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'epoch'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;accelerator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'gpu'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;devices&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;log_every_n_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_loader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val_loader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
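
&lt;p&gt;What the block above doesn't show is how the best weights come back out of the trainer. A minimal sketch, assuming the &lt;code&gt;ModelCheckpoint&lt;/code&gt; above is the trainer's primary checkpoint callback:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Path to the best checkpoint tracked by ModelCheckpoint (lowest val_mae)
best_path = trainer.checkpoint_callback.best_model_path

# Re-instantiate the module from the saved hyperparameters and weights
best_model = ProductionCDMModel.load_from_checkpoint(best_path)
best_model.eval()

# Sanity-check the restored model on the validation set before export
trainer.validate(best_model, val_loader)
&lt;/pre&gt;&lt;/div&gt;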

&lt;p&gt;&lt;img alt="Training Convergence" src="https://tinycomputers.io/images/training_convergence.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 6: Training and validation loss convergence over 60 epochs&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Training Results:
- Converged at epoch 60 (early stopping)
- Final validation loss: 0.0023
- Production model MAE: 3.15% (13.9% improvement over POC)
- Smoothness: 88.81% (close to ground truth 89.6%)
- Shape correlation: 0.9545&lt;/p&gt;
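&lt;p&gt;The MAE and shape-correlation numbers are cheap to reproduce. A minimal sketch, assuming &lt;code&gt;cd_pred&lt;/code&gt; and &lt;code&gt;cd_true&lt;/code&gt; are NumPy arrays sampled on the same Mach grid:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

def mae_percent(cd_pred: np.ndarray, cd_true: np.ndarray) -&gt; float:
    """Mean absolute error as a percentage of the true Cd values."""
    return float(np.mean(np.abs(cd_pred - cd_true) / cd_true) * 100)

def shape_correlation(cd_pred: np.ndarray, cd_true: np.ndarray) -&gt; float:
    """Pearson correlation between the two curves (shape agreement)."""
    return float(np.corrcoef(cd_pred, cd_true)[0, 1])
&lt;/pre&gt;&lt;/div&gt;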
&lt;p&gt;&lt;img alt="CDM Predictions" src="https://tinycomputers.io/images/cdm_predictions.png"&gt;
&lt;em&gt;Figure 7: Example predicted CDM curves compared to ground truth measurements&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;API Integration&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# ballistics/ml/cdm_transfer_learning.py&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pickle&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;CDMTransferLearning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Production CDM prediction service."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"models/cdm_transfer_learning/production_mlp.pkl"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Feature statistics for normalization&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'.pkl'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'_stats.pkl'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'rb'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pickle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bullet_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Predict CDM curve from bullet specifications.&lt;/span&gt;

&lt;span class="sd"&gt;        Args:&lt;/span&gt;
&lt;span class="sd"&gt;            bullet_data: Dict with keys: caliber, weight_gr, bc_g1, bc_g7, etc.&lt;/span&gt;

&lt;span class="sd"&gt;        Returns:&lt;/span&gt;
&lt;span class="sd"&gt;            Dict with mach_numbers, drag_coefficients, validation_metrics&lt;/span&gt;
&lt;span class="sd"&gt;        """&lt;/span&gt;
        &lt;span class="c1"&gt;# Feature engineering&lt;/span&gt;
        &lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_extract_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bullet_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;features_normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_normalize_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Prediction&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;cd_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features_normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Denormalize&lt;/span&gt;
        &lt;span class="n"&gt;cd_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cd_pred&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Validation&lt;/span&gt;
        &lt;span class="n"&gt;validation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_validate_prediction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s1"&gt;'mach_numbers'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mach_points&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="s1"&gt;'drag_coefficients'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cd_values&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="s1"&gt;'source'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'ml_transfer_learning'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'method'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'mlp_prediction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'validation'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;validation&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_validate_prediction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cd_values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Physics validation of predicted curve."""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s1"&gt;'smoothness'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_calculate_smoothness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_values&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s1"&gt;'transonic_quality'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_check_transonic_peak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_values&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s1"&gt;'negative_cd_count'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_values&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="s1"&gt;'physical_plausibility'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_check_plausibility&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cd_values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
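
&lt;p&gt;Using the service is one call per bullet. The input values below are hypothetical, chosen to resemble a 168 gr .308 match bullet:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;predictor = CDMTransferLearning()

# Hypothetical specifications; not taken from the production database
result = predictor.predict({
    'caliber': 0.308,
    'weight_gr': 168,
    'bc_g1': 0.462,
    'bc_g7': 0.237,
    'length_in': 1.215,
    'ogive_radius_cal': 7.0,
})

print(result['validation']['smoothness'])
&lt;/pre&gt;&lt;/div&gt;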

&lt;h4&gt;REST API Endpoint&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="c1"&gt;# routes/bullets_unified.py&lt;/span&gt;

&lt;span class="nd"&gt;@bp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/search'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'GET'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;search_bullets&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Search unified bullet database with optional CDM prediction."""&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'q'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;use_cdm_prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'use_cdm_prediction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;

    &lt;span class="c1"&gt;# Search database&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;cdm_predictions_made&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;use_cdm_prediction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cdm_predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CDMTransferLearning&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cdm_data'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Predict CDM if not available&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;cdm_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cdm_predictor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                        &lt;span class="s1"&gt;'caliber'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'caliber'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="s1"&gt;'weight_gr'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'weight_gr'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="s1"&gt;'bc_g1'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'bc_g1'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="s1"&gt;'bc_g7'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'bc_g7'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="s1"&gt;'length_in'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'length_in'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="s1"&gt;'ogive_radius_cal'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ogive_radius_cal'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;})&lt;/span&gt;

                    &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'cdm_data'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cdm_data&lt;/span&gt;
                    &lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'cdm_predicted'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
                    &lt;span class="n"&gt;cdm_predictions_made&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"CDM prediction failed for bullet &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="s1"&gt;'results'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'cdm_prediction_enabled'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;use_cdm_prediction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'cdm_predictions_made'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cdm_predictions_made&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Example Response:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"manufacturer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sierra"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MatchKing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"caliber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.308&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"weight_gr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;168&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"bc_g1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.462&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"bc_g7"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.237&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"cdm_data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"mach_numbers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.55&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;4.5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"drag_coefficients"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.287&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.289&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.295&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.312&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ml_transfer_learning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mlp_prediction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"validation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;"smoothness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;91.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;"transonic_quality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;"negative_cd_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;"physical_plausibility"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"cdm_predicted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"cdm_prediction_enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"cdm_predictions_made"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
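
&lt;p&gt;Consuming the endpoint from Python is plain HTTP. A minimal client sketch; the base URL is a placeholder, not the deployed address:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import requests

# Placeholder URL; substitute the deployed Cloud Function endpoint
resp = requests.get(
    'https://example.com/api/bullets/search',
    params={'q': 'sierra matchking 168', 'use_cdm_prediction': 'true'},
    timeout=10,
)
resp.raise_for_status()
payload = resp.json()

for bullet in payload['results']:
    tag = 'predicted' if bullet.get('cdm_predicted') else 'measured'
    print(bullet['manufacturer'], bullet['model'], tag)
&lt;/pre&gt;&lt;/div&gt;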

&lt;hr&gt;
&lt;h3&gt;Part 5: Deployment and Monitoring&lt;/h3&gt;
&lt;h4&gt;Model Serving Architecture&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;┌─────────────────┐
│   Client App    │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│  Google Cloud Function  │
│  (Python 3.12)          │
│  - Flask routing        │
│  - Request validation   │
│  - Response formatting  │
└────────┬────────────────┘
         │
         ▼
┌─────────────────────────┐
│  CDMTransferLearning    │
│  - PyTorch model (2.1MB)│
│  - CPU inference (&amp;lt;10ms)│
│  - Feature engineering  │
└────────┬────────────────┘
         │
         ▼
┌─────────────────────────┐
│  Physics Validation     │
│  - Smoothness check     │
│  - Peak detection       │
│  - Plausibility gates   │
└─────────────────────────┘
&lt;/pre&gt;&lt;/div&gt;
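
&lt;p&gt;The Cloud Function layer is deliberately thin. A sketch of what the entry point looks like; the handler name and validation rules here are illustrative, not the production code:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import functions_framework
from flask import Request, jsonify

from ballistics.ml.cdm_transfer_learning import CDMTransferLearning

# Load once per instance so warm invocations skip the model-load cost
predictor = CDMTransferLearning()

@functions_framework.http
def predict_cdm(request: Request):
    """HTTP entry point: validate input, predict, format response."""
    bullet_data = request.get_json(silent=True) or {}
    if 'caliber' not in bullet_data or 'weight_gr' not in bullet_data:
        return jsonify({'error': 'caliber and weight_gr are required'}), 400
    return jsonify(predictor.predict(bullet_data))
&lt;/pre&gt;&lt;/div&gt;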

&lt;h4&gt;Performance Characteristics&lt;/h4&gt;
&lt;p&gt;Model Size:
- PyTorch state dict: 2.1 MB
- TorchScript (optional): 2.3 MB
- ONNX (optional): 1.8 MB&lt;/p&gt;
&lt;p&gt;Inference Speed (CPU):
- Single prediction: 6-8 ms
- Batch of 10: 12-15 ms (1.2-1.5 ms per bullet)
- Batch of 100: 80-100 ms (0.8-1.0 ms per bullet)&lt;/p&gt;
&lt;p&gt;Cold Start:
- Model load time: 150-200 ms
- First prediction: 220-280 ms (including load)
- Subsequent predictions: 6-8 ms&lt;/p&gt;
&lt;p&gt;Memory Footprint:
- Model in memory: ~15 MB
- Peak during inference: ~30 MB&lt;/p&gt;
&lt;p&gt;&lt;img alt="Production Performance" src="https://tinycomputers.io/images/production_performance.png"&gt;
&lt;em&gt;Figure 8: Production inference performance metrics across different batch sizes&lt;/em&gt;&lt;/p&gt;
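&lt;p&gt;The latency figures above come from straightforward wall-clock timing. A minimal sketch of the harness; the warm-up and repeat counts are assumptions:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import time
import numpy as np
import torch

def median_latency_ms(model, features: torch.Tensor, repeats: int = 100) -&gt; float:
    """Median wall-clock milliseconds per forward pass on CPU."""
    with torch.no_grad():
        model(features)  # warm-up pass, excluded from timing
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            model(features)
            timings.append((time.perf_counter() - start) * 1000)
    return float(np.median(timings))
&lt;/pre&gt;&lt;/div&gt;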
&lt;h4&gt;Monitoring and Observability&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;newrelic.agent&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;MonitoredCDMPredictor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""CDM predictor with New Relic monitoring."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CDMTransferLearning&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prediction_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="nd"&gt;@newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_trace&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bullet_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Predict with telemetry."""&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prediction_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Track prediction time&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionTrace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'cdm_prediction'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bullet_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Custom metrics&lt;/span&gt;
            &lt;span class="n"&gt;newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;record_custom_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'CDM/Predictions/Total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prediction_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;record_custom_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'CDM/Validation/Smoothness'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                               &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'validation'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'smoothness'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;record_custom_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'CDM/Validation/TransonicQuality'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                               &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'validation'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'transonic_quality'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

            &lt;span class="c1"&gt;# Track feature availability&lt;/span&gt;
            &lt;span class="n"&gt;features_available&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bullet_data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;record_custom_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'CDM/Features/Available'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features_available&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;record_custom_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'CDM/Errors/Total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;newrelic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;notice_error&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Key Metrics Tracked:
- Prediction latency (p50, p95, p99)
- Validation scores (smoothness, transonic quality)
- Feature availability (how many inputs provided)
- Error rate and types
- Cache hit rate (if caching enabled)&lt;/p&gt;
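&lt;p&gt;The latency percentiles fall straight out of the recorded timings. A sketch, assuming latencies are collected in milliseconds:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

def latency_percentiles(latencies_ms: list[float]) -&gt; dict:
    """Summarize recorded prediction latencies for dashboards."""
    arr = np.asarray(latencies_ms)
    return {
        'p50': float(np.percentile(arr, 50)),
        'p95': float(np.percentile(arr, 95)),
        'p99': float(np.percentile(arr, 99)),
    }
&lt;/pre&gt;&lt;/div&gt;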
&lt;hr&gt;
&lt;h3&gt;Lessons Learned&lt;/h3&gt;
&lt;h4&gt;1. Simple Architectures Often Win&lt;/h4&gt;
&lt;p&gt;We spent a week exploring Transformers and Neural ODEs, only to find the vanilla MLP performed best. Why?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data alignment: Our problem is function approximation, not sequence modeling&lt;/li&gt;
&lt;li&gt;Inductive bias mismatch: Transformers expect temporal dependencies; drag curves don't have them&lt;/li&gt;
&lt;li&gt;Regularization sufficiency: Dropout + weight decay provided enough regularization without physics constraints&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Lesson: Start simple. Add complexity only when data clearly demands it.&lt;/p&gt;
&lt;h4&gt;2. Physics Validation &amp;gt; Physics Loss&lt;/h4&gt;
&lt;p&gt;Hard-coded physics loss functions became a liability:
- Over-constrained the model
- Required manual tuning of loss weights
- Didn't generalize to all bullet types&lt;/p&gt;
&lt;p&gt;Better approach: Validate predictions post-hoc and flag anomalies. Let the model learn physics from data.&lt;/p&gt;
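&lt;p&gt;The &lt;code&gt;_calculate_smoothness&lt;/code&gt; gate referenced in the API code isn't shown above. One plausible implementation penalizes the curve's second differences; the scaling constant here is an assumption, not the production value:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

def calculate_smoothness(cd_values: np.ndarray, scale: float = 50.0) -&gt; float:
    """Score a drag curve from 0 (jagged) to 100 (smooth).

    Uses the mean absolute second difference; `scale` is an assumed
    calibration constant.
    """
    roughness = np.mean(np.abs(np.diff(cd_values, n=2)))
    return float(100.0 / (1.0 + scale * roughness))
&lt;/pre&gt;&lt;/div&gt;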
&lt;h4&gt;3. Synthetic Data Quality Matters More Than Quantity&lt;/h4&gt;
&lt;p&gt;We generated 704 synthetic CDMs, but spent equal time validating them. Key insight: One bad synthetic sample can poison dozens of real samples during training.&lt;/p&gt;
&lt;p&gt;Validation process:
1. Compare synthetic vs. real CDMs (where both exist)
2. Physics plausibility checks
3. Cross-validation with different BC values
4. Manual inspection of outliers&lt;/p&gt;
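&lt;p&gt;Step 1 reduces to interpolating both curves onto a common Mach grid and bounding the deviation. A sketch; the 10% acceptance threshold is an assumption:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;import numpy as np

def synthetic_matches_real(mach_syn, cd_syn, mach_real, cd_real,
                           max_rel_dev: float = 0.10) -&gt; bool:
    """Accept a synthetic CDM only if it stays within max_rel_dev
    of the measured curve over the overlapping Mach range."""
    lo = max(mach_syn.min(), mach_real.min())
    hi = min(mach_syn.max(), mach_real.max())
    grid = np.linspace(lo, hi, 50)
    syn = np.interp(grid, mach_syn, cd_syn)
    real = np.interp(grid, mach_real, cd_real)
    return bool(np.max(np.abs(syn - real) / real) &lt;= max_rel_dev)
&lt;/pre&gt;&lt;/div&gt;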
&lt;h4&gt;4. Feature Engineering &amp;gt; Model Complexity&lt;/h4&gt;
&lt;p&gt;The most impactful changes weren't architectural:
- Adding &lt;code&gt;sectional_density&lt;/code&gt; as a feature: -0.8% MAE
- Computing &lt;code&gt;form_factor_g1&lt;/code&gt; and &lt;code&gt;form_factor_g7&lt;/code&gt;: -0.6% MAE
- Imputing missing features (length, ogive) using physics-based defaults: -0.5% MAE&lt;/p&gt;
&lt;p&gt;&lt;img alt="Feature Importance" src="https://tinycomputers.io/images/feature_importance.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 9: Feature importance analysis showing impact of each input feature on prediction accuracy&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Combined improvement: -1.9% MAE with no changes to the model architecture.&lt;/p&gt;
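&lt;p&gt;Both derived features are textbook exterior-ballistics quantities, computable directly from catalog fields:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;def sectional_density(weight_gr: float, caliber_in: float) -&gt; float:
    """Sectional density in lb/in^2: weight in pounds over caliber squared."""
    return (weight_gr / 7000.0) / caliber_in ** 2

def form_factor(weight_gr: float, caliber_in: float, bc: float) -&gt; float:
    """Form factor i = SD / BC; pass bc_g1 or bc_g7 for i1 or i7."""
    return sectional_density(weight_gr, caliber_in) / bc

# Example: a 168 gr .308 with G7 BC 0.237 gives i7 of about 1.07
i7 = form_factor(168, 0.308, 0.237)
&lt;/pre&gt;&lt;/div&gt;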
&lt;h4&gt;5. Production Deployment ≠ POC&lt;/h4&gt;
&lt;p&gt;Our POC model worked great in notebooks. Production required:
- Input validation and sanitization
- Graceful degradation when features missing
- Physics validation gates
- Monitoring and alerting
- Model versioning and rollback capability
- A/B testing infrastructure&lt;/p&gt;
&lt;p&gt;Time split: 30% research, 70% production engineering.&lt;/p&gt;
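&lt;p&gt;Graceful degradation mostly means a missing optional field never takes down a request. A sketch of the sanitization step; the default table is illustrative, and the production defaults are derived per caliber and weight class:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;# Illustrative physics-based defaults for missing optional inputs
FEATURE_DEFAULTS = {
    'length_in': lambda d: 4.0 * d['caliber'],  # typical bullet is ~4 calibers long
    'ogive_radius_cal': lambda d: 7.0,          # common secant-ogive radius
}

def sanitize(bullet_data: dict) -&gt; dict:
    """Fill missing optional features with physics-based defaults."""
    clean = dict(bullet_data)
    for key, default in FEATURE_DEFAULTS.items():
        if clean.get(key) is None:
            clean[key] = default(clean)
    return clean
&lt;/pre&gt;&lt;/div&gt;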
&lt;hr&gt;
&lt;h3&gt;What's Next?&lt;/h3&gt;
&lt;h4&gt;Phase 2: Uncertainty Quantification&lt;/h4&gt;
&lt;p&gt;Current model outputs point estimates. We're implementing Bayesian Neural Networks to provide confidence intervals:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;BayesianCDMPredictor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;"""Bayesian NN with dropout as approximate inference."""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;predict_with_uncertainty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;"""Monte Carlo dropout for uncertainty estimation."""&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Enable dropout during inference&lt;/span&gt;

        &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;std&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s1"&gt;'cd_mean'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'cd_std'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'cd_lower'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;1.96&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 95% CI&lt;/span&gt;
            &lt;span class="s1"&gt;'cd_upper'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.96&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Use case: Flag predictions with high uncertainty for manual review or experimental validation.&lt;/p&gt;
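&lt;p&gt;Operationally, the flag is a threshold on the interval width. A sketch; the 10% relative-width cutoff is an assumption:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;def needs_review(pred: dict, max_rel_width: float = 0.10) -&gt; bool:
    """Flag a prediction whose 95% CI is wide relative to the mean Cd."""
    rel_width = (pred['cd_upper'] - pred['cd_lower']) / pred['cd_mean']
    return bool((rel_width &gt; max_rel_width).any())
&lt;/pre&gt;&lt;/div&gt;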
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Building a production ML system for ballistic drag prediction required more than just training a model:
- Data engineering (Claude Vision automation saved countless hours)
- Synthetic data generation (2.1× data augmentation)
- Architecture exploration (simple MLP won)
- Real-world validation (94% physics check pass rate)&lt;/p&gt;
&lt;p&gt;The result: 1,247 bullets now have accurate drag models that didn't exist before. Not bad for a side project.&lt;/p&gt;
&lt;p&gt;Read the full technical whitepaper for mathematical derivations, validation details, and complete bibliography: &lt;a href="https://tinycomputers.io/data/cdm_transfer_learning.pdf"&gt;cdm_transfer_learning.pdf&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Resources&lt;/h3&gt;
&lt;p&gt;References:
1. McCoy, R. L. (1999). &lt;em&gt;Modern Exterior Ballistics&lt;/em&gt;. Schiffer Publishing.
2. Litz, B. (2016). &lt;em&gt;Applied Ballistics for Long Range Shooting&lt;/em&gt; (3rd ed.).&lt;/p&gt;</description><category>ballistics</category><category>claude ai</category><category>computer vision</category><category>machine learning</category><category>neural networks</category><category>physics</category><category>pytorch</category><category>transfer learning</category><guid>https://tinycomputers.io/posts/transfer-learning-for-predictive-custom-drag-modeling-automated-generation-of-drag-coefficient-curves-using-multi-modal-ai.html</guid><pubDate>Fri, 10 Oct 2025 18:10:00 GMT</pubDate></item></channel></rss>