<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io (Posts about cloud-init)</title><link>https://tinycomputers.io/</link><description></description><atom:link href="https://tinycomputers.io/categories/cloud-init.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
Creative Commons Attribution-ShareAlike 4.0 (http://creativecommons.org/licenses/by-sa/4.0/)
</copyright><lastBuildDate>Sat, 27 Jun 2026 03:22:16 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>The Driver Nobody Wrote: OpenBSD's ena(4) Works Now — and Can't Go Upstream</title><link>https://tinycomputers.io/posts/the-driver-nobody-wrote-openbsd-ena4-works-and-cant-go-upstream.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>

&lt;h2&gt;The Driver Nobody Wrote: OpenBSD's ena(4) Works Now — and Can't Go Upstream&lt;/h2&gt;
&lt;p&gt;A week ago I ended &lt;a href="https://tinycomputers.io/posts/the-doorbell-that-killed-the-device-an-ena-driver-for-openbsd-on-graviton.html"&gt;a post about writing OpenBSD's missing &lt;code&gt;ena(4)&lt;/code&gt; driver&lt;/a&gt; with a confession disguised as a list.&lt;/p&gt;
&lt;p&gt;The driver worked, I said, in the one sense that genuinely mattered — the ENA device protocol functioned, implemented from scratch, on real Graviton hardware, proven by the most boring fix in the file. And it did not work in nearly every other sense. So I wrote each of those senses down, because the exciting half of "it works" is forever trying to eat the honest half, and I wanted the honest half on the record. No userland DHCP — I'd driven the round-trip from a kernel thread, not from &lt;code&gt;ifconfig&lt;/code&gt;. No disk install — I was booting the install ramdisk out of RAM. Scaffolding everywhere — polling threads, a packet injector, a keep-alive timer I'd built for a problem that turned out not to exist. One DHCP exchange, which is not throughput, not stability, not the dozen edge cases a NIC driver owes the world. Not reviewed. Not submitted. Would not survive &lt;code&gt;tech@&lt;/code&gt;, nor should it.&lt;/p&gt;
&lt;p&gt;This is the post where I cross them off. Every one.&lt;/p&gt;
&lt;p&gt;And then I walk straight into the single wall that crossing them off can't move — a wall that has nothing to do with the code, and everything to do with who, or what, wrote it.&lt;/p&gt;
&lt;h3&gt;The list, crossed off&lt;/h3&gt;
&lt;p&gt;Start with the headline item, the one that would let me retire the ridiculous machine from &lt;a href="https://tinycomputers.io/posts/the-os-that-couldnt-see-the-network-native-openbsd-arm64-on-aws-graviton.html"&gt;the first post in this series&lt;/a&gt; — the bare-metal Graviton running OpenBSD as a QEMU guest behind a virtio shim, because OpenBSD couldn't see the real network card. The milestone was a normal OpenBSD instance: a real disk install, booting multiuser, with the ENA adapter as its &lt;em&gt;only&lt;/em&gt; network interface, that I could SSH into over that interface like any other server.&lt;/p&gt;
&lt;p&gt;That works now. There is an OpenBSD/arm64 instance on a Graviton2 that boots off an EBS volume, brings up &lt;code&gt;ena0&lt;/code&gt; from a DHCP lease, starts &lt;code&gt;sshd&lt;/code&gt;, and accepts my key over the wire. No shim. No emulation. No QEMU host underneath it pretending to be hardware. The kernel talks to Amazon's network card directly, because it finally has a driver that knows how.&lt;/p&gt;
&lt;p&gt;Getting there cost me one more bug of exactly the kind the last post was about — and I want to tell it quickly, because it rhymes.&lt;/p&gt;
&lt;p&gt;Once the driver could do RAM-disk DHCP, the disk install should have been a formality: partition the EBS volume, extract the sets, write a bootloader, reboot into a real system. It installed fine. It booted fine. And then, the instant it tried to send a packet from the installed system rather than the ramdisk, the device went into &lt;code&gt;FATAL_ERROR&lt;/code&gt; and took the interface down with it — reliably, every boot, a few hundred microseconds into the first real transmit.&lt;/p&gt;
&lt;p&gt;I had, by this point, learned the lesson the doorbell taught me, and I tried very hard to apply it: &lt;em&gt;look for the boring cause first&lt;/em&gt;. But the boring causes all checked out. The transmit ring was set up correctly. The descriptors were well-formed. The doorbell — the memory-mapped register write that tells the device "I've queued work for you," the same doorbell that named the last post — was firing at the right offset with the right value. I spent a day convinced it was a memory-barrier problem, the kind of thing that only shows up on real ARM with real out-of-order completion and never in emulation, because on the QEMU path the same code had been flawless.&lt;/p&gt;
&lt;p&gt;It was not a barrier. It was that I was submitting transmit descriptors to the device's submission queue &lt;em&gt;in a batch built for the LLQ path&lt;/em&gt; (the "low-latency queue" mode, where you push packet headers directly into device memory), and the Graviton2 instance I was installing onto doesn't use LLQ. It uses the older host-memory path, where the device reads descriptors back out of host RAM, one at a time, and expects them submitted one at a time, each with its own doorbell, an expectation the batched path violated. The device wasn't crashing on a subtle race. It was crashing because I was speaking the wrong dialect of its own protocol and it had no polite way to say so.&lt;/p&gt;
&lt;p&gt;The fix was to submit host-path descriptors per-descriptor, the way the device's own reference code does, the way the documentation says in a sentence I had read and not absorbed. One commit. The &lt;code&gt;FATAL&lt;/code&gt; vanished. The installed system transmitted, and kept transmitting, and I SSH'd into a native OpenBSD/arm64 box on AWS for the first time.&lt;/p&gt;
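&lt;p&gt;A toy model of that mismatch, as a minimal sketch rather than driver code: the &lt;code&gt;HostPathDevice&lt;/code&gt; class, its one-descriptor-per-doorbell rule, and every function name below are illustrative assumptions built from the description above, not the ENA register interface.&lt;/p&gt;

```python
class HostPathDevice:
    """Toy device (hypothetical): accepts exactly one new descriptor per doorbell."""

    def __init__(self):
        self.tail = 0        # last tail index the device has seen
        self.fatal = False   # set once the device has given up
        self.sent = []       # payloads the device has transmitted

    def doorbell(self, ring, new_tail):
        if self.fatal:
            return
        if new_tail - self.tail != 1:
            # Several descriptors arrived behind a single doorbell write.
            self.fatal = True
            return
        self.sent.append(ring[self.tail % len(ring)])
        self.tail = new_tail


def submit_batched(dev, ring, packets):
    """Batched habit: queue everything, then ring the doorbell once."""
    tail = dev.tail
    for pkt in packets:
        ring[tail % len(ring)] = pkt
        tail += 1
    dev.doorbell(ring, tail)


def submit_per_descriptor(dev, ring, packets):
    """Host-path fix: one descriptor, one doorbell, repeat."""
    for pkt in packets:
        ring[dev.tail % len(ring)] = pkt
        dev.doorbell(ring, dev.tail + 1)


packets = ["syn", "ack", "data"]

bad = HostPathDevice()
submit_batched(bad, [None] * 8, packets)

good = HostPathDevice()
submit_per_descriptor(good, [None] * 8, packets)

print("batched:", bad.fatal, bad.sent)
print("per-descriptor:", good.fatal, good.sent)
```

&lt;p&gt;In this model the batched submitter trips the fatal flag on its first doorbell and transmits nothing, while the per-descriptor submitter delivers all three packets. The real device's rules are richer than one integer comparison; the sketch only shows why correct descriptors can still be rejected when they are announced the wrong way.&lt;/p&gt;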
&lt;p&gt;That is the entire pattern of this project in one paragraph: an exotic theory that flatters your understanding, sitting on top of a mundane mistake that flatters nothing. I had a second helping of it on the disk side too — the NVMe driver OpenBSD uses for the EBS volume itself wouldn't create its I/O queues until I clamped a maximum-queue-size field the AWS device reports in a way the stock driver didn't expect. Another one-line fix for a thing that looked, for an afternoon, like it might be deep.&lt;/p&gt;
&lt;p&gt;So: disk install, done. SSH-in over &lt;code&gt;ena0&lt;/code&gt;, done. The QEMU shim from the first post is, as of this writing, retired — I can build native OpenBSD/arm64 binaries on a real OpenBSD/arm64 instance now, the way the platform's own users would, on a cloud the platform supposedly cannot run on.&lt;/p&gt;
&lt;p&gt;The scaffolding is gone too. The polling threads, the packet injector, the diagnostic &lt;code&gt;printf&lt;/code&gt;s, the keep-alive timer for the imaginary problem — all stripped out, the real fixes separated from the detritus, the tree readable. It is not the same artifact I described last time as "full of scaffolding, none of it belonging in code anyone else should read." It's a driver now, not a demo with a driver inside it.&lt;/p&gt;
&lt;h3&gt;Faster than it had any right to be&lt;/h3&gt;
&lt;p&gt;The last post's most embarrassing admission was the testing: &lt;em&gt;one DHCP round-trip is not throughput.&lt;/em&gt; So I went and got throughput, and then I went and got more of it than I expected.&lt;/p&gt;
&lt;p&gt;The first honest number was 684 megabits per second — a single queue, a single CPU, real TCP over the real device, measured rather than imagined. That was already a strange feeling, watching a number that meant the data path was not just &lt;em&gt;functional&lt;/em&gt; but &lt;em&gt;fast enough to be useful&lt;/em&gt;, on a driver I'd been afraid to claim could pass a single packet a week earlier.&lt;/p&gt;
&lt;p&gt;Then I built the parts a real NIC driver actually needs, the ones I'd listed as missing. Checksum offload, so the device computes and verifies IPv4/TCP/UDP checksums instead of the CPU — gated on the device advertising the feature, because not every ENA generation does. MTU control and jumbo frames. A watchdog that notices when the device has wedged and tears the data path down and brings it back up without a reboot. And then the big one: multiple queues.&lt;/p&gt;
&lt;p&gt;A modern NIC isn't one ring of packets; it's many, one per CPU, so that traffic for different flows lands on different cores and the whole machine scales instead of bottlenecking on a single interrupt. Wiring that up on OpenBSD/arm64 meant allocating per-queue interrupts through the ARM interrupt controller, pinning each queue to its own CPU, and — the part that took longest to get right — receive-side scaling, RSS, where the device itself hashes each incoming packet by its flow and steers it to the correct queue so that packet order within a flow is preserved while load spreads across cores.&lt;/p&gt;
&lt;p&gt;RSS is configured by handing the device a hash key and an indirection table: a little array that maps hash buckets to queues. I filled mine with what I was sure were the right queue identifiers, tested it, and got &lt;em&gt;intermittent&lt;/em&gt; connectivity — about nine successful connections out of fifteen, which is the worst possible result, because it means you're close enough to be wrong in a way that looks almost right. The indirection table, it turned out, doesn't want the queue's completion-ring index, which is what I'd put there. It wants the queue's &lt;em&gt;submission&lt;/em&gt;-ring index, a different number that happens to coincide on some devices and not others. Half my hash buckets were steering to a queue that wasn't listening. One field, the wrong index, surfacing only as a statistical haze of dropped connections. I changed it; twenty connections out of twenty succeeded, and the traffic spread cleanly across both cores.&lt;/p&gt;
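&lt;p&gt;The indirection-table mistake is easy to reproduce in miniature. The sketch below is a hedged illustration, not the driver's code: the table size, the queue records with their &lt;code&gt;sq_idx&lt;/code&gt; and &lt;code&gt;cq_idx&lt;/code&gt; fields, and the CRC32 stand-in for the device's keyed flow hash are all assumptions chosen for the example.&lt;/p&gt;

```python
import zlib

TABLE_SIZE = 128  # assumed bucket count, chosen for the example

# Hypothetical queue records: the two index spaces differ for the second queue.
queues = [
    {"cpu": 0, "sq_idx": 0, "cq_idx": 0},
    {"cpu": 1, "sq_idx": 1, "cq_idx": 2},
]
# Indices the toy device can actually steer packets to.
listening = {q["sq_idx"] for q in queues}


def build_table(field):
    """Spread hash buckets round-robin across queues, using the given index field."""
    return [queues[b % len(queues)][field] for b in range(TABLE_SIZE)]


def flow_hash(src, sport, dst, dport):
    """Stand-in for the device's keyed flow hash (an assumption, not the real one)."""
    key = f"{src}:{sport}-{dst}:{dport}".encode()
    return zlib.crc32(key)


def steer(table, flow):
    """Pick the queue index for a flow by hashing it into the table."""
    return table[flow_hash(*flow) % len(table)]


flows = [("10.0.0.5", 40000 + i, "10.0.0.9", 22) for i in range(20)]

wrong = build_table("cq_idx")   # the bug: completion-ring indices
right = build_table("sq_idx")   # the fix: submission-ring indices

dead_buckets = sum(1 for entry in wrong if entry not in listening)
lost = sum(1 for f in flows if steer(wrong, f) not in listening)
ok = sum(1 for f in flows if steer(right, f) in listening)

print("dead buckets with cq_idx:", dead_buckets, "of", TABLE_SIZE)
print("flows lost with cq_idx:", lost, "of", len(flows))
print("flows delivered with sq_idx:", ok, "of", len(flows))
```

&lt;p&gt;With the completion-ring indices, half of the buckets point at an index no queue answers to, so whether a given connection works depends only on where its hash happens to land. That is the same statistical haze described above. With the submission-ring indices every bucket resolves to a live queue.&lt;/p&gt;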
&lt;p&gt;On a two-vCPU &lt;code&gt;t4g.medium&lt;/code&gt; — a deliberately small instance, because the point was to prove the mechanism, not to win a benchmark — the driver now moves &lt;strong&gt;3.16 gigabits per second&lt;/strong&gt;, balanced across both CPUs, with RSS steering flows to cores and each core fielding its own interrupts. That's not a line-rate claim on a big instance; it's a small machine using both of the hands it has. But it's a multi-queue, RSS-steered, checksum-offloaded network driver doing the actual job, and five weeks ago the honest claim was "one DHCP packet, from a kernel thread."&lt;/p&gt;
&lt;p&gt;There was one more thing I wanted, less a feature than a verdict. I'd written and tested everything against OpenBSD 7.9, the release. But OpenBSD's real life happens on &lt;em&gt;-current&lt;/em&gt;, the rolling development branch, and a driver that only works against one frozen release is a museum piece. So I built the whole thing against -current — and it compiled and ran with &lt;strong&gt;zero source changes&lt;/strong&gt;, full feature parity, multi-queue and RSS and all. The interfaces I was building against had held. That mattered more to me than the throughput number, because it meant the driver was written against OpenBSD as it actually is, not against a single snapshot I'd reverse-engineered my way into.&lt;/p&gt;
&lt;p&gt;By every measure I'd set for myself in that closing list, the driver was done. Which is precisely when I learned where it could not go.&lt;/p&gt;
&lt;h3&gt;The wall that isn't technical&lt;/h3&gt;
&lt;p&gt;Here is the thing I believed, quietly, the entire time I was writing this driver: that if I made it good enough — really good, idiomatic OpenBSD, clean &lt;code&gt;bus_dma&lt;/code&gt; and honest locking and no vendor-HAL slop, the kind of code that earns its place — there was a path, however narrow, to it going &lt;em&gt;upstream&lt;/em&gt;. To &lt;code&gt;ena(4)&lt;/code&gt; becoming part of OpenBSD, so that the next person who tries to boot OpenBSD on Graviton doesn't have to write what I wrote. That was never the &lt;em&gt;reason&lt;/em&gt; I did it — I did it because the gap was infuriating and the problem was beautiful — but it was the daydream underneath, the one that makes you clean up the scaffolding instead of leaving it.&lt;/p&gt;
&lt;p&gt;The daydream is dead, and it died for a reason I didn't see coming and can't really argue with.&lt;/p&gt;
&lt;p&gt;OpenBSD doesn't accept AI-generated code — not out of taste, but because code a model wrote has no human author, and with no author there's nothing to hold the copyright and nothing to license under the BSD/ISC terms the tree is built from. A provenance gate, not a quality one. Being good was never the question.&lt;/p&gt;
&lt;p&gt;And this driver is AI-assisted in a way I want to be precise about, because it isn't how my &lt;em&gt;other&lt;/em&gt; AI-assisted projects work. When I built &lt;a href="https://tinycomputers.io/posts/a-stack-based-bytecode-vm-for-lattice.html"&gt;Lattice&lt;/a&gt;, my programming language, or the ballistics engine that kicked off this whole Graviton saga, I was in the loop the entire time. The ideas were mine; I held the design in my head and used the model the way you use a sharp colleague — to think &lt;em&gt;through&lt;/em&gt;, to draft against, to argue with. Authorship was never in question, because I was the one making the decisions.&lt;/p&gt;
&lt;p&gt;The ena driver was not that. It was hands-off in a way I'd never tried: I pointed an agent at the problem, told it to run in a loop and build an ENA driver for OpenBSD, and let it go. &lt;em&gt;It&lt;/em&gt; decided what the driver needed. &lt;em&gt;It&lt;/em&gt; decided how to structure the attach path, the queues, the locking. &lt;em&gt;It&lt;/em&gt; decided when to read Linux's driver for intent, when FreeBSD's, when NetBSD's, when Amazon's &lt;code&gt;ena-com&lt;/code&gt;. I set the direction and the hard constraint — port from BSD-licensed sources, never copy the GPL Linux code — then read what came back and steered when it drifted. But I did not hold this driver in my head the way I held Lattice. For long stretches I was the reviewer of something being authored where I couldn't watch, by something making the decisions a driver's author makes.&lt;/p&gt;
&lt;p&gt;And I should be blunter than I was last time. In the Doorbell post I wrote that &lt;em&gt;I&lt;/em&gt; built the keep-alive timer and &lt;em&gt;I&lt;/em&gt; implemented the host-attributes handshake — &lt;em&gt;clean, correct code&lt;/em&gt;, I called it. The honest version is that an agent wrote both while I read along. I let the "I" stand because the project and the loop were mine, and they still are — but the decisions a driver's author makes were the agent's, and I'd rather say that plainly here than let the earlier "I" keep implying otherwise.&lt;/p&gt;
&lt;p&gt;Which is why, &lt;em&gt;here&lt;/em&gt;, the policy stops being abstract. With Lattice I can sign my name to every architectural choice and mean it. With this, the honest answer to "who decided that?" is, often, &lt;em&gt;the agent did.&lt;/em&gt; Machine-authored, human-directed — and &lt;em&gt;directed&lt;/em&gt; is not the same word as &lt;em&gt;wrote.&lt;/em&gt; I don't think OpenBSD is wrong to refuse it. I think it might be the cleanest example I have of exactly the thing they're refusing.&lt;/p&gt;
&lt;p&gt;So &lt;code&gt;ena(4)&lt;/code&gt; stays mine. An independent driver, openly AI-assisted, for people who want native OpenBSD on Graviton badly enough to point their kernel config at a tree that isn't the official one. The work-in-progress repo, now considerably less work-in-progress, is at &lt;a href="https://github.com/ajokela/openbsd-ena"&gt;github.com/ajokela/openbsd-ena&lt;/a&gt; — open for reading, open for building, open for forking, and closed, by its own nature, to the one destination I'd quietly been building it toward.&lt;/p&gt;
&lt;p&gt;There's a strange grief in that, one I didn't anticipate. Not for the work — the work is done and it runs. For the &lt;em&gt;commons&lt;/em&gt;. The natural arc of a thing like this is that you give it away into the shared pool so the next person inherits it, and the reward for doing it well is that it stops being yours and becomes everyone's. This one can't take that arc. It's good enough to belong to everyone and it will belong to no one, because the question "who wrote this?" no longer has an answer the commons can accept. The driver works. The driver has no author. Both of those are true, and the second one is the price of the first.&lt;/p&gt;
&lt;h3&gt;The gap I couldn't close&lt;/h3&gt;
&lt;p&gt;If I'm going to be honest about the wall, I have to be honest about the one place the &lt;em&gt;engineering&lt;/em&gt; didn't close either.&lt;/p&gt;
&lt;p&gt;Everything above is true on Graviton2: the &lt;code&gt;t4g&lt;/code&gt; family, AWS's second Graviton generation, the host-memory data path. On Graviton3 (the &lt;code&gt;c7g&lt;/code&gt; family, which uses the LLQ path and a newer revision of the virtual ENA device) the driver attaches, configures, brings up its admin queue, reads every device attribute, and then fails at the very first step of creating an I/O queue. The device rejects the &lt;code&gt;CREATE_CQ&lt;/code&gt; command — the request to make a completion ring — with an unhelpful status code, before a single packet has had the chance to flow.&lt;/p&gt;
&lt;p&gt;I spent a genuinely unreasonable amount of effort on this. I did the thing systematic debugging tells you to do when a system has multiple components and you can't see inside one of them: I went and instrumented the component that works. I booted Amazon's own Linux on a &lt;code&gt;c7g&lt;/code&gt; instance and traced its stock, vendor-blessed ENA driver with bpftrace — every admin command it sends, in order, with its exact arguments, from attach to first packet — so I'd have a known-good transcript to diff mine against.&lt;/p&gt;
&lt;p&gt;The transcript refuted, one by one, every theory I had. The bytes of my &lt;code&gt;CREATE_CQ&lt;/code&gt; command are identical to Linux's. The interrupt vector layout is identical — same number of MSI-X vectors, same assignment. The completion-descriptor size is identical. The order I create things in doesn't matter — I tried Linux's order exactly and the &lt;em&gt;first&lt;/em&gt; queue still fails, whichever kind it is, which means it's not an ordering bug but a missing prerequisite the device wants before it will make any queue at all. I thought I'd found it in the host-info block, the little structure where the driver tells the device about itself — Linux fills in capability flags I was leaving zero — so I filled them in. No change. I thought it was the RSS hash configuration Linux sets up before its queues; I read the device's own feature bitmap and found that this device, like the Graviton2 one, doesn't even &lt;em&gt;allow&lt;/em&gt; the host to set the hash function. Refuted by the device itself.&lt;/p&gt;
&lt;p&gt;Nine times I baked a kernel and booted it on real &lt;code&gt;c7g&lt;/code&gt; hardware to test a hypothesis, and nine times the device said no in the same flat way, and I never found the prerequisite it's waiting for. I have it narrowed to a short list of things Linux does before its first queue that I don't — a couple of feature &lt;em&gt;reads&lt;/em&gt; I skip, a second host-info push after negotiation — but "narrowed to a short list, each refuted or untested" is not "solved," and I'm not going to dress it up as solved. The driver runs beautifully on the previous ARM generation and stops at the threshold of the current one, and I wrote down exactly where the threshold is so that whoever picks this up next — me, on a better day, or someone who forks the tree — starts from a map instead of a mystery.&lt;/p&gt;
&lt;p&gt;That's the honest shape of it. A driver that's finished on the hardware it's finished on, and has a precisely-documented hole on the hardware it isn't.&lt;/p&gt;
&lt;h3&gt;The only OpenBSD in the cloud&lt;/h3&gt;
&lt;p&gt;I'll end where the practical and the strange meet.&lt;/p&gt;
&lt;p&gt;At some point, late, I got curious about what I'd actually built relative to what already existed — whether I'd spent two months reinventing something I could have launched in one click. So I searched. Every public Amazon Machine Image, across multiple regions, with "OpenBSD" anywhere in the name or description.&lt;/p&gt;
&lt;p&gt;There are none. Not one. Zero community images, zero in the AWS Marketplace, in every region I checked. The entire BSD presence in Amazon's marketplace is FreeBSD — dozens of official, well-maintained, &lt;em&gt;free&lt;/em&gt; images published by the FreeBSD Foundation, every release and architecture, ARM and x86, the way a first-class cloud citizen looks. OpenBSD isn't a paid option anyone's gouging for. It isn't an option at all. The only way to run OpenBSD on EC2 today is to build the image yourself, which is to say: to first solve the exact problem these three posts have been about.&lt;/p&gt;
&lt;p&gt;So I finished the job. I took the driver, baked it into a real OpenBSD/arm64 image, and made the image &lt;em&gt;self-configuring&lt;/em&gt; the way every cloud image is expected to be: it pulls your SSH key, its hostname, and an optional first-boot setup script from the instance metadata service, creates an unprivileged login user, and regenerates its own host keys so every instance has a unique identity. OpenBSD doesn't ship the Linux cloud-init and never will; the community answer is a slim shell agent written years ago by an OpenBSD developer, and I wired it in. Launch the image with your own key pair, and you SSH in as a normal user thirty seconds later. It is, as far as I can tell, the only launch-and-go OpenBSD/arm64 image on AWS, because it's the only OpenBSD/arm64 image on AWS.&lt;/p&gt;
&lt;p&gt;Which leaves me holding a small, sharp irony. The most finished thing I've built in a while — a driver, a tuned data path, a clean image, a working cloud-init story, the genuinely-only-one-of-its-kind artifact — is the one that can't go where work like this is supposed to go. I could publish it as a public image tomorrow and be the sole OpenBSD on the entire platform. And I'd be publishing it the way I have to publish all of it: off to the side, in my own tree, under my own name, with a note that says &lt;em&gt;an AI helped write this and so it can never be yours, only borrowed.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The first post in this series put a translator between two systems that disagreed, so each could be right without meeting. The second taught one system to speak the other's language directly, register by register, and found the bug that mattered was the most boring one in the file. This one finishes that language — makes it fluent, fast, multi-queue, native, bootable, launchable — and discovers that fluency was never the thing standing between the work and its home. The doorbell rings. The device lives. The network is seen, on a cloud that couldn't see it a season ago.&lt;/p&gt;
&lt;p&gt;And the driver that makes it so has no author, and no home, and runs perfectly anyway.&lt;/p&gt;</description><category>aarch64</category><category>ai</category><category>arm64</category><category>aws</category><category>bsd</category><category>bus_dma</category><category>cloud-init</category><category>copyright</category><category>device drivers</category><category>ec2</category><category>ena</category><category>graviton</category><category>infrastructure</category><category>kernel</category><category>licensing</category><category>msi-x</category><category>multiprocessing</category><category>multiqueue</category><category>networking</category><category>openbsd</category><category>rss</category><guid>https://tinycomputers.io/posts/the-driver-nobody-wrote-openbsd-ena4-works-and-cant-go-upstream.html</guid><pubDate>Fri, 26 Jun 2026 23:30:00 GMT</pubDate></item></channel></rss>