The Doorbell That Killed the Device: Writing OpenBSD's Missing ena(4) Driver

A.C. Jokela — Mon, 22 Jun 2026 01:30:00 GMT

Earlier last week, I wrote about a problem with a clean, almost funny shape: OpenBSD's arm64 kernel has no driver for the network card AWS puts in every EC2 instance, so an OpenBSD/arm64 instance boots into a billable void it can never reach. The fix in that post was to stop asking — to run OpenBSD as a KVM guest inside QEMU on a bare-metal Graviton host, hand it virtio devices it already understands, and let a Linux host own the real Amazon adapter. Put a translator on the seam, and two systems that disagree about what a network card is can both be right.

That solved the build-server problem. It did not solve the itch.

Because the actual missing piece is small and specific and writable: the driver. Amazon's adapter is called the Elastic Network Adapter — ena — and it's a documented device with a permissively licensed reference implementation. FreeBSD has an ena driver. NetBSD has one. The protocol is published. OpenBSD just doesn't have the code, because effectively nobody runs OpenBSD as a first-class EC2 guest and so nobody wrote it. "Nobody wrote it" is not the same as "can't be written." So I decided to write it.

This is the story of that and, more than the previous post, it's a story about a specific kind of failure. The driver attached on the first serious try. Then the device started killing itself, silently, within a couple hundred microseconds of one write in every bring-up, and I spent the better part of two days proposing increasingly clever reasons why. The real reason turned out to be embarrassingly dull. And because I want this to be useful and not just triumphant, the last third of the post is about the word "working": what I can actually claim, and the uncomfortable distance between that and "done."

What an ENA driver has to do

A modern NIC isn't a thing you poke registers at to send a packet. It's a small message-passing computer that shares host memory with you: rings of descriptors in DMA memory, doorbell registers to announce new work, an interrupt path for completions. ENA has three kinds of queue, and all three matter to this story.

The admin queue is how the host configures the device — a submission ring and a completion ring in shared memory. Write a command (read attributes, set a feature, create an IO queue), ring the admin doorbell, wait for a completion. OpenBSD's driver polls for that completion rather than taking an interrupt, which keeps bring-up simple.
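The polling half of that loop can be sketched in a few lines of C. This is a minimal illustration, not the driver's code: the struct layout, the ring depth, and every function name here are hypothetical, and the "device" is a stand-in function writing into ordinary memory. It assumes the common completion-ring convention in which each entry carries a phase bit that flips whenever the ring wraps, so the host can tell a fresh entry from a stale one without a separate tail register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ADMIN_CQ_DEPTH 4	/* hypothetical depth, kept tiny for illustration */

/* Hypothetical completion entry; the real ENA layout has more fields. */
struct admin_cqe {
	uint16_t cmd_id;	/* echoes the ID of the submitted command */
	uint8_t  status;	/* 0 means success in this sketch */
	uint8_t  phase;		/* written by the device, flips on each ring wrap */
};

struct admin_cq {
	struct admin_cqe ring[ADMIN_CQ_DEPTH];
	uint16_t head;		/* next slot the host will inspect */
	uint8_t  phase;		/* phase value that marks a fresh entry */
};

void
admin_cq_init(struct admin_cq *cq)
{
	memset(cq, 0, sizeof(*cq));	/* zeroed slots carry phase 0: stale */
	cq->phase = 1;
}

/*
 * Poll once. Returns true and copies the entry out if the slot at head
 * holds a fresh completion, false if the device has not written it yet.
 * A real driver would sync the DMA map before reading the slot.
 */
bool
admin_cq_poll(struct admin_cq *cq, struct admin_cqe *out)
{
	const struct admin_cqe *e = &cq->ring[cq->head];

	assert(out != NULL);
	if (e->phase != cq->phase)
		return false;
	*out = *e;
	if (++cq->head == ADMIN_CQ_DEPTH) {
		cq->head = 0;
		cq->phase ^= 1;	/* after a wrap, fresh entries use the other phase */
	}
	return true;
}

/* Stand-in for the device writing a completion into shared memory. */
void
fake_device_complete(struct admin_cq *cq, uint16_t slot, uint16_t cmd_id,
    uint8_t phase)
{
	assert(slot < ADMIN_CQ_DEPTH);
	cq->ring[slot].cmd_id = cmd_id;
	cq->ring[slot].status = 0;
	cq->ring[slot].phase = phase;
}
```

A caller would submit a command, ring the admin doorbell, then call admin_cq_poll in a bounded loop with a short delay between attempts, treating loop exhaustion as a timeout.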

The AENQ — Asynchronous Event Notification Queue — is the device's back-channel: it announces link up/down and, critically, posts a keep-alive event about once a second. The keep-alive is a heartbeat; the host is expected to drain these and, by draining them, prove it's still paying attention.

The IO queues are the actual network — a submission ring you fill with buffers, a completion ring the device writes back. You create them with admin commands (CREATE_CQ, then CREATE_SQ), and once they exist and the link is up, you can move traffic.

Get all three right, in the right order, and the card works. Get the order subtly wrong and — as I'd learn — the card decides you're not a real driver and quietly bricks itself.

The reference is Amazon's ena-com, a hardware-abstraction layer shared across the Linux and FreeBSD drivers. Its BSD-licensed parts — ena-com itself and FreeBSD's driver on top — are fair to read and port; the Linux driver is GPL, kept strictly read-only, a thing to consult for intent and never to copy. Writing the OpenBSD version means rewriting all of it in OpenBSD's idiom anyway — bus_dma(9) for the rings, pci(9) for attachment, ifnet/ifq for the stack. The protocol is the spec; the code is yours.

Phase zero: it attaches

The first milestone was just attachment, and it went well enough that I'd half-convinced myself the hard work was behind me. The driver resets the device, sets up the admin submission and completion rings, and issues commands. GET_FEATURE(DEVICE_ATTRIBUTES) comes back with the real MAC and maximum MTU — proof the admin queue works end to end, DMA is coherent, the device is listening. The console prints the line I'd been chasing: ena0 ... ENA ver 0.10 ... address 12:xx:xx:xx:xx:xx, on a real Graviton instance. The card was talking.

And then, every single time, a few seconds later, it stopped talking. The first attempt to create an IO queue — CREATE_CQ, the command that begins turning a configured device into a working network interface — would sit there and time out. No completion. No error. The admin queue that had just answered four commands flawlessly had gone silent.

When I finally added code to read the device's status register at each step, the shape of it came into focus and got worse. DEV_STS reads 0x1 — ready — through reset, through the admin handshake, through reading device attributes. Then, somewhere shortly after, it reads 0x21. Bit five is set. FATAL_ERROR. The device had, of its own accord, entered a fault state and was now refusing all further work. That's why CREATE_CQ vanished: you can't drive a device that's already decided it's dead.
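For concreteness, here is how those two readings decode. The bit positions come straight from the values above (0x1 is bit zero, and 0x21 adds bit five); the macro and function names are my own illustration rather than anything lifted from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEV_STS_READY		(1u << 0)	/* 0x01: the healthy reading */
#define DEV_STS_FATAL_ERROR	(1u << 5)	/* 0x20: turns 0x1 into 0x21 */

/* True once the device has latched its fatal-error bit. */
bool
dev_sts_is_fatal(uint32_t sts)
{
	return (sts & DEV_STS_FATAL_ERROR) != 0;
}

/* Render a status word into buf for a diagnostic printf; returns buf. */
const char *
dev_sts_describe(uint32_t sts, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "0x%x:%s%s", (unsigned)sts,
	    (sts & DEV_STS_READY) ? " READY" : "",
	    (sts & DEV_STS_FATAL_ERROR) ? " FATAL_ERROR" : "");
	return buf;
}
```

Note that READY stays set in 0x21: the device still claims to be ready while also declaring itself dead, which is why checking only bit zero would have hidden the fault.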

A healthy card, healthy through every step I could see, that turns to stone the moment I look away. That's the wall.

Five wrong theories

Here is the part I'm telling on myself, because it's the honest center of the whole thing.

When you don't know why a device faults, the device gives you almost nothing — a single bit that says "something is wrong" and not a syllable about what. So you reason from the reference code: what does the working driver do that mine doesn't? And the trouble with that question, on hardware this unfamiliar, is that it has too many plausible answers. Every difference between my driver and ena-com looks like it could be the one that matters.

Theory one: the keep-alive watchdog. The device sends heartbeats; the host must drain them; if the host stops, surely the device fences off the unresponsive driver and faults. My driver had no periodic task draining the AENQ — it relied on a single interrupt that, I could see, fired exactly once and went quiet. This was a beautiful theory. It explained the timing. It matched a real mechanism FreeBSD implements. I built a proper one-second timer to drain the queue, mirroring FreeBSD's ena_timer_service, complete with a mutex to keep the timer and the interrupt from racing on a multi-core guest. It was clean, correct code. It fixed nothing. The device faulted on exactly the same schedule, and the diagnostic I'd added showed the AENQ had processed zero events — there was nothing to drain. I had carefully solved a problem that wasn't happening.

Theory two: the MMIO response region. ENA has a "readless" register mode backed by a small DMA region; maybe the device faulted without it. I added it. It didn't help — and worse, I'd added it before I started reading the status register, so for an embarrassingly long time I was carrying an unvalidated change that could have been the cause itself. (It wasn't.)

Theory three: host attributes. Both reference drivers register a 4 KB "host info" page right after admin init — SET_FEATURE(HOST_ATTR_CONFIG), a "yes, a real driver lives here" handshake. My driver skipped it. This had to be what the device validated before deciding I was legitimate. I implemented it properly. The status register read 0x1 right after it succeeded — and faulted anyway, at the same point as always.

Theory four: a stale completion. Maybe the admin queue was reading the wrong completion slot, so the feature data showing the device supporting zero AENQ event groups — a suspicious value — was garbage from an uninitialized ring. I instrumented the completion path down to the command IDs and phase bits. It was reading the right slot. The suspicious value was real.

Theory five: interrupt ordering. I'd unmasked the device's interrupt after a particular doorbell write; the reference does it before. I swapped the order. The device faulted one line later than before, which felt like progress and was not.

Somewhere in the middle I did what I increasingly do with a problem that has too many branches: I handed it to a fleet of AI agents, one per hypothesis, reading the reference trees in parallel, and had them synthesize a ranked root cause. The synthesis came back confident, specific, and wrong. What saved it was the same harness's adversarial step — three more agents told to refute the conclusion, and all three did, pointing out the timing didn't fit and the real anomaly was being hand-waved. The machine talked itself out of its own clever answer. That's the part worth remembering: not "the AI solved it," but "the AI proposed something plausible and the only thing that caught it was forcing a second pass that tried to tear it down."

What none of the five theories was, was boring enough.

The doorbell

I gave up on theories and did the dumb, mechanical thing I should have done first. I made the driver poll the status register in a tight loop after every single register write in the bring-up, printing the exact moment the fault bit flipped. Not "is it healthy at the end" — which write kills it.

The answer came back in one line, and it was not ambiguous. Healthy after writing the queue's base address. Healthy after writing its size. Healthy after the feature commands. Then:

STS->FATAL 0x21 at [post-AENQ_HEAD_DB] iter=3

The fault appears about 150 microseconds after one specific write: the AENQ head doorbell — the register that tells the device "I've made the event ring's slots available; the queue is live." Not the interrupt unmask, which I'd swapped order on. Not any feature command. The doorbell that activates the event queue. Ring it, and the device dies.
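The instrumentation itself is tiny. What follows is a sketch of the idea against a fake device, not the code in my tree: the register offsets are placeholders rather than values from the ENA register map, the fake is hard-wired to fault a few status reads after the doorbell write, and a real driver would go through bus_space reads and writes instead of these stand-in functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEV_STS_READY		(1u << 0)
#define DEV_STS_FATAL_ERROR	(1u << 5)
#define FAKE_AENQ_BASE_LO	0x100u	/* placeholder offsets, NOT the real map */
#define FAKE_AENQ_HEAD_DB	0x104u

/* Fake device: faults three status reads after the doorbell is written. */
struct fake_dev {
	uint32_t sts;
	int armed;
	int reads_left;
};

void
fake_write(struct fake_dev *d, uint32_t off, uint32_t val)
{
	(void)val;
	if (off == FAKE_AENQ_HEAD_DB) {
		d->armed = 1;
		d->reads_left = 3;
	}
}

uint32_t
fake_read_sts(struct fake_dev *d)
{
	if (d->armed) {
		if (d->reads_left == 0)
			d->sts |= DEV_STS_FATAL_ERROR;
		else
			d->reads_left--;
	}
	return d->sts;
}

/*
 * Do one register write, then poll the status register up to max_iter
 * times. Returns the iteration at which the fatal bit first appeared,
 * or -1 if the device stayed healthy for the whole window.
 */
int
write_and_watch(struct fake_dev *d, const char *tag, uint32_t off,
    uint32_t val, int max_iter)
{
	assert(max_iter > 0);
	fake_write(d, off, val);
	for (int i = 0; i < max_iter; i++) {
		uint32_t sts = fake_read_sts(d);
		if (sts & DEV_STS_FATAL_ERROR) {
			printf("STS->FATAL 0x%x at [post-%s] iter=%d\n",
			    (unsigned)sts, tag, i);
			return i;
		}
	}
	return -1;
}
```

Wrapping every bring-up write in a call like this is what turns "it faults somewhere" into a single named register. The polling window has to be long enough to cover the delay between the write and the fault, which is why the loop count matters more than it looks.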

The doorbell value was correct — I'd checked it against the reference a dozen times. The ring's address was correct. Its size was correct. So why would activating a correctly-configured queue kill a healthy device?

Because the queue wasn't really configured. It only looked like it was.

The most boring possible cause

I lined my driver's bring-up sequence up against ena-com's, write for write, and the difference was finally visible because I now knew exactly which write to care about.

ena-com registers the AENQ — writes its base address and size registers — inside ena_com_admin_init, in the same breath as the admin submission and completion queues, before the device is ever told initialization is finished. All three rings get registered together, as one atomic-feeling handshake, while the device is still in its "setting up" phase.

My driver registered the admin queues during admin init, exactly like the reference. But it registered the AENQ much later, in a separate function that ran after host attributes, after reading device attributes, after feature negotiation — long after the admin handshake had closed and the device considered itself up and running.

And here's the thing: writing those AENQ registers late worked, in the sense that the device accepted the writes and stayed healthy. The register values landed. The status bit stayed green. Everything looked fine. The device had quietly noted the address and size of a ring it had never actually wired into its event subsystem, because that wiring only happens during the init handshake I'd already finished. The AENQ was a ghost: registered on paper, uninitialized in the device's mind.

Then I rang the doorbell. "The event queue is live; start using it." The device went to use a subsystem that was never set up, and faulted. A hundred and fifty microseconds later, the bit flipped.

The fix is four lines moved earlier. I pulled the AENQ registration out of its late function and into admin init, immediately after the admin queues, exactly where ena-com does it. The later function kept only the parts that genuinely belong late — subscribing to event groups, and the final doorbell-and-unmask that says "go."

I rebuilt, booted, and watched the status register stay 0x1 straight through the doorbell. CREATE_CQ succeeded. CREATE_SQ succeeded. The link came up. The driver enqueued a hand-built DHCP DISCOVER, the device transmitted it, and a 590-byte IPv4 packet — the DHCP OFFER, a real reply from AWS's network — came back up the receive ring. Transmit and receive, on real silicon, for the first time.

There was one more gift in the logs, the kind that tells you a fix is right and not just lucky. Remember theory four — the device reporting zero supported AENQ event groups, the value I'd half-dismissed as a possible misread? With the AENQ now registered during init, that same query on the same hardware came back reporting all the groups supported. The zero had never been a VF limitation or a misread. It was the device telling me, in the one channel it had, that its event subsystem wasn't initialized — because I hadn't initialized it yet. One root cause had been wearing five costumes. The watchdog that had nothing to drain, the missing handshake, the suspicious zero, the doorbell fault — all of it was the single fact that I'd set up a queue in the wrong order, refracted through a device that can only ever tell you "something is wrong."

This is the same lesson the last post ended on, and I clearly didn't learn it hard enough the first time. Out here, far off the beaten path, every failure presents as exotic, because the strange explanation announces itself and the boring one doesn't. An out-of-disk error wears the costume of a dependency that won't compile. A queue initialized in the wrong order wears the costume of a keep-alive watchdog, a missing security handshake, and a device that lies about its capabilities. The further out you are, the more deliberately you have to rule out the dull thing first — and "I did the steps in the wrong order" is about as dull as it gets.

The loop that made it bearable

I have to mention the iteration speed, because for most of this saga it was the actual bottleneck, and fixing it is what turned a slog into something tractable.

The previous post's build server bakes a disk image, snapshots it, registers an AMI, boots a real EC2 instance, and reads the serial console — about eighteen minutes a turn. That's fine for building a binary. It is agony for debugging a driver, where you want to change one line and see what the hardware does. Eighteen minutes times the number of wrong theories above is a number I'd rather not compute.

So I built a faster loop, and it leans on the same bare-metal host the build server already needs. That a1.metal host has a real ARM SMMU — the IOMMU that makes device passthrough safe — and Linux's VFIO framework can hand a physical PCI device straight to a QEMU guest. So I attached a second network interface to the metal instance, bound it to vfio-pci on the Linux host (leaving the primary NIC alone, so I didn't saw off the SSH branch I was sitting on), and passed it through to the OpenBSD guest. Now the OpenBSD VM sees a real ENA device — actual Amazon silicon, vendor 1d0f, product ec20 — on its virtual PCI bus, and my driver attaches to that. No AMI bake. Build the kernel, reboot the guest, watch ena0 come up against real hardware. Two minutes a turn instead of eighteen.

There was one gotcha worth writing down, because the error is opaque: VFIO refused with "failed to set iommu for container: Operation not permitted". ARM's SMMU here can't remap interrupts, so the kernel blocks passthrough by default. The fix is the vfio_iommu_type1 module parameter allow_unsafe_interrupts=1, which is entirely fine for a trusted device on a machine I own by the hour.

That loop is also the honest reason the fix arrived when it did. The "poll after every write" instrumentation only became practical once I could run it and read it in two minutes. The clever theories flourished in the eighteen-minute dark; the boring method won the moment I could see in real time.

What "working" means

Here is where I have to be careful, because "I wrote a working ENA driver for OpenBSD" is a sentence that can mean five very different things, and only one of them is true today.

What is true: on a real EC2 Graviton2 instance, an OpenBSD/arm64 kernel with my driver attaches to the real ENA adapter, completes the full bring-up — admin queue, host attributes, device attributes, AENQ, IO queue creation — brings the link up, transmits a packet the device really puts on the wire, and receives the reply. The send and receive paths through the driver are the real ones: the transmit path is the same ifq enqueue the network stack uses, and the receive path is the same completion handler that would feed packets up to IP. A DHCP DISCOVER went out and an OFFER came back. The device protocol — the genuinely hard, genuinely undocumented-in-OpenBSD-idiom part — works, and it works on the production hardware, not just the passthrough rig.

First boot. OpenBSD/arm64 on a real Graviton2 EC2 instance, ena(4) attached, link up, DHCP round-trip completed against the real device.

What is not yet true, and I want to be exact about each one:

I drove that DHCP exchange from a kernel thread, not from userland. The test harness builds a DISCOVER packet in the kernel, hands it to the transmit ring, and watches the rings directly, because the minimal RAM-disk environment I'm booting doesn't have room for a real dhclient. The packets and the ring mechanics are real; the thing wrapping them is a debug scaffold, not ifconfig ena0 up; dhclient ena0; ping. I have not yet typed those three commands at a shell and watched them work. That's the next milestone, and until I've done it I won't claim the interface works "from userland," only that the driver's data paths do.
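For a sense of what "hand-built" means here, this is roughly the smallest DHCP DISCOVER payload you can assemble by hand. It is a sketch, not the harness in my tree: the function name is made up, it covers only the BOOTP/DHCP payload (the Ethernet, IP, and UDP headers that go in front are left out), and it carries just the one mandatory option. The field offsets follow the fixed BOOTP layout from RFC 2131.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOTP_FIXED_LEN	236	/* op through file, per RFC 2131 */
#define DHCP_MIN_LEN	(BOOTP_FIXED_LEN + 4 + 3 + 1)	/* cookie, type, end */

/*
 * Build a minimal DHCP DISCOVER payload into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
build_dhcp_discover(uint8_t *buf, size_t len, const uint8_t mac[6],
    uint32_t xid)
{
	uint8_t *opt;

	assert(buf != NULL && mac != NULL);
	if (len < DHCP_MIN_LEN)
		return 0;
	memset(buf, 0, DHCP_MIN_LEN);

	buf[0] = 1;			/* op: BOOTREQUEST */
	buf[1] = 1;			/* htype: Ethernet */
	buf[2] = 6;			/* hlen: MAC address length */
	buf[4] = (uint8_t)(xid >> 24);	/* xid, network byte order */
	buf[5] = (uint8_t)(xid >> 16);
	buf[6] = (uint8_t)(xid >> 8);
	buf[7] = (uint8_t)xid;
	buf[10] = 0x80;			/* flags: ask for a broadcast reply */
	memcpy(&buf[28], mac, 6);	/* chaddr starts at offset 28 */

	opt = &buf[BOOTP_FIXED_LEN];
	opt[0] = 0x63;			/* DHCP magic cookie */
	opt[1] = 0x82;
	opt[2] = 0x53;
	opt[3] = 0x63;
	opt[4] = 53;			/* option 53: DHCP message type */
	opt[5] = 1;			/* option length */
	opt[6] = 1;			/* DHCPDISCOVER */
	opt[7] = 255;			/* end of options */
	return DHCP_MIN_LEN;
}
```

A real client sends more than this (a parameter request list, a client identifier, usually some padding), which is part of why a debug scaffold like mine is no substitute for running an actual DHCP client against the interface.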

I am booting the OpenBSD install ramdisk, bsd.rd, which runs entirely from RAM. I have not done a full disk install and booted a persistent OpenBSD that comes up multiuser with ena0 as its only network interface and lets me SSH in over it. That — a normal OpenBSD instance you log into over the network it sees natively — is the milestone that would let me retire the QEMU-shim build server from the last post. I'm not there. I've proven the hard part is possible; I haven't assembled it into a system you'd actually run.

The driver is full of scaffolding. The status-register polling, the packet-injection thread, a dozen diagnostic printfs, the keep-alive timer I built for a problem that didn't exist — all still in the tree, behind a debug flag. None of it belongs in code anyone else should read. Before this is a contribution rather than a demo, that all comes out, the real fixes get separated from the detritus, and the whole thing gets the kind of review OpenBSD's tree rightly demands. It has had none of that. No OpenBSD developer has looked at a line of it. It is not submitted, not reviewed, and would not survive tech@ in its current state, nor should it. The work-in-progress tree, scaffolding and all, lives at github.com/ajokela/openbsd-ena — open for reading, not for trusting.

And the testing is thin. One DHCP round-trip is not throughput, not stability under load, not days of uptime, not the dozen edge cases — checksum offload, multi-queue, MTU changes, link flaps — a NIC driver has to handle before anyone trusts it. I've shown the path is real. I have not shown it's robust.

So: working in the sense that the central, doubted, genuinely difficult thing — does the device protocol function, correctly implemented from scratch, on real hardware — is now answered yes. Not working in the sense of something you'd deploy, or even the sense of something you'd ifconfig by hand yet. Both halves of that sentence are true and I don't want the exciting half to eat the honest one.

What it was about, again

The last post put a translator between two systems that disagreed, so each could be right without meeting. This one is the opposite move: no shim, just teaching one system to speak the other's language directly — a driver doing the actual work of turning OpenBSD's idea of a network interface into ENA's, register by register and ring by ring.

But the deeper rhyme isn't the architecture, it's the failure. Both times the headline problem had a one-sentence answer — "run it as a guest," "register the queue during init" — and both times that sentence was the easy part, with the real work in a gap where everything looked exotic and the truth was mundane. And both times the trap was the same: a plausible, faintly flattering, wrong explanation is far more available than the boring one underneath it — especially with a tireless machine happy to generate plausible explanations on demand. The machine is genuinely useful; it read three reference drivers in parallel and caught its own bad guess on the second pass. But it has no instinct for "you probably just did the steps out of order," because it has never spent an afternoon being humiliated by exactly that.

What I have now is a driver that makes OpenBSD see the network on a cloud that, a month ago, OpenBSD couldn't see at all. It is not done. It is, for the first time, possible — proven on the hardware, by the most boring fix in the file. The doorbell rings, and the device lives.
