<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io (Posts about quiver quantitative)</title><link>https://tinycomputers.io/</link><description></description><atom:link href="https://tinycomputers.io/categories/quiver-quantitative.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
&lt;!-- div style="width: 100%" --&gt;
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /&gt; Creative Commons Attribution-ShareAlike&lt;/a&gt;&amp;nbsp;|&amp;nbsp;
&lt;!-- /div --&gt;
</copyright><lastBuildDate>Sun, 26 Apr 2026 19:30:12 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>What Terra Populus Taught Me About Cancelling Quiver</title><link>https://tinycomputers.io/posts/what-terra-populus-taught-me-about-cancelling-quiver.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/what-terra-populus-taught-me-about-cancelling-quiver_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;35 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;I joined Terra Populus in 2012 as a senior engineer. It was a project at the Minnesota Population Center, now part of the Institute for Social Research and Data Innovation at the University of Minnesota, and it was funded by an NSF cooperative agreement totaling about $8 million across the life of the project. The output was free to the research community. The work that produced it was not.&lt;/p&gt;
&lt;p&gt;In 2013 the original leadership of the project left and I started inheriting the work. Nobody hands you the lead role on a federally funded data integration project the way you'd hand someone a clean repo and a Friday lunch. You inherit it the way you inherit a house with the heating system halfway through a retrofit — there's a budget, there's a deadline, there's a researcher in Massachusetts who needs a specific harmonized variable to ship a paper, and there's a stack of NSF reporting requirements that don't pause while you figure out what's going on. By 2014 the role was formalized and I led the engineering through to the project's wind-down in late 2016 and early 2017.&lt;/p&gt;
&lt;p&gt;I've been thinking about that experience for the last week because I'm cancelling a Quiver Quantitative subscription. The Quiver tier I'm on costs somewhere in the $30 to $70 per month range and it's the one with the alternative-data feeds I needed for a recent research project. The project is done. The subscription renews in a few weeks. I have a decision to make.&lt;/p&gt;
&lt;p&gt;What surprised me, sitting with the decision, is how much my comfort with making it draws on the years I spent building the academic version of what Quiver does. The two services look nothing alike on the surface. Underneath, they are doing exactly the same shape of work. And the question of who pays for that work — and how — is more interesting than any cost-per-month arithmetic.&lt;/p&gt;
&lt;h3&gt;What Terra Populus Actually Did&lt;/h3&gt;
&lt;p&gt;Terra Populus integrated three things that don't natively join: population microdata (people, households, decade-by-decade), environmental data (climate, land cover, atmospheric measurements), and land-use data (parcels, agricultural designations, administrative boundaries). The pitch to NSF was that researchers studying the relationship between human populations and the natural environment shouldn't have to spend two years writing crosswalks before they can ask their question.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/ipums-terra-landing.png" alt="Screenshot of the IPUMS Terra landing page. Header reads 'What is IPUMS Terra?' with the description 'IPUMS Terra integrates the world's population and environmental data including: Population censuses and surveys; Land cover data classified from satellite imagery; Temperature, precipitation, and related climate data; Land use data derived from censuses and surveys in combination with remotely sensed data.' A central diagram shows three interconnected nodes labeled MICRODATA, RASTER DATA, and AREA-LEVEL DATA, linked by arrows. To the right are three output cards: Microdata Output (characteristics of individual people with attached contextual variables derived from area-level and/or raster data), Area-level Output (characteristics of geographic units including aggregate population data and/or summaries from raster data), and Raster Data Output (data in spatial grids potentially derived from area-level data). A left sidebar lists Available Datasets (Microdata, Area-level, Raster) and Tutorials." style="max-width: 100%; border: 1px solid #ddd; border-radius: 8px;"&gt;&lt;/p&gt;
&lt;p&gt;The user-facing version of the project shipped under the IPUMS Terra brand and ran inside the existing IPUMS data-extraction infrastructure. The screenshot above is what a researcher saw when they came to ask the question we were trying to make askable: pick a population dataset, pick an environmental or land-use dataset, choose the join structure that fits your analysis (microdata, area-level, or raster), and pull a custom extract. The three little nodes in the middle of that diagram look simple. Most of the engineering effort in the project was in making them simple.&lt;/p&gt;
&lt;p&gt;The hardest specific problem we ran into, over and over, was geographic boundary harmonization across census decades. Census tracts redraw every ten years. A neighborhood that was one tract in 1990 might be three tracts in 2000 and two tracts in 2010 with completely different shapes. A researcher who wants to study, say, neighborhood-level income trajectories from 1980 to 2010 cannot just join the 1980 file to the 2010 file on tract ID. The tract IDs don't refer to the same places. The boundaries shift. Pieces split. Pieces merge. Pieces get absorbed into adjacent tracts when populations decline. The whole map breathes between decades.&lt;/p&gt;
&lt;p&gt;The honest answer to "what does the same place look like across these boundary changes" is that there is no single right answer. There are defensible answers that depend on what you're studying. If you're tracking land-use change you probably want fractional area weighting — give a 2010 tract that overlaps 60% with a 1990 tract sixty percent of the 1990 measurement. If you're tracking population movement you want something different, because populations don't distribute uniformly across area. If you're tracking voting patterns you want yet another thing, because voting precincts don't align with tracts at all.&lt;/p&gt;
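&lt;p&gt;A toy sketch of what fractional area weighting looks like in practice; the tract IDs, overlap fractions, and cropland values below are invented for illustration, not drawn from the Terra Populus data:&lt;/p&gt;

```python
# Hypothetical sketch of fractional area weighting: reallocate a 1990
# tract-level measurement onto 2010 tract boundaries using overlap
# fractions. All IDs and values here are invented for illustration.

# Fraction of each 1990 tract's area that falls inside a given 2010 tract.
overlaps_1990_to_2010 = {
    "2010-tract-A": {"1990-tract-X": 0.60, "1990-tract-Y": 0.25},
}

# The 1990 measurement being harmonized (say, acres of cropland per tract).
cropland_1990 = {"1990-tract-X": 1000.0, "1990-tract-Y": 400.0}

def area_weighted_value(target_tract, overlaps, values):
    """Sum each source tract's value, scaled by its overlap fraction."""
    total = 0.0
    for source_tract, fraction in overlaps[target_tract].items():
        total += fraction * values[source_tract]
    return total

# 0.60 * 1000 + 0.25 * 400 = 700 acres attributed to the 2010 tract.
print(area_weighted_value("2010-tract-A", overlaps_1990_to_2010, cropland_1990))
```

&lt;p&gt;The point of the sketch is the weighting assumption baked into it: area-proportional allocation is only defensible when the quantity distributes roughly uniformly across area, which is exactly why a population or voting variable needs a different scheme.&lt;/p&gt;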
&lt;p&gt;We made decisions. We documented them. We defended them in deliverable reports to NSF every year. We versioned them, because once a researcher cited a specific harmonized variable in a published paper we couldn't silently change what that variable meant in the next data release. Schema discipline as a research infrastructure problem, not a code problem. Every variable we shipped came with a paper trail that explained what it was, what it wasn't, and what alternative we considered.&lt;/p&gt;
&lt;p&gt;Spatial harmonization was only the worst-named version of a more general problem. The same shape of decision had to be made on the temporal axis. Population microdata in the US is decadal. Environmental data is hourly, daily, monthly, depending on the source. Land-use data is irregular — it changes when somebody updates the parcel records, which can happen on any timescale from weeks to decades depending on jurisdiction. Joining a 2010 census tract to MODIS satellite imagery for the same area requires a decision about what time window of imagery counts as "matching" the 2010 measurement. We documented that decision too. It was different from the spatial decision, defended in different sections of the same NSF report, and tested by different researchers downstream.&lt;/p&gt;
&lt;p&gt;The harmonized output sat on a server somewhere and was free to download. The labor of arriving at it was funded at roughly $1.6 million per year for five years. Engineers, researchers, project managers, the costs of running servers, the costs of presenting at conferences and writing the reports that justified the next year's allocation. It was a lot of money. It produced a finite, citable, queryable dataset that hadn't existed before, and would not have existed without that money.&lt;/p&gt;
&lt;h3&gt;The Funding Model&lt;/h3&gt;
&lt;p&gt;NSF cooperative agreements are a particular kind of grant. Unlike a standard research grant, where the agency funds you and then mostly leaves you alone, a cooperative agreement keeps NSF involved in the project's direction. There's a program officer who attends meetings. There are quarterly check-ins. There are annual reports that include not just what you spent and what you produced but where you're going next and why. There are presentations, in person, to people whose job is to ask hard questions about whether the work is worth what it costs.&lt;/p&gt;
&lt;p&gt;This structure exists because the agency is making a public-good argument. Federal money pays for outputs that meet certain criteria — they have to be broadly useful, they have to serve a research community that can actually use them, and they have to produce evidence of impact. If your data is sitting on a server and nobody is downloading it, the next funding cycle is going to be a hard conversation.&lt;/p&gt;
&lt;p&gt;The trade-off is that this model only works for cross-domain data services that have a research community organized enough to push for them. The Census has one. Climate data has one. Land-use and population integration has one. Alternative financial data — the kind of feeds that say "here's what Pelosi traded last week" or "here's what Cramer mentioned on Mad Money" — does not. There is no academic constituency lobbying NSF to fund harmonized politician-trading disclosures. The data exists; the demand is real; the funding model isn't there.&lt;/p&gt;
&lt;p&gt;The other thing the academic model doesn't solve is what happens when the grant ends. Terra Populus wound down in late 2016 and early 2017. New harmonization work, new boundary updates, new integrations — those mostly stopped when the funding stopped. Sustainability is the fragile part of the academic public-good model. You can produce a high-quality dataset for five years and then watch it slowly decay because there's no organizational reason to keep the engineering staffed. I left the University of Minnesota in early 2019, so I can't speak to Terra Populus's legacy at ISRDI firsthand; I'm sure lessons were learned, and some of the data itself likely shifted into other projects.&lt;/p&gt;
&lt;h3&gt;What Quiver Actually Sells&lt;/h3&gt;
&lt;p&gt;Quiver Quantitative sells the same shape of work for financial alternative data. They are a commercial service, not a federally-funded one, and the visible product is a paywalled API and a public-facing dashboard. But the engineering underneath is recognizable. I know it on sight because I spent seven years doing the academic version of it.&lt;/p&gt;
&lt;p&gt;A few examples of what Quiver actually does, beyond "scraping public data":&lt;/p&gt;
&lt;p&gt;A politician's personal trade disclosure shows up on the House STOCK Act feed under whatever name the politician uses. That name has to map to a BioGuide ID. The BioGuide ID has to join to the politician's party affiliation, chamber, and committee assignments — &lt;em&gt;at the date of the trade&lt;/em&gt;, not at the date of the disclosure, which can be months later, and not as of today, because the politician may have switched committees or parties since. The join logic has to handle people who change their legal name mid-term. It has to handle filers who use a spouse's name on the disclosure. None of this is intellectually deep. All of it is the kind of thing that fails silently in a hand-rolled scraper if you're not careful.&lt;/p&gt;
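&lt;p&gt;The temporal half of that join is an "as-of" lookup: resolve the attribute in effect on the trade date, not the disclosure date. A minimal sketch, with an invented assignment history; the committees and dates are hypothetical, not a real BioGuide record:&lt;/p&gt;

```python
# Hedged sketch of an "as-of" join: resolve a politician's committee
# assignment at the date of the trade, not the disclosure date. The
# assignment history below is invented for illustration.
from bisect import bisect_right

# Assignment history for one BioGuide ID, sorted by effective start date.
# Each entry: (effective_date, committee). ISO date strings sort correctly.
assignments = [
    ("2019-01-03", "House Financial Services"),
    ("2021-01-03", "House Ways and Means"),
    ("2023-01-03", "House Appropriations"),
]

def committee_as_of(history, trade_date):
    """Return the assignment in effect on trade_date (as-of semantics)."""
    starts = [start for start, _ in history]
    idx = bisect_right(starts, trade_date)
    if idx == 0:
        return None  # trade predates the earliest known assignment
    return history[idx - 1][1]

# A trade executed in mid-2022 resolves to the 2021 assignment, even if
# the disclosure was filed months later under a newer committee.
print(committee_as_of(assignments, "2022-06-15"))
```

&lt;p&gt;The same shape of lookup handles party switches and name changes; the only difference is which slowly-changing attribute table you binary-search.&lt;/p&gt;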
&lt;p&gt;A corporate jet flight is reported by tail number. The tail number is owned by an LLC. The LLC is owned by a holding company. The holding company is owned by an executive, or by a fractional-ownership program, or by a trust whose beneficiary is the executive. The executive worked at Company A from 2018 to 2022 and at Company B since then. If you want to attribute the flight history to "the Company A CEO's plane" correctly across that transition you need to maintain the ownership graph and apply it temporally, with an understanding of when M&amp;amp;A and management changes happened.&lt;/p&gt;
&lt;p&gt;An insider trade reported on Form 4 belongs to an officer, a director, a 10% beneficial owner, or a relative. The role hierarchy matters because the predictive content of the trade depends on the insider's position. The company's filing history runs through name changes, ticker swaps, and reorganizations that would break a naive join on stock symbol.&lt;/p&gt;
&lt;p&gt;A &lt;a href="https://www.reddit.com/r/wallstreetbets/"&gt;WallStreetBets&lt;/a&gt; ticker mention has to be disambiguated from English words ("ANY", "REAL", "GO", "ALL"), from financial terms used as common nouns ("CASH", "BANK", "DD"), and from the constantly-shifting community vocabulary that makes "Roaring Kitty" mean something specific to the reader on a specific date. Sentiment scoring on those mentions has to track what counts as bullish in the WSB dialect, which is not the same as what sentiment models trained on news text think bullish means.&lt;/p&gt;
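&lt;p&gt;A bare-bones sketch of the disambiguation rule that problem forces on you; the ticker universe and blocklist here are toy examples, not Quiver's actual lists:&lt;/p&gt;

```python
# Illustrative sketch: a candidate ticker mention only counts if it is
# either cashtag-prefixed ($GME) or not on a blocklist of tickers that
# double as ordinary English/finance words. Toy lists, not Quiver's.
import re

ticker_universe = {"GME", "TSLA", "ANY", "REAL", "CASH", "DD"}
ambiguous_words = {"ANY", "REAL", "GO", "ALL", "CASH", "BANK", "DD"}

def extract_mentions(comment):
    mentions = []
    for match in re.finditer(r"\$?([A-Z]{1,5})\b", comment):
        symbol = match.group(1)
        if symbol not in ticker_universe:
            continue
        cashtagged = match.group(0).startswith("$")
        # Ambiguous symbols require the explicit $ prefix to count.
        if cashtagged or symbol not in ambiguous_words:
            mentions.append(symbol)
    return mentions

print(extract_mentions("ANY of you doing DD on $CASH? GME to the moon"))
# prints ['CASH', 'GME'] -- "ANY" and "DD" are dropped as ambiguous
```

&lt;p&gt;The real service's version of this has to keep both lists current as tickers IPO and community slang shifts, which is the maintenance point again.&lt;/p&gt;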
&lt;p&gt;The hard part of all of these is maintenance. Source formats change. Disclosure requirements add fields. Politicians change parties. Companies merge. Tail numbers transfer. New tickers IPO; old tickers delist; ETFs split into share classes. Where we wrote our harmonization decisions into NSF deliverable reports, Quiver writes theirs into customer-facing changelogs and SLAs. The decisions are no less real.&lt;/p&gt;
&lt;p&gt;What you pay for, when you pay for Quiver, is the labor of making those decisions and the sustainability of having an organization that keeps making them. The data feed is the visible artifact. The engineering function is the actual product.&lt;/p&gt;
&lt;h3&gt;What You Take On When You Cancel&lt;/h3&gt;
&lt;p&gt;A third funding model exists alongside the academic-grant and commercial-subscription models: pay with your own time. Build it yourself. Maintain it yourself. Absorb every format change personally, on your own schedule, without an SLA or a changelog or a research community to share the burden.&lt;/p&gt;
&lt;p&gt;If you cancel a service like Quiver, here is the inventory of what you take on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Name disambiguation.&lt;/strong&gt; Politicians, companies, executives, all changing identifiers and roles over time. Every join needs to know which version of the entity it's joining.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ticker stability through corporate actions.&lt;/strong&gt; M&amp;amp;A, name changes, share class splits, delistings. The historical price series for Activision Blizzard ends in October 2023 because Microsoft acquired it; if you want to study Activision's history you need to know that and stitch it together.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Schema normalization across changing source formats.&lt;/strong&gt; House STOCK Act PDFs and Senate eFD filings have different schemas. SEC EDGAR Form 4 has had three or four major XBRL revisions in the last decade. Every change is a parser update on your end.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Audit trail and reproducibility.&lt;/strong&gt; If you want anyone — including future you — to be able to re-run your analysis or cite your data, you need to record what you used, when you pulled it, and what cleaning decisions you made. This is the part that sounds like overhead until it isn't.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Format-change response time.&lt;/strong&gt; When the upstream source publishes a new PDF template with shifted column boundaries, you have days to fix the parser before the data gap shows up in your analysis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adversarial inputs on the source side.&lt;/strong&gt; The upstream entities have their own incentives. A politician's disclosure form filled out the legal minimum way is not necessarily the version that makes downstream joins clean. Filers transpose digits in CUSIPs. They report a transaction date that's the settlement date instead of the trade date. They list a partial name for a holding LLC that you have to disambiguate against three plausible matches. None of this is fixed by trying harder; it's fixed by accumulating a library of known patterns and a tolerance for ambiguity.&lt;/li&gt;
&lt;/ul&gt;
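&lt;p&gt;To make the second item concrete, here's a minimal sketch of stitching one price series across a ticker change by keying on a permanent entity ID. The FB-to-META rename is real; the prices and the ID scheme are invented for illustration:&lt;/p&gt;

```python
# Minimal sketch: stitch a price history across a ticker change by
# keying on a stable internal ID rather than the symbol. Prices and
# the "company-0042" ID scheme are invented for illustration.

# Map every symbol a company has ever traded under to one permanent ID.
symbol_to_permanent_id = {
    "FB": "company-0042",
    "META": "company-0042",  # renamed in June 2022; same underlying entity
}

# Raw daily closes as pulled, keyed by the symbol in effect at the time.
raw_closes = [
    ("2022-06-01", "FB", 193.64),
    ("2022-06-10", "META", 175.57),
]

def stitched_series(permanent_id, rows, symbol_map):
    """One continuous series per entity, regardless of symbol changes."""
    series = [
        (date, price)
        for date, symbol, price in rows
        if symbol_map.get(symbol) == permanent_id
    ]
    return sorted(series)

print(stitched_series("company-0042", raw_closes, symbol_to_permanent_id))
```

&lt;p&gt;The code is trivial; the maintained artifact is the symbol map, which only stays correct if someone watches corporate actions forever.&lt;/p&gt;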
&lt;p&gt;What you give up: coverage, consistency, the audit trail that makes the data citable in research papers, and the option to forget about the data layer entirely while you focus on the question you actually wanted to answer. What you keep: full control over the corner cases that matter to &lt;em&gt;your&lt;/em&gt; specific question, and the ability to make different cleaning decisions than the service made.&lt;/p&gt;
&lt;p&gt;I just finished a study of Cramer's &lt;em&gt;Mad Money&lt;/em&gt; recommendations as a quant filter signal — write-up coming separately — and the data work for that project is what made the engineering layer concrete to me again. I needed specific segment classifications that didn't quite match Quiver's defaults. I needed corner-case handling for tickers that were renamed mid-window. I built most of that infrastructure anyway. The subscription was a useful starting point, but the actual analysis lived in code I owned.&lt;/p&gt;
&lt;h3&gt;I Live Here Already: DirtScout&lt;/h3&gt;
&lt;p&gt;The Cramer study is the recent example. The longer-running one is &lt;a href="https://tinycomputers.io/posts/building-dirtscout-a-land-acquisition-platform-with-claude-code.html"&gt;DirtScout&lt;/a&gt;, a land acquisition platform I've been building. Strip away the user-facing parts and DirtScout is essentially a series of custom data ingestors for county and state level real property information. It is the DIY funding model lived, not described.&lt;/p&gt;
&lt;p&gt;The pipelines that keep DirtScout fed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Minnesota statewide parcel data&lt;/strong&gt;, refreshed quarterly. Every county in the state has its own schema for what counts as a parcel record, with different field names, different geometry encodings, and different conventions for whether vacant land is a separate row or a flag on the residential record. Harmonizing those into a single queryable parcel table is the largest single piece of cleaning work in the project.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;St. Louis County weekly&lt;/strong&gt;, with tax-delinquent enrichment. The county publishes a delinquent register that uses different identifiers than the parcel records, and the join is fragile in exactly the way you'd expect — name spellings drift, owner addresses change, parcel splits get attributed differently between the assessor's database and the treasurer's database. Cross-source enrichment is where most of the corner-case handling lives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tax Forfeited Land sale lists&lt;/strong&gt;, scraped daily off an SSRS-rendered report. The format is defined by Microsoft's reporting service and emitted as the world's least-loved HTML table. The report definition changes occasionally and the parser has to track it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elevation profiles&lt;/strong&gt;, computed from USGS topographic data and joined to parcel polygons on demand. Cross-domain integration: real property data and digital elevation models live in different schemas, projections, and coordinate systems, and joining them properly requires a stack of decisions that look exactly like the ones we made joining environmental rasters to census tracts in Terra Populus.&lt;/li&gt;
&lt;/ul&gt;
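&lt;p&gt;Once parcel and DEM are in the same projection, the elevation join reduces to a raster-sampling decision. A toy sketch, with an invented 3x3 grid and made-up coordinates standing in for the USGS data:&lt;/p&gt;

```python
# Hedged sketch of the raster-to-vector join: sample a DEM grid at a
# parcel centroid using the raster's geotransform (origin plus cell
# size). Grid, origin, and centroid are toy values; a real pipeline
# would first reproject the parcel into the DEM's coordinate system.

# 3x3 elevation grid in meters; the origin is the top-left corner.
dem = [
    [402.0, 405.0, 407.0],
    [398.0, 401.0, 404.0],
    [395.0, 397.0, 400.0],
]
origin_x, origin_y = 500000.0, 5200000.0  # top-left, projected CRS
cell_size = 30.0  # meters per pixel

def sample_elevation(x, y):
    """Nearest-cell DEM sample at a projected coordinate."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)  # rows increase downward
    return dem[row][col]

# Parcel centroid 45 m east and 75 m south of the origin: row 2, col 1.
print(sample_elevation(500045.0, 5199925.0))
```

&lt;p&gt;Whether to sample the centroid, average over the polygon, or take a min/max profile is exactly the kind of documented, defensible-but-not-unique decision the Terra Populus raster joins required.&lt;/p&gt;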
&lt;p&gt;I keep all of it running with cron jobs on a Linux box and weekly integration tests. The cron jobs say &lt;em&gt;pull the new data;&lt;/em&gt; the integration tests say &lt;em&gt;scream if the join logic broke.&lt;/em&gt; Together they constitute exactly the same kind of ongoing engineering that went into NSF reports, and exactly the same kind of function Quiver writes into customer-facing changelogs and SLAs. The funding model is different — DirtScout is funded by my time and the electricity bill on a workhorse machine — but the labor is recognizably the same shape.&lt;/p&gt;
&lt;p&gt;That's the DIY model in actual practice rather than as an abstraction. When I cancel Quiver, I'm not adding a new category of work to my life. I'm adding one more pipeline to the maintenance surface I already carry. The cancel decision is about whether the marginal pipeline is worth the marginal time, given the alternative of paying for the labor.&lt;/p&gt;
&lt;h3&gt;The Decision&lt;/h3&gt;
&lt;p&gt;The arithmetic is simple. Thirty to seventy dollars per month, times twelve, equals three hundred sixty to eight hundred forty per year for the Quiver tier I needed. The cleaning labor for the Cramer study, in hours of my own time, came out below that — partly because the project window was bounded (no ongoing maintenance burden after publication), partly because the corner cases I cared about were corner cases Quiver wouldn't optimize for in any case, and partly because the subscription's marginal value to me dropped to near zero the moment the study finished its primary analysis.&lt;/p&gt;
&lt;p&gt;So I'm cancelling.&lt;/p&gt;
&lt;p&gt;This is a verdict on the use case, not on the service. Quiver is correctly priced for what it does. The labor it absorbs on behalf of its customers is real, valuable, and the kind of work I have personal sympathy for because I used to do its academic equivalent. The question is not "is this overpriced." The question is "does what they do match what I need on this specific project, on this specific timeline, with this specific willingness to absorb maintenance personally."&lt;/p&gt;
&lt;p&gt;For me, right now, the answer is no. For someone running a fund and consuming alt-data continuously, the answer is obviously yes. For an academic researcher with grant funding and a multi-year project, the answer probably depends on whether the institution will pay. For a hobbyist with one specific question and time on their hands, the answer is probably no. Different use cases, different absorption preferences, different right answers. The subscription is a tool, and like any tool it's correctly chosen against a specific job.&lt;/p&gt;
&lt;h3&gt;Three Funding Models, One Bill&lt;/h3&gt;
&lt;p&gt;The labor doesn't disappear in any of these models. The variable is who absorbs it.&lt;/p&gt;
&lt;p&gt;Academic grants socialize the cost across taxpayers in exchange for outputs that meet public-good criteria and serve a research community. The output is free to use. The sustainability is fragile, because the moment the grant ends the engineering staff disperses and the dataset starts ossifying. Terra Populus is one of dozens of NSF-funded data integration projects that produced excellent work for a fixed window and then went into a maintenance-mode purgatory or were eventually decommissioned.&lt;/p&gt;
&lt;p&gt;Commercial subscriptions price the cost into the customer relationship. The output is paywalled. The sustainability is more robust, because as long as renewals cover engineering payroll the work continues — but the paywall means the data is unevenly distributed. Some questions that should be asked about Cramer or Pelosi or insider trading get asked only by people willing to pay for the data layer.&lt;/p&gt;
&lt;p&gt;DIY personalizes the cost into your own time. The output is yours. The sustainability is one upstream format change away from breaking, and the audit trail depends entirely on your discipline. Most DIY data work decays for the same reason most personal projects decay: the maintenance burden is invisible until it isn't.&lt;/p&gt;
&lt;p&gt;When somebody tells you a commercial data service is overpriced, ask them what they're proposing to do with the labor. The answer is always one of those three models, and the right answer depends on what you're actually optimizing for. If you want broad public access to a citable dataset, the academic model is correct and the price tag (which is real, just paid by someone else) is justified. If you want consistent, maintained, audit-trailed access on a service-level agreement, the commercial model is correct. If you want full control over the corner cases and have time to spend, DIY is correct.&lt;/p&gt;
&lt;p&gt;There's a &lt;a href="https://tinycomputers.io/posts/the-feedback-loop-that-jevons-couldnt-name.html"&gt;Jevons-flavored&lt;/a&gt; pattern hiding here. Cheaper data access creates demand for more data work — more analyses, more cross-cuts, more downstream products built on top. The labor savings at one layer (the cleaning is already done; just query it) get spent at the next layer (now the question becomes "what does this dataset tell me," and that question has no upper bound). The bill is real either way. The form it takes shifts depending on how the supply side is funded.&lt;/p&gt;
&lt;p&gt;The same logic, weirdly, applies to a different kind of subscription decision I &lt;a href="https://tinycomputers.io/posts/the-economics-of-owning-your-own-inference.html"&gt;wrote about a few weeks back&lt;/a&gt;. Local LLM inference versus a Claude Max subscription is the same shape of question as Quiver versus DIY scraping: who absorbs the cost, what coverage do you get, how fragile is the sustainability, and what specifically are you optimizing for. The arithmetic is different. The structure isn't.&lt;/p&gt;
&lt;p&gt;The subscription cancellation in front of me this week is small — a few hundred dollars saved per year, in exchange for some weekend hours I'd have spent on the data layer anyway. The bigger thing it's clarified is the model I want to use when I make the next one.&lt;/p&gt;</description><category>alternative data</category><category>build-vs-buy</category><category>cooperative agreements</category><category>data engineering</category><category>economics</category><category>harmonization</category><category>ipums</category><category>isrdi</category><category>mpc</category><category>nsf</category><category>quiver quantitative</category><category>terra populus</category><guid>https://tinycomputers.io/posts/what-terra-populus-taught-me-about-cancelling-quiver.html</guid><pubDate>Sun, 26 Apr 2026 13:00:00 GMT</pubDate></item></channel></rss>