
Nine years ago, I built something similar.

It was 2017, St. Louis County, Minnesota. I wanted to find raw undeveloped land from the county's delinquent tax rolls. The county had an ArcGIS service, but the APIs were primitive compared to what they offer now. I stood up a PostgreSQL database with PostGIS extensions, wrote Ruby scripts to scrape parcel data from the county's map server, geocoded addresses, and built a Ruby on Rails frontend to browse the results. The whole thing lived on a single VPS. It worked for one county. The data model was rigid, the scraping was fragile, and every time St. Louis County changed their GIS service, something broke.

That project died the way side projects do: I got what I needed from it and moved on.

In March 2026, I came back to the idea. The landscape had changed. ArcGIS REST APIs are now standardized and reliable. Wisconsin publishes a statewide parcel dataset covering all 72 counties through a single endpoint. Minnesota counties expose delinquent tax data through queryable feature services. AWS Lambda and DynamoDB mean I don't need to manage a database server. And I had a tool that didn't exist in 2017: Claude Code.

DirtScout is the result. It's a full-stack land acquisition platform at dirtscout.land that searches delinquent tax parcels across 21 Minnesota counties and browses raw land across 72 Wisconsin counties. It has AI-powered investment analysis, environmental and soil assessments, a deal pipeline with offer letter generation, tax forfeit auction tracking, and automated monitoring with email alerts. The codebase is about 29,000 lines across Python, TypeScript, and infrastructure-as-code.

I built it with Claude Code. Not "Claude Code assisted me" or "Claude Code helped with the boilerplate." Claude Code wrote the code. I directed the architecture, made the decisions, and did the debugging when things broke. But the actual lines of code came from conversations, not from me typing in an editor.

The Architecture

The 2017 version was PostgreSQL + PostGIS + Ruby on Rails on a single server. The 2026 version:

Frontend: Next.js 16, static export, Tailwind CSS, react-leaflet for maps. Hosted on S3 behind CloudFront. The entire frontend is pre-rendered HTML and JavaScript; there's no server-side rendering. CloudFront serves it from edge locations. A URL rewrite function handles dynamic routes for deal detail pages and shared parcel links.

Backend: Python FastAPI running on a single AWS Lambda function behind API Gateway. Mangum adapts the ASGI app to Lambda's event format. Every API request hits the same Lambda, which cold-starts in about 3.5 seconds and handles subsequent requests in under a second. The function has 512MB of memory and a 5-minute timeout.

Data: Two DynamoDB tables. The main table stores user data, flagged parcels, deals, preferences, notes, attachments, saved searches, tax list imports, and auction tracking. The cache table stores land cover analysis, environmental data, soil analysis, and geometry with TTLs. No PostgreSQL. No PostGIS. No database server to manage.
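A TTL'd cache entry in DynamoDB is just an item whose expiry attribute holds a Unix epoch. A sketch of the shape, with illustrative key and attribute names (`pk`, `sk`, `ttl`) rather than the real DirtScout schema:

```python
# Sketch of a cache-table item that DynamoDB's TTL feature will expire.
# Key layout and attribute names are assumptions for illustration.
import time
from decimal import Decimal

def build_cache_item(parcel_id: str, kind: str, payload: dict,
                     ttl_days: int = 30) -> dict:
    """Build a cache item that expires via the table's TTL attribute."""
    return {
        "pk": f"PARCEL#{parcel_id}",
        "sk": f"CACHE#{kind}",
        "payload": payload,
        # DynamoDB TTL expects a Unix epoch in seconds, stored as a Number.
        "ttl": Decimal(int(time.time()) + ttl_days * 86400),
    }

item = build_cache_item("140-0010-00100", "soil", {"drainage": "Well drained"})
```

DynamoDB deletes expired items lazily (within a day or two of the TTL), which is fine for analysis caches: a stale soil or land cover result just gets recomputed on the next request.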

Infrastructure: AWS CDK in Python. One cdk deploy command creates the Lambda, API Gateway, DynamoDB tables, S3 buckets, SQS queues, EventBridge schedules, Route 53 records, CloudFront distributions, and ACM certificates. The entire infrastructure is version-controlled and reproducible.

On-premises worker: A service running on a local AMD Strix Halo machine (Ryzen AI MAX+ 395, 128GB RAM) processes delinquent tax list PDFs using pdfplumber for text extraction and a local Qwen3 32B model via Ollama for structured data extraction. It polls an SQS queue for jobs.

This is a fundamentally different architecture from what I could have built in 2017. No servers to patch aside from the Strix Halo. No database to back up. No PostGIS extensions to compile. The Lambda handles the compute, DynamoDB handles the storage, and the on-prem machine handles the jobs that need a real browser or a local LLM.

What Claude Code Actually Did

I want to be specific about this because the "AI-assisted development" conversation is usually vague. People say "I used AI to help me code" and it could mean anything from autocomplete suggestions to full application generation. Here's what actually happened.

I started with a Rust TUI. The original project was a terminal application that queried a handful of Minnesota county ArcGIS services and displayed delinquent parcels in a text interface. It had county configurations, a query client, land cover analysis via USGS NLCD, and a flagging system. Claude Code built this from my descriptions of what I wanted: "query this ArcGIS service for parcels where the delinquent flag is set, filter by acreage and land use, show me the results in a table with navigation."

Then I decided to make it a web app. I described the architecture I wanted: FastAPI on Lambda, Next.js on S3, DynamoDB for storage. Claude Code ported the Rust query logic to Python, built the FastAPI routes, created the React components, wrote the CDK infrastructure, and handled the deployment. Each feature was a conversation: "add Google OAuth," "add a deal pipeline with stages - make it look like Kanban," "generate offer letter PDFs," "add an AI investment summary using the Claude API."

The codebase grew to 29,000 lines across 113 files in the initial commit. Later sessions added another 60 files and 5,000 lines for Wisconsin support, soil analysis, tax list imports, auction tracking, spatial search, and saved searches.

I didn't write these lines. I directed them. There's a difference, and it matters.

When I say "directed," I mean I made every architectural decision. I chose DynamoDB over PostgreSQL because I didn't want to manage a database. I chose Lambda over ECS because I didn't want to manage containers. I chose static export over SSR because I didn't want to manage a Node.js server. I chose a local LLM for PDF parsing instead of the Claude API because the parsing is structured data extraction that doesn't need frontier model quality.

Claude Code implemented these decisions. When something broke, I described the symptom and Claude Code diagnosed the cause. When I wanted a new feature, I described the behavior and Claude Code wrote the code. The feedback loop was: describe what I want, review what I get, deploy, test, describe what's wrong, iterate.

Some things broke in interesting ways. DynamoDB doesn't accept Python floats; you have to convert everything to Decimal. The county field maps are reverse-keyed from what you'd expect (ArcGIS field names are the keys, common names are the values). Google OAuth redirect URIs need a trailing slash. CloudFront caches aggressively and you have to invalidate after every deploy. The Census TIGER API for county boundaries is painfully slow, so we downloaded the GeoJSON once and serve it as a static file. Each of these was discovered in production and fixed in conversation.
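The float issue in particular bites everyone who moves Python data into DynamoDB: boto3's serializer rejects floats outright. A minimal recursive converter, illustrative rather than DirtScout's exact code:

```python
# Sketch: recursively replace floats with Decimal so boto3's DynamoDB
# serializer will accept the item. Example data is made up.
from decimal import Decimal

def to_dynamo(value):
    """Walk dicts and lists, converting every float to Decimal."""
    if isinstance(value, float):
        # Route through str() so Decimal("38.7") comes out clean instead of
        # inheriting binary-float noise like 38.7000000000000028...
        return Decimal(str(value))
    if isinstance(value, dict):
        return {k: to_dynamo(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_dynamo(v) for v in value]
    return value

item = to_dynamo({"acres": 38.7, "bal_due": 1243.55, "tags": ["raw", 1.5]})
```

Run every item through a helper like this at the write boundary and the rest of the codebase can keep using plain floats.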

The Data Sources

The interesting part of DirtScout isn't the web framework. It's the data integration.

Minnesota parcel data comes from 11 different county ArcGIS REST services, each with its own field names, query syntax, and data quality. St. Louis County has DELINQUENT_TAX_FLAG and BAL_DUE. Aitkin has DELINQUENT_FLAG (text: "YES"/"NO") and BALDUE. Hennepin stores acreage in square feet (divide by 43,560). Goodhue stores acreage as a string that requires CAST in the SQL WHERE clause. Each county is a separate configuration with field mappings, WHERE clause templates, and normalization logic.
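A per-county configuration along those lines might look like this sketch. The St. Louis County field names come from the examples above; the dataclass shape, the WHERE template, and the `normalize` helper are illustrative assumptions:

```python
# Sketch of a county configuration with a reverse-keyed field map:
# ArcGIS field names are the keys, normalized common names the values.
from dataclasses import dataclass, field

@dataclass
class CountyConfig:
    name: str
    where_template: str           # assumption: filled in per search
    field_map: dict = field(default_factory=dict)
    acreage_divisor: float = 1.0  # e.g. 43_560 where acreage is in sq ft

ST_LOUIS = CountyConfig(
    name="St. Louis",
    where_template="DELINQUENT_TAX_FLAG = 1",  # illustrative clause
    field_map={"DELINQUENT_TAX_FLAG": "delinquent", "BAL_DUE": "balance_due"},
)

def normalize(cfg: CountyConfig, attributes: dict) -> dict:
    """Map one feature's raw ArcGIS attributes onto the common schema."""
    return {common: attributes.get(arcgis)
            for arcgis, common in cfg.field_map.items()}
```

The reverse keying looks backwards until you write the normalizer: iterating the map in this direction turns raw attributes into common-schema records in one comprehension.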

Minnesota tax lists come from 15 county PDFs and Excel files. Itasca County publishes an Excel file updated monthly. The rest publish PDF legal notices. The PDFs are processed by either the Claude API (Haiku model, cheapest tier) or a local Qwen3 32B running on the Strix Halo machine. The AI extracts parcel IDs, owner names, delinquent amounts, and addresses from the unstructured PDF text and returns structured JSON.

Wisconsin parcel data comes from a single statewide ArcGIS feature service maintained by the State Cartographer's Office. One endpoint, all 72 counties, standardized fields. Owner names, mailing addresses, assessed values, acreage, property class. No delinquent tax data in the GIS, but we supplement with 9 county-level PDF lists of tax-delinquent and tax-forfeited properties.

Environmental analysis layers: FEMA NFHL for flood zones, NWI for wetlands, NHD for water bodies, Minnesota DNR County Well Index for well data, MPCA for contamination sites. Each is a separate ArcGIS REST service query using the parcel's centroid.

Soil analysis comes from the USDA Soil Data Access REST API (SSURGO). A SQL query with the parcel's centroid returns soil components, drainage class, hydric rating, slope, farmland classification, and capability class. We compute a "buildability" score from these factors.
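The SSURGO lookup needs nothing beyond the public Soil Data Access endpoint. A sketch assuming SDA's `post.rest` interface and its WKT intersection helper; the selected columns are illustrative, not DirtScout's exact query:

```python
# Sketch of a point-in-polygon soil query against USDA Soil Data Access.
# Endpoint and helper-function usage are assumptions based on the public
# SDA REST interface; column list is illustrative.
import json
import urllib.request

SDA_URL = "https://sdmdataaccess.sc.egov.usda.gov/Tools/post.rest"

def soil_query(lon: float, lat: float) -> str:
    """Build T-SQL selecting soil components at a WGS84 point."""
    wkt = f"point({lon} {lat})"
    return (
        "SELECT c.compname, c.drainagecl, c.hydricrating, c.slope_r "
        "FROM mapunit mu JOIN component c ON c.mukey = mu.mukey "
        "WHERE mu.mukey IN (SELECT * FROM "
        f"SDA_Get_Mukey_from_intersection_with_WktWgs84('{wkt}'))"
    )

def fetch_soil(lon: float, lat: float) -> dict:
    body = json.dumps({"query": soil_query(lon, lat),
                       "format": "JSON"}).encode()
    req = urllib.request.Request(
        SDA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Because the whole request is one SQL string, adding a factor to the buildability score is mostly a matter of adding a column to the SELECT.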

Land cover comes from the USGS MRLC WMS service, querying the NLCD 2021 Land Cover layer. We sample the parcel area and return a breakdown by cover type: forest, agriculture, water, wetlands, developed, barren.

Each of these integrations was built in conversation with Claude Code. "Add flood zone analysis using FEMA's service." "The NWI wetlands query needs table-prefixed field names." "The SSURGO soil query needs a WKT point geometry."

What Changed Since 2017

The PostgreSQL + PostGIS + Ruby on Rails stack I used nine years ago was the right choice for 2017. PostGIS let me do spatial queries locally. I had to store the parcel data because the ArcGIS services weren't reliable enough to query in real time. Rails rendered server-side because that's what Rails did.

None of that is necessary anymore. The ArcGIS services are fast and reliable enough to query live. DynamoDB handles the persistence without a schema to manage. Lambda eliminates server management. Static export means the frontend is just files on S3.

There's a personal angle here too. In graduate school, I spent an entire semester manually developing land cover classifications for a final project — hand-labeling training data, running supervised classification algorithms, validating results against ground truth. It was weeks of work for one study area. For DirtScout, I told Claude Code "add a buildability score based on soil drainage, hydric percentage, slope, and capability class" and had a working assessment in minutes. The SSURGO soil data query, the scoring logic, the frontend panel with color-coded ratings — all from a single conversation. The knowledge that took a semester to develop is now a commodity you can describe and deploy.

But the bigger change is the development process. In 2017, I wrote every line of Ruby and SQL by hand. I designed the PostGIS schema, wrote the scraping scripts, built the Rails views, configured the Nginx proxy, set up the SSL certificates, and wrote the systemd service files. It took months of evenings and weekends for a single-county tool.

In 2026, I built a two-state, multi-service platform with AI analysis, auction tracking, deal management, and offer letter generation in a series of conversations over a few days. The code isn't hand-crafted, and I'm not interested in hand-crafted code for its own sake. The point is finding undervalued rural land from delinquent tax records and making offers to motivated sellers. The code is the means. Claude Code made the means faster.

The On-Prem Angle

A tax list import worker runs on a Bosgame M5 mini PC in my basement.

The worker exists because I didn't want to pay for Claude API calls to parse 24 county PDFs every week. The AMD Strix Halo has 128GB of RAM and runs Qwen3 32B through Ollama. The worker downloads each PDF, extracts text with pdfplumber (a Python library that does the PDF-to-text conversion locally, no model needed), then sends the extracted text to the local Ollama instance for structured JSON extraction. Each 2-page chunk takes 5-7 minutes on the 32B model. It's slower than a cloud API. It's also free.
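The Ollama leg of that pipeline is a single HTTP call per chunk. A sketch assuming Ollama's `/api/generate` endpoint with `"format": "json"`; the prompt wording and output field names are illustrative:

```python
# Sketch: send one chunk of extracted PDF text to a local Ollama instance
# and ask for structured JSON back. Prompt and field names are assumptions.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

PROMPT = (
    "Extract every delinquent parcel from the tax-list text below. "
    "Return a JSON array of objects with keys: parcel_id, owner, "
    "amount_due, address.\n\n{text}"
)

def extract_parcels(chunk_text: str, model: str = "qwen3:32b") -> list:
    """One blocking generate call; a 32B model can take minutes per chunk."""
    body = json.dumps({
        "model": model,
        "prompt": PROMPT.format(text=chunk_text),
        "format": "json",   # ask Ollama to constrain output to valid JSON
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(json.load(resp)["response"])
```

The `format: "json"` constraint is doing real work here: without it, a chatty model will wrap the array in prose and the parse step fails intermittently.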

The worker is a systemd service that starts on boot and polls an SQS queue continuously. A weekly systemd timer enqueues an "import all" message every Monday morning.

This is the economics of owning your own inference in practice. The frontier model handles the quality-sensitive work (AI investment analysis, parcel chat). The local model handles the batch extraction work. The split happens naturally based on the task requirements.

What It Does Now

The production site at dirtscout.land:

  • Searches delinquent tax parcels across 21 Minnesota counties (8 Tier 1 with full ArcGIS data, 2 Tier 2 with partial data, 2 Tier 3 with minimal data, 9 Tier 4 with imported tax list data)
  • Browses raw land parcels across all 72 Wisconsin counties via the statewide parcel service
  • Scores each MN parcel on a 0-100 scale (grades A through F) based on financial opportunity, road access, environmental factors, and land character
  • Generates AI investment summaries using Claude Sonnet with full context: parcel data, land cover, environmental analysis, soil data, owner's other properties, and attached documents
  • Tracks deals through a pipeline (prospecting, offer sent, negotiating, under contract, closed, dead) with offer letter PDF generation using three templates
  • Monitors for new delinquent parcels daily via EventBridge-triggered Lambda scans, with email alerts
  • Tracks tax forfeit auction dates across 8 Minnesota counties, with a floating widget showing upcoming auctions
  • Imports delinquent tax lists from 15 MN and 9 WI county PDFs/Excel files weekly
  • Provides environmental analysis (flood zones, wetlands, water bodies, wells, contamination), land cover classification (NLCD 2021), and soil analysis (SSURGO) for each parcel
  • Shows parcel boundaries on satellite imagery, with an interactive explore map that loads parcel shapes at high zoom levels
  • Manages parcel notes, file attachments (via S3 presigned URLs), shareable parcel links, and saved searches
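The 0-100 scoring in the list above amounts to a weighted blend of factor scores mapped onto letter grades. A toy sketch; the weights and cutoffs are assumptions, since the real formula isn't published here:

```python
# Toy sketch of a 0-100 parcel score and its A-F grade mapping.
# Weights and grade cutoffs are illustrative assumptions.
def parcel_score(financial: float, access: float,
                 environment: float, character: float) -> float:
    """Combine four 0-100 factor scores; financial weighted heaviest."""
    weights = (0.4, 0.2, 0.2, 0.2)  # assumption, not DirtScout's weights
    factors = (financial, access, environment, character)
    return sum(w * f for w, f in zip(weights, factors))

def grade(score: float) -> str:
    """Map a 0-100 score onto letter grades A through F."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"
```

Keeping each factor on its own 0-100 scale before blending makes the grade explainable: the UI can show which factor dragged a parcel from a B to a D.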

What I'd Do Differently

I'd add geometry caching earlier. Every map view that shows parcel boundaries makes a live ArcGIS query with returnGeometry=true, which is slower than querying attributes only. Caching the geometry in DynamoDB with a TTL would make the explore map significantly faster.

I'd standardize the county configurations into a more declarative format. Right now each county is a Python dataclass with hand-tuned field mappings. A JSON configuration file that Claude Code could modify more easily would reduce the friction of adding new counties.

I'd restructure the project as a proper monorepo with shared types between the API client and the backend models. The current setup has TypeScript interfaces in the frontend that mirror Pydantic models in the backend, and they drift out of sync when fields are added.

But these are optimizations, not regrets. The system works. It finds land. It makes the research process faster. And it was built in conversations, not in sprints.