The Seductive Trap of Pure ML
There's a pattern I've seen repeated across scientific computing: a team has a physics-based model that works reasonably well. Someone suggests "we could use machine learning to improve this." Six months later, they've replaced the physics entirely with a neural network trained on historical data. The model works great—until it doesn't.
In ballistics, this failure mode isn't just embarrassing; it's dangerous. A 10% error in predicting bullet drop at 1000 yards translates to missing a target by nearly a foot. In hunting, that's a wounded animal. In defense applications, the consequences are graver still.
After spending considerable time studying ballistics systems, I've arrived at a principle that runs counter to the current AI zeitgeist: machine learning should correct physics, not replace it. This isn't a rejection of ML—it's a recognition that physics provides something ML cannot: bounded, predictable behavior grounded in first principles.
The Philosophy: Physics as Foundation, ML as Refinement
Consider two approaches to predicting muzzle velocity from powder charge:
```python
# Approach A: Pure ML
velocity = neural_network.predict(powder_charge, bullet_weight, barrel_length, bc)

# Approach B: Physics-First with ML Correction
base_velocity = physics_model.calculate(powder_charge, bullet_weight, barrel_length, bc)
correction = ml_model.predict_correction(powder_charge, bullet_weight, barrel_length, bc)
final_velocity = base_velocity * correction  # correction is bounded: 0.95-1.05
```
Approach A learns everything from data. It might achieve lower training error, but it has no guardrails. Feed it an unusual combination of inputs, and it might predict a velocity of -500 fps or 50,000 fps. The model has no concept of what's physically possible.
Approach B starts with physics—conservation of energy, gas dynamics, thermodynamics. These equations have been validated for centuries. The ML component only learns the residual: the small systematic errors that arise from simplified assumptions, manufacturing tolerances, or environmental factors the physics model doesn't capture.
Critically, the correction factor is bounded. In our production systems, we enforce limits of 0.8x to 1.25x for ballistic coefficient corrections. If the ML model wants to apply a larger correction, we reject it entirely rather than trust an outlier prediction.
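In code, the policy amounts to a single gate applied before any correction is trusted. Here is a minimal sketch of the idea; the function name is illustrative, and the bounds mirror the values quoted above:

```python
BC_CORRECTION_MIN = 0.80
BC_CORRECTION_MAX = 1.25

def apply_bounded_correction(physics_bc: float, ml_correction: float) -> float:
    """Trust an ML correction only if it falls inside the allowed band."""
    if not (BC_CORRECTION_MIN <= ml_correction <= BC_CORRECTION_MAX):
        # Outside the band we don't clamp and hope: we discard the
        # correction entirely and fall back to the pure physics value.
        return physics_bc
    return physics_bc * ml_correction
```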
Why Bounded Corrections Matter
The bound isn't arbitrary. It emerges from understanding what ML can legitimately learn versus what indicates a fundamental mismatch.
A ballistic coefficient (BC) published by a manufacturer might differ from real-world performance by 5-15% due to:
- Manufacturing tolerances in bullet production
- Differences between the manufacturer's test conditions and yours
- Simplifications in how BC is measured and reported
- Velocity-dependent effects not captured in a single BC value
These are exactly the kinds of systematic errors ML can learn to correct. A well-trained model might learn that Brand X's published BCs are consistently 8% optimistic, or that handloaded ammunition with a specific powder tends to perform 3% better than factory loads.
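Estimating that kind of systematic offset doesn't require anything elaborate once you have paired published and measured BCs. A minimal sketch of the idea, assuming a small paired dataset; the brand names and numbers here are invented for illustration, not real measurements:

```python
from collections import defaultdict
from statistics import median

# (brand, published_bc, doppler_bc) pairs from reference measurements (illustrative values)
observations = [
    ("BrandX", 0.310, 0.287),
    ("BrandX", 0.295, 0.271),
    ("BrandY", 0.243, 0.240),
]

def brand_correction_factors(obs):
    """Median ratio of measured to published BC, grouped by brand."""
    ratios = defaultdict(list)
    for brand, published, measured in obs:
        ratios[brand].append(measured / published)
    return {brand: median(vals) for brand, vals in ratios.items()}

# In this toy data, BrandX runs roughly 8% optimistic, BrandY is close to honest
print(brand_correction_factors(observations))
```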
But a correction factor of 2.5x? That's not a refinement—that's a fundamental mismatch. Either the input data is wrong, or we've matched against the wrong reference bullet entirely.
The chart above illustrates the relationship between correction factor and prediction error. Within the acceptable zone (0.8x to 1.25x), errors remain manageable—typically under 20 inches at 1000 yards. But as correction factors grow larger, errors explode. The red dot marks a real bug we discovered: a 2.49x correction that produced 37% error in drop predictions.
A Real-World Failure: The 2.49x Bug
This isn't theoretical. In our BC enhancement service, we had a bug that perfectly illustrates the danger of unbounded ML corrections.
A user submitted a calculation for a 140-grain 6.5mm bullet with a G7 BC of 0.238. Our system attempted to enhance this BC using doppler-derived reference data. The matching algorithm found a reference bullet—a 142-grain Sierra MatchKing—based on caliber and weight similarity.
The problem? The Sierra 142gr SMK has a G7 BC of approximately 0.593. Our system computed a "correction factor" of 2.49x and confidently applied it.
The results were catastrophic:
| Metric | User's BC (0.238) | "Enhanced" BC (0.593) | Error |
|---|---|---|---|
| Drop at 1000 yards | 312.4" | 196.8" | 37% |
| Time of Flight | 1.847s | 1.512s | 18% |
| Wind Drift (10 mph) | 58.2" | 36.7" | 37% |
A shooter trusting the "enhanced" prediction would have aimed nearly 10 feet too low. The ML system was confidently wrong because it had no concept of reasonable bounds.
The fix was straightforward: reject any match where the reference BC differs from the input BC by more than 30%. If the user says their BC is 0.238, we don't believe a database entry claiming 0.593 is the "true" value—no matter how similar the bullet weights.
```python
# The fix: BC tolerance check
bc_ratio = matched_bc / user_input_bc
BC_TOLERANCE = 0.30  # 30% maximum deviation

if bc_ratio < (1 - BC_TOLERANCE) or bc_ratio > (1 + BC_TOLERANCE):
    logger.info(f"BC mismatch: user={user_input_bc:.4f}, matched={matched_bc:.4f}")
    return None  # Reject the match, don't guess
```
Ground Truth: Doppler-Derived Data
The foundation of any ML correction system is ground truth data. In exterior ballistics, the gold standard is doppler radar measurement—tracking a bullet's actual velocity throughout its flight path, not just at the muzzle.
Published ballistic coefficients are typically derived from limited testing under controlled conditions. Doppler data captures real-world performance across the entire velocity envelope, from supersonic through transonic to subsonic flight. This is particularly crucial in the transonic region (roughly Mach 0.9 to 1.1), where drag characteristics change dramatically and simple models break down.
We've built our correction models on an extensive dataset of doppler-derived measurements. This data captures the true behavior of projectiles across varying conditions—not the idealized behavior assumed by physics models or the optimistic values sometimes found in marketing materials.
Lapua, to their credit, publishes comprehensive doppler-derived BC data for their projectiles, making it freely available to the shooting community. Their data shows the velocity-dependent nature of BC that simpler models ignore:
The green curve shows actual BC measured via doppler radar across the velocity envelope. Notice the significant drop in the transonic region—this is real physics that a single published BC value (the dashed red line) cannot capture.
When our ML correction system encounters a bullet, it doesn't just look up a single BC. It retrieves or interpolates a velocity-dependent BC curve, then applies a bounded correction based on how similar bullets have performed relative to their published specifications.
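A minimal sketch of that lookup, assuming the reference data is stored as (velocity, BC) nodes from doppler measurements. The node values below are placeholders, not Lapua's figures; `np.interp` handles velocities between nodes, and the resulting correction is clamped to the same bounds discussed above:

```python
import numpy as np

# Velocity-dependent BC curve for a hypothetical reference bullet:
# (velocity in fps, measured G7 BC) nodes derived from doppler data
REF_VELOCITIES = np.array([3000.0, 2500.0, 2000.0, 1500.0, 1200.0, 1000.0])
REF_BC = np.array([0.305, 0.301, 0.294, 0.281, 0.262, 0.255])

MIN_CORRECTION, MAX_CORRECTION = 0.80, 1.25

def bc_at_velocity(velocity_fps: float) -> float:
    """Interpolate the doppler-derived BC at the current velocity."""
    order = np.argsort(REF_VELOCITIES)  # np.interp expects increasing x
    return float(np.interp(velocity_fps, REF_VELOCITIES[order], REF_BC[order]))

def corrected_bc(user_bc: float, published_bc: float, velocity_fps: float) -> float:
    """Scale the user's BC by how similar bullets actually fly at this speed."""
    raw = bc_at_velocity(velocity_fps) / published_bc
    correction = min(MAX_CORRECTION, max(MIN_CORRECTION, raw))
    return user_bc * correction
```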
The Correction Architecture
Our BC enhancement service follows a strict hierarchy:
- Physics first: Calculate trajectory using established equations of motion, drag models (G1, G7), and atmospheric corrections.
- Data lookup: Match the input bullet against our reference database using caliber, weight, and—critically—BC similarity.
- Bounded correction: If a match is found and the reference BC is within tolerance, compute a correction factor clamped to [0.8, 1.25].
- Confidence scoring: Report how confident we are in the enhancement, based on match quality.
- Graceful degradation: If no good match exists, return the original physics prediction with enhanced=false rather than guessing.
```python
class BCEnhancementService:
    # Correction factor bounds - these are NOT arbitrary
    MIN_CORRECTION = 0.80       # -20% maximum reduction
    MAX_CORRECTION = 1.25       # +25% maximum increase
    BC_MATCH_TOLERANCE = 0.30   # 30% BC similarity required

    def enhance_bc(self, user_bc: float, caliber: float,
                   weight: float, velocity: float) -> EnhancementResult:
        # Step 1: Find matching reference bullet
        match = self._find_reference_match(caliber, weight, user_bc)
        if match is None:
            return EnhancementResult(
                enhanced_bc=user_bc,
                applied=False,
                reason="No matching reference data"
            )

        # Step 2: Verify BC is within tolerance
        bc_ratio = match.reference_bc / user_bc
        if not (1 - self.BC_MATCH_TOLERANCE <= bc_ratio <= 1 + self.BC_MATCH_TOLERANCE):
            return EnhancementResult(
                enhanced_bc=user_bc,
                applied=False,
                reason=f"BC mismatch: {bc_ratio:.2f}x outside tolerance"
            )

        # Step 3: Compute bounded correction
        raw_correction = match.doppler_bc / match.published_bc
        correction = max(self.MIN_CORRECTION,
                         min(self.MAX_CORRECTION, raw_correction))
        enhanced_bc = user_bc * correction

        return EnhancementResult(
            enhanced_bc=enhanced_bc,
            applied=True,
            correction_factor=correction,
            confidence=match.confidence_score
        )
```
The key insight is the multiple validation gates. Each step can reject the enhancement and fall back to physics. The ML component only activates when we have high-quality reference data that closely matches the input.
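A hypothetical call, using the 6.5mm example from earlier, might look like the following. The parameter values (caliber in inches, a 2,700 fps muzzle velocity) are illustrative assumptions, not values from the incident:

```python
service = BCEnhancementService()
result = service.enhance_bc(user_bc=0.238, caliber=0.264, weight=140.0, velocity=2700.0)

if result.applied:
    print(f"Using enhanced BC {result.enhanced_bc:.4f} "
          f"(correction {result.correction_factor:.2f}x, confidence {result.confidence:.2f})")
else:
    # The caller keeps the original BC and runs pure physics
    print(f"Falling back to physics with BC 0.238: {result.reason}")
```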
Quantifying the Impact
How much does proper bounding actually matter? We analyzed prediction errors across our dataset, comparing three approaches:
- Physics only: Standard trajectory calculation with published BC
- Unbounded ML: ML corrections with no limits
- Bounded ML: ML corrections clamped to [0.8, 1.25]
The results are instructive:
- Physics only produces a bell curve centered around 14 inches of error—respectable, predictable, but leaving accuracy on the table.
- Bounded ML shifts the distribution left, reducing mean absolute error to 8.7 inches—a 39% improvement. The tight bounds prevent catastrophic failures while capturing real improvements.
- Unbounded ML has a lower peak error for some shots but develops a long tail of catastrophic failures. Mean error is actually worse than bounded ML (11.4" vs 8.7") because the outliers are so severe.
The unbounded approach wins on the easy cases but fails catastrophically on edge cases. The bounded approach trades a small amount of peak performance for dramatically improved worst-case behavior.
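The comparison itself needs nothing exotic. Here is a minimal evaluation sketch, not our actual analysis code, assuming you have arrays of raw ML correction factors, published BCs, ground-truth drops, and a function that predicts drop from a corrected BC (all names are illustrative):

```python
import numpy as np

def mean_abs_error(predicted_drops: np.ndarray, true_drops: np.ndarray) -> float:
    return float(np.mean(np.abs(predicted_drops - true_drops)))

def evaluate(raw_corrections: np.ndarray, published_bcs: np.ndarray,
             true_drops: np.ndarray, predict_drop, bounded: bool) -> float:
    """Mean absolute drop error with bounded vs unbounded corrections."""
    corrections = np.clip(raw_corrections, 0.80, 1.25) if bounded else raw_corrections
    predicted = np.array([predict_drop(bc * c)
                          for bc, c in zip(published_bcs, corrections)])
    return mean_abs_error(predicted, true_drops)
```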
When ML Should Admit Ignorance
Perhaps the most important principle in physics-first ML is knowing when to say "I don't know."
Our system encounters situations where it has no good answer:
- A wildcat cartridge with no reference data
- A bullet design we've never seen
- Input parameters that seem inconsistent or erroneous
- Conditions outside our training distribution
In each case, the correct response is to return to physics rather than hallucinate an answer. The fallback hierarchy:
```python
def get_best_prediction(inputs: BallisticInputs) -> Prediction:
    # Try enhanced prediction with doppler-derived corrections
    enhanced = bc_enhancement.enhance(inputs)
    if enhanced.applied and enhanced.confidence > 0.7:
        return run_trajectory(inputs, enhanced.bc)

    # Fall back to velocity-segmented BC if available
    if inputs.bullet in SEGMENTED_BC_DATABASE:
        segments = SEGMENTED_BC_DATABASE[inputs.bullet]
        return run_trajectory_segmented(inputs, segments)

    # Fall back to published BC with physics
    return run_trajectory(inputs, inputs.published_bc)
```
Each fallback is still grounded in physics. We never reach a state where the system is guessing without foundation.
Practical Implementation Considerations
Building a physics-first ML system requires discipline:
- Separate training and inference bounds: During training, you might observe correction factors outside [0.8, 1.25]. Record these as anomalies for investigation—they often indicate data quality issues—but don't let them influence your production bounds.
- Log rejection reasons: When the system refuses to apply ML enhancement, log why. These logs become valuable for identifying gaps in your reference database and cases where users have unrealistic expectations.
- Expose confidence to users: Don't hide uncertainty. Our API returns a confidence score with every enhanced prediction. Users who need guaranteed accuracy can filter for high-confidence results or fall back to pure physics.
- Validate against ground truth continuously: We compare predictions against new doppler measurements as they become available. Any systematic drift in correction factors triggers investigation.
- Version your bounds: The [0.8, 1.25] bounds aren't eternal truth—they're empirically derived from current data. As reference databases grow and ML models improve, bounds might tighten. Version them alongside your models (a minimal sketch follows this list).
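One way to make the bounds versionable is to treat them as configuration shipped with the model rather than constants buried in code. A minimal sketch, with field names and version strings invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CorrectionBounds:
    """Correction bounds versioned alongside the model that produced them."""
    model_version: str
    min_correction: float
    max_correction: float
    derived_from: str   # which reference dataset these bounds were fit against

BOUNDS_V1 = CorrectionBounds(
    model_version="bc-correct-2024.1",
    min_correction=0.80,
    max_correction=1.25,
    derived_from="doppler-reference-2024-03",
)
```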
The Broader Principle
This approach extends beyond ballistics. Any domain where physics provides a solid foundation can benefit from physics-first ML:
- Fluid dynamics: ML can correct for turbulence model errors, but Navier-Stokes remains the foundation.
- Structural engineering: ML can refine material property estimates, but equilibrium equations are non-negotiable.
- Orbital mechanics: ML can improve atmospheric drag estimates, but Kepler's laws aren't learned from data.
- Weather prediction: ML can enhance parameterizations, but conservation of mass, momentum, and energy are axiomatic.
In each case, the physics provides constraints that keep ML predictions physically plausible, while ML captures systematic errors and unmodeled effects that pure physics misses.
Conclusion
The current AI enthusiasm has created pressure to replace working systems with end-to-end neural networks. In scientific computing, this is often a mistake.
Physics models have centuries of validation behind them. They're interpretable, bounded, and fail gracefully. Machine learning excels at capturing complex patterns and correcting systematic errors, but it lacks physical intuition and can fail catastrophically on out-of-distribution inputs.
The synthesis—physics as foundation, ML as refinement, with bounded corrections that can be rejected entirely—gives us the best of both worlds. We get improved accuracy where data supports it, and guaranteed physical plausibility everywhere else.
When the ML system wants to apply a 2.49x correction, the bounded approach says "no, that's not a correction, that's a different bullet." When it has no reference data, it says "I'll defer to physics rather than guess." When conditions are within its training distribution, it says "here's an 8% correction I'm confident about."
That humility—knowing when to correct and when to abstain—is what separates useful ML from dangerous ML. Physics provides the guardrails. ML provides the refinement. Together, they're more accurate than either alone.
The ballistics engine described in this post is open source and available at ballistics.rs.
