Ten mistakes and one breakthrough: four months of hands-on experience

December 31, 2025

In September we thought we would calibrate the model in two weeks. In December we looked at numbers that finally converged — and realised the most valuable thing we earned was not multipliers or configs. It was the list of what does not work.

The report

Engineering blogs love victory stories: here is the problem, here is the fix, here is the chart going up. Our autumn looked different. Twenty-plus approaches, tested on real fires in Polissia. Most produced nothing. Several made things worse. And no scientific paper warned us in advance — because negative results in this field are almost never published.

That is unfair to whoever walks this path next. So here is our report — complete, with numbers, with dead ends. If you are building a fire forecast for your own country, this text will save you months. It cost us four.

How WildFiresUA works

Two paragraphs of context so the mistakes below make sense.

Fire evolution is cellular. The territory is split into cells; the model computes step by step how the fire front moves from cell to cell depending on fuel, wind and moisture. Fuel is described by the Anderson 13 fuel model classification — the de-facto world standard — which we built and validated for the Kyiv and Dnipropetrovsk oblasts, and are now extending to further regions of Ukraine.
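
As a rough illustration of cell-to-cell spread, here is a toy sketch (not the WildFiresUA engine). It computes fire arrival times on a small grid with Dijkstra's algorithm, assuming each cell carries a single hypothetical `rate` in metres per minute that already folds in fuel, wind and moisture. The grid values, the 30 m cell size and the function name are illustrative assumptions.

```python
import heapq

def arrival_times(rate, ignition, cell_size=30.0):
    """Minimum travel time (minutes) from the ignition cell to every cell.

    rate[r][c] is a hypothetical spread rate in m/min; 0 marks a non-burnable cell.
    """
    rows, cols = len(rate), len(rate[0])
    times = [[float("inf")] * cols for _ in range(rows)]
    r0, c0 = ignition
    times[r0][c0] = 0.0
    queue = [(0.0, r0, c0)]
    while queue:
        t, r, c = heapq.heappop(queue)
        if t > times[r][c]:
            continue  # stale queue entry, a faster path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and rate[nr][nc] > 0:
                # Time to cross into the neighbour at the neighbour's rate.
                nt = t + cell_size / rate[nr][nc]
                if nt < times[nr][nc]:
                    times[nr][nc] = nt
                    heapq.heappush(queue, (nt, nr, nc))
    return times

# Toy 3x4 grid: a slow column (0.5 m/min) and one non-burnable cell (0).
rate = [
    [2.0, 2.0, 0.5, 2.0],
    [2.0, 0.0, 0.5, 2.0],
    [2.0, 2.0, 0.5, 2.0],
]
times = arrival_times(rate, ignition=(1, 0))
burned_after_1h = sum(t <= 60 for row in times for t in row)
```

In this toy grid the slow column holds the front for the first hour, and the non-burnable cell is never reached. A real engine adds wind direction, slope and time-varying moisture on top of this skeleton.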

Weather comes from NOAA’s GFS global model: wind, temperature, humidity, precipitation. WindNinja is integrated for terrain wind downscaling; in parallel we are experimenting with assimilating local weather-station data, so the forecast rests not only on a global grid but on measurements near the fire.

Evolution needs three things, each a separate workstream:

the right start — ignition points from VIIRS and MODIS satellite detections (and other sources): if the start is displaced, everything downstream is mathematically correct and geographically useless;
the right place — the forecast must grow where the real fire grows, measured by an area-overlap index (IoU);
the right burned-area shape — the hardest of the three: reproducing not just the size but the contour of the scar, with its wind-driven tongues and stops at barriers.

How fire-forecast success is measured at all

The core metric of this report is IoU (Intersection over Union, a.k.a. the Jaccard index): the overlap area of the predicted and the real burn scar divided by the area of their union. One means pixel-perfect agreement. Zero means no overlap at all. Its sibling is the Sørensen index: the same idea on a different scale, never lower than IoU and strictly higher whenever the overlap is partial. The two convert directly, Sørensen = 2·IoU / (1 + IoU), so IoU 0.16 corresponds to Sørensen ≈0.28.
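
A minimal sketch of both indices, assuming each burn scar is represented as a set of `(row, col)` grid cells (our illustrative choice, not necessarily how the scars are stored in production):

```python
def iou(pred, obs):
    """Jaccard index of two sets of burned cells."""
    union = pred | obs
    # Two empty scars are treated as perfect agreement (a convention, not a law).
    return len(pred & obs) / len(union) if union else 1.0

def sorensen(pred, obs):
    """Sørensen (Dice) index of two sets of burned cells."""
    total = len(pred) + len(obs)
    return 2 * len(pred & obs) / total if total else 1.0

def sorensen_from_iou(j):
    """Convert a Jaccard value to the Sørensen scale."""
    return 2 * j / (1 + j)

# Two 4x4 scars, the prediction shifted two columns off the observation.
predicted = {(r, c) for r in range(4) for c in range(0, 4)}
observed = {(r, c) for r in range(4) for c in range(2, 6)}

print(iou(predicted, observed))            # 8 shared cells / 24 in the union
print(sorensen(predicted, observed))       # 2 * 8 / (16 + 16)
print(round(sorensen_from_iou(0.16), 2))   # the conversion quoted in the text
```

Half of each scar overlaps here, yet IoU is only one third. That is worth keeping in mind when reading the literature scores below.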

What counts as “normal” in this field? The honest numbers from the literature are sobering:

Cruz and Alexander (2013), analysing 49 validation datasets with 1,278 observations, showed that mean error of rate-of-spread models ranges from 20 to 310%, and that ±35% error is considered a good result under research conditions;
Filippi et al. (2014) ran four models over 80 real Mediterranean fires in operational mode, knowing only an approximate ignition point, as in real life. Conclusion: even the best physical models score low agreement on most fires; complex models beat empirical ones, but nobody gets close to pixel-perfect;
Sørensen values around 0.6 are classified in the literature as the boundary between moderate and substantial agreement; scores of 0.8–0.9 do appear — but in hindcast runs with exact perimeters, local weather and calibrated fuel, not in operational mode on global data.

We work precisely in operational mode: global weather, a satellite ignition point, no manual touch-ups.

Where the ten directions came from

We assembled them from the validation literature of the world’s systems — the Rothermel/FARSITE model family and its operational descendants: which levers authors turned, what they recommended, what they complained about. We added our own hypotheses for the specifics of Polissia. The backlog grew past twenty items; below are the ten that ate the most time and returned the least. In exactly the order we walked them.

Ten ways to not move forward

1. Synthetic weather: “an average bad day”

The engineer’s first temptation is to simplify the world. We calibrated on a template: wind 5 m/s, FFMC dryness 90 (Fine Fuel Moisture Code — a fine-fuel dryness index from the Canadian fire-danger system; 90 means very dry, easy to ignite), identical for every fire. Fast, convenient, reproducible. And scientifically empty: every real fire lives in its own weather, and we were training the model on weather that never existed. When we finally plugged in real archived GFS fields, some seasons shifted sevenfold. Sevenfold.

Lesson: no calibration on synthetics. Real weather from day one, even if it is slow and painful.

2. A month of turning one multiplier

It seemed logical: the model overpredicts area — slow it down. Multiplier down, run, multiplier down, run. A month of this meditation produced near-perfect area — and a shape that did not move a millimetre. IoU sat nailed at ~0.16. Because a multiplier scales the size of the blob, while the shape is drawn by ignition point, wind and fuel. We spent a month rubbing the wrong lamp.

Lesson: area and shape are independent axes. Separate levers, separate metrics, separate charts.
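
A toy sketch of why one size lever cannot fix shape. It assumes the real scar is a long wind-driven tongue (2×8 cells) and the model can only produce a square blob whose size we sweep; all shapes here are invented for illustration.

```python
def block(rows, cols):
    """Set of (row, col) cells covering a rows x cols rectangle anchored at (0, 0)."""
    return {(r, c) for r in range(rows) for c in range(cols)}

def iou(pred, obs):
    """Jaccard index of two sets of burned cells."""
    union = pred | obs
    return len(pred & obs) / len(union) if union else 1.0

observed = block(2, 8)  # hypothetical elongated scar, 16 cells

# Sweep the "multiplier": the blob side grows, the blob shape never changes.
sweep = {}
for n in range(1, 9):
    blob = block(n, n)
    area_ratio = len(blob) / len(observed)
    sweep[n] = (area_ratio, iou(blob, observed))

best_n = max(sweep, key=lambda n: sweep[n][1])
for n, (area_ratio, score) in sweep.items():
    print(f"side={n}  area ratio={area_ratio:.2f}  IoU={score:.3f}")
```

The 4×4 blob matches the observed area exactly (ratio 1.00) and is also the best the sweep can do on shape, with an IoU of only one third. Tracking area ratio and IoU in separate columns makes this kind of plateau visible early.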

3. “More accurate” fuel-moisture physics that broke everything

We replaced a simple empirical shortcut with the full American fuel-moisture system. Expected precision — got a massacre: large fires improved slightly, while small and medium fires were choked down to a tenth of their real area. Rolled back in a day. One day to roll back, a week to integrate.

Lesson: more complex physics is not better physics. Test every upgrade on all fire size classes before believing it.

4. Mountain wind for flat Polissia

A week integrating WindNinja — terrain wind downscaling, a serious tool Californian systems are proud of. Effect on Polissia with its 0.17% slope: ±1% of area, zero on shape. Flatland does not bend wind. It only seemed to us that it did. The tool stays in the stack — it will matter in the Carpathians, not here.

Lesson: the lever must match the landscape. A mountain tool on a plain is ballast with great documentation.

5. Overfitting to one perfect fire

We had one almost-flawless case: a large fire with clean satellite data, an exact ignition point, good weather. It was tempting to tune every parameter against it to the maximum IoU — and we did. On that case it looked beautiful. On the rest of the archive it got worse: what fit one fire perfectly dragged the model off and broke a dozen others. The classic trap — fitting the most convenient example instead of the whole distribution.

Lesson: one fire is not validation, it is an anecdote. Tune only against the entire archive at once.

6. Tuning a fire that never happens

We tuned the crown-fire model — canopy transition, crown density. Effect: exactly zero, to within noise. Polissia burns low: grass, reeds, litter, peat. Crown-fire physics simply never engaged in our runs. We were tuning an aircraft engine on a car.

Lesson: first understand how your region burns. Then tune the model layer that actually operates there.

7. Tiny screws better left alone

Two internal parameters of the spread-ellipse geometry. Combined effect on IoU: −0.006 to zero. Bonus: one of them, set “per official recommendation”, stopped the spread entirely — the simulated fire just refused to grow. A great way to lose two days hunting a bug that did not exist in our own code.

Lesson: engine-internal constants last, with an A/B test on every touch.

8. The literature favourite that collapsed

Perimeter reseeding — the darling of data-assimilation papers: restart the model daily from the observed front. Elegant on paper. In our engine, a front started from a perimeter collapsed to 4–5 hectares — versus 1,500+ from a point ignition. The initialisation conflicted with the ignition mechanics deep inside the engine. We found the workaround much later — a cloud of points instead of a contour — but a month went into the sand.

Lesson: before building on a technique from the literature — run the cheapest smoke test: can your engine do this at all.
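
One plausible reading of "a cloud of points instead of a contour", sketched under our own assumptions: fill the observed perimeter with a regular grid of ignition points and hand those to the engine as ordinary point ignitions. The function names, the spacing and the rectangular perimeter are hypothetical; the actual workaround may differ.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def ignition_cloud(polygon, spacing):
    """Regular grid of points (in map units) that fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    y = min(ys) + spacing / 2
    while y < max(ys):
        x = min(xs) + spacing / 2
        while x < max(xs):
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
            x += spacing
        y += spacing
    return points

# Hypothetical 400 m x 200 m observed perimeter, one ignition every 100 m.
perimeter = [(0, 0), (400, 0), (400, 200), (0, 200)]
cloud = ignition_cloud(perimeter, spacing=100)
print(len(cloud), cloud[:3])
```

The appeal of this form is that it reuses the point-ignition path the engine already handles well, so the cheap smoke test is simply whether a handful of such points grows the way a single ignition does.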

9. The ceiling we refused to believe in

A systematic grid sweep of the speed multiplier — honest, methodical, to the end. IoU grew, grew, and hit a ceiling of 0.20. Beyond it, any further reduction started killing the fire’s first day: the model could not ramp up where the real fire was already running. The grid proved mathematically what we felt in our bones: a static global multiplier will never deliver IoU 0.5. Never. That was not bad news — that was the first honest news.

Lesson: if a parameter has a ceiling, the parameter is not bad. It is the wrong lever.

10. The fuel map that lied to all of us for three months

The most expensive mistake of the autumn — and the most instructive. A global land-cover crosswalk assigned Polissia’s peatlands the class “fast dry grass”. Peat does not burn like that. Peat smoulders — slowly, stubbornly, for weeks. The model flew across “grass” with a tenfold area overprediction, and for months we treated the symptoms — multipliers, screws, sweeps from items 2, 7 and 9 — instead of asking what the model was burning on.

The revelation took one evening and two rasters: the real burn scar from Sentinel-2 imagery, overlaid on the peatland map. Match — one hundred percent. The entire test fire, down to the last pixel, sat on peat. I remember the silence in the team chat after that screenshot. Three months of tuning versus one data layer we could have overlaid in September.

Lesson: before calibrating the model — check what fuel it is burning on. One peatland raster was worth more than a quarter of tuning.
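
The overlay check itself is a few lines. A minimal sketch, assuming the burn scar and the peatland map are already resampled to the same grid and stored as 0/1 rasters (nested lists here; the toy values are invented):

```python
def fraction_on_mask(scar, mask):
    """Share of burned cells (1 in scar) that fall on masked cells (1 in mask)."""
    burned = on_mask = 0
    for scar_row, mask_row in zip(scar, mask):
        for s, m in zip(scar_row, mask_row):
            if s:
                burned += 1
                on_mask += bool(m)
    return on_mask / burned if burned else 0.0

# Toy rasters on a shared 3x4 grid.
scar = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
peat = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
print(fraction_on_mask(scar, peat))  # every burned cell sits on peat
```

Running the same function against each fuel class in the crosswalk gives a quick table of what the fire actually burned on, before any parameter is touched.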

Ten items. Four months. Not one chart went up. If the story ended here, it would be a text about a team honestly losing an autumn. But there was an eleventh step.

Step 11. Fire sleeps at night. The model did not.

The breakthrough came not from a new paper and not from a clever algorithm. It came from a question so simple it was embarrassing to ask aloud: what does our fire do at night?

A real fire almost stops at night: temperature falls, humidity rises, grass takes on moisture, the front freezes till morning. The engine’s default drives the fire at full speed twenty-four hours a day. Every night hour inflated area that never existed. Every multi-day forecast was bloated by this invisible night driving.

We switched on a diurnal burning window: full speed at the daytime peak, about five percent at night. One config. And the results table we had stared at for four months without hope finally answered:

every second false hectare vanished from the forecast. Area overprediction fell 52%: from 6.5× to 3.1× on the test fire, from 11.4× to 5.5× in the hard configurations. One parameter did what three months of tuning could not;
shape agreement jumped 119%: testbed IoU went from 0.13 to almost 0.28, more than double in a single step. For scale: the best of the previous twenty levers had delivered single-digit percentage gains;
across the full 2020–2025 fire archive the diurnal cycle became the largest single contributor to the final median IoU of 0.16 — Sørensen ≈0.28 in the scale most of the world literature reports. For operational mode on global data — where Filippi et al. recorded low scores for most models — this is an honest working level, and exactly the point from which the next one, 0.3, is visible;
the full stack's error shrank roughly 2.6-fold. Together with non-burnable barriers (roads, water, cropland), total overprediction fell from 31× to 12×: minus 61%, a forecast 2.6 times closer to reality than the engine’s out-of-the-box state at the start of autumn.
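
A minimal sketch of a diurnal burning window. The only facts taken from our setup are "full speed at the daytime peak" and "about five percent at night"; the half-sine shape, the 08:00 to 20:00 window and the parameter names are assumptions made for this example.

```python
import math

def diurnal_factor(hour, day_start=8.0, day_end=20.0, night_floor=0.05):
    """Spread-rate multiplier for a local hour in [0, 24).

    Outside the daytime window the fire creeps at night_floor; inside it the
    factor follows a half-sine that reaches 1.0 at the middle of the window.
    """
    if not day_start <= hour <= day_end:
        return night_floor
    phase = (hour - day_start) / (day_end - day_start)  # 0 at start, 1 at end
    return night_floor + (1.0 - night_floor) * math.sin(math.pi * phase)

# Compare "full-speed hours" per day: engine default vs. the windowed version.
constant_24h = 24.0
with_window = sum(diurnal_factor(h + 0.5) for h in range(24))

print(diurnal_factor(2.0), diurnal_factor(14.0))
print(round(with_window, 2), "of", constant_24h, "full-speed hours per day")
```

The toy ratio of full-speed hours is not comparable to the area figures above, because burned area does not scale linearly with spread time. It only shows how much "driving time" a 24-hour default adds to every simulated day.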

Why did this work when twenty other levers did not? Because all the others adjusted space: where it burns, how fast, in which direction. The diurnal cycle is the only one that adjusted time: when it burns at all. For the multi-day fires of Polissia — where peat smoulders for weeks — the time axis turned out to be the main one. We were searching for the answer in maps and multipliers, and it was in the clock.

What we carry into 2026

Peat-aware physics. A dedicated fuel class for peatlands — drained and undrained — with live moisture from satellite soil data. Treating the root, not the symptoms, of mistake 10.
Thermal-detection classification. Telling the advancing front from the smouldering core in VIIRS/MODIS data: the front shapes the forecast, the core drives smoke and re-ignitions. Today the model sees them as the same thing.
12-hour assimilation. Re-anchoring the forecast to every satellite overpass — via the ignition point cloud, the workaround born from mistake 8. Plus local weather-station data, so the model’s weather is more than a global GFS grid.
Stratified validation. Peat and mineral-soil fires in separate tables. Different physics must not be averaged into one global multiplier — we already know how that ends.
New regions. The Anderson fuel map, proven for Kyiv and Dnipropetrovsk oblasts, rolls out to further regions of Ukraine.
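
For the stratified tables, a small sketch of the bookkeeping. The scores below are made up purely to show the mechanics and are not WildFiresUA results; the class labels are likewise placeholders.

```python
from collections import defaultdict
from statistics import median

def stratified_median_iou(results):
    """results: iterable of (soil_class, iou) pairs -> {soil_class: median IoU}."""
    groups = defaultdict(list)
    for soil_class, score in results:
        groups[soil_class].append(score)
    return {soil_class: median(scores) for soil_class, scores in groups.items()}

# Made-up per-fire scores, for illustration only.
results = [
    ("peat", 0.05), ("peat", 0.10), ("peat", 0.08),
    ("mineral", 0.30), ("mineral", 0.26), ("mineral", 0.34),
]
table = stratified_median_iou(results)
pooled = median(score for _, score in results)

print(table)   # one median per class
print(pooled)  # the single pooled number hides both classes
```

In this invented example the pooled median describes neither group, which is the failure mode the separate tables are meant to prevent.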

Four months ago we wanted a pretty IoU. Now we want correct physics — and we know the pretty IoU will follow. That is probably the main result of the autumn, and it does not fit into any metric.

Work with us

WildFiresUA is a Ukrainian climate-tech team working on air quality, building an operational wildfire detection and forecasting system on open data — Copernicus Sentinel-2, VIIRS/MODIS, GFS — validated against the world’s published benchmarks, on real fires, in a country at war.

We are open to:

acceleration programs and research collaborations in climate tech, Earth observation and disaster resilience;
strategic partnerships with teams and organisations that need our competence in fire spread modelling, satellite data pipelines and operational validation.

Contact: yourairtest.com — contact page

References

Cruz M.G., Alexander M.E. Uncertainty associated with model predictions of surface and crown fire rates of spread. Environmental Modelling & Software, 2013, 47:16–28. doi.org
Filippi J.-B., Mallet V., Nader B. Evaluation of forest fire models on a large observation database. Natural Hazards and Earth System Sciences, 2014, 14:3077–3091. nhess.copernicus.org
Duff T.J., Chong D.M., Tolhurst K.G. Indices for the evaluation of wildfire spread simulations using contemporaneous predictions and observations of burnt area. Environmental Modelling & Software, 2016, 83:276–285. doi.org
Anderson H.E. Aids to Determining Fuel Models for Estimating Fire Behavior. USDA Forest Service, GTR INT-122, 1982. research.fs.usda.gov
Rothermel R.C. A Mathematical Model for Predicting Fire Spread in Wildland Fuels. USDA Forest Service, Research Paper INT-115, 1972. research.fs.usda.gov
Aragoneses E. et al. Comparison of Different Models to Simulate Forest Fire Spread: A Case Study. Forests, 2024, 15(3):563. mdpi.com
NASA FIRMS — Fire Information for Resource Management System (VIIRS/MODIS active fires). firms.modaps.eosdis.nasa.gov
NOAA — Global Forecast System (GFS). ncei.noaa.gov
WindNinja — Missoula Fire Sciences Laboratory. github.com/firelab