Google’s new hurricane model was breathtakingly good this season

https://arstechnica.com/science/2025/11/googles-new-weather-model-impressed-during-its-first-hurricane-season/

Eric Berger · Nov 04, 2025

The Atlantic hurricane season is drawing to a close, and with the tropics quieting down for a winter slumber, the focus of forecasters turns to evaluating what worked and what did not during the preceding season.

This year, the answers are clear. Although Google DeepMind’s Weather Lab only started releasing cyclone track forecasts in June, the company’s AI forecasting service performed exceptionally well. By contrast, the Global Forecast System model, which is operated by the US National Weather Service, based on traditional physics, and run on powerful supercomputers, performed abysmally.

The official data comparing forecast model performance will not be published by the National Hurricane Center for a few months. However, Brian McNoldy, a senior researcher at the University of Miami, has already done some preliminary number crunching.

The results are stunning:

A little help in reading the graphic is in order. The chart sums up track forecast accuracy for all 13 named storms in the Atlantic Basin this season, measuring the mean position error at forecast lead times from 0 to 120 hours (five days). On this chart, the lower a line is, the better a model has performed.
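For readers who want the metric made concrete, here is a minimal sketch, not taken from the article or from any of these models, of how a mean position (track) error can be computed: for each forecast lead time, average the great-circle distance between the forecast and observed storm-center positions across all verifying forecasts. The function names, Earth-radius constant, and sample coordinates below are illustrative assumptions.

    # Minimal sketch: mean track error at one forecast lead time, in nautical miles.
    # All positions below are hypothetical, for illustration only.
    import math

    EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (assumed value)

    def great_circle_nm(lat1, lon1, lat2, lon2):
        # Haversine distance between two lat/lon points, in nautical miles.
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlam = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
        return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))

    def mean_track_error(pairs):
        # pairs: list of ((forecast_lat, forecast_lon), (observed_lat, observed_lon))
        errors = [great_circle_nm(f[0], f[1], o[0], o[1]) for f, o in pairs]
        return sum(errors) / len(errors)

    # Two hypothetical 120-hour verifications for one model:
    sample = [((25.0, -80.0), (26.5, -82.0)), ((30.0, -75.0), (28.8, -77.5))]
    print(f"Mean 120 h track error: {mean_track_error(sample):.0f} nm")

Repeating that average at each lead time, for each model, is what produces the curves on a chart like this one.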

A new champion

The dotted black line shows the average forecast error for official forecasts from the 2022 to 2024 seasons. What jumps out is that the United States’ premier global model, the GFS (denoted here as AVNI), is by far the worst-performing model. Meanwhile, at the bottom of the chart, in maroon, is the Google DeepMind model (GDMI), performing the best at nearly all forecast hours.

The difference in errors between the US GFS model and Google’s DeepMind model is remarkable. At five days, the Google forecast had a mean error of 165 nautical miles, compared to 360 nautical miles for the GFS, more than twice as large. This is the kind of error that causes forecasters to completely disregard one model in favor of another.

But there’s more. Google’s model was so good that it regularly beat the official forecast from the National Hurricane Center (OFCL), which is produced by human experts looking at a broad array of model data. The AI-based model also beat highly regarded “consensus models,” including the TVCN and HCCA products. For more information on various models and their designations, see here.

This early model comparison does not include the “gold standard” traditional, physics-based model produced by the European Centre for Medium-Range Weather Forecasts. However, the ECMWF model typically does not do better on hurricane track forecasts than the hurricane center or consensus models, which weight several different model outputs. So it is unlikely to have been superior to Google’s DeepMind model.

This will change forecasting forever

It’s worth noting that DeepMind also did exceptionally well at intensity forecasting, which predicts fluctuations in a hurricane’s strength. So in its first season, it nailed both hurricane tracks and intensity.

As a forecaster who has relied on traditional physics-based models for a quarter of a century, I find it difficult to convey how gobsmacking these results are. Going forward, it is safe to say that we will rely heavily on Google and other AI weather models, which are still relatively new and likely to get even better in the coming years.

“The beauty of DeepMind and other similar data-driven, AI-based weather models is how much more quickly they produce a forecast compared to their traditional physics-based counterparts that require some of the most expensive and advanced supercomputers in the world,” noted Michael Lowry, a hurricane specialist and author of the Eye on the Tropics newsletter, about the model performance. “Beyond that, these ‘smart’ models with their neural network architectures have the ability to learn from their mistakes and correct on-the-fly.”

What about the North American model?

As for the GFS model, it is difficult to explain why it performed so poorly this season. In the past, it has been, at worst, worthy of consideration when making a forecast. But this year, other forecasters and I often disregarded it.

“It’s not immediately clear why the GFS performed so poorly this hurricane season,” Lowry wrote. “Some have speculated the lapse in data collection from DOGE-related government cuts this year could have been a contributing factor, but presumably such a factor would have affected other global physics-based models as well, not just the American GFS.”

With the US government in shutdown mode, we probably cannot expect many answers soon. But it seems clear that the massive upgrade of the model’s dynamical core, which began in 2019, has largely been a failure. If the GFS was a little behind some competitors a decade ago, it is now falling further behind, and faster.