What is it worth to know the weather forecast 1 day earlier? That's not a hypothetical question: traditional forecasting systems have been improving their skill at a rate of about 1 day per decade. In other words, today's 6-day forecast is as accurate as the 5-day forecast was ten years ago. No one expects this rate of improvement to hold steady; it has to slow down eventually, right? Well, in the last couple of years, GPUs and modern deep learning have actually sped it up.
Since 2022 there has been a flurry of research into deep learning weather systems at companies like NVIDIA, Google DeepMind, Huawei, and Microsoft (some of them built by yours truly). These models have little to no built-in physics and learn to forecast purely from data. Astonishingly, this approach, done correctly, produces better forecasts than traditional simulations of the physics of our atmosphere.
Jayesh and Cris came face-to-face with this technology's potential while they were respectively leading the [ClimaX](https://arxiv.org/abs/2301.10343) and [Aurora](https://arxiv.org/abs/2405.13063) projects at Microsoft. The foundation models they built improved on ECMWF's forecasts, widely considered the gold standard in weather prediction, while using only a fraction of the available training data. Our mission at Silurian is to scale these models to their full potential and push them to the limits of physical predictability. Ultimately, we aim to model all infrastructure that is impacted by weather, including the energy grid, agriculture, logistics, and defense. Hence: simulate the Earth.
Before we do all that, this summer we've built our own foundation model, GFT (Generative Forecasting Transformer), a 1.5B parameter frontier model that simulates global weather up to 14 days ahead at approximately 11km resolution (https://www.ycombinator.com/launches/Lcz-silurian-simulate-t...). Despite the scarcity of extreme weather data in historical records, GFT performs extremely well at predicting 2024 hurricane tracks (https://silurian.ai/posts/001/hurricane_tracks). You can play around with our hurricane forecasts at https://hurricanes2024.silurian.ai. We visualize these using [cambecc/earth](https://github.com/cambecc/earth), one of our favorite open source weather visualization tools.
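We can't share GFT's internals in a comment, but for the curious: models in this family typically forecast autoregressively, predicting one step ahead and feeding the prediction back in. A minimal sketch of that loop (all names hypothetical, not our actual code):

```python
import torch

def rollout(model, state, steps, step_hours=6):
    """Autoregressively roll a learned weather model forward.

    state: tensor of atmospheric fields, e.g. (channels, lat, lon).
    Each model call predicts the state one time step ahead; feeding
    predictions back in extends the horizon (14 days = 56 x 6h steps).
    """
    states = [state]
    with torch.no_grad():
        for _ in range(steps):
            state = model(state)        # predict t + step_hours from t
            states.append(state)
    return torch.stack(states)          # (steps + 1, channels, lat, lon)

# e.g. a 14-day forecast at a 6-hour step:
# forecast = rollout(gft, initial_conditions, steps=14 * 24 // 6)
```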
We’re excited to be launching here on HN and would love to hear what you think!
One nit on your framing: NeuralGCM (https://www.nature.com/articles/s41586-024-07744-y), built by my team at Google, is currently at the top of the WeatherBench leaderboard and actually builds in lots of physics :).
We would love to see metrics from your model in WeatherBench for comparison. When/if you have them, please do reach out.
Re NeuralGCM, indeed, our post should have said "*most* of these models". Definitely proves that combining ML and physics models can work really well. Thanks for your comments!
The main takeaway, which gives me some hope:
But I will admit, I clicked the link to answer a more cynical question: why is Google funding a presumably super-expensive team of engineers and meteorologists to work on this without a related product in sight? The answer is both fascinating and boring; see https://research.google/philosophy/. Talk about a cool job! I hope such programs rode out the intimidation-layoff wave somewhat peacefully… (Former Google employee, but I have no inside knowledge; this is just my speculation from public data.)
Owning your own data and serving systems can also make previously impossible features possible. When I was a Google intern in 2007 I attended a presentation by someone who had worked on Google's then-new in-house routing system for Google Maps (the system that generates directions between two locations). Before, they licensed a routing system from a third party, and it was expensive ($) and slow.
The in-house system was cheap enough to be almost free in comparison, and it produced results in tens of milliseconds instead of many hundreds or even thousands of milliseconds. That allowed Google to build the amazing-at-the-time "drag to change the route" feature that would live-update the route to pass through the point under your cursor. It ran a new routing query many times per second.
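The "route through a dragged point" query decomposes neatly, since the route via a waypoint is just shortest(start, via) + shortest(via, end). A toy sketch with networkx (a made-up graph, nothing like Google's actual system):

```python
import networkx as nx

def route_via(graph, start, via, end, weight="time"):
    """Route from start to end constrained to pass through `via`:
    shortest(start -> via) + shortest(via -> end)."""
    first = nx.shortest_path(graph, start, via, weight=weight)
    second = nx.shortest_path(graph, via, end, weight=weight)
    return first + second[1:]  # drop the duplicated via node

# Toy road graph; edge weights are travel times in minutes.
g = nx.Graph()
g.add_weighted_edges_from(
    [("A", "B", 4), ("B", "C", 3), ("A", "D", 2), ("D", "C", 7), ("B", "D", 1)],
    weight="time",
)
print(route_via(g, "A", "D", "C"))  # ['A', 'D', 'B', 'C']
```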
Haha. The old NLP saying "every time I fire a linguist, my performance goes up", now applies to the physicists....
What else do you hope to simulate, if this becomes successful?
But it's non-trivial to scale these new techniques into the field. A major factor is the scale of interest: FEMA's FIRMs (Flood Insurance Rate Maps) are typically at 10m resolution, not 11km. That's a factor of roughly a thousand per linear dimension, or about a million in area.
They're selling height maps of South Africa, primarily for flood prediction for insurance companies.
Smart & friendly bunch.
They're a cool little team based in Copenhagen. Would be useful, for example, to look at the correlation between your weather data and regional energy production (solar and wind). Next level would be models to predict national hydro storage, but that is a lot more complex.
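Even a crude first pass would be informative; something like the following (pandas, with hypothetical file and column names) would show how much signal there is:

```python
import pandas as pd

# Hypothetical hourly series: forecast wind speed vs. regional wind generation.
weather = pd.read_csv("forecast.csv", parse_dates=["time"], index_col="time")
energy = pd.read_csv("generation.csv", parse_dates=["time"], index_col="time")

joined = weather[["wind_speed_100m"]].join(energy[["wind_mw"]], how="inner")
print(joined.corr())                     # Pearson correlation matrix

# Wind power scales roughly with speed cubed below rated speed,
# so a rank correlation is often more informative:
print(joined.corr(method="spearman"))
```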
My advice is to drop the grid itself to the bottom of the list, and I say this as someone who worked at a national grid operator as the primary grid analyst. You'll never get access to sufficient data, and your model will never be correct. You're better off starting from a national 'adequacy' level and working your way down based on information made available via market operators.
Signed,
A California Resident
Shameless plug: we recently built a demo that lets you search for objects in San Francisco using natural language. You can look for things like Tesla cars, dry patches, boats, and more. Link: https://demo.bluesight.ai/
We've tried using Clay embeddings, but we quickly found that they perform poorly for similarity search compared to embeddings produced by CLIP fine-tuned on OSM captions (SkyScript).
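For context on our setup: once you have tile embeddings, the search itself is just cosine similarity plus top-k. A rough sketch with open_clip (the checkpoint and file names here are stand-ins, not our production code):

```python
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # swap in a SkyScript fine-tune
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Precomputed, L2-normalized tile embeddings: (num_tiles, dim)
tile_emb = torch.load("tile_embeddings.pt")

def search(query, k=10):
    tokens = tokenizer([query])
    with torch.no_grad():
        text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    scores = tile_emb @ text_emb.squeeze(0)   # cosine similarities
    return scores.topk(k)                     # top-k scores + tile indices

print(search("tennis courts"))
```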
We did try to relate OSM tags to Clay embeddings, but it didn't scale well. We haven't given up, but we are rethinking it (https://github.com/Clay-foundation/earth-text). I think SatCLIP plus OSM is a better approach, or LLM embeddings mapped to Clay embeddings...
We tried searching for bridges, beaches, tennis courts, etc. It worked, but not well: the top of the ranking was filled with unrelated objects. We found that the similarity scores were bunched far too tightly (across ~200k tiles, values fell between 0.91 and 0.92, differing only at the fourth decimal place), so the encoder barely distinguishes between objects.
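One general mitigation for that kind of score bunching (a standard trick, not something Clay prescribes) is to mean-center and re-normalize the embeddings before computing cosine similarity, which removes the shared direction that dominates every score:

```python
import torch

def center_and_normalize(emb):
    """Subtract the mean embedding, then re-normalize each row.

    Raw embeddings often share one dominant direction, so all cosine
    similarities cluster (e.g. 0.91-0.92). Centering removes it and
    spreads the scores over a usable range.
    """
    emb = emb - emb.mean(dim=0, keepdim=True)
    return emb / emb.norm(dim=-1, keepdim=True)

tile_emb = center_and_normalize(torch.load("tile_embeddings.pt"))
```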
I believe that Clay can be used with additional fine-tuning for classification and segmentation, but standalone embeddings are pretty poor.
Check this: https://github.com/wangzhecheng/SkyScript. It is a dataset of OSM tags and satellite images. CLIP fine-tuned on that gives good embeddings for text-to-image search as well as image-to-image.
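Image-to-image search is the same pipeline through the image tower; roughly (again with open_clip, and hypothetical checkpoint/file names):

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # or a SkyScript fine-tune
)
tile_emb = torch.load("tile_embeddings.pt")     # (num_tiles, dim), L2-normalized

img = preprocess(Image.open("query_tile.png")).unsqueeze(0)
with torch.no_grad():
    q = model.encode_image(img)
q = q / q.norm(dim=-1, keepdim=True)
print((tile_emb @ q.squeeze(0)).topk(10))       # most similar tiles
```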
Disclosure: I work there.
https://climavision.com/
Best of luck, and thanks for taking the leap! Humanity will surely thank you. Hopefully one day you can claim a bit of the NWS’ $1.2B annual budget, or the US Navy’s $infinity budget — if you haven’t, definitely reach out to NRL and see if they’ll buy what you’re selling!
Oh and C) reach out if you ever find the need to contract out a naive, cheap, and annoyingly-optimistic full stack engineer/philosopher ;)
Re question 2: Simulations don't need to be explainable. Being able to simulate simply means being able to provide a reasonable evolution of a system given some potential set of initial conditions and other constraints. Even for physics-based simulations, when run at huge scale like with weather, it's debatable to what degree they are "interpretable".
Thanks for your questions!