What is it worth to know the weather forecast 1 day earlier? That's not a hypothetical question: traditional forecasting systems have been improving their skill at a rate of about 1 day per decade. In other words, today's 6-day forecast is as accurate as the 5-day forecast was ten years ago. No one expects this rate of improvement to hold steady; it has to slow down eventually, right? Well, in the last couple of years, GPUs and modern deep learning have actually sped it up.
Since 2022 there has been a flurry of research into deep learning weather systems at companies like NVIDIA, Google DeepMind, Huawei, and Microsoft (some of them built by yours truly). These models have little to no built-in physics and learn to forecast purely from data. Astonishingly, this approach, done correctly, produces better forecasts than traditional simulations of the physics of our atmosphere.
Jayesh and Cris came face-to-face with this technology's potential while they were respectively leading the [ClimaX](https://arxiv.org/abs/2301.10343) and [Aurora](https://arxiv.org/abs/2405.13063) projects at Microsoft. The foundation models they built improved on ECMWF's forecasts, widely considered the gold standard in weather prediction, while using only a fraction of the available training data. Our mission at Silurian is to scale these models to their full potential and push them to the limits of physical predictability. Ultimately, we aim to model all infrastructure that is impacted by weather, including the energy grid, agriculture, logistics, and defense. Hence: simulate the Earth.
Before we do all that, this summer we've built our own foundation model, GFT (Generative Forecasting Transformer), a 1.5B parameter frontier model that simulates global weather up to 14 days ahead at approximately 11km resolution (https://www.ycombinator.com/launches/Lcz-silurian-simulate-t...). Despite the scarcity of extreme weather data in historical records, GFT performs extremely well at predicting 2024 hurricane tracks (https://silurian.ai/posts/001/hurricane_tracks). You can play around with our hurricane forecasts at https://hurricanes2024.silurian.ai. We visualize these using [cambecc/earth](https://github.com/cambecc/earth), one of our favorite open source weather visualization tools.
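We can't share GFT's internals in a comment, but for the curious: models in this family typically forecast autoregressively, predicting one step ahead and feeding the prediction back in. A minimal sketch of that loop (all names hypothetical, not our actual code):

```python
import torch

def rollout(model, state, steps, step_hours=6):
    """Autoregressively roll a learned weather model forward.

    state: tensor of atmospheric fields, e.g. (channels, lat, lon).
    Each model call predicts the state one time step ahead; feeding
    predictions back in extends the horizon (14 days = 56 x 6h steps).
    """
    states = [state]
    with torch.no_grad():
        for _ in range(steps):
            state = model(state)        # predict t + step_hours from t
            states.append(state)
    return torch.stack(states)          # (steps + 1, channels, lat, lon)

# e.g. a 14-day forecast at a 6-hour step:
# forecast = rollout(gft, initial_conditions, steps=14 * 24 // 6)
```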
We’re excited to be launching here on HN and would love to hear what you think!
One nit on your framing: NeuralGCM (https://www.nature.com/articles/s41586-024-07744-y), built by my team at Google, is currently at the top of the WeatherBench leaderboard and actually builds in lots of physics :).
We would love to see metrics from your model in WeatherBench for comparison. When/if you have them, please do reach out.
Re NeuralGCM, indeed, our post should have said "*most* of these models". Definitely proves that combining ML and physics models can work really well. Thanks for your comments!
The main takeaway, which gives me some hope:
But I will admit, I clicked the link to answer a more cynical question: why is Google funding a presumably super-expensive team of engineers and meteorologists to work on this without a related product in sight? The answer is both fascinating and boring; see https://research.google/philosophy/. Talk about a cool job! I hope such programs rode out the intimidation-layoff wave somewhat peacefully… (Former Google employee, but I have no inside knowledge; this is just my speculation from public data.)
Owning your own data and serving systems can also make previously impossible features possible. When I was a Google intern in 2007 I attended a presentation by someone who had worked on Google's then-new in-house routing system for Google Maps (the system that generates directions between two locations). Before, they licensed a routing system from a third party, and it was expensive ($) and slow.
The in-house system was cheap enough to be almost free in comparison, and it produced results in tens of milliseconds instead of many hundreds or even thousands of milliseconds. That allowed Google to build the amazing-at-the-time "drag to change the route" feature that would live-update the route to pass through the point under your cursor. It ran a new routing query many times per second.
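The "route through a dragged point" query decomposes neatly, since the route via a waypoint is just shortest(start, via) + shortest(via, end). A toy sketch with networkx (a made-up graph, nothing like Google's actual system):

```python
import networkx as nx

def route_via(graph, start, via, end, weight="time"):
    """Route from start to end constrained to pass through `via`:
    shortest(start -> via) + shortest(via -> end)."""
    first = nx.shortest_path(graph, start, via, weight=weight)
    second = nx.shortest_path(graph, via, end, weight=weight)
    return first + second[1:]  # drop the duplicated via node

# Toy road graph; edge weights are travel times in minutes.
g = nx.Graph()
g.add_weighted_edges_from(
    [("A", "B", 4), ("B", "C", 3), ("A", "D", 2), ("D", "C", 7), ("B", "D", 1)],
    weight="time",
)
print(route_via(g, "A", "D", "C"))  # ['A', 'D', 'B', 'C']
```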
Haha. The old NLP saying "every time I fire a linguist, my performance goes up", now applies to the physicists....
What else do you hope to simulate, if this becomes successful?
But it's non-trivial to scale these new techniques into the field. A major factor is the scale of interest: FEMA's FIRMs (Flood Insurance Rate Maps) are typically at 10m resolution, not 11km. That's a factor of roughly a thousand per linear dimension, or about a million in area.
They're selling height maps of South Africa, primarily for flood prediction for insurance companies.
Smart & friendly bunch.
They're a cool little team based in Copenhagen. Would be useful, for example, to look at the correlation between your weather data and regional energy production (solar and wind). Next level would be models to predict national hydro storage, but that is a lot more complex.
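Even a crude first pass would be informative; something like the following (pandas, with hypothetical file and column names) would show how much signal there is:

```python
import pandas as pd

# Hypothetical hourly series: forecast wind speed vs. regional wind generation.
weather = pd.read_csv("forecast.csv", parse_dates=["time"], index_col="time")
energy = pd.read_csv("generation.csv", parse_dates=["time"], index_col="time")

joined = weather[["wind_speed_100m"]].join(energy[["wind_mw"]], how="inner")
print(joined.corr())                     # Pearson correlation matrix

# Wind power scales roughly with speed cubed below rated speed,
# so a rank correlation is often more informative:
print(joined.corr(method="spearman"))
```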
My advice is to drop the grid itself to the bottom of the list, and I say this as someone who worked at a national grid operator as the primary grid analyst. You'll never get access to sufficient data, and your model will never be correct. You're better off starting from a national 'adequacy' level and working your way down based on information made available via market operators.
Signed,
A California Resident
Shameless plug: we recently built a demo that lets you search for objects in San Francisco using natural language. You can look for things like Tesla cars, dry patches, boats, and more. Link: https://demo.bluesight.ai/
We've tried using Clay embeddings, but we quickly found that they perform poorly for similarity search compared to embeddings produced by CLIP fine-tuned on OSM captions (SkyScript).
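For context on our setup: once you have tile embeddings, the search itself is just cosine similarity plus top-k. A rough sketch with open_clip (the checkpoint and file names here are stand-ins, not our production code):

```python
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # swap in a SkyScript fine-tune
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Precomputed, L2-normalized tile embeddings: (num_tiles, dim)
tile_emb = torch.load("tile_embeddings.pt")

def search(query, k=10):
    tokens = tokenizer([query])
    with torch.no_grad():
        text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    scores = tile_emb @ text_emb.squeeze(0)   # cosine similarities
    return scores.topk(k)                     # top-k scores + tile indices

print(search("tennis courts"))
```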
We did try to relate OSM tags to Clay embeddings, but it didn't scale well. We haven't given up, but we are rethinking it (https://github.com/Clay-foundation/earth-text). I think SatCLIP plus OSM is a better approach, or LLM embeddings mapped to Clay embeddings...
We tried searching for bridges, beaches, tennis courts, etc. It worked, but not well: the top of the ranking was filled with unrelated objects. We found that the similarity scores were bunched far too tightly (across ~200k tiles, values fell between 0.91 and 0.92, differing only at the fourth decimal place), so the encoder barely distinguishes between objects.
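One general mitigation for that kind of score bunching (a standard trick, not something Clay prescribes) is to mean-center and re-normalize the embeddings before computing cosine similarity, which removes the shared direction that dominates every score:

```python
import torch

def center_and_normalize(emb):
    """Subtract the mean embedding, then re-normalize each row.

    Raw embeddings often share one dominant direction, so all cosine
    similarities cluster (e.g. 0.91-0.92). Centering removes it and
    spreads the scores over a usable range.
    """
    emb = emb - emb.mean(dim=0, keepdim=True)
    return emb / emb.norm(dim=-1, keepdim=True)

tile_emb = center_and_normalize(torch.load("tile_embeddings.pt"))
```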
I believe that Clay can be used with additional fine-tuning for classification and segmentation, but standalone embeddings are pretty poor.
Check this: https://github.com/wangzhecheng/SkyScript. It is a dataset of OSM tags and satellite images. CLIP fine-tuned on that gives good embeddings for text-to-image search as well as image-to-image.
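Image-to-image search is the same pipeline through the image tower; roughly (again with open_clip, and hypothetical checkpoint/file names):

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # or a SkyScript fine-tune
)
tile_emb = torch.load("tile_embeddings.pt")     # (num_tiles, dim), L2-normalized

img = preprocess(Image.open("query_tile.png")).unsqueeze(0)
with torch.no_grad():
    q = model.encode_image(img)
q = q / q.norm(dim=-1, keepdim=True)
print((tile_emb @ q.squeeze(0)).topk(10))       # most similar tiles
```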
Disclosure: I work there.
https://climavision.com/
Best of luck, and thanks for taking the leap! Humanity will surely thank you. Hopefully one day you can claim a bit of the NWS’ $1.2B annual budget, or the US Navy’s $infinity budget — if you haven’t, definitely reach out to NRL and see if they’ll buy what you’re selling!
Oh and C) reach out if you ever find the need to contract out a naive, cheap, and annoyingly-optimistic full stack engineer/philosopher ;)
Re question 2: Simulations don't need to be explainable. Being able to simulate simply means being able to provide a reasonable evolution of a system given some potential set of initial conditions and other constraints. Even for physics-based simulations, when run at huge scale like with weather, it's debatable to what degree they are "interpretable".
Thanks for your questions!