Posted by tndl a year ago
Launch HN: Sorcerer (YC S24) – Weather balloons that collect more data
Hey HN! We’re Max, Alex, and Austin, the team behind Sorcerer (https://sorcerer.earth). Sorcerer builds weather balloons that last for over six months, collecting 1000x more data per dollar and reaching previously inaccessible regions.

In 1981, weather disasters caused $3.5 billion in damages in the United States. In 2023, that number was $94.9 billion (https://www.ncei.noaa.gov/access/billions/time-series). The National Weather Service spends billions annually on its network of weather balloons, satellites, and aircraft sensors – generating hundreds of terabytes of data every day. This data, called observation data, is fed into massive supercomputers running advanced physics to produce global weather forecasts. Despite this cost, there are still places in the US where we don't know what the temperature will be two days from now: https://www.washingtonpost.com/climate-environment/interacti.... And for the rest of the world that lacks weather infrastructure? There’s always the Weather Rock: https://en.wikipedia.org/wiki/Weather_rock.

The most important data for these forecasts come from vertical data ‘slices’ of the atmosphere, called soundings. Every day 2,500 single-use latex radiosondes are launched across the globe to collect these soundings. They stay aloft for about two hours before popping and falling back to Earth. Launch sites for these systems are sparse in Latin America and Africa, and they’re completely non-existent over oceans. This leaves about 80% of the globe with inadequate weather data for accurate predictions.

The coverage gap became painfully evident to Max and Alex during their time at Urban Sky. While building balloons for high-altitude aerial imaging, they kept running into a problem: no matter what weather forecast they used, they couldn’t get accurate wind predictions for the upper atmosphere. They tried all of the free and commercial forecast products, but none were accurate enough. Digging into it more, they learned that a big part of the problem was the lack of high-quality in-situ data at those altitudes.

To solve this problem, our systems ascend and descend between sea level and 65,000ft several times a day to collect vertical data soundings. Each vehicle (balloon + payload) weighs less than a pound and can be launched from anywhere in the world, per FAA and ICAO regulations. Here’s one we launched from Potrero Hill in SF (https://youtu.be/75fN5WpRWH0) and here’s another near the Golden Gate Bridge (https://youtu.be/7yLmzLPUFVQ). Although we can’t “drive” these balloons laterally, we can use opposing wind layers to target or avoid specific regions. Here’s what a few simulated flight paths look like, to give you an idea: https://youtu.be/F_Di8cjaEUY

Our payload uses a satellite transceiver for communications and a small, thin film solar panel array to generate power. In addition to the weather data, we also get real-time telemetry from the vehicles, which we use to optimize their flight paths. This includes maintaining the optimal spacing between balloons and steering them to a recovery zone at the end of their lifespan so we can recycle them.

These systems spend most of their time in the stratosphere, which is an extremely unforgiving environment. We’ll often see temperatures as low as -80°C while flying near the equator. Throughout the day, the vehicles go through extreme temperature cycling as they ascend and descend through the atmosphere. We’ll often encounter 100mph+ wind shears near the boundary with the troposphere (the tropopause) that can rip apart the balloon envelope. These conditions make the stratosphere a very difficult place to deploy to prod.

The real magic of what we’re building will come into play when we have hundreds of these systems in the air over data-sparse regions. But even now, we can do useful and interesting things with them. Some of our early customers are companies who fly very big, very expensive things into the stratosphere. They use our balloons to give them a clear idea of what conditions are ahead of their operations, and we’re working on a forecast product specifically designed for the stratosphere.

The combination of long duration and low cost is novel. We can theoretically maintain thousands of balloons in the atmosphere at any given time for a tenth of the cost of one useful weather satellite. We’re also using the data we collect to train AI models that produce forecasts with better accuracy than existing numerical (supercomputer) forecasts. Because we’re collecting totally unique data over areas that lack observations, our models will maintain a consistent edge over models that are trained only on open data.

We’re really excited to be launching Sorcerer here with you! We’d love to hear what you think. And if you find one of our balloons in the Bay Area: Sorry! It’s still a work in progress (and please get it back to us).

I’ll leave you all with a bonus video of Paul Buchheit launching one of our balloons, which we thought was pretty cool: https://www.youtube.com/watch?v=-sngF9VvDzg

jviotti · a year ago
Very cool! How are the balloons transferring telemetry back to Earth for analysis, etc.?

Asking because my research at the University of Oxford was around hyper space-efficient data transfer from remote locations for a fraction of the price.

The result was an award-winning technology (https://jsonbinpack.sourcemeta.com) to serialise plain JSON that was proven to be more space-efficient than every tested alternative (including Protocol Buffers, Apache Avro, ASN.1, etc) in every tested case (https://arxiv.org/abs/2211.12799).

If it's interesting, I'd love to connect and discuss (jv@jviotti.com) how at least the open-source offering could help.

jviotti · a year ago
It surprised me how popular this message got. I love nerding out about binary serialization and space-efficiency, and it's great to see I'm not the only one :)

If you want to go deeper, I published two (publicly available) in-depth papers studying the current state of JSON-compatible binary serialization that you might enjoy. They study in a lot of detail technologies like Protocol Buffers, CBOR, MessagePack, and others mentioned in the thread:

- https://arxiv.org/abs/2201.02089

- https://arxiv.org/abs/2201.03051

Hope they are useful!

leeoniya · a year ago
> JSON BinPack is space-efficient, but what about runtime-efficiency?

> When transmitting data over the Internet, time is the bottleneck, making computation essentially free in comparison.

I thought this was an odd sales pitch from the jsonbinpack site, given that a central use case is IoT, which frequently runs on batteries or in power-constrained environments where there's no such thing as "essentially free"

jviotti · a year ago
Fair point! "Embedded" and "IoT" are overloaded terms. For example, you find "IoT" devices ranging from extremely low-powered microcontrollers to Linux-based ones with plenty of power, and they are all considered "embedded". I'll take note and improve the wording.

That said, the production-ready implementation of JSON BinPack is designed to run on low-powered devices and still provide those same benefits.

A lot of the current work is happening at https://github.com/sourcemeta/jsontoolkit, a dependency of JSON BinPack that implements a state-of-the-art JSON Schema compiler (I'm a TSC member of JSON Schema, btw). It lets JSON BinPack do fast and efficient schema evaluation on low-powered devices, unlike the current prototype, which has to evaluate the schema at encoding time to resolve logical schema operators. Just an example of the complex runtime-efficiency tracks we are pursuing.
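
To illustrate why compiling the schema ahead of time matters, here's a rough Python analogy using the jsonschema package (not jsontoolkit's API, and the sounding-style message is made up): the gap between re-processing a schema per message and building a validator once is easy to measure.

  # Hypothetical telemetry message; the schema is either evaluated from
  # scratch on every call, or via a validator built once up front.
  import timeit
  from jsonschema import Draft202012Validator, validate

  schema = {
      "type": "object",
      "properties": {"temp_c": {"type": "number"}, "alt_m": {"type": "integer"}},
      "required": ["temp_c", "alt_m"],
  }
  msg = {"temp_c": -71.5, "alt_m": 19812}

  # Naive: the schema is re-processed on every call.
  per_call = timeit.timeit(lambda: validate(msg, schema), number=1000)

  # Built once, reused for every message.
  validator = Draft202012Validator(schema)
  reused = timeit.timeit(lambda: validator.is_valid(msg), number=1000)

  print(f"validate() each time: {per_call:.3f}s; prebuilt validator: {reused:.3f}s")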

ok_dad · a year ago
> batteries or power-constrained environments

I would imagine that CPUs are much more efficient than a satellite transmitter, probably? I guess you'd have to balance the additional computational energy required vs. the savings in energy from less transmitting.

freeone3000 · a year ago
For sure, but radio transmitter time is almost always much more expensive than CPU time! It’s 4mA-20mA vs 180mA on an ESP32; having the radio on is a 160mA load! As long as every seven milliseconds compressing saves a millisecond of transmission, your compression algorithm comes out ahead.
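
Back-of-the-envelope, taking those (rough) currents as given:

  # Break-even point for compress-then-transmit on an ESP32-class board,
  # using the approximate currents quoted above (assumed, not measured).
  CPU_MA = 20.0           # draw while the CPU is busy compressing
  RADIO_EXTRA_MA = 160.0  # extra draw while the radio is keyed on

  # Spending t ms of CPU pays off if it saves at least
  # t * CPU_MA / RADIO_EXTRA_MA ms of radio-on time.
  break_even = RADIO_EXTRA_MA / CPU_MA
  print(f"compression wins if 1 ms of airtime saved costs < {break_even:.0f} ms of CPU")

The same accounting applies to a satellite link; only the numbers get bigger.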
tndl · a year ago
Let's definitely talk; we're using protobufs right now. I'll send an email.
lajr · a year ago
This looks promising! One of the important aspects of protocol buffers, avro etc is how they deal with evolving schemas and backwards/forward compatibility. I don't see anything in the docs addressing that. Is it possible for old services to handle new payloads / new services to handle old payloads or do senders and receivers need to be rewritten each time the schema changes?
jviotti · a year ago
Good question! Unlike Protocol Buffers and Apache Avro, which each have their own specialised schema language created by them, for them, JSON BinPack taps into the popular and industry-standard JSON Schema language.

That means that you can use any tooling/approach from the wide JSON Schema ecosystem to manage schema evolution. A popular one from the decentralised systems world is Cambria (https://www.inkandswitch.com/cambria/).

That said, I do recognise that schema evolution tech in the JSON Schema world is not as great as it should be. I'm a TSC member of JSON Schema, and a few of us are definitely thinking hard about this problem and trying to make it even better than the competition.

michaelmior · a year ago
A lot of people already think about this problem with respect to API compatibility for REST services, for example using the OpenAPI spec. It's possible to have a JSON Schema which is backwards compatible with previous versions. I'm not sure how backwards-compatible the resulting JSON BinPack schemas are, however.
promiseofbeans · a year ago
Do you have any info on how your system stacks up to msgpack? (https://msgpack.org/index.html)

Asking because we use msgpack in production at work and it can sometimes be a bit slower to encode/decode than is ideal when dealing with real-time data.

jviotti · a year ago
We do! See https://benchmark.sourcemeta.com for a live benchmark and https://arxiv.org/abs/2211.12799 for a more detailed academic benchmark.

The TL;DR is that if you use JSON BinPack in schema-less mode, it's still more space-efficient than MessagePack, but not by a huge margin (it depends on the type of data, of course). But if you start passing a JSON Schema along with your data, the results become way smaller.
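
For a feel of the schema-less baseline, here's a toy sketch with the Python msgpack package (the record is made up, and exact byte counts depend on your data):

  # Compare plain JSON against schema-less MessagePack for a small,
  # sounding-style record (pip install msgpack).
  import json
  import msgpack

  record = {"lat": 37.7599, "lon": -122.3881, "alt_m": 19812,
            "temp_c": -71.5, "pressure_hpa": 55.3}

  as_json = json.dumps(record, separators=(",", ":")).encode()
  as_msgpack = msgpack.packb(record)

  print(len(as_json), "bytes as JSON")
  print(len(as_msgpack), "bytes as MessagePack")
  # Schema-less formats mostly save on framing; the field names still ride
  # along in every message, which is what a schema-driven mode can strip out.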

Please reach out to jv@jviotti.com. I would love to discuss your use case more.

the__alchemist · a year ago
Why this over a compact, data-specific format? JSON feels like an unnecessary limitation for this company's use case. I am having a hard time believing it is more space-efficient than a purpose-built format.
jviotti · a year ago
Compared to other serialisation formats, JSON BinPack analyses your data and derives custom encoding rules specific to the data at hand, given all the context it has on it. That's why the static analysis part is by far the most complex part of JSON BinPack, and why I'm building so much advanced JSON Schema tooling in https://github.com/sourcemeta/jsontoolkit (i.e. check the huge API surface for JSON Schema in the docs: https://jsontoolkit.sourcemeta.com/group__jsonschema.html)

Of course there is still a lot to do, but the idea is that what you get with JSON BinPack is extremely close to what you would have done by manually encoding your data, except that you don't have to worry about encoding things yourself :) Thus you get the best of both worlds: the niceties of JSON and the space-efficiency of manual encoding.
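
To make that concrete, here's a toy sketch (my own illustration, not JSON BinPack's actual wire format) of what schema-derived encoding rules buy you:

  # If a (hypothetical) schema fixes the fields, their order, and their
  # ranges, only the raw values need to hit the wire.
  import json
  import struct

  # lat/lon as 32-bit floats, battery percentage as one byte (0-255).
  layout = struct.Struct("<ffB")
  msg = {"lat": 37.7599, "lon": -122.3881, "battery_pct": 87}

  plain = json.dumps(msg, separators=(",", ":")).encode()
  packed = layout.pack(msg["lat"], msg["lon"], msg["battery_pct"])

  print(len(plain), "bytes as JSON")  # keys + ASCII digits + punctuation
  print(len(packed), "bytes packed")  # 9 bytes: two floats + one byte
  # The decoder recovers the field names from the shared schema, not the wire.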

kyrofa · a year ago
From the OP:

> Our payload uses a satellite transceiver for communications

jviotti · a year ago
That's the hardware. I meant on the software side, i.e. what goes through the transceiver. If you transfer fewer bits through the satellite transceiver, I believe you can probably reduce costs.
f549abd01 · a year ago
Sounds cool. How does it differ from CBOR?
jviotti · a year ago
CBOR is a schema-less binary format. JSON BinPack supports both schema-less (like CBOR) and schema-driven (with JSON Schema) modes. Even in schema-less mode, JSON BinPack is more space-efficient than CBOR. See https://benchmark.sourcemeta.com for a live benchmark and https://arxiv.org/abs/2211.12799 for a more detailed academic benchmark.
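
If you haven't played with CBOR, it's easy to poke at from Python (a quick sketch using the cbor2 package; the field names are mine):

  # Schema-less CBOR is self-describing: the keys travel in every message,
  # just like MessagePack (pip install cbor2).
  import cbor2

  payload = cbor2.dumps({"temp_c": -71.5, "alt_m": 19812})
  print(len(payload), "bytes:", payload.hex())
  assert cbor2.loads(payload) == {"temp_c": -71.5, "alt_m": 19812}
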
johnsillings · a year ago
This is one of the most fascinating Launch HNs in a while. Excited to follow your progress and congrats on the launch!
faitswulff · a year ago
It’s a literal launch!
tndl · a year ago
Being a balloon company means we get to launch pretty much every day, which is very fun :)
davidw · a year ago
Yeah, I don't have anything of substance to say other than it's really cool to see someone doing something innovative in a niche I'd never really thought of.
tndl · a year ago
Thanks :)
firesteelrain · a year ago
Very cool! We have built many PicoBalloons that have circumnavigated the globe. No weather reports - just WSPR reports. We can detect spots in the world where GPS spoofing is happening.

> Each vehicle (balloon + payload) weighs less than a pound and can be launched from anywhere in the world, per FAA and ICAO regulations

Florida recently passed a law that does not allow PicoBalloon or your weather balloon type launches from Florida soil. It will result in a $150 fine.

HB321

https://www.flsenate.gov/Session/Bill/2024/321/BillText/er/P...

Article

https://www.cbsnews.com/miami/news/floridas-balloon-ban-will...

tndl · a year ago
We're actually just going to have kids launch our balloons; it's no problem:

  A person who is 6 years of age or younger who intentionally releases,
  organizes the release of, or intentionally causes to be released balloons
  as prohibited by s. 379.233 does not violate subsection (4) and is not
  subject to the penalties specified in subparagraph 1.

maxmclau · a year ago
Got our start with WSPR pico balloons!

Just saw that - no Florida launches on the horizon, luckily

simjnd · a year ago
> These conditions make the stratosphere a very difficult place to deploy to prod

This sentence is legendary

tndl · a year ago
Every once in a while, things go spectacularly wrong up there and we kick ourselves for not doing a b2b saas.
darknavi · a year ago
> b2b

You are, as long as you mean balloons to business

freestyle24147 · a year ago
[flagged]
OccamsMirror · a year ago
You didn't have to contribute this snark.
tagami · a year ago
For those interested, check out Bill Brown, the grandfather of lightweight modern ballooning. Multiple circumnavigations have been achieved with his equipment. https://www.stratoballooning.org/membership#!biz/id/5f4d7b97...
maxmclau · a year ago
Bill Brown is a legend! Love Lee Meadows with SBS as well - 767+ days is unreal

https://www.scientificballoonsolutions.com/news/

reachableceo · a year ago
Happy to discuss super pressure / float in depth.

Can help with the federal contract side and mass manufacturing etc.

Charles@turnsys.com

tndl · a year ago
Sent you an email, thanks!
pagade · a year ago
To a layperson like me, could you explain how these balloons will be cleaned up / collected at the end of their life? What material are they made of?
tndl · a year ago
Sure thing! They're made of about 300 grams of polyethylene. Towards the end of their lifespan, we can steer them to an area that's easy for us to drive out and pick them up. The payload has a GPS, which lets us track where they are both in the sky and on the ground.

Right now, most weather balloons fall back to Earth and stay where they land unless someone happens across them (since they can't be controlled and only last a couple of hours).

mariushn · a year ago
How do you control the altitude? I would imagine 'heat/cool the air inside the balloon', but wouldn't that be too energy-intensive?

Congratulations on a great non-SaaS market and product!

DAlperin · a year ago
> we can steer them to an area that's easy for us to drive out and pick them up.

What does this look like in practice? As you mentioned, you don't really have any lateral control, but I imagine you can wait for it to overfly somewhere convenient before descending?

bigveech · a year ago
Here's a video of one of our recent recoveries: https://youtu.be/8DWYLG_95V0
aflukasz · a year ago
> In 1981, weather disasters caused $3.5 billion in damages in the United States. In 2023, that number was $94.9 billion (https://www.ncei.noaa.gov/access/billions/time-series).

Does that surprise anyone? I would not have guessed this growth to be on such a scale. The chart suggests that severe storms are the main culprit.

mdorazio · a year ago
Not at all. Look at the growth in construction in the most at-risk areas and you’ll see why that number is so big now. It’s only slightly due to an increase in severe weather event frequency / severity.
mrandish · a year ago
Indeed, and it's not just building more in at-risk places: the costs of building materials, construction labor, and code compliance have all generally increased faster than baseline inflation. Factors like these tend to greatly inflate recent estimates relative to historical ones.

I read a paper a few years back which dove into how the data sources for weather damage assessment have changed a lot over the years. Much of the increase is due to more complete reporting and changes in categorization. Also, nowadays more things are insured, and modern IT has made gathering the insurance reporting far more exhaustive. Plus, local, state, and federal agencies responsible for relief and/or recovery have been gathering and reporting increasing amounts of data with each decade since the 70s (in part because their budgets rely on it). Factors like these mean that in prior decades the total damage costs may have been more similar to today's than they appear, but a lot of the damage data we gather and report now wasn't counted or gathered then.

Although I have no experience related to weather science, I remember the paper because it made me realize how many broad-based, multi-decadal historical data comparisons we see should have sizable error bars (which never make it into the headline and rarely even into the article). Data sources, gathering and reporting methods and motivations are rarely constant on long time scales - especially since the era of modern computing. Of course, good data scientists try to adjust for known variances but in a big ecosystem with so many evolving sources, systems, entities and agencies, it quickly gets wickedly complex.

freestyle24147 · a year ago
Definitely a bit of cherry-picking. Just two years later, in 1983, the damages were $36 billion, but that wouldn't make quite as scary a statement for the website.
agurk · a year ago
One detail here is that 1981 dollars aren't 2023 dollars, so to compare they need to be adjusted.

Using [0], $3.5 bn in 1981 would have been worth $11.7 bn in 2023 (quick check below).

Another comment [1] noted (but unfortunately didn't cite) that two years later the damage was assessed at $36 bn, or $110 bn in 2023 dollars.

[0] https://www.usinflationcalculator.com/

[1] https://news.ycombinator.com/item?id=41295116
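
The arithmetic, with approximate annual-average CPI-U values (my figures, not from the calculator itself):

  # Rough CPI adjustment to 2023 dollars; CPI values are approximate
  # annual averages, so expect small differences from [0].
  CPI = {1981: 90.9, 1983: 99.6, 2023: 304.7}

  def to_2023_dollars(amount_bn, year):
      return amount_bn * CPI[2023] / CPI[year]

  print(f"${to_2023_dollars(3.5, 1981):.1f}bn")   # ~$11.7bn
  print(f"${to_2023_dollars(36.0, 1983):.1f}bn")  # ~$110bn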

freestyle24147 · a year ago
No, they don't need to be adjusted. The linked website has already adjusted for CPI. There's even an option to turn on/off adjusting, and it's on by default. I didn't cite because this is using the same data / website as the original claim.
tndl · a year ago
Yeah, it's a shocking number, and it's just for the US. The global estimates for severe weather are even higher [0], and in places with less infrastructure, the costs are usually more heavily weighted toward human life lost.

Obviously what we're doing can't prevent severe weather from happening, but even very small improvements in accuracy and timeliness can have a massive beneficial effect when a disaster does happen. My cofounders and I are all from Florida, so hurricanes are the most visceral example for us. When hurricanes hit, there are always issues along the lines of "we didn't have the right resources in the right places to respond effectively." Those types of issues can be combated with better info.

[0]: https://www.statista.com/statistics/818411/weather-catastrop...