Data warehousing is quickly becoming a commodity through open source.
I know a company that had 2 PB+ of data in Cloudera. But instead of moving to the cloud (and Databricks), they cut costs 5x by building their own analytics platform with Iceberg, Trino, and Superset. The k8s operators are enterprise quality now. On-premises S3 is good, too. You can have great hardware (servers with 128 CPUs and 1 TB of RAM) and networking.
It's not just Trino. StarRocks and ClickHouse have enterprise-grade k8s Helm charts/operators. That $60bn valuation is an albatross around Databricks' neck: their pricing will have to justify it, and their core business is commoditizing.
Neon filled their product gap of not having an operational (row-oriented) DB.
Not commoditising for enterprise. My last gig wouldn’t allow open source software or any company that might not be there in a decade, or which kept data anywhere but our own tenant. We’d look for the “call us” pricing rather than hate it, which I normally do. We added databricks and it was considered one of my top three achievements, because they don’t have to think about data platforms again, just focus on using it. It’s SO expensive for an enterprise to rejig for a new platform that you can’t rely on (insert open source project here).
I managed to add one startup and so far it’s done very well, but it was an exceptional case and the global CEO wanted the functionality. But it used MongoDB and the ops team didn’t have any skills, so rather than learn one tiny thing for an irrelevant data store they added cash to use Atlas with all the support and RBAC etc. They couldn’t use the default Azure firewall because they only know one firewall, so they added one of those too. Also loaded with contracts. Kept hiring load down, one number to call, job done. The startup’s cost is $5-10k per year; the support BS is about $40k. (I forget the exact numbers but it dwarfed the startup costs.)
Startups are from Venus, enterprise are from Jupiter.
> Not commoditising for enterprise. My last gig wouldn’t allow open source software or any company that might not be there in a decade, or which kept data anywhere but our own tenant.
Totally agree. Happy open source StarRocks user here using the k8s operator for customer-facing analytics on terabytes of data. There's very little need for Databricks in my world.
Looking at the StarRocks site (https://www.starrocks.io/), they compare against ClickHouse, Druid, and Trino. They don't even compare against Spark/Databricks! I guess Spark is just not competitive.
On-premise open-source S3 is a problem though. MinIO is not something we're touching and other than that it looks a bit empty with enterprise ready solutions.
Great to see cost-effective alternatives to Cloudera and Databricks! We’ve spent three years building IOMETE, a self-hosted data lakehouse that combines Apache Iceberg and Spark, designed to run natively on Kubernetes. We’re focused on on-premises deployments to address the growing need for data sovereignty and low TCO, with a streamlined setup for large-scale analytics. Early adopters are seeing strong results. Curious about your experience with Trino and Superset—any tips for optimizing performance at scale?
It's been a commodity for decades now. Metrics like price-performance have a long history, but the SnowBricks products fail at them quite dramatically. The difference is hard-sell vs. soft or no-sell.
Not having to buy an appliance and pay for it up front is quite a valuable option. Also the split between processing and storage allows for better archival and scaling strategies.
I have been happily using ClickHouse for the past couple of years without any issue. Rock solid database with wide variety of features and fulfills all my needs. My favourite is the "external dictionary" feature which easily allows me to integrate it with other datastores like Postgres and Redis.
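For anyone curious what that looks like: a dictionary is declared once and then queried like a cached lookup table. A rough sketch below, assuming a hypothetical Postgres table and made-up host/credentials:

```sql
-- Sketch: a ClickHouse dictionary backed by a (hypothetical) Postgres table.
CREATE DICTIONARY user_labels
(
    user_id UInt64,
    label   String
)
PRIMARY KEY user_id
SOURCE(POSTGRESQL(
    host 'pg.internal' port 5432
    user 'reader' password 'secret'
    db 'app' table 'user_labels'
))
LAYOUT(HASHED())               -- fully cached in memory
LIFETIME(MIN 300 MAX 600);     -- refresh every 5-10 minutes

-- Lookups are served from the in-memory cache, no per-row round trip to Postgres:
SELECT dictGet('user_labels', 'label', toUInt64(42));
```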
In addition to the AI use cases, sometimes you want to share data warehouse data in an OLTP way, for fast lookups and high concurrency. Not sure whether Neon will do that, but I hope so.
One example from Snowflake is hybrid tables which adds rowstore next to columnar.
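For reference, a hybrid table is declared like a regular table but requires a primary key, since that backs the rowstore; the table below is purely illustrative:

```sql
-- Sketch: Snowflake hybrid table (row-oriented primary storage,
-- synced to a columnar copy for analytics).
CREATE HYBRID TABLE orders (
    order_id    INT PRIMARY KEY,   -- primary key is mandatory for hybrid tables
    customer_id INT,
    status      VARCHAR(16)
);

-- Point lookups take the fast rowstore path:
SELECT status FROM orders WHERE order_id = 1001;
```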
ETL to bring all your data into Databricks/Snowflake is a lot of effort. Much better if your OLTP data already exists in Databricks and you directly access it from your OLAP layer.
If Databricks just wanted a row DB, they could've done Postgres themselves. Paying this much for Neon, I think, is a sign that Neon has something special they want (which, knowing their marketing line, is "independently scalable storage and compute for postgres").
I applied to neon last week and then the news broke about the acquisition. They rejected it this morning — I have never been happier to receive a rejection to an application.
This would’ve been three acquisitions straight for me and… I’m okay, they’re awful. I just want stability.
Congrats to the neon team! I use and love neon. Really hope this doesn’t change them too much.
I got hired at Kenna Security a month before they were acquired by Cisco and it was such a miserable experience that I won't work for any company the Kenna leadership are involved with, nor would I ever consider working at Cisco.
I've been through two now, and for one of them nothing much changed, and the other one I was basically lost in a stack of papers for a year. Can I ask what made the experience miserable for you?
The first acquisition I was a part of wasn’t too bad! But we were still culturally very different. So after 2 years and properly transitioning things, I bounced to another startup.
Walking into something like that is tough because the two teams sort of don’t like each other and you’re really “neither”. I’d want to make sure I was interviewed by both teams
I've been part of an acquisition as a first-year engineering manager, during which I had to navigate two subsequent rounds of layoffs. I was also part of the group helping restructure teams and make calls on who to keep. Morale was terrible, and the cultures also did not gel at all.
It led to some serious burnout and I took several months off. I'm now happily working as an IC again.
Yes, that is what I expect, too. They have been paying for DynamoDB and CosmosDB for a few years now. However, Neon is not competitive latency/throughput-wise for the real-time workloads needed for high-end AI (like personalized recommendations). There are a few others I would have expected, like Cockroach, Aerospike, or RonDB.
I was a very early employee at the other two start ups that were acquired and even with equity it was not worth it. After all the class A shares were paid out, the rest of us got little.
I mean, hindsight 20/20 here, but I would have loved the theoretical money @ 1 billion. But those are so rare and my experience in the past 15 years hasn’t matched those unicorns.
Basically I’ve come to the conclusion that unless you have serious equity or you’re a founder, acquisitions suck. You’re the one doing the work making the two companies come together, while the founders usually bounce or are stripped of any real power to change things.
Databricks started in 2013 when Spark sucked (it still does) and they aimed to make it better / faster (which they do).
The product is still centered on Spark, but most companies don't want or need Spark; a combination of Iceberg and DuckDB will work for 95% of companies. It's cheaper, just as fast or faster, and way easier to reason about.
We're building a data platform around that premise at Definite[0]. It includes everything you need to get started with data (ETL, BI, datalake).
0 - https://www.definite.app/
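To give a flavor of the Iceberg + DuckDB combination: DuckDB's iceberg extension can query an Iceberg table directly, no cluster required. A minimal sketch, assuming a hypothetical table path and S3 credentials already configured:

```sql
-- Sketch: querying an Iceberg table from a single DuckDB process.
INSTALL iceberg;
LOAD iceberg;

-- The path is made up; real use needs S3 access configured (e.g. via httpfs).
SELECT count(*) AS events
FROM iceberg_scan('s3://lake/warehouse/events');
```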
Aren't the alternatives you mentioned - Iceberg and DuckDB - both storage solutions, while Spark is a way to express distributed compute? I'm a bit out of touch with this space; is there a newer way to express distributed compute?
Databricks is the Jira of dealing with data. No one wants to use it, it sucks, there are too many features to try to appease all possible users but none of them particularly good, and there are substantially better options now than there were not long ago. I would never, ever use it by choice.
But these days just use trino or whatever. There are lots of new ways to work on data that are all bigger steps up - ergonomically, performance and price - over spark as spark was over hadoop.
Hadoop was fundamentally a batch processing system for large data files that was never intended for the sort of online reporting and analytics workloads that the DW concept addressed. No amount of Pig and Hive and HBase and subsequent tools layered on top of it could ever change that basic fact.
I used to be a big fan of the platform because back in 2020 / 2021 it really was the only reasonable choice compared to AWS / Azure / Snowflake for building data platforms.
Today it suffers from feature creep and too many pivots & acquisitions. That they are insanely bad at naming features doesn't help either.
But if you're inclined to use it, Databricks' setup of Spark just saves you an incredible amount of time that you'd normally waste on configuration and wiring infrastructure (storage, compute, pipelines, unified access, VPNs, etc.). It's expensive and opinionated, but the cost of the data engineers you'd otherwise need to deal with Spark OOM errors constantly is greater.
Also, Databricks' default configs give you MUCH better performance out of the box than anything DIY, and you don't have to fiddle with partitions and super-niche config options to get even medium workloads stable.
TBH it's really quite boring. You just have to go back in time to the late 2010s. They had an excellent Spark-as-a-Service product, at a time when you'd have better luck finding a leprechaun than a reliable self-hosted Spark instance in an enterprise environment. That was simply beyond the capabilities of most enterprise IT teams at the time. The first-party offerings from the hyperscalers were relatively spartan.
Databricks' proprietary notebook format that introduced subtle incompatibilities with Jupyter was infuriating embrace-extend-extinguish style bullshit, but on-prem cluster instability causing jobs to crash on a daily basis was way more infuriating, and at that time, enterprises were more than happy to pay a premium to accelerate analytics teams.
In the 2010s, Databricks had a solid billion-dollar business. But Spark-as-a-Service by itself was never going to be a unicorn idea. AWS EMR was the giant tortoise lurking in the background, slowly but surely closing the gap. The status quo couldn't hold, and who doesn't want to be a unicorn? So, they bloated the hell out of the product, drank that off-brand growth-hacker Kool-Aid, and started spewing some of the most incoherent buzz-word salad to ever come out of the Left Coast. Just slapping data, lake, and house onto the ends of everything, like it was baby oil at a Diddy Party.
Now, here we are in 2025, deep into the terminal decline of enshittification, and they're just rotting away, waiting for One Real Asshole Called Larry Ellison to scoop them up and take them straight to Hell. The State of Florida, but for Big Data companies.
It would be a mystery to me too, why anyone would pick Databricks today for a greenfield project, but those enterprises from 5+ years ago are locked in hard now. They'll squeeze those whales and they'll shit money like a golden goose for a few more years, but their market share will steadily decrease over the next few years.
It's the cycle of life. Entropy always wins. Eventually the Grim Reaper Larry comes for us all. I wouldn't hate on them too hard. They had a pretty solid run.
Databricks is Oracle-level bad. They will definitely ruin Neon or make it expensive. In the medium to long term, I will start looking for Neon alternatives.
Definitely agree; their M&A strategy is set up to strangle whoever they buy, and they don't even know it. They're struggling in the face of Iceberg, DuckDB, and the other tectonic shifts happening in the open source world. They are trying to innovate through acquisition, but can't quite make it because their culture kills the companies they buy.
I'm biased, I'm a big-data-tech refugee (ex-Snowflake) and am working on https://tower.dev right now, but we're definitely seeing the open source trend supported by Iceberg. It'll be really interesting to see how this plays out.
>As Neon became GA last year, they noticed an interesting stat: 30% of the databases were created by AI agents, not humans. When they looked at their stats again recently, the number went from 30% to over 80%. That is, AI agents were creating 4 times more databases versus humans.
For me this has alarm bells all over it. Databricks is trying to pump postgres as some sort of AI solution. We do live in weird times.
Congrats to the Neon team (I like what they built), but I don't see the value or relation to Databricks. I hope Neon will continue as a standalone product; otherwise we lose a solid Postgres provider from the market.
It's pretty heavy in Azure, so I would be surprised if it went away. This is DBX's play to move into the transactional database space in addition to the analytical database space.
I remember the first post by the Neon team here on HN. I think I commented at the time that I thought it was a great idea. I’ve never had a need to use them yet, but thought I always would.
Cynically, am I the only one who takes pause because of an acquisition like this? It worries me that they will need to be more focused on the needs of their new owners, rather than their users. In theory, the needs should align — but I’m not sure it usually works out that way in practice.
> I remember the first post by the Neon team here on HN. I think I commented at the time that I thought it was a great idea.
Same! I remember it too. I found it quite fascinating. Separation of storage and compute was something new to me, and I was asking them about Pageserver [0]. I also asked for career advice on how to get into database development [1].
Two years later, I ended up working on very similar disaggregated storage at Turso database.
I'm also taking pause... I don't believe serving AI can be aligned with serving devs. I hope that the part of the work related to the core of PostgreSQL will help the community.
Congrats to the Neon team. They make an awesome product. Obviously it’s sad to see this, but it’s inevitable when you’re VC funded. Let’s hope Nikita and co remain strong and don’t let Databricks bit.io them.
I bet they had VMware all over the place.
Hence IBM talking up Iceberg: https://www.ibm.com/think/topics/apache-iceberg
Rook/ceph with object storage is pretty bulletproof: https://www.rook.io/docs/rook/v1.17/Storage-Configuration/Ob...
I do wish more systems had high quality operators out there. A lot of operators I have looked into are half baked, not reliable, or not supported.
OLAP + OLTP = HTAP
In a couple cases I’ve been recruited because I have a history of scaling and integrating acquisitions into companies successfully
My guess is that this team gets rolled into Online Tables tech, which would make product sense.
https://docs.databricks.com/aws/en/machine-learning/feature-...
The biggest gripe I have is how crazy expensive it is.
I really don't understand the valuation for this company. Why is it so high?
Can't imagine someone incapable of building a website would deliver a good (digital) product.
Congrats to the Neon team!
[0] - https://news.ycombinator.com/item?id=31756671
[1] - https://news.ycombinator.com/item?id=31756510