Data warehousing is quickly becoming a commodity through open source.
I know a company that had 2 PB+ of data in Cloudera. But instead of moving to the cloud (and Databricks), they cut costs 5x by building their own analytics platform with Iceberg, Trino, and Superset. The k8s operators are enterprise quality now. On-premises S3 is good, too. You can have great hardware (servers with 128 CPUs and 1 TB of RAM) and networking.
It's not just Trino. StarRocks and ClickHouse have enterprise-grade k8s Helm charts/operators. That $60bn valuation is an albatross around Databricks' neck: their pricing will have to justify it, and their core business is commoditizing.
Neon filled their product gap of not having an operational (row-oriented) DB.
Not commoditising for enterprise. My last gig wouldn’t allow open source software or any company that might not be there in a decade, or which kept data anywhere but our own tenant. We’d look for the “call us” pricing rather than hate it, which I normally do. We added databricks and it was considered one of my top three achievements, because they don’t have to think about data platforms again, just focus on using it. It’s SO expensive for an enterprise to rejig for a new platform that you can’t rely on (insert open source project here).
I managed to add one startup and so far it’s done very well, but it was an exceptional case and the global CEO wanted the functionality. But it used MongoDB and the ops team didn’t have any skills, so rather than learn one tiny thing for an irrelevant data store they added cash to use Atlas with all the support and RBAC etc. They couldn’t use the default Azure firewall because they only know one firewall, so they added one of those too. Also loaded with contracts. Kept hiring load down, one number to call, job done. The startup’s cost is $5-10k per year; the support BS is about $40k. (I forget the exact numbers but it dwarfed the startup costs.)
Startups are from Venus, enterprise are from Jupiter.
> Not commoditising for enterprise. My last gig wouldn’t allow open source software or any company that might not be there in a decade, or which kept data anywhere but our own tenant.
Totally agree. Happy open source StarRocks user here using the k8s operator for customer-facing analytics on terabytes of data. There's very little need for Databricks in my world.
Looking at the StarRocks site (https://www.starrocks.io/), they compare against ClickHouse, Druid, and Trino. They don't even compare against Spark/Databricks! I guess Spark is just not competitive.
On-premise open-source S3 is a problem though. MinIO is not something we're touching and other than that it looks a bit empty with enterprise ready solutions.
Great to see cost-effective alternatives to Cloudera and Databricks! We’ve spent three years building IOMETE, a self-hosted data lakehouse that combines Apache Iceberg and Spark, designed to run natively on Kubernetes. We’re focused on on-premises deployments to address the growing need for data sovereignty and low TCO, with a streamlined setup for large-scale analytics. Early adopters are seeing strong results. Curious about your experience with Trino and Superset—any tips for optimizing performance at scale?
It's been a commodity for decades now. Metrics like price-performance have a long history, but the SnowBricks products fail at them quite dramatically. The difference is hard-sell vs. soft or no-sell.
Not having to buy an appliance and pay for it up front is quite a valuable option. Also the split between processing and storage allows for better archival and scaling strategies.
I have been happily using ClickHouse for the past couple of years without any issue. Rock solid database with wide variety of features and fulfills all my needs. My favourite is the "external dictionary" feature which easily allows me to integrate it with other datastores like Postgres and Redis.
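For anyone curious what that looks like: a dictionary is declared once and then queried like a cached lookup table. A rough sketch below, assuming a hypothetical Postgres table and made-up host/credentials:

```sql
-- Sketch: a ClickHouse dictionary backed by a (hypothetical) Postgres table.
CREATE DICTIONARY user_labels
(
    user_id UInt64,
    label   String
)
PRIMARY KEY user_id
SOURCE(POSTGRESQL(
    host 'pg.internal' port 5432
    user 'reader' password 'secret'
    db 'app' table 'user_labels'
))
LAYOUT(HASHED())               -- fully cached in memory
LIFETIME(MIN 300 MAX 600);     -- refresh every 5-10 minutes

-- Lookups are served from the in-memory cache, no per-row round trip to Postgres:
SELECT dictGet('user_labels', 'label', toUInt64(42));
```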
In addition to the AI use cases, sometimes you want to share data warehouse data in an OLTP way, for fast lookups and high concurrency. Not sure whether Neon will do that, but I hope so.
One example from Snowflake is hybrid tables which adds rowstore next to columnar.
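For reference, a hybrid table is declared like a regular table but requires a primary key, since that backs the rowstore; the table below is purely illustrative:

```sql
-- Sketch: Snowflake hybrid table (row-oriented primary storage,
-- synced to a columnar copy for analytics).
CREATE HYBRID TABLE orders (
    order_id    INT PRIMARY KEY,   -- primary key is mandatory for hybrid tables
    customer_id INT,
    status      VARCHAR(16)
);

-- Point lookups take the fast rowstore path:
SELECT status FROM orders WHERE order_id = 1001;
```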
ETL to bring all your data into Databricks/Snowflake is a lot of effort. Much better if your OLTP data already exists in Databricks and you directly access it from your OLAP layer.
If Databricks just wanted a row DB, they could've done Postgres themselves. Paying this much for Neon, I think, is a sign that Neon has something special they want (which, knowing their marketing line, is "independently scalable storage and compute for postgres").
I applied to neon last week and then the news broke about the acquisition. They rejected it this morning — I have never been happier to receive a rejection to an application.
This would’ve been three acquisitions straight for me and… I’m okay, they’re awful. I just want stability.
Congrats to the neon team! I use and love neon. Really hope this doesn’t change them too much.
I got hired at Kenna Security a month before they were acquired by Cisco and it was such a miserable experience that I won't work for any company the Kenna leadership are involved with, nor would I ever consider working at Cisco.
I've been through two now, and for one of them nothing much changed, and the other one I was basically lost in a stack of papers for a year. Can I ask what made the experience miserable for you?
The first acquisition I was a part of wasn’t too bad! But we were still culturally very different. So after 2 years and properly transitioning things, I bounced to another startup.
Walking into something like that is tough because the two teams sort of don’t like each other and you’re really “neither”. I’d want to make sure I was interviewed by both teams
I've been part of an acquisition as a first-year engineering manager, during which I had to navigate two subsequent rounds of layoffs. I was also part of the group helping restructure teams and make calls on who to keep. Morale was terrible, and the cultures also did not gel at all.
It led to some serious burnout and I took several months off. I'm now happily working as an IC again.
Yes, that is what I expect, too. They have been paying for DynamoDB and CosmosDB for a few years now. However, Neon is not competitive latency/throughput-wise for the real-time workloads needed for high-end AI (like personalized recommendations). There are a few others I would have expected, like Cockroach, Aerospike, or RonDB.
I was a very early employee at the other two start ups that were acquired and even with equity it was not worth it. After all the class A shares were paid out, the rest of us got little.
I mean, hindsight 20/20 here, but I would have loved the theoretical money @ 1 billion. But those are so rare and my experience in the past 15 years hasn’t matched those unicorns.
Basically I’ve come to the conclusion that unless you have serious equity or you’re a founder, acquisitions suck. You’re the one doing the work making the two companies come together, while the founders usually bounce or are stripped of any real power to change things.
Databricks started in 2013 when Spark sucked (it still does) and they aimed to make it better / faster (which they do).
The product is still centered on Spark, but most companies don't want or need Spark; a combination of Iceberg and DuckDB will work for 95% of companies. It's cheaper, just as fast or faster, and way easier to reason about.
We're building a data platform around that premise at Definite[0]. It includes everything you need to get started with data (ETL, BI, datalake).
0 - https://www.definite.app/
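To give a flavor of the Iceberg + DuckDB combination: DuckDB's iceberg extension can query an Iceberg table directly, no cluster required. A minimal sketch, assuming a hypothetical table path and S3 credentials already configured:

```sql
-- Sketch: querying an Iceberg table from a single DuckDB process.
INSTALL iceberg;
LOAD iceberg;

-- The path is made up; real use needs S3 access configured (e.g. via httpfs).
SELECT count(*) AS events
FROM iceberg_scan('s3://lake/warehouse/events');
```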
Aren't the alternatives you mentioned - Iceberg and DuckDB - both storage solutions, while Spark is a way to express distributed compute? I'm a bit out of touch with this space; is there a newer way to express distributed compute?
Databricks is the Jira of dealing with data. No one wants to use it, it sucks, there are too many features to try to appease all possible users but none of them particularly good, and there are substantially better options now than there were not long ago. I would never, ever use it by choice.
But these days just use trino or whatever. There are lots of new ways to work on data that are all bigger steps up - ergonomically, performance and price - over spark as spark was over hadoop.
Hadoop was fundamentally a batch processing system for large data files that was never intended for the sort of online reporting and analytics workloads that the DW concept addressed. No amount of Pig and Hive and HBase and subsequent tools layered on top of it could ever change that basic fact.
I used to be a big fan of the platform because back in 2020 / 2021 it really was the only reasonable choice compared to AWS / Azure / Snowflake for building data platforms.
Today it suffers from feature creep and too many pivots & acquisitions. That they are insanely bad at naming features doesn't help either.
But if you're inclined to use it, Databricks' setup of Spark just saves you an incredible amount of time that you'd normally waste on configuration and wiring infrastructure (storage, compute, pipelines, unified access, VPNs, etc.). It's expensive and opinionated, but the cost of the data engineers you'd otherwise need to deal with Spark OOM errors constantly is greater.
Also, Databricks' default configs give you MUCH better performance out of the box than anything DIY, and you don't have to fiddle with partitions and super-niche config options to get even medium workloads stable.
TBH it's really quite boring. You just have to go back in time to the late 2010s. They had an excellent Spark-as-a-Service product, at a time when you'd have better luck finding a leprechaun than a reliable self-hosted Spark instance in an enterprise environment. That was simply beyond the capabilities of most enterprise IT teams at the time. The first-party offerings from the hyperscalers were relatively spartan.
Databricks' proprietary notebook format that introduced subtle incompatibilities with Jupyter was infuriating embrace-extend-extinguish style bullshit, but on-prem cluster instability causing jobs to crash on a daily basis was way more infuriating, and at that time, enterprises were more than happy to pay a premium to accelerate analytics teams.
In the 2010s, Databricks had a solid billion-dollar business. But Spark-as-a-Service by itself was never going to be a unicorn idea. AWS EMR was the giant tortoise lurking in the background, slowly but surely closing the gap. The status quo couldn't hold, and who doesn't want to be a unicorn? So, they bloated the hell out of the product, drank that off-brand growth-hacker Kool-Aid, and started spewing some of the most incoherent buzz-word salad to ever come out of the Left Coast. Just slapping data, lake, and house onto the ends of everything, like it was baby oil at a Diddy Party.
Now, here we are in 2025, deep into the terminal decline of enshittification, and they're just rotting away, waiting for One Real Asshole Called Larry Ellison to scoop them up and take them straight to Hell. The State of Florida, but for Big Data companies.
It would be a mystery to me too, why anyone would pick Databricks today for a greenfield project, but those enterprises from 5+ years ago are locked in hard now. They'll squeeze those whales and they'll shit money like a golden goose for a few more years, but their market share will steadily decrease over the next few years.
It's the cycle of life. Entropy always wins. Eventually the Grim Reaper Larry comes for us all. I wouldn't hate on them too hard. They had a pretty solid run.
Databricks is Oracle-level bad. They will definitely ruin Neon or make it expensive. In the medium to long term, I will start looking for Neon alternatives.
Definitely agree; their M&A strategy is set up to strangle whoever they buy, and they don't even know it. They're struggling in the face of Iceberg, DuckDB, and the other tectonic shifts happening in the open source world. They are trying to innovate through acquisition, but can't quite make it because their culture kills the companies they buy.
I'm biased, I'm a big-data-tech refugee (ex-Snowflake) and am working on https://tower.dev right now, but we're definitely seeing the open source trend supported by Iceberg. It'll be really interesting to see how this plays out.
>As Neon became GA last year, they noticed an interesting stat: 30% of the databases were created by AI agents, not humans. When they looked at their stats again recently, the number went from 30% to over 80%. That is, AI agents were creating 4 times more databases versus humans.
For me this has alarm bells all over it. Databricks is trying to pump postgres as some sort of AI solution. We do live in weird times.
Congrats to the Neon team (I like what they built), but I don't see the value or relation to Databricks. I hope Neon will continue as a standalone product; otherwise we lose a solid Postgres provider from the market.
It's pretty heavy in Azure, so I would be surprised if it went away. This is DBX's play to move into the transactional database space in addition to the analytical database space.
I remember the first post by the Neon team here on HN. I think I commented at the time that I thought it was a great idea. I’ve never had a need to use them yet, but thought I always would.
Cynically, am I the only one who takes pause because of an acquisition like this? It worries me that they will need to be more focused on the needs of their new owners, rather than their users. In theory, the needs should align — but I’m not sure it usually works out that way in practice.
> I remember the first post by the Neon team here on HN. I think I commented at the time that I thought it was a great idea.
Same! I remember it too. I found it quite fascinating. Separation of storage and compute was something new to me, and I was asking them about Pageserver [0]. I also asked for career advice on how to get into database development [1].
Two years later, I ended up working on very similar disaggregated storage at Turso database.
I'm also taking pause... I don't believe serving AI can be aligned with serving devs. I hope that the part of the work related to the core of PostgreSQL will help the community.
Congrats to the Neon team. They make an awesome product. Obviously it’s sad to see this, but it’s inevitable when you’re VC funded. Let’s hope Nikita and co remain strong and don’t let Databricks bit.io them.
I bet they had VMware all over the place.
Hence IBM talking up Iceberg: https://www.ibm.com/think/topics/apache-iceberg
Rook/ceph with object storage is pretty bulletproof: https://www.rook.io/docs/rook/v1.17/Storage-Configuration/Ob...
I do wish more systems had high quality operators out there. A lot of operators I have looked into are half baked, not reliable, or not supported.
OLAP + OLTP = HTAP
In a couple cases I’ve been recruited because I have a history of scaling and integrating acquisitions into companies successfully
My guess is that this team gets rolled into Online Tables tech, which would make product sense.
https://docs.databricks.com/aws/en/machine-learning/feature-...
The biggest gripe I have is how crazy expensive it is.
I really don't understand the valuation for this company. Why is it so high?
Can't imagine someone incapable of building a website would deliver a good (digital) product.
Congrats to the Neon team!
[0] - https://news.ycombinator.com/item?id=31756671
[1] - https://news.ycombinator.com/item?id=31756510