Readit News
cdbattags · 7 years ago
In the latest world of Postgres:

- we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never see it back upstream (who knows if it's just hardware or software or what behind the scenes here)

- we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood

- lastly, with this news, looks like Microsoft is now doubling down on the same strategy to build out infrastructure and _possibly_ closed source "forked" wins on top of the beautiful open source world that is Postgres

Please, please, please let's be sure to upstream! I love the cloud but when I go to "snapshot" and "restore" my PG DB I want a little transparency into how y'all are doing this. Same with DocumentDB; I'd love an article on how they are using JSONB indices at this supposed scale! Not trying to throw shade; just raising my eyebrows a little.

craigkerstiens · 7 years ago
Craig here from Citus. We're actually a bit different than past forks. Many years ago Citus itself was a fork, but about 3 years ago we became a pure extension[1]. This means we hook into lower level extension APIs[2] that exist within Postgres and are able to stay current with the latest Postgres versions.

[1]. https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes...

[2]. https://www.citusdata.com/blog/2017/10/25/what-it-means-to-b...

sytse · 7 years ago
Congrats on the acquisition. I love that the complete extension is open source and will stay available: "And we will continue to actively participate in the Postgres community, working on the Citus open source extension as well as the other open source Postgres extensions you love."

As we continue to grow GitLab, Citus is the leading option to scale our database out. I'm glad that this option will still be there tomorrow.

cdbattags · 7 years ago
Holy wow! Thanks for the response!

Yep, I love the fact that y'all went the extension route much like https://www.timescale.com/ and others.

asah · 7 years ago
user here: can confirm.
ABeeSea · 7 years ago
If the creators of Postgres wanted all improvements to be upstreamed, they wouldn’t have released under a permissive license. The ability to use Postgres commercially without exposing your entire codebase to copyleft risk is one of the reasons it’s used commercially in the first place.
basilgohar · 7 years ago
This is a big assumption. There are many reasons to release something under a permissive license: not everyone that releases BSD-like is actively choosing to deprioritize upstreaming. Rather, they are choosing a license that is less restrictive, which has other advantages beyond non-copyleft.

Moreover, using copyleft software doesn't mean that using it forces you to release code. There are specific interactions that trigger the sharing clause in, for example, the GPL, such as distribution, linking, and so on. There remain many, many uses that allow commercialization that do not run afoul of the copyleft nature of the GPL.

I am commenting because I have seen this sentiment repeated ad nauseam on here and, maybe that's not what you meant, but I felt the need to clarify. Moreover, if the code is not AGPL, most online uses do not run afoul, because the code product (say, executables) is not itself being distributed. AGPL was formulated to close this loophole, but GPL code is free from this.

scarface74 · 7 years ago
And this is a benefit to prevent lock-in. Amazon’s OLAP database, Redshift, is protocol compliant with Postgres. Even if you won’t get the performance benefits of Redshift if you move to a standard Postgres, at least you don’t have to change your code.

Now you can move to Azure without having to change your code.

cdbattags · 7 years ago
100% agree. I'm just wary of all these "mini optimizations" that all these cloud providers are about to start doing differently.
CodesInChaos · 7 years ago
When I publish code using a permissive license I want people to contribute back under the same license. But I don't want to force them to.
rch · 7 years ago
The good news is that we'll have another reliable, growing, potentially profitable, PostgreSQL company up and running in no time.
manigandham · 7 years ago
Amazon Aurora doesn't have much to do with Postgres and is a custom storage subsystem used by many different database engines. Aurora Postgres is actually using Postgres code on top to handle queries, and eventually PG itself will get pluggable storage engines.

It's similar with Redshift although it's a much older codebase from the v8 branch with more customizations. The changes are very specific to their infrastructure and wouldn't help anyone else since it's not designed as an on-prem deployable product.

There's also no confirmation that DocumentDB runs on Postgres and it's most likely a custom interface layer they wrote themselves. If you just want MongoDB on Postgres then there are already open source projects that do it.

atombender · 7 years ago
Redshift isn't even developed by Amazon — it's a commercial product called ParAccel, which they license (and modify, presumably).

Another commercial MPP database based on Postgres 8.x, GreenplumDB, was open-sourced a few years back. The changes are so extensive that there's little hope of catching up with the current Postgres codebase. Given the focus on OLAP and analytics over OLTP, there might not even be a strong motivation to catch up, either.

koolba · 7 years ago
Redshift isn’t just a custom storage tier atop an older version of Postgres. It has an entirely different execution engine that compiles commands to an executable and farms them out to multiple data nodes.
cdbattags · 7 years ago
I replied to a few others further down the thread that had similar thoughts as these.
stingraycharles · 7 years ago
CitusData made tons of improvements to upstream postgresql, though. Can’t say that about Amazon.
tomnipotent · 7 years ago
OP is referring to the habit of cloud providers to invest in open source platforms to build cloud services but not contribute back to the community.
SEJeff · 7 years ago
But at least they open sourced their fork, designed for data warehousing, before this happened:

https://www.citusdata.com/product/community

ohthehugemanate · 7 years ago
Kudos to Azure for opening so much of what they do. Lots of Kubernetes work, including AKS-engine which runs their k8s implementation. Machine learning toolkit. Media services (faceid etc) as a container. The whole Azure shebang runs on Service Fabric, which they've also open sourced.

It's a differentiator for some of their workloads: you don't have to hand your business over to a black box.

spullara · 7 years ago
Aurora databases and DocumentDB share the same underlying reliable single-writer, many-reader block device for storage. That is all the magic. Not sure where you got the idea that DocumentDB has Postgres underneath it.
cdbattags · 7 years ago
See this thread: https://news.ycombinator.com/item?id=18869755

The HN community did a little bit of reverse engineering.

timClicks · 7 years ago
I get what you're saying, but BSD-licenses are specifically designed to facilitate things not being sent upstream. I don't understand why people moan about companies complying with the license agreement.
cdbattags · 7 years ago
Your argument is legal and my argument is moral =P
pjmlp · 7 years ago
This is what happens in a world devoid of the GPL, or where a large majority doesn't sponsor the work of upstream.
ezrast · 7 years ago
MongoDB was already under the AGPL; Amazon just replicated the API on top of their own storage engine (or an existing permissive-licensed storage engine? Who knows?).

If we're at the point where Amazon can just re-implement whatever project they want, more or less from scratch, I'm not sure there's any license that can save us. :(

sudhirj · 7 years ago
Amazon has explained in their reinvent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to latest on a slightly slower cadence). And there’s no benefit to getting the Aurora layer upstream, no one else could use it anyway.

Citus is an extension, not a fork.

So neither of these projects is doing Postgres a disservice. Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.

anarazel · 7 years ago
> Amazon has explained in their reinvent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to latest on a slightly slower cadence). And there’s no benefit to getting the Aurora layer upstream, no one else could use it anyway.

I don't think this is an accurate analysis. For one, they had to make a lot of independent improvements to not have performance regress horribly after their changes, and a lot of those could be upstreamed. Similarly, they could help with the effort to make table storage pluggable, but they've not, instead opting to just patch out things.

> Citus is an extension, not a fork.

Used to be a fork though.

> Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.

How is Amazon meaningfully involved in the maintenance of open source postgres?

zjaffee · 7 years ago
This is the future and it's not just big companies doing it.

Virtually all of the companies that were built on open source products in the past few years stopped focusing on being the best place to run said open source program, and are instead holding back performance and feature improvements as proprietary rather than pushing them back upstream.

scarface74 · 7 years ago
> we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never see it back upstream (who knows if it's just hardware or software or what behind the scenes here)

The performance benefits of Aurora over Postgres are mostly because Amazon rewrote the storage engine to run on top of their infrastructure.

cdbattags · 7 years ago
All I'm saying is that it looks like Azure and Microsoft are about to do the same.
illumin8 · 7 years ago
> - we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood

To clarify, Amazon DocumentDB uses the Aurora storage engine, which is the same proprietary storage engine that is used by Aurora MySQL and Aurora PostgreSQL, and gives you multi-facility durability by writing 6 copies of your data across 3 facilities, with a 4 of 6 quorum before writes are acknowledged back to the client.

So, it's a bit inaccurate to say that DocumentDB has anything to do with Postgres.
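The 4-of-6 write quorum described above can be sketched roughly as follows. This is only an illustration of the counting rule, not Amazon's implementation; the even split of two copies per facility and the names `write_acknowledged` and `acks_by_facility` are assumptions made for the example.

```python
# Assumption: the 6 copies are spread evenly, 2 per facility.
FACILITIES = 3
COPIES_PER_FACILITY = 2
WRITE_QUORUM = 4  # acks needed before the client is told the write succeeded


def write_acknowledged(acks_by_facility):
    """Return True once at least WRITE_QUORUM of the 6 copies have acked.

    acks_by_facility: list of 3 ints, one per facility, each the number
    of copies (0..2) in that facility that confirmed the write.
    """
    if len(acks_by_facility) != FACILITIES:
        raise ValueError("expected one ack count per facility")
    if any(not 0 <= n <= COPIES_PER_FACILITY for n in acks_by_facility):
        raise ValueError("each facility holds at most 2 copies")
    return sum(acks_by_facility) >= WRITE_QUORUM
```

Under these assumptions, losing one whole facility still leaves four copies able to ack (`[2, 2, 0]`), while losing a facility plus one more copy (`[2, 1, 0]`) falls below the quorum and the write is not acknowledged.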

luhn · 7 years ago
There’s evidence to suggest that DocumentDB is actually running Aurora Postgres under the hood. https://news.ycombinator.com/item?id=18870397
cbsmith · 7 years ago
I would argue Microsoft's strategy actually makes them more wedded and committed to ensuring the vitality of open source PostgreSQL than anything AWS is doing.
Smerity · 7 years ago
The big news here: Citus Data donated 1% of their equity to non-profit PostgreSQL organizations[1] so this acquisition is a win for the community even in the darkest scenario of Citus Data disappearing into a canyon on the Microsoft campus.

Given Microsoft's change in operation over recent years there's also hope that they can continue their contributions into the future.

It's fascinating to see Microsoft leave behind the "embrace, extend, extinguish" narrative only to have Amazon adopt it, causing massive rifts and action within the database community[2][3]. I am genuinely concerned about the future of open source software in this continued scenario.

An article with what I considered an outrageous headline ("Is Amazon 'strip mining' open source?"[4]) has only rung more true over time. Amazon is one of the largest companies on earth, selling products that they receive for free but never improve[5], attacking the primary open source provider, and then shifting toward their comparable proprietary closed offerings.

Hopefully new ways to "give back", such as equity contribution, can be one of the many paths forward needed to keep open source software healthy. Given how much innovation is unlocked by this, it'd be a crime to go back to the past era.

[1]: https://www.citusdata.com/newsroom/press/citus-data-donates-...

[2]: https://www.cnbc.com/2018/11/30/aws-is-competing-with-its-cu...

[3]: https://techcrunch.com/2019/01/09/aws-gives-open-source-the-...

[4]: https://www.cbronline.com/analysis/aws-managed-kafka

[5]: From [2], "Jay Kreps, a creator of Kafka and co-founder and CEO of Confluent ... said Amazon has not contributed a single line of code to the Apache Kafka open-source software and is not reselling Confluent’s cloud tool."

koolba · 7 years ago
Any clue what the base for that 1% is going to be? Didn’t see any mention of the total acquisition amount anywhere.
craigkerstiens · 7 years ago
In case folks are interested here are the details from our founders on the Citus blog - https://www.citusdata.com/blog/2019/01/24/microsoft-acquires...
jarym · 7 years ago
Well this is great news for the guys at Citus - they created something great as a Postgres add-on and a big chunk of it was open sourced.

They made a decent cloud business model out of it (no idea how successful but everyone I asked was happy with it).

I just hope Microsoft allow the tech to evolve as open source!

iKevinShah · 7 years ago
"I just hope Microsoft allow the tech to evolve as open source!"

Current Microsoft sure will. They're good with open source stuff.

shangxiao · 7 years ago
What about future Microsoft? :)
jarym · 7 years ago
Yes, agreed. Long may this continue.
manigandham · 7 years ago
Citus is already used by Microsoft itself internally, a recent example being the VeniceDB project to analyze Windows telemetry: https://www.youtube.com/watch?v=AeMaBwd90SI

Considering the competitive database landscape, this is a compelling offering to add to any cloud portfolio. Congrats to the Citus team.

skunkworker · 7 years ago
I still can't get over the fact that Microsoft is using Postgres internally, if you had told me that 5 years ago I wouldn't have believed it. Did they go into why over MSSQL?
manigandham · 7 years ago
MSSQL currently does not have horizontal sharding capabilities like this, or easy UPSERT functionality.
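For readers unfamiliar with the term, horizontal sharding means splitting a table's rows across nodes by some key. A minimal sketch of the idea follows; it is not Citus's actual algorithm, and `shard_for`, `route_rows`, and the `tenant_id` column are hypothetical names chosen for illustration.

```python
import zlib


def shard_for(distribution_key, shard_count):
    """Map a key value to a shard index in [0, shard_count) via a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(str(distribution_key).encode("utf-8")) % shard_count


def route_rows(rows, key_column, shard_count):
    """Group rows (dicts) by the shard their key_column value hashes to."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for(row[key_column], shard_count)].append(row)
    return shards


rows = [{"tenant_id": t, "n": i} for i, t in enumerate(["a", "b", "a", "c"])]
shards = route_rows(rows, "tenant_id", 4)
```

Because the hash is deterministic, every row for a given `tenant_id` lands on the same shard, which is what lets single-tenant queries be answered by one node.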
pritambarhate · 7 years ago
The main question is: Did MS want an expert PgSQL team to work on Azure PostgreSQL (and maybe to create a proprietary competitor to Aurora)? Or did they acquire Citus for its product, to improve and market it further?

It feels like it was the first. If so, it means bad news for the Citus product, as it will most likely be ignored for a while. That will be really sad, as I don't know of any actively supported automated sharding solution for PgSQL other than Citus. There is PostgresXL[1], but there isn't much focus on making it community friendly.

[1]: https://www.postgres-xl.org/overview/

mjw1007 · 7 years ago
I don't think anyone should expect acquihiring an expert Postgres team to work on a proprietary product to work well, because the programmers' skills are eminently transferrable.

Half the team would probably wander off to work for one of the other postgres-centered companies (and quite possibly continue to work on the open source Citus code).

bgentry · 7 years ago
Fun fact: the team that built Citus Cloud began with 3 people that came over from Heroku after building its (proprietary) Postgres cloud service.
scarface74 · 7 years ago
This is more of a competitor to Redshift than Aurora.
teej · 7 years ago
Citus improves the performance of OLAP query loads but it's not an analytics solution first. They say so themselves -

https://www.citusdata.com/blog/2018/06/07/what-is-citus-good...

---

When we first started building Citus, we began on the OLAP side. As time has gone on Citus has evolved to have full transactional support, first when targeting a single shard, and now fully distributed across your Citus database cluster.

Today, we find most who use the Citus database do so for either:

(OLTP) Fully transactional database powering their system of record or system of engagement (often multi-tenant)

(HTAP) For providing real-time insights directly to internal or external users across large amounts of data.

tosh · 7 years ago
Great news for Citus, Microsoft, Postgres and for people using open source relational databases. This makes so much sense. (I know this comment might read naive to some but I’m genuinely excited right now)
tracker1 · 7 years ago
I'm pretty excited as well... Especially if this means improvements to Azure's PostgreSQL options. DBaaS is one of the areas where cloud providers give a LOT of value, more so as long as the interfaces you use can be used internally/locally for development.

Similarly, I really appreciate MS-SQL for Linux on Docker as it is a lot easier to set up for CI/CD and locally for dev and testing, and is nearly transparent going to Azure SQL or MS SQL Enterprise for hosted deployments. I'd much rather use PostgreSQL with PLv8 than MS-SQL though.

AlexB138 · 7 years ago
I wonder how long it will be before they shut down their own Citus Cloud hosted offering, which is hosted on AWS. Seems obvious that will become part of Azure soon.
btown · 7 years ago
I doubt they'd disrupt their AWS operations right away - this certainly won't be the first time that a MSFT team/subsidiary has used AWS.

What's more worrying to me is if they try to do both - build out a Citus offering on Azure, and simultaneously try to maintain the high reliability of their AWS Citus Cloud, which may be the most reliable option for some time. It's tough for any organization, no matter how much capital has been injected, to keep a laser-sharp eye on two inevitably-competing initiatives, each of which has its own performance and automation characteristics. I don't want the one person in the company who knows, say, cloud hard drive recovery patterns like the back of their hand and had previously been the EBS guru, to suddenly be pulled into the new Azure optimization project... and that's not something that capital injections can necessarily fix.

That said, this could accelerate their development timelines overall, and it guarantees stability for the product for quite a while. Overall I think this is good news! Citus is one of those things that you want to have in your back pocket when building any type of app on Postgres, and we certainly see it in our company as a long-term "escape hatch" when we're forced to make database-heavy design decisions at currently relatively-small scale. This deal keeps it alive and prospering!