- we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never make it back upstream (who knows if it's just hardware or software or what behind the scenes here)
- we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood
- lastly, with this news, looks like Microsoft is now doubling down on the same strategy to build out infrastructure and _possibly_ closed source "forked" wins on top of the beautiful open source world that is Postgres
Please, please, please let's be sure to upstream! I love the cloud but when I go to "snapshot" and "restore" my PG DB I want a little transparency about how y'all are doing this. Same with DocumentDB; I'd love an article on how they are using JSONB indices at this supposed scale! Not trying to throw shade; just raising my eyebrows a little.
Craig here from Citus. We're actually a bit different than past forks. Many years ago Citus itself was a fork, but about 3 years ago we became a pure extension[1]. This means we hook into lower level extension APIs[2] that exist within Postgres and are able to stay current with the latest Postgres versions.
Congrats on the acquisition. I love that the complete extension is open source and will stay available: "And we will continue to actively participate in the Postgres community, working on the Citus open source extension as well as the other open source Postgres extensions you love."
As we continue to grow GitLab, Citus is the leading option to scale our database out. I'm glad that this option will still be there tomorrow.
If the creators of Postgres wanted all improvements to be upstreamed, they wouldn’t have released under a permissive license. The ability to use Postgres commercially without exposing your entire codebase to copyleft risk is one of the reasons it’s used commercially in the first place.
This is a big assumption. There are many reasons to release something under a non-copyleft license – not everyone that releases BSD-like is actively choosing to deprioritize upstreaming. Rather, they are choosing a license that is less restrictive, which has other advantages beyond being non-copyleft.
Moreover, using copyleft software doesn't automatically force you to release code. There are specific interactions that trigger the sharing clause in, for example, the GPL, such as distribution, linking, and so on. There remain many, many uses that allow commercialization that do not run afoul of the copyleft nature of the GPL.
I am commenting because I have seen this sentiment repeated ad nauseam on here and, maybe that's not what you meant, but I felt the need to clarify. Moreover, if the code is not AGPL, most online uses do not run afoul, because the code product (say, executables) is not itself being distributed. The AGPL was formulated to close this loophole, but GPL code is free from this.
And this is a benefit to prevent lock-in. Amazon’s OLAP database, Redshift, is protocol compliant with Postgres. Even if you won’t get the performance benefits of Redshift if you move to a standard Postgres, at least you don’t have to change your code.
Now you can move to Azure without having to change your code.
Amazon Aurora doesn't have much to do with Postgres and is a custom storage subsystem used by many different database engines. Aurora Postgres is actually using Postgres code on top to handle queries, and eventually PG itself will get pluggable storage engines.
It's similar with Redshift although it's a much older codebase from the v8 branch with more customizations. The changes are very specific to their infrastructure and wouldn't help anyone else since it's not designed as an on-prem deployable product.
There's also no confirmation that DocumentDB runs on Postgres, and it's most likely a custom interface layer they wrote themselves. If you just want MongoDB on Postgres then there are already open source projects that do it.
Redshift isn't even developed by Amazon — it's a commercial product called ParAccel, which they license (and modify, presumably).
Another commercial MPP database based on Postgres 8.x, GreenplumDB, was open-sourced a few years back. The changes are so extensive that there's little hope of catching up with the current Postgres codebase. Given the focus on OLAP and analytics over OLTP, there might not even be a strong motivation to catch up, either.
Redshift isn’t just a custom storage tier atop an older version of Postgres. It has an entirely different execution engine that compiles commands to an executable and farms them out to multiple data nodes.
Kudos to Azure for opening so much of what they do. Lots of Kubernetes work, including AKS-engine which runs their k8s implementation. Machine learning toolkit. Media services (faceid etc) as a container. The whole Azure shebang runs on Service Fabric, which they've also open sourced.
It's a differentiator for some of their workloads: you don't have to hand your business over to a black box.
Aurora databases and DocumentDB share the same underlying reliable single-writer, many-reader block device for storage. That is all the magic. Not sure where you got the idea that DocumentDB has Postgres underneath it.
I get what you're saying, but BSD-licenses are specifically designed to facilitate things not being sent upstream. I don't understand why people moan about companies complying with the license agreement.
MongoDB was already under the AGPL; Amazon just replicated the API on top of their own storage engine (or an existing permissive-licensed storage engine? Who knows?).
If we're at the point where Amazon can just re-implement whatever project they want, more or less from scratch, I'm not sure there's any license that can save us. :(
Amazon has explained in their reinvent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to latest on a slightly slower cadence). And there’s no benefit to getting the Aurora layer upstream, no one else could use it anyway.
Citus is an extension, not a fork.
So neither of these projects is doing Postgres a disservice. Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.
> Amazon has explained in their reinvent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to latest on a slightly slower cadence). And there’s no benefit to getting the Aurora layer upstream, no one else could use it anyway.
I don't think this is an accurate analysis. For one, they had to make a lot of independent improvements to not have performance regress horribly after their changes, and a lot of those could be upstreamed. Similarly, they could help with the effort to make table storage pluggable, but they've not, instead opting to just patch out things.
> Citus is an extension, not a fork.
Used to be a fork though.
> Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.
How is Amazon meaningfully involved in the maintenance of open source postgres?
This is the future and it's not just big companies doing it.
Virtually all of the companies built on open source products in the past few years have stopped focusing on being the best place to run said open source program and are instead keeping performance and feature improvements proprietary rather than pushing them back upstream.
> - we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never make it back upstream (who knows if it's just hardware or software or what behind the scenes here)
The performance benefits of Aurora over Postgres are mostly because Amazon rewrote the storage engine to run on top of their infrastructure.
> - we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood
To clarify, Amazon DocumentDB uses the Aurora storage engine, which is the same proprietary storage engine that is used by Aurora MySQL and Aurora PostgreSQL, and gives you multi-facility durability by writing 6 copies of your data across 3 facilities, with a 4 of 6 quorum before writes are acknowledged back to the client.
So, it's a bit inaccurate to say that DocumentDB has anything to do with Postgres.
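For anyone who wants to see what the 4-of-6 write quorum above buys you, here is a minimal sketch. It is not Amazon's implementation: the facility names are made up, and the even split of two copies per facility is an assumption on my part (the comment only says 6 copies across 3 facilities).

```python
# Illustrative model of a 4-of-6 write quorum across 3 facilities.
# Facility names and the 2-copies-per-facility layout are assumptions.
FACILITIES = ("facility-a", "facility-b", "facility-c")  # hypothetical names
COPIES_PER_FACILITY = 2  # assumed even spread: 3 * 2 = 6 copies
WRITE_QUORUM = 4         # copies that must confirm before the client gets an ack


def replicas():
    """Return all six (facility, copy_index) storage copies."""
    return [(f, i) for f in FACILITIES for i in range(COPIES_PER_FACILITY)]


def write_acknowledged(acks):
    """A write is acknowledged once at least WRITE_QUORUM distinct copies confirm it."""
    return len(set(acks)) >= WRITE_QUORUM


def writes_survive_loss_of(lost_facilities):
    """Can writes still reach quorum if every copy in the lost facilities is down?"""
    alive = [r for r in replicas() if r[0] not in lost_facilities]
    return write_acknowledged(alive)


if __name__ == "__main__":
    print(writes_survive_loss_of({"facility-a"}))                # True: 4 copies remain
    print(writes_survive_loss_of({"facility-a", "facility-b"}))  # False: only 2 remain
```

Under those assumptions, losing one whole facility still leaves exactly four copies, so writes keep being acknowledged; losing two does not.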
I would argue Microsoft's strategy actually makes them more wedded and committed to ensuring the vitality of open source PostgreSQL than anything AWS is doing.
The big news here: Citus Data donated 1% of their equity to non-profit PostgreSQL organizations[1] so this acquisition is a win for the community even in the darkest scenario of Citus Data disappearing into a canyon on the Microsoft campus.
Given Microsoft's change in operation over recent years there's also hope that they can continue their contributions into the future.
It's fascinating to see Microsoft leave behind the "embrace, extend, extinguish" narrative only to have Amazon adopt it, causing massive rifts and action within the database community[2][3]. I am genuinely concerned about the future of open source software in this continued scenario.
An article with what I considered an outrageous headline ("Is Amazon 'strip mining' open source?"[4]) has only rung more true over time. Amazon is one of the largest companies on earth, selling products that they receive for free but never improve[5], attacking the primary open source provider, and then shifting toward their comparable proprietary closed offerings.
Hopefully new ways to "give back", such as equity contribution, can be one of the many paths forward needed to keep open source software healthy. Given how much innovation is unlocked by this, it'd be a crime to go back to the past era.
[5]: From [2], "Jay Kreps, a creator of Kafka and co-founder and CEO of Confluent ... said Amazon has not contributed a single line of code to the Apache Kafka open-source software and is not reselling Confluent’s cloud tool."
I still can't get over the fact that Microsoft is using Postgres internally. If you had told me that 5 years ago, I wouldn't have believed it. Did they go into why they chose it over MSSQL?
The main question is: did MS want an expert PgSQL team to work on Azure PostgreSQL (and maybe to create a proprietary competitor to Aurora)? Or did they acquire Citus for its product, to improve and market it further?
It feels like it was the first. If so, it means bad news for the Citus product, as it will most likely be ignored for a while. That would be really sad, as I don't know of any actively supported automated sharding solution for PgSQL other than Citus. There is PostgresXL[1], but there isn't much focus on making it community friendly.
I don't think anyone should expect acquihiring an expert Postgres team to work on a proprietary product to work well, because the programmers' skills are eminently transferrable.
Half the team would probably wander off to work for one of the other postgres-centered companies (and quite possibly continue to work on the open source Citus code).
When we first started building Citus, we began on the OLAP side. As time has gone on Citus has evolved to have full transactional support, first when targeting a single shard, and now fully distributed across your Citus database cluster.
Today, we find most who use the Citus database do so for either:
(OLTP) Fully transactional database powering their system of record or system of engagement (often multi-tenant)
(HTAP) For providing real-time insights directly to internal or external users across large amounts of data.
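To make the "single shard" versus "fully distributed" distinction above a bit more concrete, here is a rough sketch of hash-based routing by tenant. This is not how Citus itself places rows: the hash function, the shard count, and the helper names are all invented for illustration.

```python
import hashlib

SHARD_COUNT = 32  # hypothetical shard count, chosen only for the example


def shard_for(tenant_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a tenant id to a shard number using a stable hash (illustrative only)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shards_touched(tenant_ids):
    """Return the set of shards a transaction over these tenants would involve."""
    return {shard_for(t) for t in tenant_ids}


def is_single_shard(tenant_ids):
    """True when every row in the transaction belongs to one shard."""
    return len(shards_touched(tenant_ids)) == 1


if __name__ == "__main__":
    # A multi-tenant workload usually scopes a transaction to one tenant,
    # so it stays on one shard and needs no cross-node coordination.
    print(is_single_shard(["tenant-42", "tenant-42"]))
    # A transaction over many tenants may span several shards.
    print(len(shards_touched([f"tenant-{n}" for n in range(100)])))
```

The point of the sketch is only that a transaction scoped to one tenant lands on one shard, while anything spanning tenants can touch several shards and needs distributed coordination.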
Great news for Citus, Microsoft, Postgres and for people using open source relational databases. This makes so much sense. (I know this comment might read naive to some but I’m genuinely excited right now)
I'm pretty excited as well... Especially if this means improvements to Azure's PostgreSQL options. DBaaS is one of the areas where cloud providers give a LOT of value, more so as long as the interfaces you use can be used internally/locally for development.
Similarly, I really appreciate MS-SQL for Linux on Docker as it is a lot easier to set up for CI/CD and local for dev and testing and is nearly transparent going to Azure SQL or MS SQL Enterprise for hosted deployments. I'd much rather use PostgreSQL with PLv8 than MS-SQL though.
I wonder how long it will be before they shut down their own Citus Cloud hosted offering, which is hosted on AWS. Seems obvious that will become part of Azure soon.
I doubt they'd disrupt their AWS operations right away - this certainly won't be the first time that a MSFT team/subsidiary has used AWS.
What's more worrying to me is if they try to do both - build out a Citus offering on Azure, and simultaneously try to keep high reliability for their AWS Citus Cloud, which may be the most reliable option for some time. It's tough for any organization, no matter how much capital has been injected, to keep a laser-sharp eye on two inevitably-competing initiatives, each of which has its own performance and automation characteristics. I don't want the one person in the company who knows, say, cloud hard drive recovery patterns like the back of their hand and had previously been the EBS guru, to suddenly be pulled into the new Azure optimization project... and that's not something that capital injections can necessarily fix.
That said, this could accelerate their development timelines overall, and it guarantees stability for the product for quite a while. Overall I think this is good news! Citus is one of those things that you want to have in your back pocket when building any type of app on Postgres, and we certainly see it in our company as a long-term "escape hatch" when we're forced to make database-heavy design decisions at currently relatively-small scale. This deal keeps it alive and prospering!
[1]. https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes...
[2]. https://www.citusdata.com/blog/2017/10/25/what-it-means-to-b...
Yep, I love the fact that y'all went the extension route much like https://www.timescale.com/ and others.
https://www.citusdata.com/product/community
The HN community did a little bit of reverse engineering.
[1]: https://www.citusdata.com/newsroom/press/citus-data-donates-...
[2]: https://www.cnbc.com/2018/11/30/aws-is-competing-with-its-cu...
[3]: https://techcrunch.com/2019/01/09/aws-gives-open-source-the-...
[4]: https://www.cbronline.com/analysis/aws-managed-kafka
They made a decent cloud business model out of it (no idea how successful but everyone I asked was happy with it).
I just hope Microsoft allow the tech to evolve as open source!
Current Microsoft sure will. They're good with open source stuff.
Considering the competitive database landscape, this is a compelling offering to add to any cloud portfolio. Congrats to the Citus team.
[1]: https://www.postgres-xl.org/overview/
https://www.citusdata.com/blog/2018/06/07/what-is-citus-good...