Interesting. The separate compute and storage tiers make this another system going in that direction, which I think is becoming almost the standard at this point, especially for "cloud-native" things designed to run on k8s. From what I can tell (it isn't very explicit on this point), they avoid distributed consensus at the storage layer and instead rely on a single-writer/multiple-reader model, with the single writer enforced by the assignment of tablets in the compute tier, and the tablet responsible for writing to multiple storage nodes for durability? (But I might be wrong.)
Assuming so, this approach is, I think, underutilized, and it's pretty similar to how Apache Pulsar works (my day job), but I'm not sure how many distributed RDBMSs have tried it out. It will be cool to see how it evolves! It isn't clear how they ensure a tablet is assigned to only a single compute node, but I think that's an easier problem than distributed consensus at the storage tier.
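To make the model concrete, here is a minimal sketch of a single-writer tablet replicating a record to multiple storage nodes and treating the write as durable once a majority acknowledge. The names (`Tablet`, `StorageNode`) are illustrative only, not YDB's actual API:

```python
# Hypothetical sketch: one writer (the tablet) per partition, assigned by
# the compute tier, replicating each record to several storage nodes.

class StorageNode:
    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)
        return True  # acknowledge the write

class Tablet:
    """The single writer for its partition."""
    def __init__(self, storage_nodes):
        self.nodes = storage_nodes

    def write(self, record):
        acks = sum(1 for n in self.nodes if n.append(record))
        # Durable once a majority of storage nodes hold the record.
        return acks > len(self.nodes) // 2

nodes = [StorageNode() for _ in range(3)]
tablet = Tablet(nodes)
assert tablet.write({"key": "a", "value": 1})
```

Because only one tablet ever writes a given partition, the storage nodes never have to arbitrate between conflicting writers, which is what lets the storage tier skip consensus.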
Each tablet gathers a quorum of answers from members of a so-called BlobStorage group. A BlobStorage group is a number of so-called VDisks (virtual disks); all VDisks run on different nodes (even in different failure domains such as racks or AZs). A VDisk stores its data on a physical device, i.e., a PDisk.
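A rough sketch of what "gathering a quorum of answers" from a group could look like; this is an illustration of the general quorum-read idea, not YDB's actual code, and the VDisks are modeled as plain dicts mapping keys to `(version, value)` pairs:

```python
# Illustrative quorum read: ask replicas until a majority has answered,
# then return the freshest value among the answers.

def quorum_read(vdisks, key):
    quorum = len(vdisks) // 2 + 1
    answers = []
    for vdisk in vdisks:
        answers.append(vdisk.get(key))  # (version, value) or None
        if len(answers) >= quorum:
            break
    known = [a for a in answers if a is not None]
    if not known:
        return None
    # Pick the answer with the highest version among the quorum.
    return max(known)[1]

vdisks = [
    {"k": (2, "new")},
    {"k": (1, "old")},
    {"k": (2, "new")},
]
assert quorum_read(vdisks, "k") == "new"
```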
From my past experience, Datomic uses the same approach, i.e., multiple reader nodes and a single transactor node. However, it's much more locked in with AWS, as it uses DynamoDB and S3 for backing storage (maybe others as well?).
Assigning leaders is trivial with something like ZooKeeper. But in this case it appears that the leader metadata is stored in a table of the database itself, which raises questions of operability if those tablets are unavailable.
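One common way to enforce a single leader through a metadata table rather than ZooKeeper is a lease with a compare-and-swap. This is a hedged sketch of that general technique (the table here is just a dict; a real system would use a transactional store, and nothing here reflects YDB's actual mechanism):

```python
# Lease-based leader assignment via CAS on a metadata table.
# A node becomes (or stays) leader of a tablet only if the slot is free,
# the previous lease expired, or it already holds the lease.

class LeaderTable:
    def __init__(self):
        self.rows = {}  # tablet_id -> (owner, lease_expiry)

    def try_acquire(self, tablet_id, owner, lease_secs, now):
        current = self.rows.get(tablet_id)
        if current is None or current[1] <= now or current[0] == owner:
            self.rows[tablet_id] = (owner, now + lease_secs)
            return True
        return False

table = LeaderTable()
assert table.try_acquire("tablet-7", "node-A", 10, now=0.0)        # free slot
assert not table.try_acquire("tablet-7", "node-B", 10, now=5.0)    # lease held
assert table.try_acquire("tablet-7", "node-B", 10, now=11.0)       # lease expired
```

The operability concern raised above maps directly onto this sketch: if the table holding `rows` is itself unavailable, no one can acquire or renew a lease.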
YDB doesn't use ZooKeeper. The system is built of tablets, and every tablet implements a distributed consensus algorithm. There are different types of tablets in the system; for example, SchemeShard is a tablet that stores metadata, such as table schemas, while DataShard stores table partition data.
While you have a good point, I need to say that traditional databases and distributed databases differ in some respects. For instance, in YDB you can connect to any node of a database, which means you need some kind of balancing (server-side or client-side). I'm not talking about SQL dialects right now, but rather how you connect to the database and how you handle connection losses or node overloads. YDB SDKs have a client-side balancing feature to distribute load evenly across database nodes.
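A minimal sketch of the client-side balancing idea described above: round-robin across database endpoints, skipping nodes that fail. The endpoint strings and the `execute` callback are made up for illustration; this is not the YDB SDK's API:

```python
# Round-robin client-side balancing with failover to the next node.

import itertools

class BalancingClient:
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)
        self._n = len(endpoints)

    def call(self, execute):
        last_err = None
        for _ in range(self._n):          # try each node at most once
            endpoint = next(self._cycle)
            try:
                return execute(endpoint)
            except ConnectionError as e:  # node down or overloaded: move on
                last_err = e
        raise last_err

client = BalancingClient(["node1:2135", "node2:2135", "node3:2135"])

def probe(endpoint):
    if endpoint == "node1:2135":
        raise ConnectionError("node1 unavailable")
    return endpoint

assert client.call(probe) == "node2:2135"
```

Real SDKs typically refresh the endpoint list from the cluster and weight nodes by load rather than using a fixed round-robin, but the failover shape is the same.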
> The official docs basically solve your problems without much fuss, you don't have to rely on horrible vendor-specific forums.
tdodbc also comes with examples that are so far beyond the scope of a normal person's use case that they honestly just feel like some greybeard doing the equivalent of a 10-minute-long Tony Hawk combo. It's just such a pain in the butt to use; we had to fight Teradata all the way until very recently.
Yes, you can think of it this way. But I need to add that YDB is also a platform for developing distributed systems that store data. YDB provides scalable, replicated storage with low latency, and the concept of a tablet (also used in many other systems) that implements distributed consensus. These building blocks are used for the persistent queue implementation, block storage, and KV tablets. They are hard to develop, and they are very useful when you need to build something new or optimal for a specific problem; OLTP is an example of such a problem. But yes, we were initially building YDB to support OLTP workloads.
I think they wanted a DB that is better tuned to distributed systems. I still don't know why they made an SQL-like query language called YQL (what would that mean in practical terms? Could a common ORM framework like JPA deal with the YQL query language?)
I guess it depends. For many use cases, it can be managed at the application level. I parted ways with FKs a long time ago, since they created more hassle than they solved, especially when it comes to sharding and replication.
That's very common in distributed databases. Even in traditional databases, it's very common not to have FKs on large tables and to just handle it in software; indexing billions or more of rows is non-trivial.
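"Handling it in software" usually means checking the parent row in application code before inserting the child. A sketch of that, with illustrative in-memory `users` and `orders` tables standing in for real database calls:

```python
# Application-level referential integrity instead of a FOREIGN KEY constraint.

users = {1: "alice"}
orders = []

def insert_order(order_id, user_id):
    # The check the database would otherwise enforce via a FK.
    if user_id not in users:
        raise ValueError(f"user {user_id} does not exist")
    orders.append({"id": order_id, "user_id": user_id})

insert_order(100, 1)
try:
    insert_order(101, 42)   # rejected: no such user
except ValueError:
    pass
assert len(orders) == 1
```

The caveat in a sharded system is that the parent check and the child insert may hit different nodes, so without a cross-shard transaction this check is best-effort, which is exactly the trade-off being discussed.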
The table is a bit outdated, we are going to fix it.
To be honest, YQL is a very popular language in our company; it has been used successfully for more than 7 years. But I agree that people outside want to see a more standard SQL dialect.
https://github.com/ydb-platform/ydb/tree/main/ydb/library/yq...
It's fascinating to see PG's prevalence as the de facto SQL standard.
Also the ecosystem... The official docs basically solve your problems without much fuss, you don't have to rely on horrible vendor-specific forums.
Is it built on top of PG?
[0] https://github.com/ydb-platform/ydb/tree/main/ydb/library/yq...
https://db-engines.com/en/system/PostgreSQL%3BYandex+Databas...
Back to looking at Couchbase, YugabyteDB, or Citus for my distributed SQL.