Interesting. The separate compute and storage tiers make this another system going in that direction, which I think is becoming almost the standard at this point, especially for "cloud-native" things designed to run on k8s. From what I can tell (it isn't very explicit on this point), they avoid distributed consensus at the storage layer and instead rely on a single-writer/multiple-reader model, with the single writer enforced by the assignment of tablets in the compute tier, and the tablet responsible for writing to multiple storage nodes for durability? (But I might be wrong.)
Assuming so, this approach is, I think, underutilized, and it's pretty similar to how Apache Pulsar works (my day job), but I'm not sure how many distributed RDBMSs have tried it out. It will be cool to see how it evolves! It isn't clear how they ensure a tablet is assigned to only a single compute node, but I think that's an easier problem than distributed consensus at the storage tier.
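To make the model concrete, here is a minimal sketch of a single-writer tablet replicating a record to multiple storage nodes and treating the write as durable once a majority acknowledge. The names (`Tablet`, `StorageNode`) are illustrative only, not YDB's actual API:

```python
# Hypothetical sketch: one writer (the tablet) per partition, assigned by
# the compute tier, replicating each record to several storage nodes.

class StorageNode:
    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)
        return True  # acknowledge the write

class Tablet:
    """The single writer for its partition."""
    def __init__(self, storage_nodes):
        self.nodes = storage_nodes

    def write(self, record):
        acks = sum(1 for n in self.nodes if n.append(record))
        # Durable once a majority of storage nodes hold the record.
        return acks > len(self.nodes) // 2

nodes = [StorageNode() for _ in range(3)]
tablet = Tablet(nodes)
assert tablet.write({"key": "a", "value": 1})
```

Because only one tablet ever writes a given partition, the storage nodes never have to arbitrate between conflicting writers, which is what lets the storage tier skip consensus.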
Each tablet gathers a quorum of answers from members of a so-called BlobStorage group. A BlobStorage group is a number of so-called VDisks (virtual disks); all VDisks run on different nodes (even in different failure domains such as racks or AZs). A VDisk stores its data on a physical device, i.e., a PDisk.
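A rough sketch of what "gathering a quorum of answers" from a group could look like; this is an illustration of the general quorum-read idea, not YDB's actual code, and the VDisks are modeled as plain dicts mapping keys to `(version, value)` pairs:

```python
# Illustrative quorum read: ask replicas until a majority has answered,
# then return the freshest value among the answers.

def quorum_read(vdisks, key):
    quorum = len(vdisks) // 2 + 1
    answers = []
    for vdisk in vdisks:
        answers.append(vdisk.get(key))  # (version, value) or None
        if len(answers) >= quorum:
            break
    known = [a for a in answers if a is not None]
    if not known:
        return None
    # Pick the answer with the highest version among the quorum.
    return max(known)[1]

vdisks = [
    {"k": (2, "new")},
    {"k": (1, "old")},
    {"k": (2, "new")},
]
assert quorum_read(vdisks, "k") == "new"
```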
From my past experience, Datomic uses the same approach, i.e., multiple reader nodes and a single transactor node. However, it's much more locked in with AWS, as it uses DynamoDB and S3 for backing storage (maybe others as well?).
Assigning leaders is trivial with something like ZooKeeper. But in this case it appears that the leader metadata is stored in a table of the database itself, which raises questions of operability if those tablets are unavailable.
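One common way to enforce a single leader through a metadata table rather than ZooKeeper is a lease with a compare-and-swap. This is a hedged sketch of that general technique (the table here is just a dict; a real system would use a transactional store, and nothing here reflects YDB's actual mechanism):

```python
# Lease-based leader assignment via CAS on a metadata table.
# A node becomes (or stays) leader of a tablet only if the slot is free,
# the previous lease expired, or it already holds the lease.

class LeaderTable:
    def __init__(self):
        self.rows = {}  # tablet_id -> (owner, lease_expiry)

    def try_acquire(self, tablet_id, owner, lease_secs, now):
        current = self.rows.get(tablet_id)
        if current is None or current[1] <= now or current[0] == owner:
            self.rows[tablet_id] = (owner, now + lease_secs)
            return True
        return False

table = LeaderTable()
assert table.try_acquire("tablet-7", "node-A", 10, now=0.0)        # free slot
assert not table.try_acquire("tablet-7", "node-B", 10, now=5.0)    # lease held
assert table.try_acquire("tablet-7", "node-B", 10, now=11.0)       # lease expired
```

The operability concern raised above maps directly onto this sketch: if the table holding `rows` is itself unavailable, no one can acquire or renew a lease.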
YDB doesn't use ZooKeeper. The system is built of tablets, and every tablet implements a distributed consensus algorithm. There are different types of tablets in the system; for example, SchemeShard is a tablet that stores metadata, such as table schemas, while DataShard stores table partition data.
While you have a good point, I need to say that traditional databases and distributed databases differ in some respects. For instance, in YDB you can connect to any node of a database, which means you need some kind of balancing (server-side or client-side). I'm not talking about SQL dialects right now, but rather how you connect to the database and how you handle connection losses or node overloads. YDB SDKs have a client-side balancing feature to distribute load evenly across database nodes.
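A minimal sketch of the client-side balancing idea described above: round-robin across database endpoints, skipping nodes that fail. The endpoint strings and the `execute` callback are made up for illustration; this is not the YDB SDK's API:

```python
# Round-robin client-side balancing with failover to the next node.

import itertools

class BalancingClient:
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)
        self._n = len(endpoints)

    def call(self, execute):
        last_err = None
        for _ in range(self._n):          # try each node at most once
            endpoint = next(self._cycle)
            try:
                return execute(endpoint)
            except ConnectionError as e:  # node down or overloaded: move on
                last_err = e
        raise last_err

client = BalancingClient(["node1:2135", "node2:2135", "node3:2135"])

def probe(endpoint):
    if endpoint == "node1:2135":
        raise ConnectionError("node1 unavailable")
    return endpoint

assert client.call(probe) == "node2:2135"
```

Real SDKs typically refresh the endpoint list from the cluster and weight nodes by load rather than using a fixed round-robin, but the failover shape is the same.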
> The official docs basically solve your problems without much fuss, you don't have to rely on horrible vendor-specific forums.
tdodbc also comes with examples that are so far beyond the scope of a normal person's use case that they honestly just feel like some greybeard doing the equivalent of a 10-minute-long Tony Hawk combo. It's just such a pain in the butt to use; we had to fight Teradata all the way until very recently.
Yes, you can think of it this way. But I need to add that YDB is also a platform for developing distributed systems that store data. YDB provides scalable, replicated storage with low latency, and the concept of a tablet (also used in many other systems) that implements distributed consensus. These building blocks are used for the persistent queue implementation, block storage, and KV tablets. They are hard to develop, and they are very useful when you need to build something new or optimal for a specific problem; OLTP is an example of such a problem. But yes, we were initially building YDB to support OLTP workloads.
I think they wanted a DB that is better tuned to distributed systems. I still don't know why they made an SQL-like query language called YQL (what would that mean in practical terms? Could a common ORM framework like JPA deal with the YQL query language?)
I guess it depends. For many use cases, it can be managed at the application level. I parted ways with FKs a long time ago, since they created more hassle than they solved, especially when it comes to sharding and replication.
That's very common in distributed databases. Even in traditional databases, it's very common not to have FKs on large tables and to just handle it in software; indexing billions or more of rows is non-trivial.
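"Handling it in software" usually means checking the parent row in application code before inserting the child. A sketch of that, with illustrative in-memory `users` and `orders` tables standing in for real database calls:

```python
# Application-level referential integrity instead of a FOREIGN KEY constraint.

users = {1: "alice"}
orders = []

def insert_order(order_id, user_id):
    # The check the database would otherwise enforce via a FK.
    if user_id not in users:
        raise ValueError(f"user {user_id} does not exist")
    orders.append({"id": order_id, "user_id": user_id})

insert_order(100, 1)
try:
    insert_order(101, 42)   # rejected: no such user
except ValueError:
    pass
assert len(orders) == 1
```

The caveat in a sharded system is that the parent check and the child insert may hit different nodes, so without a cross-shard transaction this check is best-effort, which is exactly the trade-off being discussed.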
The table is a bit outdated, we are going to fix it.
To be honest, YQL is a very popular language in our company; it has been used successfully for more than 7 years. But I agree that people outside want to see a more standard SQL dialect.
https://github.com/ydb-platform/ydb/tree/main/ydb/library/yq...
It's fascinating to see PG's prevalence as the de facto SQL standard.
Also the ecosystem... The official docs basically solve your problems without much fuss, you don't have to rely on horrible vendor-specific forums.
Is it built on top of PG?
[0] https://github.com/ydb-platform/ydb/tree/main/ydb/library/yq...
https://db-engines.com/en/system/PostgreSQL%3BYandex+Databas...
Back to looking at Couchbase, YugabyteDB, or Citus for my distributed SQL.