Readit News
addisonj · 4 years ago
Interesting. With its separate compute and storage tiers, this is another system going in that direction, which I think is becoming almost the standard at this point, especially for "cloud-native" things designed to run on k8s. From what I can tell (it isn't very explicit on this point), they are avoiding distributed consensus at the storage layer and instead relying on a single-writer/multiple-reader model, with the single writer enforced by the assignment of tablets in the compute tier, and the tablet responsible for writing to multiple storage nodes for durability? (But I might be wrong.)

Assuming yes, this approach is, I think, underutilized and is pretty similar to how Apache Pulsar works (my day job), but I am not sure how many distributed RDBMSs have tried it out. Will be cool to see how it evolves! It isn't clear how they ensure that a tablet is assigned to only a single compute node, but I think that is an easier problem than distributed consensus at the storage tier.
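To make the single-writer idea concrete, here is a minimal sketch of one common way to enforce it: a coordinator hands out increasing generation numbers, and storage nodes reject writes from older generations (often called fencing). This is an assumption for illustration, not something the YDB docs are confirmed to describe; `Assigner` and `StorageNode` are hypothetical names.

```python
class StorageNode:
    """Hypothetical storage node that remembers the highest generation seen."""

    def __init__(self):
        self.max_generation = 0
        self.log = []

    def write(self, generation, record):
        # Reject writers that were superseded by a newer assignment.
        if generation < self.max_generation:
            return False
        self.max_generation = generation
        self.log.append((generation, record))
        return True


class Assigner:
    """Hypothetical coordinator handing out increasing generation numbers."""

    def __init__(self):
        self.generation = 0

    def assign(self):
        self.generation += 1
        return self.generation


assigner = Assigner()
node = StorageNode()

old_writer = assigner.assign()  # generation 1
first_ok = node.write(old_writer, "a")

new_writer = assigner.assign()  # generation 2, e.g. after a failover
second_ok = node.write(new_writer, "b")

# The stale writer is fenced off once the node has seen generation 2.
stale_accepted = node.write(old_writer, "c")
```

The point of the design is that the storage nodes never need to agree with each other; each one only compares a number, and the hard part moves to whoever hands out generations.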

fomichev3000 · 4 years ago
Each tablet gathers a quorum of answers from the members of a so-called BlobStorage group. A BlobStorage group is a number of so-called VDisks (virtual disks); all VDisks run on different nodes (even in different failure domains such as racks or AZs). A VDisk stores its data on a physical device, i.e. a PDisk.
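A rough sketch of the quorum idea described above, with in-memory stand-ins for the replicas. The `VDisk` class here is a toy, and the strict-majority rule is an assumption made for the example; the comment does not say which quorum rule a BlobStorage group actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class VDisk:
    """Toy stand-in for one replica in a storage group (in-memory only)."""
    name: str
    healthy: bool = True
    blobs: dict = field(default_factory=dict)

    def put(self, blob_id, data):
        # An unhealthy replica does not acknowledge the write.
        if not self.healthy:
            return False
        self.blobs[blob_id] = data
        return True


def quorum_put(group, blob_id, data):
    """Send the write to every replica; succeed if a strict majority acks."""
    needed = len(group) // 2 + 1
    acks = sum(1 for disk in group if disk.put(blob_id, data))
    return acks >= needed


group = [VDisk("rack-a"), VDisk("rack-b"), VDisk("rack-c", healthy=False)]
ok = quorum_put(group, "blob-1", b"payload")  # 2 of 3 replicas ack

group[1].healthy = False
degraded = quorum_put(group, "blob-2", b"payload")  # only 1 of 3 acks
```

With one replica down the write still succeeds; with two down it fails, which is why placing replicas in different failure domains matters.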
anikuni · 4 years ago
From my past experience, Datomic uses the same approach, i.e., multiple reader nodes and a single transact node. However, it's much more locked in with AWS, as it uses Dynamo and S3 for backing (maybe others as well?).
jeffbee · 4 years ago
Assigning leaders is trivial with something like zookeeper. But in this case it appears that the leader metadata is stored in a table of the database itself, which raises questions of operability if those tablets are unavailable.
fomichev3000 · 4 years ago
YDB doesn't use ZooKeeper. The system is built of tablets, and every tablet implements a distributed consensus algorithm. There are different types of tablets in the system: for example, SchemeShard is a tablet that stores metadata, such as table schemas, and DataShard stores table partition data.
flakiness · 4 years ago
Although the doc talks about their own SQL dialect "YQL", it seems to support PG SQL as a compatibility layer.

https://github.com/ydb-platform/ydb/tree/main/ydb/library/yq...

It's fascinating to see PG's prevalence as the de facto SQL standard.

isoprophlex · 4 years ago
imo having had to deal with oracle sql, teradata sql, mssql and mysql, postgres as a dialect is such a sane and consistent experience.

Also the ecosystem... The official docs basically solve your problems without much fuss, you don't have to rely on horrible vendor-specific forums.

fomichev3000 · 4 years ago
While you have a good point, I need to say that traditional databases and distributed databases differ in some respects. For instance, in YDB you can connect to any node of a database, which means that you need some kind of balancing (server side or client side). I'm not talking about the SQL dialect right now, but rather about how you connect to the database and how you handle connection losses or node overloads. YDB SDKs have a client-side balancing feature to distribute load evenly across database nodes.
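For readers unfamiliar with client-side balancing, here is a minimal round-robin sketch that skips endpoints marked as down. It is not the YDB SDK's API; `RoundRobinBalancer` and the endpoint names are hypothetical, and a real SDK would also discover endpoints and track health on its own.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical client-side balancer over a fixed list of endpoints."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.unhealthy = set()
        self._cycle = itertools.cycle(self.endpoints)

    def mark_down(self, endpoint):
        self.unhealthy.add(endpoint)

    def mark_up(self, endpoint):
        self.unhealthy.discard(endpoint)

    def pick(self):
        # Try each endpoint at most once per call, skipping ones marked down.
        for _ in range(len(self.endpoints)):
            endpoint = next(self._cycle)
            if endpoint not in self.unhealthy:
                return endpoint
        raise RuntimeError("no healthy endpoints")


balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
balancer.mark_down("node-2")  # e.g. after a connection loss
picks = [balancer.pick() for _ in range(4)]
```

Requests alternate between the two remaining nodes until `mark_up("node-2")` is called, so a lost connection costs a retry rather than an outage.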
awild · 4 years ago
> The official docs basically solve your problems without much fuss, you don't have to rely on horrible vendor-specific forums.

tdodbc also comes with examples that are so far beyond the scope of a normal person's use case, they honestly just feel like some greybeard doing the equivalent of a 10-minute-long Tony Hawk combo. It's just such a pain in the butt to use, we had to fight teradata all the way till very recently.

fomichev3000 · 4 years ago
Postgres support is in progress actually.
Meai · 4 years ago
it would be nice especially for using an IDE like Datagrip to manage and explore the YDB db
julienmarie · 4 years ago
It seems they are doing the same thing they did with ClickHouse in the OLAP space but this time in the OLTP space.
fomichev3000 · 4 years ago
Yes, you can think of it this way. But I need to add that YDB is also a platform for developing distributed systems that store data. YDB provides scalable, replicated storage with low latency, and the concept of a tablet (which is also used in many systems) that implements distributed consensus. These building blocks are used to implement persistent queues, a block store, and KV-tablets. Such blocks are hard to develop, and they are very useful when you need to build something new or optimal for a specific problem. OLTP is an example of such a problem. But yes, we were building YDB to support OLTP workloads initially.
wnolens · 4 years ago
Very cool.
KronisLV · 4 years ago
Currently seems to support a number of languages: https://ydb.tech/en/docs/reference/ydb-sdk/install

  - Python
  - Go
  - .NET
  - Java
  - PHP
Has some example code here: https://ydb.tech/en/docs/getting_started/sdk

samber · 4 years ago
I found a lot of mentions of "Postgresql" in the repository: https://github.com/ydb-platform/ydb/search?q=postgresql

Is it built on top of PG?

fomichev3000 · 4 years ago
YDB is developed from the ground up. We are thinking about a Postgres compatibility layer, which is why you see 'postgres' in the source code.
gaploid · 4 years ago
it seems not. they have compatibility for querying external PG dbs (YQL query engine), I think that's the reason for that.
jenny91 · 4 years ago
I'd almost guess so. Yandex is one of the largest production users of pg.
eatonphil · 4 years ago
It looks [0] like they use PostgreSQL's parser library under the hood which is cool.

[0] https://github.com/ydb-platform/ydb/tree/main/ydb/library/yq...

MichaelMoser123 · 4 years ago
here is a table that compares YDB to Postgres

https://db-engines.com/en/system/PostgreSQL%3BYandex+Databas...

I think they wanted to have a DB that is better tuned to distributed systems. Still don't know why they built an SQL-like query language called YQL. (What would that mean in practical terms? Could a common ORM framework like JPA deal with the YQL query language?)

SomeCallMeTim · 4 years ago
If that comparison table is right about "no foreign keys," that's a showstopper for me. :(

Back to looking at Couchbase, Yugabyte, or Citus for my distributed SQL.

ranguna · 4 years ago
Try looking at cockroachDB, the only thing holding me back on that is the absence of triggers.
reactor · 4 years ago
I guess it depends. For many use cases, it can be managed at the application level. I parted ways with FKs a long time ago, since they created more hassles than they solved, esp. when it comes to sharding and replication.
keredson · 4 years ago
that's very common in distributed databases. even in traditional databases, it's very common to not have FKs on large tables, and just handle it in software. indexing billions of rows or more is non-trivial.
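As a small illustration of "handling it in software", the sketch below checks the parent row in application code before inserting the child row. It uses SQLite from the Python standard library only to stay self-contained; the `users` and `orders` tables and the `insert_order` helper are made up for the example, and in a distributed database the check and the insert would need to run in one transaction with suitable isolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
# No FOREIGN KEY clause: the reference is checked by application code instead.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.commit()


def insert_order(conn, order_id, user_id):
    """Insert an order only if the referenced user exists."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT 1 FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown user_id {user_id}")
        conn.execute(
            "INSERT INTO orders (id, user_id) VALUES (?, ?)",
            (order_id, user_id),
        )


insert_order(conn, 100, 1)  # parent row exists, so this is accepted
try:
    insert_order(conn, 101, 42)  # no such user
    rejected = False
except ValueError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The trade-off is that nothing stops another code path from skipping the check, which is the price of not having the database enforce it.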
fomichev3000 · 4 years ago
Foreign keys come at a cost, especially in a distributed database. It's just a fact. If you can avoid them, it's great.
MichaelMoser123 · 4 years ago
didn't notice that. Wow that's pretty basic...
fomichev3000 · 4 years ago
The table is a bit outdated; we are going to fix it. To be honest, YQL is a very popular language in our company and has been used successfully for more than 7 years, but I agree that people outside want to see a more standard SQL dialect.
orthoxerox · 4 years ago
As far as I understand, YDB is different enough from regular RDBMSs that they don't provide an ODBC/JDBC/ADO.NET driver.
polskibus · 4 years ago
I would love to see it undergo a Jepsen treatment and a teardown by Andy Pavlo.
konart · 4 years ago
They do mention Andy Pavlo in this publication by the way: https://habr.com/ru/company/yandex/blog/660271/ saying that YDB was inspired by his and Michael Stonebraker's NewSQL ideas.