GeneralMayhem · 3 years ago
> I suppose the more fundamental question is: why are you not using a database that does sharding for you? Over the past few years the so-called “serverless” database has gotten a lot more traction. Starting with the infamous Spanner paper, many have been thinking about how running a distributed system should be native to the database itself, the foremost of which has been CockroachDB. You can even run cloud Spanner on GCP.

This is the most important paragraph in the article. In a world where things like Dynamo/Cassandra or Spanner/Cockroach exist, manually-sharded DB solutions are pretty much entirely obsolete. Spanner exists because Google got so sick of people building and maintaining bespoke solutions for replication and resharding, which would inevitably have their own set of quirks, bugs, consistency gaps, scaling limits, and manual operations required to reshard or rebalance from time to time. When it's part of the database itself, all those problems just... disappear. It's like a single database that just happens to spread itself across multiple machines. It's an actual distributed cloud solution.

My current employer uses sharded and replicated Postgres via RDS. Even basic things like deploying schema changes to every shard are an unbelievable pain in the ass, and changing the number of shards on a major database is a high-touch, multi-day operation. After having worked with Spanner in the past, it's like going back to the stone age, like we're only a step above babysitting individual machines in a closet. Nobody should do this.

grogers · 3 years ago
All the "auto-sharding" DBs have their own quirks, especially around hot key throughput. You often end up having to add "sharding bits" to the beginning of your keys to get enough throughput. The size of one partition is usually tiny compared to one partition of an RDBMS too. So it ends up being not nearly the panacea that it would seem to be.
bpicolo · 3 years ago
Software-side sharding keys seem significantly simpler than managing sharding infrastructure. That's the beauty of memcached, right?
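
Roughly, the client-side approach looks like this (a sketch; the cache addresses are made up, and memcached clients do something along these lines with a consistent-hash ring):

    import bisect
    import hashlib

    class HashRing:
        """Client-side consistent hashing: the app picks the cache node per key."""
        def __init__(self, nodes, vnodes=100):
            self._ring = []
            for node in nodes:
                # virtual nodes smooth out the key distribution
                for i in range(vnodes):
                    self._ring.append((self._hash(f"{node}#{i}"), node))
            self._ring.sort()

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def node_for(self, key):
            # first ring position clockwise from the key's hash (wraps around)
            i = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
            return self._ring[i][1]

    ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
    print(ring.node_for("user:42"))  # the node that owns this key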
dagss · 3 years ago
Would love to be able to use Spanner...

What do you suggest one does if one has to run on Azure though?

PS Azure Cosmos is not a real product; it is a beta toy that Microsoft has simply made expensive enough that people think it cannot possibly be as bad as it is...

avinassh · 3 years ago
> PS Azure Cosmos is not a real product; it is a beta toy that Microsoft has simply made expensive enough that people think it cannot possibly be as bad as it is...

What kind of problems did you run into with Cosmos?

vinay_ys · 3 years ago
You could consider one of these three –

1. yugabytedb (postgres compatible)

2. cockroachdb (postgres compatible)

3. tidb (mysql compatible)

You could self-host these on Azure or subscribe to each vendor's managed service offering.

Of course, the sticker price for the managed service will be higher than for the self-hosted enterprise product, which itself will be more expensive than the self-hosted, community-supported, open-source option (which seems to be a real option with YugabyteDB).

yawgmoth · 3 years ago
Recently joined an Azure shop, so I'm new to Cosmos, but it's fairly abstract as a product. It seems to be a front for different database types like Mongo or Postgres or SQL Server, with some additional features such as subscriptions.

Would be interested to hear your challenges.

BiteCode_dev · 3 years ago
If it's one more man-week of pain a year, but the Spanner solution adds an additional cost that is 10 times that, there is an incentive for the employer to keep the painful solution.

That, plus lock-in, the open-source ecosystem, and so on.

It's not a black and white situation.

KaiserPro · 3 years ago
> In a world where things like Dynamo/Cassandra or Spanner/Cockroach exist, manually-sharded DB solutions are pretty much entirely obsolete.

Not really. Cassandra is a write-optimised, slow, network- and IO-heavy beast that's a pain in the arse to administer. We replaced a six-node m4.4xlarge cluster with a single db.m4.2xlarge on Postgres.

You need to pick your DB to match your data, not the other way around.

billythemaniam · 3 years ago
Spanner and Cassandra really shouldn't be in the same sentence. They are optimized for very different use cases. The "obsolete" part of that quote does apply to Spanner, TiDB, and CockroachDB in my experience. I haven't used Yugabyte, but the other sharded databases, including Vitess (TiDB is what Vitess is trying to be), actually make life harder.
foobiekr · 3 years ago
Most orgs, including mine, have also had the experience of a Cassandra cluster simply melting down. That was more common ten years ago, but it is pretty much chaos when it happens.

Of the non-SQL distributed databases, I found FoundationDB to be by far the most robust and lowest-overhead.

However, at least through version 6, it had absolutely terrible, service-destroying behavior if any of the nodes ran low on storage, dropping transaction throughput through the floor; and since transactions are used for recovery and cluster expansion... Since it does a very nice job of auto-sharding, if you're not paying attention this tends to happen to multiple nodes around the same time.

I wish it would get more attention though. It is a really amazing tool.

bsaul · 3 years ago
How does Cockroach compare in terms of performance to manually sharded databases?

My intuition is that a properly sharded database will perform faster-or-same as a non-sharded one in all scenarios, whereas an automatically sharded database will actually perform worse until you start reaching the critical traffic that a single instance won't handle no matter what.

Am I wrong?

arjunnarayan · 3 years ago
[disclosure, former cockroachdb engineer]

you can get expected "single shard" performance in CockroachDB by manually splitting the shards (called "ranges" in CockroachDB) along the lines of the expected single-shard queries (what you call a "properly sharded database"). This is easy to do with a single SQL command. (This is what we do today; we use CockroachDB for strongly consistent metadata.)
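
For example (table and split points hypothetical; CockroachDB speaks the Postgres wire protocol, so a stock Postgres driver works):

    import psycopg2  # or any Postgres driver

    conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb")
    with conn.cursor() as cur:
        # Pre-split the table's ranges at the boundaries that the app's
        # "single shard" query patterns are expected to respect.
        cur.execute("ALTER TABLE orders SPLIT AT VALUES (1000000), (2000000)")
    conn.commit()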

The difference between CockroachDB and a manually sharded database is that when you _do_ have to perform some cross-shard transactions (which you inevitably have to do at some point), in CockroachDB you can execute them (with a reasonable performance penalty) with strong consistency and 2PC between the shards, whereas in your manually sharded database... good luck! Hope you implement 2PC correctly.

crazygringo · 3 years ago
> My intuition is that a properly sharded database will perform faster-or-same as a non-sharded one in all scenarios.

No, because as soon as you have to JOIN across shards, performance plummets. I mean, you can't JOIN across shards, so you have to simulate that client-side, which is very slow.

The secret to performant manual sharding is to figure out a way to minimize JOINs across shards, and to make sure the ones you do have to do are very small. Or to start replicating data, which quickly becomes its own nightmare of consistency.
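
To make that concrete, here's a toy sketch of "simulating the JOIN client-side" (in-memory stand-ins where a real app would run one SQL query per shard):

    from collections import defaultdict

    # Stand-ins for per-shard data (hypothetical schema).
    ORDERS_SHARD = [(1, 42, 9.99), (2, 7, 25.00), (3, 42, 3.50)]  # (id, user_id, total)
    USER_SHARDS = {0: {42: "alice"}, 1: {7: "bob"}}               # shard -> {id: name}

    def shard_for_user(user_id, n_shards=2):
        return user_id % n_shards  # the manual sharding function

    def join_orders_with_users():
        """Simulate `orders JOIN users` in application code."""
        orders = ORDERS_SHARD  # 1. pull the driving rows from one shard
        # 2. group the needed user ids by owning shard (one batched lookup each)
        wanted = defaultdict(set)
        for _, user_id, _ in orders:
            wanted[shard_for_user(user_id)].add(user_id)
        # 3. fan out and merge by hand: the work a single-node JOIN does for free
        users = {}
        for shard_id, ids in wanted.items():
            users.update({uid: USER_SHARDS[shard_id][uid] for uid in ids})
        return [(oid, users[uid], total) for oid, uid, total in orders]

    print(join_orders_with_users())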

jandrewrogers · 3 years ago
Where the line gets blurry is that there can be layers of sharding inside the database even if it is never exposed to the end user. At the limit, disk pages are essentially shards and are treated similarly. Some database kernels do true sharding and resharding within a single server transparently for performance reasons; it isn't just for having data on multiple servers or cores.

There is no intrinsic performance penalty for automatic sharding; it can be as fast or faster than any other scheme. The "faster" cases are when the sharding structure must be adaptive to achieve uniform distribution of data and load, such as when the key distribution is intrinsically unpredictable or when you are doing multi-attribute sharding. In these cases, effective sharding can only be created by software at runtime.

hilbertseries · 3 years ago
> My intuition is that a properly sharded database will perform faster-or-same as a non-sharded one in all scenarios.

This is workload-dependent: if your queries only go to one shard, it's likely true. But if you have cross-shard queries, then it's no longer true.

varsketiz · 3 years ago
> In a world where things like Dynamo/Cassandra or Spanner/Cockroach exist, manually-sharded DB solutions are pretty much entirely obsolete.

Far from it. Plenty of companies that are now running MySQL or Postgres will shard manually when they need to scale.

dalyons · 3 years ago
It can still be an obsolete practice even if some continue to choose to do it.
throwawaaarrgh · 3 years ago
distributed decentralized databases are their own special nightmare. they are all full of hidden traps, footguns, pains, limits. the more you push them the more these emerge.

in some ways the stone age was terrible, and in some ways the stone age was far superior to the current age. we've gotten soft, lazy, stupid, comfortable. we used to be feral animals, but now we're more like house plants. if we got more used to doing things like dumb animals, life would be simpler, and we could appreciate what we have more. but it's hard to resist the temptation of feeling like what we have is never enough. the grass is always greener, and how terrible that it isn't ours.

game_the0ry · 3 years ago
I just have to say - thanks for your response.

I am a full stack dev but I end up doing a lot of front end and rarely have to touch SQL. Posts like yours help me to realize there's still a lot I do not know while still being informative. So thanks <3

cornel_io · 3 years ago
Solutions like Spanner are fantastic for some things (and within Google it's a no-brainer, since you're not paying an outrageous markup), but besides being expensive, they don't just let you drop them in as a scalable replacement for an existing usage pattern, and sharding usually first comes up when you're at a huge scale and already have a ton of app code and database design but have grown past what a single DB can support (that's a massive amount of traffic these days, well over 1M DAU for most apps). See for instance https://cloud.google.com/spanner/docs/schema-design; a lot of what you have to plan for is very different than if you were designing a normal (e.g.) Rails app with typical patterns.

It's a great option, as long as you're aware of the tradeoffs and don't expect it to act (and cost) exactly like a single Postgres server.

My order of ops generally goes:

- Can you get away with pure (or mostly) key/value for your hottest table(s) and mostly avoid queries? Use Dynamo or Google Cloud Datastore, but deeply understand their quirks (e.g. never index a timestamp field on Datastore or you're worse off than a weak-ass SQL server for write-heavy loads). These can scale basically forever with zero effort if you fit their constraints, and are cheap enough.

- Can you tolerate the price of Spanner and deal with it not being normal SQL? Go for it, expect some non-trivial migration cost

- If you have to shard/cluster yourself, can you make Redis work? It'll be easier operationally than managing N SQL DBs of any kind. I know, if you could you'd probably have been fine using Dynamo...

- Can you soft-shard, e.g. use different machines for different tables and spread load enough to get each table onto a single server? Do it.

- Can you minimize re-sharding ops? (If you're in a massive growth phase, the answer is no) Ok, fine, do SQL sharding, and make sure you have well-tested code to do it and you mostly use key/value access patterns.

- Consider a beefy AF bare metal SQL server, see how far that gets you.

- If none of this applies, re-evaluate the cost ($ and time) of Spanner, you're paying a big price one way or another

- Only bad ideas from this point on, just do sharding but it'll suck...

Nowhere on my list: running your own cluster for Cassandra, Mongo, or anything similar; it's such an ops nightmare, and there are hosted services that are better in every way but cost.

If you do end up writing your own sharding layer, best of luck, I have yet to see any reasonably production-ready ones that work well with pretty much any web framework or ORM. And writing these is so tough, so many edge cases to consider...not something I hope to ever do again.
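
FWIW, the routing core of such a layer is tiny; the hard parts are resharding, cross-shard queries, and transactions. A minimal sketch (DSNs made up), using fixed hash buckets so a reshard only reassigns buckets rather than rehashing every key:

    import hashlib

    N_BUCKETS = 256  # fixed forever; resharding moves buckets, not keys
    SHARD_DSNS = {
        0: "postgresql://db-shard-0/app",
        1: "postgresql://db-shard-1/app",
    }
    # bucket -> shard assignment: the only thing you edit when resharding
    BUCKET_TO_SHARD = {b: b % len(SHARD_DSNS) for b in range(N_BUCKETS)}

    def bucket_for(key: str) -> int:
        # Stable hash; never Python's built-in hash(), which is randomized
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % N_BUCKETS

    def dsn_for(key: str) -> str:
        return SHARD_DSNS[BUCKET_TO_SHARD[bucket_for(key)]]

    print(dsn_for("user:42"))  # connect here for everything about user 42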

GeneralMayhem · 3 years ago
> Consider a beefy AF bare metal SQL server, see how far that gets you.

Yep, this is of course the default, although you'll at least want replication for availability and disaster recovery; that's not too bad with PG/RDS.

> they don't just let you drop them in as a scalable replacement for an existing usage pattern, and sharding usually first comes up when you're at a huge scale and already have a ton of app code and database design but have grown past what a single DB can support

I think once you get beyond what fits on one primary box, you're almost certainly at the point of rethinking your data access models anyway. Manual sharding is disruptive too.

branko_d · 3 years ago
> Consider a beefy AF bare metal SQL server, see how far that gets you.

Quite far, possibly to millions of users.

Case in point: StackOverflow runs on SQL Server.

eternalban · 3 years ago
Because distributing a database is sensitive to CAP semantics (AP or CP) and to data dependencies (graph partitioning is hard*), and storage engine choices are driven by use cases, the general technical solution is highly complex.** Spanner uses atomic clocks, for example. Running CockroachDB yourself is [very likely] not the same thing as using a SaaS variant, either. Sight unseen, it cannot be 'trivial'. Same for Spanner. The general solution seems to require paying someone to provide the service. In sum, it is not a clear-cut yes/no situation.

BTW, IIRC [distributed] Postgres was never as stellar as the single-node deployment (the stuff we sing praises of). I'm sure it has improved significantly.

> "manual operations required to reshard or rebalance from time to time. When it's part of the database itself, all those problems just... disappear."

Not really correct.

* "Choosing the right keys can help Spanner evenly distribute data and processing to avoid hotspots"

https://cloud.google.com/spanner/docs/schema-design

https://cloud.google.com/blog/topics/developers-practitioner...

** https://static.googleusercontent.com/media/research.google.c...

[Spanner certainly did -not- start off as a distributed RDBMS. Because that project would have never been given a green light. Because it is understood just how complex that system would need to be. It started off as a distributed k/v. That's it.]

"[I]n many ways today’s Spanner is very different from what was described [in original Spanner whitepaper]"

...

"The initial focus of Spanner was on scalability and fault-tolerance, much as was the case for many systems at Google. In the last few years Spanner has increasingly become a fully-fledged database system. Production deployment with internal customers taught us a great deal about the requirements of web scale database applications, and shaped the techniques we presented in this paper. Aggressive focus on horizontal scalability enabled widespread deployment without investing too heavily in single machine performance. The latter is one of the areas where we are making continuous improvements, such as upgrading lower level storage to Ressi."

"The original API of Spanner provided NoSQL methods for point lookups and range scans of individual and interleaved tables. While NoSQL methods provided a simple path to launching Spanner, and continue to be useful in simple retrieval scenarios, SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data."


phamilton · 3 years ago
A favorite resource: https://learn.microsoft.com/en-us/azure/architecture/pattern...

Microsoft's Azure Cloud Patterns is some of the best documentation out there. It's not Azure-centric. It focuses on why you might want to do something and describes the techniques that are commonly used to solve it. It's pretty great.

beebmam · 3 years ago
I also couldn't recommend this more highly. If you are designing a new application, read these docs. You will almost certainly learn something deeply useful that you'll carry with you for the rest of your career.
sahn44 · 3 years ago
Great resource and very well written articles. I'm always curious how these come to be when done really well. I wish we could generate high quality, useful, correct, internal documentation like this at my company, but I've never figured out how to get that done in practice. The people that know the concepts are not natural/gifted writers, or at least wouldn't be motivated or prioritize doing this. Even if they did, the style/tone/depth would vary greatly. For these msft docs, they somehow achieve consistent tone, depth, and quality/correctness across so many domains and I just don't know how that's achieved.
revskill · 3 years ago
The main issue is, I can't stand C# or OOP syntax to illustrate patterns. Why a class here? For God's sake, please use simple functions to prove the points.
cpurdy · 3 years ago
I think it only reasonable then for you to show us how to do it in COBOL, Lisp, or some other 1950s programming language.
yawgmoth · 3 years ago
They might supply Python examples if you ask. A lot of Microsoft documentation has Python as well as C# these days.
Douger · 3 years ago
Just want to say thanks for pointing out this resource. Will make for some great long weekend reading!
andirk · 3 years ago
Sometimes I wonder how many other industries allow a worker to stumble upon a document that they then read outside of work, despite the details often being kind of difficult to understand at first, frustrating even, but we do it literally as a pastime, for pleasure. I love that.
danjc · 3 years ago
Never come across this before. The collection of articles one level up from your link is an incredible resource.
shortrounddev · 3 years ago
MSDN is like the Wikipedia of coding problems. You find documentation there for things that have nothing to do with MS.
game_the0ry · 3 years ago
Super helpful, thanks!
dikei · 3 years ago
One thing I notice is that over-use of sharding, especially hash-based, might turn your relational database into just another key-value store, with consistency constraints moving into application code; you lose many of the advantages of a traditional RDBMS.
dehrmann · 3 years ago
Online use cases always(?) scale out to key-value stores. Offline use cases almost always scale to distributed, date-partitioned columnar stores.
paulryanrogers · 3 years ago
What's the alternative as things get too big? Sharding by date? By client? How do you prevent hotspots?
preseinger · 3 years ago
the only real answer here is to shard by customer

optimistically, you can try to shard by read use case, but that's never gonna be stable over time

if you need a true multi-tenant system you can only shard by individual entity and move all of the composition logic to the next layer up, there's no way to cheat

dikei · 3 years ago
Past a certain point, you ought to think about whether to keep using an RDBMS as a K-V store or switch to a real distributed K-V store like Cassandra, ScyllaDB, DynamoDB and the like.

As for hot spots, they have always been an issue with K-V stores, and the only real solution is good key design, though there are some tricks (sketched in code below):

* Use a uniformly distributed but deterministic key prefix. For example, instead of using the raw user_id as the key, attach a small hash before it: (hash(<user_id>), <user_id>). This can help with load distribution if your <user_id> is not uniformly distributed by itself, such as a phone or ID number.

* Add more data to your key to increase cardinality. For example, with time series data, instead of using object_id as the partition key, use (object_id, time_bucket) so the data for a busy object will get split into different partitions over time.
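
Both tricks as (hypothetical) key-building helpers:

    import hashlib

    SALT_BUCKETS = 16

    def salted_key(user_id: str) -> tuple:
        # Trick 1: a deterministic hash prefix spreads non-uniform ids
        h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % SALT_BUCKETS
        return (h, user_id)

    def timeseries_key(object_id: str, ts_epoch: int, bucket_secs: int = 3600) -> tuple:
        # Trick 2: a time bucket splits one busy object across partitions
        return (object_id, ts_epoch // bucket_secs)

    print(salted_key("555-0100"))                  # e.g. (9, '555-0100')
    print(timeseries_key("sensor-9", 1700000000))  # ('sensor-9', 472222)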

esafak · 3 years ago
Distributed databases like Yugabyte, Cockroach, TiDB, etc.
AdieuToLogic · 3 years ago
Pick a stable, guaranteed-to-exist shard key (composite or atomic properties) and use that. If composite, order the properties from most to least distinct value distribution.
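
E.g., a sketch with hypothetical fields, ordered most-to-least distinct:

    def shard_key(account_id: int, region: str) -> tuple:
        # account_id: millions of distinct values; region: a handful.
        # The high-cardinality property leads so key prefixes spread evenly.
        return (account_id, region)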
charcircuit · 3 years ago
Yes, this is where caching layers come in. Now your cache acts as the RDBMS.
darwinwhy · 3 years ago
The first link in the article [1], to a blog post about how the sense of "shard" that we now think of as the word's primary meaning came about, is super interesting.

The release of Ultima Online doesn't really seem too close to any inflection point of the word on Google Ngram, but I'm not sure exactly how close we should expect it to be to 1980 or 2000.

1. https://www.raphkoster.com/2009/01/08/database-sharding-came...

Dwedit · 3 years ago
Shards are the secret ingredient in the webscale sauce. They just work.
samsquire · 3 years ago
Thanks for this article planetscale. I've been enamoured with sharding recently but more for multithreaded performance.

I want multimaster postgres.

I started trying to write a Postgres synchronizer by sorting all the data by row and column, hashing the data of every column and row, and then computing a rolling hash over the whole thing.

In theory, two databases can synchronize by sending the final hash of their data and then doing a binary search backwards until the hashes match. This way you can work out which parts of the databases differ and need to be transmitted. If the databases are identical, very little data gets transferred.
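
A toy version of the compare-then-bisect idea (lists of per-row hashes standing in for the two databases; a real version would exchange only the hashes at each level over the network, not the rows):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def row_hashes(rows):
        # rows: list of (key, value) tuples, sorted identically on both sides
        return [h(f"{k}:{v}".encode()) for k, v in rows]

    def diff_ranges(a, b, lo=0, hi=None):
        """Binary-search for the row ranges where two replicas disagree."""
        if hi is None:
            hi = max(len(a), len(b))
        if h(b"".join(a[lo:hi])) == h(b"".join(b[lo:hi])):
            return []              # identical range: transfer nothing
        if hi - lo <= 1:
            return [(lo, hi)]      # narrowed down to one differing row
        mid = (lo + hi) // 2
        return diff_ranges(a, b, lo, mid) + diff_ranges(a, b, mid, hi)

    a = row_hashes([(1, "x"), (2, "y"), (3, "z")])
    b = row_hashes([(1, "x"), (2, "Y"), (3, "z")])
    print(diff_ranges(a, b))  # [(1, 2)]: only row 2 needs to be exchanged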

One problem I've not worked out is how to decide which database is the winning database without having to change application queries.

If you synchronize two multimaster Postgres databases that have had independent writes to different sections, how do you identify which database is the source of truth for a column/row combination?

AdieuToLogic · 3 years ago
> One problem I've not worked out is how to decide which database is the winning database without having to change application queries.

I believe this is a case of the "Two Generals' Problem"[0], which implies that there is no provably correct solution to achieve this.

> If you synchronize two multimaster Postgres databases that have had independent writes to different sections, how do you identify which database is the source of truth for a column/row combination?

You can't without a quorum[1] and even that does not guarantee success.

0 - https://en.wikipedia.org/wiki/Two_Generals%27_Problem

1 - https://en.wikipedia.org/wiki/Split-brain_(computing)

setr · 3 years ago
> You can't without a quorum[1] and even that does not guarantee success.

Isn’t split-brain largely resolved by the rules:

1. Must have a strict majority of the total group agree to commit

2. Total should be an odd number
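
In code, the majority rule is just:

    def quorum_met(acks: int, cluster_size: int) -> bool:
        """Commit only with a strict majority; an odd size avoids ties."""
        return acks > cluster_size // 2

    assert quorum_met(2, 3)      # 2 of 3: majority, safe to commit
    assert not quorum_met(2, 4)  # 2 of 4: a 2-2 partition could diverge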

preseinger · 3 years ago
"a binary search backwards" is not well defined and doesn't guarantee any upper bound on consistency

sharding data based on individual rows within tables is tricky, you won't get reliable guarantees for queries in this way

"which database is the winning database" is a function of individual rows, a query that reads data from N different row-owners needs to query N different instances, or else accept that it will work against stale data

myzie · 3 years ago
Spock may be of interest to you: https://pgedge.github.io/spock/
brightball · 3 years ago
You need Citus
yuji · 3 years ago
Citus follows a single-master architecture for replication, just like standard Postgres. Citus would be a good solution for adding additional writable nodes by sharding tables across multiple databases.

Recently they were acquired by Microsoft and have started to be integrated into the Cosmos DB offering. Anybody have good experience with this DBaaS offering?

mamcx · 3 years ago
And the best sharding? Doing multi-tenant.

Sharding at the table level is very complex and expensive. Giving each tenant its own DB is very practical, and the cases that motivate sharding (like reporting) are where you do the other fancy things (like shipping events into Kafka, etc.).

Also, most scalability issues are dominated by a few tenants that consume most of the resources, and distributing the load is easier per tenant.

canadianhacker · 3 years ago
I think we need intuitive tooling to let developers continue just thinking in multi-tenant terms, with single-tenant behind the scenes.
bsaul · 3 years ago
I've been doing that as much as possible. However, you're still left with availability issues, such as replication / failover management, etc., which really are orthogonal to tenancy.

How do you manage that ?

mamcx · 3 years ago
You need at least 1 backup and/or replica.

Availability is kinda good: you could lose one or a few tenants, but that doesn't take down all the rest (if they are isolated properly).

It's MUCH worse if a DB built the fancy way goes down entirely, or worse, you get a cascade of latency and/or crashes from the interlinked nature of the "scalable" architecture that is in fact a share-everything global singleton.

And you can get a bit fancy (I haven't done it myself; my customers tolerate a bit of downtime) and fork the tenant and upload it to another node.

stn_za · 3 years ago
The issue is when your primary tenant(s) are much larger than the others... eventually vertical scaling is difficult even for only one tenant, depending on its size.
cryptonym · 3 years ago
I have been doing that in a previous project. One DB or one Schema per tenant can also simplify backup/restore, rollback, export, and data boundaries.


elsadek · 3 years ago
I haven't heard about the DB sharding technique since before 2010. The technique was used to separate, within a given table, the most heavily used fields from the less used ones, with a different server for each shard. With the rise of in-memory databases like Redis, sharding was abandoned.