Without the intention to undermine anyone's work (and I truly appreciate the work the KeyDB guys did), I would not use this in production unless I see Jepsen test results [1], either from Kyle or from someone supervised by Kyle. Unfortunately, databases are complex enough, distributed systems even more so, and I've spent many sleepless nights debugging obscure alternatives that claimed to be faster in edge cases. And the worst thing is that these projects get abandoned by the original authors after a few years, leaving me with (to quote a movie) "a big bag of odor".
I see databases like programming languages: if they have a proven track record of frequent releases and responsive authors after 5-7 years, they are ready for wider adoption.
[1] https://jepsen.io/analyses
To counter what the other active business said: we tried using KeyDB for about 6 months and every fear and concern you stated came true. Numerous outages, obscure bugs, etc. It's not that the devs aren't good; it's just a complex problem, and they went wide with a variety of enhancements. We changed our client architecture to work better with traditional Redis. Combined with recent Redis updates, it's rock solid and back to being an afterthought rather than a pain point. It's only worth the high price if it solves problems without creating worse ones. I wish those guys luck, but I won't try it again anytime soon.
* It's been around 2 years since our last use as a paying customer. YMMV.
Actively using KeyDB at work. I would normally agree with you; and for a long time after I initially heard about the project, we hesitated to use it. But the complexity and headaches of managing a redis-cluster, uh, cluster (on a single machine — which is the idiomatic way to vertically scale Redis on a single multicore machine...) eventually pushed us over the edge into a proof-of-concept KeyDB deployment.
Conveniently, we don't have to worry about the consistency properties of KeyDB, because we're not using KeyDB distributed, and never plan to do so. We were only using redis-cluster (again, on a single machine) to take advantage of all cores on a multicore machine, and to avoid long-running commands head-of-line blocking all other requests. KeyDB replaces that complex setup with one that's just a single process — and one that doesn't require clients to understand sharding, or create any barriers to using multi-key commands.
When you think about it, a single process on a single machine is a pretty "reliable backplane" as far as CAP theory goes. Its MVCC may very well break under Jepsen — hasn't been tested — but it'd be pretty hard to trigger that break, given the atomicity of the process. Whereas even if redis-cluster has a perfect Jepsen score, in practice the fact that it operates as many nodes communicating over sockets — and the fact that Redis is canonically a memory store, rather than canonically durable — means that redis-cluster can get into data-losing situations in practice for all sorts of silly reasons, like one node on the box getting OOMed, restarting, and finding that it now doesn't have enough memory to reload its AOF file.
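To make the sharding point above concrete: a cluster-aware client has to map every key to one of 16384 hash slots, keep a slot-to-node map fresh, and chase MOVED/ASK redirects. Below is a minimal sketch in Rust of just the slot calculation Redis Cluster documents (CRC16, XMODEM variant, of the key or of its {hash tag}, modulo 16384); the function names are mine. With a single KeyDB or Redis process, none of this exists on the client side.

```rust
/// CRC16 (XMODEM variant: polynomial 0x1021, initial value 0x0000), the
/// checksum Redis Cluster uses for its key-to-slot mapping.
fn crc16_xmodem(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

/// Slot = CRC16 of the key, or of the non-empty "hash tag" between the first
/// '{' and the next '}', taken modulo 16384.
fn hash_slot(key: &str) -> u16 {
    let bytes = key.as_bytes();
    let effective = match bytes.iter().position(|&b| b == b'{') {
        Some(open) => match bytes[open + 1..].iter().position(|&b| b == b'}') {
            Some(len) if len > 0 => &bytes[open + 1..open + 1 + len],
            _ => bytes,
        },
        None => bytes,
    };
    crc16_xmodem(effective) % 16384
}

fn main() {
    // Keys sharing a {hash tag} land in the same slot, which is the only way
    // multi-key commands are guaranteed to work across a cluster.
    for key in ["user:1000", "user:1001", "{user:1000}:followers"] {
        println!("{key} -> slot {}", hash_slot(key));
    }
}
```

And that is only the easy part; the client also has to refresh its view of which node owns which slot range and retry redirected commands, which is exactly the machinery a single-process setup never needs.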
How is scalability an issue with 142 users? I am genuinely curious.
Scaling starts being a real issue with 10,000+ users. It's pretty straightforward to write a server in Rust that handles around 5,000 users on a single machine, assuming stateless requests.
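For a sense of what that looks like, here is a minimal sketch of such a stateless server using only the Rust standard library (the port and the echo "protocol" are made up). A thread per connection is enough for a few thousand mostly idle clients; at 10,000+ you would reach for a thread pool or an async runtime instead.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Handle one client: read newline-delimited requests and answer each one.
// Stateless: nothing survives between requests, so threads share nothing.
fn handle(stream: TcpStream) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut stream = stream;
    let mut line = String::new();
    while reader.read_line(&mut line)? > 0 {
        let reply = format!("echo: {}\n", line.trim_end());
        stream.write_all(reply.as_bytes())?;
        line.clear();
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:7878")?; // port is arbitrary
    for stream in listener.incoming() {
        let stream = stream?;
        // One OS thread per connection: simple, and fine for low thousands
        // of mostly idle connections on a single machine.
        thread::spawn(move || {
            let _ = handle(stream);
        });
    }
    Ok(())
}
```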
It is a Snapchat project though, so it is likely at least somewhat decent compared to a random unfunded project. Maybe Snap could pay Jepsen to review it?
I think I'll stay far away from this thing anyway. Numerous show-stopper bug reports are open, and there hasn't been a substantial commit on the main branch in at least a few weeks, possibly months. I'll be surprised if Snap is actually paying anybody to work on this.
https://github.com/Snapchat/KeyDB/issues/465
fwiw, the company I work for has used KeyDB (multiple clusters) in production for years, under non-trivial load (e.g. millions of requests per second). It served as a replacement for actual Redis clusters, and there were major improvements from the switch. I can't remember the actual gains, as it was so long ago. I do remember it was a genuine drop-in replacement, as simple as swapping the redis binary and restarting the service. So if you can afford to, maybe try replacing a Redis instance or two and see how it goes.
This should be the default mode of thinking for anything that stores your data. Databases need to be bulletproof, and that armor, for me, is a set of successful Jepsen tests.
If you only used the versions of databases tested by Jepsen, you would have problems worse than data loss, such as security vulnerabilities, because some tests are years old.
Then, has someone independently verified the Jepsen testing framework? https://github.com/jepsen-io/jepsen/
This doesn’t seem to be what was suggested? Using an up-to-date database that has had its distributed mechanisms tested - even if the test was a few versions back - is a lot better than something that uses completely untested bespoke algos.
As for verifying Jepsen, I’m not entirely sure what you mean? It’s a non-deterministic test suite and reports the infractions it finds; the infractions found are obviously correct to anyone in the industry that works on this stuff.
Passing a Jepsen test doesn’t prove your system is safe, and nobody involved with Jepsen has claimed that anywhere I’ve seen.
> Then, has someone independently verified the Jepsen testing framework?
I don't think it matters. Jepsen finds problems. Lots of them. It's not intended to find all the problems. But it puts the databases it tests through a real beating by exercising cases that can happen, but are perhaps unlikely (or unlikely until you've been running the thing in production for quite a while). Having an independent review does nothing, practically, to make the results of the tests better.
In fact, almost nothing gets a perfectly clean Jepsen report. Moreover, many of the problems that are found get fixed before the report goes out. The whole point is that you can see how bad the problems are and judge for yourself whether the people maintaining the project are playing fast and loose or thinking rigorously about their software. There simply isn't a "yeah this project is good to go" rubber stamp. Jepsen isn't Consumer Reports.
I certainly understand this sentiment. I am building a general-purpose data manager that has many relational database features. It can do a number of things better and much faster than conventional databases. It is currently in beta and available for free download.
But I would be shocked (and worried) if someone tried to use it as their primary database in production. It just doesn't have enough testing yet and is still missing some features.
Instead, I am promoting it as a tool for tasks like data analysis and data cleaning. That way it gets a good workout without causing major problems if there is a bug.
https://didgets.com/
Good point. I remember Redis was deliberately designed to be single-threaded because it keeps the design simpler.
Now, with Rust, we can actually manage the complexity of multiple threads (if that's even still needed when using an async/evented/event-loop/io_uring-based architecture).
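As a small illustration of what that buys you (a generic sketch, not tied to any database): fan work out to threads over channels with owned data, and the compiler simply refuses to compile any accidental unsynchronized sharing of mutable state.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Fan a batch of work out to worker threads; each worker owns its input
    // outright, so there is no shared mutable state and nothing to lock.
    let handles: Vec<_> = (0..4u64)
        .map(|worker_id| {
            let tx = tx.clone();
            thread::spawn(move || {
                // Stand-in for the expensive per-shard part of a query.
                let partial: u64 = (0..1_000_000u64).filter(|n| n % 4 == worker_id).sum();
                tx.send(partial).expect("receiver is alive");
            })
        })
        .collect();
    drop(tx); // close the channel once the workers hold the only senders

    // Each partial result arrives exactly once; trying to mutate a shared
    // accumulator from the workers instead would be a compile error, not a race.
    let total: u64 = rx.iter().sum();
    for h in handles {
        h.join().expect("worker did not panic");
    }
    println!("total = {total}");
}
```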
Whenever I see someone arguing their code is better because it's multithreaded, I cringe.
Most developers cannot do multithreading correctly, and unless you're particularly good at it, it's just going to introduce not only lots of bugs but also performance problems.
The only folks in that space that seem to do it well are ScyllaDB.
I have been writing multithreaded code for decades and it is hard. My current database engine can break a single query into tasks to be run in parallel for much faster performance; but finding and fixing bugs is a real challenge.
All it takes is one unprotected (i.e. unlocked) critical section to cause a bug. A test suite can run hundreds of times correctly without detecting the problem; the error is only exposed when a context switch happens at exactly the wrong microsecond.
I am a true believer in multithreading as my own code can see tremendous performance gains using it on the latest multi-core CPUs; but tread very carefully when programming in this manner.
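A minimal sketch of that failure mode (generic, not from the parent's engine): safe Rust will not compile two threads mutating plain shared memory, so the closest compilable version uses an atomic whose load and store are each fine in isolation but whose combined read-modify-write is the unprotected critical section.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

static COUNTER: AtomicU64 = AtomicU64::new(0);

// The load and the store are each atomic, but the read-modify-write as a
// whole is not protected: this is the unguarded critical section.
fn unprotected_increment() {
    let current = COUNTER.load(Ordering::SeqCst);
    // If another thread runs between the load and the store,
    // one increment is silently lost.
    COUNTER.store(current + 1, Ordering::SeqCst);
}

fn main() {
    let handles: Vec<_> = (0..8)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..100_000 {
                    unprotected_increment();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Expected 800000. Under this much contention the loss shows up quickly,
    // but with the handful of operations a typical unit test performs, it can
    // pass hundreds of runs before an unlucky context switch exposes it.
    println!("counter = {}", COUNTER.load(Ordering::SeqCst));
}
```

The fix is a single `fetch_add` (or a mutex around the whole section); the point is that the broken version compiles, runs, and usually passes.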
That's not true. Scylla does multithreading: it is a single process with a single address space. It does pin its threads to individual hyperthreads, but there are additional workers running in the background as well. And sorry, but that is multithreading; there are several cores.
But how is this kind of multithreading (one thread per core) better than proper multithreading (many threads per core)?
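To the question above: the usual argument for one pinned thread per core is shared-nothing data. Each core owns its shard of the data outright, so the hot path needs no locks and no cache lines bounce between cores; messages cross core boundaries instead of data. A rough sketch of the shape in Rust (my illustration, not Scylla's code; it only sizes the pool to the core count, since real pinning needs platform APIs or an external crate):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

enum Request {
    Set(String, String),
    Get(String, mpsc::Sender<Option<String>>),
}

fn main() {
    // One worker per available core; each exclusively owns its own shard.
    let shards = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    let senders: Vec<mpsc::Sender<Request>> = (0..shards)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<Request>();
            thread::spawn(move || {
                let mut store: HashMap<String, String> = HashMap::new();
                for req in rx {
                    match req {
                        Request::Set(k, v) => {
                            store.insert(k, v); // no lock: this thread is the sole owner
                        }
                        Request::Get(k, reply) => {
                            let _ = reply.send(store.get(&k).cloned());
                        }
                    }
                }
            });
            tx
        })
        .collect();

    // Route each key to the shard that owns it (a toy hash, modulo shard count).
    let shard_for = |key: &str| {
        key.bytes()
            .fold(0usize, |h, b| h.wrapping_mul(31).wrapping_add(b as usize))
            % shards
    };

    senders[shard_for("answer")]
        .send(Request::Set("answer".into(), "42".into()))
        .unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    senders[shard_for("answer")]
        .send(Request::Get("answer".into(), reply_tx))
        .unwrap();
    println!("answer = {:?}", reply_rx.recv().unwrap());
}
```

Whether this beats many threads sharing one data structure depends entirely on how well the workload partitions; independent keys in a cache partition about as well as anything can.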
A) key-value is MUCH simpler to implement than an RDBMS, code-wise. It's arguably more boring and less theoretical than indices on PG or foreign key constraints.
B) there’s a lot of tricky stuff with indices on PG and you generally need a DB admin from day 1
C) your comment is probably more appropriate for either layered databases or newfangled stuff like time-series or graph DBs, etc.
> B) there’s a lot of tricky stuff with indices on PG and you generally need a DB admin from day 1
Sorry, this is absolute nonsense. Any software engineer worth their salary should be comfortable working with RDBMS index concepts and interrogating their relational model to determine best practice and direction for table indexing.
This comment made me think Dragonfly (https://www.dragonflydb.io/) is a much better choice:
"We use keydb at work, and I absolutely do NOT recommend it due to its extreme instability, in fact we're currently in the process of switching to dragonfly precisely due to keydb's instability."
https://news.ycombinator.com/item?id=35990897
We evaluated DragonflyDB for Memcache. It was repeatably orders of magnitude slower under default configurations than original Memcache, using their own benchmark setup.
Either they didn't even test their own product, lied entirely about the performance, or got the marketing department to write the copy without any input from the development department.
Do you think I also photoshopped this document? https://github.com/dragonflydb/dragonfly/blob/master/docs/me...