> Both paths will allow us to ensure that each replica converges on the same state, but that alone does not guarantee that this state is the “correct” state from the point of view of user intent.
In the context of rich-text editors, it's easy to converge rich-text JSON into a single state for all collaborators. What's difficult is ensuring that the converged state is renderable as rich text. For example, is there a table cell that was inserted into a column that was concurrently deleted?
While I'm optimistic it's not impossible to solve, it's surprising that most CRDT literature doesn't go into these details on correctness. I appreciate the author for putting this out.
PS: We currently use OT for collaboration in Zoho Writer (https://writer.zoho.com/). I'm looking out for practical CRDT ideas that work well with rich text.
Exactly, the "C" in CRDT is a little misleading. They are "conflict free" in as much as they are able to merge and resolve in the basic sense. That does not mean that the resulting state is valid or meets your internal schema.
A good example of this is using Yjs with ProseMirror: it's possible for two concurrent edits to resolve to a structure that doesn't meet your ProseMirror schema. An example I have used before is a <figure> element that can contain a single optional <caption> element. It's possible for two users to add that caption to the figure concurrently; Yjs will happily merge its XMLFragment structures and place two captions in the figure. However, when this is loaded into ProseMirror, it will see that the document does not match its schema and dump the second caption. Fortunately the captions are ordered, and so all parties will resolve to the same state.
This is all OK if you are doing the merging on the front end, but if you merge documents on the server, without involving ProseMirror, the structure you save, and potentially index or export as HTML, will be incorrect.
Point is, it's still essential with CRDTs to have a schema and a validation/resolution process. That, or you use completely custom CRDTs that encode schema validation into their resolution process.
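To make that validation/resolution process concrete, here's a rough sketch in Python of a deterministic post-merge repair pass for the figure/caption case; the node shape (dicts with a "children" list) is made up for illustration and isn't any real Yjs or ProseMirror API:

```python
# Sketch: deterministic post-merge schema repair for the figure/caption
# example. The node shape is hypothetical, chosen for illustration only.

def repair_figure(node):
    """Enforce 'at most one caption per figure' after a CRDT merge.

    Because the CRDT orders children consistently, every replica keeps
    the same first caption and drops the same duplicates, so all parties
    still converge on the same repaired document.
    """
    if node.get("type") != "figure":
        return node
    seen_caption = False
    kept = []
    for child in node.get("children", []):
        if child.get("type") == "caption":
            if seen_caption:
                continue  # drop the later duplicate, same on every replica
            seen_caption = True
        kept.append(child)
    return {**node, "children": kept}
```

Running the same pass on the server gives you the same structure ProseMirror would have produced on the client, so the saved/indexed version stays consistent.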
> Point is, it's still essential with CRDTs to have a schema and a validation/resolution process. That, or you use completely custom CRDTs that encode schema validation into their resolution process.
I'm surprised that MS's concurrent revisions [1] haven't taken off, because this is what they do by default: you specify a custom merge function for any versioned data type, so you can resolve conflicts using any computable strategy.

[1] https://www.microsoft.com/en-us/research/project/concurrent-...
Yeah, this is kinda why I don't understand a lot of these off-the-shelf CRDT solutions, like CRDTs for JSON.
I'm still trying to learn CRDTs, but early on in my learning process I had the thought that CRDTs don't have mere data types - as you say, they have Data Type + Conflict Resolution bundled into one. It's not enough to make a String type, because different types need to be resolved differently. Plus some kinds of resolution carry a lot more metadata, so you want to choose the best-fitting one.
I found that my goal of making SQL + CRDT meant I had to expose this combo of Type + Resolution to the end user. It seems essential to how the app works.
But maybe I don't know anything, /shrug. I'm still learning them.
I believe we shouldn't lock our rich-text state to ProseMirror's specs. The state is better modelled as a CRDT, with ensuring correctness made part of the CRDT operations.
This way we unlock other possibilities, like server-side merging, and the editing library can be swapped (e.g. Lexical).
Right, but the problem is that most programmers don't want to reason about this stuff. The danger is that things will feel like they "just work" locally (when there are no concurrent edits) and then potentially go off the rails when actual users start editing together.
We can solve this by educating developers and giving them different merge strategies to use, but I worry that most devs won't bother most of the time. So long as the merge resolution is good enough, they'll just shrug and accept it.
That said, most collaborative editing happens online. People don't really go offline while editing a Google Doc with other people. It's fine if two people put their cursor in the same location; they won't edit at the same time, because that would be weird.
And with deep JSON structures, so long as the structure itself can stay intact and parsable, users will usually work on different things to avoid weird editing conflicts. Imagine a giant 3D world (like an MMO) full of lots of objects and scenery. And a team working on building out that world. Artists will probably work on different things, in different areas or different parts of the terrain. Or if they're working on the same thing, they'll do it live & online.
The one big area of CRDT research I'd love to see more of is offline conflict resolution, which should be different from online conflict resolution. Git has the right idea: I want merge conflicts when things are merged together. CRDTs have a superset of the information that git has. We have all the information we need in Yjs / automerge / etc to notice merge conflicts and do whatever we want with them. But as far as I know, nobody has yet built something like Git on top of a CRDT that uses this information to surface merge conflicts in a sensible way.
With the CRDT I'm building, for registers I've decided to use MVRegisters: if multiple concurrent writes happen, we keep all conflicting values and can expose that conflict state through the API. But I also have a "simple API" which picks an arbitrary but consistent winner, so when developers don't care (or when they're making an MVP of their app) everything will still work fine. The (discarded) conflicting values will be in the history anyway if they want to recover them later.
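A minimal sketch of that MVRegister idea in Python; the dot/versioning machinery is heavily simplified and the names are invented, so treat this as an illustration of the two APIs rather than a real implementation:

```python
# Sketch of a multi-value register (MVRegister) with both a conflict-exposing
# API and a "simple API" that picks an arbitrary but consistent winner.

class MVRegister:
    def __init__(self):
        self.values = {}  # dot (replica_id, counter) -> value

    def write(self, replica_id, counter, value, overwritten):
        # A write replaces the versions the writer had seen; concurrent
        # writes (dots not listed in `overwritten`) survive side by side.
        for dot in overwritten:
            self.values.pop(dot, None)
        self.values[(replica_id, counter)] = value

    def conflicts(self):
        """Full API: expose every concurrently written value."""
        return [v for _, v in sorted(self.values.items())]

    def get(self):
        """Simple API: arbitrary but consistent winner (highest dot wins)."""
        if not self.values:
            return None
        return self.values[max(self.values)]
```

Because `get()` picks the winner by comparing dots, every replica agrees on the winner without coordination, and the losers are still there in `conflicts()` (and in history) for apps that care.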
> Exactly, the "C" in CRDT is a little misleading. They are "conflict free" in as much as they are able to merge and resolve in the basic sense. That does not mean that the resulting state is valid or meets your internal schema.
I disagree, the "conflict" in "conflict-free" is not misleading at all. As I see it, your issue with the "conflict" lies in a misunderstanding of what a conflict actually is in the context of CRDTs and what eliminating them entails. Concurrent updates can be free from conflicts but still end up with a semantically inconsistent document, and that is perfectly fine.
The whole point of CRDTs is to allow collaborative editing without resorting to mutex locks or forcing users to retry cancelled updates. That's why Yjs merges two captions added to a figure, and also why other CRDTs don't stop concurrent updates that make no sense semantically, such as users concurrently writing incoherent sentences. Document-wise, there is no conflict or inconsistency.
I don't think I've seen any implementations of this, but the table example you listed is solvable. The insight is to stop thinking of the table as a bundle of markup and think about it as itself a primitive CRDT type that contains text in the table cells.
The columns are a CRDT set of column IDs. The rows are a CRDT map of row IDs to CRDT maps of column IDs to cell values, where the cell values are again the same type as your document text. Column headings are a specially formatted row. Rows are ordered using a sequence type.
Removing a row is a normal CRDT map deletion. Removing a column is a normal deletion from the columns set. Rendering the table will ignore cell entries that don't correspond to a surviving column.
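A rough sketch of that render-time filtering, with plain Python dicts and sets standing in for real CRDT map/set types (only the filtering logic is the point here):

```python
# Sketch of the table model above: columns are a set of live column IDs,
# rows map row IDs to {column ID: cell value}. Plain dicts/sets stand in
# for real CRDT map/set types.

def render_table(columns, column_order, rows, row_order):
    """Render only cells whose column still exists.

    Cells whose column was concurrently deleted simply aren't rendered;
    their data stays in the row map and can be surfaced or salvaged later.
    """
    out = []
    for row_id in row_order:
        if row_id not in rows:
            continue  # row was deleted
        cells = rows[row_id]
        out.append([cells.get(col) for col in column_order if col in columns])
    return out
```

So a cell edited concurrently with its column's deletion converges to an "orphaned" entry that every replica agrees on, rather than breaking the table structure.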
So with your solution, the inserted table cell would "disappear" from what is rendered?
That's not necessarily a great, or even acceptable, user experience (it depends on the product). As a user I would certainly want a way to know there is an "orphaned", edited-after-delete cell, so I can salvage the data that was entered or understand the confusion between editors.
We also had the OT vs. CRDT discussion with regards to our editor at Kitemaker (kitemaker.co), and also ended up with OT. For us there were a number of factors, but (in addition to the case you mentioned) it came down to the fact that (at that time) most of the CRDT libs like automerge generate huge diffs that we'd need to send over the wire and also that the open source editor we use (SlateJS) had operations that just slotted very nicely into OT.
> What's difficult is to ensure that the converged state is renderable as richtext. For example, is there a table cell that was inserted where a column was deleted?
Yes. This is one of the fundamental limitations of working at a textual level, which is sort of the local optimum that *nix ended up in. JSON in particular gets suuuuper fucked up if you don't merge/rebase carefully. There's no real syntax for it to grab onto and diff doesn't understand the concept of indentation or commas, so it just turns into an ocean of line-swapping and incorrect block-swapping. Diff also does an excruciatingly poor job in the very common case when everyone is appending to the same area (let's say, the end of the file).
This is pretty much just an inherent weakness of textual matching; what you need is to work on trees of lexical token nodes, or some kind of object-structure stream like PowerShell. Not that that's perfect either: I'm sure there are combinations of operations that can leave a node in an inconsistent state, but it at least gives you a chance at correctness.
In some cases patience diff can help: it tries to generate big blocks of changed ranges, so hopefully some of the hunks are relatively syntactically well-formed, but there is still no guarantee. There are also JSON diff tools which implement such a lexical-tree diff model for JSON files, similar to the "jq" util.
I think that's also viable for other lexable languages - it would be cool to sed or find/replace on a tokenized representation of the code, have jQuery- or JSONPath-style selectors, etc. A lot of the time we kind of end up doing this with find/replace anyway.
The intuitively obvious approach here is to have some kind of schema-guided or constraint-guided CRDT. That is, you're guaranteed not only to end up with valid JSON, but also that f(that_json) == true. I imagine this is considerably easier said than done, but I'm curious if there's research (or results!) in the area.
Parts of the tree with “don’t merge children” might be ok, then you show the user a conflict dialog when this happens.
Or, inspired by your comment: run f on the tree; if it's false, search for the smallest subtree where f is false and show the conflict dialog for that. The user chooses "mine" or "theirs".
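That search for the smallest failing subtree can be sketched like this; the tree shape (dicts with a "children" list) and the example validity check are assumptions for illustration:

```python
# Sketch: localize a constraint violation to the deepest failing subtree,
# which is what you'd show in a "mine or theirs" conflict dialog.

def smallest_invalid(node, f):
    """Return the deepest subtree where f is False, or None if f(node) holds."""
    if f(node):
        return None
    for child in node.get("children", []):
        found = smallest_invalid(child, f)
        if found is not None:
            return found
    return node  # node fails but all its children pass: conflict is here

def valid(n):
    """Example constraint: a figure may contain at most one caption."""
    caps = sum(1 for c in n.get("children", []) if c.get("type") == "caption")
    if n.get("type") == "figure" and caps > 1:
        return False
    return all(valid(c) for c in n.get("children", []))
```

Note that f must be written recursively (a node is invalid if any descendant is), otherwise a violation deep in the tree wouldn't fail the root and the search would never start descending.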
For the table situation we solve this by having a generic concept of a "card" that is rendered as a table cell when it's the grandchild of a table (table > tr > td), but otherwise it's just a standalone item, much like a markdown callout.
> A general theme of successful multiplayer approaches we’ve seen is not overcomplicating things. We’ve heard a number of companies confess that their multiplayer approach feels naive — especially compared to the academic literature on the topic — and yet it works just fine in practice.
The KISS and YAGNI principles apply.
And on another note, it seems like one big industry is often overlooked in these types of discussions. I wonder what web development could learn from online game development when it comes to "multiplayer" apps - I mean it's right there in the name.
One very pragmatic approach that strikes me as reasonable is having fixed-delay changes. The game Warcraft 3 (a real-time strategy game) did something like this: changes are forced into sync by delaying them all by the same fixed amount.
Note that you get used to the delay very quickly and won't notice it much after a short while. Variable delays are _much_ more noticeable; a consistent delay gets filtered out by your brain.
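The fixed-delay trick can be sketched like this; the delay value and data structures are illustrative, not how Warcraft 3 actually implements it:

```python
# Sketch of fixed-delay input scheduling: a command issued on tick t is
# scheduled for tick t + DELAY, giving it time to reach every peer so all
# replicas execute it on the same tick.

DELAY = 3  # ticks; roughly a few hundred ms at typical RTS tick rates (assumed)

class InputBuffer:
    def __init__(self):
        self.scheduled = {}  # tick -> list of (player, command)

    def issue(self, current_tick, player, command):
        target = current_tick + DELAY
        self.scheduled.setdefault(target, []).append((player, command))
        return target  # every peer computes the same target tick

    def commands_for(self, tick):
        # Sorted so every replica applies same-tick commands in the same order.
        return sorted(self.scheduled.get(tick, []))
```

The key property: since everyone executes the same commands on the same tick in the same order, the simulations never diverge, at the cost of a constant input latency that players quickly tune out.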
There is definitely a lot we can learn from multiplayer games, but the non-game use cases often have different constraints that lead to different solutions.
With a game, it's a given that everyone is playing in real-time, right now, with as little lag as they can get away with. The synchronization happens constantly and ideally the game's internal state is completely in sync across all players. Players want the game to be kept in sync as quickly as possible so that they can react to what other players are doing. That means the amount of synchronization to perform at any one point in time is generally fairly small.
Also, the structure of the game itself often implicitly partitions the game's mutable state among the players. Players can obviously do things that affect each other and there are mutable parts of the world that can be affected by multiple players. But each player often controls their own avatar in the game and has an implicit lock on that data.
This means that there is rarely any "merging" happening. It's usually just: look at the events, determine an order to apply them, and do them one at a time. If the result is a little weird in terms of state (maybe a wall is built right as another player is crossing it and the game lets them teleport through), well, it's a game. The programmers can just declare by fiat that it's part of the game mechanics.
Outside of games, a common use case is "let me work on this document offline for the duration of a flight and then resync everything when I get to the airport". There may be hundreds of changes, and other users may also have huge batches of changes. There can be a very complex reconciliation and merge process, because the changes are many and the data is precious. You can't just say it's "part of the game" if your multi-user spreadsheet goofs someone's tax statement.
> I wonder what web development could learn from online game development when it comes to "multiplayer" apps - I mean it's right there in the name.
100%. The architecture we used for https://plane.dev takes a lot of inspiration from how game servers work, but using HTTP/WebSocket instead of UDP.
A lot of state sharing techniques that people are now using on the web (like cursor position smoothing and optimistic updates) have decades of precedent in multiplayer games.
The intersection of state replication and pixel streaming is indeed interesting. Cloud games for example are usually using game state sync techniques, and then add in the client<>gpu latency on top of that.
A true cloud native game (sadly part of Stadia's original ambitions) would hopefully have a more e2e state management approach inclusive of the pixel latency.
This is very different from "local multiplayer" pixel streaming, which may be what companies like Switchboard[1] are doing: multiple clients control the same I/O channel (e.g. you and I share a mouse on a single remote desktop)
This is an interesting idea but I do wonder if it can be applied to generic collaborative editing "as is". The approach for synchronized ticks (I think most RTS games do something similar to the Blizzard approach) does require each participant to "ack" the tick. You can see this when one participant is lagging in that the simulation freezes for all players until they acknowledge. The online gaming approach is to let other players vote to kick someone who is lagging the game but that probably wouldn't be considered suitable in something like enterprise content management.
Can some variation of the approach be taken without the strict requirements that:
- There is a known set of participants at any time
- The list of participants has a reasonably well-bounded size
- Peers can communicate with each other directly (WebRTC is available but there are sharp corners)
- It's acceptable to freeze all players based on one lagging player
- There is an unambiguous way to handle simultaneous player actions against shared state
Most likely the "fix" for the friction is that you just slow the ticks down a lot compared to what you would do in a multi-player game.
There are probably good techniques worth stealing from FPS or MMORPGs. Those also usually run on a fixed tick rate of about 50ms, but can't afford to have clients vote on it (FPS because lag is unacceptable, MMORPGs because of user count). There are various approaches in use, depending on whether you want to just degrade the experience of the lagging participant, or give preference based on action (e.g. FPS resolve conflicts in favor of the person who fired).
Honestly, I just restarted playing World of Warcraft, and even though there have been amazing tech improvements to Blizzard's engine and servers, there is so much simple stuff lacking.
I would honestly say it's likely better for game server developers to be taking ideas from web development.
When it comes to CRDTs, everyone keeps talking about text editing, which is honestly one of the hardest problems for a CRDT to solve. Text is entirely unstructured and requires extremely high precision in the merge of any conflict.
With a video game instead, players are already used to "rewind and replay" merges; they are already used to having a "miss" when it should have been a "hit", and they roll with it, because while that is friction, it's a game and precision isn't that important. To put it differently: precision is already not what game servers provide.
At the end of the day, game servers are "just" low-latency chat applications with "bots" that process the messages. I'd love to see the game server written by the maintainers of WhatsApp / Messenger / Slack / etc. That would be a truly scalable game!
P.S. If you think "chat is so simple, can't compare it to a game server". Please just think through the design for FB's / Discord's display widget that shows "currently active friends".
I think the general term should be multi-actor. I understand why people use the term multiplayer, I think: familiarity. But players imply play which implies games, so it's not a great fit outside of them.
The type of thing you're talking about with Warcraft is a deterministic simulation. As long as you have deterministic simulation, and you synchronize your inputs by buffering sufficiently for the input latency of all the remote participants, that ensures everything stays 1:1 between each local simulation.
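A toy illustration of why this stays 1:1: with a deterministic step function and identically ordered per-tick inputs, independent simulations produce identical state. The game state and step logic here are stand-ins, not anything from a real engine:

```python
# Sketch of deterministic lockstep simulation: same state + same inputs
# per tick => same next state, so replicas never diverge.

def step(state, inputs):
    # Deterministic: inputs are applied in a fixed sorted order, so
    # within-tick arrival order doesn't matter.
    state = dict(state)
    for player, dx in sorted(inputs):
        state[player] = state.get(player, 0) + dx
    return state

def run(inputs_per_tick):
    """Replay a whole match from its per-tick input lists."""
    state = {}
    for inputs in inputs_per_tick:
        state = step(state, inputs)
    return state
```

This is also why lockstep games must avoid any nondeterminism (floating-point differences, unseeded randomness, iteration-order dependence); a single divergent bit snowballs into a full desync.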
If I'm understanding correctly, it requires the mutations to be deterministic in order for the nodes to converge.
Replicache (replicache.dev - my thing) takes a similar approach, except it does not require the mutations to be deterministic, which is very useful because it enables the mutations to reach out to other services, be authenticated, etc.
We have used Automerge a bunch, but found that there is a threshold where beyond a given document size, performance gets exponentially worse, until even trivial updates take many seconds' worth of CPU. That is often how it works when the document end state is exclusively the sum of all the edits that have ever happened.
Our answer was to reimplement the Automerge API with different mechanics underneath that allows for a "snapshots + recent changes" paradigm, instead of "the doc is the sum of all changes". That way performance doesn't have to degrade over time as changes accumulate.
This is an implementation problem with automerge. I wrote a blog post last year about CRDT performance. I re-ran the benchmarks a couple months ago. Automerge has improved a lot since then, but a simple benchmark test (automerge-perf[1]) still takes 200 MB of RAM using automerge-rs. Yjs and Diamond Types can run the same benchmark in 4 MB / 2 MB of RAM respectively.
I've had a chat with some of the automerge people about it. They're working on it, and I've shared the techniques I'm using in Diamond Types (and all the code). It's just an implementation bottleneck.
Remember, "conflict free" is the opposite of "transactional" (ACID). When you choose conflict-free, you're choosing to give up the ability to reject a change due to conflict and send it back to the user for resolution. This is the crux of the issue in my opinion.
* There is no such thing as "conflict free". When users edit data disconnected from each other, they can conflict, because the user intent of the edits can be different and not possible to reconcile. Nothing can fix that, not even crdts (As a thought experiment - two users are working on a report about ferrets. While offline, one decides the report should be about lemurs, while the other decides it should be about otters. The resulting set of changes will not be mergable without conflicts in any reasonable sense.)
* What CRDTs actually give you is convergence, which is very important. But there are other ways to get convergence, and it can be done transactionally, as long as you are willing to weaken the D (durability). Replicache (my thing - replicache.dev) is one way; the way described here is another.
In general game networking is also transactional in this same sense. Operations are transactions, that can affect multiple data items and enforce arbitrary invariants. But the order they apply in drifts, and so durability is sacrificed.
Stepping back even further, CRDTs are also often transactional in the same way. Each operation in an op-based CRDT can be thought of as a transaction that applies completely or not at all. It's just that the set of available operations in a CRDT is typically fixed.
1. So by "weaken durability" you mean basically allow committed changes to be "lost" in the case of a common conflict, aka give it up entirely?
2. CRDTs are meant to converge even in the presence of network partitions (offline first), so you're saying your committed transactions can be uncommitted? Or you're saying writers have no idea if their transaction is committed because it might converge the other way after arbitrary delay?
"Durability" means "transactions that have committed will survive permanently" – wikipedia
Agreed - conflict resolution and authority handling are key to advanced game networking techniques and underpin the architecture of many big multiplayer games.
There's actually an approach that sits in between this and formal CRDTs.
I had the same insight that having a total ordered log of changes solves a lot of the issues with concurrent updates. We solved this by creating one sequence CRDT that ensures all events from clients eventually end up in the same order and then our data structures are just reductions over those events. It's a little bit like a distributed, synchronized Redux store. This let us avoid having to think in terms of CRDT types (LWW registers, Grow-only sets, sequences, etc) and just think in terms of the domain-specific business logic (e.g. "what should happen when someone marks a todo as done, after it's deleted?").
There's a little bit more to it, so if this is interesting to anyone, I can write a more detailed explanation.
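The shape of it is roughly this "distributed Redux" pattern: once a sequence CRDT gives every client the same total order of events, app state is just a pure reduction over that log. This is a toy Python sketch with hypothetical event types, not actual production code:

```python
# Sketch of a reducer over a totally ordered event log. Domain rules
# (like "done after delete is a no-op") live in the reducer, not in
# CRDT type semantics.

def todos_reducer(state, event):
    todos = dict(state)
    kind = event["type"]
    if kind == "add":
        todos[event["id"]] = {"title": event["title"], "done": False}
    elif kind == "done" and event["id"] in todos:
        # Domain rule: marking a deleted todo as done is simply ignored.
        todos[event["id"]] = {**todos[event["id"]], "done": True}
    elif kind == "delete":
        todos.pop(event["id"], None)
    return todos

def replay(events):
    """Events are assumed already totally ordered by the sequence CRDT."""
    state = {}
    for e in events:
        state = todos_reducer(state, e)
    return state
```

The nice property is that conflict resolution becomes an ordinary business-logic question ("what should a done-after-delete do?") instead of a choice among CRDT register/set/sequence semantics.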
Have you considered backing your application's state-synchronization with a centralized CQRS/ES event store (e.g. https://www.eventstore.com/)? You get the same semantics, without paying the overhead of building them on top of a CRDT abstraction; and the event store itself also "knows" (in the sense that it supplies the framework and you supply the logic) how to reduce over states in order to build snapshot states ("aggregates") for clients to download for fast initial synchronization.
We basically have that as well! We use CRDTs on the client to allow for optimistic updates and fast replication of events directly to other clients but we also do exactly what you describe on the server so that a user loading the data for the first time just gets the "snapshot" and then plays events on top of it.
So each "event" has three properties that let us resolve a sensible order across clients: session (integer), device_id (uuid), order (integer). The `session` number is set to the highest `order` that your device has seen from the server, and the `order` gets incremented on each new event.
So for example, you might have a sequence of events like this:

    session  order
       0       1
       0       2
       0       3
    --- sync ---
       3       4
       3       5
       3       6
We can then sort all events by (session, device_id, order) and we'll get any events that happen on a device to be sorted in a row even if some other device created a bunch of concurrent events at the same time.
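In code, that sort is just a composite key (a sketch; the field names match the description above):

```python
# Sketch: total-order events by (session, device_id, order), so a device's
# run of offline events stays contiguous even when another device produced
# concurrent events in the same session.

def total_order(events):
    return sorted(events, key=lambda e: (e["session"], e["device_id"], e["order"]))
```

Within a session, ties break on device_id, which is what keeps each device's batch of events together rather than interleaved.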
For the most part it's been surprisingly solid! I was very much expecting to need to completely rewrite it by now, but it's met most of our present needs.
The only real shortcoming is not being able to support line-specific commenting. While we _could_ support it, any minor update to the text would result in the comment being marked "stale", with a high false-positive rate. I've considered adopting CRDT/OT just for the text-editing portion to solve this.
I really loved the Signals and Thread episode[0] on state machine replication. It basically said "you could do all of these fancy distributed systems techniques that assume a non total ordering of events, ooooorr you could have a main server that determines a total ordering for events and make your life infinitely easier"
It breaks down as soon as you have an offline scenario. You can only assume the server is authoritative if there will not be network partitions, or if writes are not accepted on the other side of the partition (meaning on the client). As soon as the client accepts writes while offline, those writes become authoritative too and will need to be merged with whatever changes the server saw.
Or even just have a simple central presence server hand out randomized-per-epoch node-IDs; have all nodes in the system use node-IDs as the prefix of the event temporal ordering key per "tick"; and trust your nodes to self-report their node-ID correctly in all events they send. Then your data protocol can still be p2p rather than all events having to flow through the server (if that benefits you in your problem domain) while getting all the same advantages of total ordering.
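Sketched in Python (the field names and key layout are assumptions, just to show the shape of the ordering key):

```python
# Sketch: each event carries an ordering key of (tick, node_id, local_seq),
# where node_id is the per-epoch ID handed out by the presence server.
# Peers can then merge p2p event logs into one total order without routing
# every event through the server.

def ordering_key(event):
    return (event["tick"], event["node_id"], event["seq"])

def merge_logs(*logs):
    """Deterministically interleave p2p event logs into one total order."""
    merged = [e for log in logs for e in log]
    return sorted(merged, key=ordering_key)
```

Since node-IDs are re-randomized each epoch, no node gets a permanent advantage in the within-tick ordering; the trust assumption is only that nodes report their assigned ID honestly.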
And this works great for anything enterprise-y, where all the nodes are operated by one party or a trusted consortium of parties, and no nodes are intentionally trying to cheat by misreporting their assigned node-IDs. You only need central servers (or virtual central servers, a.k.a. blockchains) when you get to things like MMO games, or HFT trading bots talking to an exchange, where every client has a reason to want to pretend it's the one that "should" go earliest in the per-tick transaction ordering.
But if I have a conflict (i.e. one node changes a column's datatype, another adds data to the column), how do I know which one went first inside the same tick?
If I understand you, there will be a tick between 12.01 and 12.02; node A gets token 1234 and node B gets 5678. Events are published p2p, and the ordering is 5678_+9seconds and 1234_+12seconds; node A (1234) admits it performed the action 3 seconds after B. But this still depends on synchronized clocks, I think?
Also you still need to resolve each type of conflict (hence having CRDTs really helps) - we know there is a conflict between A and B changes, but how do we resolve them?
I thought James Long's 200-or-300-lines-of-code implementation of a real-world CRDT solution (Actual) was pretty cool. It seemed to solve a lot of the issues that otherwise require heavy-duty servers.
I wrote a summary of this issue with CRDTs here - https://writer.zohopublic.com/writer/published/grcwy5c699d67... for those interested.
All this sounds right, but it's a lot of work.
We can solve this by educating developers and giving them different merge strategies to use, but I worry that most devs won't bother most of the time. So long as the merge resolution is good enough, they'll just shrug and accept it.
That said, most collaborative editing happens online. People don't really go offline while editing a google doc with other people. Its fine if two people put their cursor in the same location - they won't edit at the same time, because that would be weird.
And with deep JSON structures, so long as the structure itself can stay intact and parsable, users will usually work on different things to avoid weird editing conflicts. Imagine a giant 3D world (like an MMO) full of lots of objects and scenery. And a team working on building out that world. Artists will probably work on different things, in different areas or different parts of the terrain. Or if they're working on the same thing, they'll do it live & online.
The one big area of CRDT research I'd love to see more of is offline conflict resolution - which should be different than online conflict resolution. Git has the right idea - I want merge conflicts when things are merged together. CRDTs have a superset of the information that git has. We have all the information we need in Yjs / automerge / etc to be able to notice merge conflicts and do whatever we want with them. But as far as I know, nobody has built something like Git on top of a CRDT yet which uses this information to create merge conflicts in a sensible way.
With the CRDT I'm building, for registers I've decided to use MVRegisters (so if multiple writes happen, we keep all conflicting values and can expose that conflict state through the API). But also have a "simple API" which picks an arbitrary, consistent winner so when developers don't care (or when they're making an MVP of their app) everything will still work fine. The (discarded) conflicting values will be in the history anyway if they want to recover it later.
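A minimal sketch of what that could look like. All names and the tie-break rule here are illustrative, not from any particular library, and real MVRegisters track causality with version vectors rather than the naive union below:

```typescript
// Illustrative multi-value register with a "simple API" on top.
// A real MVRegister uses version vectors to drop superseded entries;
// this sketch just unions concurrent entries to show the read side.

type Entry<T> = { replicaId: string; counter: number; value: T };

class MVRegister<T> {
  entries: Entry<T>[] = [];

  // A local write supersedes everything this replica has observed.
  write(replicaId: string, counter: number, value: T): void {
    this.entries = [{ replicaId, counter, value }];
  }

  // Keep all concurrent entries, de-duplicated by (replicaId, counter).
  merge(other: MVRegister<T>): void {
    const seen = new Set(this.entries.map(e => `${e.replicaId}:${e.counter}`));
    for (const e of other.entries) {
      if (!seen.has(`${e.replicaId}:${e.counter}`)) this.entries.push(e);
    }
  }

  // Conflict-aware API: expose every concurrent value to the app.
  readAll(): T[] {
    return this.entries.map(e => e.value);
  }

  // "Simple" API: an arbitrary but consistent winner (highest counter,
  // ties broken by replicaId), so all replicas agree on the same value.
  readOne(): T | undefined {
    const sorted = [...this.entries].sort(
      (x, y) => y.counter - x.counter || y.replicaId.localeCompare(x.replicaId)
    );
    return sorted[0]?.value;
  }
}
```

The key design point is that `readOne` uses only data every replica has, so the "arbitrary" winner is still the same everywhere.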
1. User 1 adds a caption.
2. User 2 adds a caption.
3. Both transmit to the server.
4. Server receives the first edit, applies it to the doc, broadcasts the change.
5. Server receives the second edit, sees it is applied to the previous version of the document, rejects the edit.
6. Client that receives the rejection re-bases their edit, attempts to apply it, and it is rejected as it does not conform with the schema.
So the outcome is whichever client hits the server first wins, which is fine. Where's the problem?
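Sketched as code, with a toy one-caption rule standing in for a real ProseMirror schema. The `Edit` type, the `string[]` document, and all names here are made up to keep the example small:

```typescript
// Toy server-authority loop for steps 1-6 above.

type Edit = { baseVersion: number; apply: (doc: string[]) => string[] };

// Toy schema: a figure may contain at most one caption.
function validSchema(doc: string[]): boolean {
  return doc.filter(n => n === "caption").length <= 1;
}

class Server {
  version = 0;
  doc: string[] = [];

  submit(edit: Edit): { ok: boolean; version: number } {
    // Step 5: edits against a stale version are rejected; the client
    // must rebase onto the current version and resubmit.
    if (edit.baseVersion !== this.version) {
      return { ok: false, version: this.version };
    }
    const next = edit.apply(this.doc);
    // Step 6: a rebased edit can still fail schema validation.
    if (!validSchema(next)) return { ok: false, version: this.version };
    this.doc = next;
    this.version++;
    return { ok: true, version: this.version };
  }
}
```

Whichever client's edit reaches `submit` first wins; the other client's rebased edit is then rejected by the schema check, so the server never stores two captions.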
I disagree, the "conflict" in "conflict-free" is not misleading at all. As I see it, your issue with the "conflict" lies in a misunderstanding of what a conflict actually is in the context of CRDTs and what eliminating them entails. Concurrent updates can be free from conflicts but still end up with a semantically inconsistent document, and that is perfectly fine.
The whole point of CRDTs is to allow collaborative editing without resorting to mutex locks or forcing users to retry cancelled updates. That's why Yjs merges two captions added to a figure, and also why other CRDTs don't stop concurrent updates that make no sense semantically, such as having users concurrently writing incoherent sentences. Document-wise, there is no conflict or inconsistency.
The columns are a CRDT set of column IDs. The rows are a CRDT map of row IDs to CRDT maps of column IDs to cell values, where the cell values are again the same type as your document text. Column headings are a specially formatted row. Rows are ordered using a sequence type.
Removing a row is a normal CRDT map deletion. Removing a column is a normal CRDT set deletion from the columns set. Rendering the table will ignore map entries that don't correspond to real columns.
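As a rough sketch, with plain Set/Map standing in for real CRDT set/map types (the merge machinery is elided; the point is the render step skipping orphaned cells):

```typescript
// Illustrative table layout: columns as a set of ids, rows as a map of
// maps. All names are invented for the example.

class Table {
  columns = new Set<string>();                   // "CRDT set" of column ids
  rows = new Map<string, Map<string, string>>(); // rowId -> (colId -> cell)

  setCell(rowId: string, colId: string, value: string): void {
    if (!this.rows.has(rowId)) this.rows.set(rowId, new Map());
    this.rows.get(rowId)!.set(colId, value);
  }

  removeColumn(colId: string): void {
    this.columns.delete(colId); // cell values stay behind as orphans
  }

  // Rendering ignores map entries that don't correspond to live columns.
  renderRow(rowId: string): string[] {
    const row = this.rows.get(rowId) ?? new Map<string, string>();
    return [...this.columns].map(c => row.get(c) ?? "");
  }
}
```

A concurrent "edit cell in column X" and "delete column X" both commute cleanly: the cell survives in the row map but simply isn't rendered.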
That's not necessarily great, or even acceptable, user experience (it depends on the product). As a user I would certainly want a way to know there is an "orphaned", "edited after delete" cell, so I could salvage the inputted data or understand the confusion between editors.
Your editor feels really nice BTW. Well done!
Yes. This is one of the fundamental limitations of working at a textual level, which is sort of the local optimum that *nix ended up in. JSON in particular gets suuuuper fucked up if you don't merge/rebase carefully. There's no real syntax for it to grab onto and diff doesn't understand the concept of indentation or commas, so it just turns into an ocean of line-swapping and incorrect block-swapping. Diff also does an excruciatingly poor job in the very common case when everyone is appending to the same area (let's say, the end of the file).
This is pretty much just an inherent weakness of textual matching; what you need is to work on trees of lexical token nodes, or some type of object structure stream like PowerShell. Not that it's perfect either; I'm sure there are combinations of options that can leave a node in an inconsistent state, but it at least gives you a chance at correctness.
In some cases patience-diff can help: it tries to generate big blocks of changed ranges, hopefully with some of the hunks relatively syntactically well-formed, but there is still no guarantee. There is also JSON-diff, which implements such a lexical-tree diff model for JSON files, similar to the "jq" util.
https://github.com/zgrossbart/jdd
I think that's also viable for other lexable languages - it would be cool to sed or find/replace on a tokenized representation of the code, have jquery- or jpath-style selectors, etc. A lot of the time we kind of end up doing this with find/replace anyway.
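As a toy illustration of why tree-level diffing avoids the line-swap noise: children are matched by key name, not by textual position. This handles objects only (no array alignment), purely as a sketch:

```typescript
// Minimal structural diff over parsed JSON objects.

type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function structuralDiff(a: Json, b: Json, path = ""): string[] {
  // Identical subtrees produce no diff entries at all.
  if (JSON.stringify(a) === JSON.stringify(b)) return [];
  const isObj = (v: Json): v is { [k: string]: Json } =>
    typeof v === "object" && v !== null && !Array.isArray(v);
  if (isObj(a) && isObj(b)) {
    // Recurse per key: a moved or reindented key never shows as a change.
    const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
    const out: string[] = [];
    for (const k of keys) {
      out.push(...structuralDiff((a as any)[k], (b as any)[k], `${path}/${k}`));
    }
    return out;
  }
  return [`changed: ${path || "/"}`];
}
```

Unlike line-based diff, reordering keys or reformatting whitespace produces an empty diff here, because the comparison never sees the text layout.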
Or, inspired by your comment: run f for the tree, if false, search for the smallest subset where f is false and show the conflict dialog for that. The user chooses “mine” or “theirs”.
Have you seen Peritext from Ink & Switch? https://www.inkandswitch.com/peritext/ It's relatively new, but is a CRDT aimed at rich text!
Agree it’s exciting though and you can implement it fairly easily yourself by following the blog post as it’s so well written and explained.
> A general theme of successful multiplayer approaches we’ve seen is not overcomplicating things. We’ve heard a number of companies confess that their multiplayer approach feels naive — especially compared to the academic literature on the topic — and yet it works just fine in practice.
The KISS and YAGNI principles apply.
And on another note, it seems like one big industry is often overlooked in these types of discussions. I wonder what web development could learn from online game development when it comes to "multiplayer" apps - I mean it's right there in the name.
One very pragmatic approach that strikes me as reasonable is having fixed delay changes. The game Warcraft 3 (a real time strategy game) did something like this. What it accomplishes is that changes are basically forced to be in sync by delaying them.
Note that you will get used to the delay very quickly, you won't notice it too much after a short while. It is _much_ more noticeable if the delays are variable, but a consistent delay gets basically filtered out after a short while.
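A rough sketch of that fixed-delay lockstep idea. The `DELAY` constant, command strings, and all names are invented for illustration:

```typescript
// Fixed-delay lockstep: a command issued at tick t executes at
// t + DELAY on every client, and the simulation only advances a tick
// once every player's input for that tick has arrived.

const DELAY = 3; // ticks of input delay

class Lockstep {
  tick = 0;
  log: string[] = [];
  private players: string[];
  private inbox = new Map<number, Map<string, string>>(); // execTick -> player -> cmd

  constructor(players: string[]) {
    this.players = players;
  }

  // Schedule a command DELAY ticks in the future.
  send(player: string, issuedAt: number, command: string): void {
    const t = issuedAt + DELAY;
    if (!this.inbox.has(t)) this.inbox.set(t, new Map());
    this.inbox.get(t)!.set(player, command);
  }

  // Advance one tick, but only if every player's input for this tick
  // is in; otherwise everyone waits (the lagging-player freeze).
  tryAdvance(): boolean {
    if (this.tick >= DELAY) {
      const inputs = this.inbox.get(this.tick);
      if (!inputs || this.players.some(p => !inputs.has(p))) return false;
      for (const p of this.players) {
        this.log.push(`${this.tick}:${p}:${inputs.get(p)}`);
      }
    }
    this.tick++;
    return true;
  }
}
```

Because every client executes the same inputs at the same tick, the simulations stay identical without ever transmitting game state, only inputs.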
With a game, it's a given that everyone is playing in real-time, right now, with as little lag as they can get away with. The synchronization happens constantly and ideally the game's internal state is completely in sync across all players. Players want the game to be kept in sync as quickly as possible so that they can react to what other players are doing. That means the amount of synchronization to perform at any one point in time is generally fairly small.
Also, the structure of the game itself often implicitly partitions the game's mutable state among the players. Players can obviously do things that affect each other and there are mutable parts of the world that can be affected by multiple players. But each player often controls their own avatar in the game and has an implicit lock on that data.
This means that there is rarely any "merging" happening. It's usually just: look at the events, determine an order to apply them, and do them one at a time. If the result is a little weird in terms of state (maybe a wall is built right when another player is crossing it and the game lets them teleport through), well, it's a game. The programmers can just declare by fiat that it's part of the game mechanics.
Outside of games, a common use case is "let me work on this document offline for the duration of a flight and then resync everything when I get to the airport". There may be hundreds of changes, and other users may also have huge batches of changes. There can be a very complex reconciliation and merge process, because the changes are many and the data is precious. You can't just say it's "part of the game" if your multi-user spreadsheet goofs someone's tax statement.
100%. The architecture we used for https://plane.dev takes a lot of inspiration from how game servers work, but using HTTP/WebSocket instead of UDP.
A lot of state sharing techniques that people are now using on the web (like cursor position smoothing and optimistic updates) have decades of precedent in multiplayer games.
We’ve also seen more apps start to use an approach to pixel streaming which more closely resembles stadia than traditional VNC. https://digest.browsertech.com/archive/browsertech-digest-wo...
A true cloud native game (sadly part of Stadia's original ambitions) would hopefully have a more e2e state management approach inclusive of the pixel latency.
This is very different from "local multiplayer" pixel streaming, which may be what companies like Switchboard[1] are doing: multiple clients control the same I/O channel (e.g. you and I share a mouse on a single remote desktop)
[1] https://www.switchboard.app/
I think websockets run over UDP when hosted on http/3.
Can some variation of the approach be taken without the strict requirements that:

- There is a known set of participants at any time
- The list of participants has a reasonable, well-bounded size
- Peers can communicate with each other directly (WebRTC is available but there are sharp corners)
- It's acceptable to freeze all players based on one lagging player
- There is an unambiguous way to handle simultaneous player actions against shared state
Most likely the "fix" for the friction is that you just slow the ticks down a lot compared to what you would do in a multi-player game.
I would honestly say it's likely better for game server developers to be taking ideas from web development.
When it comes to CRDTs, everyone keeps talking about text editing, which is honestly one of the hardest problems for a CRDT to solve. Text is entirely unstructured and requires extremely high precision in the merge of any conflict.
With a video game instead, players are already used to "rewind and replay" merges; they are already used to having a "miss" when it should have been a "hit", and they roll with it, because while that is friction, it's a game and precision isn't that important. To say it differently: precision is already not what game servers provide.
At the end of the day, game servers are "just" low-latency chat applications with "bots" that process the messages. I'd love to see the game server written by the maintainers of WhatsApp / Messenger / Slack / etc. That would be a truly scalable game!
P.S. If you think "chat is so simple, can't compare it to a game server". Please just think through the design for FB's / Discord's display widget that shows "currently active friends".
The type of thing you're talking about with Warcraft is a deterministic simulation. As long as you have deterministic simulation, and you synchronize your inputs by buffering sufficiently for the input latency of all the remote participants, that ensures everything stays 1:1 between each local simulation.
If I'm understanding correctly, it requires the mutations to be deterministic in order for the nodes to converge.
Replicache (replicache.dev - my thing) takes a similar approach, except it does not require the mutations to be deterministic, which is very useful because it enables the mutations to reach out to other services, be authenticated, etc.
Both the idea here and Replicache's approach are closely related to game networking. If you are interested in these ideas, a really excellent set of content is: https://www.gabrielgambetta.com/client-server-game-architect....
If you are interested in how Replicache's sync protocol works and achieves transactional changes with server authority, it is described here: https://doc.replicache.dev/concepts/how-it-works.
Our answer was to reimplement the Automerge API with different mechanics underneath that allows for a "snapshots + recent changes" paradigm, instead of "the doc is the sum of all changes". That way performance doesn't have to degrade over time as changes accumulate.
Project is here: https://github.com/frameable/pigeon, with some benchmarks: https://github.com/frameable/pigeon/wiki/Benchmarks in the wiki...
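For anyone curious, the "snapshot + recent changes" idea can be sketched roughly like this. This illustrates the general technique, not Pigeon's actual internals; the `Op` type, the threshold, and all names are invented:

```typescript
// Keep a materialized snapshot plus only the ops since it, and compact
// once the tail grows past a threshold, so reads never replay the full
// history.

type Op = (state: Record<string, number>) => void;

class SnapshotDoc {
  private snapshot: Record<string, number> = {};
  private tail: Op[] = [];
  private static LIMIT = 100;

  apply(op: Op): void {
    this.tail.push(op);
    // Fold old ops into the snapshot so the tail stays bounded.
    if (this.tail.length > SnapshotDoc.LIMIT) this.compact();
  }

  private compact(): void {
    for (const op of this.tail) op(this.snapshot);
    this.tail = [];
  }

  // Reading costs O(tail), not O(total history).
  read(): Record<string, number> {
    const state = { ...this.snapshot };
    for (const op of this.tail) op(state);
    return state;
  }

  pendingOps(): number {
    return this.tail.length;
  }
}
```

The trade-off versus "the doc is the sum of all changes" is that once ops are compacted into the snapshot, you need separate history storage if you still want time travel.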
I've had a chat with some of the automerge people about it. They're working on it, and I've shared the techniques I'm using in diamond types (and all the code). It's just an implementation bottleneck.
[1] https://github.com/automerge/automerge-perf/
* There is no such thing as "conflict free". When users edit data disconnected from each other, they can conflict, because the user intent of the edits can be different and not possible to reconcile. Nothing can fix that, not even CRDTs. (As a thought experiment: two users are working on a report about ferrets. While offline, one decides the report should be about lemurs, while the other decides it should be about otters. The resulting set of changes will not be mergeable without conflicts in any reasonable sense.)
* What CRDTs give you is actually convergence, which is very important. But there are other ways to get convergence, and it can be done transactionally, as long as you are willing to weaken the D (durability). Replicache (my thing - replicache.dev) is one way, the way described here is another.
In general game networking is also transactional in this same sense. Operations are transactions, that can affect multiple data items and enforce arbitrary invariants. But the order they apply in drifts, and so durability is sacrificed.
Stepping back even further, CRDTs are also often transactional in the same way. Each operation in an op-based CRDT can be thought of as a transaction that applies completely or not at all. It's just that the set of available operations in a CRDT is typically fixed.
1. So by "weaken durability" you mean basically allow committed changes to be "lost" in the case of a common conflict, aka give it up entirely?
2. CRDTs are meant to converge even in the presence of network partitions (offline first), so you're saying your committed transactions can be uncommitted? Or you're saying writers have no idea if their transaction is committed because it might converge the other way after arbitrary delay?
"Durability" means "transactions that have committed will survive permanently" – wikipedia
I had the same insight that having a total ordered log of changes solves a lot of the issues with concurrent updates. We solved this by creating one sequence CRDT that ensures all events from clients eventually end up in the same order and then our data structures are just reductions over those events. It's a little bit like a distributed, synchronized Redux store. This let us avoid having to think in terms of CRDT types (LWW registers, Grow-only sets, sequences, etc) and just think in terms of the domain-specific business logic (e.g. "what should happen when someone marks a todo as done, after it's deleted?").
There's a little bit more to it, so if this is interesting to anyone, I can write a more detailed explanation.
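In the meantime, here's a minimal sketch of the "reductions over events" idea, with a made-up todo domain. The point is that the done-after-delete rule lives in ordinary business logic in the reducer, not in a special CRDT type:

```typescript
// Data structures as pure reductions over an ordered event log,
// Redux-style. Event shapes and rules here are illustrative.

type TodoEvent =
  | { type: "add"; id: string; text: string }
  | { type: "done"; id: string }
  | { type: "delete"; id: string };

function reduce(events: TodoEvent[]): Map<string, { text: string; done: boolean }> {
  const todos = new Map<string, { text: string; done: boolean }>();
  for (const e of events) {
    switch (e.type) {
      case "add":
        todos.set(e.id, { text: e.text, done: false });
        break;
      case "done": {
        // Domain rule: marking a deleted todo as done is a no-op.
        const t = todos.get(e.id);
        if (t) t.done = true;
        break;
      }
      case "delete":
        todos.delete(e.id);
        break;
    }
  }
  return todos;
}
```

Since every client reduces the same totally-ordered log, they all compute the same state, regardless of which weird interleavings the events arrived in.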
Ties in nicely with event sourcing - the ordered sequence of events is the CRDT, and sending events between nodes are the delta states.
So each "event" has three properties that allow us to resolve to a sensible order across clients: session (integer), device_id (uuid), order (integer). The `session` number is set by the highest `order` that your device has seen from the server and the `order` gets incremented on each new event.
So, as an example, you might have a sequence of events like this:

    [session] [order]
        0        1
        0        2
        0        3
     --- sync ---
        3        4
        3        5
        3        6
We can then sort all events by (session, device_id, order) and we'll get any events that happen on a device to be sorted in a row even if some other device created a bunch of concurrent events at the same time.
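That sort can be sketched directly (field names are as described above; the payload values are illustrative):

```typescript
// Sort events by (session, device_id, order) so each device's run of
// offline events stays contiguous in the final order.

type Event = { session: number; deviceId: string; order: number; payload: string };

function totalOrder(events: Event[]): Event[] {
  return [...events].sort(
    (a, b) =>
      a.session - b.session ||
      a.deviceId.localeCompare(b.deviceId) ||
      a.order - b.order
  );
}
```

Every client applies the same comparator, so even if events arrive in different network orders, all replicas reduce over the same sequence.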
Our team ended up having a similar take away, we wrote a blog post detailing our approach and experience here: https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...
For the most part it's been surprisingly solid! I was very much expecting to need to completely rewrite it by now, but it's met most of our present needs.
The only real shortcoming is not being able to support line specific commenting. While we _could_ support it, any minor update to the text would result in the comment being marked 'stale' with a high false positive rate. I've considered adopting CRDT/OT just for the text editing portion to solve this.
[0]: https://signalsandthreads.com/state-machine-replication-and-...
And this works great for anything enterprise-y, where all the nodes are operated by one party or a trusted consortium of parties, and no nodes are intentionally trying to cheat by misreporting their assigned node-IDs. You only need central servers (or virtual central servers, a.k.a. blockchains) when you get to things like MMO games, or HFT trading bots talking to an exchange, where every client has a reason to want to pretend it's the one that "should" go earliest in the per-tick transaction ordering.
If I understand you, there will be a tick between 12.01 and 12.02, and node A gets token 1234, and node B gets 5678. Events are published p2p, and the ordering is 5678_+9seconds and 1234_+12seconds - node A (1234) admits it has done the action 3 seconds after B. But this still depends on synchronised clocks I think?
Also you still need to resolve each type of conflict (hence having CRDTs really helps) - we know there is a conflict between A and B changes, but how do we resolve them?
Maybe I don't get it :-)