- Label each text character with a globally unique ID (e.g., a UUID), so that we can refer to it in a consistent way across time - instead of using an array index that changes constantly.
- Clients send the server “insert after” operations that reference an existing ID. The server looks up the target ID and inserts the new characters immediately after it.
- Deletion hides a character for display purposes, but it is still kept for "insert after" position purposes.
This might have potential outside text editing. Game world synchronization, maybe.
Is this really that novel? I mean using a central process for serializing a distributed system is like a no brainer -- didn't we start off from here originally? -- until you have to worry about network partitions, and CAP and all that jazz. You also now have a single point of failure. Also I skimmed the thing but was performance discussed?
> Is this really that novel? I mean using a central process for serializing a distributed system is like a no brainer -- didn't we start off from here originally?
Yes. This article reads like the adage "a month in a lab saves you an hour in a library".
then the following two documents are both valid final states:
abc
acb
That's fine as long as you have an authoritative server that observes all events in a single order and a way to unwind misordered local state, but it means that it's not a CRDT.
I think you could batch these things though. You just need a slightly more advanced protocol. If B is the first id, and L is the last, create range(B, L) as R and insert R after L (assuming Ctrl-v a second time).
It wasn’t really that we wanted to have a CRDT per se, but as it was implemented on top of an op-based append-only log CRDT, it turned out to hold those properties, which makes it a CRDT. We wanted to have the edits to be able to arrive in any order or after delays due to network partitions (this was for a p2p network).
Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.
The motivating examples (update validation, partial loading, higher-level operations) are interesting, but I don't see a strong case that the reason Yjs etc. lack these features is the underlying CRDT implementation, as opposed to these features being intrinsically difficult to build.
> Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.
Totally agree. I guess an array of "atomic" objects, where the properties of the objects can't be changed can be done just by replacing the string with your own type. Changes inside of the object is probably trickier, but maybe it's just about efficiently storing/traversing the tree?
I've also always thoguth it should be possible to create something where the consumer of the helper library (per OP terminology) can hook in their own lightweight "semantic model" logic, to prevent/manage invalid states. A todo item can't both have isDone: true and state: inProgress at the same time. Similar to rich text formatting semantics mentioned in the linked article.
CRDTs essentially work by deterministically picking one side when a conflict arises. The issue is that in general this does not guarantee the lack of data loss nor data being valid (you can resolve the conflict between two pieces of valid data and get invalid data as a result).
Imagine if every git merge conflict you got was resolved automatically by picking one side. Most of the time it would do the wrong thing, sometimes even leading to code that fails to compile. Imagine then you were not there ready to fix the issue, it would lead to even more chaotic results!
This is why CRDTs are not more widespread, because they only fix the problem you think you have, not the problem you actually have, which is to fix conflicts in a way that preserves data, its validity and meaning.
And arguably they make this issue even worse because they restrict the ways you can solve these conflicts to only those that can be replicated deterministically.
> This is why CRDTs are not more widespread, because they only fix the problem you think you have, not the problem you actually have, which is to fix conflicts in a way that preserves data, its validity and meaning.
I’ve been saying this for years, but there’s no reason you couldn’t make a crdt which emitted conflict ranges like git does. CRDTs have strictly more information than git when merging branches. It should be pretty easy to make a crdt which has a “merge and emit conflicts” mode for merging branches. It’s just nobody has implemented it yet.
(At this point I’ve been saying this for about 5 years. Maybe I need to finally code this up if only to demonstrate it)
Ok, so the main point that makes it different from CRDTs seems to be: if you have a central server, let the server do the synchronization (fixing an order among concurrent events), and not the data structure itself via an a-priori order.
Because all communication is between client and server, and never between clients, when the client connects to the server, the server can make sure that it first processes all of the client's local operations before sending it new remote updates.
As others in the comments argue, this is technically a CRDT (though a fully general one); also, undoing/replaying ops is itself non-trivial to implement. However, I hope this is still simpler than using a traditional CRDT/OT for each data type.
> Is the take-home message of the post that the full complexity of CRDTs/OT is necessary only in the absence of a central server?
That's the whole point of CRDTs: multiple replicas of the same data structure are managed throughout many nodes, each replica is updated independently, and they all eventually converge.
Some OTs do, some don't. OTs with the TP2 property do not require a central authority to order edits I believe.
In my experience if you are persisting your edits or document state, you have something that creates an ordering anyways. That thing is commonly an OLTP database. OLTPs are optimized for this kind of write-heavy workload and there's a lot of existing work on how to optimize them further.
I'm not an expert on this, but the main difference with a CRDT like Automerge seems to be the server reconciliation. See for example this article [1]. Automerge handles concurrent insertions by using a sequence number and relying on an agreed ordering of agent ids when insertions are concurrent, while this scheme relies on the server to handle them in the order they come in.
The article mentions this:
> This contrasts with text-editing CRDTs, in which IDs are ordered for you by a fancy algorithm. That ordering algorithm is what differs between the numerous text-editing CRDTs, and it’s the complicated part of any CRDT paper; we get to avoid it entirely.
I can buy the idea that many apps have a central server anyway, so you can avoid the "fancy algorithm", though the server reconciliation requires undoing and replaying of local edits, so it's not 100% clear to me if that's much simpler.
I believe that Automerge internally stores all operations in an eventually consistent total order, which you can use as a substitute for the server in server reconciliation (cf. https://mattweidner.com/2025/05/21/text-without-crdts.html#d...). However, Automerge does not actually do that - instead it processes text operations using a traditional CRDT, RGA, probably because (as you point out) undoing and replaying ops is not trivial to implement.
It's very appealing as a kind of irreducible complexity. It feels close to what is actually happening and is simple hahah if like you say not optimized.
Use of server reconciliation makes me think client-side reconciliation would be tricky… how do you preserve smooth editor UX while applying server updates as they arrive?
For example, if your client-sent request to insert a character fails, do you just retry the request? What if an update arrived in the intervening time? (Edit: they acknowledge this case in the “Client-Side” section, the proposal is to rewind and replay, and a simpler proposal to block until the pending queue is flushed)
From a frontend vantage I feel like there may be a long tail of underspecified UI/UX edge cases, such that CRDT would be simpler overall. And how does the editor feel to use while riding the NYC subway where coverage is spotty?
Both ProseMirror and the newer version of CodeMirror have a pretty elegant solution to this: each modification to the document is modeled as a step that keeps track of indices instead of node/text identities, and uses a data structure called a "position map" that the buffered steps can be mapped through to create steps with updated positions, which are then applied to your document.
In practice, it works quite well. Here's more info:
- Label each text character with a globally unique ID (e.g., a UUID), so that we can refer to it in a consistent way across time - instead of using an array index that changes constantly.
- Clients send the server “insert after” operations that reference an existing ID. The server looks up the target ID and inserts the new characters immediately after it.
- Deletion hides a character for display purposes, but it is still kept for "insert after" position purposes.
This might have potential outside text editing. Game world synchronization, maybe.
Yes. This article reads like the adage "a month in a lab saves you an hour in a library".
:)
ctrl+x
ctrl+v
Good luck
I think you could batch these things though. You just need a slightly more advanced protocol. If B is the first id, and L is the last, create range(B, L) as R and insert R after L (assuming Ctrl-v a second time).
I implemented this in a decentralized context and as a CRDT though, so that the properties hold (commutative, idempotent and associative).
The motivating examples (update validation, partial loading, higher-level operations) are interesting, but I don't see a strong case that the reason Yjs etc. lack these features is the underlying CRDT implementation, as opposed to these features being intrinsically difficult to build.
Totally agree. I guess an array of "atomic" objects, where the properties of the objects can't be changed can be done just by replacing the string with your own type. Changes inside of the object is probably trickier, but maybe it's just about efficiently storing/traversing the tree?
I've also always thoguth it should be possible to create something where the consumer of the helper library (per OP terminology) can hook in their own lightweight "semantic model" logic, to prevent/manage invalid states. A todo item can't both have isDone: true and state: inProgress at the same time. Similar to rich text formatting semantics mentioned in the linked article.
Imagine if every git merge conflict you got was resolved automatically by picking one side. Most of the time it would do the wrong thing, sometimes even leading to code that fails to compile. Imagine then you were not there ready to fix the issue, it would lead to even more chaotic results!
This is why CRDTs are not more widespread, because they only fix the problem you think you have, not the problem you actually have, which is to fix conflicts in a way that preserves data, its validity and meaning.
And arguably they make this issue even worse because they restrict the ways you can solve these conflicts to only those that can be replicated deterministically.
I’ve been saying this for years, but there’s no reason you couldn’t make a crdt which emitted conflict ranges like git does. CRDTs have strictly more information than git when merging branches. It should be pretty easy to make a crdt which has a “merge and emit conflicts” mode for merging branches. It’s just nobody has implemented it yet.
(At this point I’ve been saying this for about 5 years. Maybe I need to finally code this up if only to demonstrate it)
Because all communication is between client and server, and never between clients, when the client connects to the server, the server can make sure that it first processes all of the client's local operations before sending it new remote updates.
As others in the comments argue, this is technically a CRDT (though a fully general one); also, undoing/replaying ops is itself non-trivial to implement. However, I hope this is still simpler than using a traditional CRDT/OT for each data type.
That's the whole point of CRDTs: multiple replicas of the same data structure are managed throughout many nodes, each replica is updated independently, and they all eventually converge.
In my experience if you are persisting your edits or document state, you have something that creates an ordering anyways. That thing is commonly an OLTP database. OLTPs are optimized for this kind of write-heavy workload and there's a lot of existing work on how to optimize them further.
But now even S3 has PUT-IF, so you could use that to create an ordering. https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...
The article mentions this:
> This contrasts with text-editing CRDTs, in which IDs are ordered for you by a fancy algorithm. That ordering algorithm is what differs between the numerous text-editing CRDTs, and it’s the complicated part of any CRDT paper; we get to avoid it entirely.
I can buy the idea that many apps have a central server anyway, so you can avoid the "fancy algorithm", though the server reconciliation requires undoing and replaying of local edits, so it's not 100% clear to me if that's much simpler.
[1] https://josephg.com/blog/crdts-go-brrr/
For example, if your client-sent request to insert a character fails, do you just retry the request? What if an update arrived in the intervening time? (Edit: they acknowledge this case in the “Client-Side” section, the proposal is to rewind and replay, and a simpler proposal to block until the pending queue is flushed)
From a frontend vantage I feel like there may be a long tail of underspecified UI/UX edge cases, such that CRDT would be simpler overall. And how does the editor feel to use while riding the NYC subway where coverage is spotty?
In practice, it works quite well. Here's more info:
https://marijnhaverbeke.nl/blog/collaborative-editing.htmlhttps://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...