The big benefit of eg-walker is that you don't need to load any history from disk to do collaborative editing. There's no need to keep around and load a document's whole history just to merge changes and send edits to other peers. It's also much faster in most editing situations - though modern optimizations mean text-based CRDTs are crazy fast now anyway.
The downside is that eg-walker is more complex to implement. Compare this "from scratch" traditional CRDT implementation of FugueMax:
https://github.com/josephg/crdt-from-scratch/blob/master/crd...
With the same ordering algorithm implemented on top of eg-walker:
https://github.com/josephg/egwalker-from-scratch/blob/master...
Eg-walker takes about twice as much code. In this case, ~600 lines instead of 300. It's more complex, but it's not crazy. It also embeds a traditional CRDT inside the algorithm. If you want to understand eg-walker, you should start with FugueMax anyway.

Most of the extra complexity in an optimized implementation comes from two ideas:
- Use a b-tree instead of an array to store the data.
- Use internal run-length encoding. Humans usually type in runs of characters, so store runs of operations instead of individual operations. (Eg {insert "abc", pos 0} instead of [{insert "a", pos 0}, {insert "b", pos 1}, {insert "c", pos 2}].) There's a rough sketch of this just below the list.
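To make that concrete, here's a minimal sketch in Rust. The names (InsertRun etc) are mine for illustration, not diamond types' actual types:

    // A run of inserted characters, stored as one operation.
    struct InsertRun {
        pos: usize,      // document position of the first inserted char
        content: String, // the whole run the user typed
    }

    fn main() {
        // One run covering the whole typing burst...
        let run = InsertRun { pos: 0, content: "abc".to_string() };

        // ...instead of one operation per keystroke:
        let ops = vec![
            InsertRun { pos: 0, content: "a".to_string() },
            InsertRun { pos: 1, content: "b".to_string() },
            InsertRun { pos: 2, content: "c".to_string() },
        ];
        assert_eq!(run.content.len(), ops.len());
    }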
But these two ideas also affect one another. It's not enough to just use a b-tree. You need a b-tree which also stores runs. And you also need to be able to insert in the middle of a run. And so on. You need some custom collections.
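For example, inserting into the middle of a run means splitting the run first. Roughly like this (same illustrative InsertRun as above, not the real b-tree code):

    struct InsertRun {
        pos: usize,
        content: String,
    }

    impl InsertRun {
        // Split this run `offset` characters in, returning the tail.
        // A new run can then be inserted between the two halves.
        fn split_at(&mut self, offset: usize) -> InsertRun {
            InsertRun {
                pos: self.pos + offset,
                // split_off takes a byte offset; fine for this ASCII example.
                content: self.content.split_off(offset),
            }
        }
    }

    fn main() {
        let mut head = InsertRun { pos: 0, content: "abcdef".to_string() };
        let tail = head.split_at(3);
        assert_eq!(head.content, "abc");
        assert_eq!(tail.content, "def");
        assert_eq!(tail.pos, 3);
    }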
If you do run-length encoding properly, all iteration throughout your code needs to make use of the compressed runs. If any part of the code works character-by-character, it'll become a bottleneck. Oh, and did I mention that it works even better if you use columnar encoding, and break the data up into a bunch of small arrays? Yeahhhh.
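"Columnar" here means struct-of-arrays instead of array-of-structs: each field of an operation gets its own array. A rough illustration (not diamond types' real layout):

    // Array-of-structs: one record per operation.
    struct Op {
        pos: usize,
        len: usize,
        is_insert: bool,
    }
    struct OpLog {
        ops: Vec<Op>,
    }

    // Struct-of-arrays ("columnar"): one array per field. Each column
    // tends to contain long runs of similar values, so each one
    // run-length encodes and serializes much better on its own.
    struct OpLogColumnar {
        pos: Vec<usize>,
        len: Vec<usize>,
        is_insert: Vec<bool>,
    }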
So that's why diamond types - my optimized eg-walker implementation - is tens of thousands of lines of code instead of a few hundred. (Though in my defence, it also includes custom binary serialization, testing, wasm bindings, and so on.)
Rust makes all of this way easier to implement thanks to traits. I have simple traits for data that can be losslessly compressed into runs[1]. A whole bunch of code takes advantage of that by providing tooling that works with a wide variety of actual data. For example, I have a custom vec wrapper that automatically compresses items when you call push(). I have a "zip" iterator which glues together other iterators over run-length encoded data. And so on. It's great.
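A minimal sketch of the idea, in the spirit of MergableSpan (method names here are approximate - the real trait is at [1]):

    // A run-merging trait: runs that can be losslessly glued together.
    trait MergableSpan {
        fn can_append(&self, other: &Self) -> bool;
        fn append(&mut self, other: Self);
    }

    struct TextRun {
        pos: usize,
        content: String,
    }

    impl MergableSpan for TextRun {
        // Two runs merge when the second continues right where the first ends.
        fn can_append(&self, other: &Self) -> bool {
            other.pos == self.pos + self.content.len()
        }
        fn append(&mut self, other: Self) {
            self.content.push_str(&other.content);
        }
    }

    // A vec wrapper whose push() merges with the last item when it can.
    struct RleVec<T: MergableSpan>(Vec<T>);

    impl<T: MergableSpan> RleVec<T> {
        fn push(&mut self, item: T) {
            if let Some(last) = self.0.last_mut() {
                if last.can_append(&item) {
                    last.append(item);
                    return;
                }
            }
            self.0.push(item);
        }
    }

    fn main() {
        let mut v = RleVec(Vec::new());
        v.push(TextRun { pos: 0, content: "a".to_string() });
        v.push(TextRun { pos: 1, content: "b".to_string() }); // merges -> "ab"
        v.push(TextRun { pos: 5, content: "x".to_string() }); // gap, new run
        assert_eq!(v.0.len(), 2);
        assert_eq!(v.0[0].content, "ab");
    }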
Though now that I think about it, maybe all that trait foo is what makes it headache-inducing. I swear it's worth it.
[1] Eg MergableSpan: https://github.com/josephg/diamond-types/blob/00f722d6ebdc9f...