Computation is a physical process, and any model we use to build or describe that process is one we impose on it. Whether such a model should include the concept of data (and its counterpart, "functions") is really the question here. While I don't think the data/function split is essential to modeling computation, I also have a hard time diverging too far from these ideas, because they are all I have seen for decades. I believe Kay is challenging us to explore the space of other concepts that can model computation.
* It can be used schema-less,
* allows attaching metadata tags to values (which can serve as type hints[1]), and
* encodes blobs efficiently
I have not used it, but in the space of flexible formats it appears to have other interesting properties. For instance, it can encode a symbol table, which makes symbols very compact in the rest of the message, and symbol tables can be shared out of band (rough sketch below).
[1] https://amazon-ion.github.io/ion-docs/docs/spec.html#annot
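For concreteness, here is a rough sketch of what the text form looks like, going by the spec linked above; the annotation and blob syntax are from the spec, but the field names and symbol list are invented for illustration:

```
// A local symbol table: it mostly matters for the binary encoding, where
// repeated symbols become small integer IDs instead of repeated strings.
// A shared table can instead be imported by name/version (out of band).
$ion_symbol_table::{
  symbols: ["reading", "celsius", "thumbnail"]
}

// Annotations ("metadata tags") prefix a value with name::
{
  reading: celsius::22.5,              // 'celsius' annotates the decimal
  thumbnail: png::{{ iVBORw0KGgo= }}   // blobs are base64 inside {{ }}
}
```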
* https://en.wikipedia.org/wiki/A_Symbolic_Analysis_of_Relay_a...
Do you handle the case where the actual objects don't overlap but the result of an aggregate query is still affected? For instance, a `count(*) where ..` query is affected by an insert.
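A minimal way to see the case I mean, using sqlite3 just so it's self-contained (the table and predicate are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO orders (amount) VALUES (10), (20)")

# Suppose this result gets cached, keyed on the rows it touched (ids 1 and 2).
cached = conn.execute("SELECT count(*) FROM orders WHERE amount > 5").fetchone()[0]  # 2

# A new row overlaps none of the previously touched objects...
conn.execute("INSERT INTO orders (amount) VALUES (30)")

# ...but the cached aggregate is now stale.
fresh = conn.execute("SELECT count(*) FROM orders WHERE amount > 5").fetchone()[0]   # 3
```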
I've been calling it the Lots of Little Databases model vs the Globe Spanning Gorilla.
As the Spanner paper points out, even if your distributed database semantically appears to be a single giant instance, in practice performance concerns mean developers avoid distributed joins and the like, because these can shuffle very large amounts of intermediate results across the network. So the reality ends up leaking through the illusion of being on a single giant machine, and people end up writing workarounds for distributed joins, such as async materialization.
If we give up the single-machine illusion we get a lot of simplification, at the cost of features devs were unlikely to use anyhow. I see having consistent distributed commit, but without cross-shard joins, as a really interesting alternative.
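To make "consistent distributed commit without cross-shard joins" a bit more concrete, here is a rough sketch of one way it could look, using Postgres two-phase commit through psycopg2's DB-API tpc support (which needs max_prepared_transactions > 0 on each shard). The shard DSNs, the accounts table, and the routing are all invented, and a real system would also need a coordinator log and recovery of in-doubt transactions:

```python
import uuid
import psycopg2

# Hypothetical shard map: each shard is an ordinary Postgres instance.
SHARDS = {
    "us": "dbname=app_us",
    "eu": "dbname=app_eu",
}

def transfer(from_shard, to_shard, from_acct, to_acct, amount):
    """Atomically move a balance across shards without any cross-shard query."""
    gtrid = f"transfer-{uuid.uuid4()}"
    conns = {name: psycopg2.connect(SHARDS[name]) for name in {from_shard, to_shard}}
    try:
        for name, conn in conns.items():
            conn.tpc_begin(conn.xid(0, gtrid, name))
        # Each statement touches exactly one shard; joins stay local.
        conns[from_shard].cursor().execute(
            "UPDATE accounts SET balance = balance - %s WHERE id = %s",
            (amount, from_acct))
        conns[to_shard].cursor().execute(
            "UPDATE accounts SET balance = balance + %s WHERE id = %s",
            (amount, to_acct))
        for conn in conns.values():
            conn.tpc_prepare()   # phase 1: every shard promises to commit
        for conn in conns.values():
            conn.tpc_commit()    # phase 2: make it durable everywhere
    except Exception:
        for conn in conns.values():
            try:
                conn.tpc_rollback()
            except psycopg2.Error:
                pass
        raise
    finally:
        for conn in conns.values():
            conn.close()
```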
And besides scalability, I like the extra security that fine-grained partitioning gives you from the start.
I'll write a blog post along these lines if I get anything worthwhile done.
It seems like they've (hopefully only temporarily) given up real transactional support with their horizontal postgres scheme?
The other side of something like Spanner is that the quorum-based latency is often optimized by adding another cache on top, which instantly defeats the original consistency guarantees. The consistency of (spanner+my_cache) is not the same as the consistency of spanner. So if we're back to app-level consistency guarantees anyway, it turns out the "managed" solution is only a partial one.
Ideally the managed db systems would have flexible consistency, letting me configure not just which object sets need strong consistency but also which reads may be served from a cache with a bounded lag. This would let me choose trade-offs without having to implement consistent caching and other optimization tricks on top of a globally consistent/serializable database.
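Nothing like this exists as described; purely to illustrate the kind of per-object-set knob I mean, a hypothetical configuration might look something like this (all names invented):

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class ObjectSetPolicy:
    object_set: str              # e.g. a table, keyspace, or entity group
    consistency: str             # "serializable" | "bounded_staleness" | "eventual"
    max_lag: Optional[timedelta] = None  # only meaningful for bounded_staleness

POLICIES = [
    # Money movement: pay the quorum latency, get serializability.
    ObjectSetPolicy("accounts", "serializable"),
    # Product catalog: a few seconds of lag is fine; let the DB run the cache.
    ObjectSetPolicy("catalog", "bounded_staleness", timedelta(seconds=5)),
    # Analytics counters: eventual is plenty.
    ObjectSetPolicy("view_counts", "eventual"),
]
```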
You may find other interesting articles linked from here: https://en.wikipedia.org/wiki/Jamie_Zawinski, e.g. https://www.jwz.org/gruntle/nomo.html
What could be useful here is if Postgres provided a way to determine the latest frozen uuid. This could be a few ms behind the last committed uuid, but it should guarantee that no new rows will land before the frozen uuid. Then we could use a single cursor to track what we've already seen.
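To sketch the kind of reader this would enable: `latest_frozen_uuid()` is the hypothetical function I'm wishing for, not something Postgres actually provides, and the events table, the DSN handling, and time-ordered uuids (e.g. UUIDv7, so `>` matches insertion order) are all assumptions.

```python
import time
import psycopg2

def follow_events(dsn, handle, poll_interval=1.0):
    conn = psycopg2.connect(dsn)
    cursor_id = None  # single cursor: the highest frozen uuid already consumed
    while True:
        with conn.cursor() as cur:
            cur.execute("SELECT latest_frozen_uuid()")  # hypothetical watermark
            frozen = cur.fetchone()[0]
            if cursor_id is None:
                cursor_id = frozen  # or start from some historical point
            cur.execute(
                "SELECT id, payload FROM events"
                " WHERE id > %s AND id <= %s ORDER BY id",
                (cursor_id, frozen))
            for row_id, payload in cur:
                handle(row_id, payload)   # caller-supplied callback
            cursor_id = frozen  # safe: nothing new can land at or below it
        conn.commit()
        time.sleep(poll_interval)
```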