Computation is a physical process, and any model we use to build or describe that process is one we impose on it. Whether such a model should include the concept of data (and its counterpart, "functions") is really the question here. While I don't think the data/function split is essential to modeling computation, I also have a hard time diverging too far from these ideas, because they are all I have seen for decades. I believe Kay is challenging us to explore the space of other concepts that can model computation.
* It can be used schema-less,
* allows attaching metadata tags to values (which can serve as type hints[1]), and
* encodes blobs efficiently
I have not used it, but in the space of flexible formats it appears to have other interesting properties. For instance, it can encode a symbol table, which makes symbols very compact in the rest of the message, and symbol tables can be shared out of band (rough sketch below).
[1] https://amazon-ion.github.io/ion-docs/docs/spec.html#annot
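For concreteness, here is a rough sketch of what the text form looks like, going by the spec linked above; the annotation and blob syntax are from the spec, but the field names and symbol list are invented for illustration:

```
// A local symbol table: it mostly matters for the binary encoding, where
// repeated symbols become small integer IDs instead of repeated strings.
// A shared table can instead be imported by name/version (out of band).
$ion_symbol_table::{
  symbols: ["reading", "celsius", "thumbnail"]
}

// Annotations ("metadata tags") prefix a value with name::
{
  reading: celsius::22.5,              // 'celsius' annotates the decimal
  thumbnail: png::{{ iVBORw0KGgo= }}   // blobs are base64 inside {{ }}
}
```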
* https://en.wikipedia.org/wiki/A_Symbolic_Analysis_of_Relay_a...
Do you handle the case where the actual objects don't overlap but the result of an aggregate query is still affected? For instance, a `count(*) where ..` query is affected by an insert.
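A minimal way to see the case I mean, using sqlite3 just so it's self-contained (the table and predicate are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO orders (amount) VALUES (10), (20)")

# Suppose this result gets cached, keyed on the rows it touched (ids 1 and 2).
cached = conn.execute("SELECT count(*) FROM orders WHERE amount > 5").fetchone()[0]  # 2

# A new row overlaps none of the previously touched objects...
conn.execute("INSERT INTO orders (amount) VALUES (30)")

# ...but the cached aggregate is now stale.
fresh = conn.execute("SELECT count(*) FROM orders WHERE amount > 5").fetchone()[0]   # 3
```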
I've been calling it the Lots of Little Databases model vs the Globe Spanning Gorilla.
As the Spanner paper points out, even if your distributed database semantically appears to be a single giant instance, in practice performance concerns mean developers avoid distributed joins and the like, because these can shuffle very large amounts of intermediate results across the network. So the reality ends up leaking through the illusion of being on a single giant machine, and people end up writing workarounds for distributed joins, such as async materialization.
If we give up the single-machine illusion we get a lot of simplification, at the cost of features devs were unlikely to use anyhow. I see having consistent distributed commit, but without cross-shard joins, as a really interesting alternative.
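To make "consistent distributed commit without cross-shard joins" a bit more concrete, here is a rough sketch of one way it could look, using Postgres two-phase commit through psycopg2's DB-API tpc support (which needs max_prepared_transactions > 0 on each shard). The shard DSNs, the accounts table, and the routing are all invented, and a real system would also need a coordinator log and recovery of in-doubt transactions:

```python
import uuid
import psycopg2

# Hypothetical shard map: each shard is an ordinary Postgres instance.
SHARDS = {
    "us": "dbname=app_us",
    "eu": "dbname=app_eu",
}

def transfer(from_shard, to_shard, from_acct, to_acct, amount):
    """Atomically move a balance across shards without any cross-shard query."""
    gtrid = f"transfer-{uuid.uuid4()}"
    conns = {name: psycopg2.connect(SHARDS[name]) for name in {from_shard, to_shard}}
    try:
        for name, conn in conns.items():
            conn.tpc_begin(conn.xid(0, gtrid, name))
        # Each statement touches exactly one shard; joins stay local.
        conns[from_shard].cursor().execute(
            "UPDATE accounts SET balance = balance - %s WHERE id = %s",
            (amount, from_acct))
        conns[to_shard].cursor().execute(
            "UPDATE accounts SET balance = balance + %s WHERE id = %s",
            (amount, to_acct))
        for conn in conns.values():
            conn.tpc_prepare()   # phase 1: every shard promises to commit
        for conn in conns.values():
            conn.tpc_commit()    # phase 2: make it durable everywhere
    except Exception:
        for conn in conns.values():
            try:
                conn.tpc_rollback()
            except psycopg2.Error:
                pass
        raise
    finally:
        for conn in conns.values():
            conn.close()
```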
And besides scalability, I like the extra security that fine-grained partitioning gives you from the start.
I'll write a blog post along these lines if I get anything worthwhile done.
It seems like they've (hopefully only temporarily) given up real transactional support with their horizontal postgres scheme?
The other side of something like Spanner is that the quorum-based latency is often optimized by adding another cache on top, which instantly defeats the original consistency guarantees. The consistency of (spanner+my_cache) is not the same as the consistency of spanner. So if we're back to app-level consistency guarantees anyway, it turns out the "managed" solution is only a partial one.
Ideally the managed db systems would have flexible consistency, letting me configure not just which object sets need strong consistency but also which reads may be served from a cache with a bounded lag. This would let me choose trade-offs without having to implement consistent caching and other optimization tricks on top of a globally consistent/serializable database.
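Nothing like this exists as described; purely to illustrate the kind of per-object-set knob I mean, a hypothetical configuration might look something like this (all names invented):

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class ObjectSetPolicy:
    object_set: str              # e.g. a table, keyspace, or entity group
    consistency: str             # "serializable" | "bounded_staleness" | "eventual"
    max_lag: Optional[timedelta] = None  # only meaningful for bounded_staleness

POLICIES = [
    # Money movement: pay the quorum latency, get serializability.
    ObjectSetPolicy("accounts", "serializable"),
    # Product catalog: a few seconds of lag is fine; let the DB run the cache.
    ObjectSetPolicy("catalog", "bounded_staleness", timedelta(seconds=5)),
    # Analytics counters: eventual is plenty.
    ObjectSetPolicy("view_counts", "eventual"),
]
```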
You may find other interesting articles linked from here: https://en.wikipedia.org/wiki/Jamie_Zawinski, e.g. https://www.jwz.org/gruntle/nomo.html
What could be useful here is if Postgres provided a way to determine the latest frozen uuid. This could be a few ms behind the last committed uuid, but it should guarantee that no new rows will land before the frozen uuid. Then we could use a single cursor to track what we've already seen.
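To sketch the kind of reader this would enable: `latest_frozen_uuid()` is the hypothetical function I'm wishing for, not something Postgres actually provides, and the events table, the DSN handling, and time-ordered uuids (e.g. UUIDv7, so `>` matches insertion order) are all assumptions.

```python
import time
import psycopg2

def follow_events(dsn, handle, poll_interval=1.0):
    conn = psycopg2.connect(dsn)
    cursor_id = None  # single cursor: the highest frozen uuid already consumed
    while True:
        with conn.cursor() as cur:
            cur.execute("SELECT latest_frozen_uuid()")  # hypothetical watermark
            frozen = cur.fetchone()[0]
            if cursor_id is None:
                cursor_id = frozen  # or start from some historical point
            cur.execute(
                "SELECT id, payload FROM events"
                " WHERE id > %s AND id <= %s ORDER BY id",
                (cursor_id, frozen))
            for row_id, payload in cur:
                handle(row_id, payload)   # caller-supplied callback
            cursor_id = frozen  # safe: nothing new can land at or below it
        conn.commit()
        time.sleep(poll_interval)
```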