kingkilr · a year ago
I would strongly implore people not to follow this post's example of writing code that relies on this monotonicity.

The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property. If you decide to depend on it, you're setting yourself up for a bad time when PostgreSQL decides to change their implementation strategy, or you move to a different database.

Further, the stated motivations for this, to slightly simplify testing code, are massively under-motivating. Saving a single line of code can hardly be said to be worth it, but even if it were, this is a problem far better solved by simply writing a function that will both generate the objects and sort them.
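
For illustration, a minimal Python sketch of that kind of helper (the `Account` model and `make_accounts` name are hypothetical, not from the post):

    from dataclasses import dataclass
    from uuid import UUID, uuid4

    @dataclass
    class Account:
        id: UUID
        name: str

    def make_accounts(names):
        # Generate the fixtures, then put them in the order the test expects,
        # instead of relying on the UUIDs happening to sort by creation order.
        accounts = [Account(id=uuid4(), name=name) for name in names]
        return sorted(accounts, key=lambda account: account.id)

    expected = make_accounts(["a", "b", "c", "d", "e"])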

As a profession, I strongly feel we need to do a better job orienting ourselves to the reality that our code has a tendency to live for a long time, and we need to optimize not for "how quickly can I type it", but "what will this code cost over its lifetime".

throw0101c · a year ago
> […] code that relies on this monotonicity. The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property.

The "RFC for UUIDv7", RFC 9562, explicitly mentions monotonicity in §6.2 ("Monotonicity and Counters"):

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp; 
    however, implementations can guarantee additional monotonicity via 
    the concepts covered in this section.
* https://datatracker.ietf.org/doc/html/rfc9562#name-monotonic...

In the UUIDv7 definition (§5.7) it explicitly mentions the technique that Postgres employs for rand_a:

    rand_a:
        12 bits of pseudorandom data to provide uniqueness as per
        Section 6.9 and/or optional constructs to guarantee additional 
        monotonicity as per Section 6.2. Occupies bits 52 through 63 
        (octets 6-7).
* https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-vers...

Note: "optional constructs to guarantee additional monotonicity". Pg makes use of that option.

stonemetal12 · a year ago
>explicitly mentions monotonicity

>optional constructs

So it is explicitly mentioned in the RFC as optional, and Pg doesn't state that it guarantees that option. The point still stands: depending on optional behavior is a recipe for failure when the option is no longer taken.

peterldowns · a year ago
The test should do a set comparison, not an ordered list comparison, if it wants to check that the same 5 accounts were returned by the API. I think it's as simple as that.
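
Something along these lines, assuming a pytest-style test where `create_account` and `list_accounts` stand in for whatever the real suite uses:

    def test_returns_the_created_accounts(client):
        # create_account / list_accounts are hypothetical helpers for the API under test.
        created_ids = {create_account(client).id for _ in range(5)}
        returned_ids = {account.id for account in list_accounts(client)}

        # Compare as sets: we only care that the same five accounts come back,
        # not the order the API happened to return them in.
        assert returned_ids == created_ids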

The blogpost is interesting and I appreciated learning the details of how the UUIDv7 implementation works.

vips7L · a year ago
Don’t you think that depends on what you’re guaranteeing in your API? If you’re guaranteeing that your API returns the accounts ordered, you need to test for that. But I do agree in general that using a set is the correct move.
sedatk · a year ago
As a counter-argument, it will inevitably turn into a spec if it becomes widely-used enough.

What was that saying, like: “every behavior of software eventually becomes API”

tomstuart · a year ago
the8472 · a year ago
Consider the incentives you're setting up there. An API contract goes both ways: the vendor promises some things and not others to preserve flexibility, and the user has to abide by it to not get broken in the future. If you unilaterally ignore the contract, or even plan to do so in advance, then eventually the kindness and capacity to accommodate such abuse might run out, and the vendor may switch to an adversarial stance. See QUIC, for example, which is a big middle finger to middleboxes.
drbojingle · a year ago
In enterprise land. In proof of concept land that's not quite true (but does become true if the concept works)
StackTopherFlow · a year ago
I agree, optimizing for readability and maintainability is almost always the right choice.
paulddraper · a year ago
> Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property.

Huh?

If the docs were to guarantee it, they guarantee it. Why are you looking for everything to be part of RFC UUIDv7?

Failure of logic.

fwip · a year ago
Their next sentence explains. Other databases might not make that guarantee, including future versions of Postgres.
3eb7988a1663 · a year ago
I too am missing the win on this. It is breaking the spec, and does not seem to offer a significant advantage. In the event where you have a collection of UUIDv7s, you are only ever going to be able to rely on the millisecond precision anyway.
sbuttgereit · a year ago
You say it's breaking the spec, but is it?

From https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-versio...:

"UUIDv7 values are created by allocating a Unix timestamp in milliseconds in the most significant 48 bits and filling the remaining 74 bits, excluding the required version and variant bits, with random bits for each new UUIDv7 generated to provide uniqueness as per Section 6.9. Alternatively, implementations MAY fill the 74 bits, jointly, with a combination of the following subfields, in this order from the most significant bits to the least, to guarantee additional monotonicity within a millisecond:

   1.  An OPTIONAL sub-millisecond timestamp fraction (12 bits at
       maximum) as per Section 6.2 (Method 3).

   2.  An OPTIONAL carefully seeded counter as per Section 6.2 (Method 1
       or 2).

   3.  Random data for each new UUIDv7 generated for any remaining
       space."
The referenced "Method 3" is:

"Replace Leftmost Random Bits with Increased Clock Precision (Method 3):

For UUIDv7, which has millisecond timestamp precision, it is possible to use additional clock precision available on the system to substitute for up to 12 random bits immediately following the timestamp. This can provide values that are time ordered with sub-millisecond precision, using however many bits are appropriate in the implementation environment. With this method, the additional time precision bits MUST follow the timestamp as the next available bit in the rand_a field for UUIDv7."
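
A rough Python sketch of what that layout looks like, just to make Method 3 concrete (this is an illustration, not Postgres's actual C implementation):

    import os
    import time
    import uuid

    def uuid7_method3():
        ns = time.time_ns()
        ms = ns // 1_000_000                          # 48-bit Unix timestamp in ms
        frac = (ns % 1_000_000) * 4096 // 1_000_000   # sub-ms fraction, 12 bits

        value = (ms & 0xFFFFFFFFFFFF) << 80           # unix_ts_ms: bits 127..80
        value |= 0x7 << 76                            # ver: version 7
        value |= frac << 64                           # rand_a: extra clock precision
        value |= 0x2 << 62                            # var: RFC 9562 variant (binary 10)
        value |= int.from_bytes(os.urandom(8), "big") >> 2  # rand_b: 62 random bits
        return uuid.UUID(int=value)

Note that two values generated within the same ~244 ns step still fall back to purely random ordering, since only rand_b differs between them.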

throw0101c · a year ago
> It is breaking the spec […]

As per a sibling comment, it is not breaking the spec. The comment in the Pg code even cites the spec that says what to do (and is quoted in the post):

     * Generate UUID version 7 per RFC 9562, with the given timestamp.
     *
     * UUID version 7 consists of a Unix timestamp in milliseconds (48
     * bits) and 74 random bits, excluding the required version and
     * variant bits. To ensure monotonicity in scenarios of high-
     * frequency UUID generation, we employ the method "Replace
     * Leftmost Random Bits with Increased Clock Precision (Method 3)",
     * described in the RFC. […]

braiamp · a year ago
I don't think most people will heed this warning. I warned people in a programming forum that Python ordering of objects by insertion time was an implementation detail, because it's not guaranteed by any PEP [0]. I could literally write a PEP-compliant Python interpreter and it could blow up someone's code because they rely on the CPython interpreter's behavior.

[0]: https://mail.python.org/pipermail/python-dev/2017-December/1...

dragonwriter · a year ago
> I warned people in a programming forum that Python ordering of objects by insertion time was an implementation detail, because it's not guaranteed by any PEP

PEPs do not provide a spec for Python: they neither cover the initial base language from before the PEP process started, nor were all subsequent language changes made through PEPs. The closest thing Python has to a cross-implementation standard is the Python Language Reference for a particular version, treating as excluded anything explicitly noted as a CPython implementation detail. Dictionaries being insertion-ordered went from a CPython implementation detail in 3.6 to a guaranteed language feature in 3.7+.

kstrauser · a year ago
That definitely was true, and I used to jitter my code a little to deliberately find and break tests that depended on any particular ordering.

It's now explicitly documented to be true, and you can officially rely on it. From https://docs.python.org/3/library/stdtypes.html#dict:

> Changed in version 3.7: Dictionary order is guaranteed to be insertion order.

That link documents the Python language's semantics, not the behavior of any particular interpreter.
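
A quick demonstration of the guarantee:

    d = {}
    for key in ["b", "a", "c"]:
        d[key] = None
    print(list(d))  # ['b', 'a', 'c'] -- insertion order, guaranteed since Python 3.7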

deadbabe · a year ago
Most code does not live for a long time. Similar to how consumer products are built for planned obsolescence, code is also built with a specific lifespan in mind.

If you spend time making code bulletproof so it can run for like 100 years, you will have wasted a lot of effort for nothing when someone comes along and wipes it clean and replaces it with new code in 2 years. Requirements change, code changes, it’s the nature of business.

Remember: any fool can build a bridge that stands; it takes an engineer to build a bridge that barely stands.

agilob · a year ago
>Most code does not live for a long time.

Sure, and here I am at a third company doing a cloud migration and changing our default DB from MySQL to SQL Server. The pain is real: a 2-year roadmap is now 5 years longer. All because some dude negotiated a discount on cloud services. And we still develop integrations that talk to systems written for DOS.

mardifoufs · a year ago
What? Okay, so assume that most code doesn't last. That doesn't mean you should purposefully make it brittle for basically no additional benefit. If, as you say, it's about making the most with as little as possible (which is what the bridge analogy usually refers to), then surely adding a single function (to actually enforce the ordering you want) to make your code more robust is one of the best examples of that?
Pxtl · a year ago
Uh, more people work on 20-year-old codebases than you'd think.
mmerickel · a year ago
Remember: even if timestamps are generated using a monotonically increasing value, that does not mean the rows were committed to the database in the same order. It is an entirely separate problem if you are trying to actually determine which rows are "new" versus "previously seen" for things like cursor-based APIs and background job processing. This problem exists even with things like a serial/autoincrement primary key.
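
A sketch of the failure mode, assuming a hypothetical events table paginated by a time-ordered id:

    # Hypothetical polling loop keyed on a time-ordered id (UUIDv7 or serial).
    # The query looks correct, but a transaction that generated a smaller id
    # and committed *after* the previous poll is never returned: its id is
    # already behind the cursor even though the row is new to this reader.
    POLL_QUERY = """
        SELECT id, payload
        FROM events
        WHERE id > %(last_seen_id)s
        ORDER BY id
        LIMIT 100
    """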
shalabhc · a year ago
+1

What could be useful here is if Postgres provided a way to determine the latest frozen UUID. This could be a few ms behind the last committed UUID but should guarantee that no new rows will land before the frozen UUID. Then we can use a single cursor to track previously seen rows.

fngjdflmdflg · a year ago
>The Postgres patch solves the problem by repurposing 12 bits of the UUID’s random component to increase the precision of the timestamp down to nanosecond granularity [...]

>It makes a repeated UUID between processes more likely, but there’s still 62 bits of randomness left to make use of, so collisions remain vastly unlikely.

Does it? Even though the number of random bits has decreased, the time interval to create such a duplicate has also decreased, namely to an interval of one nanosecond.

londons_explore · a year ago
I could imagine that certain nanoseconds might be vastly more likely than other nanoseconds.

For example, imagine you have a router that sends network packets out at the start of each microsecond, synced to wall time.

Or the OS scheduler always wakes processes up on a millisecond timer tick or some polling loop.

Now, when those packets are received by a Postgres server and processed, the time to do that is probably fairly consistent, meaning that at X nanoseconds past the microsecond boundary you probably get most records being created.

UltraSane · a year ago
But only one nanosecond slower or faster and you get another set of 4.611 billion billion random IDs. I think random variations in buffer depths and CPU speeds will easily introduce hundreds of nanoseconds of timing variation. Syncing any two things to less than 1 nanosecond is incredibly hard and doesn't happen by accident.
michaelt · a year ago
Imagine if you were generating 16 UUIDs per nanosecond, every nanosecond.

According to [1], due to the birthday paradox the probability of a collision in any given nanosecond would be about 3e-17, which of course sounds pretty low.

But there are 3.154e+16 nanoseconds in a year - and if you get out your high-precision calculator, it'll tell you there's a 61.41% chance of a collision in a year.

Of course you might very well say "Who needs 16 UUIDs per nanosecond anyway?"

[1] https://www.bdayprob.com/
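
The arithmetic, for anyone who wants to redo it without the calculator site (62 random bits, standard birthday approximation; the exact percentage shifts a little depending on which approximation you use):

    import math

    RANDOM_BITS = 62
    PER_NS = 16                            # hypothetical rate: 16 UUIDs per nanosecond
    NS_PER_YEAR = 365 * 24 * 3600 * 10**9

    # P(collision among n values drawn from 2^bits) ~= 1 - exp(-n*(n-1) / (2 * 2^bits))
    pairs = PER_NS * (PER_NS - 1) / 2
    p_ns = -math.expm1(-pairs / 2**RANDOM_BITS)

    # Chance of at least one collision over a year of nanoseconds; log1p/expm1
    # keep the tiny per-nanosecond probability from being lost to rounding.
    p_year = -math.expm1(NS_PER_YEAR * math.log1p(-p_ns))
    print(f"{p_ns:.1e} per ns, {p_year:.0%} per year")
    # ~2.6e-17 per ns and ~56% per year -- the same ballpark as the figures above.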

Horffupolde · a year ago
So what if there’s a collision? If the column is UNIQUE at most it’ll ROLLBACK on INSERT. 16 INSERTS per nanosecond is 16 billion TPS. At that scale you’ll have other problems.
paulddraper · a year ago
Depends if you think sub-millisecond locality is significant.
samatman · a year ago
I maintain that people are too eager to use UUIDv7 to begin with. It's a dessert topping and a floor wax.

Let's say you need an opaque unique handle, and a timestamp, and a monotonically increasing row ID. Common enough. Do they have to be the same thing? Should they be the same thing? Because to me that sounds like three things: an autoincrementing primary key, a UUIDv4, and a nanosecond timestamp.

Is it always ok that the 'opaque' unique ID isn't opaque at all, that it's carrying around a timestamp? Will that allow correlating things which maybe you didn't want hostiles to correlate? Are you 100% sure that you'll never want, or need, to re-timestamp data without changing its global ID?

Maybe you do need these things unnormalized and conflated. Do you though? At least ask the question.

peferron · a year ago
You can keep all three and still use UUIDv7 as a performance improvement in certain contexts due to data locality.
fastball · a year ago
Also if you have a nanosecond timestamp, do you actually need a monotonically increasing row ID? What for?
peferron · a year ago
Perhaps as a tie breaker if you insert multiple rows in a table within one transaction? In this situation, the timestamp returned by e.g. `now()` refers to the start of the transaction, which can cause it to be reused multiple times.
user3939382 · a year ago
Re-timestamp would be a new one for me. What’s a conceivable use case? An NTP fault?
Dylan16807 · a year ago
> The Postgres patch solves the problem by repurposing 12 bits of the UUID’s random component to increase the precision of the timestamp down to nanosecond granularity (filling rand_a above), which in practice is too precise to contain two UUIDv7s generated in the same process.

A millisecond divided by 4096 is not a nanosecond. It's about 250 nanoseconds.
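
For reference, the arithmetic (assuming the 12 rand_a bits split the millisecond into 4096 equal steps):

    ns_per_ms = 1_000_000
    steps = 2 ** 12                 # 12 bits of extra clock precision
    print(ns_per_ms / steps)        # 244.140625 ns per step, i.e. roughly 250 ns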

scrollaway · a year ago
UUID7 is excellent.

I want to share a Django library I wrote a little while back which allows for prefixed identity fields, in the same style as Stripe's ID fields (obj_XXXXXXXXX):

https://github.com/jleclanche/django-prefixed-identity-field...

This gives a PrefixedIdentityField(prefix="obj_"), which is backed by uuid7 and base58. In the database, the IDs are stored as UUIDs, which makes them an efficient field -- they are transformed into prefixed IDs when coming out of the database, which makes them perfect for APIs.
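
The general idea, sketched in plain Python rather than the library's actual API (uses the `base58` PyPI package; `to_prefixed_id` and `from_prefixed_id` are made-up names):

    import uuid
    import base58  # pip install base58

    def to_prefixed_id(prefix, value):
        # Encode the 16 raw UUID bytes as base58 and attach the prefix, Stripe-style.
        return prefix + base58.b58encode(value.bytes).decode("ascii")

    def from_prefixed_id(prefix, text):
        # Strip the prefix and decode back to the UUID that is stored in the database.
        return uuid.UUID(bytes=base58.b58decode(text[len(prefix):]))

    print(to_prefixed_id("obj_", uuid.uuid4()))  # "obj_" followed by ~22 base58 chars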

(I know, no documentation... if someone wants to use this, feel free to file issues to ask questions; I'd love to help.)

dotdi · a year ago
My org has been using ULID[0] extensively for a few years, and generally we've been quite happy with it. After initially dabbling with a few implementations, I reimplemented the spec in Kotlin, and this has been working out quite well for us. We will open-source our implementation in the coming weeks.

ULID does specifically require generated IDs to be monotonically increasing as opposed to what the RFC for UUIDv7 states, which is a big deal IMHO.

[0]: https://github.com/ulid/spec

willvarfar · a year ago
Having used a lot of the ULID variants that the UUIDv7 spec cites as prior art, including the ULID spec you link to, I've gotta say that UUIDv7 has some real advantages.

The biggest advantage is that it is hex. I haven't yet met a database system that doesn't have functions for substr and from_hex etc., meaning you can extract the time part using vanilla SQL.

ULID and others that use custom variants of base32 or base62 or whatever are just about impossible to wrangle with normal tooling.

Your future selves will thank you for being able to manipulate it in whatever database you use in the future to analyse old logs or import whatever data you generate today.
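
The same substring-and-parse-hex trick, shown in Python for concreteness (in SQL it's the equivalent substr/from_hex calls on the UUID's hex form):

    import uuid
    from datetime import datetime, timezone

    def uuid7_time(u):
        # The first 48 bits (12 hex characters) of a UUIDv7 are the Unix
        # timestamp in milliseconds, so the time part is just a substring.
        ms = int(u.hex[:12], 16)
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)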

mixmastamyk · a year ago
Aren't they stored as 16 bytes in binary? How to format it later as text is then your choice.
sedatk · a year ago
Additionally, v7 UUIDs can be generated simultaneously on the client-side by multiple threads without waiting for an oracle to release the next available ID. That's quite important for parallel processing. Otherwise, you might as well use an autoincrement BIGINT.
sedatk · a year ago
ULID guarantees monotonicity only per process, and it requires ID generation to be serialized. I find the promise quite misleading because of that. You might as well use a wide-enough integer with the current timestamp + random as baseline for the same purpose, but I wouldn't recommend that either.
lordofgibbons · a year ago
What benefit does this have over something like Twitter's Snowflake, which can be used to generate distributed monotonically increasing IDs without synchronization?

We've been using an implementation of it in Go for many years in production without issues.

WorldMaker · a year ago
UUIDv7 interoperates with all the other versions of UUID. The v7 support in Postgres doesn't add a new column type; it makes the existing column type more powerful and capable. Applications that had been using UUIDv4 everywhere can get cheap Snowflake-like benefits in existing code just by switching the generator function. Most languages have a GUID or UUID class/struct that is compatibly upgradable from v4 to v7, too.
akvadrako · a year ago
Snowflake is a 64 bit integer. It doesn't need a new column type and works everywhere.