> It is worth noting that the newest proposed standard for unique identifiers, UUID v7, aims to address the sortability and database performance issues of older UUID versions by adopting a similar time-ordered structure to ULID.
Ya, UUID v7 has been standard for a few years now. Perhaps the author is not familiar with the terminology of RFCs and so is misinterpreting the terminology used by IETF: "Request for Comments" and "Proposed Standard" can sound like they're not complete to folks not familiar with the IETF's process. Even then though, I would think they'd notice all the software that has UUID v7 support.
Whenever ULID comes up, I need to remind people that it has a sequential ID generation mode in its spec which is prone to conflicts across multiple threads, processes, or hosts, which kills the purpose of a "universal" identifier. If you need a sequential ID, just use an integer, preferably one that's autoincremented by the database.
It's best to stick to UUIDv7 because of such quirks of ULID.
> I need to remind people that it has a sequential ID generation mode in its spec which is prone to conflicts across multiple threads, processes, or hosts, which kills the purpose of a "universal" identifier.
Can you expand on how this can actually cause a problem? My understanding is different processes and hosts should never conflict because of the 80 bits of random data. The only way I can conceive of a conflict is multiple threads using the same non-thread-safe generator during the same millisecond.
You're right, not hosts or processes in that case. I forgot about the random part, as it's been a while since I looked at it. However, a single instance of a ULID generator must support this mode, which means that on multi-threaded architectures it must lock the sequence, as it still uses a single random value. That, again, kills the purpose of client-side, lock-free generation of universal identifiers, as you said.
The monotonic behavior is not the default, but I would also be happier if it was removed from the spec or at least marked with all the appropriate warning signs on all the libraries implementing it.
But I don't think UUIDv7 solves the issue by "having less quirks". Just like you'd have to be careful to use the non-monotonic version of ULID, you'd have to be careful to use the right version of UUID. You also have to hope that all of your UUID consumers (which would almost invariably try to parse or validate the UUID, even if they do nothing with it) support UUIDv7 or don't throw on an unknown version.
Actually dived into this a bit just a couple days ago. It's very nearly impossible for there to be a conflict, since the timestamp resolves at the microsecond level, and if it's among threads, then there's a global state that detects when it is somehow hit 2+ times in the same microsecond and increments the random portion.
Under what circumstances is it prone to conflicts? On separate threads/hosts/processes, IDs created within the same millisecond would be differentiated by the 80 bits of randomness (more than UUID v7).
No, ULID has a "monotonic" feature, where if it detects the same millisecond timestamp in back to back calls, it just increments the 80 bit "random" portion. This means it has convoying behavior. If two machines are generating ids independently and happen to choose initial random positions near each other, the probability of collision is much higher than the basic birthday bound.
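To make the behavior described above concrete, here is a minimal sketch of a monotonic ULID generator. The class and function names are hypothetical and not taken from any particular library; it only assumes the layout discussed in this thread (48-bit millisecond timestamp, 80-bit random part, Crockford base32 text). Note the lock around the shared state, and that a clock moving backwards is not handled here.

```python
import os
import threading
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms: int, randomness: int) -> str:
    """Encode a 48-bit timestamp and an 80-bit random value as 26 base32 characters."""
    value = (timestamp_ms << 80) | randomness
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


class MonotonicUlidGenerator:
    """Hypothetical generator illustrating the monotonic mode (a sketch, not a reference)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # shared state forces synchronization
        self._last_ms = -1
        self._last_random = 0

    def generate(self, now_ms=None) -> str:
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        with self._lock:
            if now_ms == self._last_ms:
                # Same millisecond as the previous call: increment instead of re-rolling.
                self._last_random += 1
                if self._last_random >= 1 << 80:
                    raise OverflowError("random part overflowed within one millisecond")
            else:
                # New millisecond: draw a fresh 80-bit random value.
                self._last_ms = now_ms
                self._last_random = int.from_bytes(os.urandom(10), "big")
            return encode_ulid(self._last_ms, self._last_random)
```

Two independent generators that happen to start near each other in the same millisecond will walk the same range of values, which is the convoying concern raised above.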
I think this "sort of monotonic but not really" is the worst of both to be honest. It tempts you to treat it like an invariant when it isn't.
If you want monotonicity with independent generation, just use a composite key that's a lamport clock and a random nonce. Or if you want to be even more snazzy use Hybrid Logical Clocks or similar.
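A rough sketch of the composite-key idea, assuming a plain Lamport counter plus an 8-byte random nonce as the tiebreaker. `CompositeKey`, `LamportIdGenerator`, and the nonce size are illustrative choices, not part of any spec mentioned in this thread.

```python
import os
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class CompositeKey:
    counter: int   # Lamport clock value, compared first
    nonce: bytes   # random tiebreaker between independent generators


class LamportIdGenerator:
    """Hypothetical per-node generator; not thread-safe as written."""

    def __init__(self) -> None:
        self._counter = 0

    def next_key(self) -> CompositeKey:
        # Local event: tick the clock, then attach a fresh nonce.
        self._counter += 1
        return CompositeKey(self._counter, os.urandom(8))

    def observe(self, remote_counter: int) -> None:
        # On receiving a key from another node, never fall behind it.
        self._counter = max(self._counter, remote_counter)
```

Keys generated on one node are strictly increasing, and after `observe()` any new key sorts after the remote key that was seen, without relying on wall-clock time.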
> If you need a sequential ID, just use an integer
Are monotonic/sequential ULIDs as easily enumerated as integers? It's the ease of enumerability that keeps a lot of folks away from using sequential integers as IDs.
ULID's initial segment is generated from a timestamp, with a random suffix at the end. The kind of collision you're concerned about is not an issue at all, across multiple threads, processes, or hosts.
UUID 7 is so much easier to manipulate than the ULID in the article. Pretty much every language and database has the string manipulation and from_hex functions to extract the timestamps without any special support function. Whereas a format that is too clever is way more complicated to work with.
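For illustration, a small sketch of that extraction using nothing but string slicing and hex parsing. The function names are made up for this example; the only assumption is the UUIDv7 layout, where the first 48 bits (12 hex digits) are a Unix timestamp in milliseconds and the 13th hex digit is the version.

```python
from datetime import datetime, timezone


def uuid7_timestamp_ms(uuid_str: str) -> int:
    """Return the millisecond Unix timestamp embedded in a UUIDv7 string."""
    hex_digits = uuid_str.replace("-", "")
    if len(hex_digits) != 32 or hex_digits[12] != "7":
        raise ValueError("not a UUIDv7 string")
    return int(hex_digits[:12], 16)  # first 48 bits are the timestamp


def uuid7_datetime(uuid_str: str) -> datetime:
    """Convert the embedded timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(uuid7_timestamp_ms(uuid_str) / 1000, tz=timezone.utc)
```

The same two steps (strip the dashes, parse the first 12 hex digits) translate directly to most SQL dialects and scripting languages.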
I know it may sound stupid, but in my latest project I chose ULIDs because I can easily select them as one word, whereas browsers, terminals, DB GUIs, etc. each have their own opinion about how to select and copy a whole UUID. So from that point of view ULIDs "look" better to me, as they are more ergonomic when I actually have to deal with them manually.
Not sure what you mean by cryptographic strength - they are both Unique ID generators, not meant for anything related to cryptography.
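UUIDv7 has 62 bits of random data in its rand_b field (74 if the 12-bit rand_a field is also filled randomly instead of being used for extra timestamp precision or a counter), ULID uses 80 bits, so if anything ULID is "stronger" (meaning a lower chance of generating the same ID within the same millisecond).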
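To put rough numbers on that comparison, here is a back-of-the-envelope sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2N), where N is the number of possible random values. It takes the 62-bit and 80-bit figures from the comment above at face value and assumes every ID in the millisecond is drawn independently (so it does not model ULID's monotonic mode).

```python
import math


def collision_probability(n_ids: int, random_bits: int) -> float:
    """Approximate chance that at least two of n_ids share the same random part."""
    pairs = n_ids * (n_ids - 1) / 2
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-pairs / 2**random_bits)


if __name__ == "__main__":
    for bits in (62, 80):
        p = collision_probability(1_000_000, bits)
        print(f"{bits} random bits, 1e6 IDs in one millisecond: p ~ {p:.3e}")
```

Either way the probability is tiny for realistic per-millisecond volumes; the gap only matters at extreme generation rates.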
I have always been a bit hesitant to use UUIDs with timestamps, as it can be a security issue if the IDs are public. For example, getting the age of a user account just from the ID. I will say, however, that I have not heard of any major incidents stemming from this.
I don't think it's generally a good idea to store important domain information in synthetic keys. UUIDs should always be treated as opaque keys, but they may have structure that helps them fulfill their primary function. The timestamp in UUID v7 may be close to the moment of record creation, but your system shouldn't treat it as a contract that it is the creation timestamp.
The classic solution to this is to have an internal ID (UUIDv7 if you want to use UUID, nice for indexing in newer databases) and an external ID (UUIDv4 or similar) which doesn't leak information to the outside world (but which otherwise doesn't offer any benefits at the storage level).
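A minimal sketch of that split, assuming a hand-rolled UUIDv7 builder (`uuid7_sketch` is a hypothetical helper written for this example, since not every Python version ships a built-in v7 generator) and the standard library's `uuid.uuid4` for the public ID. The `Record` class is purely illustrative.

```python
import os
import time
import uuid
from dataclasses import dataclass, field


def uuid7_sketch(now_ms=None) -> uuid.UUID:
    """Build a UUIDv7-shaped value: 48-bit ms timestamp, version, variant, random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = rand >> 68                             # 12 bits
    rand_b = rand & ((1 << 62) - 1)                 # 62 bits
    value = (
        (now_ms & ((1 << 48) - 1)) << 80
        | 0x7 << 76          # version 7
        | rand_a << 64
        | 0b10 << 62         # RFC variant
        | rand_b
    )
    return uuid.UUID(int=value)


@dataclass
class Record:
    payload: str
    # Time-ordered key for storage and indexing; never exposed publicly.
    internal_id: uuid.UUID = field(default_factory=uuid7_sketch)
    # Fully random key for URLs and APIs; leaks no creation time.
    external_id: uuid.UUID = field(default_factory=uuid.uuid4)
```

The cost is a second unique index on `external_id`, which is the storage-level tradeoff mentioned above.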
As others already pointed out, UUIDv7 is a solid choice and if you don't like the default representation, you can encode the underlying byte array with base62 for example, to get short, URL-friendly IDs.
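A quick sketch of the base62 idea, assuming the alphabet `0-9A-Za-z` (there is no single standard base62 alphabet, so this ordering is a choice made for the example) and a fixed width of 22 characters, which is enough to hold 128 bits.

```python
import uuid

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
WIDTH = 22  # 62**22 > 2**128, so 22 characters always suffice


def uuid_to_base62(u: uuid.UUID) -> str:
    """Encode the 128-bit UUID value as a fixed-width base62 string."""
    n = u.int
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits)).rjust(WIDTH, "0")


def base62_to_uuid(s: str) -> uuid.UUID:
    """Decode a string produced by uuid_to_base62 back into a UUID."""
    n = 0
    for ch in s:
        n = n * 62 + BASE62.index(ch)
    return uuid.UUID(int=n)
```

Because the alphabet is in ASCII order and the width is fixed, the encoded strings sort the same way as the underlying UUIDs, so the time ordering of UUIDv7 is preserved.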
Yeah, I would go with UUID v7 at this point given that it's part of the UUID RFC https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-vers...
I agree that picking a UUID version requires caution, but when someone has already picked ULID, UUIDv7 is easily a superior alternative.
> Why not use UUID7?
> "ULID is much older than UUID v7 though and looks nicer"
For those unfamiliar, UUIDv7 has pretty much the same properties – sortable, has timestamp, etc.
ULID: 01ARZ3NDEKTSV4RRFFQ69G5FAV
UUIDv7: 019b04ff-09e3-7abe-907f-d67ef9384f4f
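For comparison with the hex-based UUIDv7 layout, here is a sketch of pulling the timestamp out of a ULID like the one above. It assumes the usual ULID text format: 26 Crockford base32 characters, of which the first 10 carry the 48-bit millisecond timestamp. The function name is made up for this example.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U


def ulid_timestamp_ms(ulid: str) -> int:
    """Return the millisecond Unix timestamp encoded in a ULID string."""
    if len(ulid) != 26:
        raise ValueError("a ULID is 26 characters long")
    value = 0
    for ch in ulid[:10].upper():
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 32 + CROCKFORD.index(ch)
    return value
```

It is only a few lines, but it does need a custom alphabet table, whereas the UUIDv7 timestamp can be read with a plain hex parser.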
for those also learning