ULID: Universally Unique Lexicographically Sortable Identifier

rdtsc · 6 days ago

> It is worth noting that the newest proposed standard for unique identifiers, UUID v7, aims to address the sortability and database performance issues of older UUID versions by adopting a similar time-ordered structure to ULID.

Yeah, I would go with UUID v7 at this point given that it's part of the UUID RFC https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-vers...

codys · 6 days ago

Ya, UUID v7 has been standard for a few years now. Perhaps the author is not familiar with the terminology of RFCs and so is misinterpreting the terminology used by IETF: "Request for Comments" and "Proposed Standard" can sound like they're not complete to folks not familiar with the IETF's process. Even then though, I would think they'd notice all the software that has UUID v7 support.

sedatk · 6 days ago

Whenever ULID comes up, I need to remind people that it has a sequential ID generation mode in its spec which is prone to conflicts across multiple threads, processes or hosts, which defeats the purpose of a "universal" identifier. If you need a sequential ID, just use an integer, preferably one that's autoincremented by the database.

It's best to stick to UUIDv7 because of such quirks of ULID.

cpburns2009 · 6 days ago

> I need to remind people that it has a sequential ID generation mode in its spec which is prone to conflicts across multiple threads, processes or hosts, which defeats the purpose of a "universal" identifier.

Can you expand on how this can actually cause a problem? My understanding is different processes and hosts should never conflict because of the 80 bits of random data. The only way I can conceive of a conflict is multiple threads using the same non-thread-safe generator during the same millisecond.

sedatk · 6 days ago

You're right, not hosts or processes in that case. I forgot about the random part, as it's been a while since I looked at it. However, a single instance of a ULID generator must support this mode, which means that on multi-threaded architectures it must lock the sequence, as it still uses a single random value. That, again, defeats the purpose of client-side, lock-free generation of universal identifiers, as you said.

unscaled · 6 days ago

The monotonic behavior is not the default, but I would also be happier if it was removed from the spec or at least marked with all the appropriate warning signs on all the libraries implementing it.

But I don't think UUIDv7 solves the issue by "having less quirks". Just like you'd have to be careful to use the non-monotonic version of ULID, you'd have to be careful to use the right version of UUID. You also have to hope that all of your UUID consumers (which would almost invariably try to parse or validate the UUID, even if they do nothing with it) support UUIDv7 or don't throw on an unknown version.

sedatk · 6 days ago

UUIDv7 is the closest to ULID as both are timestamp based, and UUIDv7 has fewer quirks than ULID, no question about it.

I agree that picking a UUID version requires caution, but when someone has already picked ULID, UUIDv7 is easily a superior alternative.

skeledrew · 6 days ago

I actually dug into this just a couple of days ago. It's very nearly impossible for there to be a conflict, since the timestamp resolves at the microsecond level. And if it's among threads, there's global state that detects when it is hit 2+ times in the same microsecond and increments the random portion.

listenallyall · 6 days ago

Under what circumstances is it prone to conflicts? On separate threads/hosts/processes, IDs created within the same millisecond would be differentiated by the 80 bits of randomness (more than UUID v7).

jasonwatkinspdx · 6 days ago

No, ULID has a "monotonic" feature, where if it detects the same millisecond timestamp in back to back calls, it just increments the 80 bit "random" portion. This means it has convoying behavior. If two machines are generating ids independently and happen to choose initial random positions near each other, the probability of collision is much higher than the basic birthday bound.
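
To make that concrete, here is a minimal sketch of the monotonic mode as described above. The class and function names are made up for illustration, the clock and random source are injectable only so the behavior can be shown deterministically, and real libraries may handle clock rollback and overflow differently.

```python
import os
import threading
import time

# Crockford Base32 alphabet used by ULID (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


class MonotonicUlid:
    """Hypothetical generator illustrating ULID's monotonic mode."""

    def __init__(self, clock=None, rand=None):
        # Defaults: wall clock in milliseconds and 80 random bits.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._rand = rand or (lambda: int.from_bytes(os.urandom(10), "big"))
        self._lock = threading.Lock()  # shared state forces a lock
        self._last_ms = -1
        self._last_rand = 0

    def next_int(self):
        with self._lock:
            now = self._clock()
            if now == self._last_ms:
                # Same millisecond: increment instead of re-rolling.
                self._last_rand += 1
                if self._last_rand >> 80:
                    raise OverflowError("random part overflowed")
            else:
                self._last_ms = now
                self._last_rand = self._rand()
            return (self._last_ms << 80) | self._last_rand


def encode(value):
    """Encode a 128-bit integer as 26 Crockford Base32 characters."""
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))
```

If two independent generators start the same millisecond with random values that are only a few steps apart, the one that started lower walks straight into the other's value after a few increments, which is the convoying problem.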

I think this "sort of monotonic but not really" is the worst of both to be honest. It tempts you to treat it like an invariant when it isn't.

If you want monotonicity with independent generation, just use a composite key that's a lamport clock and a random nonce. Or if you want to be even more snazzy use Hybrid Logical Clocks or similar.
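
A rough sketch of the composite-key idea, assuming a plain Lamport counter plus a 64-bit nonce. `CompositeKey` and `LamportKeyGen` are hypothetical names, and the sketch is not thread-safe as written.

```python
import secrets
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class CompositeKey:
    # Compared field by field: counter first, then nonce as a tie-breaker.
    counter: int
    nonce: int


class LamportKeyGen:
    """Hypothetical per-node generator: Lamport counter plus random nonce."""

    def __init__(self):
        self._counter = 0

    def next_key(self):
        # Local event: bump the counter, attach a fresh nonce.
        self._counter += 1
        return CompositeKey(self._counter, secrets.randbits(64))

    def observe(self, remote):
        # On seeing a key from another node, catch up to its counter so
        # that keys generated afterwards sort after it.
        self._counter = max(self._counter, remote.counter)
```

Keys from one node are strictly increasing, and anything generated after `observe(remote)` sorts after `remote`. Keys from nodes that never communicate are only distinguished by the nonce.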

sedatk · 6 days ago

See my sibling comment.

marifjeren · 6 days ago

> If you need a sequential ID, just use an integer

Are monotonic/sequential ULIDs as easily enumerated as integers? It's the ease of enumerability that keeps a lot of folks away from using sequential integers as IDs

sedatk · 6 days ago

You mean someone who wants to attack your system might be discouraged by Base32 encoding?

N_Lens · 6 days ago

ULID's initial segment is generated from a timestamp, with a random suffix at the end. The kind of collision you're concerned about is not an issue at all, across multiple threads, processes or hosts.

sedatk · 6 days ago

Not if the same ULID generator instance is used across threads.

nighthawk454 · 6 days ago

Mentioned in the article's comments:

> Why not use UUID7?

> "ULID is much older than UUID v7 though and looks nicer"

For those unfamiliar, UUIDv7 has pretty much the same properties – sortable, has timestamp, etc.

ULID: 01ARZ3NDEKTSV4RRFFQ69G5FAV

UUIDv7: 019b04ff-09e3-7abe-907f-d67ef9384f4f
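
As a rough illustration of the "has timestamp" part on the ULID side: the first 10 characters are the 48-bit millisecond timestamp in Crockford Base32. A minimal sketch, with a function name made up for this example:

```python
# Crockford Base32 alphabet used by ULID (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(text):
    """Return the Unix timestamp in milliseconds from a 26-char ULID."""
    if len(text) != 26:
        raise ValueError("a ULID string is 26 characters long")
    ms = 0
    for ch in text[:10].upper():
        # Each character carries 5 bits; index() raises on invalid chars.
        ms = ms * 32 + CROCKFORD.index(ch)
    return ms
```

For the sample ULID above, `ulid_timestamp_ms("01ARZ3NDEKTSV4RRFFQ69G5FAV")` decodes to a date in mid-2016.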

wood_spirit · 6 days ago

UUID 7 is so much easier to manipulate than the ULID in the article. Pretty much every language and database has the string manipulation and from_hex functions to extract the timestamps without any special support function. Whereas a format that is too clever is way more complicated to work with.
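
For example, a minimal sketch in Python using nothing but string slicing and hex parsing (the function names are made up here, and the version check is an extra I added):

```python
from datetime import datetime, timezone


def uuid7_timestamp_ms(text):
    """Return the Unix timestamp in milliseconds from a UUIDv7 string."""
    hex_digits = text.replace("-", "").lower()
    # The 13th hex digit is the version nibble; reject anything but 7.
    if len(hex_digits) != 32 or hex_digits[12] != "7":
        raise ValueError("not a UUIDv7 string")
    # The first 48 bits (12 hex digits) are unix_ts_ms.
    return int(hex_digits[:12], 16)


def uuid7_datetime(text):
    """Same value as a timezone-aware datetime in UTC."""
    ms = uuid7_timestamp_ms(text)
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

The same slice-and-parse approach should carry over to SQL with whatever substring and hex-to-integer functions the database offers.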

nvader · 6 days ago

UUIDv7 looks better in the eye of this beholder.

ChymeraXYZ · 6 days ago

I know it may sound stupid, but in my latest project I chose ULIDs because I can easily select them as one word, instead of dealing with browsers, terminals, DB GUIs, etc., each of which has its own opinion on how to select and copy a whole UUID. So from that point of view ULIDs "look" better to me, as they are more ergonomic when I actually have to deal with them manually.

andy_ppp · 6 days ago

It’s also quite common to base62 the UUID value so in this case “31prI2bsccbXJB7cvbtV9”

sblom · 6 days ago

I love the aesthetics. The cryptographic strength tradeoffs (against UUIDv7) seem rough for a lot of applications, though.

jalk · 6 days ago

Not sure what you mean by cryptographic strength - they are both Unique ID generators, not meant for anything related to cryptography.

UUIDv7 has 62 bits of random data, ULID uses 80 bits, so if anything ULID is "stronger" (meaning less chances of generating the same id within the same millisecond)

0x457 · 6 days ago

UUIDv7 has 74 bits of randomness, not 62; you forgot the rand_a portion. So the difference is just 6 bits, and it only matters within the same millisecond.
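
The arithmetic, as a sketch that assembles a UUIDv7 from its fields in the RFC 9562 layout. `uuid7_sketch` is a made-up name, and this fills rand_a and rand_b with plain random bits; the RFC also allows using part of that space for a counter or extra timestamp precision, which would lower the random bit count.

```python
import os
import time
import uuid

# Field widths: unix_ts_ms, ver, rand_a, var, rand_b.
TS_BITS, VER_BITS, RAND_A_BITS, VAR_BITS, RAND_B_BITS = 48, 4, 12, 2, 62
assert TS_BITS + VER_BITS + RAND_A_BITS + VAR_BITS + RAND_B_BITS == 128
RANDOM_BITS = RAND_A_BITS + RAND_B_BITS  # 74, versus 80 in a ULID


def uuid7_sketch(ms=None):
    """Build a UUIDv7-shaped value; illustrative, not a vetted generator."""
    if ms is None:
        ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & ((1 << RAND_A_BITS) - 1)
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << RAND_B_BITS) - 1)
    value = (
        ((ms & ((1 << TS_BITS) - 1)) << 80)  # bits 127..80: timestamp
        | (0x7 << 76)                        # bits 79..76: version 7
        | (rand_a << 64)                     # bits 75..64: rand_a
        | (0b10 << 62)                       # bits 63..62: variant
        | rand_b                             # bits 61..0: rand_b
    )
    return uuid.UUID(int=value)
```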

elias1233 · 6 days ago

I have always been a bit hesitant to use UUIDs with timestamps as it can be a security issue if the IDs are public. For example getting the age of a user account just from the id. I will say, however, that I have not heard of any major incidents stemming from this.

ivan_gammel · 5 days ago

I don’t think it’s generally a good idea to store important domain information in synthetic keys. UUIDs should always be treated as opaque keys, but they may have structure that helps them fulfill their primary function. The timestamp in UUID v7 may be close to the moment of record creation, but your system shouldn't treat it, by contract, as the creation timestamp.

verandaguy · 6 days ago

The classic solution to this is to have an internal ID (UUIDv7 if you want to use UUID, nice for indexing in newer databases) and an external ID (UUIDv4 or similar) which doesn't leak information to the outside world (but which otherwise doesn't offer any benefits at the storage level).

cryptos · 5 days ago

As others already pointed out, UUIDv7 is a solid choice and if you don't like the default representation, you can encode the underlying byte array with base62 for example, to get short, URL-friendly IDs.
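
A minimal sketch of that encoding. The alphabet order and the fixed 22-character padding are my assumptions, not part of any standard; base62 libraries differ on both, so output from this will not necessarily match other tools.

```python
import string
import uuid

# ASCII-ordered alphabet (0-9, A-Z, a-z). Combined with fixed-width
# padding, this keeps the encoded strings in the same order as the UUIDs.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
WIDTH = 22  # 62**22 > 2**128, so 22 characters always suffice


def to_base62(u):
    """Encode a uuid.UUID as a 22-character base62 string."""
    n = u.int
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)).rjust(WIDTH, ALPHABET[0])


def from_base62(text):
    """Decode a string produced by to_base62 back into a uuid.UUID."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return uuid.UUID(int=n)
```

Because the width is fixed and the alphabet is in ASCII order, the time-ordering of UUIDv7 survives the encoding when the strings are sorted bytewise.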

swyx · 6 days ago

i keep a list of UUID info: https://github.com/swyxio/brain/blob/master/R%20-%20Dev%20No...

for those also learning