Why Choose ULIDs over Traditional UUIDs or IDs for Database Identification?

The main driver behind lexically sortable identifiers is that you generally insert into databases in time order so if your ids are sorted by time then you are appending to the end of the table. If, on the other hand, your ids are random (e.g. the prevalent UUIDv4) then your database is spending all it's time shuffling everything around to insert your new rows in the middle of everything.

Once you have time-based identifiers as primary keys you can do fancy things like using some id you know was generated at time x as a where clause on some other table to dramatically cut down the search space. This kind of dirty trick can really improve performance sometimes when it really matters.

However, the main contender for lexically sorted IDs is ... UUIDs! The new UUID specs have ULID versions of UUIDs.

The new UUID spec includes lexically sortable variants, of which UUIDv7 is gaining widespread support. https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122b...

That spec lists 16 prior lexically-sortable identifiers, and has good motivation for adding time sorting to the UUID standard.

Two things I particularly like about the UUIDv7 standard is that it is easy to swap into existing codebases that are using UUIDv4, and that it is easy to extract the timestamp part in systems that don't know about it and are treating it as a string e.g. in lots of databases you can do some sql like TIMESTAMP_MILLIS(CAST(CONCAT("0x", LEFT(REPLACE(id, '-', ''), 12)) AS INT64)). This is a massive advantage over systems that use non-hex encodings.

So these days, use UUIDv7 if you want to generate your ID outside of a database and the time the id was created is not a secret; else use UUIDv4. Your database will thank you.

Kwpolska · a year ago

> your database is spending all it's time shuffling everything around to insert your new rows in the middle of everything

Is it? The order of data on disk tends to be the insertion order. What might get shuffled around is the index for the primary key, but indexes can handle that quite well.

willvarfar · a year ago

And the answer is "depends on the database tech".

Here are some nice benchmarks: https://www.toomanyafterthoughts.com/uuids-are-bad-for-datab...

I learned the lexically-sortable-id trick from wizened DBAs in the 90s. It's an old trick. I've applied it myself to speed things up and have used both the ULID types described in the article and UUIDv7 etc.

I even advocate using UUID7 on BigQuery where there are no indexes. Having sortable keys is nice even if it doesn't add speed.

j-pb · a year ago

You don't need a total order for the locality properties. E.g. a 32bit ms timestamp that overflows every ~50 days + 96bit of entropy will give you the same locality with a much lower collision probability.

That being said, a lot of index types don't care about this kind of locality and might even benefit from higher entropy (randomized algorithms).

s4i · a year ago

Depending on your read and write patterns, overflowing even every few days might be appropriate, and also obfuscate the creation time [1] more.

[1] https://news.ycombinator.com/item?id=40272446

dagss · a year ago

But if you for instance use MS SQL and the uniqueidentifier type, sorted UUIDv7 is not sorted im the database because MS SQL swapsnthe bytes around. So ULID (stored as binary(16)) is an advantage in that case to avoid making that mistake.

I really like that ULID looks differentt from UUIDv4. UUIDv7 is something sorted that on the surface looks like it will be random and is mixed with random IDs etc..

jitl · a year ago

It depends per storage system. I think we evaluated this on Postgres b-tree and weren’t super jazzed about it. Some storage systems don’t like hotspots. They can write two far-apart keys in parallel, but adjacent keys need to be serialized. Sometimes this happens with sharding systems.

cbsmith · a year ago

What drives me nuts is people storing uuids in databases as hex strings. They're bytes, and if you do it right, even the type 1's are lexigraphically sortable.

Kwpolska · a year ago

Hex strings may require more storage, but they remove the need to convert back and forth and make it easier to manually query the database when debugging. The best type is your database's uuid type if it has one.

Deleted Comment

Related. Others?

ULIDs and Primary Keys (2022) - https://news.ycombinator.com/item?id=40016413 - April 2024 (33 comments)

Ulid: Universally Unique Lexicographically Sortable Identifier - https://news.ycombinator.com/item?id=39878319 - March 2024 (6 comments)

ULID – Sortable Unique Identifier - https://news.ycombinator.com/item?id=34281969 - Jan 2023 (23 comments)

Using ULIDs at Incident.io - https://news.ycombinator.com/item?id=34230652 - Jan 2023 (1 comment)

Understanding UUIDs, ULIDs and string representations - https://news.ycombinator.com/item?id=29794186 - Jan 2022 (100 comments)

Going Deep on UUIDs and ULIDs - https://news.ycombinator.com/item?id=28948815 - Oct 2021 (2 comments)

Universally Unique Lexicographically Sortable Identifier - https://news.ycombinator.com/item?id=23160641 - May 2020 (2 comments)

ULID: Universally Unique Lexicographically Sortable Identifier - https://news.ycombinator.com/item?id=18768909 - Dec 2018 (129 comments)

Universally Unique Lexicographically Sortable Identifier in Go - https://news.ycombinator.com/item?id=13116308 - Dec 2016 (52 comments)

ULID: Universally Unique Lexicographically Sortable Identifier - https://news.ycombinator.com/item?id=12205158 - Aug 2016 (3 comments)

s4i · a year ago

This topic resurfaces quite frequently, I've noticed. One thing I find quite frustrating is that most of the time this is discussed, people don't remember that exposing the creation time of a database row to the public is not acceptable in many systems. This makes v7 UUIDs, ULIDs, KSUIDs, etc. not appropriate for the task.

In any application where the user is able to first create a draft of something (a video upload, a news article, a blog post, ...) and then release it some day in the future, we probably don't want the public to be aware of when the author created the initial draft. For instance, news websites often start drafting articles about people that are speculated to pass away soon.

The problem also affects numeric/auto-incremented IDs. The "Finnish BBC" YLE uses numeric IDs in their news article URLs, making it possible to ballpark when the article was first started. Here's [1] an article about the death of former president Martti Ahtisaari from 2023/10/16 and here's [2] the next integer ID dated 2023/5/5 (5 months before).

[1] https://yle.fi/a/74-20030335

[2] https://yle.fi/a/74-20030336

wodenokoto · a year ago

Did the article provide any reason for ULID over UUID?

- it’s slightly more complex than UUID, but not enough to be a problem

- it’s sortable (in time?) which UUID can also be (but usually recommended not to be)

- it can produce slightly more ids per second, but not enough to make a difference.

So, it’s a tie, a tie and a tie. Why would you switch?

codetrotter · a year ago

There is a draft for a UUID variant called v7 which is basically the same like ulid

It is absolutely valid

In particular when you want ids to be in mostly ascending order but where you can’t use an auto incrementing int

It’s sortable is a key feature because it can help make searching tables more efficient in situations where the data is naturally tied to the time at which the entry was made and your queries relate to time also

For example, if you append rows with ulid or uuid v7, then you can binary search the records in the file without first sorting or making an extra index of the ids as they are already for the most part in ascending order

Cassandra seemed to have no problem using type1 UUIDs for this.

cuu508 · a year ago

That was left as an exercise for the reader :-/

> I am not going to cover all the reasons why Ulid is better, but in the end, it's up to your personal preference.

lazyasciiart · a year ago

It jumps straight to “I’m not going to list all the reasons”. Ok, maybe just the biggest one or two reasons? Is lexicographer sorting that helpful?

Dylan16807 · a year ago

> it’s sortable (in time?) which UUID can also be (but usually recommended not to be)

It gives you the sortable parts without the "recommended not to" parts.

erik_seaberg · a year ago

If your team hasn't engineered in monotonically increasing distributed clocks, the results of sorting are only partly correct. I'd rather not promote an option we can't rely on.

russell_h · a year ago

They’re only partially correct no matter what. The timestamp in a ULID has only has millisecond resolution, so two IDs generated in the same millisecond, even within the same process, will be randomly ordered.

In practice though there are a lot of advantages to having approximately-time-ordered IDs, and I’ve found the pitfalls easy enough to avoid.

skrebbel · a year ago

For many use cases, partially correct sorting is fine.

Eg messaging, tweets etc: if two messages are sent at pretty much the same time, it does not matter which order they show in.

vbezhenar · a year ago

At this point I think that ideal setup is to use numerical id for primary and foreign keys and maintaining a separate uuid field for everything else.

The reason being that index size matters a lot (for caches and other things) and index size depends on underlying field size, obviously.

Whether to use UUID or ULID is depends on tooling. While it's not hard to write ascending UUID generator and I did it myself few times, using standard APIs might be better approach despite the shortcomings.

itake · a year ago

Or just use ints and encrypt them when you want to return them to the user.

https://sqids.org/

carpenecopinum · a year ago

I like the general idea, but the page you linked explicitly states that it isn't meant to be used to hide (user) IDs, or actually encrypt data.

perfmode · a year ago

NB: may be more accurate to call this hashing, versus encryption

Or you can just store the UUIDs as bytes like the almighty intended.

UUID is 128 bits.

Even for huge tables 64 bits are enough. And for many tables 32 bits is more than enough.

would you still keep a uniqueness constraint on the uuid/ulid column?

9dev · a year ago

I wouldn’t, it’s redundant for all intents and purposes unless you’re Google, which you are not. UUIDs/ULIDs provide strong uniqueness guarantees on their own.

dxdm · a year ago

Yes, I'd keep a uniqueness constraint, because it prevents bugs from inserting the same value more than once. From concurrency issues to broken id generation, I've seen it happen before. So unless there are other considerations, having the database enforce uniqueness is a good thing in my book.

kreetx · a year ago

Depends on how many new U(U/L)IDs you insert, but when it's part of an index anyway then I personally would, as the performance penalty would be small (much smaller than anything else you do).

Of course. Index for this column is inevitable and with index, uniqueness basically free and protects from programmer errors.

KTibow · a year ago

The GH repo for ULIDs provides better and more relevant reasons for use. https://github.com/ulid/spec

dang · a year ago

w1zard0f0z · a year ago

How about TSIDs? Numerical, half the size, time sorted... https://vladmihalcea.com/uuid-database-primary-key/

TSIDs are basically type1 UUID's without the MAC address, no?

ponytech · a year ago

My preference: using traditional incremental numeric IDs and obfuscate them with Hashids (https://hashids.org) when exposed publicly

Sammi · a year ago

This looks amazing.

I'm wondering about the security of using a random alphabet with this instead of the default one. In my mind this amounts to a form of cryptography, but I have no idea how to analyse how much security it gives.

EDIT: Reading the faq I see that they insist that sqids cannot provide any encryption. This does not fit with my understanding of the word. Using unique random alphabet is probably the oldest form of encryption. Whether or not it is secure enough depends on your threat model. What I want and what I need is a way to calculate the security provided by a random alphabet.

mattbillenstein · a year ago

I've been doing something that includes a type character as the first character - followed by 11 random base58 digits. This allows > 2^64 possible ids and a type in ~12 bytes. I occasionally wish they were ordered, but most of my tables usually have a 'created_at' field anyway should ordering matter.

I feel like uuid / ulid are just overkill for most situations - and they're long and kinda ugly imo.

Isn't that basically a type 4 UUID that you truncated the last 4 bytes from?

The encoding is completely different? And given that encoding I don't go to any great means to store them as a special type in the db - just text. uuid being hexidecimal, databases often want to store them as binary since that's half the bytes...