Ask HN: Do you use foreign keys in relational databases?

Fear of RDBMSes is quite common. I used to suffer from it too. It’s just so annoying to have to switch your brain to a different programming paradigm every time you need to do something with the database that you start to make up all sorts of excuses as to why it’s really just better to “do it in the code”. Your coworker's argument about FKs making data migrations difficult is one of them.

Another classic is the “joins are slow” argument, which I believe goes back to the late 1990s, when joins really were slow in one database that was not highly regarded at the time, namely MySQL. But the reason “everyone” knew about this was precisely the oddness of the situation: in fact RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway, or, god forbid, whatever you cobble together on your own.

There is, in my mind, only one valid reason to not use foreign keys in a database schema. If your database is mostly write-only, the additional overhead of generating the indexes for the foreign keys may slow you down a little (for reading, these very same foreign keys in fact speed things up quite considerably). Even in such a case, however, I’d argue you’re doing it wrong and there should be a cache of some sort before things are written out in bulk to a properly set up RDBMS.

Semaphor · 3 years ago

> Much better than ORMs

I recently migrated to EntityFramework Core (from the non-core version) and I’m actually impressed. Most SQL is pretty much what I’d write by hand.

Now granted, if there are complex joins, subqueries and stuff, I don’t even try wrangling the ORM to somehow give me that output, but still. I feel more comfortable just using EF than I used to.

moonchrome · 3 years ago

My main problem with Entity Framework is the magic underneath.

Take a simple operation like:

    x = Ef.Find(xid)
    x.Name = "something"
    y = Ef.Find(xid)

What is y.Name? It's "something", even though you haven't saved anything to the database yet, because the second Find doesn't actually refresh from the database.
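For anyone puzzled by that snippet: the usual explanation is that a tracking context keeps an identity map, so a second lookup of the same key hands back the object that is already in memory. Here is a rough Python sketch of the idea. The `Context` and `Entity` classes are made up for illustration and are not EF's code or API.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: int
    name: str


class Context:
    """Hypothetical tracking context with an identity map (not EF's actual API)."""

    def __init__(self, rows):
        self._rows = rows      # stands in for the database: {id: name}
        self._tracked = {}     # identity map: one in-memory object per key
        self.db_reads = 0

    def find(self, entity_id):
        # Return the tracked instance if there is one; only read the "database" on a miss.
        if entity_id not in self._tracked:
            self.db_reads += 1
            self._tracked[entity_id] = Entity(entity_id, self._rows[entity_id])
        return self._tracked[entity_id]


ctx = Context({1: "original"})
x = ctx.find(1)
x.name = "something"   # changed in memory only, nothing saved
y = ctx.find(1)        # same object back, no second read
```

In this sketch `y` is the very same object as `x`, which is why the unsaved name shows up on it.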

Oh and the random bugs where people improperly include related entities but it somehow ends up working because they are automatically added as you're firing off other related queries, until eventually it does not (usually in production only).

It's a really really complex system designed to look simple and pave over important details with "works most of the time" defaults.

tailspin2019 · 3 years ago

Another vote for EF Core here. It’s superb.

AdamN · 3 years ago

ORM is a very valuable tool and should be aggressively used. One can always step down to SQL as needed but otherwise, the ORM logic is easier to write and maintain.

rowanG077 · 3 years ago

Yep entity framework is truly amazing. If you have used that ORM you never go back. You still need to sometimes make your own query for perf or other needs. But it's quite rare in my experience.

Most of the time when I had performance issues, it wasn't EF. It was a missing index or a higher-level query issue.

sonthonax · 3 years ago

> Another classic is the “joins are slow” argument

The only person I knew who died on that hill would insist on doing two queries to the database, and then would insist on doing a client-side Cartesian join.
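To make the comparison concrete, here is a small sketch using Python's built-in `sqlite3` module. The `authors` and `posts` tables are invented for the example, and SQLite is only standing in for whatever database was actually involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL, title TEXT NOT NULL);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'first'), (11, 1, 'second'), (12, 2, 'third');
""")

# Two queries, then pair every author with every post and filter in the client.
authors = conn.execute("SELECT id, name FROM authors").fetchall()
posts = conn.execute("SELECT id, author_id, title FROM posts").fetchall()
client_side = sorted(
    (name, title)
    for (author_id, name) in authors
    for (_post_id, post_author_id, title) in posts
    if author_id == post_author_id
)

# One query; the database does the matching.
joined = conn.execute("""
    SELECT a.name, p.title
    FROM authors AS a
    JOIN posts AS p ON p.author_id = a.id
    ORDER BY a.name, p.title
""").fetchall()
```

Both give the same rows. The client-side version looks at every author/post pair and pulls both tables over the wire to do it.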

AdamN · 3 years ago

I remember getting beers with somebody in the aughts who claimed that he saw an entire website where the url was the key and the webpage was the value in an Oracle database. Any code was SQL operations inside the value field.

10x-dev · 3 years ago

Are joins in a 5NF database now as fast as querying a denormalized database?

lupire · 3 years ago

To be fair, that's a reasonable approach if the database is at its monolithic scaling limit in CPU but not IO, while the clients can scale horizontally to more machines.

Unlikely in practice, though.

tluyben2 · 3 years ago

We used to do large setups at companies for what were then called intranets and extranets, in the early 2000s. These were very read/write intensive, as the staff and partner staff would be on there basically all the time during office hours, and the data was not great for caching because it changed a lot, especially at organizations like large hospitals and universities. We used MySQL (I cannot remember why) and did a lot of performance testing at the time; we removed all joins, which made everything a lot faster. This is no longer the case, but indeed many people still believe it, not (only) because they saw or tried it back then, but also because it’s less strain on the brain to just do single-table selects and not use FKs or joins.

function_seven · 3 years ago

Back in the day I was forced to ditch FKs in my MySQL application, because I needed a FULLTEXT index on one of my columns, and MySQL only supported that type of index on MyISAM tables (this was on 5.x or something). MyISAM didn't do foreign keys.

It was a pretty central table, and the inability to use FKs there kinda spread outward.

edmundsauto · 3 years ago

Did you consider making a 1-1 relationship to a new table that only had the FULLTEXT column? Curious how you evaluated the trade-offs.

bartread · 3 years ago

> Another classic is the “joins are slow” argument

Along with the "indexes slow down INSERTs and UPDATEs" argument that you touch on. I mean, it is literally true that indexes make writes slightly slower, and an excessive quantity of indexes (which I have seen) can slow down writes enough to cause problems. But - in general - the slowdown is irrelevant compared with the overhead of querying a table that contains 2 billion rows using, oh, I don't know, a table scan because you don't have even a single index (I have also seen this).
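If you want to see the scan-versus-index difference for yourself, SQLite's `EXPLAIN QUERY PLAN` shows it on a toy table. This is a sketch with Python's built-in `sqlite3`. The `events` table and index name are invented, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT payload FROM events WHERE user_id = 42"
before = plan(query)   # no usable index: the plan is a full scan of events
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(query)    # the plan now searches via idx_events_user
```

Printing `before` and `after` shows the planner switching from a scan to an index search. The write-side cost is that every insert now also has to update `idx_events_user`.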

vbezhenar · 3 years ago

One reason to avoid FK is when your database is partitioned to multiple servers, but that's obvious, I guess, and it's not really RDBMS anymore.

tailspin2019 · 3 years ago

> RDBMSes are highly optimized pieces of software

> Much better than ORMs

These two things are not mutually exclusive though right?

It’s entirely possible to have a lightweight and relatively transparent ORM which makes full use of the underlying RDBMS.

sverhagen · 3 years ago

Yeah, I was going to say something similar. But ORMs get blamed for obscuring what's going on, to the point that a developer may end up doing some sort of inefficient 1-to-n lookup that would've indeed been much better off as a SQL JOIN.
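A sketch of that 1-to-n pattern next to the equivalent JOIN, in plain Python with the built-in `sqlite3` module rather than any particular ORM. The tables and the `run` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_lines (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL
    );
    INSERT INTO orders VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO order_lines VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C'), (4, 3, 'D');
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two approaches can be compared."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# 1-to-n lookup pattern: one query for the parents, then one more per parent.
lazy = {}
for order_id, customer in run("SELECT id, customer FROM orders ORDER BY id"):
    lines = run("SELECT sku FROM order_lines WHERE order_id = ? ORDER BY sku", (order_id,))
    lazy[customer] = [sku for (sku,) in lines]
lazy_queries = counter["queries"]

# Same data with a single JOIN.
counter["queries"] = 0
joined = {}
for customer, sku in run("""
    SELECT o.customer, l.sku
    FROM orders AS o
    JOIN order_lines AS l ON l.order_id = o.id
    ORDER BY o.id, l.sku
"""):
    joined.setdefault(customer, []).append(sku)
join_queries = counter["queries"]
```

With three orders the loop issues four queries where the JOIN issues one, and the gap grows with the number of parent rows.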

I use JPA/Hibernate professionally, as a decision maker, but I don't think I'm in either camp entirely. ORMs aren't a magic wand, but they do help you standardize the boilerplate that you'd end up with one way or the other, in most cases.

Gordonjcp · 3 years ago

> in fact RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway,

ORMs are just a wrapper around RDBMSes. If your ORM is producing incredibly stupid SQL to query the DB with, you might want to check that you're not modelling your data in a stupid way.

I am by no means an expert, but in general I have found that if the ORM is doing something particularly crazy, it's because my underlying assumptions about the data model are wrong.

axylos · 3 years ago

> Your coworkers argument about FKs making data migrations difficult is one of them.

Got any arguments to back up this bald assertion?

In particular, I'd love to hear more about how to manage schema migrations on large tables with FKs without incurring lengthy locks or downtime.

Betting the answer is going to involve some variation on "well, don't do that" which is when I'll rest my case.

roguas · 3 years ago

There are tools for live migrations for most popular databases. Also a lot of Postgres DDL is very fast and/or capable of happening live.

radicality · 3 years ago

A lot of it depends on the use case. For example, Facebook, one of the largest (if not the largest) deployments of MySQL, does not allow any FK constraints. There are multiple reasons, but one of them is better predictability of DB operational perf: a row delete should delete just the row and not potentially trigger N cascading deletes.

pindab0ter · 3 years ago

I don't understand “a row delete should delete just the row and not potentially trigger N cascading deletes”. If you want that to not happen, then define that in the database definition. It sounds like you're saying that a core piece of functionality is somehow ‘wrong’, even though that same functionality can be used to make the desired behaviour for this exact use case explicit?

skyde · 3 years ago

Facebook's data model is a graph where each row stores one object (“comment”) or one association between objects (“comment is with post id”).

They built a query and indexing system called TAO on top of it to make it fast.

Without it you'd need to send a distinct SQL query per parent object to get the list of associated child objects, which would be awfully slow.

goto11 · 3 years ago

Cascading deletes are separate from FK constraints. You can have FK constraints without cascading deletes.
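A quick way to see the distinction is a sketch with Python's built-in `sqlite3`. The table names are invented, and SQLite is only a stand-in here; other databases have the same split between a plain FK and an explicit `ON DELETE CASCADE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    -- Plain FK: no ON DELETE clause, so the default action applies.
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id)
    );
    -- Opt-in cascade, declared explicitly.
    CREATE TABLE likes (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE
    );
    INSERT INTO posts VALUES (1), (2);
    INSERT INTO comments VALUES (100, 1);
    INSERT INTO likes VALUES (200, 2);
""")

# Post 1 still has a comment: the delete is rejected rather than cascading.
try:
    conn.execute("DELETE FROM posts WHERE id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Post 2 is only referenced through the cascading FK, so its like goes with it.
conn.execute("DELETE FROM posts WHERE id = 2")
likes_left = conn.execute("SELECT COUNT(*) FROM likes").fetchone()[0]
comments_left = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

So the cascade is something you ask for per constraint. Without it, a delete that would orphan rows simply fails.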

pharmakom · 3 years ago

How about when the ID in a FK column has been generated outside the RDBMS but the target of the ID has not been written yet?

ashkulz · 3 years ago

You can use DEFERRABLE INITIALLY DEFERRED constraints so that the check happens when the transaction is committed.
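For illustration, here is roughly what that looks like, sketched with Python's built-in `sqlite3` (SQLite also accepts `DEFERRABLE INITIALLY DEFERRED`). The `accounts` and `payments` tables and the ID values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY);
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        account_id TEXT NOT NULL
            REFERENCES accounts(id) DEFERRABLE INITIALLY DEFERRED
    );
""")

# Child first, parent second, inside one transaction: the check waits for COMMIT.
conn.execute("INSERT INTO payments VALUES (1, 'acct-123')")
conn.execute("INSERT INTO accounts VALUES ('acct-123')")
conn.commit()
committed_rows = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]

# If the parent never shows up, the commit itself is what fails.
conn.execute("INSERT INTO payments VALUES (2, 'acct-missing')")
try:
    conn.commit()
    commit_rejected = False
except sqlite3.IntegrityError:
    commit_rejected = True
    conn.rollback()
rows_after = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

This only helps when the target row arrives within the same transaction. It does nothing for a target that is written later or never.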

GoblinSlayer · 3 years ago

I assume the target is externally generated too, thus can be legitimately absent.

Cthulhu_ · 3 years ago

I had this when importing test data; I found it acceptable (since it was just in development) to temporarily turn off FK checking.
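A sketch of that import workflow using Python's built-in `sqlite3`, where the switch is `PRAGMA foreign_keys` (MySQL's equivalent is the `foreign_key_checks` variable). The tables and rows are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
""")

# Import with enforcement off, so load order and dangling IDs don't abort the run.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 99)])
conn.execute("INSERT INTO users VALUES (10)")
conn.commit()  # the pragma cannot be changed in the middle of a transaction

# Turn enforcement back on and list rows that violate a foreign key.
conn.execute("PRAGMA foreign_keys = ON")
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
orphaned_rowids = [row[1] for row in violations]
```

Turning enforcement back on does not re-validate existing rows, so the explicit `foreign_key_check` pass is what catches the order pointing at the missing user 99.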

I agree with your colleague, and I insist on pushing my car everywhere because I fear gas as it is flammable.

In other words, the world is full of idiots; and any time I start forgetting about it, I read something like your post and I get a wake-up call.

What does R stand for in RDBMS if you don't use foreign keys and joins?

Please, keep using your FKs, stay safe and don't mingle too much with idiots.

autarch · 3 years ago

The "R" stands for "relational", as in "relations", which is a mathematical concept. SQL calls a "relation" a "table". The "relational" in RDBMS has nothing to do with relationships.

But I still agree that OP's colleague is an idiot.

hu3 · 3 years ago

That might have been how it started (https://www.ibm.com/ibm/history/ibm100/us/en/icons/reldb/).

But it's definitely not what it means for the great majority of contemporary contexts.

Relations in modern RDBMS are usually aliases to foreign keys unless otherwise specified.

goto11 · 3 years ago

To be really pedantic, tables are relations, but a join between two tables is also a relation. Base tables, queries and views are all relations and therefore interchangeable in relational algebra.

mkeedlinger · 3 years ago

I agree that using foreign key constraints is the right choice, but the tone of your comment comes off as very condescending and dismissive, and I don't like it.

Spivak · 3 years ago

Eh, there are a lot of people who don't like using FK constraints; calling them all idiots is just bad faith and ignores the reasons they did it. Just because you can enforce a constraint at a specific layer doesn't mean you have to. DB people love shoving all sorts of application logic into the DB, and there are good arguments to do it as well as downsides. App people sometimes prefer to do everything in the app and just let the DB be a dumb data store, and there are good arguments for that too. But "it depends" isn't a hot take.

FlyingSnake · 3 years ago

If you're using an RDBMS and not using FK or other relational constraints, how do you plan to maintain referential integrity?

pif · 3 years ago

> DB people love shoving all sorts of application logic

I agree that application logic goes into the application, but data integrity is NOT application logic.

jonnycomputer · 3 years ago

Mathematically a relation is a set of tuples; which is exactly what a table is.
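In code terms, a toy sketch in plain Python (the attribute layout and data are invented) looks like this:

```python
# A relation is a set of tuples; the attribute order here is
# (id, name) for authors and (post_id, author_id) for posts.
authors = {(1, "ann"), (2, "bob")}
posts = {(10, 1), (11, 1), (12, 2)}

# A join is another set of tuples built from the first two...
authored = {
    (post_id, name)
    for (author_id, name) in authors
    for (post_id, post_author_id) in posts
    if author_id == post_author_id
}

# ...so it can feed straight into the next operation, here a selection.
by_ann = {row for row in authored if row[1] == "ann"}
```

The result of the join has the same shape as its inputs, which is the sense in which tables, queries and views are interchangeable.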

jongjong · 3 years ago

I think the author is talking about 'foreign key constraints' - You could have foreign keys without enforcing a constraint.

Personally, I don't use foreign key constraints because:

1. It makes schema migrations and other data-management operations more difficult.

2. On insertion, the database needs to perform an additional check to verify that the record exists at the foreign key; this carries a performance cost; IMO, this is something which should be enforced at the application layer anyway.

3. It makes it more difficult to scale the database later because you can't separate tables onto 2 different hosts if one table references another using a foreign key.

BTW, about #3, the same argument can be made against using table joins. Once you start using foreign keys or table joins, you will be forced to run those two tables on the same host in the foreseeable future; it's very difficult, error-prone and time consuming to migrate away from such architecture if you have a lot of data in a live environment. Personally I prefer to design all my tables and front end applications to not rely on foreign keys or table joins. There is a good reason why databases which are focused on scalability (like MongoDB) do not support foreign keys or joins (or at least they try to avoid them).

I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular; each one only refers to a single kind of resource; this helps to simplify caching and real-time updates; it also uses fewer resources on the server side and I find that it makes the front-end code more maintainable. Also, I like to design my front ends to mirror the natural separation of resources within the database. When the user wants to open up a related record, they need to click on a link (the foreign key ID/UUID is used to construct the link to the related resource); this loads up the other record as a separate step. This creates a very smooth (and fast) user experience - I also like it because this approach does not overload the user with information; collections of items don't show much details, on the other hand, individual resources may show a lot of detail.

The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view... Sometimes the reason why they want to do that is because they didn't design their tables correctly; maybe the tables which they use to generate list views don't contain enough columns/detail to be useful on their own so they feel forced to do joins. I find that drawing ER diagrams helps a lot with that. It's very important to get the cardinality of relationships between the different tables exactly right. Also, I find it very helpful to represent any many-to-many relation between two tables as a distinct table.
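As a concrete shape for that last point, here is a sketch with Python's built-in `sqlite3`. The student/course schema is invented, and no FK constraints are declared, in line with the preference stated above; adding them is exactly what the rest of the thread argues about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The many-to-many relationship is its own table, one row per pairing.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL,
        course_id INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO courses VALUES (1, 'databases'), (2, 'networks');
    INSERT INTO enrollments VALUES (1, 1), (1, 2), (2, 1);
""")

# Single-table lookup: which course IDs is student 1 enrolled in?
course_ids = [
    course_id
    for (course_id,) in conn.execute(
        "SELECT course_id FROM enrollments WHERE student_id = ? ORDER BY course_id",
        (1,),
    )
]

# The composite primary key keeps the same pairing from being stored twice.
try:
    conn.execute("INSERT INTO enrollments VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Note that nothing here stops an enrollment from pointing at a student or course that does not exist; that check is left to the application in this design.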

P5fRxh5kUvp2th · 3 years ago

On point 3, it should be noted that it's almost always a mistake to optimize for scale at the start of a project's lifetime. There will be exceptions, but in general this is true.

You can always migrate that data to a more useful format if you find it starts hurting you at scale. If you start with the assumption that you need the scale, you're hurting yourself in the here and now for a theoretical future benefit.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view

This is completely, emphatically wrong. I'm somewhat miffed at the air of authority you're using here. People use joins for the normalization of data.

pif · 3 years ago

> I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular

It's clear you have never worked with a lot of data.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen

I hate this illusion that web programming is the whole of software development.

wahnfrieden · 3 years ago

postgres has a shitload of useful features that are unrelated to relations