Readit News logoReadit News
Posted by u/frogcoder 3 years ago
Ask HN: Do you use foreign keys in relational databases?
I use foreign keys quite often in my schemas because of data integrity, while my colleague has a no FK policy. His main argument is difficulties during data migrations which he frequently encounters. He rather have a smooth data migration process than having an unexpected error and abort the whole operation that the migration is only a small part of. I suspect the errors might be mainly caused by not considering data integrity at all at the first place, but I can feel his pain. To be fair, as far as I know, he never had major data problems.

He is not the only one I've met who insisted on not having FK. I've even seen large systems prohibit using JOIN statements.

Personally, I see the data integrity out weights the inconveniences, do you use FK for your systems, what are your experiences?

mtts · 3 years ago
Fear of RDBMSes is quite common. I used to suffer from it too. It’s just so annoying to have to switch your brain to a different programming paradigm every time you need to do something with the database that you start to make up all sorts of excuses as to why it’s really just better to “do it in the code”. Your coworkers argument about FKs making data migrations difficult is one of them.

Another classic is the “joins are slow” argument, which I believe goes back to a period in the late 1990s when in one, not highly regarded at the time, database, namely MySQL, they were indeed slow. But the reason “everyone” knew about this was precisely the oddness of this situation: in fact RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway, or, god forbid, whatever you cobble together on your own.

There is, in my mind, only one valid reason to not use foreign keys in a database schema. If your database is mostly write only, the additional overhead of generating the indexes for the foreign keys may slow you down a little (for reading, these very same foreign keys in fact speed things up quite considerably). Even in such a case, however, I’d argue you’re doing it wrong and there should be a cache of some sort before things are written out in bulk to a properly setup RDBMS.

Semaphor · 3 years ago
> Much better than ORMs

I recently migrated to EntityFramework Core (from the non-core version) and I’m actually impressed. Most SQL is pretty much what I’d write by hand.

Now granted, if there are complex joins, subqueries and stuff, I don’t even try wrangling the ORM to somehow give me that output, but still. I feel more comfortable just using EF than I used to.

moonchrome · 3 years ago
My main problem with Entity Framework is the magic underneath.

Like simple operation

    x = Ef.Find(xid)
    x.Name = "something"
    y = Ef.Find(xid)

what is y.Name ? Even though you didn't save anything to the database yet ? And the second Find didn't actually refresh from the database ?

Oh and the random bugs where people improperly include related entities but it somehow ends up working because they are automatically added as you're firing off other related queries, until eventually it does not (usually in production only).

It's a really really complex system designed to look simple and pave over important details with "works most of the time" defaults.

tailspin2019 · 3 years ago
Another vote for EF Core here. It’s superb.
AdamN · 3 years ago
ORM is a very valuable tool and should be aggressively used. One can always step down to SQL as needed but otherwise, the ORM logic is easier to write and maintain.
rowanG077 · 3 years ago
Yep entity framework is truly amazing. If you have used that ORM you never go back. You still need to sometimes make your own query for perf or other needs. But it's quite rare in my experience.

Most of the time when I had performce issues it isn't EF. It's a missed index or higher level query issue.

sonthonax · 3 years ago
> Another classic is the “joins are slow” argument

The only person I knew who died on that hill would insist on doing two queries to the database, and then would insist on doing a client side cartesian join.

AdamN · 3 years ago
I remember getting beers with somebody in the aughts who claimed that he saw an entire website where the url was the key and the webpage was the value in an Oracle database. Any code was SQL operations inside the value field.
10x-dev · 3 years ago
Are joins in a 5NF database now as fast as querying a denormalized database?
lupire · 3 years ago
To be fair thata a reasonable approach if the database is at its monolithic scaling limit in CPU but not IO, while the clients can scale horizontally to more machines.

Unlikely in practice, though.

tluyben2 · 3 years ago
We used to do large setups at companies for what was then called intra and extranets begin 00s. These were very read/write intensive as the staff and partner staff would be on there basically all the time during office hours and data was not great for caching as data changed a lot especially in some companies like large hospitals and universities. We used mysql (I cannot remember why) and we did a lot of performance testing at that time; we removed all joins which made everything a lot faster. This is no longer the case now but indeed many people still believe it ; not (only) because they saw or tried it back then, but also because it’s less strain on the brain to just do single table selects and use not FKs or joins.
function_seven · 3 years ago
Back in the day I was forced to ditch FKs in my MySQL application, because I needed a FULLTEXT index on one of my columns, and MySQL only supported that type of index on MyISAM tables (this was on 5.x or something). MyISAM didn't do foreign keys.

It was a pretty central table, and the inability to use FKs there kinda spread outward.

edmundsauto · 3 years ago
Did you consider making a 1-1 relationship on a new table that only had the FULLTEXT column? Curious how you evaluated the trade offs
bartread · 3 years ago
> Another classic is the “joins are slow” argument

Along with the "indexes slow down INSERTs and UPDATEs" argument that you touch on. I mean, it is literally true that indexes make writes slightly slower, and an excessive quantity of indexes (which I have seen) can slow down writes enough to cause problems. But - in general - the slowdown is irrelevant compared with the overhead of querying a table that contains 2 billion rows using, oh, I don't know, a table scan because you don't have even a single index (I have also seen this).

vbezhenar · 3 years ago
One reason to avoid FK is when your database is partitioned to multiple servers, but that's obvious, I guess, and it's not really RDBMS anymore.
tailspin2019 · 3 years ago
> RDBMSes are highly optimized pieces of software

> Much better than ORMs

These two things are not mutually exclusive though right?

It’s entirely possible to have a lightweight and relatively transparent ORM which makes full use of the underlying RDBMS.

sverhagen · 3 years ago
Yeah, I was going to say something similar. But ORMs get blamed for obscuring what's going on, to the point that a developer may end up doing some sort of inefficient 1-to-n lookup that would've indeed been much better off as a SQL JOIN.

I use JPA/Hibernate professionally, as a decision maker, but I don't think I'm in either camp entirely. ORMs aren't a magic wand, but they do help you standardize the boilerplate that you'd end up with one way or the other, in most cases.

Gordonjcp · 3 years ago
> in fact RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway,

ORMs are just a wrapper around RDBMSes. If your ORM is producing incredibly stupid SQL to query the DB with, you might want to check that you're not modelling your data in a stupid way.

I am by no means an expert, but in general I have found that if the ORM is doing something particularly crazy, it's because my underlying assumptions about the data model is wrong.

axylos · 3 years ago
> Your coworkers argument about FKs making data migrations difficult is one of them.

Got any arguments to back up this bald assertion?

In particular, I'd love to hear more about how to manage schema migrations on large tables with FK's without incurring lengthy locks or downtime.

Betting the answer is going to involve some variation on "well, don't do that" which is when I'll rest my case.

roguas · 3 years ago
There are tools for live migrations for most popular databases. Also a lot of Postgres DDL is very fast and/or capable of happening live.
radicality · 3 years ago
A lot of it depends on the use case. For example, Facebook - one of the largest (if not the largest) deployments of mysql does not allow any FK constrains. There’s multiple reasons, but one of those is better predictability of db operational perf - a row delete should delete just the row and not potentially trigger N cascading deletes.
pindab0ter · 3 years ago
I don't understand “a row delete should delete just the row and not potentially trigger N cascading deletes”. If you want that to not happen, then define that in the database definition. It sounds like you're saying that a core piece of functionality is somehow ‘wrong’, even though that same functionality can be used to make the desired bahviour for this exact use case explicit?
skyde · 3 years ago
facebook data model is a Graph where each row store one object “comment” or one association “comment is with post id” between objects .

They made an query and indexing system on top of it to make it fast called TAO.

Without it you need to send a distinct SQL query pet parent object to get list of associated child object which would be awfuly slow.

goto11 · 3 years ago
Cascading deletes is a separate from FK constraints. You can have FK constraints without cascading deletes.
pharmakom · 3 years ago
How about when the ID in a FK column has been generated outside the RDBMS but the target of the ID has not been written yet?
ashkulz · 3 years ago
You can use DEFERRABLE INITIALLY DEFERRED constraints so that the check happens when the transaction is committed.
GoblinSlayer · 3 years ago
I assume the target is externally generated too, thus can be legitimately absent.
Cthulhu_ · 3 years ago
I had this when importing test data; I found it acceptable (since it was just in development) to temporarily turn off FK checking.
thatjoeoverthr · 3 years ago
Your database is is the state of your system. Guard it!

I just ran into severe data corruption at a large client because a programmer four years ago wrote an empty catch block. The system would open a transaction, hit a fault, roll back, then continue writing to the database as if it’s still in the context of the transaction.

I spent some time trying to pin down exactly what it did, and found that many writes went through because of a missing foreign key constraint.

In short: if a particular table of minor importance had a foreign key constraint, there would have been no damage whatsoever, because it would have faulted immediately after the rollback.

You can’t rig up a constraint against every dumb write. But you can rig them up against some of the dumb writes. And sometimes that’s enough.

UglyToad · 3 years ago
You're exactly right. This is the crux of it. In many domains (not every domain, but every one I've ever worked in) you can delete/rewrite/change the app code without much fuss but the data is vital.

For that reason having constraints enforced by the system that stores the data, external to the app code which developers will inevitably mess up, is so useful and important.

So many issues in my experience have been similar to the one you describe. If the right constraints had been present so much work and so many headaches could have been avoided.

I feel like people who claim they're not needed or not important or "the app code will do it" need to be wrangled into maintenance work of old systems for a year or two until they repent. Rather than boshing out an ill-considered prototype and then moving on before the bugs are discovered.

yuvalr1 · 3 years ago
Key statement: integrity of the state is much more important than some inconveniences here and there
chousuke · 3 years ago
I'd go so far as to say that integrity of state is a requirement to build robust software. If your state is ill-defined, it's pretty much impossible to write software that behaves correctly.

Sure, you can guard against some forms of bad data and fix it as it comes in (or abort with errors if your system can handle that), but in order to do anything interesting, you need to make assumptions about the data. the only operation you can perform without assumptions is the identity function, and most software requires much more than that.

spiffytech · 3 years ago
Data that's malformed at captured time is often indistinguishable from outright data loss.

Since it just gets worse the longer you accept corrupted data, this is a good justification for "crash early" programming, like DB enforcement of data integrity.

P5fRxh5kUvp2th · 3 years ago
a few weeks ago I had a discussion with a developer with 20+ years experience about why you shouldn't squelch DB errors and try to continue running.

Nothing is a hard and fast rule, but in this case the only way for an error to occur is if the query schema differed from the code select statement (column names changed). At that point wtf are you doing trying to keep running without errors, something is fundamentally mismatched between your application and the data.

pif · 3 years ago
I agree with your colleague, and I insist on pushing my car everywhere because I fear gas as it is flammable.

In other words, the world is full of idiots; and any time I start forgetting about it, I read something like your post and I get a wake-up call.

What does R stand for in RDBMS is you don't use foreign keys and joins?

Please, keep using your FKs, stay safe and don't mingle too much with idiots.

autarch · 3 years ago
The "R" stands for "relations", as in "relations", which is a mathematical concept. SQL calls a "relation" a "table". The "relational" is RDBMS has nothing to do with relationships.

But I still agree that OP's colleague is an idiot.

hu3 · 3 years ago
That might have been how it started (https://www.ibm.com/ibm/history/ibm100/us/en/icons/reldb/).

But it's definitely not what it means for the great majority of contemporary contexts.

Relations in modern RDBMS are usually aliases to foreign keys unless otherwise specified.

goto11 · 3 years ago
To be really pedantic, tables are relations but a join between two tables are also a relation. Base tables, queries and views are all relations and therefore interchangeable in relational algebra.
mkeedlinger · 3 years ago
I agree that using foreign key constraints is the right choice, but the tone of your comment comes off as very condescending and dismissive, and I don't like it.
Spivak · 3 years ago
Eh, there are a lot of people who don't like using FK constraints, calling them all idiots is just bad faith and ignores the reasons they did it. Just because you can enforce a constraint at a specific layer doesn't mean you have to. DB people love shoving all sorts of application logic into the DB and there are good arguments to do it as well as downsides. App people sometimes prefer to do everything in the app and just let the DB be a dumb data store and there are good arguments for that too. But it depends isn't a hot take.
FlyingSnake · 3 years ago
If you're using an RDBMS and not using FK or other relational constraints, how do you plan to maintain referential integrity?
pif · 3 years ago
> DB people love shoving all sorts of application logic

I agree that application logic goes into the application, but data integrity is NOT application logic.

jonnycomputer · 3 years ago
Mathematically a relation is a set of tuples; which is exactly what a table is.
jongjong · 3 years ago
I think the author is talking about 'foreign key constraints' - You could have foreign keys without enforcing a constraint.

Personally, I don't use foreign key constraints because:

1. It makes schema migrations and other data-management operations more difficult.

2. On insertion, the database needs to perform an additional check to verify that the record exists at the foreign key; this carries a performance cost; IMO, this is something which should be enforced at the application layer anyway.

3. It makes it more difficult to scale the database later because you can't separate tables onto 2 different hosts if one table references another using a foreign key.

BTW, about #3, the same argument can be made against using table joins. Once you start using foreign keys or table joins, you will be forced to run those two tables on the same host in the foreseeable future; it's very difficult, error-prone and time consuming to migrate away from such architecture if you have a lot of data in a live environment. Personally I prefer to design all my tables and front end applications to not rely on foreign keys or table joins. There is a good reason why databases which are focused on scalability (like MongoDB) do not support foreign keys or joins (or at least they try to avoid them).

I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular; each one only refers to a single kind of resource; this helps to simplify caching and real-time updates; it also uses fewer resources on the server side and I find that it makes the front-end code more maintainable. Also, I like to design my front ends to mirror the natural separation of resources within the database. When the user wants to open up a related record, they need to click on a link (the foreign key ID/UUID is used to construct the link to the related resource); this loads up the other record as a separate step. This creates a very smooth (and fast) user experience - I also like it because this approach does not overload the user with information; collections of items don't show much details, on the other hand, individual resources may show a lot of detail.

The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view... Sometimes the reason why they want to do that is because they didn't design their tables correctly; maybe the tables which they use to generate list views don't contain enough columns/detail to be useful on their own so they feel forced to do joins. I find that drawing ER diagrams helps a lot with that. It's very important to get the cardinality of relationships between the different tables exactly right. Also, I find it very helpful to represent any many-to-many relation between two tables as a distinct table.

P5fRxh5kUvp2th · 3 years ago
On point 3 it should be noted that it's almost always a mistake to optimize for scale at the start of a projects lifetime. There will be exceptions, but in general this is true.

You can always migrate that data to a more useful format if you find it starts hurting you at scale, if you start with the assumption you need the scale you're hurting yourself in the here and now for theoretical future benefit.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view

This is completely, emphatically wrong. I'm somewhat miffed at the air of authority you're using here. People use joins for the normalization of data.

pif · 3 years ago
> I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular

It's clear you have never work with a lot of data.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen

I hate this illusion that web programming is the whole of software development.

wahnfrieden · 3 years ago
postgres has a shitload of useful features that are unrelated to relations
buro9 · 3 years ago
I use FKs for most things in an RDBMS... but not for all things.

For example audit logs get no FKs, when a delete happens the audit logs about the deletion shouldn't be deleted.

I always FK a large table (millions or more rows) to a small table (tens to hundreds of rows).

But I will pause and ask hard questions about FK a large table to a large table... will this impact migrations? Do I need this FK? Is data integrity at risk without this FK even assuming a buggy app? Does the app utilise this FK for queries, or is there zero performance benefit from having the FK? If I don't have the FK are both tables always queryable by a PK? Should I have an index, potentially a UNIQUE index, in place of a FK?

Like most things... it depends. A dogmatic insistence on using them everywhere isn't always healthy, and the inverse is true that an avoidance everywhere isn't healthy.

The DB is there to store data and make it available, whilst enforcing data integrity... if it makes sense to use a FK to achieve those things do it, otherwise don't.

buro9 · 3 years ago
I should've added...

FK based on the possible size of a table rather than the current size of the table.

FKs are incredibly performance on a near-empty local dev database ;)

vbezhenar · 3 years ago
> For example audit logs get no FKs, when a delete happens the audit logs about the deletion shouldn't be deleted.

    on delete set null

?

Diggsey · 3 years ago
That still modifies the audit log, which is presumably supposed to be immutable. It also means you lose information, eg. if this is a user ID, then before you would be able to determine if two audit actions were done by the same (deleted) user. With "set null", you can't determine that anymore.

The parent's point is that the choice should depdend on the situation. Sometimes a foreign key is appropriate, sometimes it is not.

masklinn · 3 years ago
> For example audit logs get no FKs, when a delete happens the audit logs about the deletion shouldn't be deleted.

Or updated. Or prevent the deletion.

But that’s configurable anyway.

scottcodie · 3 years ago
Foreign keys also let the query optimizer make better query plans. This is actually a bigger deal than most people think, the query optimizer can rewrite to semi joins or even eliminate joins completely if the optimizer has better guarantees about referential integrity.
seunosewa · 3 years ago
Could you give us a simple example?
CraigJPerry · 3 years ago
There’s a pithy quote by someone famous in DB circles who said something like “normalise until it hurts, de-normalise until it’s fast enough” - I’m vague on the exact words used but that’s the gist of it.

I’ve never come across a scenario yet where this wasn’t sound advice.

I tend to lean heavily on my DB as well. E.g. I tend to push all state down to the DB and out of the application. I work in environments where it’s common for developers to want to disable FK constraints, and i temporarily do sometimes during specific bulk operations during releases for example. The usual reasons others will suggest relaxing FK constraints permanently will be due to the need for audit logs or soft deletes but i have patterns for these too.

Foreign key constraints are pretty awesome, all databases I’ve worked in so far have escape hatches for when they hurt too much.

evdubs · 3 years ago
I've heard it as, "Normalize until it hurts. Denormalize until it works."

Still, in my experience, database normalization has seemed like less of a performance impediment than queries written without considering an execution plan and indexes.

Deleted Comment

josephcsible · 3 years ago
Giving up foreign key constraints because they cause errors is basically the same mistake that the monk in http://thecodelesscode.com/case/115 made.
metadat · 3 years ago
Yes, this 100%. Though, as other comments have printed out: you need to take application and scale into account.

Fundamentally it's a question of how one should go about handling multi-tenant situations.

Rexxar · 3 years ago
I didn't remember this little stories where sometimes so cruel.
quickthrower2 · 3 years ago
Foreign keys can be removed during migrations and added back after. You would also disable triggers and check constraints too.

I like foreign keys, check constraints and tight data types. Might as well constrain it and limit the scope for errors.

Application programmers can write some buggy code. The DB should provide a line of defence.

If you don’t have foreign keys that should be a design choice with a legitimate reason the entity can become orphaned.

You need a definition of what that means in real life. E.g PERSONID id 1012 and there is no associated record. This kinda means you need to look at another entity to know what the first entity “means”.

This might be useful for data that needs to sit in distributed databases.