I use foreign keys quite often in my schemas because of data integrity, while my colleague has a no FK policy. His main argument is difficulties during data migrations which he frequently encounters. He rather have a smooth data migration process than having an unexpected error and abort the whole operation that the migration is only a small part of. I suspect the errors might be mainly caused by not considering data integrity at all at the first place, but I can feel his pain. To be fair, as far as I know, he never had major data problems.
He is not the only one I've met who insisted on not having FK. I've even seen large systems prohibit using JOIN statements.
Personally, I see the data integrity out weights the inconveniences, do you use FK for your systems, what are your experiences?
Another classic is the “joins are slow” argument, which I believe goes back to a period in the late 1990s when in one, not highly regarded at the time, database, namely MySQL, they were indeed slow. But the reason “everyone” knew about this was precisely the oddness of this situation: in fact RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway, or, god forbid, whatever you cobble together on your own.
There is, in my mind, only one valid reason to not use foreign keys in a database schema. If your database is mostly write only, the additional overhead of generating the indexes for the foreign keys may slow you down a little (for reading, these very same foreign keys in fact speed things up quite considerably). Even in such a case, however, I’d argue you’re doing it wrong and there should be a cache of some sort before things are written out in bulk to a properly setup RDBMS.
I recently migrated to EntityFramework Core (from the non-core version) and I’m actually impressed. Most SQL is pretty much what I’d write by hand.
Now granted, if there are complex joins, subqueries and stuff, I don’t even try wrangling the ORM to somehow give me that output, but still. I feel more comfortable just using EF than I used to.
Like simple operation
what is y.Name ? Even though you didn't save anything to the database yet ? And the second Find didn't actually refresh from the database ?Oh and the random bugs where people improperly include related entities but it somehow ends up working because they are automatically added as you're firing off other related queries, until eventually it does not (usually in production only).
It's a really really complex system designed to look simple and pave over important details with "works most of the time" defaults.
Most of the time when I had performce issues it isn't EF. It's a missed index or higher level query issue.
The only person I knew who died on that hill would insist on doing two queries to the database, and then would insist on doing a client side cartesian join.
Unlikely in practice, though.
It was a pretty central table, and the inability to use FKs there kinda spread outward.
Along with the "indexes slow down INSERTs and UPDATEs" argument that you touch on. I mean, it is literally true that indexes make writes slightly slower, and an excessive quantity of indexes (which I have seen) can slow down writes enough to cause problems. But - in general - the slowdown is irrelevant compared with the overhead of querying a table that contains 2 billion rows using, oh, I don't know, a table scan because you don't have even a single index (I have also seen this).
> Much better than ORMs
These two things are not mutually exclusive though right?
It’s entirely possible to have a lightweight and relatively transparent ORM which makes full use of the underlying RDBMS.
I use JPA/Hibernate professionally, as a decision maker, but I don't think I'm in either camp entirely. ORMs aren't a magic wand, but they do help you standardize the boilerplate that you'd end up with one way or the other, in most cases.
ORMs are just a wrapper around RDBMSes. If your ORM is producing incredibly stupid SQL to query the DB with, you might want to check that you're not modelling your data in a stupid way.
I am by no means an expert, but in general I have found that if the ORM is doing something particularly crazy, it's because my underlying assumptions about the data model is wrong.
Got any arguments to back up this bald assertion?
In particular, I'd love to hear more about how to manage schema migrations on large tables with FK's without incurring lengthy locks or downtime.
Betting the answer is going to involve some variation on "well, don't do that" which is when I'll rest my case.
They made an query and indexing system on top of it to make it fast called TAO.
Without it you need to send a distinct SQL query pet parent object to get list of associated child object which would be awfuly slow.
I just ran into severe data corruption at a large client because a programmer four years ago wrote an empty catch block. The system would open a transaction, hit a fault, roll back, then continue writing to the database as if it’s still in the context of the transaction.
I spent some time trying to pin down exactly what it did, and found that many writes went through because of a missing foreign key constraint.
In short: if a particular table of minor importance had a foreign key constraint, there would have been no damage whatsoever, because it would have faulted immediately after the rollback.
You can’t rig up a constraint against every dumb write. But you can rig them up against some of the dumb writes. And sometimes that’s enough.
For that reason having constraints enforced by the system that stores the data, external to the app code which developers will inevitably mess up, is so useful and important.
So many issues in my experience have been similar to the one you describe. If the right constraints had been present so much work and so many headaches could have been avoided.
I feel like people who claim they're not needed or not important or "the app code will do it" need to be wrangled into maintenance work of old systems for a year or two until they repent. Rather than boshing out an ill-considered prototype and then moving on before the bugs are discovered.
Sure, you can guard against some forms of bad data and fix it as it comes in (or abort with errors if your system can handle that), but in order to do anything interesting, you need to make assumptions about the data. the only operation you can perform without assumptions is the identity function, and most software requires much more than that.
Since it just gets worse the longer you accept corrupted data, this is a good justification for "crash early" programming, like DB enforcement of data integrity.
Nothing is a hard and fast rule, but in this case the only way for an error to occur is if the query schema differed from the code select statement (column names changed). At that point wtf are you doing trying to keep running without errors, something is fundamentally mismatched between your application and the data.
In other words, the world is full of idiots; and any time I start forgetting about it, I read something like your post and I get a wake-up call.
What does R stand for in RDBMS is you don't use foreign keys and joins?
Please, keep using your FKs, stay safe and don't mingle too much with idiots.
But I still agree that OP's colleague is an idiot.
But it's definitely not what it means for the great majority of contemporary contexts.
Relations in modern RDBMS are usually aliases to foreign keys unless otherwise specified.
I agree that application logic goes into the application, but data integrity is NOT application logic.
Personally, I don't use foreign key constraints because:
1. It makes schema migrations and other data-management operations more difficult.
2. On insertion, the database needs to perform an additional check to verify that the record exists at the foreign key; this carries a performance cost; IMO, this is something which should be enforced at the application layer anyway.
3. It makes it more difficult to scale the database later because you can't separate tables onto 2 different hosts if one table references another using a foreign key.
BTW, about #3, the same argument can be made against using table joins. Once you start using foreign keys or table joins, you will be forced to run those two tables on the same host in the foreseeable future; it's very difficult, error-prone and time consuming to migrate away from such architecture if you have a lot of data in a live environment. Personally I prefer to design all my tables and front end applications to not rely on foreign keys or table joins. There is a good reason why databases which are focused on scalability (like MongoDB) do not support foreign keys or joins (or at least they try to avoid them).
I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular; each one only refers to a single kind of resource; this helps to simplify caching and real-time updates; it also uses fewer resources on the server side and I find that it makes the front-end code more maintainable. Also, I like to design my front ends to mirror the natural separation of resources within the database. When the user wants to open up a related record, they need to click on a link (the foreign key ID/UUID is used to construct the link to the related resource); this loads up the other record as a separate step. This creates a very smooth (and fast) user experience - I also like it because this approach does not overload the user with information; collections of items don't show much details, on the other hand, individual resources may show a lot of detail.
The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view... Sometimes the reason why they want to do that is because they didn't design their tables correctly; maybe the tables which they use to generate list views don't contain enough columns/detail to be useful on their own so they feel forced to do joins. I find that drawing ER diagrams helps a lot with that. It's very important to get the cardinality of relationships between the different tables exactly right. Also, I find it very helpful to represent any many-to-many relation between two tables as a distinct table.
You can always migrate that data to a more useful format if you find it starts hurting you at scale, if you start with the assumption you need the scale you're hurting yourself in the here and now for theoretical future benefit.
> The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view
This is completely, emphatically wrong. I'm somewhat miffed at the air of authority you're using here. People use joins for the normalization of data.
It's clear you have never work with a lot of data.
> The real reason people use joins is because they want to pack a lot of details onto the user's screen
I hate this illusion that web programming is the whole of software development.
For example audit logs get no FKs, when a delete happens the audit logs about the deletion shouldn't be deleted.
I always FK a large table (millions or more rows) to a small table (tens to hundreds of rows).
But I will pause and ask hard questions about FK a large table to a large table... will this impact migrations? Do I need this FK? Is data integrity at risk without this FK even assuming a buggy app? Does the app utilise this FK for queries, or is there zero performance benefit from having the FK? If I don't have the FK are both tables always queryable by a PK? Should I have an index, potentially a UNIQUE index, in place of a FK?
Like most things... it depends. A dogmatic insistence on using them everywhere isn't always healthy, and the inverse is true that an avoidance everywhere isn't healthy.
The DB is there to store data and make it available, whilst enforcing data integrity... if it makes sense to use a FK to achieve those things do it, otherwise don't.
FK based on the possible size of a table rather than the current size of the table.
FKs are incredibly performance on a near-empty local dev database ;)
The parent's point is that the choice should depdend on the situation. Sometimes a foreign key is appropriate, sometimes it is not.
Or updated. Or prevent the deletion.
But that’s configurable anyway.
I’ve never come across a scenario yet where this wasn’t sound advice.
I tend to lean heavily on my DB as well. E.g. I tend to push all state down to the DB and out of the application. I work in environments where it’s common for developers to want to disable FK constraints, and i temporarily do sometimes during specific bulk operations during releases for example. The usual reasons others will suggest relaxing FK constraints permanently will be due to the need for audit logs or soft deletes but i have patterns for these too.
Foreign key constraints are pretty awesome, all databases I’ve worked in so far have escape hatches for when they hurt too much.
Still, in my experience, database normalization has seemed like less of a performance impediment than queries written without considering an execution plan and indexes.
Deleted Comment
Fundamentally it's a question of how one should go about handling multi-tenant situations.
I like foreign keys, check constraints and tight data types. Might as well constrain it and limit the scope for errors.
Application programmers can write some buggy code. The DB should provide a line of defence.
If you don’t have foreign keys that should be a design choice with a legitimate reason the entity can become orphaned.
You need a definition of what that means in real life. E.g PERSONID id 1012 and there is no associated record. This kinda means you need to look at another entity to know what the first entity “means”.
This might be useful for data that needs to sit in distributed databases.