butlerm (u/butlerm) - Readit News

butlerm commented on Do you really need foreign keys? shayon.dev/post/2023/355/... · Posted by u/shayonj

iLoveOncall · 2 years ago

> FKs are a performance impact. The fact they require indexes is likely fine, since those indexes are needed anyhow. But the lookup made for each insert/delete is an overhead.

This is not a valid argument at all and I'm concerned anyone would think it is.

If you have a foreign key, it means you have a dependency that needs to be updated or deleted. If that's the case, you will have an overhead anyway, the only question being whether it's at the DB level or at the application level.

I don't think there are many cases where there's any advantage to self-manage them at the application level.

> FKs don't work well with online schema migrations

This seems to be related only to the specific project that the issue is about if you read about the detailed explanation below.

butlerm · 2 years ago

> If that's the case, you will have an overhead anyway, the only question being whether it's at the DB level or at the application level.

Inserts and updates do not require referential integrity checking if you know that the reference in question is valid in advance. Common cases are references to rows you create in the same transaction or rows you know will not be deleted.

If you actually want to delete something that may be referred to elsewhere then checking is appropriate of course, and in many applications such checking is necessary in advance so you have some idea whether something can be deleted (and if not why not). That type of check may not be race free of course, hence "some idea".

butlerm commented on Do you really need foreign keys? shayon.dev/post/2023/355/... · Posted by u/shayonj

mavelikara · 2 years ago

But you pay the cost in checking it in the application, as GP said. If so, it simply is moving the cost from db to application layer. Is there a reason the checks can be implemented more efficiently in the application than the DB can?

butlerm · 2 years ago

Referential integrity problems usually happen due to missing deletes, improper deletes, or references that should be cleared.

The overhead of checking for the existence of referred to records in ordinary inserts and updates in application code is unnecessary in most cases, and that is where the problem is. Either you have to check to have any idea what is going on, because your key values are being supplied from an outside source or you should be able write your application so that it does not insert random references into your database.

If you actually need to delete a row that might be referred to, the best thing to do is not to do that, because you will need application level checks to make the reason why you cannot delete something visible in any case. 'Delete failed because the record is referred to somewhere' is usually an inadequate explanation. The application should probably check so that delete isn't even presented as an option in cases like that.

butlerm commented on Do you really need foreign keys? shayon.dev/post/2023/355/... · Posted by u/shayonj

setr · 2 years ago

You’re still going to pay the cost of maintaining referential integrity — you’re just doing it on the app side. You can do it faster by being not-correct — eg you don’t need a lock if you ignore race conditions — but it’s not like the database is arbitrarily slow at doing one of its basic fundamental jobs.

Of course, you can just skip the validation altogether and cross your fingers and hope you’re correct, but it’s the same reasoning as removing array bounds checking from your app code; you’ve eked out some more performance and it’s great until it’s catastrophically not so great.

Your reasoning should really be inverted. Be correct first, and maintain excessive validation as you can, and rip it out where performance matters. With OLTP workloads, your data’s correctness is generally much more valuable than the additional hardware you might have to throw at it.

I’m also not sure why dropping/creating foreign keys is a big deal for migrations, other than time spent

butlerm · 2 years ago

It is quite common for modern databases to have multiversion concurrency so that writers do not block readers. If you do not your transactions should either be awfully short, you should be prepared to wait, or you should implement dirty reads (which are quite common in any case).

butlerm commented on Do you really need foreign keys? shayon.dev/post/2023/355/... · Posted by u/shayonj

jrumbut · 2 years ago

You either end up reinventing foreign keys, your support volume will scale faster than your data, or user experience will suffer.

There may be situations where foreign keys become too much overhead, but it's worth fighting to keep them as long as possible. Data integrity only becomes more important at scale. Every orphaned record is a support ticket, lost sale, etc.

butlerm · 2 years ago

Orphaned detail records are usually inconsequential, like uncollected garbage. References to anything with an optional relationship should use outer joins as a matter of course. If you delete something that really needs to be there you have a problem, which is one of the reasons not to delete rows like that, ever, but rather to mark them as inactive or deleted instead.

butlerm commented on Do you really need foreign keys? shayon.dev/post/2023/355/... · Posted by u/shayonj

dasil003 · 2 years ago

"just write test cases to check for [referential integrity]" is doing some awful heavy lifting in this comment.

Assuming a standard n-tier application architecture, how do you guarantee the test prevents race conditions?

butlerm · 2 years ago

Typically you look for orphan rows - the sort of thing ON DELETE CASCADE was invented to prevent. Another thing to check for are records that need to exist but should have references cleared when something else is deleted, e.g. ON DELETE SET NULL. And the third thing is ON DELETE RESTRICT.

You can check for the first two of those things after the fact, and they are relatively benign. In many cases it will make no difference to application queries, especially with the judicious use of outer joins, which should be used for all optional relationships anyway.

If you need ON DELETE RESTRICT application code should probably check anyway, because otherwise you have unexpected delete failures with no application level visibility as to what went wrong. That can be tested for, and pretty much has to be before code that deletes rows subject to delete restrictions is released into production.

As far as race conditions go, they should be eliminated through the use of database transactions. Another alternative is never to delete rows that are referred to elsewhere and just set a deleted flag or something. That is mildly annoying to check for however. Clearing an active flag is simpler because you usually want rows like that to stay around anyway, just not be used in new transactions.

butlerm commented on Do you really need foreign keys? shayon.dev/post/2023/355/... · Posted by u/shayonj

goostavos · 2 years ago

I would argue that if you have completely different contexts / business lines / concerns (customer service, accounting, biz dev) all directly accessing the same database you have far, far larger architectural concerns that FK will ever hope to address.

butlerm · 2 years ago

Large ERP systems do that sort of thing as a matter of course and have for decades now. It does require careful planning and design. I mean AR / AP / scheduling / manufacturing / inventory and so on.

The main downside of splitting everything into isolated databases is that it makes it approximately impossible to generate reports that require joining across databases. Not without writing new and relatively complex application code to do what used to require a simple SQL query to accomplish anyway.

Of course if you have the sort of business with scalability problems that require abandoning or restructuring your database on a regular basis, then placing that kind of data in a shared database is probably not such a great idea.

It should also be said that common web APIs as a programming technique are much harder to use and implement reliably due to the data marshalling and extra error handling code required than just about any system of queries or stored procedures against a conventional database. The need to page is perverse, for example.

That does not mean that sort of tight coupling is appropriate in many cases, but it is (typically) much easier to implement. Web APIs could use standard support for two phase commit and internally paged queries that preserve some semblance of consistency. The problem is that stateless architecture makes that sort of thing virtually impossible. Who knows which rows will disappear when you query for page two because the positions of all of your records have just shifted? Or which parts of a distributed transaction will still be there if anything goes wrong?

butlerm commented on Do you really need foreign keys? shayon.dev/post/2023/355/... · Posted by u/shayonj

goostavos · 2 years ago

Follow this advice with caution. Dropping foreign keys is effectively giving up part of the C in ACID. It should be done with very, very open eyes to the downsides. I'm not sure the author is selling the "when" side of this very well. Migrations are "hard" because the database is forcing you to handle correctness criteria that are easy to ignore. "Lock contention" is the database covering your sloppy ill-thought out application code from doing the Wrong Thing. Yes, of course, you take a minor performance hit in exchange for consistency, but how many of us actually work on applications where that trade off is the wrong one? Developers are frustratingly superstitious about scale and their need to do it.

The data will very likely outlive the application that created it. It's almost guaranteed to outlive your tenure on the team. Viciously guarding its correctness solves all the problems that not guarding it causes.

butlerm · 2 years ago

If you do not particularly care about performance or have a great deal of headroom then database enforcement of referential integrity is great. Alternatively you could just write test cases to check for it and not pay the severe performance penalty.

The other major downside of database enforcement of referential integrity is the common need to drop and re-create foreign keys during database schema upgrades and data conversions.

butlerm commented on I benchmarked six Go SQLite drivers github.com/cvilsmeier/go-... · Posted by u/cvilsmeier

lmz · 2 years ago

Also because there's a mismatch between goroutines and C threads as described here https://www.cockroachlabs.com/blog/the-cost-and-complexity-o... while Java threads can map 1:1 to C threads.

butlerm · 2 years ago

Perhaps someone should define a new C compatible threading API to allow C libraries (including glibc or a wrapper around glibc) to work with something other than native pthreads. Such as goroutines or Java threads and so on.

butlerm commented on The Case for Memory Safe Roadmaps nsa.gov/Press-Room/Press-... · Posted by u/AndrewDucker

munk-a · 2 years ago

As someone who has worked in C/C++ on teams for a while. My personal opinion is that safe C++ programmers never use pointers - put everything on the stack and use (usually const) references.

But I've really got to ask - at that point do you really want to program in C++ anymore or is it just better to use a safer language that removes pointers entirely?

butlerm · 2 years ago

It is basically impossible to write general purpose software like compilers, word processors, and layout engines without doing heap allocations. That means either pointers or references, which are difficult to distinguish if you do not engage in pointer arithmetic.

Any C++ program that does not do heap allocations either uses arrays as a substitute for the same thing or isn't a general purpose application.

butlerm commented on The Case for Memory Safe Roadmaps nsa.gov/Press-Room/Press-... · Posted by u/AndrewDucker

1970-01-01 · 2 years ago

Fully agree. The best time to stop giving the next generation of coders the gift of unsafe memory is today. C and C++ should be known as 'legacy' languages.

butlerm · 2 years ago

It is generally speaking difficult to make an efficient implementation of the compiler and/or the virtual machine for many memory safe languages without writing it in a more efficient, statically compiled language like C, C++, or Rust. And that is to say nothing of software like operating system kernels and browser engines. So perhaps Rust will gradually take over the world there.