Earlier today, I sparked a rather unexpected debate with my professional peers on the topic of foreign keys in our company’s databases. In a nutshell, a software engineer who is senior to me stated that he didn’t like a commit I had made to one of our repositories because I implemented a foreign-key relationship between two tables. His argument amounted to him not liking foreign keys because they restrict the data that can be inserted into the database & he believes that makes the database harder to work with. (Note: his argument was against foreign keys in general, not just the anecdotal foreign-key constraint that I proposed in our app. He believes that foreign-key constraints are a design flaw in modern relational database systems.)
At this point, I was caught completely flat-footed, and our boss chose to side with the senior developer for the time being, asking me to remove the foreign keys from my implementation of the bug-fix I was working on. The bug that I was fixing in fact stemmed from relational inconsistencies that were confusing our ORM (which itself leaves a bit to be desired, but that’s a different story), and while I tried to explain this to my team, everyone sided with the senior dev & I was told to go back to the drawing board and re-implement my database fix WITHOUT relying upon foreign key constraints. In other words, I’ve been asked to do relational management without the built-in relational-management tools of our RDBMS (posgres, if it matters).
I reiterated that this was not the proper way to go about fixing the bug, at which point my boss told me that the discussion was taking up too much time and that if I wanted to debate the merits of formal FK-constraints, I would need to do it at a later date… which I intend to do, but I also feel like I’ve been tasked with arguing why water is wet.
When I go into this debate, I want to come armed with as many facts in support of foreign-key usage as possible. I know that implementing sensible FK-constraints:
- serves as an automatic guardrail against relational inconsistencies (ie, prevents “garbage data” (which is a demonstrable problem with the system I’m working on currently)),
- clarifies the high-level relational context between two related tables to someone who is inspecting the database schema, and
- can increase the performance of database query operations.
… But I’m wondering if there are any other bullet points that I may be missing on this topic with which I should meet this debate. What other benefits should I make sure to bring up with my boss and colleagues? (Or has there genuinely been a paradigm shift away from FK-constraints & I’ve been living with my head under a rock when it happened?)
I don’t have enough points to comment, so an answer.
OMG. I feel your pain 😉
Yeah, people find foreign key constraints a real pain. But without them, you have to then audit the data for widows and orphans, otherwise you’re entirely likely to have bad summary results. At the very least, you don’t really know the validity of your results. Are they off by 0.001%? By 2%? It’s hard to say…you have go go in and figure out the level of garbage data.
On a human level, okay, I get it, constraints can be a pain. It’s an acute pain as it blocks you right at the point of insertion. (Kind of the whole point.) With sloppy rules, then you get the pain, but it’s deferred. And speaking of which, since I see a Postgres tag on here, a couple of things to know about. First up deferrable constraint. This is an enormously valuable feature for this very situation. It really can be a pain to insert a child only to get it kicked back because the parent doesn’t exist yet when you’re going to add that parent shortly, within the same transaction. It’s kind of fussy and tedious to sort that out. (Or it can feel that way.) With a deferred constraint, the FOREIGN KEY is still enforced, but not until the end of the transaction. So, you can add your records in less strict order, if that’s a help. Here’s a piece on the subject that looks pretty thorough, you can find many others if you look:
And, secondly, and for what it’s worth, you’ve also got ON UPDATE CASCADE support, although needing that is often a sign of a problem in the design.
Oh, and for bullet points, a formally defined constant goes into the meta data about your system. So, any automatic tool that knows how to interpret that data can do much, much more for you. For example:
- Class and code generators.
- Structure diagrams.
- Test cases.
- Query constructors.
Adding an edit here with a suggestion that could help. Opinions are all well and good, but evidence sometimes is better. To that end, what about running some exploratory queries to find out the state of play in the data today? You may find that there are no problems in the data, even without server-side constraints. Some people are just freakishly good at not screwing up. Then again, there might be a total mess. Here’s a query off the top of my head which, I’m sure, can be improved:
select count(*) from child tablesample bernoulli(5) -- If you're only after an estimate, TABLESAMPLE is your friend. left join parent on (child.parent_id = parent.id) where parent.id is null
P.S. I lost this very same argument once, largely because I was unaware of the deferred option at the time.