Why aren’t primary key / foreign key matches used for joins?

Posted on

Question :

As far as I could find out many DBMSs (e.g. mysql, postgres, mssql) use fk and pk combinations only to constrain changes to data, but they are rarely natively used to automatically select columns to join (like natural join does with names). Why is that? If you’ve already defined a relationship between 2 tables with a pk/fk, why can’t the database figure out that if I join those tables I want to join them on the pk/fk columns?

EDIT: to clarify this a bit:

suppose I have a table1 and a table2. table1 one has a foreign key on column a, which references to the primary key on table2, the column b. Now if I join these tables, I’ll have to do something like this:

SELECT * FROM table1
JOIN table2 ON table1.a = table2.b

However, I already defined using my keys that table1.a references to table2.b, so it seems to me that it shouldn’t be to hard to make a DBMS system automatically use table1.a and table2.b as the join columns, such that one can simply use:

SELECT * FROM table1
AUTO JOIN table2

However, many DBMS don’t seem to implement something like this.

Answer :

In many cases, there are more than one way to join two tables; See the other answers for lots of examples. Of course, one could say that it would be an error to use the ‘automatic join’ in those cases. Then only a handfull of simple cases where it can be used would be left.

However, there is a severe drawback! Queries that are correct today, might become an error tomorrow just by adding a second FK to the same table!

Let me say that again: by adding columns, queries that do not use those columns could turn from ‘correct’ into ‘error’!

That is such a maintenance nightmare, that any sane style guide would prohibit to use this feature. Most already prohibit select * for the same reason!

All this would be acceptable, if performance would be enhanced. However, that’s not the case.

Summarizing, this feature could be used in only a limited set of simple cases, does not increase performance, and most style guides would prohibit its usage anyway.

Therefor it is not supprising that most database vendors choose to spend their time on more important things.

A foreign key is meant to constrain the data. ie enforce referential integrity. That’s it. Nothing else.

  1. You can have multiple foreign keys to the same table. Consider the following where a shipment has a starting point, and an ending point.

    table: USA_States
    StateID
    StateName
    
    table: Shipment
    ShipmentID
    PickupStateID Foreign key
    DeliveryStateID Foreign key
    

    You may want to join based on the pickup state. Maybe you want to join on the delivery state. Maybe you want to perform 2 joins for both! The sql engine has no way of knowing what you want.

  2. You’ll often cross join scalar values. Although scalars are usually the result of intermediate calculations, sometimes you’ll have a special purpose table with exactly 1 record. If the engine tried to detect a foriegn key for the join…. it wouldn’t make sense because cross joins never match up a column.

  3. In some special cases you’ll join on columns where neither is unique. Therefore the presence of a PK/FK on those columns is impossible.

  4. You may think points 2 and 3 above are not relevant since your questions is about when there IS a single PK/FK relationship between tables. However the presence of single PK/FK between the tables does not mean you can’t have other fields to join on in addition to the PK/FK. The sql engine would not know which fields you want to join on.

  5. Lets say you have a table “USA_States”, and 5 other tables with a FK to the states. The “five” tables also have a few foreign keys to each other. Should the sql engine automatically join the “five” tables with “USA_States”? Or should it join the “five” to each other? Both? You could set up the relationships so that the sql engine enters an infinite loop trying to join stuff together. In this situation it’s impossible fore the sql engine to guess what you want.

In summary: PK/FK has nothing to do with table joins. They are separate unrelated things. It’s just an accident of nature that you often join on the PK/FK columns.

Would you want the sql engine to guess if it’s a full, left, right, or inner join? I don’t think so. Although that would arguably be a lesser sin than guessing the columns to join on.

the concept of “joinability.” Relations r1 and r2 are joinable if
and only if attributes with the same name are of the same type… this
concept applies not only to join as such but to various other
operations [such as union] as well.

SQL and Relational Theory: How to Write Accurate SQL Code By C. J. Date

Standard SQL already has such a feature, known as NATURAL JOIN, and has been implemented in mySQL.

Although your suggestion is not quite as worthy, it seems a reasonable one. With SQL Server (which lacks support for NATURAL JOIN), I use SQL Prompt in Management Studio: when writing an INNER JOIN its InteliSense suggests ON clauses based on both common attribute names and foreign keys and I find it very useful. I’ve no great desire to see a new (standard) SQL join type for this, though.

SQL came first!

Foreign Keys and Foreign Key constraints came later and are essentially an optimization for “transaction” style applications.

Relational databases were originally conceived as a method of applying complex queries on sets of data in a way that was mathematically provable using relational algebra. I.E. for a given set of data and a given query there is always a single correct answer.

Relational databases have come a long way since then, and there primary use as the persistence layer for transactional systems was not what CODD et. all envisaged.

However the ANSI standards body for all its conflicting goals and vendor politics has always striven to preserve “mathematically provable” properties of SQL.

If you allowed the database to infer the join properties from “hidden” foreign key data you would lose this property (consider the ambiguity if there was more than one set of foreign keys defined).

Also a programmer reading the SQL would not necessarily know what foreign keys were currently defined for the two tables, and, would need to examine the database schema to work out what the query was doing.

While you have defined a Foreign Key relationship that does not mean that is how you want to join the tables in all queries. It is the most probable method for joining the tables, but there are cases where is it not correct.

  • You may want to use a Cartesian product of the two tables or part thereof for some purpose.
  • There may be other fields on which you can join for another purpose.
  • If you are joining three or more tables, one of the tables may be related to two or more of the tables. In this case usually only one of the possible FK relationships may be appropriate in the query.

You may be operating on a false assumption. You say ‘as far as you can find out’ but don’t give any empirical or evidentiary proof. If the pk or fk are the best index for a query it will be used. I don’t know why you are seeing this, but my guess is poorly formed queries.


Edit now that the question has been totally rewritten : The case you are describing would only be for a very small set of queries. What if there are 12 tables Joined? What if there are no FKs…. Even if there was a default join on I would still always specify the join just for readability. (I don’t want to have to look at the data and then try to figure out what is being joined on)

Some Query tools actually do an auto join for you then allow you remove or edit the join. I think MS Access’s Query Builder does this.

Lastly the ANSII standard states that the join must be specified. That is reason enough to not allow it.

Leave a Reply

Your email address will not be published. Required fields are marked *