Is it a bad idea to make redundant rows when 2 columns reference to the same table?

Posted on

Question :

Suppose we have this database design

Users table:
- id
- name

Friends table:
- id
- user_id
- friend_user_id

Let’s say USER 1 befriended USER 2, this is what my application logic will execute:

INSERT INTO friends (user_id, friend_user_id) VALUES (1, 2);
INSERT INTO friends (user_id, friend_user_id) VALUES (2, 1);

For the retrieval of the USER 1 friends, we simply query this:

SELECT * FROM friends WHERE user_id = 1

Is it a bad idea to make redundant rows when friending users?

What I am trying to avoid here is not to store the friendship in a single row and keep guessing in every query which column the user is in:

SELECT * FROM friends WHERE user_id = 1 OR friend_user_id = 1

The latter made my application logic very complicated in some cases.

Answer :

There are two ways of doing this:

  1. Only allowing the friendship to be established or removed via a procedure – no direct inserts or deletes allowed1
  2. Use a single row with an additional index

The table

For purposes of this answer, let’s define our table as such:

CREATE TABLE Friendship
(
  First_User_Id   INT NOT NULL
 ,Second_User_Id  INT NOT NULL
 ,CONSTRAINT FK_First_User_Is_User FOREIGN KEY (First_User_Id) REFERENCES User (User_Id)
 ,CONSTRAINT FK_Second_User_Is_User FOREIN KEY (Second_User_Id) REFERENCES User (Used_Id)
 ,CONSTRAINT PK_Friend PRIMARY KEY (First_User_Id, Second_User_Id)
 ,CONSTRAINT CK_First_User_NE_Second_User CHECK (First_User_Id <> Second_User_Id)
)

This differs from your question in the sense there is no id column – it is unnecessary (it is not a key) and wasteful (a second index would need to be created to maintain uniqueness – which is essentially creating a copy of the entire table).

Via Procedure

For this two work, we need two rows for each friendship (A,B) and (B,A). However, to make sure both rows are created (or removed) at the same time, we need to use procedural logic.

How to achieve this will vary based on the DBMS used. The basic outline would be something like this:

/* Adding a friendship */
CREATE PROCEDURE Friendship_Create
  @First_User_Id   INT
 ,@Second_User_Id  INT
AS
BEGIN
  INSERT INTO Friendship
  VALUES
    (@First_User_Id, @Second_User_Id)
   ,(@Second_User_Id, @First_User_Id)
END


/* Removing a friendship */
CREATE PROCEDURE Friendship_Delete
  @First_User_Id   INT
 ,@Second_User_Id  INT
AS
BEGIN
  DELETE FROM
    FriendShip
  WHERE
    (First_User_Id = @First_User_Id AND Second_User_Id = @Second_User_Id)
      OR (First_User_Id = @Second_User_Id AND Second_User_Id = @First_User_Id)
END

Depending on the DBMS, you may need to do additional checks before committing the transaction or set the isolation level of each transaction appropriately.

You also need to configure permissions in a way that no user (except root/sa) would have direct Insert/Update/Delete permissions.

Getting the friendships for User 1 is very direct:

SELECT
  Second_Friend_Id
FROM
  Friendship
WHERE
  First_User_Id = 1

Using a single row

If the relationship is bi-directional (mutual), then you only need to store one row. (A,B) contains the same amount of information as (B,A).

However, this still leaves two issues unresolved:

  1. An efficient manner to query using either member of the friendship
  2. A way to prevent duplicate relationships from being established

For the first, we can simply create a second unique constraint2 on the table that inverts the order of the B-Tree. Physically this isn’t much different from inserting both a (A,B) and (B,A) row in the table, except we are guaranteed both pairs will exist without relying on transactional logic.

For the second, we can impose the restriction that User A Id < User B Id. So of the pairs (A,B) and (B,A), only one satisfies that condition.

Putting everything together:

CREATE TABLE Friendship
(
  First_User_Id   INT NOT NULL
 ,Second_User_Id  INT NOT NULL
 ,CONSTRAINT FK_First_User_Is_User FOREIGN KEY (First_User_Id) REFERENCES User (User_Id)
 ,CONSTRAINT FK_Second_User_Is_User FOREIN KEY (Second_User_Id) REFERENCES User (Used_Id)
 ,CONSTRAINT PK_Friend PRIMARY KEY (First_User_Id, Second_User_Id)
 ,CONSTRAINT AK_Friend UNIQUE (Second_User_Id, First_User_Id)
 ,CONSTRAINT CK_First_User_Id_LT_Second_User_Id CHECK (First_User_Id < Second_User_Id)
)

Getting the friendships of User 1 is slightly different but will return the same results:

SELECT
  CASE
    WHEN First_User_Id = 1 THEN Second_User_Id
    ELSE First_User_Id
  END AS Friend_User_Id
FROM
  Friendship
WHERE
  First_User_Id = 1
    OR Second_User_Id = 1

Additional Thoughts

You do not need a graph database to support graph structures, the relational model has done that since its inception (and databases have supported since recursion was implemented). What graph support can do is provide syntax that may simplify your queries somewhat – although the physical implementation may not be as lightweight as a standard implementation.


1 IMO procedures should be the default method of altering the data.

2 Technically you could just do a regular index, a UNIQUE constraint does basically the same thing.

Leave a Reply

Your email address will not be published. Required fields are marked *