How do you join three or more records from the same table?

Posted on

Question :

I have table with a lot of data in it, and we would like a way to essentially flag certain documents to indicate that they are related to (x) other documents:

TABLE A              
-----------
id
name
created_at
modified_at

Records:   id        name              created_at              modified_at
            1       arlo g    2020-04-29 12:30:00      2020-04-29 12:30:00
            2     kenny ws    2020-04-29 12:32:00      2020-04-29 12:32:00
            3      roger w    2020-04-29 12:36:00      2020-04-29 12:36:00
            4     frank lw    2020-04-29 12:41:00      2020-04-29 12:41:00
            5    depeche m    2020-04-29 12:43:00      2020-04-29 12:43:00
            6       manson    2020-04-29 12:46:00      2020-04-29 12:46:00

Now – assume a user is viewing document id 1. They want to tie it to documents 3 and 4.

If a user then opens document 1, the system can easily pull documents 3 and 4.

  • On the same note – if a user opens document 3, they can easily pull documents 1 and 4.

Is this possible? I am not sure how to structure a query clause to pull the document ids for a given document, such that the list is the same each time no matter which one of those documents is queried on.

Answer :

If a user then opens document 1, the system can easily pull documents 3 and 4.

On the same note – if a user opens document 3, they can easily pull documents 1 and 4.

This looks like the relation have NO parent-child type. So the solution may be

CREATE TABLE r ( group_id INT UNSIGNED,
                 table_a_id INT UNSIGNED,
                 PRIMARY KEY (table_a_id)
              /* PRIMARY KEY (group_id, table_a_id) */
               );

So if you need in “if a user opens document 3, they can easily pull documents 1 and 4”, then all 3 table_a values (1,3,4) must belong the same group.

The primary key selection is dependent by the fact does a definite document may belong to one group only or to more than one group.

When you need to obtain the whole group members by one member of a group you will use

SELECT t2.*
FROM table_a t1
JOIN r       r1   ON t1.id = r1.table_a_id
JOIN r       r2   ON r1.group_id = r2.group_id 
JOIN table_a t2   ON r2.table_a_id = t2.id
WHERE t1.name = 'initial document name'

As this is a many-to-many relationship.

CREATE TABLE r (from INT UNSIGNED, to INT UNSIGNED, PRIMARY KEY(from,to));

Relations are fetch by aliasing the source tbl multiple times like

   SELECT t1.id, t1.name, t2.id, t2.created_at, t2.name
   FROM tbl t1
   LEFT JOIN r ON t1.id = r.from
   JOIN tbl t2 ON t2.id = r.to
   WHERE ...

So t1 and t2, are table aliases of tbl with the same fields, however their rows (not documents) are based on the join criteria.

This idea is to store related documents as a chain of links:

create table link
( x int not null primary key
, y int not null unique
);

insert into link (x,y) values (1,3),(3,4);

If we were to add a link we attach it either first or last. I.e

insert into chain (x,y) values (5,1);

or:

insert into link (x,y) values (4,5);

If we remove one document (say 3), we delete one link and update the other.

The transitive closure of link is how all documents are related to each other:

with recursive backward as (
  select 1 as x, 0 as iter
  union all
  select l.x, iter + 1 from link l join backward b on l.y = b.x
),   forward as (
  select 1 as x, 0 as iter
  union all
  select l.y, iter + 1 from link l join forward b on l.x = b.x
)
select x from forward where iter > 0
union 
select x from backward where iter > 0

The document in question is used as a seed for the two recursive CTE:s:

select 1 as x, 0 as iter

If the closure of a document is supposed to be reflexive, i.e. contain itself, just remove the predicate where iter > 0

Fiddle

If you are on a version less than 8, or if you have large chains where recursive queries are to slow, you may consider some kind of Incremental Evaluation System (IES). The idea is quite simple, we create a table for the transitive closure of link:

create table tc 
( x int not null
, y int not null
, primary key (x,y)
);  

Now, whenever a chain is modified, the TC must change accordingly. A trigger that detects insert of links may be implemented as:

delimiter //
create trigger ira_link
after insert on link
for each row
begin
    insert into tc (x,y) 
    select NEW.x, NEW.y
    union all 
    select tc.x, NEW.y from tc where NEW.x = tc.y
    union all
    select NEW.x, tc.y from tc where NEW.y = tc.x
;
end //
delimiter ;

insert into link (x,y) values (1,3);
insert into link (x,y) values (3,4);
insert into link (x,y) values (0,1);
insert into link (x,y) values (4,5);
insert into link (x,y) values (6,7);

We may ask for what documents that is related to 3 with:

select x from tc where y = 3
union
select y from tc where x = 3;

0
1
4
5

or with the slightly shorter:

select case when x = 3 then y else x end 
from tc where 3 in (x,y); 

It is a pity that functions that return tables are not implemented, something like:

select * from get_tc(3)

would be a nice abstraction

I’ll leave the triggers for DELETE, UPDATE as an exercise for the reader. I could not use the previous link due to restrictions for triggers, but you can use https://www.db-fiddle.com/

Leave a Reply

Your email address will not be published. Required fields are marked *