Question :
I have table with a lot of data in it, and we would like a way to essentially flag certain documents to indicate that they are related to (x) other documents:
TABLE A
-----------
id
name
created_at
modified_at
Records: id name created_at modified_at
1 arlo g 2020-04-29 12:30:00 2020-04-29 12:30:00
2 kenny ws 2020-04-29 12:32:00 2020-04-29 12:32:00
3 roger w 2020-04-29 12:36:00 2020-04-29 12:36:00
4 frank lw 2020-04-29 12:41:00 2020-04-29 12:41:00
5 depeche m 2020-04-29 12:43:00 2020-04-29 12:43:00
6 manson 2020-04-29 12:46:00 2020-04-29 12:46:00
Now – assume a user is viewing document id 1
. They want to tie it to documents 3
and 4
.
If a user then opens document 1
, the system can easily pull documents 3
and 4
.
- On the same note – if a user opens document
3
, they can easily pull documents1
and4
.
Is this possible? I am not sure how to structure a query clause to pull the document ids for a given document, such that the list is the same each time no matter which one of those documents is queried on.
Answer :
If a user then opens document 1, the system can easily pull documents 3 and 4.
On the same note – if a user opens document 3, they can easily pull documents 1 and 4.
This looks like the relation have NO parent-child type. So the solution may be
CREATE TABLE r ( group_id INT UNSIGNED,
table_a_id INT UNSIGNED,
PRIMARY KEY (table_a_id)
/* PRIMARY KEY (group_id, table_a_id) */
);
So if you need in “if a user opens document 3, they can easily pull documents 1 and 4”, then all 3 table_a
values (1,3,4) must belong the same group.
The primary key selection is dependent by the fact does a definite document may belong to one group only or to more than one group.
When you need to obtain the whole group members by one member of a group you will use
SELECT t2.*
FROM table_a t1
JOIN r r1 ON t1.id = r1.table_a_id
JOIN r r2 ON r1.group_id = r2.group_id
JOIN table_a t2 ON r2.table_a_id = t2.id
WHERE t1.name = 'initial document name'
As this is a many-to-many relationship.
CREATE TABLE r (from INT UNSIGNED, to INT UNSIGNED, PRIMARY KEY(from,to));
Relations are fetch by aliasing the source tbl
multiple times like
SELECT t1.id, t1.name, t2.id, t2.created_at, t2.name
FROM tbl t1
LEFT JOIN r ON t1.id = r.from
JOIN tbl t2 ON t2.id = r.to
WHERE ...
So t1 and t2, are table aliases of tbl
with the same fields, however their rows (not documents) are based on the join criteria.
This idea is to store related documents as a chain of links:
create table link
( x int not null primary key
, y int not null unique
);
insert into link (x,y) values (1,3),(3,4);
If we were to add a link we attach it either first or last. I.e
insert into chain (x,y) values (5,1);
or:
insert into link (x,y) values (4,5);
If we remove one document (say 3), we delete one link and update the other.
The transitive closure of link is how all documents are related to each other:
with recursive backward as (
select 1 as x, 0 as iter
union all
select l.x, iter + 1 from link l join backward b on l.y = b.x
), forward as (
select 1 as x, 0 as iter
union all
select l.y, iter + 1 from link l join forward b on l.x = b.x
)
select x from forward where iter > 0
union
select x from backward where iter > 0
The document in question is used as a seed for the two recursive CTE:s:
select 1 as x, 0 as iter
If the closure of a document is supposed to be reflexive, i.e. contain itself, just remove the predicate where iter > 0
If you are on a version less than 8, or if you have large chains where recursive queries are to slow, you may consider some kind of Incremental Evaluation System (IES). The idea is quite simple, we create a table for the transitive closure of link:
create table tc
( x int not null
, y int not null
, primary key (x,y)
);
Now, whenever a chain is modified, the TC must change accordingly. A trigger that detects insert of links may be implemented as:
delimiter //
create trigger ira_link
after insert on link
for each row
begin
insert into tc (x,y)
select NEW.x, NEW.y
union all
select tc.x, NEW.y from tc where NEW.x = tc.y
union all
select NEW.x, tc.y from tc where NEW.y = tc.x
;
end //
delimiter ;
insert into link (x,y) values (1,3);
insert into link (x,y) values (3,4);
insert into link (x,y) values (0,1);
insert into link (x,y) values (4,5);
insert into link (x,y) values (6,7);
We may ask for what documents that is related to 3 with:
select x from tc where y = 3
union
select y from tc where x = 3;
0
1
4
5
or with the slightly shorter:
select case when x = 3 then y else x end
from tc where 3 in (x,y);
It is a pity that functions that return tables are not implemented, something like:
select * from get_tc(3)
would be a nice abstraction
I’ll leave the triggers for DELETE, UPDATE as an exercise for the reader. I could not use the previous link due to restrictions for triggers, but you can use https://www.db-fiddle.com/