I have table with a lot of data in it, and we would like a way to essentially flag certain documents to indicate that they are related to (x) other documents:
TABLE A ----------- id name created_at modified_at Records: id name created_at modified_at 1 arlo g 2020-04-29 12:30:00 2020-04-29 12:30:00 2 kenny ws 2020-04-29 12:32:00 2020-04-29 12:32:00 3 roger w 2020-04-29 12:36:00 2020-04-29 12:36:00 4 frank lw 2020-04-29 12:41:00 2020-04-29 12:41:00 5 depeche m 2020-04-29 12:43:00 2020-04-29 12:43:00 6 manson 2020-04-29 12:46:00 2020-04-29 12:46:00
Now – assume a user is viewing document id
1. They want to tie it to documents
If a user then opens document
1, the system can easily pull documents
- On the same note – if a user opens document
3, they can easily pull documents
Is this possible? I am not sure how to structure a query clause to pull the document ids for a given document, such that the list is the same each time no matter which one of those documents is queried on.
If a user then opens document 1, the system can easily pull documents 3 and 4.
On the same note – if a user opens document 3, they can easily pull documents 1 and 4.
This looks like the relation have NO parent-child type. So the solution may be
CREATE TABLE r ( group_id INT UNSIGNED, table_a_id INT UNSIGNED, PRIMARY KEY (table_a_id) /* PRIMARY KEY (group_id, table_a_id) */ );
So if you need in “if a user opens document 3, they can easily pull documents 1 and 4”, then all 3
table_a values (1,3,4) must belong the same group.
The primary key selection is dependent by the fact does a definite document may belong to one group only or to more than one group.
When you need to obtain the whole group members by one member of a group you will use
SELECT t2.* FROM table_a t1 JOIN r r1 ON t1.id = r1.table_a_id JOIN r r2 ON r1.group_id = r2.group_id JOIN table_a t2 ON r2.table_a_id = t2.id WHERE t1.name = 'initial document name'
As this is a many-to-many relationship.
CREATE TABLE r (from INT UNSIGNED, to INT UNSIGNED, PRIMARY KEY(from,to));
Relations are fetch by aliasing the source
tbl multiple times like
SELECT t1.id, t1.name, t2.id, t2.created_at, t2.name FROM tbl t1 LEFT JOIN r ON t1.id = r.from JOIN tbl t2 ON t2.id = r.to WHERE ...
So t1 and t2, are table aliases of
tbl with the same fields, however their rows (not documents) are based on the join criteria.
This idea is to store related documents as a chain of links:
create table link ( x int not null primary key , y int not null unique ); insert into link (x,y) values (1,3),(3,4);
If we were to add a link we attach it either first or last. I.e
insert into chain (x,y) values (5,1);
insert into link (x,y) values (4,5);
If we remove one document (say 3), we delete one link and update the other.
The transitive closure of link is how all documents are related to each other:
with recursive backward as ( select 1 as x, 0 as iter union all select l.x, iter + 1 from link l join backward b on l.y = b.x ), forward as ( select 1 as x, 0 as iter union all select l.y, iter + 1 from link l join forward b on l.x = b.x ) select x from forward where iter > 0 union select x from backward where iter > 0
The document in question is used as a seed for the two recursive CTE:s:
select 1 as x, 0 as iter
If the closure of a document is supposed to be reflexive, i.e. contain itself, just remove the predicate
where iter > 0
If you are on a version less than 8, or if you have large chains where recursive queries are to slow, you may consider some kind of Incremental Evaluation System (IES). The idea is quite simple, we create a table for the transitive closure of link:
create table tc ( x int not null , y int not null , primary key (x,y) );
Now, whenever a chain is modified, the TC must change accordingly. A trigger that detects insert of links may be implemented as:
delimiter // create trigger ira_link after insert on link for each row begin insert into tc (x,y) select NEW.x, NEW.y union all select tc.x, NEW.y from tc where NEW.x = tc.y union all select NEW.x, tc.y from tc where NEW.y = tc.x ; end // delimiter ; insert into link (x,y) values (1,3); insert into link (x,y) values (3,4); insert into link (x,y) values (0,1); insert into link (x,y) values (4,5); insert into link (x,y) values (6,7);
We may ask for what documents that is related to 3 with:
select x from tc where y = 3 union select y from tc where x = 3; 0 1 4 5
or with the slightly shorter:
select case when x = 3 then y else x end from tc where 3 in (x,y);
It is a pity that functions that return tables are not implemented, something like:
select * from get_tc(3)
would be a nice abstraction
I’ll leave the triggers for DELETE, UPDATE as an exercise for the reader. I could not use the previous link due to restrictions for triggers, but you can use https://www.db-fiddle.com/