This question is about SQL in general. Answering specifically for MySQL would be helpful but not necessary.
Ok, I’m having trouble putting this into words… so bear with me.
Say I have a tree of things (I’ll call them nodes), with a table that looks something like this (the structure can be changed; this is just a simple version):
+---------+-----------+ | node_id | parent_id | +---------+-----------+ | 1 | NULL | | 2 | NULL | | 3 | 1 | | 4 | 1 | | 5 | 5 | | 7 | 2 | | 8 | 5 | +---------+-----------+
Then I have a table of entities. Each entity is associated with a certain node. For example (again, the structure can be changed):
+-----------+---------+ | entity_id | node_id | +-----------+---------+ | 1 | 1 | | 2 | 1 | | 3 | 4 | | 4 | 7 | | ...many more rows | +-----------+---------+
This structure could represent a lot of things, e.g. each entity is a movie, while each node is a genre (but there can be unlimited levels of sub-genres).
So retrieving all the entities for a given node is simple; just running a query on the entities table for a specified
Here’s my question: how would I query the entities table for all entities associated with a given node, and all it’s children nodes (every level, recursively). In the movies example, I want to find all movies for a particular genre and all its sub-genres, and its sub-genres, etc.
I mentioned the word recursive, but that doesn’t mean that the query should be recursive, but conceptually it is. The query should be as fast as possible. The structure of the tables may need to be changed as well.
Thanks for your help!
I assume the row (5,5) is a typo, so I used:
create table tree (node_id int not null primary key, parent_id int); insert into tree (node_id, parent_id) values (1,null),(2,(null),(3,1),(4,1),(5,4),(7,2),(8,5);
as a side note, which representation would you have preferred if you were about to start answering a question, the ascii art in your question, or create and insert statements like above?
To recursively traverse the tree you can use a recursive common table expression (CTE):
with rec (node_id, ancestor_id) as ( select t.node_id, t.parent_id from tree t union all select rec.node_id, t.parent_id from rec, tree t where rec.ancestor_id = t.node_id ) select * from rec
I much prefer explicit joins, but the DBMS I’m using does not support them in CTE:
with rec (node_id, ancestor_id) as ( select t.node_id, t.parent_id from tree t union all select rec.node_id, t.parent_id from rec join tree t on rec.ancestor_id = t.node_id ) select * from rec
Now it’s just a matter of joining this result with your entities table:
with rec (node_id, ancestor_id) as ( select t.node_id, t.parent_id from tree t union all select rec.node_id, t.parent_id from rec join tree t on rec.ancestor_id = t.node_id ) select rec.ancestor_id, e,entity_id from rec join entities e on e.node_id = rec.node_id
This assumes that your DBMS version supports recursive CTE:s. The latest versions of MySQL / MariaDB does.
Another side note, you may want to normalize the tree relation, and keep the structure in a separate relation:
create table treenode (node_id int not null primary key, ...) create table tree (node_id int not null primary key ,parent_id int not null);
root node(s) are the nodes that are absent in tree
If you are on an older version, or have huge amounts of data and your performance needs to be improved you can add additional information to your model. The most common ones are:
- Nested Sets, all nodes have an interval of what nodes they sub-sume
- Materialized Path, all nodes have an aggregated string which represents ancestor nodes
- A transitive closure table, I have some notes and a few examples at tree: