How do I sort the results of a recursive query in an expanded tree-like fashion?

Posted on

Question :

Let’s assume you have a nodes tables like this:

CREATE TABLE nodes (
    node serial PRIMARY KEY,
    parent integer NULL REFERENCES nodes(node),
    ts timestamp NOT NULL DEFAULT now()
);

It represents a standard node-like tree structure with root nodes at the top and several child nodes dangling from root nodes or other child nodes.

Let us insert a couple of example values:

INSERT INTO nodes (parent) VALUES
  (NULL), (NULL), (NULL), (NULL)
, (1), (1), (1), (1), (6), (1), (6)
, (9), (6), (6), (3), (3), (3), (15);

Now I want to retrieve the first 10 root nodes and all their children up to a depth of 4:

WITH RECURSIVE node_rec AS (
    (SELECT 1 AS depth, * FROM nodes WHERE parent IS NULL LIMIT 10)

    UNION ALL

    SELECT depth + 1, n.*
    FROM nodes AS n JOIN node_rec ON (n.parent = node_rec.node)
    WHERE depth < 4
)
SELECT * FROM node_rec;

This works great and gives me the following result:

 depth | node | parent 
-------+------+--------
     1 |  1   |
     1 |  2   |
     1 |  3   |
     1 |  4   |
     2 |  5   |  1
     2 |  6   |  1
     2 |  7   |  1
     2 |  8   |  1
     2 | 10   |  1
     2 | 15   |  3
     2 | 16   |  3
     2 | 17   |  3
     3 |  9   |  6
     3 | 11   |  6
     3 | 13   |  6
     3 | 14   |  6
     3 | 18   | 15
     4 | 12   |  9

As you might have noticed, there’s no ORDER BY clause, so the order is not defined. The order you see here is from root nodes to deeper nodes.

How would I order the results as they would appear in an expanded tree view, as you can see from the example picture below?

Expanded tree view of nodes

I basically want the child nodes to be placed right after their corresponding parent node. If two or more child nodes have the same parent node, I want them to be sorted by their timestamp. Based on the example above, here’s the desired output order that I’m trying to achieve:

 depth | node | parent | ts
-------+------+--------+---------
     1 |  1   |        | 2014-01-01 00:00:00
     2 |  5   |     1  | 2014-01-01 00:10:00
     2 |  6   |     1  | 2014-01-01 00:20:00
     3 |  9   |     6  | 2014-01-01 00:25:00
     4 |  12  |     9  | 2014-01-01 00:27:00
     3 |  11  |     6  | 2014-01-01 00:26:00
     3 |  13  |     6  | 2014-01-01 00:30:00
     3 |  14  |     6  | 2014-01-01 00:36:00
     2 |  7   |     1  | 2014-01-01 00:21:00
     2 |  8   |     1  | 2014-01-01 00:22:00
     2 |  10  |     1  | 2014-01-01 00:23:00
     1 |  2   |        | 2014-01-01 00:08:00
     1 |  3   |        | 2014-01-01 00:09:00
     2 |  15  |     3  | 2014-01-01 10:00:00
     3 |  18  |     15 | 2014-01-01 11:05:00
     2 |  16  |     3  | 2014-01-01 11:00:00
     2 |  17  |     3  | 2014-01-01 12:00:00
     1 |  4   |        | 2014-01-01 00:10:00

Answer :

You can order by the array representing the path from the root up to the leaf:

To keep it simple I left out LIMIT (so we also don’t need extra parentheses) and WHERE of your original query. Those can be added freely.

Basic query

WITH RECURSIVE node_rec AS (
   SELECT 1 AS depth, *
        , ARRAY[node] AS path
   FROM   nodes
   WHERE  parent IS NULL

   UNION ALL
   SELECT r.depth + 1, n.*
        , r.path || n.node
   FROM   node_rec r 
   JOIN   nodes    n ON n.parent = r.node
   )
SELECT *
FROM   node_rec
ORDER  BY path;

The same can be simplified with the dedicated SEARCH clause in Postgres 14 or later:

WITH RECURSIVE node_rec AS (
   SELECT 1 AS depth, *
   FROM   nodes
   WHERE  parent IS NULL

   UNION ALL
   SELECT r.depth + 1, n.*
   FROM   node_rec r 
   JOIN   nodes    n ON n.parent = r.node
   ) SEARCH DEPTH FIRST BY node SET path
SELECT *
FROM   node_rec
ORDER  BY path, ts;

Order by additional column

If two or more child nodes have the same parent node,
I want them to be sorted by their timestamp.

We really need to sort by the timestamp column ts first on each level, and node is just a tiebreaker (the timestamp may not be unique). The simplest solution is to take both as record type:

WITH RECURSIVE node_rec AS (
   SELECT 1 AS depth, *
         , ARRAY[(ts, node)] AS path
   FROM   nodes
   WHERE  parent IS NULL

   UNION ALL
   SELECT r.depth + 1, n.*
        , r.path || (n.ts, n.node)
   FROM   node_rec r 
   JOIN   nodes    n ON n.parent = r.node
   )
SELECT *
FROM   node_rec
ORDER  BY path;

Equivalent, simpler query with the SEARCH clause in Postgres 14 or later:

WITH RECURSIVE node_rec AS (
   SELECT 1 AS depth, *
   FROM   nodes
   WHERE  parent IS NULL

   UNION ALL
   SELECT r.depth + 1, n.*
   FROM   node_rec r 
   JOIN   nodes    n ON n.parent = r.node
   ) SEARCH DEPTH FIRST BY ts, node SET path
SELECT *
FROM   node_rec
ORDER  BY path, ts;

AS you can see, the new syntax allows to specify multiple columns for the sort.

db<>fiddle here

Leave a Reply

Your email address will not be published. Required fields are marked *