How is this interpreted by PostgreSQL?

Posted on

Question :

Just to give a quick DDL to make my question easier to understand:

------------------ creating tables ------------------
create table if not exists users(
    user_id bigserial primary key,
    first_name text,
    last_name text,
    user_name text
);
create table if not exists managers(
    manager_id bigserial primary key,
    user_id bigint references users(user_id) not null,
    permissions text[]
);
create table if not exists transactions(
    transaction_id bigserial primary key,
    user_id bigint references users(user_id) not null,
    amount numeric,
    is_approved boolean,
    manager_id bigint references managers(manager_id) null
);
------------------ inserting data ------------------
insert into users(first_name, last_name, user_name)
values ('first_name','last_name','ohyea'),
  ('manager','manager','manager');

insert into managers(user_id)
values (2);
insert into transactions(user_id, amount, is_approved, manager_id)
values (1,10,true,null),
  (1,1000,true,1);

Now I am basically going to get manager’s name using this way:

select 
  t.transaction_id,
  t.user_id,
  u.user_name,
  t.amount,
  t.is_approved,
  mu.user_name as manager_name
from transactions t
    join users u on u.user_id = t.user_id
    left join managers m
        join users mu on mu.user_id = m.user_id
    on m.manager_id = t.manager_id;

The part that I am interested in is this:

    left join managers m
        join users mu on mu.user_id = m.user_id
    on m.manager_id = t.manager_id

What is this to PostgreSQL? A suquery? Like how does it interpret this and what’s the difference between doing something like this, and this:

left join (
    select m.manager_id, u.user_name
    from managers m
        join users u on u.user_id = m.user_id
) mu on mu.manager_id = t.manager_id

EDIT:

After a lot of testing, turns out both approaches are absolutely, 100%, 1:1 identical. Same execution plan, same execution times (with small millisecond variations), the only benefit you gain by using the first approach is just cleaner code and easier to follow in bigger functions with a lot of subquery joins.

Answer :

The documentation tells you:

A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is

T1 join_type T2 [ join_condition ]

Joins of all types can be chained together, or nested: either or both T1 and T2 can be joined tables. Parentheses can be used around JOIN clauses to control the join order. In the absence of parentheses, JOIN clauses nest left-to-right.

The only way to parse the expression

a JOIN b JOIN c ON cond2 ON cond1

that is in line with the above syntax is

a JOIN (b JOIN c ON cond2) ON cond1

so if your code doesn’t throw a syntax error, that’s how it must be interpreted.

Leave a Reply

Your email address will not be published. Required fields are marked *