Question :
This just came up in a question on Reddit, and I’m wondering
- What is a
PARTITIONED OUTER JOIN
in Oracle? (definition) - What does a simple example look like? (use)
- How would you write it in terms of PostgreSQL or Standard SQL that otherwise lacks
PARTITIONED OUTER JOIN
? (equivalence)
Answer :
{1} Partitioned Outer Join: Definition
… “Such a join extends the conventional outer join syntax by applying the outer join to each logical partition defined in a query. Oracle logically partitions the rows in your query based on the expression you specify in the PARTITION BY clause. The result of a partitioned outer join is a UNION of the outer joins of each of the partitions in the logically partitioned table with the table on the other side of the join.” (documentation)
{2} Simple Example
“Data is normally stored in sparse form. That is, if no value exists for a given combination of dimension values, no row exists in the fact table. However, you may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them.”
… “For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales value next to it.” (quotes from the documentation)
Test tables and data (INSERTs)
-- Oracle 12c
create table sales (
date_ date
, location_ varchar2( 16 )
, qty_ number
);
create table locations (
name varchar2( 16 )
);
-- dates for locations are "gappy":
-- none of the locations has entries for all 3 dates
-- ( date range: 2019-01-15 - 2019-01-17 )
insert into sales ( date_, location_, qty_ )
values ( date '2019-01-17', 'London', 11 ) ;
insert into sales ( date_, location_, qty_ )
values ( date '2019-01-15', 'London', 10 ) ;
insert into sales ( date_, location_, qty_ )
values ( date '2019-01-16', 'Paris', 20 ) ;
insert into sales ( date_, location_, qty_ )
values ( date '2019-01-17', 'Boston', 31 ) ;
insert into sales ( date_, location_, qty_ )
values ( date '2019-01-16', 'Boston', 30 ) ;
-- locations
insert into locations ( name ) values ( 'London' );
insert into locations ( name ) values ( 'Paris' );
insert into locations ( name ) values ( 'Boston' );
Required output
date_ location_ qty_
2019-01-15 London 10
2019-01-15 Paris 0 -- not INSERTed!
2019-01-15 Boston 0 -- not INSERTed!
2019-01-16 London 0 -- not INSERTed!
2019-01-16 Paris 20
2019-01-16 Boston 30
2019-01-17 London 11
2019-01-17 Paris 0 -- not INSERTed!
2019-01-17 Boston 31
Query (partitioned outer join)
select S.date_, S.qty_, L.name
from sales S partition by ( date_ )
right join locations L on S.location_ = L.name
;
-- result
DATE_ QTY_ NAME
15-JAN-19 NULL Boston
15-JAN-19 10 London
15-JAN-19 NULL Paris
16-JAN-19 30 Boston
16-JAN-19 NULL London
16-JAN-19 20 Paris
17-JAN-19 31 Boston
17-JAN-19 11 London
17-JAN-19 NULL Paris
Query (version 2, same join)
-- same as above, using NVL(), column aliases, and ORDER BY ...
select S.date_, nvl( S.qty_, 0 ) as sold, L.name as location
from sales S partition by ( date_ )
right join locations L on S.location_ = L.name
order by S.date_, L.name
;
DATE_ SOLD LOCATION
--------- ---------- ----------------
15-JAN-19 0 Boston
15-JAN-19 10 London
15-JAN-19 0 Paris
16-JAN-19 30 Boston
16-JAN-19 0 London
16-JAN-19 20 Paris
17-JAN-19 31 Boston
17-JAN-19 11 London
17-JAN-19 0 Paris
{3} Equivalence
The following query does roughly the same job as the PARTITION BY ( date_ ) outer join. We are using a combination of a CROSS JOIN (inner SELECT) and LEFT OUTER JOIN.
(Conversion of NULLs to 0s omitted)
Oracle
select SL.*, S.qty_
from
(
select *
from (
select unique date_ from sales
) , (
select unique name from locations
)
) SL left join (
select date_, location_, qty_ from sales
) S on SL.name = S.location_ and SL.date_ = S.date_
order by SL.date_, SL.name
;
DATE_ NAME QTY_
15-JAN-19 Boston NULL
15-JAN-19 London 10
15-JAN-19 Paris NULL
16-JAN-19 Boston 30
16-JAN-19 London NULL
16-JAN-19 Paris 20
17-JAN-19 Boston 31
17-JAN-19 London 11
17-JAN-19 Paris NULL
PostgreSQL 10 ( dbfiddle )
-- DDL and INSERTs
create table sales (
date_ date
, location_ varchar( 16 )
, qty_ number
);
create table locations (
name varchar( 16 )
);
insert into sales ( date_, location_, qty_ ) values
( '2019-01-17', 'London', 11 )
, ( '2019-01-15', 'London', 10 )
, ( '2019-01-16', 'Paris', 20 )
, ( '2019-01-17', 'Boston', 31 )
, ( '2019-01-16', 'Boston', 30 )
insert into locations ( name ) values
( 'London' ), ( 'Paris' ), ( 'Boston' );
Query (Postgres)
-- SL: all date_ <-> location combinations
-- S: all location_ and qty_ values of table sales
select SL.*, S.qty_
from
(
select *
from (
select distinct date_ from sales
) S_ cross join (
select distinct name from locations
) L_
) SL left join (
select date_, location_, qty_ from sales
) S on SL.name = S.location_ and SL.date_ = S.date_
order by SL.date_, SL.name
;
date_ name qty_
2019-01-15 Boston
2019-01-15 London 10
2019-01-15 Paris
2019-01-16 Boston 30
2019-01-16 London
2019-01-16 Paris 20
2019-01-17 Boston 31
2019-01-17 London 11
2019-01-17 Paris