Question :
I have 2 tables: tbl1, tbl2.
CREATE TABLE tbl1(time_1)
AS VALUES
( '2017-09-06 15:26:03'::timestamp ),
( '2017-09-06 15:26:02' ),
( '2017-09-06 15:28:01' ),
( '2017-09-06 15:40:00' );
CREATE TABLE tbl2(time_2)
AS VALUES
( '2017-09-06 15:29:01'::timestamp ),
( '2017-09-06 15:40:00' ),
( '2017-09-06 15:23:59' ),
( '2017-09-06 15:45:58' );
I want to join the table, thus for every row in tbl1 match closest time in tbl2. The output is :
time_1 time_2
--------------------- --------------------
2017-09-06 15:26:03 2017-09-06 15:23:59
2017-09-06 15:26:02 2017-09-06 15:23:59
2017-09-06 15:28:01 2017-09-06 15:29:01
2017-09-06 15:40:00 2017-09-06 15:45:58
I know how to find single value for closest time:
SELECT * from tbl1 where time_1=INPUT_TIME ORDER BY case when time_1 > INPUT_TIME then time_1 - INPUT_TIME else INPUT_TIME - time_1 end limit 1;
Answer :
Using btree_gist
and <->
You really want <->
which will actually work on GiST indexes through the internal tstz_dist
. This works because you don’t really care about the interval. First you need to add the extension,
CREATE EXTENSION btree_gist;
then
SELECT DISTINCT ON (time_1) time_1, time_2
FROM tbl1
CROSS JOIN tbl2
ORDER BY time_1, time_1 <-> time_2;
If there are too many rows to do a cross-join, you may have better luck with the lateral solution.
SELECT time_1, time_2
FROM tbl1
CROSS JOIN LATERAL (
SELECT time_2
FROM tbl2
ORDER BY time_1<->time_2
LIMIT 1
) AS tbl2;
Would be interested in seeing a report back on the benchmarks. =) Especially if you have a GIST index on time_1 and time_2.
Here’s one attempt using a lateral join:
select time_1, time_2
from tbl1
cross join lateral (
select time_2
from tbl2
order by case when time_1 > time_2 then age(time_1,time_2)
else age(time_2,time_1)
end
fetch first 1 rows only
) as t;
Another possibility is to use a sub-query:
select time_1, (
select time_2
from tbl2
order by case when time_1 > time_2 then age(time_1,time_2)
else age(time_2,time_1)
end
fetch first 1 rows only
)
from tbl1;
There is probably a more elegant way to determine the absolute value of the timestamp difference. Worth noting is that the result will be different if the order of the tables is changed.
select dt1.time_1,
dt1.time_2
from
(select row_number() over(partition by time_1 order by abs(extract('epoch' from time_1) - extract('epoch' from time_2)) asc) as rn,
time_1,
time_2
from tbl1
cross
join tbl2
) dt1
where dt1.rn = 1
order by dt1.time_1
- within the derived table
dt1
… - perform a
cross join
(aka cartesian product) across the 2 tables … - using the absolute value of the differences in
epoch
times to order row numbering for a group of rows with the sametime_1
value; then … - in the outer query limit our output to the first row (
dt1.rn = 1
) for eachtime_1
value
Results:
time_1 | time_2
------------------- | -------------------
2017-09-06 15:26:02 | 2017-09-06 15:23:59
2017-09-06 15:26:03 | 2017-09-06 15:23:59
2017-09-06 15:28:01 | 2017-09-06 15:29:01
2017-09-06 15:40:00 | 2017-09-06 15:40:00
NOTE: For the last time_1 = 2017-09-06 15:40:00
we get a different time_2
match than suggested in the question, which I’m assuming is incorrect:
-- suggested in the question:
time_1 time_2
--------------------- --------------------
2017-09-06 15:40:00 2017-09-06 15:45:58
Here’s a dbfiddle