Question :
I have a table with the following structure:
timstamp-start, timestamp-stop
1,5
6,10
25,30
31,35
...
I am only interested in continuous timespans e.g. the break between a timestamp-end and the following timestamp-start is less than 3.
How could I get the aggregated covered timespans as a result:
timestamp-start,timestamp-stop
1,10
25,35
The reason I am considering this is because a user may request a time span that would need to return several thousand rows. However, most records are continuous and using above method could potentially reduce many thousand of rows down to just a dozen. Or is the added computation not worth the savings in bandwidth and latency?
For example: if the values are 1,5 and 7,10, I don’t want 2 rows because 5 and 7 are less than three apart so I want them to be combined into one row (1,10)
Answer :
All credit for this answer goes to Dudu Markovitz. I think this is a bit different, but I’m using his method and you should read his explanation.
Test data:
CREATE TABLE foo(a,b) AS
SELECT *
FROM ( VALUES (1,5), (6,10), (25,30), (31,35) ) AS x;
Here is the query I created,
SELECT min(a), max(b) FROM (
SELECT a,
b,
count(reset) OVER (ORDER BY a,b) AS grp
FROM (
SELECT
a,
b,
CASE WHEN a-(lag(b) OVER ()) > 3 THEN 1 END AS reset
FROM foo
ORDER BY a,b
) AS f1
) as f2
GROUP BY grp
ORDER BY grp;
I’ll break it down the same way… Here we establish resets
to mark new ranges.
SELECT
a,
b,
CASE WHEN a-(lag(b) OVER ()) > 3 THEN 1 END AS reset
FROM foo
ORDER BY a,b;
gives us this
a | b | reset
----+----+-------
1 | 5 |
6 | 10 |
25 | 30 | 1
31 | 35 |
Then we count(reset) OVER (ORDER BY a,b)
to get
a | b | grp
----+----+-----
1 | 5 | 0
6 | 10 | 0
25 | 30 | 1
31 | 35 | 1
Then we wrap that whole thing in a sub-select GROUP BY grp
and pick min(a)
and max(b)
which does what you want.