Question :
I have a PostgreSQL table constructed as
device_id | point | dt_edit
----------+-----------+----------
100 | geometry | timestamp
101 | geometry | timestamp
100 | geometry | timestamp
102 | geometry | timestamp
102 | geometry | timestamp
101 | geometry | timestamp
I need to select the last 2 records ordered by dt_edit
from unique device_id
. This query works very slow, on 1 billion records – 500 seconds:
SELECT dt.device_id,
dt.point,
dt.dt_edit
FROM ( SELECT gps_data.device_id,
gps_data.point,
gps_data.dt_edit,
rank() OVER (PARTITION BY gps_data.device_id
ORDER BY gps_data.dt_edit DESC) AS rank
FROM gps_data) dt
WHERE dt.rank <= 2
Any ideas?
Answer :
If we can assume a device
table holding all devices of interest. Example:
CREATE TABLE device (device_id int, device text);
INSERT INTO device (device_id, device) VALUES
(100, 'a')
, (101, 'b')
, (102, 'c');
The query can be very simple:
SELECT d.device_id, g.point, g.dt_edit
FROM device d
, LATERAL (
SELECT point, dt_edit
FROM gps_data
WHERE device_id = d.device_id
ORDER BY dt_edit DESC NULLS LAST -- more items?
LIMIT 2
) g;
Assuming a much smaller cardinality in table device
than in table gps_data
, i.e. many rows per device in the latter.
If dt_edit
is not UNIQUE
, add more items to ORDER BY
to disambiguate the sort order.
All you need for this to be fast is a multicolumn index;
CREATE INDEX gps_data_foo_idx
ON gps_data (device_id, dt_edit DESC NULLS LAST); -- more items?
The NULLS LAST
modifier is useful if the column can be NULL, but does not hurt in any case. Either way, the query must match the index.
If you don’t have a device table, I would suggest to create one (and keep it current).
But there are ways to make this fast even without device
table …
Details: