I have a table with about 300 columns and about 10^7 rows, and I have to retrieve only the changed rows, ordered by time (not distinct, but changed). The result set may be limited to 100 rows as well.
I’ve tried the following query with the lag window function:
-- Suppose we want to retrieve data from the column1, column2 and column3 fields.
-- There may be other fields, though.
SELECT w.stamp, w.column1, w.column2, w.column3
FROM (
    SELECT o.stamp, o.obj_id,
           o.column1, o.column2, o.column3,
           lag(o.column1) OVER (ORDER BY stamp) AS _prev_column1,
           lag(o.column2) OVER (ORDER BY stamp) AS _prev_column2,
           lag(o.column3) OVER (ORDER BY stamp) AS _prev_column3
    FROM table_name o
    WHERE o.stamp BETWEEN '01.12.2015 00:00' AND '23.01.2016 00:00'
    ORDER BY o.stamp DESC
) AS w
WHERE w.obj_id = 42
  AND w.stamp BETWEEN '01.12.2015 00:00' AND '23.01.2016 00:00'
  AND (w._prev_column1, w._prev_column2, w._prev_column3)
      IS DISTINCT FROM (w.column1, w.column2, w.column3)
ORDER BY w.stamp DESC
LIMIT 100;
However, it takes too much time to complete. Is it possible to optimize this query, or should the problem be solved in another way (e.g., with a custom function)?
CREATE TABLE table_name (
    id      BIGINT    PRIMARY KEY NOT NULL DEFAULT nextval('ds_dyn_sequence'::regclass),
    obj_id  BIGINT    NOT NULL,
    stamp   TIMESTAMP NOT NULL DEFAULT now(),
    column1 BIGINT,
    column2 BIGINT,
    column3 BIGINT
    -- Other fields, all BIGINT NULL
);

CREATE UNIQUE INDEX obj_id_stamp_key ON table_name USING BTREE (obj_id, stamp);
The table grows by about 10^4 rows per hour. It will be limited to three months of data, so the total number of rows will be approximately 2*10^7.
PostgreSQL version: 9.3
Since your version is 9.3 or newer, LATERAL joins are available, and I would try this rewrite of the query:
SELECT w.stamp, w.column1, w.column2, w.column3
FROM table_name AS w
JOIN LATERAL (
    SELECT p.column1, p.column2, p.column3
    FROM table_name AS p
    WHERE p.stamp < w.stamp
      AND p.stamp >= '2015-12-01'::timestamp
    ORDER BY p.stamp DESC
    LIMIT 1
) AS p
  ON (w.column1, w.column2, w.column3)
     IS DISTINCT FROM (p.column1, p.column2, p.column3)
WHERE w.obj_id = 42
  AND w.stamp >= '2015-12-01'::timestamp
  AND w.stamp <= '2016-01-23'::timestamp
ORDER BY w.stamp DESC
LIMIT 100;
with the index you have and with an additional index on (obj_id, stamp DESC).
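A sketch of that additional index (the index name here is made up; any name works):

```sql
-- Hypothetical name. The descending order on stamp matches the
-- ORDER BY w.stamp DESC ... LIMIT 100 access pattern of the query.
CREATE INDEX table_name_obj_stamp_desc_idx
    ON table_name (obj_id, stamp DESC);
```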
Unrelated to efficiency: are you sure that you want to use BETWEEN with timestamps? It’s more common to work with inclusive-exclusive (half-open) ranges. In other words, shouldn’t this condition:

AND w.stamp <= '2016-01-23'::timestamp

be:

AND w.stamp < '2016-01-23'::timestamp
With your current condition (using <=), you are searching through a range that is 53 whole days long plus 1 microsecond (or whatever the accuracy of the timestamp type is).
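To make the difference concrete, here is a sketch of how BETWEEN expands and what the half-open alternative looks like (dates taken from the question):

```sql
-- BETWEEN is inclusive on both ends; these two predicates are equivalent:
--   w.stamp BETWEEN '2015-12-01' AND '2016-01-23'
--   w.stamp >= '2015-12-01' AND w.stamp <= '2016-01-23'
-- A half-open range excludes the upper bound instead, covering exactly
-- 53 whole days and composing cleanly with an adjacent range:
WHERE w.stamp >= '2015-12-01'::timestamp
  AND w.stamp <  '2016-01-23'::timestamp
```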
There’s probably no better way to get the expected result.
You don’t have to repeat the w.stamp BETWEEN '01.12.2015 00:00' AND '23.01.2016 00:00' condition in the outer query.
Are you sure that the missing w.obj_id = 42 filter in the derived table is correct? If it were part of the derived table, the existing index would be useful; otherwise you need an index on stamp alone.
For this query the index should also include the three columns used in lag() to get the best performance, but [if you need additional columns] this might be too costly.
Is there no PARTITION BY (e.g. by obj_id) for the lag() window functions?
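If rows should only be compared against the previous row of the same object, a sketch of what that window would look like (assuming partitioning by obj_id is what’s intended; without it, lag() picks up the previous row of any object in the scanned range):

```sql
-- Sketch: PARTITION BY obj_id makes lag() look only at the previous
-- row of the same object, not the previous row of the whole table.
SELECT o.stamp, o.obj_id, o.column1,
       lag(o.column1) OVER (PARTITION BY o.obj_id ORDER BY o.stamp) AS _prev_column1
FROM table_name o;
```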