Question :
Do you have any suggestion on how to improve a table structure or its indexes in order to have a single query run faster?
The table has this structure:
docid | character varying(17) | not null
seqnr | numeric | not null default 1
creatinguser | character varying(50) | not null
creatingdate | date |
lastchangeuser | character varying(50) | not null
lastchangedate | date |
description | character varying(50) |
Indexes:
"systemdatabydocument_pkey" PRIMARY KEY, btree (docid, seqnr)
"sdbdchanged" btree (lastchangedate)
"sdbdcreated" btree (creatingdate)
My query (please note that the value in WHERE
changes) is:
SELECT MAX(creatingDate), MAX(lastChangeDate)
FROM agenzia.SystemDataByDocument
WHERE docId = 'Sedi';
and explain analyze on a old/slow system with about 250k records on postgresql 8.3.17 is:
Result (cost=99.05..99.06 rows=1 width=0) (actual time=573.767..573.768 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..39.81 rows=1 width=8) (actual time=280.417..280.418 rows=1 loops=1)
-> Index Scan Backward using sdbdcreated on systemdatabydocument (cost=0.00..22887.95 rows=575 width=8) (actual time=280.414..280.414 rows=1 loops=1)
Filter: ((creatingdate IS NOT NULL) AND ((docid)::text = 'Sedi'::text))
-> Limit (cost=0.00..59.24 rows=1 width=8) (actual time=293.332..293.333 rows=1 loops=1)
-> Index Scan Backward using sdbdchanged on systemdatabydocument (cost=0.00..34064.51 rows=575 width=8) (actual time=293.328..293.328 rows=1 loops=1)
Filter: ((lastchangedate IS NOT NULL) AND ((docid)::text = 'Sedi'::text))
Total runtime: 573.837 ms
while the same plan on postgresql 9.1.11 (on a new/faster machine with about 500k records) is almost the same:
Result (cost=117.40..117.41 rows=1 width=0) (actual time=537.812..537.813 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..59.73 rows=1 width=4) (actual time=279.532..279.532 rows=0 loops=1)
-> Index Scan Backward using sdbdcreated on systemdatabydocument (cost=0.00..30342.46 rows=508 width=4) (actual time=279.525..279.525 rows=0 loops=1)
Index Cond: (creatingdate IS NOT NULL)
Filter: ((docid)::text = 'Sedi'::text)
InitPlan 2 (returns $1)
-> Limit (cost=0.00..57.67 rows=1 width=4) (actual time=258.270..258.270 rows=0 loops=1)
-> Index Scan Backward using sdbdchanged on systemdatabydocument (cost=0.00..29296.76 rows=508 width=4) (actual time=258.267..258.267 rows=0 loops=1)
Index Cond: (lastchangedate IS NOT NULL)
Filter: ((docid)::text = 'Sedi'::text)
Total runtime: 537.924 ms
Thank you very much.
Addendum 1: adding an index with all fields does not help:
=>create index testtesttest3 on agenzia.systemdatabydocument (docid,creatingDate,lastChangeDate);
CREATE INDEX
Tempo: 2180,780 ms
=> explain analyze SELECT MAX(creatingDate), MAX(lastChangeDate) FROM agenzia.SystemDataByDocument WHERE docId = 'Sedi';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=62.92..62.93 rows=1 width=0) (actual time=281.283..281.284 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.68 rows=1 width=8) (actual time=0.160..0.161 rows=1 loops=1)
-> Index Scan Backward using testtesttest3 on systemdatabydocument (cost=0.00..2118.04 rows=575 width=8) (actual time=0.157..0.157 rows=1 loops=1)
Index Cond: ((docid)::text = 'Sedi'::text)
Filter: (creatingdate IS NOT NULL)
-> Limit (cost=0.00..59.24 rows=1 width=8) (actual time=281.109..281.110 rows=1 loops=1)
-> Index Scan Backward using sdbdchanged on systemdatabydocument (cost=0.00..34063.25 rows=575 width=8) (actual time=281.106..281.106 rows=1 loops=1)
Filter: ((lastchangedate IS NOT NULL) AND ((docid)::text = 'Sedi'::text))
Total runtime: 281.365 ms
(10 rows)
Answer :
As suggested by @Mihai, adding an index that include docid
as first column help. The best solution is to create two of them, one for (docid,creatingDate)
and the second one for (docid,lastChangeDate)
. Please note for simplyfing the index scan, the second fild is ordered DESCENDING
, infact the first two indexes are ignored. The new plan is now:
=# create index testtesttest1 on agenzia.systemdatabydocument (docid,creatingDate);
CREATE INDEX
=# create index testtesttest2 on agenzia.systemdatabydocument (docid,lastChangeDate);
CREATE INDEX
=# create index testtesttest3 on agenzia.systemdatabydocument (docid,creatingDate DESC);
CREATE INDEX
=# create index testtesttest4 on agenzia.systemdatabydocument (docid,lastChangeDate DESC);
CREATE INDEX
=# explain analyze SELECT MAX(creatingDate), MAX(lastChangeDate) FROM agenzia.SystemDataByDocument WHERE docId = 'Sedi';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=5.40..5.41 rows=1 width=0) (actual time=0.151..0.151 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..2.70 rows=1 width=4) (actual time=0.091..0.091 rows=0 loops=1)
-> Index Scan using testtesttest3 on systemdatabydocument (cost=0.00..1372.58 rows=508 width=4) (actual time=0.085..0.085 rows=0 loops=1)
Index Cond: (((docid)::text = 'Sedi'::text) AND (creatingdate IS NOT NULL))
InitPlan 2 (returns $1)
-> Limit (cost=0.00..2.70 rows=1 width=4) (actual time=0.054..0.054 rows=0 loops=1)
-> Index Scan using testtesttest4 on systemdatabydocument (cost=0.00..1372.58 rows=508 width=4) (actual time=0.053..0.053 rows=0 loops=1)
Index Cond: (((docid)::text = 'Sedi'::text) AND (lastchangedate IS NOT NULL))
Total runtime: 0.256 ms
-
In general, for many DBMSes to solve the function
MAX(x)
using an index, a descending order index onx
is needed: this index has the property that the MAX value of x is the first value in the index.PostgreSQL supports a “backwards index scan” which, as its name suggests, scans the index in reverse order, so that the MAX value is the first value it finds in an index in normal (ascending) order.
-
Then to find the MAX(x) for a key value, eg
select MAX(x) from mytable where id = 27
will require a composite index:- the first column needs to be
id
so that the data for the givenid
can be
found quickly - the second column should be
x descending
so that the first value of the next level of the index is the MAX value of x (or in PostgreSQL, in “normal”/ascending order).
- the first column needs to be
-
To find multiple max values in one query, one needs to create multiple indexes:
- compound indexes are useful when one is specifying more conditions in the where clause, eg
col1 = 27, col2 = 'black'
, while - multiple indexes are useful when retrieving multiple columns or expressions for the same where clause. In this case you enable the DBMS to perform two index scans and combine the results.
- compound indexes are useful when one is specifying more conditions in the where clause, eg
Putting all the above together, you should define two indexes:
create index on agenzia.SystemDataByDocument(docId, creatingDate desc);
and
create index on agenzia.SystemDataByDocument(docId, lastChangeDate desc);
In PostgreSQL the second columns of these index could be stored in “normal”/ascending order.