Question :
I have problem with two queries on PostgreSQL v8.2. It seem like the optimizer chooses a wrong index:
db=# explain
select count(*), messagetype, alias
from event
where
templateinfoid = 10 and
templateid = 12458 and
'2013-03-01 00:00:00' < gw_out_time and
'2013-03-31 23:59:59' >= gw_out_time
group by messagetype, alias
order by alias asc;
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1128.74..1128.74 rows=1 width=14)
Sort Key: alias
-> HashAggregate (cost=1128.72..1128.73 rows=1 width=14)
-> Index Scan using idx_event_tid_tinfoid_outtime on event (cost=0.00..1125.93 rows=372 width=14)
Index Cond: ((templateinfoid = 10) AND (templateid = 12458) AND ('2013-03-01 00:00:00'::timestamp without time zone < gw_out_time) AND ('2013-03-31 23:59:59'::timestamp without time zone >= gw_out_time))
(5 rows)
db=# explain
select count(*), messagetype, alias
from event
where
templateinfoid = 10 and
templateid = 12458 and
'2013-03-02 00:00:00' < gw_out_time and
'2013-03-31 23:59:59' >= gw_out_time
group by messagetype, alias
order by alias asc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=45.48..45.51 rows=1 width=14)
-> Sort (cost=45.48..45.49 rows=1 width=14)
Sort Key: alias, messagetype
-> Index Scan using idx_event_gw_out_time on event (cost=0.00..45.47 rows=1 width=14)
Index Cond: (('2013-03-02 00:00:00'::timestamp without time zone < gw_out_time) AND ('2013-03-31 23:59:59'::timestamp without time zone >= gw_out_time))
Filter: ((templateinfoid = 10) AND (templateid = 12458))
(6 rows)
I have read 54.1. Row Estimation Examples (http://www.postgresql.org/docs/8.2/static/row-estimation-examples.html) article and I counted the selectivity of rows by templateid and templateinfoid columns.
My stats:
db # select * from pg_stats where tablename = 'event' and attname IN ('gw_out_time','templateid','templateinfoid');
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | event
attname | gw_out_time
null_frac | 0.00166667
avg_width | 8
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {"2010-01-03 18:13:27.334","2010-11-21 08:44:41.704","2011-05-06 22:08:09.59","2011-09-15 22:21:14.904","2011-12-14 15:03:18.674","2012-03-20 14:18:21.59","2012-06-22 10:22:33.607","2012-09-13 11:19:39.55","2012-11-22 12:34:26.521","2012-12-30 17:50:00.672","2013-03-01 21:04:18.16"}
correlation | 0.989909
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | event
attname | templateid
null_frac | 0.02
avg_width | 8
n_distinct | 439
most_common_vals | {3182,5125,699,8415,9616,8975,6249,934,5320,2877}
most_common_freqs | {0.078,0.0663333,0.043,0.0403333,0.0403333,0.0363333,0.0303333,0.0276667,0.0206667,0.0186667}
histogram_bounds | {42,2271,4944,5597,6904,8824,9674,10135,10759,12356,12946}
correlation | 0.702483
-[ RECORD 3 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | event
attname | templateinfoid
null_frac | 0.02
avg_width | 8
n_distinct | 8
most_common_vals | {10,3,6,2,5,12,11,9}
most_common_freqs | {0.697667,0.184333,0.0623333,0.0176667,0.011,0.005,0.00133333,0.000666667}
histogram_bounds |
correlation | 0.741422
and
db # SELECT reltuples, relpages FROM pg_class WHERE relname = 'event';
reltuples | relpages
-------------+----------
2.74586e+08 | 18752867
(1 row)
so I did:
templateinfoid selectivity:
0.697667 * 18752867 = 13.083.256,461289
templateid selectivity:
(1 + (12946 - bckt[2].min)/(bckt[2].max - bckt[2].min))/num_bckts * 18752867 =
(1 + (12458 - 12356)/(12946 - 12356))/10 * 18752867 =
0,117288136 * 18752867 =
2.199.488,807457627
How to count rows selectivity for the gw_out_time column which is timestamp?
Answer :
Consider
INDEX(templateinfoid, templateid, -- first, but in either order
gw_out_time,
messagetype, alias) -- last, for "covering" (either order)
Also, consider either of these (to avoid an extra sort):
group by messagetype, alias
order by messagetype, alias
or
group by alias, messagetype
order by alias, messagetype