Optimizing PostgreSQL with an Index

Question:

I’m having a hard time getting PostgreSQL to use an index that I created. I understand that PostgreSQL is smart enough to decide when an index should be used, and that in most cases a seq scan is fine. However, I want queries on my table to run faster. Can someone tell me what I am doing wrong with my index? My query plan’s time and cost do not change.

I know that when I use an index in Oracle, it actually improves performance. I’m not sure how to get PostgreSQL to do the same.

This is my query:

select sum(lo_revenue), d_year, p_brand1
from lineorder, dwdate, part, supplier
where lo_orderdate = d_datekey
  and lo_partkey = p_partkey
  and lo_suppkey = s_suppkey
  and p_brand1 between 'MFGR#2221' and 'MFGR#2228'
  and s_region = 'ASIA'
group by d_year, p_brand1
order by d_year, p_brand1;

This is my index:

create index index1 on lineorder(lo_revenue,lo_orderdate,lo_partkey,lo_suppkey);

Here is what I get when I run an EXPLAIN:

QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Sort  (cost=166357.23..166357.26 rows=13 width=17)
   Sort Key: dwdate.d_year, part.p_brand1
   ->  HashAggregate  (cost=166356.86..166356.99 rows=13 width=17)
         ->  Hash Join  (cost=6023.57..166338.87 rows=2398 width=17)
               Hash Cond: (lineorder.lo_orderdate = dwdate.d_datekey)
               ->  Hash Join  (cost=5928.06..166207.39 rows=2398 width=17)
                     Hash Cond: (lineorder.lo_suppkey = supplier.s_suppkey)
                     ->  Hash Join  (cost=5869.45..166084.74 rows=10682 width=21)
                           Hash Cond: (lineorder.lo_partkey = part.p_partkey)
                           ->  Seq Scan on lineorder  (cost=0.00..137603.98 rows=6001198 width=16)
                           ->  Hash  (cost=5865.00..5865.00 rows=356 width=13)
                                 ->  Seq Scan on part  (cost=0.00..5865.00 rows=356 width=13)
                                        Filter: (((p_brand1)::text >= 'MFGR#2221'::text) AND ((p_brand1)::text <= 'MFGR#2228'::text))
                     ->  Hash  (cost=53.00..53.00 rows=449 width=4)
                           ->  Seq Scan on supplier  (cost=0.00..53.00 rows=449 width=4)
                                 Filter: ((s_region)::text = 'ASIA'::text)
               ->  Hash  (cost=63.56..63.56 rows=2556 width=8)
                     ->  Seq Scan on dwdate  (cost=0.00..63.56 rows=2556 width=8)
(18 rows)

Time: 0.821 ms

Answer:

I found a better index. I think I was reading the query plan from the top down, which led me to the wrong column order for the index. Reading it from the bottom up instead gives me this index:

create index index1 on lineorder(lo_partkey,lo_suppkey,lo_orderdate);

However, is there any way of making it even better?

For this particular query, the following index would work better:

create index index1 on lineorder(lo_suppkey, lo_partkey, lo_orderdate);

Or, possibly better, two separate indexes: one for supplier + partkey and one for orderdate:

create index index1 on lineorder(lo_suppkey, lo_partkey);
create index index2 on lineorder(lo_orderdate);

That’s because your query filters suppliers (by region), then a range of parts, and finally the results are grouped by year and brand.
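As an aside, the reason your original index1 (leading with lo_revenue) was never touched is that a b-tree index is only usable when the query constrains its leading column(s); lo_revenue appears only inside the aggregate, not in any join or filter condition, so the planner correctly sticks with a sequential scan. If you are on PostgreSQL 11 or later, you could also try a covering index so the aggregate can be answered by an index-only scan — a sketch, with the hypothetical name index3:

```sql
-- Sketch (PostgreSQL 11+): INCLUDE stores lo_revenue in the index leaf
-- pages so sum(lo_revenue) can be computed from an index-only scan,
-- without fetching heap tuples. The name index3 is hypothetical.
create index index3 on lineorder (lo_suppkey, lo_partkey, lo_orderdate)
    include (lo_revenue);
```

Also note that plain EXPLAIN only plans the query (the 0.821 ms above is planning overhead, not execution time); run EXPLAIN (ANALYZE, BUFFERS) to see actual run time and whether an index is really used.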

In my personal experience, using explicit joins often helps with spotting these sorts of things, and it also reduces the chance of missing a join condition and accidentally producing a cross join.
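For example, here is the original query rewritten with explicit JOIN ... ON syntax — same semantics, but each join condition is easy to audit:

```sql
select sum(lo_revenue), d_year, p_brand1
from lineorder
join dwdate   on lo_orderdate = d_datekey
join part     on lo_partkey   = p_partkey
join supplier on lo_suppkey   = s_suppkey
where p_brand1 between 'MFGR#2221' and 'MFGR#2228'
  and s_region = 'ASIA'
group by d_year, p_brand1
order by d_year, p_brand1;
```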
