Postgresql index stops being used after time, gets picked back up after re-creating

Posted on

Question :

I’m confused as to what exactly is going on here. Hoping someone can provide some insight. Why did the index stop being used after time? Why does re-creating it work? Would there be a better index?

At the start I want to already say: thanks to anyone who takes a look or leaves a suggestion.

Here follows all data I could think to gather:

  • Postgresql 11.2
  • Table with 30 million rows, 20GB in size

Conf setting:

shared_buffers = 1024MB
effective_cache_size = 2048MB
checkpoint_completion_target = 0.5
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 32MB # should probably not raise

# Cannot be raised
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4

Using this query to get some statistics:

    pt.tablename AS TableName
    ,t.indexname AS IndexName
    ,to_char(pc.reltuples, '999,999,999,999') AS TotalRows
    ,pg_size_pretty(pg_relation_size(quote_ident(pt.tablename)::text)) AS TableSize
    ,pg_size_pretty(pg_relation_size(quote_ident(t.indexrelname)::text)) AS IndexSize
    ,to_char(t.idx_scan, '999,999,999,999') AS TotalNumberOfScan
    ,to_char(t.idx_tup_read, '999,999,999,999') AS TotalTupleRead
    ,to_char(t.idx_tup_fetch, '999,999,999,999') AS TotalTupleFetched
FROM pg_tables AS pt
LEFT OUTER JOIN pg_class AS pc 
    ON pt.tablename=pc.relname
        pc.relname AS TableName
        ,pc2.relname AS IndexName
    FROM pg_index AS pi
    JOIN pg_class AS pc 
        ON pc.oid = pi.indrelid
    JOIN pg_class AS pc2 
        ON pc2.oid = pi.indexrelid
    JOIN pg_stat_all_indexes AS psai 
        ON pi.indexrelid = psai.indexrelid 
    ON pt.tablename = T.TableName
WHERE pt.schemaname='public'

The result:

                  tablename                   |                         indexname                          |    totalrows     | tablesize  | indexsize  | totalnumberofscan |  totaltupleread  | totaltuplefetched 
 mytable                                      | myindex                                                    |       30,000,000 | 20 GB      | 48 MB      |           111,166 |    3,294,259,230 |       13,524,499

The index:

"myindex" btree (zulu_india, mike_romeo) WHERE a_and_not_b(four_lima, bravo_romeo)

The function is this:

create or replace function a_and_not_b(four_lima bool, bravo_romeo bool)
returns BOOLEAN as $$
   select $1 AND NOT $2;
language sql immutable;

The query:

    distinct quebec,
    bravo_romeo = false
    and four_lima = true
    and zulu_india <= '2021-03-12 00:00:00.005'
order by
    mike_romeo asc,
    quebec asc
limit 100 offset 1300;

quebec is an id(bigint)

zulu_india is a timestamp without timezone

mike_romeo is a text

bravo_romeo is a boolean

four_lima is a boolean

And the index does not get used. A sequential scan gets used.

After recreating, the stats look like this:

                  tablename                   |                         indexname                          |    totalrows     | tablesize  | indexsize  | totalnumberofscan |  totaltupleread  | totaltuplefetched 
 mytable                                      | myindex                                                    |       30,000,000 | 20 GB      | 688 kB     |                 0 |                0 |           0

Smaller, but the initial one was small too.

Statistics for the query without the index:

Limit  (cost=17091631.17..17091646.17 rows=100 width=3254) (actual time=202217.922..202218.138 rows=100 loops=1)
    Buffers: shared hit=1148 read=2763740
  ->  Unique  (cost=17091436.17..18273890.07 rows=7883026 width=3254) (actual time=202215.055..202217.967 rows=1400 loops=1)
          Buffers: shared hit=1148 read=2763740
        ->  Sort  (cost=17091436.17..17111143.73 rows=7883026 width=3254) (actual time=202215.049..202215.370 rows=1400 loops=1)
                Sort Key: mike_romeo, quebec, zulu_india, four_lima, bravo_romeo
                Sort Method: quicksort  Memory: 3798kB
                Buffers: shared hit=1148 read=2763740
              ->  Seq Scan on oscar_tango mike_three  (cost=0.00..3247634.00 rows=7883026 width=3254) (actual time=2076.567..202130.550 rows=4958 loops=1)
                      Filter: ((NOT bravo_romeo) AND four_lima AND (zulu_india <= 'four_november'::timestamp without time zone))
                      Rows Removed by Filter: 38603368
                      Buffers: shared hit=1148 read=2763740
Planning time: 6.364 ms
Execution time: 202218.535 ms

Statistics for the query with the index:

Limit  (cost=17180505.79..17180520.79 rows=100 width=3253) (actual time=143.841..143.964 rows=100 loops=1)
    Buffers: shared read=3290
  ->  Unique  (cost=17180310.79..18373240.69 rows=7952866 width=3253) (actual time=141.242..143.804 rows=1400 loops=1)
          Buffers: shared read=3290
        ->  Sort  (cost=17180310.79..17200192.96 rows=7952866 width=3253) (actual time=141.238..141.530 rows=1400 loops=1)
                Sort Key: mike_romeo, quebec, zulu_india, four_lima, bravo_romeo
                Sort Method: quicksort  Memory: 3798kB
                Buffers: shared read=3290
              ->  Bitmap Heap Scan on oscar_tango mike_three  (cost=2092.88..3213354.03 rows=7952866 width=3253) (actual time=8.761..73.799 rows=4958 loops=1)
                      Recheck Cond: ((zulu_india <= 'four_november'::timestamp without time zone) AND four_lima AND (NOT bravo_romeo))
                      Heap Blocks: exact=3207
                      Buffers: shared read=3290
                    ->  Bitmap Index Scan on alpha  (cost=0.00..104.66 rows=7952866 width=0) (actual time=8.005..8.005 rows=4958 loops=1)
                            Index Cond: (zulu_india <= 'four_november'::timestamp without time zone)
                            Buffers: shared read=83
Planning time: 96.897 ms
Execution time: 147.054 ms

Answer :

The cost estimates of the two plans are almost identical, so small differences in the size of the index could be enough to change which one is believed to be faster.

If your row estimates were remotely accurate, the two plans wouldn’t be so close to tied, and so wouldn’t be subject to changing position based on small changes in index packing density.

There are some unavoidable reasons that estimates can be way off, but in your case even the very simple “Bitmap Index Scan on alpha” with a single hard-coded comparator values is way off, and that is hard to explain by anything other than your table statistics are extremely out of date. You should run VACUUM ANALYZE on the table (or on the whole database), as well as investigate how they got so out of date to begin with.

Now I see what is going on. I was ignoring the partial index definition you showed us, because the name of the defined index didn’t match the name of the index in the plan, and because the WHERE clause in the index didn’t match the WHERE clause of the query. But that was dumb of me. The names don’t match because the plan was anonymized, and PostgreSQL is smart enough to match up the WHERE clauses because it in-lines the function definition somewhere along the line. So this is a case where the poor estimates are largely unavoidable without taking special steps. Almost all the selectivity of the index comes from its where clause, not from the timestamp cutoff. But this WHERE clause selectivity is poorly estimated. The two booleans must have an interdependence which makes the combination used here much more selective than you would get by multiplying their two selectivities (which is what PostgreSQL does). PostgreSQL “knows” that the partial index is very small, but doesn’t use that knowledge in estimating the joint selectivity.

If this analysis is correct, you can probably fix the problem by creating a multicolumn statistics:

create statistics br_fl_stats on bravo_romeo, four_lima from mytable;
analyze mytable;

However, I suspect that you would need to upgrade to v12 for this to work, as I think you need multicolumn MCV statistics, which weren’t added until then.

Leave a Reply

Your email address will not be published. Required fields are marked *