Postgres slow query with order by, index and limit

Question :

I am trying to improve the performance of a Postgres (9.6) query. Here is my schema; the table contains about 60 million rows.

          Column          |            Type             | Modifiers
--------------------------+-----------------------------+-----------
 transaction_id           | text                        | not null
 network_merchant_name    | text                        |
 network_merchant_id      | text                        |
 network_merchant_mcc     | integer                     |
 network_merchant_country | text                        |
 issuer_country           | text                        |
 merchant_id              | text                        |
 remapped_merchant_id     | text                        |
 created_at               | timestamp without time zone |
 updated_at               | timestamp without time zone |
 remapped_at              | timestamp without time zone |
Indexes:
    "mapped_transactions_pkey" PRIMARY KEY, btree (transaction_id)
    "ix_mapped_transactions_remapped_at" btree (remapped_at NULLS FIRST)

Here is the query I am trying to execute.

SELECT *
FROM mapped_transactions
ORDER BY remapped_at ASC NULLS FIRST
LIMIT 10000;

Here is the query plan:

    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1511.67 rows=10000 width=146) (actual time=327049.374..327345.341 rows=10000 loops=1)
   Buffers: shared hit=574937 read=210425 dirtied=356 written=4457
   I/O Timings: read=146625.381 write=59.637
   ->  Index Scan using ix_mapped_transactions_remapped_at on mapped_transactions  (cost=0.57..16190862.91 rows=107145960 width=146) (actual time=327049.364..327339.402 rows=10000 loops=1)
         Buffers: shared hit=574937 read=210425 dirtied=356 written=4457
         I/O Timings: read=146625.381 write=59.637
 Planning time: 0.125 ms
 Execution time: 327348.322 ms
(8 rows)  

I don’t understand why it takes so much time when there is an index on the remapped_at column.
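One thing worth checking before assuming the index itself is the problem is how bloated it has become: the plan reads over 200,000 buffers to return 10,000 rows, which can happen when the leading portion of the index is full of dead entries. A rough diagnostic sketch using the stock pgstattuple contrib extension (available in 9.6):

```sql
-- pgstattuple is a contrib extension shipped with PostgreSQL
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- pgstatindex reports btree leaf density and fragmentation;
-- a low avg_leaf_density after heavy updates suggests bloat
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('ix_mapped_transactions_remapped_at');

-- dead-tuple count on the table itself, and when autovacuum last ran
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'mapped_transactions';
```

If avg_leaf_density is very low or n_dead_tup is large relative to n_live_tup, the slow scan is likely wading through dead index entries rather than doing useful work.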

On the other hand, if I order in reverse, it is fast.

SELECT *
FROM mapped_transactions
ORDER BY remapped_at DESC NULLS LAST
LIMIT 10000;

and the plan is:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1511.67 rows=10000 width=146) (actual time=0.020..9.268 rows=10000 loops=1)
   Buffers: shared hit=1802
   ->  Index Scan Backward using ix_mapped_transactions_remapped_at on mapped_transactions  (cost=0.57..16190848.04 rows=107145866 width=146) (actual time=0.018..4.759 rows=10000 loops=1)
         Buffers: shared hit=1802
 Planning time: 0.080 ms
 Execution time: 11.561 ms
(6 rows)

Can someone please suggest how I can improve the performance of the first query?

Updates

  1. I solved this by rebuilding the table and reindexing the data. VACUUM FULL ANALYZE was not an option because the table was being used and I did not want to lock it.

  2. The performance of the index deteriorates very quickly. I rebuilt the index 7 hours ago and performance was good; now the queries take about 10 s. Note that this table is write-heavy. How can I keep the index fast? Do I have to reindex frequently? There are no deletions in this table, but a lot of updates.
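On 9.6 there is no REINDEX CONCURRENTLY (that arrived in PostgreSQL 12), but the same effect can be had with the build-and-swap pattern below, which avoids blocking reads and writes. The index names here follow the schema above; the swap itself is a sketch, not a drop-in script:

```sql
-- build a fresh copy of the index without blocking writes
CREATE INDEX CONCURRENTLY ix_mapped_transactions_remapped_at_new
    ON mapped_transactions (remapped_at NULLS FIRST);

-- swap it in; DROP INDEX CONCURRENTLY (9.2+) avoids holding
-- an exclusive lock for the duration of the drop
DROP INDEX CONCURRENTLY ix_mapped_transactions_remapped_at;
ALTER INDEX ix_mapped_transactions_remapped_at_new
    RENAME TO ix_mapped_transactions_remapped_at;
```

For a write-heavy table it may also help to make autovacuum more aggressive on this one table so dead entries are reclaimed sooner, e.g. `ALTER TABLE mapped_transactions SET (autovacuum_vacuum_scale_factor = 0.01);` -- the 0.01 here is an illustrative value, not a recommendation tuned to this workload.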

Answer :

I can’t reproduce this on PostgreSQL 9.5, so it may be a regression. Sample data:

CREATE TABLE mapped_transactions(remapped_at)
AS
  SELECT CASE WHEN x::int % 42 = 0 THEN null ELSE x::int END
  FROM generate_series( 1, 40e6 )
    AS gs(x);

CREATE INDEX ON mapped_transactions (remapped_at NULLS FIRST);
VACUUM ANALYZE mapped_transactions;

Here we do NULLS FIRST and it works,

EXPLAIN ANALYZE
SELECT *
FROM mapped_transactions
ORDER BY remapped_at ASC NULLS FIRST LIMIT 10000;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..260.24 rows=10000 width=4) (actual time=0.069..4.308 rows=10000 loops=1)
   ->  Index Only Scan using mapped_transactions_remapped_at_idx on mapped_transactions  (cost=0.56..1038716.81 rows=40000016 width=4) (actual time=0.067..2.740 rows=10000 loops=1)
         Heap Fetches: 0
 Planning time: 0.232 ms
 Execution time: 5.017 ms
(5 rows)

Trying DESC NULLS LAST:

EXPLAIN ANALYZE
SELECT * FROM mapped_transactions
ORDER BY remapped_at DESC NULLS LAST LIMIT 10000;
                                                                                          QUERY PLAN                                                                                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..260.24 rows=10000 width=4) (actual time=0.073..4.429 rows=10000 loops=1)
   ->  Index Only Scan Backward using mapped_transactions_remapped_at_idx on mapped_transactions  (cost=0.56..1038716.81 rows=40000016 width=4) (actual time=0.071..2.865 rows=10000 loops=1)
         Heap Fetches: 0
 Planning time: 0.114 ms
 Execution time: 5.137 ms
(5 rows)

You may have to work up a reproducible test case; it’s not evident from the plans alone what you’re experiencing.

I ran some more tests, and it seems this solution is not going to work for me. The index steadily degrades over time, and unless I reindex (or drop and recreate it) the queries I want to run get slower and slower. There are ways to rebuild the index without blocking other reads/writes, but at this point I am not going to use them because they do not scale well for my workload. I cannot solve this problem with the knowledge I have, so I am taking a different approach: an index on a column that is constant and never gets updated. Thanks everyone for helping me out.
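The "constant column" approach likely works because of HOT (heap-only tuple) updates: an UPDATE can skip creating new index entries only if none of the indexed columns changed. With remapped_at indexed, every update to it inserts a new entry into that index, which is what keeps bloating it. Dropping that index and instead indexing a column that is written once and never updated (assuming created_at behaves that way in this schema) lets updates qualify as HOT and keeps the index compact. A sketch under that assumption:

```sql
-- assumes created_at is set on insert and never updated;
-- with no index on remapped_at, updates to it can be HOT
-- and this index stays compact under heavy update traffic
DROP INDEX CONCURRENTLY ix_mapped_transactions_remapped_at;

CREATE INDEX CONCURRENTLY ix_mapped_transactions_created_at
    ON mapped_transactions (created_at);
```

The trade-off is that queries ordering by remapped_at lose their index, so this only helps if created_at (or another immutable column) can serve the same access pattern.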
