Strange performance of a top-N query with Postgres

Posted on

Question :

I have a table transfer_event from which I need to find the latest n entries for a set of tokens; both n and the tokens are supplied by the user. I am using Postgres 9.6.

The question is that if I use one query to get the top n entries for a list of tokens, the query is a lot slower than all the queries for getting the top n entries for each token individually (55s vs 10s), and I don’t understand what work is done for the extra 45s.

For the combined query, I settled on using a lateral join (the approach using a window function is even slower):

-- There's 50 tokens to look for
with tokens(t) as (values('0x..',..))
select *
  from tokens t
       cross join lateral (
         select *
           from transfer_event c
          where c.token = t.t
            and c.block_range @> 2147483647
          order by c.timestamp desc
          limit 5) x

Here’s the query plan:

 Nested Loop  (cost=1.19..855.09 rows=250 width=250) (actual time=0.120..56313.953 rows=230 loops=1)
   Buffers: shared hit=6567497 read=3119272
   CTE tokens
     ->  Values Scan on "*VALUES*"  (cost=0.00..0.62 rows=50 width=32) (actual time=0.003..0.072 rows=50 loops=1)
   ->  CTE Scan on tokens t  (cost=0.00..1.00 rows=50 width=32) (actual time=0.006..0.168 rows=50 loops=1)
   ->  Limit  (cost=0.56..16.97 rows=5 width=250) (actual time=1119.814..1126.273 rows=5 loops=50)
         Buffers: shared hit=6567497 read=3119272
         ->  Index Scan Backward using attr_6_7_transfer_event_timestamp on transfer_event c  (cost=0.56..1432759.34 rows=436637 width=250) (actual time=1119.814..1126.270 rows=5 loops=50)
               Filter: ((block_range @> 2147483646) AND (token = t.t))
               Rows Removed by Filter: 1875532
               Buffers: shared hit=6567497 read=3119272
 Planning time: 3.309 ms
 Execution time: 56314.050 ms
(13 rows)

This query takes 55s; if I run separate queries, one for each token, going through all of them only takes 10s; my understanding is that the query above is essentially what I am doing manually: loop over each token and execute

select * from transfer_event c
 where c.token = '$tk'
   and c.block_range @> 2147483646
 order by timestamp desc limit 5

Query plan for one example token:

 Limit  (cost=0.56..315.27 rows=5 width=250) (actual time=8.664..10.816 rows=5 loops=1)
   ->  Index Scan Backward using attr_6_7_transfer_event_timestamp on transfer_event c  (cost=0.56..1432759.34 rows=22763 width=250) (actual time=8.663..10.813 rows=5 loops=1)
         Filter: ((block_range @> 2147483646) AND (token = '0xec67005c4e498ec7f55e092bd1d35cbc47c91892'::text))
         Rows Removed by Filter: 18562
 Planning time: 1.437 ms
 Execution time: 10.871 ms
(6 rows)

Additional information based on replies

Thanks to jjanes and Laurenz Albe for commenting on the question; based on the comments additional information:

The timings were taken with warm caches (by running the queries multiple times before the plans that I posted)

For the token-by-token lookup, Postgres does indeed vary the query plan. The one above is used most commonly, but there are two other plans used for a few tokens.

Used for 3 of the tokens:

 Limit  (cost=5284.08..5284.09 rows=5 width=250) (actual time=5.123..5.125 rows=5 loops=1)
   Buffers: shared hit=1766
   ->  Sort  (cost=5284.08..5287.43 rows=1339 width=250) (actual time=5.122..5.122 rows=5 loops=1)
         Sort Key: "timestamp" DESC
         Sort Method: top-N heapsort  Memory: 27kB
         Buffers: shared hit=1766
         ->  Bitmap Heap Scan on transfer_event c  (cost=62.94..5261.84 rows=1339 width=250) (actual time=0.671..3.990 rows=2184 loops=1)
               Recheck Cond: (token = '0x09617f6fd6cf8a71278ec86e23bbab29c04353a7'::text)
               Filter: (block_range @> 2147483646)
               Heap Blocks: exact=1740
               Buffers: shared hit=1763
               ->  Bitmap Index Scan on attr_6_1_transfer_event_token  (cost=0.00..62.61 rows=1339 width=0) (actual time=0.405..0.405 rows=2184 loops=1)
                     Index Cond: (token = '0x09617f6fd6cf8a71278ec86e23bbab29c04353a7'::text)
                     Buffers: shared hit=23
 Planning time: 1.265 ms
 Execution time: 5.222 ms
(16 rows)

Used for 4 of the tokens:

 Limit  (cost=8.59..8.60 rows=1 width=250) (actual time=0.058..0.058 rows=0 loops=1)
   Buffers: shared hit=7
   ->  Sort  (cost=8.59..8.60 rows=1 width=250) (actual time=0.057..0.057 rows=0 loops=1)
         Sort Key: "timestamp" DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=7
         ->  Index Scan using attr_6_1_transfer_event_token on transfer_event c  (cost=0.56..8.58 rows=1 width=250) (actual time=0.028..0.028 rows=0 loops=1)
               Index Cond: (token = '0x0cbe2df57ca9191b64a7af3baa3f946fa7df2f25'::text)
               Filter: (block_range @> 2147483646)
               Buffers: shared hit=4
 Planning time: 1.404 ms
 Execution time: 0.107 ms
(12 rows)

Answer :

You can see that your second query has Rows Removed by Filter: 18562, while at an average, the first query has Rows Removed by Filter: 1875532.

If you run 50 individual queries, the optimizer will optimize each of them individually, and in the cases where only few rows satisfy the filter condition, it will probably choose a different and better execution plan. The query with the lateral join does not have that option: it has to pick a plan that it uses for all 50 loops, and this plan turns out to be bad for some of the values.

One thing you can try is indexing the conditions on transfer_event.block_range and transfer_event.token and making sure that the index on transfer_event.timestamp cannot be used, for example with ORDER BY c.timestamp + INTERVAL '0 days' DESC. I cannot say if that will improve the performance, but it is worth a try.

Leave a Reply

Your email address will not be published. Required fields are marked *