Question:
I am facing some curious cases in my Postgres 9.6 database, mainly for queries that follow this format:
SELECT a.*
FROM a
INNER JOIN b ON a.id = b.a_id
WHERE
a.field1 = '' AND
b.field2 > ..
ORDER BY a.timestamp DESC
LIMIT 10
The EXPLAIN ANALYZE output for this query is:
Limit (cost=0.86..13968.24 rows=10 width=24) (actual time=14933.751..14933.792 rows=10 loops=1)
-> Nested Loop (cost=0.86..6309063.89 rows=4517 width=24) (actual time=14933.751..14933.791 rows=10 loops=1)
-> Index Scan Backward using idx_timestamp on stac (cost=0.43..5229344.35 rows=988799 width=24) (actual time=0.007..2885.512 rows=3535779 loops=1)
Filter: ((field2 > '2020-01-01 18:59:59.999999-05'::timestamp with time zone) AND (field1 = 4))
Rows Removed by Filter: 168
-> Index Scan using id_idx on scene_data (cost=0.43..1.08 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=3535779)
Index Cond: (a.id = b.a_id)
Filter: ((another_field >= '14000'::double precision) AND (another_field = 'Some string'::text))
Rows Removed by Filter: 1
Planning time: 0.322 ms
Execution time: 14933.836 ms
3 ways this query became significantly faster:
- Fencing
SELECT * FROM (SELECT a.*
FROM a
INNER JOIN b ON a.id = b.a_id
WHERE
b.field_a = '...'
AND b.field_b >= 14000
AND a.field_a = 4
AND a.field_b > '2020-01-01T23:59:59.999999Z'
ORDER BY a.timestamp DESC) a
ORDER BY a.timestamp DESC LIMIT 10
EXPLAIN ANALYZE:
Limit (cost=300441.32..300442.57 rows=10 width=353) (actual time=323.171..325.616 rows=10 loops=1)
-> Gather Merge (cost=300441.32..300731.00 rows=2519 width=353) (actual time=323.169..325.610 rows=10 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=299441.31..299447.60 rows=2519 width=353) (actual time=314.241..314.244 rows=10 loops=2)
Sort Key: table_a.timestamp DESC
Sort Method: top-N heapsort Memory: 27kB
Worker 0: Sort Method: top-N heapsort Memory: 35kB
-> Nested Loop (cost=0.99..299299.00 rows=2519 width=353) (actual time=0.117..297.406 rows=41558 loops=2)
-> Parallel Index Scan using idx_table_b_field_a on table_b (cost=0.43..28730.77 rows=37301 width=16) (actual time=0.067..24.940 rows=41558 loops=2)
Index Cond: (b.field_a = '....'::text)
Filter: (b.field_b >= '14000'::double precision)
Rows Removed by Filter: 174
-> Index Scan using table_a_pkey on table_a (cost=0.56..7.25 rows=1 width=353) (actual time=0.006..0.006 rows=1 loops=83116)
Index Cond: (table_a.id = table_b.a_id)
Filter: ((a.field_b > '2020-01-01 18:59:59.999999-05'::timestamp with time zone) AND (a.field_a = 4))
Planning Time: 0.728 ms
Execution Time: 325.703 ms
- Adding a second field to the ORDER BY
SELECT a.*
FROM a
INNER JOIN b ON a.id = b.a_id
WHERE
a.field1 = '' AND
b.field2 > ..
ORDER BY a.timestamp, a.id DESC LIMIT 10
- Increasing the limit to 500, which produces a different sort method/key, such as:
Limit (cost=511622.95..511624.20 rows=500 width=24) (actual time=385.317..385.383 rows=500 loops=1)
-> Sort (cost=511622.95..511634.24 rows=4517 width=24) (actual time=385.315..385.348 rows=500 loops=1)
Sort Key: a.timestamp DESC
Sort Method: top-N heapsort Memory: 64kB
-> Nested Loop (cost=1.12..511397.87 rows=4517 width=24) (actual time=0.028..374.960 rows=83116 loops=1)
-> Index Scan using index_1 on b (cost=0.56..37701.50 rows=68004 width=16) (actual time=0.019..35.351 rows=83116 loops=1)
Index Cond: (field_1 = 'some string'::text)
Filter: (field_3 >= '14000'::double precision)
Rows Removed by Filter: 349
-> Index Scan using a_pkey on stac (cost=0.56..6.96 rows=1 width=24) (actual time=0.004..0.004 rows=1 loops=83116)
Index Cond: (a.id = b.a_id)
Filter: (( > '2020-01-01 18:59:59.999999-05'::timestamp with time zone) AND (some_fild = 4))
Planning time: 0.382 ms
Execution time: 385.440 ms
But why? Why does forcing “more complexity” into my query make the query planner decide “oh, let’s use a heap sort”, and why do these variants become quicker? I would love to understand why they generate faster queries before implementing them. My team and I also intend to migrate to newer Postgres versions, but we’re uncertain where to focus our efforts first to get concrete and understandable results.
Edit:
SELECT *
FROM a
INNER JOIN b
ON b.a_id = a.id --> This is a 1:1 Relationship
WHERE b.field_a = '...' --> ~83465 rows matching in b
AND b.field_b >= 14000 --> ~2938319 rows matching in b
AND a.field_a = 4 --> ~3868810 rows matching in a
AND a.field_b > '2020-01-01T23:59:59.999999Z' --> ~3818223 rows matching in a
ORDER BY observed desc LIMIT 10 OFFSET 0
I noticed a significant loss of performance when running simple counts on field_a and field_b while joining the two tables:
select count(*) from a where a.field_a = 4
[0.5s]
and
select count(*) from a inner join b on a.id = b.a_id where a.field_a = 4
[15s]
Also note that table_a.id and table_b.a_id (the FK) are UUIDs.
Answer:
Simply put, the optimizer believes that, for your current filters, if it just reads the table in the right order (a.timestamp DESC) and joins from there, it won’t take long to find 10 matching rows. In reality, it needed to loop through the top 3,535,779 rows before it found 10 matches.
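As a back-of-the-envelope illustration of why the planner liked that plan (this is only an approximation of its abort-early costing, using the numbers from your first EXPLAIN, not its exact formula):

startup_cost + (total_cost - startup_cost) * LIMIT / estimated_rows
= 0.86 + (6309063.89 - 0.86) * 10 / 4517
≈ 13968   -- the 13968.24 shown on the Limit node

Because it estimates 4,517 matching rows in total, it prices the plan as if only about 10/4517 of the nested loop’s work were needed to satisfy LIMIT 10.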
You haven’t provided stats for (1), but an educated guess would be that changing the query to order in the opposite direction means that it finds matching rows a lot sooner. This would be the case if your other filters are more likely to be true for earlier timestamp values.
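If you want to check that guess, something along these lines shows how the matching rows are distributed over time (column names are copied from the anonymised edit above, so adjust them to your real schema); a heavy skew towards older timestamps would confirm it:

SELECT date_trunc('month', a.timestamp) AS month, count(*) AS matching_rows
FROM a
INNER JOIN b ON b.a_id = a.id
WHERE b.field_a = '...'
  AND b.field_b >= 14000
  AND a.field_a = 4
GROUP BY 1
ORDER BY 1;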
(2) Same as (1): you’re now ordering by timestamp (ascending).
(3) Increasing the number of rows you want it to find linearly increases the number of rows it estimates it needs to loop through in order to find enough matches. Above a certain limit this pushes the cost of that approach past the threshold: the cost of using the filters to drive the query and sorting the results.
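Using the same rough approximation as above, at LIMIT 500 the abort-early estimate becomes

0.86 + (6309063.89 - 0.86) * 500 / 4517 ≈ 698,000

which is now higher than the roughly 511,624 the planner puts on the plan that filters, joins and then does a top-N sort (your LIMIT 500 plan above), so it switches to that plan instead.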
My advice would be to give the optimizer a better idea of how selective your other filters are. If they are correlated with the timestamp (as suggested by (2) and (3)), then this information is going to be very tricky for it to pick up.
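For illustration only (column names are taken from the anonymised edit above, so treat this as a sketch rather than a tested recommendation), two options available on 9.6:

-- 1. Give the planner more detailed selectivity statistics for the filter columns
ALTER TABLE a ALTER COLUMN field_a SET STATISTICS 1000;
ALTER TABLE b ALTER COLUMN field_a SET STATISTICS 1000;
ALTER TABLE b ALTER COLUMN field_b SET STATISTICS 1000;
ANALYZE a;
ANALYZE b;

-- 2. If the filters really are correlated with the timestamp, 9.6 has no way to
--    model that; ordering by an expression hides the index order from the planner,
--    so the backward index-scan plan is no longer available:
-- ORDER BY a.timestamp + interval '0' DESC

The second option works for the same reason your rewrites do: it simply takes the “walk idx_timestamp backwards and hope” plan off the table.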