Question :
I was wondering about the relative performance of the solutions provided in the answers to a question on stackoverflow, I decided to run some tests.
The OP wanted to get the first matching row given a set of conditions with decreasing precedence. Both solutions involved a pseudocolumn, but one (mine) involved multiple SELECT
statements UNION ALL
ed together, while the other one constructed a CASE
expression.
I share my results in the hope that somebody will find this useful.
Answer :
Your test design is flawed. You are testing incorrect results.
I added an answer to the question on SO you are referring to.
In your CASE version you can’t add ORDER BY col1, col2
. Would have to be ORDER BY precedence
. But it would still be incorrect. You would have to ORDER BY
the sum of scores for individual conditions to get rows fulfilling the most conditions first.
Similarly, your UNION ALL version yields incorrect results.
However, none of this seems necessary, there is a simpler and faster solution with UNION ALL
. Refer to my answer to the SO question or use the sqlfiddle to play with
For a table consisting only a few rows, I could not see any difference in the execution time (while the query plans differed, results not shown).
Then I reused a test table from an earlier experiment. This table consists 436421 rows while having the following structure:
Table "public.avg_test"
Column | Type | Modifiers
--------+-----------------------------+-----------
col1 | timestamp without time zone |
col2 | integer |
Now the test runs were the following (the conditions chosen return less than 2 % of all rows without LIMIT
):
CASE version
EXPLAIN ANALYZE SELECT
col1
, col2
, CASE
WHEN col1 = '2012-01-01 12:00:00' THEN 1
WHEN col2 > 98 THEN 2 ELSE 100000
END AS precedence
FROM avg_test
ORDER BY col1, col2
LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=11090.78..11090.78 rows=1 width=12) (actual time=144.559..144.559 rows=1 loops=1)
-> Sort (cost=11090.78..12182.13 rows=436539 width=12) (actual time=144.557..144.557 rows=1 loops=1)
Sort Key: col1, col2
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on avg_test (cost=0.00..8908.09 rows=436539 width=12) (actual time=0.057..88.477 rows=436421 loops=1)
Total runtime: 144.595 ms
(6 rows)
UNION ALL version
EXPLAIN ANALYZE
SELECT
col1
, col2
, 1 AS precedence
FROM avg_test
WHERE col1 = '2012-01-01 12:00:00'
UNION ALL
SELECT
col1
, col2
, 2 AS precedence
FROM avg_test
WHERE col2 > 98
ORDER BY col1, col2
LIMIT 1
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=15766.57..15766.57 rows=1 width=16) (actual time=85.771..85.771 rows=1 loops=1)
-> Sort (cost=15766.57..15788.75 rows=8873 width=16) (actual time=85.769..85.769 rows=1 loops=1)
Sort Key: public.avg_test.col1, public.avg_test.col2
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.00..15722.20 rows=8873 width=16) (actual time=0.056..84.276 rows=8802 loops=1)
-> Append (cost=0.00..15722.20 rows=8873 width=16) (actual time=0.056..83.153 rows=8802 loops=1)
-> Seq Scan on avg_test (cost=0.00..7816.74 rows=99 width=12) (actual time=0.056..39.860 rows=101 loops=1)
Filter: (col1 = '2012-01-01 12:00:00'::timestamp without time zone)
-> Seq Scan on avg_test (cost=0.00..7816.74 rows=8774 width=12) (actual time=0.046..42.363 rows=8701 loops=1)
Filter: (col2 > 98)
Total runtime: 85.815 ms
(11 rows)
(Both were run several times to exclude caching effects.)
In this setup the UNION ALL
was faster because it hit much less rows – this is reflected in both the sequential scans and the sort.
Now I created two indexes:
CREATE INDEX idx_avg_1 ON avg_test (col1, col2);
CREATE INDEX idx_avg_2 ON avg_test (col2);
ANALYZE avg_test;
The subsequent runs gave very different results:
CASE version with indexes
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.05 rows=1 width=12) (actual time=0.024..0.025 rows=1 loops=1)
-> Index Scan using idx_avg_1 on avg_test (cost=0.00..21105.89 rows=436421 width=12) (actual time=0.023..0.023 rows=1 loops=1)
Total runtime: 0.056 ms
(3 rows)
UNION ALL version with indexes
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3095.29..3095.30 rows=1 width=16) (actual time=12.037..12.037 rows=1 loops=1)
-> Sort (cost=3095.29..3124.87 rows=11832 width=16) (actual time=12.036..12.036 rows=1 loops=1)
Sort Key: public.avg_test.col1, public.avg_test.col2
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.00..3036.13 rows=11832 width=16) (actual time=0.026..10.181 rows=8802 loops=1)
-> Append (cost=0.00..3036.13 rows=11832 width=16) (actual time=0.025..8.374 rows=8802 loops=1)
-> Index Scan using idx_avg_1 on avg_test (cost=0.00..187.92 rows=99 width=12) (actual time=0.025..0.055 rows=101 loops=1)
Index Cond: (col1 = '2012-01-01 12:00:00'::timestamp without time zone)
-> Bitmap Heap Scan on avg_test (cost=223.23..2729.89 rows=11733 width=12) (actual time=1.997..7.093 rows=8701 loops=1)
Recheck Cond: (col2 > 98)
-> Bitmap Index Scan on idx_avg_2 (cost=0.00..220.30 rows=11733 width=0) (actual time=1.433..1.433 rows=8701 loops=1)
Index Cond: (col2 > 98)
Total runtime: 12.105 ms
(13 rows)
This is a huge difference, and this time the CASE
was the winner. An execution plan couldn’t be any simpler than this…