As a follow-up to this question, I have one of my own.
The original question concerns the use of a
CASE statement with > than 100 options and the statement has to be used in 4 places – so obviously the SQL is pretty hairy. The OP’s question concerned SQL Server 2012, my question, however, is about PostgreSQL.
In my answer, I proposed the use of a
VIEW as a “one-stop-shop” solution – i.e. declare the
VIEW once, use it anywhere – and this applies for any query in the future also and any variant thereof.
So, the OP now has 5 options:
I excluded the option of changing the underlying data since, frequently in this forum, consultants/DBA’s/programmers are not allowed change underlying data – makes the answers more interesting also!
CASE expression with > 100 options (x4) is horribly cumbersome and complex – but when is it a good idea to use
CASE and at what point does it become a minus rather than a plus?
In my opinion (and not just because it’s my answer!), a
VIEW is the optimal solution – it’s simple, will work for all RDBMS’s and is permanent and will work for all queries now and into the future should the OP wish to modify the query.
JOIN LATERAL construct will work also as a kind of derived table, which is pretty much what a
CTE is also. They can both be used down the line in the same query.
Which of the 5 approaches is better/best and at what point does the technical (ease of use, speed, query plan optimisation) tilt in the particular solution’s favour?
I would use a translation table in a
LATERAL subquery. Demo (Postgres 10+):
CREATE TABLE ac_translate ( ord_nr int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY , like_pattern text NOT NULL , target text NOT NULL ); INSERT INTO ac_translate(like_pattern, target) VALUES ('AIR NEW Z%' , 'AIR NEW ZEALAND') -- rows in order of precedence! , ('AIR BP%' , 'AIR BP') , ('ADDICTION ADVICE%', 'ADDICTION ADVICE') , ('AIA%' , 'AIA') ;
ord_nr to adjust priorities.
SELECT COALESCE(act.target, ac.accountName) AS accountname , SUM(ac.charge_amount) AS gstexcl FROM account_code ac LEFT JOIN LATERAL ( SELECT a1.target FROM ac_translate a1 WHERE ac.accountname LIKE a1.like_pattern ORDER BY a1.ord_nr LIMIT 1 ) act ON true GROUP BY 1;
Or with a correlated subquery:
SELECT COALESCE( (SELECT a1.target FROM ac_translate a1 WHERE ac.accountname LIKE a1.like_pattern ORDER BY a1.ord_nr LIMIT 1), ac.accountName) AS accountname , SUM(ac.charge_amount) AS sum_amount FROM account_code ac GROUP BY 1;
This is easy to handle, keeps the long list of options out of the code and puts it into a table where it can be maintained properly. And is moderately fast.
We can’t easily use a plain
LEFT JOIN ac_translate since
CASE traverses patterns in order to return the single, first match. We can’t just join to a set, that might return multiple matches if one pattern is the prefix of the other, like ‘AIR%’ and ‘AIR N%’. So we use an ordering number in the translation table to prioritize matches in the subquery.
ELSE clause in the referenced question resolves to the original value. That’s implemented with
COALESCE here. Basically, this combines the virtues of the top two answers over there.
To top it off, I slipped in
GROUP BY 1 as another way to avoid repeating lengthy expressions (which is not actually needed here any more). See:
Performance deteriorates with the number of rows in the translation table since Postgres is forced to walk through all of them sequentially and evaluate the
LIKE expression. If that’s not fast enough any more, we need index support, but the expression is not “sargable” – the expression we need to index is to the right of the operator and there is no
There is a workaround, though. My example requires that patterns have at least 3 leading characters (3 is my arbitrary pick). Add a
CHECK constraint in the translation table to enforce this rule, and an expression index on the leading trigram:
CREATE INDEX ac_translate_left_idx ON ac_translate (left(like_pattern, 3));
Adapt the query:
SELECT COALESCE(act.target, ac.accountName) AS accountname , SUM(ac.charge_amount) AS gstexcl FROM account_code ac LEFT JOIN LATERAL ( SELECT a1.target FROM ac_translate a1 WHERE left(ac.accountname, 3) = left(a1.like_pattern, 3) AND ac.accountname LIKE a1.like_pattern ORDER BY a1.ord_nr LIMIT 1 ) act ON true GROUP BY 1;
With enough rows in the translation table (and favorable estimates and cost settings) Postgres will use a very fast index scan to narrow it down to the few candidates (if any) and only filter the rest with the
LIKE expression. Should scale just fine. I added the
EXPLAIN output to the fiddle as proof of concept: