Question:
As a follow-up to this question, I have one of my own.
The original question concerns the use of a CASE statement with more than 100 options, and the statement has to be used in 4 places – so obviously the SQL is pretty hairy. The OP's question concerned SQL Server 2012; my question, however, is about PostgreSQL.
In my answer, I proposed the use of a VIEW as a “one-stop-shop” solution – i.e. declare the VIEW once, use it anywhere – and this applies to any future query and any variant thereof.
Another poster (@AndriyM) proposed the use of a CROSS APPLY to resolve the issue, which is another solution. The PostgreSQL syntax is JOIN LATERAL.
I then added a CTE (Common Table Expression) to my original answer as yet another possible solution.
So, the OP now has 5 options:
CASE
VIEW
JOIN LATERAL (CROSS APPLY for SQL Server)
CTE
Separate table
I excluded the option of changing the underlying data since, frequently in this forum, consultants/DBAs/programmers are not allowed to change the underlying data – it also makes the answers more interesting!
Obviously, a CASE expression with more than 100 options (×4) is horribly cumbersome and complex – but when is it a good idea to use CASE, and at what point does it become a minus rather than a plus?
In my opinion (and not just because it's my answer!), a VIEW is the optimal solution – it's simple, works in any RDBMS, and is permanent: it will serve all queries now and in the future, should the OP wish to modify the query.
The JOIN LATERAL construct will also work as a kind of derived table, which is pretty much what a CTE is too. Both can be used further down in the same query.
Which of the 5 approaches is best, and at what point do the technical factors (ease of use, speed, query-plan optimisation) tilt in a particular solution's favour?
Answer:
I would use a translation table in a LATERAL subquery. Demo (Postgres 10+):
CREATE TABLE ac_translate (
ord_nr int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
, like_pattern text NOT NULL
, target text NOT NULL
);
INSERT INTO ac_translate(like_pattern, target) VALUES
('AIR NEW Z%' , 'AIR NEW ZEALAND') -- rows in order of precedence!
, ('AIR BP%' , 'AIR BP')
, ('ADDICTION ADVICE%', 'ADDICTION ADVICE')
, ('AIA%' , 'AIA')
;
Manipulate ord_nr to adjust priorities.
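For instance, to pull one pattern ahead of all the others, renumber it (a sketch against the demo table above; the choice of 0 is arbitrary, any value below the competing ord_nr values works):

```sql
-- Promote 'AIA%' to top priority:
-- lower ord_nr wins because the subquery does ORDER BY ord_nr LIMIT 1
UPDATE ac_translate
SET    ord_nr = 0
WHERE  like_pattern = 'AIA%';
```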
Query:
SELECT COALESCE(act.target, ac.accountname) AS accountname
     , SUM(ac.charge_amount) AS gstexcl
FROM   account_code ac
LEFT   JOIN LATERAL (
   SELECT a1.target
   FROM   ac_translate a1
   WHERE  ac.accountname LIKE a1.like_pattern
   ORDER  BY a1.ord_nr
   LIMIT  1
   ) act ON true
GROUP  BY 1;
Or with a correlated subquery:
SELECT COALESCE(
          (SELECT a1.target
           FROM   ac_translate a1
           WHERE  ac.accountname LIKE a1.like_pattern
           ORDER  BY a1.ord_nr
           LIMIT  1), ac.accountname) AS accountname
     , SUM(ac.charge_amount) AS sum_amount
FROM   account_code ac
GROUP  BY 1;
This is easy to handle, keeps the long list of options out of the code, and puts it into a table where it can be maintained properly – and it is moderately fast.
We can't easily use a plain LEFT JOIN ac_translate, since CASE traverses patterns in order and returns the single, first match. We can't just join to a set: that might return multiple matches if one pattern is a prefix of another, like 'AIR%' and 'AIR N%'. So we use an ordering number in the translation table to prioritize matches in the subquery.
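To illustrate the hazard with a plain join (a sketch; the overlapping pattern 'AIR N%' is a hypothetical addition to the demo data):

```sql
-- If both 'AIR NEW Z%' and 'AIR N%' were in ac_translate,
-- an account named 'AIR NEW ZEALAND LTD' would match twice:
SELECT ac.accountname, a1.target
FROM   account_code ac
LEFT   JOIN ac_translate a1 ON ac.accountname LIKE a1.like_pattern;
-- One input row can become several output rows,
-- which would inflate SUM(ac.charge_amount) in the aggregate.
```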
The ELSE clause in the referenced question resolves to the original value. That's implemented with COALESCE here. Basically, this combines the virtues of the top two answers over there.
To top it off, I slipped in GROUP BY 1 as another way to avoid repeating lengthy expressions (which is not actually needed here any more).
Speed
Performance deteriorates with the number of rows in the translation table, since Postgres is forced to walk through all of them sequentially and evaluate the LIKE expression for each. If that's not fast enough any more, we need index support, but the expression is not “sargable” – the expression we need to index is to the right of the operator, and there is no COMMUTATOR for LIKE.
There is a workaround, though. My example requires that patterns have at least 3 leading literal characters (3 is my arbitrary pick). Add a CHECK constraint to the translation table to enforce this rule, and an expression index on the leading trigram:
CREATE INDEX ac_translate_left_idx ON ac_translate (left(like_pattern, 3));
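The CHECK constraint itself is not spelled out above; one way to enforce “at least 3 literal leading characters” might be (a sketch; the constraint name is my invention):

```sql
ALTER TABLE ac_translate
ADD CONSTRAINT like_pattern_has_3_literal_leading_chars
CHECK (length(like_pattern) >= 3
       AND left(like_pattern, 3) !~ '[%_]');  -- no LIKE wildcards in first 3 chars
```

With that guarantee in place, left(accountname, 3) = left(like_pattern, 3) is a safe, indexable pre-filter for every pattern in the table.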
Adapt the query:
SELECT COALESCE(act.target, ac.accountname) AS accountname
     , SUM(ac.charge_amount) AS gstexcl
FROM   account_code ac
LEFT   JOIN LATERAL (
   SELECT a1.target
   FROM   ac_translate a1
   WHERE  left(ac.accountname, 3) = left(a1.like_pattern, 3)
   AND    ac.accountname LIKE a1.like_pattern
   ORDER  BY a1.ord_nr
   LIMIT  1
   ) act ON true
GROUP  BY 1;
With enough rows in the translation table (and favorable estimates and cost settings), Postgres will use a very fast index scan to narrow things down to the few candidates (if any) and only filter those with the LIKE expression. It should scale just fine. I added the EXPLAIN output to the fiddle as proof of concept:
db<>fiddle here