Can the subquery be simplified/optimized?

Posted on

Question :

Here’s a simplified version of a query I run. I’ve removed a bunch of key/value pairs for simplicity. The constraint is that I can’t change anything outside of the ARRAY function; but the ARRAY function and everything inside is fair game. I want to remove the subqueries if at all possible; and if not speed up the query as much as possible.

SELECT json_build_object('last', max(day), 'missing_days', ARRAY(
             SELECT distinct to_char(dates, 'YYYY-MM-DD')
             FROM generate_series('2009-12-20', CURRENT_DATE, interval '1 day') AS dates
             WHERE dates NOT IN (SELECT day FROM records))
       ) AS jsobj
FROM records;

I thought I remembered from my database design classes that you should be able to use columns from the records table in the sub-query but I could be mistaken, or that might be database dependent, or I could have been trying the wrong syntax; anyway it didn’t work when I tried it.

Answer :

Yes, you can refer to rows of higher levels in a subquery. But that would not solve your problem since you need to consider the whole table, not just the current row.

It’s not completely clear what the query is trying to achieve and data types (table definition!) are also missing. Be aware that your invocation of generate_series() returns timestamptz, which is cast to date (I assume) and this cast depends on your current timezone setting. Results for the same query may be different with a different setting for the calling session.

It’s a bit cleaner and cheaper to work with timestamp to begin with. See:

And NOT IN (<subquery>) is almost always a bad choice. There are faster options which are less error-prone. NOT IN carries traps for NULL values.

Each of these queries should be faster. The winner depends on data distribution in your table. Test both:

Query 1 (returns text like your original):

SELECT to_char(day, 'YYYY-MM-DD')
FROM   generate_series(timestamp '2015-12-20', LOCALTIMESTAMP, interval '1 day') d(day)
WHERE  NOT EXISTS (SELECT 1 FROM records WHERE day = d.day);

Query 2 (returns date):

SELECT generate_series(timestamp '2015-12-20', LOCALTIMESTAMP, '1 day')::date AS day
EXCEPT ALL
SELECT day FROM records;   -- data types must match

Leave a Reply

Your email address will not be published. Required fields are marked *