Question :
I have a query that uses a CTE which contains millions of rows. I plan to call this query many times, and the CTE returned each time will have most of the same rows. Is it possible for me to cache the CTE somehow so that only new rows need to be calculated?
The (slightly simplified) query is:
WITH vals AS (
SELECT '2013-08-01 0:00'::timestamp AS frame_start,
'2013-09-01 0:00'::timestamp AS frame_end,
'1 day'::interval AS interval_length
), intervals AS (
SELECT tsrange(start_time,
lead(start_time, 1, frame_end) OVER (ORDER BY start_time NULLS FIRST)) AS time_range
FROM (
SELECT generate_series(frame_start, frame_end, interval_length) AS start_time,
frame_end
FROM vals
) _
WHERE start_time < frame_end
), market_trades_ts AS (
SELECT time_range, td.id
FROM intervals i
LEFT JOIN market_trades td
ON td.timestamp >= COALESCE(lower(i.time_range), '-infinity') AND td.timestamp < COALESCE(upper(i.time_range), 'infinity')
)
SELECT time_range, count(*) AS agg
FROM market_trades_ts td
GROUP BY time_range
ORDER BY time_range;
It would be great if market_trades_ts
could be cached so that, for any intervals which it’s seen before before, it can pull the previous result set, and then take its union with the new rows (for intervals which it hasn’t seen before).
Is this possible? It seems like it will speed up my query dramatically.
Answer :
PostgreSQL’s CTE’s materialize results – they run the CTE term once and cache the output for the duration of the query. So it’s already doing what you want.
CTE results cannot be cached between queries. If you want that, you should instead CREATE TEMPORARY TABLE AS SELECT ...
.