Question:
I am running the following query:
SELECT m.entityid,
m.alias::text AS alias,
m.measurement_ts,
m.volume_m3::double precision AS volume_m3,
m.pressure_mca,
m.battery_volt,
(m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision / NULLIF(date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)), 0::double precision) * 3600::double precision AS flow,
(m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision AS volumedelta,
date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)) AS seconds_since_last
FROM openiot_json.all_pde_data m
where measurement_ts BETWEEN '2021-01-18T20:00:31.69Z' AND '2021-04-18T19:00:31.69Z' and alias = '1002149751'
order by "measurement_ts" desc
And that query runs fine, with the following execution plan:
Sort  (cost=936.08..936.67 rows=236 width=112) (actual time=64.371..64.749 rows=4334 loops=1)
  Sort Key: addt.measurement_ts DESC
  Sort Method: quicksort  Memory: 802kB
  ->  WindowAgg  (cost=912.03..926.78 rows=236 width=112) (actual time=51.539..62.264 rows=4334 loops=1)
        ->  Sort  (cost=912.03..912.62 rows=236 width=68) (actual time=51.514..51.867 rows=4334 loops=1)
              Sort Key: ((pts.sid)::text), addt.measurement_ts
              Sort Method: quicksort  Memory: 802kB
              ->  Nested Loop  (cost=13.03..902.73 rows=236 width=68) (actual time=1.992..45.831 rows=4334 loops=1)
                    ->  Index Scan using ppe_to_sigfox_alias_sid_start_date on ppe_to_sigfox pts  (cost=0.27..8.29 rows=1 width=56) (actual time=0.031..0.036 rows=1 loops=1)
                          Index Cond: ((alias)::text = '1002149751'::text)
                    ->  Bitmap Heap Scan on all_devices_data_table addt  (cost=12.76..892.07 rows=237 width=48) (actual time=1.955..43.490 rows=4334 loops=1)
                          Recheck Cond: ((entityid = (pts.sid)::text) AND (measurement_ts >= pts."startDate") AND (measurement_ts <= pts."endDate") AND (measurement_ts >= '2021-01-18 21:00:31.69+01'::timestamp with time zone) AND (measurement_ts <= '2021-04-18 21:00:31.69+02'::timestamp with time zone))
                          Heap Blocks: exact=3803
                          ->  Bitmap Index Scan on all_devices_data_table_etdesc  (cost=0.00..12.70 rows=237 width=0) (actual time=1.520..1.520 rows=4334 loops=1)
                                Index Cond: ((entityid = (pts.sid)::text) AND (measurement_ts >= pts."startDate") AND (measurement_ts <= pts."endDate") AND (measurement_ts >= '2021-01-18 21:00:31.69+01'::timestamp with time zone) AND (measurement_ts <= '2021-04-18 21:00:31.69+02'::timestamp with time zone))
Planning Time: 0.636 ms
Execution Time: 65.103 ms
So far, so good. Now, if you look at the original query, you can see some computed metrics in the SELECT list which I don’t want to keep copying and pasting around. So I create a VIEW that does this for me:
CREATE OR REPLACE VIEW openiot_json.measurement
AS SELECT m.entityid,
m.alias::text AS alias,
m.measurement_ts,
m.volume_m3::double precision AS volume_m3,
m.pressure_mca,
m.battery_volt,
(m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision / NULLIF(date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)), 0::double precision) * 3600::double precision AS flow,
(m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision AS volumedelta,
date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)) AS seconds_since_last
FROM openiot_json.all_pde_data m;
Now, if I try the same query again, but against the view:
SELECT *
FROM openiot_json.measurement
WHERE measurement_ts BETWEEN '2021-01-18T20:00:31.69Z' AND '2021-04-18T19:00:31.69Z' AND alias = '1002149751'
ORDER BY measurement_ts DESC
I get much worse performance, because the query plan changes:
Sort  (cost=123776.16..123777.61 rows=581 width=112) (actual time=18810.436..18810.910 rows=4334 loops=1)
  Sort Key: measurement.measurement_ts DESC
  Sort Method: quicksort  Memory: 802kB
  ->  Subquery Scan on measurement  (cost=104880.52..123749.48 rows=581 width=112) (actual time=18385.000..18808.447 rows=4334 loops=1)
        Filter: ((measurement.measurement_ts >= '2021-01-18 21:00:31.69+01'::timestamp with time zone) AND (measurement.measurement_ts <= '2021-04-18 21:00:31.69+02'::timestamp with time zone) AND (measurement.alias = '1002149751'::text))
        Rows Removed by Filter: 2152725
        ->  WindowAgg  (cost=104880.52..119621.90 rows=235862 width=112) (actual time=12030.287..18600.273 rows=2157059 loops=1)
              ->  Sort  (cost=104880.52..105470.18 rows=235862 width=68) (actual time=12030.232..13001.832 rows=2157059 loops=1)
                    Sort Key: ((pts.sid)::text), addt.measurement_ts
                    Sort Method: external merge  Disk: 160488kB
                    ->  Gather  (cost=1017.07..74155.19 rows=235862 width=68) (actual time=29.601..817.669 rows=2157059 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Hash Join  (cost=17.07..49568.99 rows=98276 width=68) (actual time=21.239..517.195 rows=719020 loops=3)
                                Hash Cond: (addt.entityid = (pts.sid)::text)
                                Join Filter: ((addt.measurement_ts >= pts."startDate") AND (addt.measurement_ts <= pts."endDate"))
                                Rows Removed by Join Filter: 3001
                                ->  Parallel Seq Scan on all_devices_data_table addt  (cost=0.00..32967.84 rows=884484 width=48) (actual time=0.041..147.070 rows=707600 loops=3)
                                ->  Hash  (cost=10.92..10.92 rows=492 width=56) (actual time=21.029..21.031 rows=496 loops=3)
                                      Buckets: 1024  Batches: 1  Memory Usage: 53kB
                                      ->  Seq Scan on ppe_to_sigfox pts  (cost=0.00..10.92 rows=492 width=56) (actual time=20.783..20.898 rows=496 loops=3)
Planning Time: 1.008 ms
JIT:
  Functions: 55
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 10.280 ms, Inlining 0.000 ms, Optimization 3.027 ms, Emission 58.193 ms, Total 71.500 ms
Execution Time: 18882.123 ms
Looking at the query plans, the first query filters rows first and then sorts. The second query, however, sorts first (which takes a long time because it sorts on disk) and only then filters.
Why is this and how can I fix it, so that I get the nice query plan and performance using the view?
Just as a note, the openiot_json.all_pde_data used in the original query is also a VIEW, built on the underlying tables:
CREATE OR REPLACE VIEW openiot_json.all_pde_data
AS SELECT pts.sid::text AS entityid,
pts.alias,
addt.recvtime,
addt.measurement_ts,
addt.volume_m3 + pts."offset" AS volume_m3,
addt.pressure_mca,
addt.battery_volt,
pts."offset"
FROM openiot_json.all_devices_data_table addt
JOIN openiot_json.ppe_to_sigfox pts ON addt.entityid = pts.sid::text AND addt.measurement_ts >= pts."startDate" AND addt.measurement_ts <= pts."endDate";
I am using the following version:
PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
Answer:
The two queries are not equivalent.
The first query, without the view, runs the window functions after applying the WHERE clause.
The second query, on the view, runs the window functions before applying the WHERE clause.
This can lead to different results, and (obviously) to different query plans.
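To make the semantic difference concrete, here is a minimal sketch with a made-up table t (not part of your schema): the first row that survives the filter gets a different lag() result depending on whether the filter is applied before or after the window function.

-- Hypothetical illustration only; t and its rows are invented.
CREATE TEMP TABLE t (id int, val int);
INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);

-- Filter first, window function second (like the stand-alone query):
SELECT id, val - lag(val) OVER (ORDER BY id) AS delta
FROM t
WHERE id >= 2;
-- id = 2 gets delta = NULL, because row 1 was removed before lag() ran.

-- Window function first, filter second (like the query on the view):
SELECT *
FROM (SELECT id, val - lag(val) OVER (ORDER BY id) AS delta FROM t) sub
WHERE id >= 2;
-- id = 2 gets delta = 10, because lag() still saw the filtered-out row 1.

Because the results can differ, the planner cannot simply push your WHERE conditions below the WindowAgg of the view; that is why the second plan computes the window functions for all 2,157,059 rows and only then discards everything but the 4,334 rows you asked for.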
Your second view openiot_json.all_pde_data does not use window functions (or any other features) that would have a similar effect, so the planner can flatten it into the outer query and apply the filters early.
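For contrast, here is a rough sketch (the table and view below are invented names, not from your schema) of what happens with a view that contains no window functions: the planner merges it into the outer query, so the WHERE condition reaches the scan of the base table directly.

-- Hypothetical example only; sensor and sensor_v are invented names.
CREATE TABLE sensor (id int PRIMARY KEY, val numeric);
CREATE VIEW sensor_v AS
  SELECT id, val * 2 AS doubled
  FROM sensor;                      -- no window functions, no aggregates

EXPLAIN SELECT * FROM sensor_v WHERE id = 42;
-- The view is flattened: the plan is a single scan of sensor with the
-- id = 42 condition applied at that scan, and there is no Subquery Scan
-- node with a Filter on top, unlike in your second plan.

That is exactly what happens with openiot_json.all_pde_data in your first plan: the join conditions and filters end up directly on the base tables.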
Consider the sequence of events in a SELECT query as laid out here: