Different (much worse) query plan when using one view

Posted on

Question :

I am running the following query:

SELECT m.entityid,
    m.alias::text AS alias,
    m.measurement_ts,
    m.volume_m3::double precision AS volume_m3,
    m.pressure_mca,
    m.battery_volt,
    (m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision / NULLIF(date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)), 0::double precision) * 3600::double precision AS flow,
    (m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision AS volumedelta,
    date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)) AS seconds_since_last
   FROM openiot_json.all_pde_data m
where measurement_ts BETWEEN '2021-01-18T20:00:31.69Z' AND '2021-04-18T19:00:31.69Z' and alias = '1002149751'
order by "measurement_ts" desc

And that query runs fine, with the following execution plan:

Sort  (cost=936.08..936.67 rows=236 width=112) (actual time=64.371..64.749 rows=4334 loops=1)
  Sort Key: addt.measurement_ts DESC
  Sort Method: quicksort  Memory: 802kB
  ->  WindowAgg  (cost=912.03..926.78 rows=236 width=112) (actual time=51.539..62.264 rows=4334 loops=1)
        ->  Sort  (cost=912.03..912.62 rows=236 width=68) (actual time=51.514..51.867 rows=4334 loops=1)
              Sort Key: ((pts.sid)::text), addt.measurement_ts
              Sort Method: quicksort  Memory: 802kB
              ->  Nested Loop  (cost=13.03..902.73 rows=236 width=68) (actual time=1.992..45.831 rows=4334 loops=1)
                    ->  Index Scan using ppe_to_sigfox_alias_sid_start_date on ppe_to_sigfox pts  (cost=0.27..8.29 rows=1 width=56) (actual time=0.031..0.036 rows=1 loops=1)
                          Index Cond: ((alias)::text = '1002149751'::text)
                    ->  Bitmap Heap Scan on all_devices_data_table addt  (cost=12.76..892.07 rows=237 width=48) (actual time=1.955..43.490 rows=4334 loops=1)
                          Recheck Cond: ((entityid = (pts.sid)::text) AND (measurement_ts >= pts."startDate") AND (measurement_ts <= pts."endDate") AND (measurement_ts >= '2021-01-18 21:00:31.69+01'::timestamp with time zone) AND (measurement_ts <= '2021-04-18 21:00:31.69+02'::timestamp with time zone))
                          Heap Blocks: exact=3803
                          ->  Bitmap Index Scan on all_devices_data_table_etdesc  (cost=0.00..12.70 rows=237 width=0) (actual time=1.520..1.520 rows=4334 loops=1)
                                Index Cond: ((entityid = (pts.sid)::text) AND (measurement_ts >= pts."startDate") AND (measurement_ts <= pts."endDate") AND (measurement_ts >= '2021-01-18 21:00:31.69+01'::timestamp with time zone) AND (measurement_ts <= '2021-04-18 21:00:31.69+02'::timestamp with time zone))
Planning Time: 0.636 ms
Execution Time: 65.103 ms

So far, so good. Now, if you look at the original query, you can see some computed metrics in the SELECT list which I don’t want to keep copying and pasting around. So I create a VIEW that does this for me:

CREATE OR REPLACE VIEW openiot_json.measurement
AS SELECT m.entityid,
    m.alias::text AS alias,
    m.measurement_ts,
    m.volume_m3::double precision AS volume_m3,
    m.pressure_mca,
    m.battery_volt,
    (m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision / NULLIF(date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)), 0::double precision) * 3600::double precision AS flow,
    (m.volume_m3 - lag(m.volume_m3) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts))::double precision AS volumedelta,
    date_part('epoch'::text, m.measurement_ts - lag(m.measurement_ts) OVER (PARTITION BY m.entityid ORDER BY m.measurement_ts)) AS seconds_since_last
   FROM openiot_json.all_pde_data m;

Now, if I try the same query again, but against the view:

SELECT
    *
FROM
    openiot_json.measurement
WHERE
  measurement_ts BETWEEN '2021-01-18T20:00:31.69Z' AND '2021-04-18T19:00:31.69Z' and alias = '1002149751'
ORDER by measurement_ts desc

I get a much worse performance, because the query plan changes:

Sort  (cost=123776.16..123777.61 rows=581 width=112) (actual time=18810.436..18810.910 rows=4334 loops=1)
  Sort Key: measurement.measurement_ts DESC
  Sort Method: quicksort  Memory: 802kB
  ->  Subquery Scan on measurement  (cost=104880.52..123749.48 rows=581 width=112) (actual time=18385.000..18808.447 rows=4334 loops=1)
        Filter: ((measurement.measurement_ts >= '2021-01-18 21:00:31.69+01'::timestamp with time zone) AND (measurement.measurement_ts <= '2021-04-18 21:00:31.69+02'::timestamp with time zone) AND (measurement.alias = '1002149751'::text))
        Rows Removed by Filter: 2152725
        ->  WindowAgg  (cost=104880.52..119621.90 rows=235862 width=112) (actual time=12030.287..18600.273 rows=2157059 loops=1)
              ->  Sort  (cost=104880.52..105470.18 rows=235862 width=68) (actual time=12030.232..13001.832 rows=2157059 loops=1)
                    Sort Key: ((pts.sid)::text), addt.measurement_ts
                    Sort Method: external merge  Disk: 160488kB
                    ->  Gather  (cost=1017.07..74155.19 rows=235862 width=68) (actual time=29.601..817.669 rows=2157059 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Hash Join  (cost=17.07..49568.99 rows=98276 width=68) (actual time=21.239..517.195 rows=719020 loops=3)
                                Hash Cond: (addt.entityid = (pts.sid)::text)
                                Join Filter: ((addt.measurement_ts >= pts."startDate") AND (addt.measurement_ts <= pts."endDate"))
                                Rows Removed by Join Filter: 3001
                                ->  Parallel Seq Scan on all_devices_data_table addt  (cost=0.00..32967.84 rows=884484 width=48) (actual time=0.041..147.070 rows=707600 loops=3)
                                ->  Hash  (cost=10.92..10.92 rows=492 width=56) (actual time=21.029..21.031 rows=496 loops=3)
                                      Buckets: 1024  Batches: 1  Memory Usage: 53kB
                                      ->  Seq Scan on ppe_to_sigfox pts  (cost=0.00..10.92 rows=492 width=56) (actual time=20.783..20.898 rows=496 loops=3)
Planning Time: 1.008 ms
JIT:
  Functions: 55
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 10.280 ms, Inlining 0.000 ms, Optimization 3.027 ms, Emission 58.193 ms, Total 71.500 ms
Execution Time: 18882.123 ms

Looking at the query plans, the first query is first filtering rows, then sorting. However, the second query is first sorting (which takes a lot of time cause it sorts on disk) and then filtering.

Why is this and how can I fix it, so that I get the nice query plan and performance using the view?

Just as a note, the openiot_json.all_pde_data used in the original query is also a VIEW, which uses tables:

CREATE OR REPLACE VIEW openiot_json.all_pde_data
AS SELECT pts.sid::text AS entityid,
    pts.alias,
    addt.recvtime,
    addt.measurement_ts,
    addt.volume_m3 + pts."offset" AS volume_m3,
    addt.pressure_mca,
    addt.battery_volt,
    pts."offset"
   FROM openiot_json.all_devices_data_table addt
     JOIN openiot_json.ppe_to_sigfox pts ON addt.entityid = pts.sid::text AND addt.measurement_ts >= pts."startDate" AND addt.measurement_ts <= pts."endDate";

I am using the following version:

PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

Answer :

The two queries are not equivalent.

The first query without view runs window functions after applying the WHERE clause.
The second query on the view runs window functions before applying the WHERE clause.

This can lead to different results. And (obviously) to different query plans.

Your second view peniot_json.all_pde_data does not use window functions (or any other features) that would have a similar effect.

Consider the sequence of events in a SELECT query as laid out here:

Leave a Reply

Your email address will not be published. Required fields are marked *