Question :
my question is closely related to the following question asked in this thread generate_series for multiple record types in postgresql. The only difference is, that instead of only inserting missing values, I want to fill the gaps with the last previous value. To illustrate it with the same example. I have two tables.
CREATE TABLE pests(id,name)
AS VALUES
(1,'Thrip'),
(2,'Fungus Gnosts');
CREATE TABLE pest_counts(id,pest_id,date,count)
AS VALUES
(1,1,'2015-01-01'::date,14),
(2,2,'2015-01-02'::date,5);
The desired output should be nearly identical. However, I want to fill missing values with previous values, if possible. So the second entry for Thrip should be 14, since a previous value exists and the first value of Fungus Gnats should be 0, since no previous value exists. Is it possible to do this using Postgres?
expected results
name | date | count
-------------+------------+-------
Thrip | 2015-01-01 | 14
Thrip | 2015-01-02 | **14** <- fill with existing previous value.
....
Fungus Gnats | 2015-01-01 | 0
Fungus Gnats | 2015-01-02 | 5
...
Answer :
Example data with two rows added:
insert into pest_counts(pest_id, date, count) values
(1, '2015-01-03', 10),
(2, '2015-01-04', 7);
select * from pest_counts;
id | pest_id | date | count
----+---------+------------+-------
1 | 1 | 2015-01-01 | 14
2 | 2 | 2015-01-02 | 5
3 | 1 | 2015-01-03 | 10
4 | 2 | 2015-01-04 | 7
(4 rows)
Use count()
as a window function to generate groups. Each not null value starts a new group:
select name, day, count, count(count) over (partition by name order by day) as grp
from (
select id, name, day::date
from pests
cross join generate_series('2015-01-01'::timestamp, '2015-01-04', '1 day') day
) d
left join pest_counts on d.id = pest_id and day = date;
name | day | count | grp
--------------+------------+-------+-----
fungus gnats | 2015-01-01 | | 0
fungus gnats | 2015-01-02 | 5 | 1
fungus gnats | 2015-01-03 | | 1
fungus gnats | 2015-01-04 | 7 | 2
thrip | 2015-01-01 | 14 | 1
thrip | 2015-01-02 | | 1
thrip | 2015-01-03 | 10 | 2
thrip | 2015-01-04 | | 2
(8 rows)
Now you can use sum()
as a window function over partitions by names and generated groups:
select name, day, coalesce(count, sum(count) over (partition by name, grp order by day), 0) as count
from (
select name, day, count, count(count) over (partition by name order by day) as grp
from (
select id, name, day::date
from pests
cross join generate_series('2015-01-01'::timestamp, '2015-01-04', '1 day') day
) d
left join pest_counts on d.id = pest_id and day = date
) s;
name | day | count
--------------+------------+-------
fungus gnats | 2015-01-01 | 0
fungus gnats | 2015-01-02 | 5
fungus gnats | 2015-01-03 | 5
fungus gnats | 2015-01-04 | 7
thrip | 2015-01-01 | 14
thrip | 2015-01-02 | 14
thrip | 2015-01-03 | 10
thrip | 2015-01-04 | 10
(8 rows)