I have a set of transactions of stock purchases by users and I want to keep track of a running balance of each stock as the year progresses. I am using a windowing function to track the running balance but for some reason I cannot get the
GROUP BY portion of this query to work.
It continues to have duplicate days in the result set even when I attempt to group by the date (
created_at). Sample below:
select t.customer_id, t.created_at::date, sum(case when t.stock_ticker = 'tsla' then t.amount end) over (order by t.created_at::date rows unbounded preceding) as tsla_running_amount, sum(case when t.stock_ticker = 'goog' then t.amount end) over (order by t.created_at::date rows unbounded preceding) as goog_running_amount, from transactions t group by t.created_at, t.customer_id, t.stock_ticker, t.amount order by t.created_at desc;
CREATE TABLE transactions ( transaction_id varchar(255) NOT NULL, amount float8 NOT NULL, stock_ticker varchar(255) NOT NULL, transaction_type varchar(255) NOT NULL, customer_id varchar NOT NULL, inserted_at timestamp NOT NULL, created_at timestamp NOT NULL, CONSTRAINT transactions_pkey PRIMARY KEY (transaction_id) ); INSERT INTO transactions(transaction_id, amount, stock_ticker, transaction_type, customer_id, inserted_at, created_at) VALUES ('123123abmk12', 10, 'tsla', 'purchase', 'a1b2c3', '2020-04-01 01:00:00', '2020-04-01 01:00:00') , ('123123abmk13', 20, 'tsla', 'purchase', 'a1b2c3', '2020-04-03 01:00:00', '2020-04-03 01:00:00') , ('123123abmk14', 5, 'goog', 'purchase', 'a1b2c3', '2020-04-01 01:00:00', '2020-04-01 01:00:00') , ('123123abmk15', 8, 'goog', 'purchase', 'a1b2c3', '2020-04-03 01:00:00', '2020-04-03 01:00:00'); CREATE INDEX ix_transactions_customer_id ON transactions USING btree (customer_id);
The result here always comes back with multiple rows per day, when I want them to be grouped all into one day.
After doing some research I attempted to cast
date in the
GROUP BY clause as well, but I get this error:
Column t.created_at must appear in the GROUP BY clause or be used in an aggregate function
In addition, the results are only going to show days in which a transaction has happened for a user. I need to be able to show a row for each day in a time series (1 year) even if the user did not make a transaction that day. (Using the most recent running balance on the row instead.)
I think that
generate_series() is the way to go, but I am having trouble understanding how to fit it in.
I see a couple of issues. This should do it:
SELECT * -- ⑥ FROM ( -- ① SELECT the_day::date FROM generate_series(timestamp '2020-01-01', date_trunc('day', localtimestamp), interval '1 day') the_day ) d LEFT JOIN ( -- ② SELECT customer_id , created_at::date AS the_day -- ⑥ , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'tsla')) OVER w AS tsla_running_amount -- ③ , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'goog')) OVER w AS goog_running_amount FROM transactions t WHERE created_at >= timestamp '2020-01-01' -- ④ GROUP BY customer_id, created_at::date -- ⑤ WINDOW w AS (PARTITION BY customer_id ORDER BY created_at::date) -- ③ ) t USING (the_day) -- ⑥ ORDER BY customer_id, the_day; -- ⑦
① Generating all days of the current year in optimized fashion. See:
LEFT JOIN you had in mind. You may really want one row per combination of
(customer_id, day). See:
③ While you also
GROUP BY customer_id the Window frame needs to lead with
PARTITION BY customer_id.
rows unbounded preceding. The default window frame should be just fine. The manual:
The default framing option is
RANGE UNBOUNDED PRECEDING, which is the
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Use the aggregate
FILTER clause instead of the
CASE expression. Shorter, faster, cleaner. See:
Finally wrap the result of the aggregation in another
sum() for the window function. See:
WINDOW clause avoids spelling out the same repeatedly. No effect on performance.
WHERE is logically redundant, but assuming there are older rows, too, it excludes irrelevant rows early, improving performance.
Asuming no or almost no future timestamps, so no upper bound.
⑤ You cannot have
t.stock_ticker, t.amount in
GROUP BY as you want one row per
⑥ Use the same column alias (
the_day in my example) to allow the simple
USING (the_day) in the
JOIN clause, and
SELECT * at the top level.
⑦ Not sure about the final sort order. This seems more useful to me. Adjust to your preference.