Question :
I have a table like the following:
spread_id serial NOT NULL,
game_id integer NOT NULL,
sportsbook_id integer NOT NULL,
spread_type integer NOT NULL,
spread_duration integer NOT NULL,
home_line double precision,
home_odds integer,
away_line double precision,
away_odds integer,
update_time timestamp without time zone NOT NULL,
game_update_count integer NOT NULL
I’m trying to get the last row inserted (max update_time or game_update_count), for each group of sportsbook_id, spread_type, spread_duration, and game_id.
The following query gets me close, but I am not able to select the lines/odds without Postgres complaining.
SELECT
spreads.game_id, sportsbook_id, spread_type, spread_duration,
MAX(game_update_count) AS game_update_count
FROM spreads
LEFT JOIN schedule ON
schedule.game_id = spreads.game_id
WHERE date >= '2012-01-01' AND date <= '2012-01-02'
GROUP BY
spreads.game_id, sportsbook_id, spread_type, spread_duration
ORDER BY
spread_duration, spread_type, sportsbook_id, spreads.game_id,
game_update_count DESC;
Anyone have any thoughts on a better approach to take?
Answer :
The simplest way in Postgres is with DISTINCT ON
:
SELECT DISTINCT ON (1,2,3,4)
sp.game_id, sportsbook_id, spread_type, spread_duration, game_update_count
FROM spreads sp
LEFT JOIN schedule sch USING (game_id)
WHERE date >= '2012-01-01'
AND date <= '2012-01-02'
ORDER BY 4,3,2,1, game_update_count DESC;
Details:
The numbers are just syntax shorthand referring to the ordinal position of SELECT
items.
If game_update_count
can be NULL, you’ll want game_update_count DESC NULLS LAST
.
Instead of using a group by, have you tried using a Window Function?
Try this:
SELECT *
FROM (
SELECT
spreads.game_id, sportsbook_id, spread_type, spread_duration,
row_number() over (partition by spreads.game_id, sportsbook_id, spread_type, spread_duration order by update_time desc) priority
FROM spreads
LEFT JOIN schedule ON
schedule.game_id = spreads.game_id
WHERE date >= '2012-01-01' AND date <= '2012-01-02'
)
WHERE priority = 1
The row_number()
which I’ve aliased as priority
is the window function and what it is doing is conceptually similar to a GROUP BY
clause. The difference is that it is just allowing you to still see the data with row-level fidelity through the window. In this case, you’ll notice that the columns you grouped by have been used to partition
the data. The outer SELECT
statement eliminates the data that you don’t want (all of the out of date sport-bet lines).
I wish you the best as you implement Window Functions (and with your sport betting)!
(word of warning… I typically work with SQL Server guy, so my code may not by syntactically correct, but it should get you off to a good start)