Question:
In the example below, I need to fetch the row that has the latest data based on a combination of dates. I cannot simply take MAX(insert_date) and MAX(update_date) together, because the two maxima can come from different rows, so the query does not return the correct data. What works right now is to get the MAX(insert_date), self join to obtain the MAX(update_date) for that insert_date, and then self join again to return the row values.
Is there a better and more efficient way of doing this? The example below contains only 4 rows, but in production I will be processing about 1 million rows every few minutes.
Example:
create table #temp (
    iud         char(1)        not null,
    id          int            not null,
    date        date           not null,
    value       decimal(9,2)   not null,
    insert_date datetimeoffset not null,
    update_date datetime2      not null
);
insert #temp
values
    ('i', 1001, '2001-01-01', 2, '2001-01-01 00:00', '2001-01-01 00:00'),
    ('i', 1001, '2001-01-01', 9, '2001-01-01 00:00', '2001-01-01 01:00'),
    ('i', 1001, '2001-01-01', 7, '2001-01-02 00:00', '2001-01-01 00:30'),
    ('i', 1001, '2001-01-01', 4, '2001-01-02 00:00', '2001-01-01 00:00');
-- this is wrong as it returns no results
select t.*
from #temp as t
join (select iud, id, date, max(insert_date) as insert_date, max(update_date) as update_date
      from #temp
      group by iud, id, date) as x
  on t.iud = x.iud
 and t.id = x.id
 and t.date = x.date
 and t.insert_date = x.insert_date
 and t.update_date = x.update_date;
-- this works, but can it be simplified?
select n.*
from #temp as n
join (
    select n.iud, n.id, n.date, n.insert_date, max(update_date) as update_date
    from #temp as n
    join (select iud, id, date, max(insert_date) as insert_date
          from #temp
          group by iud, id, date) as i
      on i.iud = n.iud
     and i.id = n.id
     and i.date = n.date -- needed so the max insert_date of one date cannot match rows of another date
     and i.insert_date = n.insert_date
    group by n.iud, n.id, n.date, n.insert_date) as x
  on x.date = n.date
 and x.insert_date = n.insert_date
 and x.iud = n.iud
 and x.id = n.id
 and x.update_date = n.update_date
order by n.iud, n.id, n.date;
drop table #temp;
Answer:
If I’ve understood what you are trying to do correctly, then try the following:
WITH T AS
(
    SELECT *,
           RN = RANK() OVER (PARTITION BY iud, id, date
                             ORDER BY insert_date DESC, update_date DESC)
    FROM #temp
)
SELECT *
FROM T
WHERE RN = 1;
This may well perform better, especially if there is a covering index on (iud, id, date, insert_date DESC, update_date DESC).
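As a rough sketch of what such an index could look like, the statements below create one on the question's #temp table and then rerun the ranking query. The index name ix_temp_latest is hypothetical, and the INCLUDE (value) clause is an assumption added so that the index covers every column of the sample table. The script assumes it runs in the same session as the question's example, after the insert and before the drop table.

```sql
-- Assumes #temp from the question already exists and is populated.
-- ix_temp_latest is a hypothetical name; INCLUDE (value) is an assumption
-- so that the index contains every column the query reads.
CREATE NONCLUSTERED INDEX ix_temp_latest
    ON #temp (iud, id, date, insert_date DESC, update_date DESC)
    INCLUDE (value);

-- Same ranking query as above: the key order of the index matches the
-- PARTITION BY columns followed by the ORDER BY columns.
WITH T AS
(
    SELECT *,
           RN = RANK() OVER (PARTITION BY iud, id, date
                             ORDER BY insert_date DESC, update_date DESC)
    FROM #temp
)
SELECT iud, id, date, value, insert_date, update_date
FROM T
WHERE RN = 1;
-- For the four sample rows this returns the single row with value 7.00
-- (insert_date 2001-01-02, update_date 2001-01-01 00:30).
```

Note that RANK() returns every row that ties on both insert_date and update_date within a group. If exactly one row per group is required, ROW_NUMBER() is an alternative, although without an extra tiebreaker column in the ORDER BY the choice among tied rows is arbitrary. Whether the optimizer actually uses the index to avoid a sort is worth confirming in the execution plan against production-sized data.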