Get correct row version based on combination of dates – query simplification

Question :

In the example below, I need to fetch the row that has the latest data based on a combination of dates. I cannot simply take MAX(insert_date) and MAX(update_date) together, because the two maximums can come from different rows, so the combination does not return the correct data. What works right now is to get the MAX(insert_date), self join to obtain the MAX(update_date) for that insert date, then self join again to return the row values.

Is there a better and more efficient way of doing this? The example below only contains 4 rows but in production I will be processing about 1 million rows every few minutes.

Example:

create table #temp (
iud char(1) not null,
id int not null,
date date not null,
value decimal(9,2) not null,
insert_date datetimeoffset not null,
update_date datetime2 not null
);

insert #temp
values
('i', 1001, '2001-01-01', 2, '2001-01-01 00:00', '2001-01-01 00:00'),
('i', 1001, '2001-01-01', 9, '2001-01-01 00:00', '2001-01-01 01:00'),
('i', 1001, '2001-01-01', 7, '2001-01-02 00:00', '2001-01-01 00:30'),
('i', 1001, '2001-01-01', 4, '2001-01-02 00:00', '2001-01-01 00:00');

-- this is wrong as it returns no results
select t.*
from #temp as t
join (select iud, id, date, max(insert_date) as insert_date, max(update_date) as update_date
      from #temp
      group by iud, id, date) as x
on t.iud = x.iud
and t.id = x.id
and t.date = x.date
and t.insert_date = x.insert_date
and t.update_date = x.update_date;

-- this works, but can it be simplified?
select n.*
from #temp as n
join (
    select n.iud, n.id, n.date, n.insert_date, max(n.update_date) as update_date
    from #temp as n
    join (select iud, id, date, max(insert_date) as insert_date
          from #temp
          group by iud, id, date) as i
      on i.iud = n.iud
     and i.id = n.id
     and i.date = n.date  -- match on date too, so another date's max insert_date cannot join
     and i.insert_date = n.insert_date
    group by n.iud, n.id, n.date, n.insert_date) as x
  on x.date = n.date
 and x.insert_date = n.insert_date
 and x.iud = n.iud
 and x.id = n.id
 and x.update_date = n.update_date
order by n.iud, n.id, n.date;

drop table #temp;

Answer :

If I’ve understood what you are trying to do correctly, then ranking the rows within each (iud, id, date) group should do it:

WITH T AS
(
SELECT *, 
       RN = RANK() OVER (PARTITION BY  iud, id, date ORDER BY insert_date DESC, update_date DESC)
FROM #temp
)
SELECT *
FROM T 
WHERE RN=1;
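
To see why the ranking picks the right row, here is a minimal sketch that rebuilds the question's sample table and lists every row with its rank instead of filtering to RN = 1. It assumes SQL Server 2016 or later for `DROP TABLE IF EXISTS`; everything else is taken from the question.

```sql
-- Assumption: SQL Server 2016+ (DROP TABLE IF EXISTS).
drop table if exists #temp;

create table #temp (
    iud char(1) not null,
    id int not null,
    date date not null,
    value decimal(9,2) not null,
    insert_date datetimeoffset not null,
    update_date datetime2 not null
);

insert #temp
values
('i', 1001, '2001-01-01', 2, '2001-01-01 00:00', '2001-01-01 00:00'),
('i', 1001, '2001-01-01', 9, '2001-01-01 00:00', '2001-01-01 01:00'),
('i', 1001, '2001-01-01', 7, '2001-01-02 00:00', '2001-01-01 00:30'),
('i', 1001, '2001-01-01', 4, '2001-01-02 00:00', '2001-01-01 00:00');

-- Rank each row inside its (iud, id, date) group:
-- latest insert_date first, then latest update_date within that insert_date.
select value, insert_date, update_date,
       RN = rank() over (partition by iud, id, date
                         order by insert_date desc, update_date desc)
from #temp
order by RN;

-- Ranks for the sample rows:
--   value 7.00 (insert 2001-01-02, update 00:30) -> RN 1
--   value 4.00 (insert 2001-01-02, update 00:00) -> RN 2
--   value 9.00 (insert 2001-01-01, update 01:00) -> RN 3
--   value 2.00 (insert 2001-01-01, update 00:00) -> RN 4
```

RANK() gives tied rows the same number, so if two rows in a group share the same latest insert_date and update_date, both come back with RN = 1, which matches what the self-join version returns. If exactly one row per group is required, ROW_NUMBER() could be swapped in, at the cost of an arbitrary pick among ties.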

The ranked query may well perform better than the nested self joins, especially if there is a covering index on (iud, id, date, insert_date DESC, update_date DESC).
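
As a rough sketch of what such a covering index could look like on the question's table: the index name `ix_temp_latest` is hypothetical, and `INCLUDE (value)` is an assumption added so that the index holds every column the query selects. The table definition is repeated only to keep the snippet runnable on its own (SQL Server 2016+ assumed for `DROP TABLE IF EXISTS`).

```sql
-- Assumption: SQL Server 2016+ (DROP TABLE IF EXISTS).
drop table if exists #temp;

create table #temp (
    iud char(1) not null,
    id int not null,
    date date not null,
    value decimal(9,2) not null,
    insert_date datetimeoffset not null,
    update_date datetime2 not null
);

-- Key columns follow the PARTITION BY columns, then the ORDER BY columns
-- in the same descending order as the ranking query.
-- ix_temp_latest is a hypothetical name; value is included so the index
-- covers every column of the table.
create nonclustered index ix_temp_latest
on #temp (iud, id, date, insert_date desc, update_date desc)
include (value);
```

With the keys in this order the optimizer may be able to read the rows already sorted for the window function and skip a separate sort, though whether it does is worth confirming in the actual execution plan at production volumes.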
