Question :
I have data in an Oracle table like the following:
PROJ_NBR STATUS START_DT
AL20 AC 1/14/2010 4:31
AL20 AC 1/14/2010 4:32
AL20 AC 1/14/2010 4:32
AL20 DE 1/14/2010 4:34
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 DE 1/14/2010 4:46
AL20 AC 3/4/2010 4:31
AL20 AC 3/4/2010 4:39
AL20 AC 3/21/2010 13:24
AL20 AC 2/4/2011 13:54 --year changes here
AL20 AC 2/4/2011 14:14
AL20 AC 2/4/2011 14:27
AL20 DE 2/11/2011 16:24
Our table has 2,091,348 rows.
I want data like following…
PROJ_NBR STATUS BEGIN END
AL20 AC 1/14/2010 4:31 1/14/2010 4:32
AL20 DE 1/14/2010 4:34 1/14/2010 4:46
AL20 AC 3/4/2010 4:31 2/4/2011 14:27 --Second time range for AL20, AC
AL20 DE 2/11/2011 16:24 2/11/2011 16:24
Answer :
Below a complete solution for sql-server.
I was trying analytical functions, but didn’t succeed. I’m tempted to repost the question to SO requiring a solution using analytical functions. I think there must be one.
And here is my working solution. I’m proud not to have used cursors.
set language us_english
Set Nocount ON
create table #t( PROJ_NBR char(4), STATUS char(2), START_DT datetime);
insert into #t values ('AL20', 'AC', '1/14/2010 4:31');
insert into #t values ('AL20', 'AC', '1/14/2010 4:32');
insert into #t values ('AL20', 'AC', '1/14/2010 4:32');
insert into #t values ('AL20', 'DE', '1/14/2010 4:34');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'DE', '1/14/2010 4:46');
insert into #t values ('AL20', 'AC', '3/4/2010 4:31');
insert into #t values ('AL20', 'AC', '3/4/2010 4:39');
insert into #t values ('AL20', 'AC', '3/21/2010 13:24');
insert into #t values ('AL20', 'AC', '2/4/2011 13:54');
insert into #t values ('AL20', 'AC', '2/4/2011 14:14');
insert into #t values ('AL20', 'AC', '2/4/2011 14:27');
insert into #t values ('AL20', 'DE', '2/11/2011 16:24');
select
distinct PROJ_NBR, STATUS, [start], [end]
from(
select
*,
(select MIN(START_DT)
from #t t1
where t.STATUS = t1.STATUS
and t.START_DT >= t1.START_DT
and not exists ( select * from #t t2
where t.STATUS <> t2.STATUS
and t.START_DT >= t2.START_DT
and t1.START_DT <= t2.START_DT)
) [start],
(select Max(START_DT)
from #t t1
where t.STATUS = t1.STATUS
and t.START_DT <= t1.START_DT
and not exists ( select * from #t t2
where t.STATUS <> t2.STATUS
and t.START_DT <= t2.START_DT
and t1.START_DT >= t2.START_DT)
) [end]
from #t t
) t1
order by [start]
I think besides column and table names this can easily adapted to Oracle too.
Oracle Analytics Solution:
SELECT Proj_Nbr, Status, MIN(Start_Dt) AS "BEGIN", MAX(Start_Dt) AS "END"
FROM (
SELECT Proj_Nbr, Status, Start_Dt,
MAX(SectionID) OVER (ORDER BY Start_Dt ROWS UNBOUNDED PRECEDING) SectionID
FROM
(
SELECT Proj_Nbr, Status, Start_Dt,
CASE WHEN Proj_Nbr = Lag(Proj_Nbr) OVER (ORDER BY start_dt)
AND Status = Lag(Status) OVER (ORDER BY start_dt)
THEN null
ELSE rowid
END SectionID
FROM t1
)
)
GROUP BY Proj_Nbr, Status, SectionID;
Data Generation:
CREATE TABLE t1 (PROJ_NBR VARCHAR2(4), STATUS VARCHAR2(2), START_DT DATE);
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('01/14/2010 04:31', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('01/14/2010 04:32', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('01/14/2010 04:32', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:34', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('01/14/2010 04:46', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('03/04/2010 04:31', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('03/04/2010 04:39', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('03/21/2010 13:24', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('02/04/2011 13:54', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('02/04/2011 14:14', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'AC', to_date('02/04/2011 14:27', 'MM/DD/YYYY HH24:MI'));
INSERT INTO t1 VALUES ('AL20', 'DE', to_date('02/11/2011 16:24', 'MM/DD/YYYY HH24:MI'));
Here’s my take: (Thanks to Leigh for providing the data generation script)
select proj_nbr, status, min(start_dt) start_dt, end_dt
from (
select a.*, (
select max(p.start_dt)
from t1 p
where P.PROJ_NBR = a.proj_nbr and p.status = a.status
and p.start_dt >= a.start_dt
and not exists (
select null from t1 q where q.PROJ_NBR = a.proj_nbr
and q.status <> p.status
and q.start_dt < p.start_dt and q.start_dt > a.start_dt)
) end_dt
from t1 a
)
group by proj_nbr, status, end_dt
I think adding the following index will improve performance:
create index on t1 (proj_nbr, status, start_date)
SELECT PROJ_NBR, STATUS, MIN(START_DT) AS _BEGIN, MAX(START_DT) AS _END
FROM MYTABLE GROUP BY PROJ_NBR, STATUS;
Why the _? Because BEGIN
and END
are reserved words.