Question :
I have a queue table with the following sample data:
id company location
1 acme new york
2 acme philadelphia
3 genco st.louis
4 genco san diego
5 genco san francisco
6 acme miami
I want to query for the queue position of each company group to show where in line each company is (imagine acme has 1,000 rows before 500 Genco starts and then acme has 5,000 more after the Genco records process). Result I would want would look like the following:
queuePositionId company
1 acme
3 genco
6 acme
I played around with ranking and grouping but the fact that a company group can be multiple times anywhere in the queue kept messing with my aggregates. I also tried a dense_rank
but couldn’t figure out the ordering. Any ideas?
Answer :
This is a Gaps and Islands question. See here for more details on problems like this.
This should do what you need:
-- Generate test data
DECLARE @Companies TABLE
(
ID INT
,Company NVARCHAR(100)
,Location NVARCHAR(100)
);
INSERT @Companies
SELECT *
FROM (VALUES (1, 'acme', 'new york')
,(2, 'acme', 'philadelphia')
,(3, 'genco', 'st.louis')
,(4, 'genco', 'san diego')
,(5, 'genco', 'san francisco')
,(6, 'acme', 'miami')
) AS CompanyLocations(ID, Company, Location);
-- Find company positions
;WITH cte_Companies
AS
(
SELECT ID
,Company
,CASE
WHEN LAG(Company) OVER(ORDER BY ID) = Company
THEN 1
ELSE 0
END AS CompanyPosition
FROM @Companies
)
SELECT ID, Company
FROM cte_Companies
WHERE CompanyPosition = 0
UPDATE
Andriy noted that my solution was a SQL Servre 2012+ solution. The following code should work for versions down to 2005.
-- Generate test data
DECLARE @Companies TABLE
(
ID INT
,Company NVARCHAR(100)
,Location NVARCHAR(100)
);
INSERT @Companies
SELECT *
FROM (VALUES (1, 'acme', 'new york')
,(2, 'acme', 'philadelphia')
,(3, 'genco', 'st.louis')
,(4, 'genco', 'san diego')
,(5, 'genco', 'san francisco')
,(6, 'acme', 'miami')
-- Further test data
,(7, 'genco', 'London')
,(8, 'genco', 'Portsmouth')
) AS CompanyLocations(ID, Company, Location);
-- Find company positions
SELECT ID, Company
FROM @Companies c1
WHERE NOT EXISTS (
SELECT *
FROM @Companies c2
WHERE c1.Company = c2.Company
AND c1.ID - 1 = c2.ID
)
There is more than one option for you to choose from.
Apart from the LAG replacement method suggested in James Anderson’s answer, you could for instance, employ the classic double ranking method of determining islands:
WITH
partitioned AS
(
SELECT
id,
company,
grp = ROW_NUMBER() OVER ( ORDER BY id ASC)
- ROW_NUMBER() OVER (PARTITION BY company ORDER BY id ASC)
FROM
dbo.atable
)
SELECT
id = MIN(id),
company
FROM
partitioned
GROUP BY
company,
grp
;
The grp
column helps you to distinguish between different islands of rows that have the same company
value (like, for instance, between the acme
rows before genco
and those after
genco` in your data sample). It is used as an additional grouping criterion in the main query.
If the id
values are already row numbers essentially (i.e. they have no gaps or duplicates), you can use id
instead of the first ROW_NUMBER call in the above query:
WITH
partitioned AS
(
SELECT
id,
company,
grp = id - ROW_NUMBER() OVER (PARTITION BY company ORDER BY id ASC)
FROM
dbo.atable
)
SELECT
id = MIN(id),
company
FROM
partitioned
GROUP BY
company,
grp
;
You could also have an alternative substitution for the missing LAG functionality, which uses OUTER APPLY:
SELECT
id,
company
FROM
dbo.atable AS t
OUTER APPLY
(
SELECT TOP 1
company
FROM
dbo.atable AS t2
WHERE
t2.id < t.id
ORDER BY
t2.id DESC
) AS o
WHERE
o.company <> t.company
OR o.company IS NULL
;
For each row the OUTER APPLY subquery returns company
from the preceding row (or NULL if it is the first row). The WHERE clause simply filters out the rows where the match has a different name or is a NULL.