I’m trying to find a way to get a count of unique foreign keys that are associated with at least two different Col2 values across multiple rows. Example:
I want to find the count of unique FK associated with BOTH Page1 and Page2 for a large amount of data. Any advice on the most efficient way to do this?
Edit: thanks for the responses and follow up. I will try to clarify: the desired result of the query I need for this table would be 1 (the number of unique values of FK that are associated with both page1 and page2). The question I’m looking to answer is essentially how many users hit page1 and also hit page2 in the same session. Each hit is a unique key under PK and whether they belong to the same session is determined by the key under FK.
You can use a window function for this, no self-joins are needed.
SELECT *, COUNT(*) OVER (PARTITION BY FK) FROM YourTable;
For efficiency you will want an index with
FK as the leading key column.
Try something like this…
SELECT COUNT(DISTINCT FK) FROM Table WHERE FK IN ( SELECT FK FROM Table GROUP BY FK HAVING COUNT(DISTINCT Col2) > 1 )
As for performance on a large table, I can’t advise on that without knowing the table definition and indexes that exist on it.
If I understand you correctly, you only want the unique count of
FK values, and only for when the same value exists for both
page2. I believe the following query accomplishes that:
SELECT COUNT(DISTINCT T1.FK) -- unique count of shared FK values FROM YourTable AS T1 INNER JOIN YourTable AS T2 ON T1.FK = T2.FK -- Join where they share the same FK... AND T1.Col2 = 'page1' -- ...and T1 is only page1... AND T2.Col2 = 'page2'; -- ...and T2 is only page2
For maximizing performance, I’d recommend the following index for the above query:
CREATE NONCLUSTERED INDEX IX_YourTable_FK_Col2 ON YourTable (FK, Col2);
If you really wanted to get crazy, you even try making it a filtered index:
CREATE NONCLUSTERED INDEX IX_YourTable_FK_Col2_Filtered ON YourTable (FK, Col2) WHERE Col2 IN ('page1', 'page2');
Though this may be overkill and at best have negligible performance gains compared to the previously mentioned index. It also limits the usability of the index if you ever wanted to ask the same query for other values of
Col2. But if you always are only going to run this query for
page1 and / or
page2 then you may find this index better, otherwise the first index I mentioned would be the way to go.