Question :
I have an operation that is going rather slowly, probably because my CROSS JOIN
first creates a ton of rows for SQL Server to work on, which are later GROUP
ed and PIVOT
ed into far fewer rows and a bunch of columns. The aim is to get the “times each record was downloaded” per “unique organization name”, where the unique organization names are the unique OrganizationName
entries in the AdamUser
table and the times downloaded is the number of associated entries in the MaintenanceJobHistory
table. Lastly, Record
is joined onto each row to output more info about that record. Here’s the query:
SELECT *
FROM (
SELECT
rec.[Id] AS RecordId,
CONCAT('Downloads_', usr.[OrganizationName]) AS OrganizationName,
COUNT(hist.[Id]) AS TimesDownloaded -- To be aggregated by PIVOT
FROM (
SELECT
innerRec.[Id]
FROM
[dbo].[Record] innerRec
INNER JOIN
[dbo].[RecordClassificationLink] innerLnk ON innerLnk.[RecordId] = innerRec.[Id]
-- WHERE (classification ID is foo or bar), for optional classification filtering
GROUP BY
innerRec.[Id]
-- HAVING COUNT(innerLnk.ClassificationId) = (number of specified classifications), for optional classification filtering
) rec
CROSS JOIN [dbo].[AdamUser] usr
LEFT JOIN (
SELECT * FROM [dbo].[MaintenanceJobHistory] WHERE [CreatedOn] > '2016-01-01 12:00:00' AND [CreatedOn] < '2016-12-01 12:00:00'
) hist ON hist.[AccessingUser] = usr.[Name] AND hist.[RecordId] = rec.[Id]
GROUP BY
rec.[Id], usr.[OrganizationName]
) srcTable
PIVOT -- Pivot around columns outside aggregation fn, eg. heading column [OrganizationName] & all other columns: [RecordId]
(
MIN(srcTable.[TimesDownloaded]) FOR [OrganizationName] IN (...~200 columns...)
) pivotTable
INNER JOIN [dbo].[Record] outerRec ON outerRec.[Id] = pivotTable.[RecordId]
And here’s the execution plan:
https://www.brentozar.com/pastetheplan/?id=ry8tXM3mg
It does give me the correct output, but I noticed that the Sort operation before the Stream Aggregate is taking a long time, and spills into tempdb
. This may be why the query takes ~5 minutes with the ~200 columns I have for the normal dataset, whereas my test dataset with 6 columns in the PIVOT
only takes maybe half a minute. In terms of how many rows it’s dealing with, the two tables being CROSS JOIN
ed (Record
and AdamUser
) have 38397 and 1017 rows respectively.
Is there a way I can speed this up, or is it necessary to take this long for the number of rows and columns I have?
Answer :
I didn’t find any way to speed up the PIVOT
operation – it’s just a big operation that takes time with a lot of data. The one thing I did that seemed to speed things up a small amount (perhaps the addition concatenation operator implemented more efficiently in SQL Server) was to change CONCAT('Downloads_', usr.[OrganizationName])
to 'Downloads_' + usr.[OrganizationName]
. The rest seems to be pretty much optimal.