I have an operation that is going rather slowly, probably because my
CROSS JOIN first creates a ton of rows for SQL Server to work on, which are later
PIVOTed into far fewer rows and a bunch of columns. The aim is to get the “times each record was downloaded” per “unique organization name”, where the unique organization names are the unique
OrganizationName entries in the
AdamUser table and the times downloaded is the number of associated entries in the
MaintenanceJobHistory table. Lastly,
Record is joined onto each row to output more info about that record. Here’s the query:
SELECT * FROM ( SELECT rec.[Id] AS RecordId, CONCAT('Downloads_', usr.[OrganizationName]) AS OrganizationName, COUNT(hist.[Id]) AS TimesDownloaded -- To be aggregated by PIVOT FROM ( SELECT innerRec.[Id] FROM [dbo].[Record] innerRec INNER JOIN [dbo].[RecordClassificationLink] innerLnk ON innerLnk.[RecordId] = innerRec.[Id] -- WHERE (classification ID is foo or bar), for optional classification filtering GROUP BY innerRec.[Id] -- HAVING COUNT(innerLnk.ClassificationId) = (number of specified classifications), for optional classification filtering ) rec CROSS JOIN [dbo].[AdamUser] usr LEFT JOIN ( SELECT * FROM [dbo].[MaintenanceJobHistory] WHERE [CreatedOn] > '2016-01-01 12:00:00' AND [CreatedOn] < '2016-12-01 12:00:00' ) hist ON hist.[AccessingUser] = usr.[Name] AND hist.[RecordId] = rec.[Id] GROUP BY rec.[Id], usr.[OrganizationName] ) srcTable PIVOT -- Pivot around columns outside aggregation fn, eg. heading column [OrganizationName] & all other columns: [RecordId] ( MIN(srcTable.[TimesDownloaded]) FOR [OrganizationName] IN (...~200 columns...) ) pivotTable INNER JOIN [dbo].[Record] outerRec ON outerRec.[Id] = pivotTable.[RecordId]
And here’s the execution plan:
It does give me the correct output, but I noticed that the Sort operation before the Stream Aggregate is taking a long time, and spills into
tempdb. This may be why the query takes ~5 minutes with the ~200 columns I have for the normal dataset, whereas my test dataset with 6 columns in the
PIVOT only takes maybe half a minute. In terms of how many rows it’s dealing with, the two tables being
CROSS JOINed (
AdamUser) have 38397 and 1017 rows respectively.
Is there a way I can speed this up, or is it necessary to take this long for the number of rows and columns I have?
I didn’t find any way to speed up the
PIVOT operation – it’s just a big operation that takes time with a lot of data. The one thing I did that seemed to speed things up a small amount (perhaps the addition concatenation operator implemented more efficiently in SQL Server) was to change
CONCAT('Downloads_', usr.[OrganizationName]) to
'Downloads_' + usr.[OrganizationName]. The rest seems to be pretty much optimal.