CROSS JOIN with later PIVOT going rather slowly

Posted on

Question :

I have an operation that is going rather slowly, probably because my CROSS JOIN first creates a ton of rows for SQL Server to work on, which are later GROUPed and PIVOTed into far fewer rows and a bunch of columns. The aim is to get the “times each record was downloaded” per “unique organization name”, where the unique organization names are the unique OrganizationName entries in the AdamUser table and the times downloaded is the number of associated entries in the MaintenanceJobHistory table. Lastly, Record is joined onto each row to output more info about that record. Here’s the query:

SELECT *
FROM (
    SELECT
        rec.[Id]                                      AS RecordId,
        CONCAT('Downloads_', usr.[OrganizationName])  AS OrganizationName,
        COUNT(hist.[Id])                              AS TimesDownloaded  -- To be aggregated by PIVOT
    FROM (
        SELECT
            innerRec.[Id]
        FROM
            [dbo].[Record] innerRec
        INNER JOIN
            [dbo].[RecordClassificationLink] innerLnk ON innerLnk.[RecordId] = innerRec.[Id]
        -- WHERE (classification ID is foo or bar), for optional classification filtering
        GROUP BY
            innerRec.[Id]
        -- HAVING COUNT(innerLnk.ClassificationId) = (number of specified classifications), for optional classification filtering
    ) rec
    CROSS JOIN [dbo].[AdamUser] usr
    LEFT JOIN (
        SELECT * FROM [dbo].[MaintenanceJobHistory] WHERE [CreatedOn] > '2016-01-01 12:00:00' AND [CreatedOn] < '2016-12-01 12:00:00'
    ) hist ON hist.[AccessingUser] = usr.[Name] AND hist.[RecordId] = rec.[Id]
    GROUP BY
        rec.[Id], usr.[OrganizationName]
) srcTable
PIVOT  -- Pivot around columns outside aggregation fn, eg. heading column [OrganizationName] & all other columns: [RecordId]
(
    MIN(srcTable.[TimesDownloaded]) FOR [OrganizationName] IN (...~200 columns...)
) pivotTable
INNER JOIN [dbo].[Record] outerRec ON outerRec.[Id] = pivotTable.[RecordId]

And here’s the execution plan:

https://www.brentozar.com/pastetheplan/?id=ry8tXM3mg

It does give me the correct output, but I noticed that the Sort operation before the Stream Aggregate is taking a long time, and spills into tempdb. This may be why the query takes ~5 minutes with the ~200 columns I have for the normal dataset, whereas my test dataset with 6 columns in the PIVOT only takes maybe half a minute. In terms of how many rows it’s dealing with, the two tables being CROSS JOINed (Record and AdamUser) have 38397 and 1017 rows respectively.

Is there a way I can speed this up, or is it necessary to take this long for the number of rows and columns I have?

Answer :

I didn’t find any way to speed up the PIVOT operation – it’s just a big operation that takes time with a lot of data. The one thing I did that seemed to speed things up a small amount (perhaps the addition concatenation operator implemented more efficiently in SQL Server) was to change CONCAT('Downloads_', usr.[OrganizationName]) to 'Downloads_' + usr.[OrganizationName]. The rest seems to be pretty much optimal.

Leave a Reply

Your email address will not be published. Required fields are marked *