Reducing impact of SUM(DATALENGTH()) over linked server

Posted on

Question :

I have a query that reports back to me how much time elapsed for records to be added. I also in this query track the size of an image that was added to the database. From what I can see it looks like it is returning the whole image from the linked server and then calculating the size. Is there an easy way I can get the calculation of size to occur on the other side of the link so that the return can be faster?

SELECT
    [Date] = CAST(DATEADD(DAY,-@DaysBack,GETDATE()) AS DATE)
    ,NOTES = N.RPMID
    ,TimeTaken = DATEDIFF("SECOND",SN.PickedUpDate,SN.ProcessedDate)
    ,PictureSize = SUM(CAST(DATALENGTH(SI.[Image])AS BIGINT))
    ,RecordsAdded = CAST(COUNT(*) AS BIGINT)
FROM
    DBServer1.DB1.dbo.Notes AS N WITH (NOLOCK)
    INNER JOIN DBServer1.DB1.dbo.NoteDistribution AS ND WITH (NOLOCK) ON N.Notes=ND.Notes
    INNER JOIN dbo.Submission_Notes AS SN WITH (NOLOCK) ON SN.RetailPlanGuid=N.RPMID
    LEFT JOIN DBServer1.DB1.dbo.SavedImage SI WITH (NOLOCK) ON SI.Notes=N.Notes
WHERE
    1=1
    AND N.RecordStampWhenAdd > CAST(GETDATE()-@DaysBack AS DATE)
    AND N.RecordStampWhenAdd < CAST(GETDATE()-@DaysBack+1 AS DATE)
    AND DATEDIFF("SECOND",SN.PickedUpDate,SN.ProcessedDate) > 0
GROUP BY
    N.RPMID
    ,DATEDIFF("SECOND",SN.PickedUpDate,SN.ProcessedDate)

The part in question is the picturesize that is returned. I want the calculations in there to be done on the server where the data is stored and just have the number return to me. Any help on this would be great. Thanks.

Answer :

I am not sure exactly how you have determined that all of the data is going across the wire (or why that should be the case) but you could always have a computed column:

ALTER TABLE dbo.SavedImage ADD PictureSize 
  AS CAST(DATALENGTH([Image]) AS BIGINT);

Or just add a column and calculate that yourself whenever you insert/update a row, either by forcing inserts/updates through a stored procedure, or using a trigger.

And persist and/or index that as necessary, and then just use SUM(PictureSize) in the query.

Of course the big impact is still going to be on the fact that you have to scan the entire table or index in order to generate the aggregate.

This method could also be used to maintain the total size of all columns in some other table, but I’m largely against materializing redundant data unless there is a demonstrated advantage to doing so.

In any case, I really don’t think this query is slow because it is moving the whole varbinary data over the wire and then calculating the length at the wrong end; if you have evidence to suggest that is the case, please update the question and we’ll try to address it.

Can you just make a ImageSize persisted computed column in the source database? That would calculate the size for you on insert and allow you to just request that data.

Leave a Reply

Your email address will not be published.