Speed up Key Lookup (clustered)

Posted on

Question :

I have a slow query in Sql 2017. Key lookup (Clustered) is taking 62%. There is also an Index Seek (nonclustered) taking 30%.

Is there anyway I can speed this query up?

Execution plan: https://www.brentozar.com/pastetheplan/?id=r1oIcWofd

exec sp_executesql 
N'
SELECT TOP(@__p_4) 
    [v].[County], 
    [v].[Community], 
    [v].[DataProvider], 
    [v].[Site], 
    [v].[Latitude], 
    [v].[Longitude], 
    [v].[Elevation], 
    [v].[Instrument], 
    [v].[MonitorId], 
    [v].[ParameterId], 
    [v].[Parameter], 
    [v].[MeasurementStartTime], 
    [v].[MeasuredValue], 
    [v].[AdjustedValue], 
    [v].[Units], 
    [v].[AdjustmentDesc], 
    [v].[LowerLimitCheck], 
    [v].[UpperLimitCheck], 
    [v].[RepeatingValueCheck], 
    [v].[SpikeCheck], 
    [v].[QualityPreliminary]
FROM [vw_DownloadDataSubhourly] AS [v]
WHERE 
(
    (([v].[CountyId] = @__Parse_0) AND ([v].[MeasurementStartTime] >= @__start_1)) 
    AND ([v].[MeasurementStartTime] <= @__end_2)
) 
AND [v].[ParameterType] IN (''PM2.5'')
ORDER BY [v].[Site], [v].[MonitorId], [v].[ParameterId], [v].[MeasurementStartTime]
',
N'@__p_4 int,@__Parse_0 int,@__start_1 datetime,@__end_2 datetime',
@__p_4=750000,@__Parse_0=34,@__start_1='2019-10-01 00:00:00',@__end_2='2020-12-31 23:59:59'

vw_DownloadDataSubhourly
Execution plan: https://www.brentozar.com/PasteThePlan/?id=BkiX1Vsz_


CREATE view [dbo].[vw_DownloadDataSubhourly] as

SELECT      
                DataProvider.DataProviderId, 
                County.CountyId, 
                Community.CommunityId, 
                County.CountyName as "County",
                Community.CommunityName as "Community",
                DataProvider.DataProviderShortName as "DataProvider", 
                Site.SiteLongName as "Site",
                SiteConfig.LatitudeTrunc as "Latitude",
                SiteConfig.LongitudeTrunc as "Longitude",
                SiteConfig.Elevation,
                Monitor.ExternalMonitorId as "MonitorId",
                Parameter.ParameterId as "ParameterId",
                Parameter.ParameterNamePublic "Parameter",
                Parameter.ParameterType "ParameterType",
                SubHourlyMeasurementData.StartDateTime "MeasurementStartTime",
                SubHourlyMeasurementData.AObs as "MeasuredValue",
                SubHourlyMeasurementData.AObsAdj as "AdjustedValue",
                UOM.UOMAbbreviation as "Units",
                AdjCode.AdjCodeDescription as "AdjustmentDesc",
                concat(MonitorConfig.MonitorMfr, ' ', MonitorConfig.MonitorModel) as "Instrument",
                case 
                    when SubHourlyMeasurementData.QF06 = '-3' then 'QC not computed'
                    when SubHourlyMeasurementData.QF06 = '-2' then 'Not enough data'
                    when SubHourlyMeasurementData.QF06 = '-1' then 'Not enough site/monitor info'
                    when SubHourlyMeasurementData.QF06 = '0' then 'Pass'
                    when SubHourlyMeasurementData.QF06 = '1' then 'Fail'
                    else NULL
                    end as "LowerLimitCheck",
                case 
                    when SubHourlyMeasurementData.QF05 = '-3' then 'QC not computed'
                    when SubHourlyMeasurementData.QF05 = '-2' then 'Not enough data'
                    when SubHourlyMeasurementData.QF05 = '-1' then 'Not enough site/monitor info'
                    when SubHourlyMeasurementData.QF05 = '0' then 'Pass'
                    when SubHourlyMeasurementData.QF05 = '1' then 'Fail'
                    else NULL
                    end as "UpperLimitCheck",
                case 
                    when SubHourlyMeasurementData.QF04 = '-3' then 'QC not computed'
                    when SubHourlyMeasurementData.QF04 = '-2' then 'Not enough data'
                    when SubHourlyMeasurementData.QF04 = '-1' then 'Not enough site/monitor info'
                    when SubHourlyMeasurementData.QF04 = '0' then 'Pass'
                    when SubHourlyMeasurementData.QF04 = '1' then 'Fail'
                    else NULL
                    end as "RepeatingValueCheck",
                case 
                    when SubHourlyMeasurementData.QF07 = '-3' then 'QC not computed'
                    when SubHourlyMeasurementData.QF07 = '-2' then 'Not enough data'
                    when SubHourlyMeasurementData.QF07 = '-1' then 'Not enough site/monitor info'
                    when SubHourlyMeasurementData.QF07 = '0' then 'Pass'
                    when SubHourlyMeasurementData.QF07 = '1' then 'Fail'
                    else NULL
                    end as "SpikeCheck",
                case 
                    when SubHourlyMeasurementData.QOverall = '-3' then 'QC not computed'
                    when SubHourlyMeasurementData.QOverall = '-2' then 'Not enough data'
                    when SubHourlyMeasurementData.QOverall = '-1' then 'Not enough site/monitor info'
                    when SubHourlyMeasurementData.QOverall = '0' then 'Passed all checks'
                    when SubHourlyMeasurementData.QOverall = '1' then 'Suspect'
                    else NULL
                    end as "QualityPreliminary"

FROM           SubHourlyMeasurementData 
                         INNER JOIN
                         DataProvider ON SubHourlyMeasurementData.DataProviderId = DataProvider.DataProviderId 
                         INNER JOIN
                         Site ON SubHourlyMeasurementData.SiteId = Site.SiteId AND Site.DataProviderId = DataProvider.DataProviderId
                         LEFT JOIN Community ON Site.CommunityId = Community.CommunityId
                         INNER JOIN 
                         SiteConfig ON SiteConfig.SiteId = SubHourlyMeasurementData.SiteId
                         INNER JOIN
                         County ON County.CountyId = SiteConfig.CountyId
                         INNER JOIN
                         Monitor ON Monitor.MonitorId = SubHourlyMeasurementData.MonitorId
                         INNER JOIN
                         MonitorConfig ON MonitorConfig.MonitorConfigId = SubHourlyMeasurementData.MonitorConfigId
                         INNER JOIN
                         Parameter ON Parameter.ParameterId = SubHourlyMeasurementData.ParameterId
                         INNER JOIN
                         UOM ON SubHourlyMeasurementData.UOMId = UOM.UOMId
                         LEFT JOIN
                         AdjCode ON SubHourlyMeasurementData.AdjCodeId = AdjCode.AdjCodeId
                         INNER JOIN
                         StreamSegment ON SubHourlyMeasurementData.StreamSegmentId = StreamSegment.StreamSegmentId
                         INNER JOIN 
                         MeasurementTechnique ON MeasurementTechnique.MeasurementTechniqueId = StreamSegment.MeasurementTechniqueId
                         AND MeasurementTechnique.ParameterId = SubHourlyMeasurementData.ParameterId
                         INNER JOIN
                         SamplingDuration ON SamplingDuration.SamplingDurationId = StreamSegment.SamplingDurationId
                         INNER JOIN
                         SamplingFrequencyUnits ON SamplingFrequencyUnits.SamplingFrequencyUnitsId = StreamSegment.SamplingFrequencyUnitsId

WHERE SubHourlyMeasurementData.StartDateTime >= MonitorConfig.StartDateTime and StreamSegment.IsDownloadable = 1 and SubHourlyMeasurementData.iscalculated = 1 

GO

Answer :

J.D. has covered the stats bit – essentially, you might get better estimates if you have a more representative stats sample for the indexes on the SubHourlyMeasurementData table. Right now it’s only sampling around 1% of the rows in the table:

Screenshot of stats details

Here we can see that the stats have been updated (ModificationCount is 0), but the SamplingPercent was only 1.41871. SQL Server picks this percentage automatically based on the number of rows in the table.

You can try updating stats with FULLSCAN, which might take a while:

UPDATE STATISTICS dbo.SubHourlyMeasurementData
WITH FULLSCAN;

If that helps you can work down to a more manageable percentage that still solves the problem:

UPDATE STATISTICS dbo.SubHourlyMeasurementData
SAMPLE 50 PERCENT;

Regarding the key lookup, that is required in order check the residual predicate IsCalculated=1:

Screenshot of key lookup doing a residual predicate

You can avoid the lookup by adding IsCalculated to the INCLUDE list for the <IDX_DownloadDataSubHourly> index:

CREATE NONCLUSTERED INDEX [<IDX_DownloadDataSubHourly>]
ON dbo.SubHourlyMeasurementData (...)
INCLUDE (..., IsCalculated);

If that predicate is used a lot, and it’s fairly selective, you might consider creating a filtered index for it (or adding WHERE IsCalculated=1 to the current nonclustered index, but that might break other queries trying to use it).

So you definitely have some wild cardinality estimate issues going on based on your execution plan (along with some sub-performant index scans and warnings such as residual I/O issues probably resultant of the cardinality estimate issues).

Please see the following couple of cases where SQL Server thinks your query is only going to return a small number of rows (Estimated Rows) but the Actual Rows is much larger, in the tens of millions:

Cardinality Estimate Issue #1

Cardinality Estimate Issue #2

Cardinality Estimate Issue #3

Unfortunately it looks like this is all coming from the query under the hood of the view you’re querying, so without knowing what that query is actually doing, it’s tough to debug the issue. If you could please provide the view definition of vw_DownloadDataSubhourly, we should be able to help you trace the source of these cardinality estimate issues.

Cardinality estimate issues occur when the SQL Engine believes that the amount of rows an operation of your execution plan is going to return severely differs from the actual number of rows that are returned (usually I find when an order of magnitude of difference between those two statistics are exhibited, a cardinality estimate issue exists). The further off those two are, generally the worse the issue is. Especially in the case when the Estimated Rows are less than the Actual Rows because that means the SQL Engine provisioned less resources (e.g. Memory) than needed to serve the data for your query, hence the slow runtime.

Barring having the view definition of vw_DownloadDataSubhourly, the only thing I can recommend at this time is trying to update statistics on the entire table for some of these one’s exhibiting cardinality estimate issues. Here’s an additional resource regarding statistics and how to manually and automatically update them. For now I would recommend trying to manually update them for the following tables roughly in this order:

  1. dbo.SubHourlyMeasurementData
  2. dbo.AdjCode
  3. dbo.Community
  4. dbo.SamplingFrequencyUnits
  5. dbo.SiteConfig
  6. dbo.Site
  7. dbo.SamplingDuration
  8. dbo.MonitorConfig
  9. dbo.DataProvider
  10. dbo.UOM
  11. dbo.Parameter

Unfortunately a majority of the tables referenced by your view vw_DownloadDataSubhourly are facing cardinality estimate issues.

You’ll want to ensure you update the statistics with the FULLSCAN option to ensure you get accurate calculations on your larger tables. Also please note for larger tables, updating statistics could take a bit of time to run and potentially be resource intensive on the server during that time period.

Leave a Reply

Your email address will not be published. Required fields are marked *