There’s lots of superb information out there about approaches for mitigating bad parameter sniffing, but not much about how to identify a bad parameter sniffing problem in the first place.
Assuming you’re not on SQL Server 2016 (so you can’t leverage Query Store), and you don’t have users complaining about a stored procedure that intermittently performs poorly, what methods can be used to proactively identify a bad parameter sniffing problem?
Utilising Extended Events, RML Utilities, or simply querying/polling the DMVs all seem like viable approaches. Does anyone have a methodology using any of these, or know of any good resources that describe how to go about it?
Identifying parameter sniffing is hard work! It may not be apparent from monitoring tools or one-off queries, because it typically requires comparing different executions of the same cached plan against each other.
This is where the DMVs, and a plan cache analysis tool like sp_BlitzCache, can help. You can download it here. Full disclosure: I work for Brent Ozar and contribute to the project.
It uses ONLY DMVs that would be available under your constraints to perform its analysis of the plan cache. It’s also free. If you can’t install the stored proc, the relevant bit of code that attempts to identify parameter sniffing is here:
```sql
parameter_sniffing =
    CASE
        WHEN AverageReads > @parameter_sniffing_io_threshold
             AND min_worker_time < ((1.0 - (@parameter_sniffing_warning_pct / 100.0)) * AverageCPU) THEN 1
        WHEN AverageReads > @parameter_sniffing_io_threshold
             AND max_worker_time > ((1.0 + (@parameter_sniffing_warning_pct / 100.0)) * AverageCPU) THEN 1
        WHEN AverageReads > @parameter_sniffing_io_threshold
             AND MinReturnedRows < ((1.0 - (@parameter_sniffing_warning_pct / 100.0)) * AverageReturnedRows) THEN 1
        WHEN AverageReads > @parameter_sniffing_io_threshold
             AND MaxReturnedRows > ((1.0 + (@parameter_sniffing_warning_pct / 100.0)) * AverageReturnedRows) THEN 1
    END
```
The thresholds are configurable if you use the stored procedure; if you want to roll your own, how you assign the values is up to you. What we look for is whether the min/max values of certain plan attributes wildly exceed or, uh… underceed the averages of those attributes.
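If you do roll your own, a minimal sketch against sys.dm_exec_query_stats might look something like this. Note this is an illustration, not the sp_BlitzCache implementation: the threshold values are made up, and the min_rows/max_rows columns (omitted here) only exist on newer builds.

```sql
-- Flag cached plans whose min/max worker time strays far from the average.
-- The thresholds (1000 reads, 30 percent) are illustrative, not recommendations.
DECLARE @io_threshold BIGINT = 1000;
DECLARE @warning_pct  DECIMAL(5, 2) = 30.0;

SELECT qs.sql_handle,
       qs.execution_count,
       qs.total_logical_reads / qs.execution_count AS AverageReads,
       qs.total_worker_time  / qs.execution_count AS AverageCPU,
       qs.min_worker_time,
       qs.max_worker_time
FROM sys.dm_exec_query_stats AS qs
WHERE qs.total_logical_reads / qs.execution_count > @io_threshold
  AND (   qs.min_worker_time < (1.0 - (@warning_pct / 100.0)) * (qs.total_worker_time / qs.execution_count)
       OR qs.max_worker_time > (1.0 + (@warning_pct / 100.0)) * (qs.total_worker_time / qs.execution_count));
```

Bear in mind the DMV only covers plans still in cache, so polling it on a schedule (and persisting the results) gives you a much better picture than a one-off run.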
These are usually good indicators of parameter sniffing. Think of a ‘small’ value plan that performs a key lookup 5 times and works well for most values over a number of executions. If a ‘large’ value comes along next and reuses the cached plan, forcing the key lookup to execute hundreds of thousands of times, it will use more CPU, do more reads, and run longer than average.
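To make the heuristic concrete, here’s a minimal Python rendering of the same min/max-versus-average check. The function name, the 30% warning threshold, and the sample plan statistics are all made up for illustration; they’re not taken from sp_BlitzCache.

```python
# Minimal sketch of the min/max-vs-average heuristic described above.
# Threshold values and the sample statistics below are illustrative only.

def looks_like_sniffing(avg_reads, avg_cpu, min_cpu, max_cpu,
                        avg_rows, min_rows, max_rows,
                        io_threshold=1000, warning_pct=30.0):
    """Flag a plan when its min/max stats stray far from its averages."""
    if avg_reads <= io_threshold:
        return False  # cheap plans are ignored entirely
    low = 1.0 - warning_pct / 100.0   # e.g. 0.7 of average
    high = 1.0 + warning_pct / 100.0  # e.g. 1.3 of average
    return (min_cpu < low * avg_cpu
            or max_cpu > high * avg_cpu
            or min_rows < low * avg_rows
            or max_rows > high * avg_rows)

# A stable plan: min/max hug the averages, so it is not flagged.
print(looks_like_sniffing(5000, 100, 90, 110, 50, 45, 55))    # False
# A sniffed plan: one execution blew out CPU and row counts.
print(looks_like_sniffing(5000, 100, 5, 900, 50, 5, 200000))  # True
```

The key design point is that the average alone tells you nothing; it’s the spread between min/max and the average that betrays a plan being reused for wildly different parameter values.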
Hope this helps!