We are running SQL Server 2016 and have a very simple stored procedure that does a select using a clustered index seek of a single row in a table with 140 million rows. In that table, we keep around the last 6 months of data, with roughly the same number of rows per day. On a daily basis, we delete the oldest day and insert the most recent day. The select stored procedure has isolation level set to read uncommitted (which I understand has its own issues, but we felt the risk of blocking was greater than the risk/likelihood of reading incomplete data).
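For concreteness, the procedure in question has roughly this shape (all object and column names below are invented for illustration; the original names were not given):

```sql
-- Hypothetical sketch of the kind of procedure described:
-- single-row lookup via a clustered index seek, dirty reads allowed.
CREATE PROCEDURE dbo.GetRowByKey
    @Id BIGINT
AS
BEGIN
    SET NOCOUNT ON;
    -- Accept the risk of reading uncommitted data to avoid blocking.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT Col1, Col2
    FROM dbo.BigTable          -- ~140 million rows, ~6 months of data
    WHERE Id = @Id;            -- seek on the clustered primary key
END;
```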
Recently, the select query has been timing out at 30 seconds (a timeout value we set – we don't know how long the query would actually have taken). The issue resolves itself anywhere from 30 to 120 seconds later.
I explored various possible causes, but my current theory is that the query optimizer detects out-of-date statistics, triggers a stats update on the table, waits for it to complete, and only then creates an execution plan. This theory is supported by the date/time the stats were last updated, which coincides exactly with the query slowdown.
Side note: the first time this issue occurred, it coincided with our daily insert of data. The second time it occurred, it didn’t seem to coincide with the insert or delete. But both times, the slowdown coincided with a stats update. I’m not sure why the stats became outdated the second time.
I see a couple options here:
- set auto stats update async so that the query optimizer doesn’t wait for the stats update to complete
- disable auto stats update and manually update stats nightly (would also probably specify a query hint in the stored procedure to use the clustered index, but not sure if that’s necessary).
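The two options above correspond to database-level settings and a manual refresh, sketched below (the table name is hypothetical, and `ALTER DATABASE CURRENT` assumes you run this in the target database):

```sql
-- Option 1: asynchronous auto-update. The optimizer compiles against
-- the old statistics while the update runs in the background.
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Option 2: turn off auto-update for the whole database and refresh
-- statistics yourself in a nightly job instead.
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS OFF;
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;  -- hypothetical table name
```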
So, does my theory for the cause sound reasonable?
Which of my options is best?
Are there other options?
So it's not that you get a bad plan; your issue is that somebody becomes a victim of the auto-stats update, right?
Since you probably know which table(s) we're talking about, I suggest you disable auto-update at the table (or even statistics) level. You can do this using sp_autostats. That way you don't affect the rest of the database. And of course handle the update of these stats yourself.
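A minimal sketch of that approach (object and statistics names are hypothetical):

```sql
-- Disable auto-update for all statistics on just this table,
-- leaving the rest of the database on the default behavior.
EXEC sp_autostats 'dbo.BigTable', 'OFF';

-- Or scope it to a single statistics object on that table.
EXEC sp_autostats 'dbo.BigTable', 'OFF', 'PK_BigTable';

-- Then refresh those stats yourself, e.g. from a nightly job.
UPDATE STATISTICS dbo.BigTable PK_BigTable WITH FULLSCAN;
```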
This assumes that you get OK plans even when the stats are a bit out of whack. If not, then async can be an option, but you won't have control over when the load of that stats update takes place.
I wouldn't go the hint route unless you identify that you indeed get bad plans with old stats. Consider plan quality a separate aspect. They are related, of course, but out-of-date stats don't necessarily mean bad plans.
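If you do confirm bad plans, the hint would look something like this (index and column names hypothetical):

```sql
-- Force the clustered index only after confirming that stale stats
-- actually produce a bad plan for this query.
SELECT Col1, Col2
FROM dbo.BigTable WITH (INDEX(PK_BigTable))
WHERE Id = @Id;
```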
If it is helpful, you can capture when auto-update stats happens with an XE trace. Either just count how many you have over a day, say – that would be using the event_counter XE target. Or capture the actual events (to a file target) if you need more details than "I had this many in total over the time the trace was running".
For instance, I have my "Looking for strange" trace for things like this (it just counts how many – modify the trace as you want to limit the events and/or add a file target).
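A minimal counting session along those lines could look like this (the session name is made up; add predicates or a file target as needed):

```sql
-- Count auto_stats events server-wide using the event_counter target.
CREATE EVENT SESSION CountAutoStats ON SERVER
ADD EVENT sqlserver.auto_stats
ADD TARGET package0.event_counter;

ALTER EVENT SESSION CountAutoStats ON SERVER STATE = START;

-- Later: read the running total from the counter target.
SELECT CAST(t.target_data AS XML) AS counter_xml
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
  ON s.address = t.event_session_address
WHERE s.name = 'CountAutoStats';
```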