Update statistics

Posted on

Question :

For the data retention purpose I am deleting records from some 60 tables. Some of the tables have less number of records while some have more. I don’t want to update the stats for all the tables. Is there any cut off number of records like after this I need to update stats.

Answer :

It depends entirely on the number of rows in your of your tables, the number of rows you’re deleting in each table and the distribution of data in the columns you’re generally using as predicates in queries.

Imagine you have Table1 with ColumnA that has an uneven distribution of single characters (many with A & a few with Z). You regularly query this table looking for rows with one character or the other. The query optimiser uses statistics about the distribution of data in this column to determine that index seeks are the most efficient way to return all the rows with a ‘Z’ and a scan is best for ‘A’ because of the estimated number of rows with that value.

Now imagine you delete 3/4 of the rows in this table which incidentally eliminates most of the ‘A’ records and skews your data distribution so that ‘Z’ is the most common prefix. If your statistics aren’t updated, the query optimiser will still consider a scan for ‘A’ and a seek for ‘Z’ the most efficient operators, which may no longer be true because of the actual distribution of data in that column.

This is a contrived example but illustrates the importance of up to date statistics. There is also a good article on the importance of statistics here.

Generally, if you have AUTO_UPDATE_STATS enabled on your databases, the query optimiser will update stale statistics as/when required based on the number of modified rows.

In SQL Server 2005, AUTO_UPDATE_STATS doesn’t kick in until 500 rows + 20% of the rows in your table are modified. For smaller tables, say 10000 rows, this means 2000 rows need to be modified before AUTO_UPDATE_STATS will trigger statistics updates. For larger tables, say 100 million rows, you need to modify 20 million rows before it kicks in.

This means for larger tables, you generally want to manually update your statistics more frequently as the statistics can become stale while still not hitting this threshold.

What this all means is that, if you have AUTO_UPDATE_STATS turned on, your smaller tables will update their stats automatically when they’re next queried, which is probably desirable, and your larger tables likely won’t. While avoiding the performance impact of the stats updates on these large tables might be desirable, eventually, you might hit performance issues due to these stale stats, so you should consider manually updating them during a maintenance period.

Leave a Reply

Your email address will not be published. Required fields are marked *