CPU impact, Scan count 9, logical reads 1686, physical reads 0 vs Scan count 1, logical reads 10253, physical reads 20


Question :

I have some third-party software that frequently runs the same update query against a table with 10 million rows and about 1 GB of data:

(@P1 bigint,@P2 bigint,@P3 bigint,@P4 bigint,@P5 bigint) 
UPDATE top(1000) hsi.hspendingitems  
   SET status = @P1  
WHERE handlertype = @P2  
AND status in (@P3,@P4,@P5)

Initial Results (without modifications)

Query 1 plan:

https://www.brentozar.com/pastetheplan/?id=Syv7OxRHW

Query IO statistics

Table 'hspendingitems'. Scan count 9, logical reads 1686, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  

Additional Index

I then added an index:

CREATE NONCLUSTERED INDEX [ix_test_dba] ON [hsi].[hspendingitems]
(
[handlertype] ASC,
[status] ASC
)
WITH 
(
PAD_INDEX = OFF, 
STATISTICS_NORECOMPUTE = OFF, 
SORT_IN_TEMPDB = OFF, 
DROP_EXISTING = OFF, 
ONLINE = OFF, 
ALLOW_ROW_LOCKS = ON, 
ALLOW_PAGE_LOCKS = ON
) 
ON [DBSpace8]

Results with New Index

Query 2 plan:
https://www.brentozar.com/pastetheplan/?id=S1yjulAS-

Query statistics:

Table 'hspendingitems'. Scan count 1, logical reads 10253, physical reads 20,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Is it better for SQL Server to do fewer scans but more logical reads?

I also noticed that the estimated plan cost is different:

Initial cost 102.937 vs. 0.0232853 with the index.

The vendor is aware of this index. Do you think I should push the vendor to add it?

Answer :

Context is important when comparing the IO statistics of queries. Let's start by describing what each query does based on its plan.

The first query scans the heap in parallel until it finds 1000 rows that need to be updated, then performs the update on the table. In the worst case, the entire table has to be scanned.

The second query does an index seek until it finds 1000 rows that need to be updated. Both the table data and the index need to be updated, because you're changing a column that is part of the index key. The query's cost is so low that it isn't eligible for parallelism.
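If you want to confirm that the update is touching both the heap and the new index, one way is to look at the operational stats DMV before and after a batch runs. This is just a sketch; the counters are cumulative since the object was last brought into memory, so compare deltas rather than absolute values:

```sql
-- Leaf-level write activity per index on the table. For the heap row,
-- i.name will be NULL; ix_test_dba should show leaf_update_count (or
-- delete + insert counts, if the key change moves rows) climbing as
-- the vendor's UPDATE loop runs.
SELECT i.name            AS index_name,
       ios.leaf_insert_count,
       ios.leaf_delete_count,
       ios.leaf_update_count
FROM sys.dm_db_index_operational_stats(
         DB_ID(), OBJECT_ID('hsi.hspendingitems'), NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id
 AND i.index_id  = ios.index_id;
```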

Now let’s go back to the IO statistics to see if they make more sense. For the first query:

Table 'hspendingitems'. Scan count 9, logical reads 1686, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.

The scan count of 9 means that the query executed with a DOP of 8. The logical read count represents the reads needed to find the first 1000 rows plus the activity required to update the table's data. I'm assuming that the UPDATE query runs in a loop because it only updates 1000 rows per execution. As the query runs, the logical read count is likely to increase because SQL Server will need to scan more of the table to find the next 1000 rows to update.
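You can reproduce these numbers yourself by wrapping the vendor's statement with IO statistics enabled. The parameter values below are hypothetical placeholders; substitute whatever the application actually sends:

```sql
-- Emit per-table IO counters (scan count, logical/physical reads)
-- to the Messages tab for each statement that follows.
SET STATISTICS IO ON;

DECLARE @P1 bigint = 16,  -- new status (hypothetical value)
        @P2 bigint = 2,   -- handlertype filter (hypothetical)
        @P3 bigint = 0, @P4 bigint = 4, @P5 bigint = 8;

UPDATE TOP (1000) hsi.hspendingitems
   SET status = @P1
WHERE handlertype = @P2
  AND status IN (@P3, @P4, @P5);

SET STATISTICS IO OFF;
```

Running this repeatedly against the heap (before the index existed) is what would show the logical reads creeping up between batches.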

For the second query:

Table 'hspendingitems'. Scan count 1, logical reads 10253, physical
reads 20, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.

The query doesn’t run in parallel so the scan count is 1. The nonclustered index that you created is stored as a separate object so both the table and the index’s data need to be updated. I would expect the performance of this query to remain roughly constant as the query runs in a loop.

Should the index be created? It depends. Is there a performance problem right now with the current code? Does the index solve the performance problem in the best way? Are there other queries that could benefit from the index? Will the index slow down other DML on the table too much? We can’t answer those questions for you, but I will say that updating 1000 rows against an unindexed heap in a loop often isn’t a good strategy in terms of performance.
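For reference, the looping pattern under discussion probably looks something like the sketch below (the vendor's actual code is unknown; the parameter values are hypothetical). With the nonclustered index in place, each iteration seeks straight to qualifying rows instead of rescanning the heap from the beginning:

```sql
DECLARE @P1 bigint = 16, @P2 bigint = 2,
        @P3 bigint = 0, @P4 bigint = 4, @P5 bigint = 8;

-- Keep updating in 1000-row batches until no rows qualify.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (1000) hsi.hspendingitems
       SET status = @P1
    WHERE handlertype = @P2
      AND status IN (@P3, @P4, @P5);

    SET @rows = @@ROWCOUNT;  -- 0 once nothing is left to update
END;
```

Without a supporting index, each pass through this loop re-reads ever more of the heap to find the next batch, which is why the batched-heap-update strategy tends to perform poorly.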
