Execution plan vs STATISTICS IO order

Posted on

Question :

SQL Server graphical execution plans read right to left and top to bottom. Is there a meaningful order to the output generated by SET STATISTICS IO ON?

The following query:

SET STATISTICS IO ON;

SELECT  *
FROM    Sales.SalesOrderHeader AS soh
        JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
        JOIN Production.Product AS p ON sod.ProductID = p.ProductID;

Generates this plan:

Graphical execution plan

And this STATISTICS IO output:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 3, read-ahead reads 1277, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 1, read-ahead reads 685, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 15, physical reads 1, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So, I reiterate: what gives? Is there a meaningful ordering to STATISTICS IO output or is some arbitrary order used?

Answer :

My initial playing around with various queries suggested no pattern at all but on paying closer attention it appears to be predictable for serial plans. I ended up at KB314648 which @AustinZellner mentions:

Each SQL Server connection has an associated process status structure
(PSS) that maintains connection-specific state information. Each
unique server process ID (SPID) in the sysprocesses system table
represents a different PSS, and the information in the sysprocesses
virtual table is a “view” into this status information.

And the section relevant to your question:

If STATISTICS IO is enabled for a connection, SQL Server allocates an
array during query execution to track IO information on a per-table
basis. As SQL Server processes the query, it records each logical
request for a page in the appropriate table’s entry in this array,
along with whether that logical IO request resulted in a physical IO.
SQL Server returns the information, at the end of the query, in error
message 3615.

The observed behaviour suggests that entries are made to the array in the order that IO is generated, essentially the result of GetNext() on a physical operator. The last entry in the statistics output is the first table that resulted in an IO being recorded, the first entry is the last table. I’d speculate that the order for parallel plans is not predictable (or less so) as there is no guarantee as to which parallel task will be scheduled first.

Looks to me like it is the opposite order of data read access in the plan. Your plan will first read from the Product table to build the hash table (worktable). Than it reads from SalesOrderHeader and form SalesOrderDetail combining them with the merge join operator. The worktable then is read from last to hash-match the original Product rows with those from the merge join. That is the exact opposite order in which they are listed in your statistics output.

However, I am not aware of any documentation that would specify this. If you want to be sure in what order table access happened, read the execution plan.

I always thought it had an order, from back when I did more programming than administration. I ran through a few execution plans and double checked my beliefs.

Here is what I see:

In a multi-step query (such as many of our stored procedures) the order reflects the physical order in which the queries are run.

For a particular query, it looks like the statistics IO reflect the execution plan by reporting statistics starting at from the right and working to the left

Perhaps this is more of an observation than anything else.

So I think that the results of statistics io give much more insight into what is actually happening at runtime, as it will take into account and be affected by the need to read from disk instead of cache, and also be influenced by permissions of the account that the query is being run under. The position of the table in the statistics return then is influenced by other factors than those considered by the profiler.

Here is a kb article that gives the insight and some examples:
http://support.microsoft.com/kb/314648

Leave a Reply

Your email address will not be published. Required fields are marked *