declare @X xml = '
<item ID = "0"/>
<item ID = "1"/>
<item/>
<item/>';

select I.X.value('@ID', 'int')
from @X.nodes('/item') as I(X);
-----------
0
1
NULL
NULL
The top branch shreds the XML to four rows and the bottom branch fetches the value for the attribute ID.
What strikes me as odd is the number of rows returned from the Stream Aggregate operator. The 2 rows that come from the Filter are the ID attributes from the first and second item nodes in the XML. The Stream Aggregate returns four rows, one for each input row, effectively turning the Inner Join into an Outer Join.
Is this something that Stream Aggregate does in other circumstances as well or is it just something odd going on when doing XML queries?
I cannot see any hints in the XML version of the query plan that this Stream Aggregate should behave any differently from any other Stream Aggregate I have noticed before.
The aggregate is a scalar aggregate (no group by clause). These are defined in SQL Server to always produce a row, even if the input is empty.
For a scalar aggregate, MAX of no rows is NULL and COUNT of no rows is zero, for example. The optimizer knows all about this, and can transform an outer join into an inner join in suitable circumstances.
-- NULL for a scalar aggregate
SELECT MAX(V.v)
FROM (VALUES(1)) AS V (v)
WHERE V.v = 2;

-- No row for a vector aggregate
SELECT MAX(V.v)
FROM (VALUES(1)) AS V (v)
WHERE V.v = 2
GROUP BY ();
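The same contrast can be sketched for COUNT, which the text mentions but the example above does not show. This is only an illustration using the same one-row VALUES source; grouping by V.v (rather than the empty grouping set) is my own choice here to make the aggregate a vector aggregate.

```sql
-- Scalar aggregate over an empty input: returns one row containing 0
SELECT COUNT(*)
FROM (VALUES(1)) AS V (v)
WHERE V.v = 2;

-- Vector aggregate over an empty input: returns no rows at all
SELECT COUNT(*)
FROM (VALUES(1)) AS V (v)
WHERE V.v = 2
GROUP BY V.v;
```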
For more about aggregates, see my article Fun With Scalar and Vector Aggregates.
The thing to remember here is that execution plans suck the data through.
So the Nested Loop operator calls the Stream Aggregate 4 times. The Stream Aggregate calls the Filter 4 times as well, but only gets a value twice.
So the Stream Aggregate returns four rows. Twice it returns a value, and twice it returns NULL.
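The same logical behaviour can be reproduced without XML. The following is a minimal sketch, not the plan from the question: the table variables @Items and @Attrs and their columns are hypothetical stand-ins for the shredded item nodes and their ID attributes. A scalar aggregate on the inner side of an apply produces exactly one row per outer row, so items with no attribute still come back, with NULL. The optimizer is free to pick a different physical plan for this query than the one shown for the XML case.

```sql
-- Hypothetical stand-ins: four "item" rows, only two of which have an ID
DECLARE @Items table (item_no int PRIMARY KEY);
DECLARE @Attrs table (item_no int, id int);

INSERT @Items (item_no) VALUES (1), (2), (3), (4);
INSERT @Attrs (item_no, id) VALUES (1, 0), (2, 1);

-- The inner query is a scalar aggregate (no GROUP BY), so it yields
-- one row per outer row even when no @Attrs row matches.
SELECT I.item_no, CA.id
FROM @Items AS I
CROSS APPLY
(
    SELECT MAX(A.id) AS id
    FROM @Attrs AS A
    WHERE A.item_no = I.item_no
) AS CA
ORDER BY I.item_no;

-- Returns four rows: (1, 0), (2, 1), (3, NULL), (4, NULL)
```

Although CROSS APPLY normally behaves like an inner join, no outer rows are lost here, which mirrors the four rows coming out of the Stream Aggregate in the question.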