OFFSET ... FETCH model introduces with SQL Server 2012 offers simple and faster paging. Why are there any differences at all considering that the two forms are semantically identical and very common?
One would assume that the optimizer recognizes both and optimizes them (trivially) to the fullest.
Here is a very simple case where
OFFSET ... FETCH is ~2x faster according to the cost estimate.
SELECT * INTO #objects FROM sys.objects SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY object_id) r FROM #objects ) x WHERE r >= 30 AND r < (30 + 10) ORDER BY object_id SELECT * FROM #objects ORDER BY object_id OFFSET 30 ROWS FETCH NEXT 10 ROWS ONLY
One can vary this test case by creating a CI on
object_id or adding filters but it is impossible to remove all plan differences.
OFFSET ... FETCH is always faster because it does less work at execution time.
The examples in the question do not quite produce the same results (the
OFFSET example has an off-by-one error). The updated forms below fix that issue, remove the extra sort for the
ROW_NUMBER case, and use variables to make the solution more general:
DECLARE @PageSize bigint = 10, @PageNumber integer = 3; WITH Numbered AS ( SELECT TOP ((@PageNumber + 1) * @PageSize) o.*, rn = ROW_NUMBER() OVER ( ORDER BY o.[object_id]) FROM #objects AS o ORDER BY o.[object_id] ) SELECT x.name, x.[object_id], x.principal_id, x.[schema_id], x.parent_object_id, x.[type], x.type_desc, x.create_date, x.modify_date, x.is_ms_shipped, x.is_published, x.is_schema_published FROM Numbered AS x WHERE x.rn >= @PageNumber * @PageSize AND x.rn < ((@PageNumber + 1) * @PageSize) ORDER BY x.[object_id]; SELECT o.name, o.[object_id], o.principal_id, o.[schema_id], o.parent_object_id, o.[type], o.type_desc, o.create_date, o.modify_date, o.is_ms_shipped, o.is_published, o.is_schema_published FROM #objects AS o ORDER BY o.[object_id] OFFSET @PageNumber * @PageSize - 1 ROWS FETCH NEXT @PageSize ROWS ONLY;
ROW_NUMBER plan has an estimated cost of 0.0197935:
OFFSET plan has an estimated cost of 0.0196955:
That is a saving of 0.000098 estimated cost units (though the
OFFSET plan would require extra operators if you want to return a row number for each row). The
OFFSET plan will still be slightly cheaper, generally speaking, but do remember that estimated costs are exactly that – real testing is still required. The bulk of the cost in both plans is the cost of the full sort of the input set, so helpful indexes would benefit both solutions.
Where constant literal values are used (e.g.
OFFSET 30 in the original example) the optimizer can use a TopN Sort instead of a full sort followed by a Top. When the rows needed from the TopN Sort is a constant literal and <= 100 (the sum of
FETCH) the execution engine can use a different sort algorithm which can perform faster than generalized TopN sort. All three cases have different performance characteristics overall.
As to why the optimizer does not automatically transform the
ROW_NUMBER syntax pattern to use
OFFSET, there are a number of reasons:
- It’s almost impossible to write a transform that would match all existing uses
- Having some paging queries automatically transformed and not others could be confusing
OFFSETplan is not guaranteed to be better in all cases
One example for the third point above occurs where the paging set is quite wide. It can be much more efficient to seek the keys needed using a nonclustered index and manually lookup against the clustered index compared with scanning the index with
ROW_NUMBER. There are additional issues to consider if the paging application needs to know how many rows or pages there are in total. There is another good discussion of the relative merits of the ‘key seek’ and ‘offset’ methods here.
Overall, it is probably better that people make an informed decision to change their paging queries to use
OFFSET, if appropriate, after thorough testing.
With a slight fiddling of your query I get an equal cost estimate (50/50) and equal IO stats:
; WITH cte AS ( SELECT *, ROW_NUMBER() OVER (ORDER BY object_id) r FROM #objects ) SELECT * FROM cte WHERE r >= 30 AND r < 40 ORDER BY r SELECT * FROM #objects ORDER BY object_id OFFSET 30 ROWS FETCH NEXT 10 ROWS ONLY
This avoids the additional sort that appears in your version by sorting on
r instead of
They modified the query optimizer to add this feature in. Meaning they implemented mechanisms specifically to support the offset … fetch command. In other words for the top query SQL Server has to do a lot more work. Thus the difference in query plans.