Question :
Sql Server gives me a bad query plan and I’m trying to understand why that is.
The query is this:
SELECT TOP (10)
[Project1].[C1] AS [C1],
[Project1].[Id] AS [Id],
[Project1].[SupplierNumber] AS [SupplierNumber],
[Project1].[ArticleNumber] AS [ArticleNumber],
[Project1].[ArticleName] AS [ArticleName]
FROM ( SELECT
[Extent1].[SupplierNumber] AS [SupplierNumber],
[Extent1].[ArticleNumber] AS [ArticleNumber],
[Extent1].[Id] AS [Id],
[Extent1].[ArticleName] AS [ArticleName],
1 AS [C1]
FROM [dbo].[SalesEntry] AS [Extent1]
LEFT OUTER LOOP JOIN [dbo].[Article] AS [Extent2]
ON ([Extent1].[ArticleNumber] = [Extent2].[ArticleNumber])
AND ([Extent1].[SupplierNumber] = [Extent2].[SupplierNumber])
WHERE [Extent2].[id] IS NULL
) AS [Project1]
ORDER BY [Project1].[SupplierNumber] ASC, [Project1].[ArticleNumber] ASC
OPTION (TABLE HINT ([Extent1], INDEX(IX_Main)))
I have already annotated the query with two hints:
- the join is forced to be a loop join and
- I force an index that fits the order by criteria.
With these hints, I get an efficient query plan that looks like this:
Scan index
IX_Main
onSalesEntry
and for each 10 entries, look up the respective article entries withIX_Main
onArticles
.
Both tables have an IX_Main
index on (SupplierNumber
, ArticleNumber
).
That way the query is fast.
Without the hints, however, Sql Server does a clustered index scan on SalesEntry
, which is not useful at all, and an index scan on IX_Main
for Article
, and then brings the two streams together with a hash match.
That’s not so fast, particularly because all the rows of SalesEntry
now need to be scanned although we’re only interested in the top 10 regarding IX_Main
.
I’m confused as to why Sql Server would make that decision.
There’s a TOP 10
specifier. That should tell Sql Server that it can get enough rows for the result super-fast with the index it chooses to ignore (IX_Main). It then would need to do only a lousy ten lookups with the index IX_Main
on Article
.
I already tried and failed to reduce this to a simple example that can be reproduced, so I’m putting this with as much information out there as I think it relevant.
Does anyone have an idea about what Sql Server’s thought process might be?
(The query looks a bit weird as it is based on what my ORM, Entity Framework produces.)
EDIT: Here’s the problematic plan as xml in a gist.
Answer :
That top being kind of removed from the order by makes is hard for the query optimizer
It does not need to just do a lousy 10 lookups as it need top 10 WHERE [Extent2].[id] IS NULL
With more statistics the query optimizer may get smarter
I know you are using an ORM but give this a try
SELECT Top(10) 1 AS [C1],
[Extent1].[SupplierNumber] AS [SupplierNumber],
[Extent1].[ArticleNumber] AS [ArticleNumber],
[Extent1].[Id] AS [Id],
[Extent1].[ArticleName] AS [ArticleName]
FROM [dbo].[SalesEntry] AS [Extent1]
LEFT OUTER JOIN [dbo].[Article] AS [Extent2]
ON [Extent1].[ArticleNumber] = [Extent2].[ArticleNumber]
AND [Extent1].[SupplierNumber] = [Extent2].[SupplierNumber]
WHERE [Extent2].[id] IS NULL
ORDER BY [Extent1].[SupplierNumber] ASC, [Extent1].[ArticleNumber] ASC