Question :
I have roughly 1.5 million records. This query works fine for smaller data, but it gets a second slower for every few hundred thousand records. For 1.5 million, the query is noticeably struggling.
I mocked up a whole new Code First project to verify the problem is independent of my implementation, and to provide this example query:
SELECT TOP 1 Widgets.*
FROM [WidgetSandbox].[dbo].[Widgets] Widgets
INNER JOIN [WidgetSandbox].[dbo].[Status] Statuses ON Widgets.StatusId = Statuses.Id
INNER JOIN [WidgetSandbox].[dbo].[Colors] Colors ON Widgets.ColorCode = Colors.ColorCode
INNER JOIN [WidgetSandbox].[dbo].[Sizes] Sizes ON Widgets.SizeId = Sizes.Id
WHERE Statuses.Name = 'Available'
AND Colors.Name = 'Red'
ORDER BY Sizes.DiameterInches
DiameterInches is an int
, as a metaphor for “PriorityLevel” in my actual code.
If I comment out ORDER BY Sizes.DiameterInches
, it returns immediately, but if I want to find “the smallest available red widget”, then it crawls.
Is there a better way?
Execution plan XML: https://gist.github.com/RobertBaldini/57c8b61d135cc5c84c38b2da243611ad
DDL: https://gist.github.com/RobertBaldini/3740c7bb85eea47d7fe63cb8602ac2d6
Repo (data loader takes several minutes): https://github.com/RobertBaldini/WidgetSandbox
Answer :
You should retype the various Name columns from nvarchar(max) to nvarchar([right size]). It’s unlikely that names will be up to 2GB in length, and making them max sized prevents them being used as a key in an index. A good general rule of thumb is to avoid large object data types wherever possible.
You’ll probably need to make that change to the EF code, but in T-SQL:
-- Guessing at 100 characters maximum
ALTER TABLE dbo.[Status]
ALTER COLUMN Name nvarchar(100) NOT NULL;
ALTER TABLE dbo.Colors
ALTER COLUMN Name nvarchar(100) NOT NULL;
ALTER TABLE dbo.Sizes
ALTER COLUMN Name nvarchar(100) NOT NULL;
ALTER TABLE dbo.Widgets
ALTER COLUMN Name nvarchar(100) NOT NULL;
I have also changed the column definition to NOT NULL
there.
Anyway, given indexes:
CREATE NONCLUSTERED INDEX IX_dbo_Sizes__DiameterInches
ON dbo.Sizes (DiameterInches);
CREATE NONCLUSTERED INDEX IX_dbo_Widgets__SizeId_ColorCode_StatusId__Name
ON dbo.Widgets (SizeId, ColorCode, StatusId)
INCLUDE (Name);
…you should end up with a plan that avoids lookups and sorts, something like:
With the small number of rows in Status and Colors, it is probably not worth indexing their Name columns right now, but that could change over time. In any case, if the name columns should be unique, you should constrain them to be so using a unique constraint or unique nonclustered index, for example:
CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Colors__Name
ON dbo.Colors (Name);
CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Status__Name
ON dbo.[Status] (Name);
CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Sizes__Name
ON dbo.Sizes (Name);
CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Widgets__Name
ON dbo.Widgets (Name);
Second approach
With the same indexes, if you are able to change the SQL, you could also separate out the Color and Status lookups:
DECLARE
@StatusId integer,
@ColorCode nvarchar(6);
SELECT @StatusId = ST.Id
FROM dbo.[Status] AS ST
WHERE ST.Name = N'Available';
SELECT @ColorCode = C.ColorCode
FROM dbo.Colors AS C
WHERE C.Name = N'Red';
SELECT TOP (1)
W.*
FROM dbo.Widgets AS W
JOIN dbo.Sizes AS S
ON W.SizeId = S.Id
WHERE
W.StatusId = @StatusId
AND W.ColorCode = @ColorCode
ORDER BY
S.DiameterInches;
This makes for a simpler set of operations that also happen to make life easier for the optimizer:
Note also that Unicode string literals should be prefixed with N for correct data typing.
Oh, and you should also patch your instance. It is currently SQL Server 2008 Service Pack 1 – Service Pack 4 has been available for some time now.
The problem is that TopN Sort in the top left. It has to pull in all possible rows to then find the smallest one.
If you have an index on DiameterInches, it’s likely to start searching on the smallest ones first, and give you vastly improved performance.
Of course, then you’ll need an index on Widgets.SizeId, so that it can find those easily. And it’ll need to check each one to see if it’s Red and Available. You might find it’s a little over-normalised. If “Available” means StatusId = ‘A’, then use that, for example.
A suitable index might be:
CREATE INDEX ixDiameterInches
ON dbo.Sizes (DiameterInches, Id)
But also consider having:
CREATE INDEX ixSizeStatusColor
ON dbo.Widgets (SizeId, StatusId, ColorCode);
…so that you can quickly find those Available Red small things. You may find that (StatusId, ColorCode, SizeId)
works better if you don’t have many Available Red things.