I am trying to determine why the query optimizer in SQL Server is recommending creating a new index instead of using an existing index that appears to be sufficient for the query.
First the table. Column names changed to protect the innocent 🙂
CREATE TABLE [myTable] ( [id] [int] IDENTITY(1,1) NOT NULL, [serialNumber] [varchar](12) NOT NULL, [sName] [varchar](64) NOT NULL, [meanValue] [int] NOT NULL, [range] [int] NOT NULL, [modifiedDate] [datetime] NOT NULL, CONSTRAINT [PK_myTable] PRIMARY KEY CLUSTERED ( [id] ASC ) )
Create the index in question:
CREATE NONCLUSTERED INDEX [IDX_myIndex] ON [myTable] ([serialNumber], [sName], [meanValue], [range]) INCLUDE ([modifiedDate])
Add data for testing using your generator of choice 😉 Run the following query (table only has a few million records)
SELECT TOP 1000 [serialNumber], [sName], [meanValue], [range], [modifiedDate] FROM [myTable] WHERE [serialNumber] = 137802 AND [sName] = 'A Name'
The query optimizer recommends using a new index where the additional where clauses are covered in the INCLUDE instead of part of the key:
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[myTable] ([sName]) INCLUDE ([serialNumber],[meanValue],[range],[modifiedDate])
I was under the impression that a broader index that encompasses more columns will be used as the index as long as the order of the WHERE clauses represents the order of the columns indexed.
If I also WHERE on the modifiedData the index gets used and the query optimizer doesn’t complain:
SELECT TOP 1000 [serialNumber], [sName], [meanValue], [range], [modifiedDate] FROM [myTable] WHERE [serialNumber] = 137802 AND [sName] = 'A Name' AND ([modifiedDate] >= '2000-04-25' AND [modifiedDate] < '2019-04-30')
The DBA link
SQL Server 2008R2 – Why is my index not used suggests a closer correlation between the index key and includes with the SELECT statement helps determine index use (but in my example they are basically the same). I have lots of rows, which probably satisfies the row use probability test, and there are no NULLs – thus negating the NULL effect of the index.
I thought, maybe incorrectly, that an index
A, B, C, D would cover an query where
A, B, C, or
A, B, or
A would be run. Is this assumption wrong? I realize there could be edge conditions that throw this basic concept off-kilter, but at a fundamental level, isn’t this roughly how it’s supposed to work?
Thanks in advance for any help, pointing out the stupidity in my ways, recognizing I need to go (back) to DB school etc… 🙂
Your index is seemingly fine and good (i.e. covering) for the query and it should be used. The real problem is the query itself and specifically this condition which hides an implicit conversion:
WHERE [serialNumber] = 137802
According to SQL Server’s datatype precedence, when two values of different datatypes are compared, the value with the datatype of lower precedence is converted to the datatype of the higher precedence. Unfortunately,
int is higher in the list than
varchar. This blows up any hope of using the index as the column (
serialNumber) values are converted to integers. The column being the 1st position of the index, leads the optimizer to not use that index and search for an alternative (and thus the suggestion.)
Solution is not to have any implicit or explicit conversions of columns in
WHERE condition. Simply use:
WHERE [serialNumber] = '137802'