Question :
I have a table like this:
CREATE TABLE TestTable
(
[TestTableID] [int] IDENTITY(1,1) NOT NULL,
[IntField1] [int] NOT NULL,
[IntField2] [int] NOT NULL,
[IntField3] [int] NOT NULL,
[IntField4] [int] NOT NULL,
[IntField5] [int] NOT NULL,
[DateField1] [datetime] NOT NULL,
[IntField6] [int] NOT NULL,
[IntField7] [int] NOT NULL,
[TextField1] [nvarchar](300) NULL,
[DateField2] [datetime] NULL,
[TextField2] [nvarchar](300) NULL,
[DateField3] [datetime] NULL,
[BoolField1] [bit] NULL
)
I have created an index like this:
CREATE NONCLUSTERED INDEX IX_TestTable_DateField1
ON TestTable(DateField1);
And now I have this query:
DECLARE @startDate DATETIME = '20190101'
, @endDate DATETIME = '20200101'
SELECT [TestTableID],
[IntField1],
[IntField2],
[IntField3],
[IntField4],
[IntField5],
[DateField1],
[IntField6],
[IntField7],
[TextField1],
[DateField2],
[TextField2],
[DateField3],
[BoolField1]
FROM TestTable
WHERE DateField1 >= @startDate
AND DateField1 < @endDate
This table has nearly 10,000,000 records, and this query will return nearly 10,000 records.
Now, I expected the query to at least use my index IX_TestTable_DateField1 (Index Scan + Key Lookup) but it is doing a Clustered Index Scan (on PK field). I think this is because the query is returning all fields for the table.
My previous thinking was:
- If the index has INCLUDEd all fields, so the SqlServer will do an Index Seek;
- If not included all fields, but if the field is used in WHERE or ORDER, it will use Index Scan + Key Lookup;
- If neither 1 or 2, it will do a Clustered Index Scan;
Is this correct? Why an “Index Scan + Key Lookup” is not happening?
Answer :
Here’s what Microsoft says about Optimizing SELECT statements
The SQL Server Query Optimizer is a cost-based optimizer. Each
possible execution plan has an associated cost in terms of the amount
of computing resources used. The Query Optimizer must analyze the
possible plans and choose the one with the lowest estimated cost.[…]
The SQL Server Query Optimizer does not choose only the execution plan
with the lowest resource cost; it chooses the plan that returns
results to the user with a reasonable cost in resources and that
returns the results the fastest. For example, processing a query in
parallel typically uses more resources than processing it serially,
but completes the query faster.
As you can see the Query Optimizer will choose the plan it expects to obtain the most eficient execution. Sometimes using an (Index Scan + Key Lookup) isn’t the most eficient method.
As a test, you could compare the execution plan STATISTICS TIME and STATISTICS IO results you’re getting now with the ones generated for your query forcing it to use the index with a query hint. Please, notice that I’m not advising you to use this hint as a solution, but as a way of comparing what would the execution performance be like if it used the index as you wish.
For further reading, Benjamin Nevarez’s article brings some good information: The SQL Server Query Optimizer
SQL server can definitely do an index seek and then a lookup, even when you don’t cover the query.
The optimizer has no idea what values you have in your variables (that is how variables work). So it has to guess the selectivity. You can look at the actual execution plan and see how many rows it guesses. Apparently it guesses so many rows so it decides that it is better to do a table scan (cl ix scan).
If you add OPTION(RECOMPILE) at the end of the query, you should see a different estimated selectivity, and potentially usage of the index (all based on what selectivity you have in the end).
Also, if you have literals (values are known) or stored procedure parameters (values are sniffed), you’ll see how it will estimate differently.
The issue you are observing has to do with the way SQL Server determines the best way to execute a query and is called parameter sniffing.
Reference Reading
- Parameter Sniffing in Action (Brent Ozar)
- SQL SERVER – Parameter Sniffing Simplest Example (SQL Server Authority)
What?
When you executed your query for the fist time the SQL Serer Query Optimizer with the help of the Cardinality Estimator used the statistics of the available indexes to determine which way to retrieve the data you were then requesting.
Now if the values you passed to the variables were initially like the following:
DECLARE @startDate DATETIME = '19010101'
, @endDate DATETIME = '20200101'
..then the Query Optimizer quickly determined that it would need to scan all the data in the index IX_TestTable_DateField1
to retrieve all rows (or possibly even slightly more than 50% of the whole data) to efficiently fulfil the statement.
Instead of using the IX_TestTable_DateField1
index to retrieve all the records to match your query the query optimizer instead chose to read the Clustered Index because the Clustered Index is in effect the data. (Why read a non-clustered index to then retrieve the data, when you can read the clustered index instead and already have the data).
Because this was the first run, the Query Optimizer stored the execution plan (which was optimized for the initial values) in the plan cache.
Whenever a new query (with different values) hits the server, the QO will see that it already has an execution plan that meets the requirements of the query.
The major setback is that the query plan was optimized for the initial values and when you provide new values for the parameters, then the QO is not going to create a new query plan, because that is “expensive”.
Solution
If you want your query to use the index, then you are going to have to either:
- execute
WITH RECOMPILE
- execute the query
WITH OPTIMIZE FOR @startDate = '<value>', @endDate = '<value>'
- execute the query
WITH OPTIMIZE FOR UNKNOWN
- remove the query plan from the plan cache
After which you might observe that the data is retrieved using your index.
Reference Reading
- Query Hints (Tranact SQL) (Microsoft | SQL Docs)