Question :
We have a query that destroys our production server when run.
It is part of a reporting functionality and the bad part looks as follows:
SELECT DISTINCT
mt.ID AS ID
FROM
[dbo].[MyTable] mt
WITH (NOLOCK)
WHERE
(@aVariable IS NULL
OR (CONVERT(VARCHAR(22), mt.Date1, 112) >= CONVERT(VARCHAR(22), @date1, 112))
AND (@status IS NULL
OR @status <> 2
OR ( @status = 2
AND ( SELECT COUNT(*)
FROM
MyTable mt2
WITH (NOLOCK)
WHERE
mt2.CaseID = mt.CaseID
AND mt2.Date1 > mt.Date1
) = 0
)
)
AND (@aSecondVariable IS NULL
OR (CONVERT(VARCHAR(22), mt.Date1, 112) <= CONVERT(VARCHAR(22), @date1, 112)))
AND (@aThirdVariable IS NULL
OR (CONVERT(VARCHAR(22), mt.Date2, 112) >= CONVERT(VARCHAR(22), @date2, 112)))
AND (@aFourtVariable IS NULL
OR (CONVERT(VARCHAR(22), mt.Date2, 112) <= CONVERT(VARCHAR(22), @date2, 112)))
Furthermore, the table and indexes are created as follows:
CREATE INDEX [MyIndex] ON [dbo].[MyTable]
([AColumn], [AColumn2], [Date1], [Date2])
WITH (FILLFACTOR = 90)
MyTable
has about 80 columns and a single column Primary Key: (ID)
.
The MyTable mt
consists of about 10.000.000 rows. There is an index on the columns which contain aVariable
, aSecondVariable
, aThirdVariable
and aFourthVariable
. About half of the values of the date columns are null. In the index they are on place 3 and 4.
When we run the query on one server (without users) it performs really well. When we run it on production (with users) it takes too long and times out.
We are wondering how this can be. The execution plans are the same on both servers. We thought that the result might be cached somewhere or that the memory which is free (2GB RAM) is insufficient.
We are no experts on database performance and hope some real DBAs can provide us with their views. Thanks.
Answer :
A few suggestions:
- Remove the
DISTINCT
as theID
is the primary key. There is no way you’ll get duplicate rows in the result. - Do not convert the
datetime
columns. This makes your conditions non-sargable and the query will always do a table scan. The variables would need no conversion either if they are declared as dates but that is not a problem for sargability. - Use close-open ended ranges for date and datetimes, as @Aaron suggests in the comments. Please read his blog post: Bad habits to kick : mis-handling date / range queries .
- Use
NOT EXISTS
instead of(SELECT COUNT(*) ...) = 0
to check if there is no rows matching some condition. - Drop the
WITH (NOLOCK)
hints. Another blog post: Bad habits : PuttingNOLOCK
everywhere and a question in this site with several valuable points: IsNOLOCK
always bad?. - Add appropriate indexes. I’d guess separate indexes on
(Date1)
and(Date2)
or an index on(Date1, Date2)
would be ok but that needs testing. An index on(CaseId, Date1)
will be useful for the subquery. - (Minor note) always use the schema prefixes. Another blog post by @Aaron Bertrand: Bad habits to kick : avoiding the schema prefix.
- Add
OPTION (RECOMPILE)
(as @Mikael Eriksson suggested). This basically tells the optimizer to not rely on cached plans but spend some time for each query run to recompile the query – i.e. produce a new plan according to the new parameter values. With 7 variables that can change the query substantially, this seems a very good option. Read the article from @Paul White for more detailed explanation of the “Parameter Embedding Optimization”, parameter sniffing and also other options and advantages: Parameter Sniffing, Embedding, and theRECOMPILE
Options.
The query rewritten:
SELECT
mt.ID
FROM
[dbo].[MyTable] AS mt
WHERE
(@aVariable IS NULL OR mt.Date1 >= CAST(@date1 AS DATE))
AND (@aSecondVariable IS NULL OR mt.Date1 < DATEADD(day, 1, CAST(@date1 AS DATE)))
AND (@aThirdVariable IS NULL OR mt.Date2 >= CAST(@date2 AS DATE))
AND (@aFourtVariable IS NULL OR mt.Date2 < DATEADD(day, 1, CAST(@date2 AS DATE)))
AND ( @status IS NULL
OR @status <> 2
OR ( @status = 2
AND NOT EXISTS
( SELECT *
FROM dbo.MyTable AS mt2
WHERE mt2.CaseID = mt.CaseID
AND mt2.Date1 > mt.Date1
)
)
)
OPTION (RECOMPILE) ;
- The part about
@status
can be written in a slightly more compact way.
I don’t think this will make much difference (if any) in performance and it may look obfuscated but I add for completeness this version as well:
SELECT
mt.ID
FROM
[dbo].[MyTable] AS mt
WHERE
(@aVariable IS NULL OR mt.Date1 >= CAST(@date1 AS DATE))
AND (@aSecondVariable IS NULL OR mt.Date1 < DATEADD(day, 1, CAST(@date1 AS DATE)))
AND (@aThirdVariable IS NULL OR mt.Date2 >= CAST(@date2 AS DATE))
AND (@aFourtVariable IS NULL OR mt.Date2 < DATEADD(day, 1, CAST(@date2 AS DATE)))
AND NOT EXISTS
( SELECT *
FROM dbo.MyTable AS mt2
WHERE @status = 2
AND mt2.CaseID = mt.CaseID
AND mt2.Date1 > mt.Date1
)
OPTION (RECOMPILE) ;