Performance of query with complex where clause on 10M rows

Posted on

Question :

We have a query that destroys our production server when run.

It is part of a reporting functionality and the bad part looks as follows:

SELECT DISTINCT 
    mt.ID           AS ID
FROM 
    [dbo].[MyTable]    mt
WITH (NOLOCK)
WHERE           
    (@aVariable           IS NULL 
      OR (CONVERT(VARCHAR(22), mt.Date1, 112) >= CONVERT(VARCHAR(22), @date1, 112))
    AND (@status            IS NULL 
    OR @status <> 2 
    OR  (   @status = 2 
        AND (  SELECT COUNT(*) 
               FROM 
               MyTable mt2
               WITH (NOLOCK)
               WHERE 
                  mt2.CaseID = mt.CaseID 
                  AND mt2.Date1 > mt.Date1
             ) = 0
        )
    )
    AND (@aSecondVariable IS NULL 
           OR (CONVERT(VARCHAR(22), mt.Date1, 112) <= CONVERT(VARCHAR(22), @date1, 112)))
    AND (@aThirdVariable  IS NULL 
           OR (CONVERT(VARCHAR(22), mt.Date2, 112) >= CONVERT(VARCHAR(22), @date2, 112)))
    AND (@aFourtVariable  IS NULL 
          OR (CONVERT(VARCHAR(22), mt.Date2, 112) <= CONVERT(VARCHAR(22), @date2, 112)))

Furthermore, the table and indexes are created as follows:

CREATE INDEX [MyIndex] ON [dbo].[MyTable]
    ([AColumn], [AColumn2], [Date1], [Date2]) 
WITH (FILLFACTOR = 90)

MyTable has about 80 columns and a single column Primary Key: (ID).

The MyTable mt consists of about 10.000.000 rows. There is an index on the columns which contain aVariable, aSecondVariable, aThirdVariable and aFourthVariable. About half of the values of the date columns are null. In the index they are on place 3 and 4.

When we run the query on one server (without users) it performs really well. When we run it on production (with users) it takes too long and times out.

We are wondering how this can be. The execution plans are the same on both servers. We thought that the result might be cached somewhere or that the memory which is free (2GB RAM) is insufficient.

We are no experts on database performance and hope some real DBAs can provide us with their views. Thanks.

Answer :

A few suggestions:

  • Remove the DISTINCT as the ID is the primary key. There is no way you’ll get duplicate rows in the result.
  • Do not convert the datetime columns. This makes your conditions non-sargable and the query will always do a table scan. The variables would need no conversion either if they are declared as dates but that is not a problem for sargability.
  • Use close-open ended ranges for date and datetimes, as @Aaron suggests in the comments. Please read his blog post: Bad habits to kick : mis-handling date / range queries .
  • Use NOT EXISTS instead of (SELECT COUNT(*) ...) = 0 to check if there is no rows matching some condition.
  • Drop the WITH (NOLOCK) hints. Another blog post: Bad habits : Putting NOLOCK everywhere and a question in this site with several valuable points: Is NOLOCK always bad?.
  • Add appropriate indexes. I’d guess separate indexes on (Date1) and (Date2) or an index on (Date1, Date2) would be ok but that needs testing. An index on (CaseId, Date1) will be useful for the subquery.
  • (Minor note) always use the schema prefixes. Another blog post by @Aaron Bertrand: Bad habits to kick : avoiding the schema prefix.
  • Add OPTION (RECOMPILE) (as @Mikael Eriksson suggested). This basically tells the optimizer to not rely on cached plans but spend some time for each query run to recompile the query – i.e. produce a new plan according to the new parameter values. With 7 variables that can change the query substantially, this seems a very good option. Read the article from @Paul White for more detailed explanation of the “Parameter Embedding Optimization”, parameter sniffing and also other options and advantages: Parameter Sniffing, Embedding, and the RECOMPILE Options.

The query rewritten:

SELECT  
    mt.ID 
FROM 
    [dbo].[MyTable] AS mt
WHERE 
      (@aVariable       IS NULL OR mt.Date1 >= CAST(@date1 AS DATE))
  AND (@aSecondVariable IS NULL OR mt.Date1 < DATEADD(day, 1, CAST(@date1 AS DATE)))
  AND (@aThirdVariable  IS NULL OR mt.Date2 >= CAST(@date2 AS DATE))
  AND (@aFourtVariable  IS NULL OR mt.Date2 < DATEADD(day, 1, CAST(@date2 AS DATE)))
  AND ( @status IS NULL 
     OR @status <> 2 
     OR ( @status = 2 
          AND NOT EXISTS
              ( SELECT * 
                FROM dbo.MyTable AS mt2
                WHERE mt2.CaseID = mt.CaseID 
                  AND mt2.Date1 > mt.Date1
              )
        ) 
      ) 
OPTION (RECOMPILE) ;
  • The part about @status can be written in a slightly more compact way.

I don’t think this will make much difference (if any) in performance and it may look obfuscated but I add for completeness this version as well:

SELECT  
    mt.ID 
FROM 
    [dbo].[MyTable] AS mt
WHERE 
      (@aVariable       IS NULL OR mt.Date1 >= CAST(@date1 AS DATE))
  AND (@aSecondVariable IS NULL OR mt.Date1 < DATEADD(day, 1, CAST(@date1 AS DATE)))
  AND (@aThirdVariable  IS NULL OR mt.Date2 >= CAST(@date2 AS DATE))
  AND (@aFourtVariable  IS NULL OR mt.Date2 < DATEADD(day, 1, CAST(@date2 AS DATE)))
  AND NOT EXISTS
      ( SELECT * 
        FROM dbo.MyTable AS mt2
        WHERE @status = 2 
          AND mt2.CaseID = mt.CaseID 
          AND mt2.Date1 > mt.Date1
      )
OPTION (RECOMPILE) ;

Leave a Reply

Your email address will not be published.