How to optimize this SQL Server query?

Posted on

Question :

Anyone have any suggestions for how to optimize this query?
Running SQL Server 2008..

The object is to look at accounts that were created yesterday, compare their email addresses and phone numbers to existing accounts to identify if it is a duplicate account to one already in the system.

SELECT DISTINCT Customer.CustId
    , Customer.PriceClassID
    , Customer.Name
    , CustomerEDI.S4Future01 as First
    , CustomerEDI.S4Future02 as Last
    , Addr1
    , Addr2
    , City
    , State
    , Zip
    , Country
    , Customer.emailAddr
    , Customer.Phone
    , lastorder
    , CONVERT(varchar(8000), snotetext)  
FROM Customer
WHERE 
    EMailAddr IN ( 
        SELECT EmailAddr 
        FROM Customer 
        WHERE EmailAddr != '' 
        GROUP BY EmailAddr 
        HAVING COUNT(*) > 1) 
        AND EMailAddr IN ( 
            SELECT EMailAddr 
            FROM Customer 
            WHERE Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
            ) 
    OR Phone IN ( 
        SELECT Phone 
        FROM Customer 
        WHERE Phone != '' 
        GROUP BY Phone 
        HAVING COUNT(*) > 1) 
        AND Phone IN ( 
            SELECT Phone 
            FROM Customer 
            WHERE Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
            )
ORDER BY EMailAddr
    , Phone ASC
    , Name
    , Last
    , First ASC;

Answer :

There are multiple things to look at to improve query performance.

  • Understand your system load
  • Volume of data you are expecting to be in the tables mentioned in query
  • Volume of data you are going to fetch from tables mentioned in query
  • Indexes applied on tables mentioned in query
  • Structure of the query

I can suggest you improvements related to only query structure as I am not fully aware of your system. I am assuming that you have proper indexes applied on your tables.

After looking at your query I can suggest you following things,

  • You dont need DISTINCT clause here as you are selecting Customer.CustId which I assume to be a primary key of Customer table & its always going to be unique which inter makes related row always unique in result set.
  • You are using too many subqueries which I dont think is a good practice. Try to use SQL joins as much as you can. Plus I think you can merge some of these queries. For example, you are selecting EmailAddr from Customer having count greater than 1 & created after yesterday. You can combine both these conditions in one single query as following.
    SELECT 
        [EmailAddr] AS [EmailAddr] 
    FROM 
        [Customer] AS [Customer]
    WHERE 
        [Customer].[EmailAddr] != '' 
        AND [Customer].[Crtd_DateTime] >= DATEADD(day, -1, CAST(GETDATE() AS DATE)
    GROUP BY 
        [Customer].[EmailAddr]
    HAVING COUNT(*) > 1

same thing with phone also

SELECT 
    [Phone] AS [Phone]
FROM 
    [Customer] AS [Customer]
WHERE 
    [Customer].[Phone] != '' 
    AND [Customer].[Crtd_DateTime] >= DATEADD(day, -1, CAST(GETDATE() AS DATE)
GROUP BY 
    [Customer].[Phone]
HAVING COUNT(*) > 1

You can use CTE here. Define 2 CTEs one for email data & one for phone data. Then you can do inner join with Customer table on email & phone to get correct data. (check my query given below to get better picture)

  • Try avoiding functions in your where clause. While querying for email & phone data you are checking for yesterdays date also. Try to store this value in some sql variable first & then use that variable value to compare.
    example,
    DECLARE @dateTimeToCheck DATETIME
    set @dateTimeToCheck = DATEADD(day, -1, CAST(GETDATE() AS DATE)

I would also like to suggest following things,
– Try to write simple queries. Divide them into logical unites. This will help you to understand of complex queries easily & it will help you in query optimization also.
– Use alias names for columns you are selecting. This will not improve performance but it will definitely improve readability.

Here is my improved version of your query.

DECLARE @dateTimeToCheck DATETIME
SET @dateTimeToCheck = DATEADD(day, -1, CAST(GETDATE() AS DATE)

;WITH cteEmail AS (
    SELECT 
        [EmailAddr] = [Customer].[EmailAddr] 
    FROM 
        [Customer] AS [Customer] WITH(NOLOCK)
    WHERE 
        [Customer].[EmailAddr] != '' 
        AND [Customer].[Crtd_DateTime] >= @dateTimeToCheck
    GROUP BY 
        [Customer].[EmailAddr]
    HAVING COUNT(*) > 1
)

;WITH ctePhone AS (
    SELECT 
        [Phone] = [Customer].[Phone]
    FROM 
        [Customer] AS [Customer]
    WHERE 
        [Customer].[Phone] != '' 
        AND [Customer].[Crtd_DateTime] >= @dateTimeToCheck
    GROUP BY 
        [Customer].[Phone]
    HAVING COUNT(*) > 1
)

SELECT 
    [CustomerId] = [Customer].CustId,
    [PriceClassID] = [Customer].PriceClassID,
    [Name] = [Customer].Name ,
    [First] = [CustomerEDI].[S4Future01],
    [Last] = [CustomerEDI].[S4Future02],
    [Address1] = [Customer].[Addr1],
    [Address2] = [Customer].[Addr2],
    [City] = [Customer].[City],
    [State] = [Customer].[State],
    [Zip] = [Customer].[Zip],
    [Country] = [Customer].[Country],
    [CustomerEmail] = [Customer].[emailAddr],
    [CustomrePhone] = [Customer].[Phone],
    [LastOrder] = [Customer].[lastorder],
    [snotetext] = CONVERT(varchar(8000), [Customer].[snotetext])
FROM 
    [Customer] AS [Customer] 
INNER JOIN 
    [cteEmail] AS [CteEmail] ON [Customer].EmailAddr = [cteEmail].[EmailAddr]
INNER JOIN 
    [ctePhone] AS [CtePhone] on [Customer].Phone = [ctePhone].[Phone]
ORDER BY 
    [Customer].[EMailAddr]
    , [Customer].[Phone] ASC
    , [Customer].[Name]
    , [Customer].[Last]
    , [Customer].[First] ASC;

I hope it helps.

First of all, if custId is the Customer‘s Primary Key, then there’s no need to specify DISTINCT at all. Second, grouping and then filtering by the grouped key may be very inefficient when the filtering will likely render a small subset: filter first, then group. Third, I assume you don’t require casting to varchar when removing the DISTINCT clause. Fourth, no need to include the alias of every selected field when there’s only one table in the selection, it just makes reading the query more cumbersome!

This query can really benefit from indexes on EMailAddr and Phone.

This may perform better, especially if those indexes exist:

SELECT DISTINCT CustId
    , PriceClassID
    , Name
    , S4Future01 as First
    , S4Future02 as Last
    , Addr1
    , Addr2
    , City
    , State
    , Zip
    , Country
    , emailAddr
    , Phone
    , lastorder
    , snotetext
FROM Customer
WHERE 
    EMailAddr IN ( 
        SELECT EmailAddr 
        FROM Customer 
        WHERE EMailAddr IN ( 
            SELECT EMailAddr 
            FROM Customer 
            WHERE EmailAddr > '' AND Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
            ) 
        GROUP BY EmailAddr 
        HAVING COUNT(*) > 1) 
    OR Phone IN ( 
        SELECT Phone 
        FROM Customer 
        WHERE Phone IN ( 
            SELECT Phone 
            FROM Customer 
            WHERE Phone > '' AND Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
            )
        GROUP BY Phone 
        HAVING COUNT(*) > 1) 
ORDER BY EMailAddr
    , Phone ASC
    , Name
    , Last
    , First ASC;

Leave a Reply

Your email address will not be published. Required fields are marked *