Question :
Anyone have any suggestions for how to optimize this query?
Running SQL Server 2008..
The object is to look at accounts that were created yesterday, compare their email addresses and phone numbers to existing accounts to identify if it is a duplicate account to one already in the system.
SELECT DISTINCT Customer.CustId
, Customer.PriceClassID
, Customer.Name
, CustomerEDI.S4Future01 as First
, CustomerEDI.S4Future02 as Last
, Addr1
, Addr2
, City
, State
, Zip
, Country
, Customer.emailAddr
, Customer.Phone
, lastorder
, CONVERT(varchar(8000), snotetext)
FROM Customer
WHERE
EMailAddr IN (
SELECT EmailAddr
FROM Customer
WHERE EmailAddr != ''
GROUP BY EmailAddr
HAVING COUNT(*) > 1)
AND EMailAddr IN (
SELECT EMailAddr
FROM Customer
WHERE Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
)
OR Phone IN (
SELECT Phone
FROM Customer
WHERE Phone != ''
GROUP BY Phone
HAVING COUNT(*) > 1)
AND Phone IN (
SELECT Phone
FROM Customer
WHERE Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
)
ORDER BY EMailAddr
, Phone ASC
, Name
, Last
, First ASC;
Answer :
There are multiple things to look at to improve query performance.
- Understand your system load
- Volume of data you are expecting to be in the tables mentioned in query
- Volume of data you are going to fetch from tables mentioned in query
- Indexes applied on tables mentioned in query
- Structure of the query
I can suggest you improvements related to only query structure as I am not fully aware of your system. I am assuming that you have proper indexes applied on your tables.
After looking at your query I can suggest you following things,
- You dont need
DISTINCT
clause here as you are selectingCustomer.CustId
which I assume to be a primary key of Customer table & its always going to be unique which inter makes related row always unique in result set. - You are using too many subqueries which I dont think is a good practice. Try to use SQL joins as much as you can. Plus I think you can merge some of these queries. For example, you are selecting
EmailAddr
from Customer having count greater than 1 & created after yesterday. You can combine both these conditions in one single query as following.
SELECT [EmailAddr] AS [EmailAddr] FROM [Customer] AS [Customer] WHERE [Customer].[EmailAddr] != '' AND [Customer].[Crtd_DateTime] >= DATEADD(day, -1, CAST(GETDATE() AS DATE) GROUP BY [Customer].[EmailAddr] HAVING COUNT(*) > 1
same thing with phone also
SELECT
[Phone] AS [Phone]
FROM
[Customer] AS [Customer]
WHERE
[Customer].[Phone] != ''
AND [Customer].[Crtd_DateTime] >= DATEADD(day, -1, CAST(GETDATE() AS DATE)
GROUP BY
[Customer].[Phone]
HAVING COUNT(*) > 1
You can use CTE here. Define 2 CTEs one for email data & one for phone data. Then you can do inner join with Customer table on email & phone to get correct data. (check my query given below to get better picture)
- Try avoiding functions in your where clause. While querying for email & phone data you are checking for yesterdays date also. Try to store this value in some sql variable first & then use that variable value to compare.
example,
DECLARE @dateTimeToCheck DATETIME set @dateTimeToCheck = DATEADD(day, -1, CAST(GETDATE() AS DATE)
I would also like to suggest following things,
– Try to write simple queries. Divide them into logical unites. This will help you to understand of complex queries easily & it will help you in query optimization also.
– Use alias names for columns you are selecting. This will not improve performance but it will definitely improve readability.
Here is my improved version of your query.
DECLARE @dateTimeToCheck DATETIME
SET @dateTimeToCheck = DATEADD(day, -1, CAST(GETDATE() AS DATE)
;WITH cteEmail AS (
SELECT
[EmailAddr] = [Customer].[EmailAddr]
FROM
[Customer] AS [Customer] WITH(NOLOCK)
WHERE
[Customer].[EmailAddr] != ''
AND [Customer].[Crtd_DateTime] >= @dateTimeToCheck
GROUP BY
[Customer].[EmailAddr]
HAVING COUNT(*) > 1
)
;WITH ctePhone AS (
SELECT
[Phone] = [Customer].[Phone]
FROM
[Customer] AS [Customer]
WHERE
[Customer].[Phone] != ''
AND [Customer].[Crtd_DateTime] >= @dateTimeToCheck
GROUP BY
[Customer].[Phone]
HAVING COUNT(*) > 1
)
SELECT
[CustomerId] = [Customer].CustId,
[PriceClassID] = [Customer].PriceClassID,
[Name] = [Customer].Name ,
[First] = [CustomerEDI].[S4Future01],
[Last] = [CustomerEDI].[S4Future02],
[Address1] = [Customer].[Addr1],
[Address2] = [Customer].[Addr2],
[City] = [Customer].[City],
[State] = [Customer].[State],
[Zip] = [Customer].[Zip],
[Country] = [Customer].[Country],
[CustomerEmail] = [Customer].[emailAddr],
[CustomrePhone] = [Customer].[Phone],
[LastOrder] = [Customer].[lastorder],
[snotetext] = CONVERT(varchar(8000), [Customer].[snotetext])
FROM
[Customer] AS [Customer]
INNER JOIN
[cteEmail] AS [CteEmail] ON [Customer].EmailAddr = [cteEmail].[EmailAddr]
INNER JOIN
[ctePhone] AS [CtePhone] on [Customer].Phone = [ctePhone].[Phone]
ORDER BY
[Customer].[EMailAddr]
, [Customer].[Phone] ASC
, [Customer].[Name]
, [Customer].[Last]
, [Customer].[First] ASC;
I hope it helps.
First of all, if custId
is the Customer
‘s Primary Key, then there’s no need to specify DISTINCT
at all. Second, grouping and then filtering by the grouped key may be very inefficient when the filtering will likely render a small subset: filter first, then group. Third, I assume you don’t require casting to varchar when removing the DISTINCT clause. Fourth, no need to include the alias of every selected field when there’s only one table in the selection, it just makes reading the query more cumbersome!
This query can really benefit from indexes on EMailAddr
and Phone
.
This may perform better, especially if those indexes exist:
SELECT DISTINCT CustId
, PriceClassID
, Name
, S4Future01 as First
, S4Future02 as Last
, Addr1
, Addr2
, City
, State
, Zip
, Country
, emailAddr
, Phone
, lastorder
, snotetext
FROM Customer
WHERE
EMailAddr IN (
SELECT EmailAddr
FROM Customer
WHERE EMailAddr IN (
SELECT EMailAddr
FROM Customer
WHERE EmailAddr > '' AND Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
)
GROUP BY EmailAddr
HAVING COUNT(*) > 1)
OR Phone IN (
SELECT Phone
FROM Customer
WHERE Phone IN (
SELECT Phone
FROM Customer
WHERE Phone > '' AND Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
)
GROUP BY Phone
HAVING COUNT(*) > 1)
ORDER BY EMailAddr
, Phone ASC
, Name
, Last
, First ASC;