Question :
Below query never completes and runs for over 14 hours:
SELECT
U.Age, U.CreationDate, U.DisplayName,
P.AcceptedAnswerId, P.ClosedDate, p .Title, P.ViewCount,
C.CreationDate,C.Score ,C.UserId
FROM dbo.Users As U
LEFT JOIN dbo.Posts As P on U.id = P.id
LEFT JOIN dbo.Comments As C on U.id =C.id
WHERE
(
U.Location = 'India'
OR U.Location ='United Kingdom'
)
AND U.LastAccessDate>= '04/02/2018'
AND U.LastAccessDate< '06/02/2018'
AND P.LastActivityDate>= '04/02/2018'
AND P.LastActivityDate< '06/02/2018'
AND U.LastAccessDate>= '04/02/2018'
AND U.LastAccessDate< '06/02/2018'
AND U.Age>10
AND U.Age is NOT NULL
How can I improve and speed it up ?
Answer :
The biggest problem I see is in your joins. You’re joining the “id” of Users, to the “id” of Posts and Comments. This basically says, take every record in Users and join it to every record in Posts, then join it to every record in Comments.
Presuming that the UserID exists as a foreign key in Posts and Comments, the code below should work. If not, you’ll need to determine the correct foreign keys for the JOIN
statements.
SELECT
U.Age,
U.CreationDate,
U.DisplayName,
P.AcceptedAnswerId,
P.ClosedDate,
P.Title,
P.ViewCount,
C.CreationDate,
C.Score,
C.UserId
FROM
dbo.Users AS U
LEFT OUTER JOIN dbo.Posts AS P
ON U.UserID = P.UserID
LEFT OUTER JOIN dbo.Comments AS C
ON U.UserID = C.UserID
WHERE
U.Location IN ('India','United Kingdom')
AND U.LastAccessDate>= '04/02/2018'
AND U.LastAccessDate< '06/02/2018'
AND P.LastActivityDate>= '04/02/2018'
AND P.LastActivityDate< '06/02/2018'
AND U.Age>10
I replaced the location OR
statement with an IN
.
I removed the duplicated U.LastAccessDate
conditions.
I removed the unnecessary U.Age IS NOT NULL
condition.
If this query still runs slowly. My next best guess would be a missing index on Users.LastAccessDate and/or Posts.LastActivityDate.
Here what you need to do:
1) use #Temp table
2) break down query on parts, get data step by step
3) make sure columns which you use in the WHERE or join ON clause, are indexed
Temp Table – column data types below are just an example, use data types that exactly match column data types on your tables
create table #UserInfo ( tID int primary key clustered, tAge tinyint, tCreationDate datetime, tDisplayName varchar(50), tAcceptedAnswerID int, tClosedDate datetime, tTitle varchar(50), tViewCount int, tCreationDate datetime, tScore int, tUserID int)
Get Users info – make sure Location
, Age
, LastAccessDate
columns are indexed on Users table
-- get users info insert into #UserInfo (tID, tAge, tCreationDate, tDisplayDate) select U.ID, U.Age, U.CreationDate, U.DisplayName, from dbo.Users As U where (U.Location = 'India' OR U.Location ='United Kingdom' ) and U.Age > 10 and U.Age is NOT NULL and U.LastAccessDate >= '04/02/2018' and U.LastAccessDate-- Get Posts Info - make sure ID, LastActivityDate columns on Posts table are indexed
-- get posts info update #UserInfo set tAcceptedAnswerID = P.AcceptedAnswerID, tClosedDate = P.ClosedDate, tTitle = P.Title, tViewCount = P.ViewCount from #UserInfo join dbo.Posts As P on tID = p.id and P.LastActivityDate>= '04/02/2018' and P.LastActivityDate< '06/02/2018'
-- Get Comments info - make sure ID column on Comments table is indexed
-- get comments info update #UserInfo set tCreationDate = C.CreationDate, tScore = C.Score, tUserID = C.UserId from #UserInfo join dbo.Comments As C on U.id =C.id
-- show data select tAge [Age], tCreationDate [CreationDate], tDisplayName [DisplayName],
tAcceptedAnswerID [AcceptedAnswerID], tClosedDate [ClosedDate], tTitle [Title], tViewCount [ViewCount], tCreationDate [CreationDate], tScore [Score], tUserID [UserID]
from #UserInfo