Question :
Consider the query given below:
SELECT * FROM ((SELECT c.cust_id ,
c.username ,
REPLACE(qs.seg_type_ref_key_02, 'N/A', 'Non Vip') casino_group,
REPLACE(qs.seg_type_ref_key_03, 'N/A', 'Non Vip') bingo_group,
REPLACE(qs.seg_type_ref_key_04, 'N/A', 'Non Vip') games_group,
REPLACE(qs.seg_type_ref_key_12, 'N/A', 'Non Vip') poker_group,
REPLACE(qs.seg_type_ref_key_01, 'N/A', 'Non ViP') sportsbook_group,
c.country
, c.contactable
, c.email
, c.dob
, c.[status]
, c.first_name
, c.last_name
, c.[master]
, c.[language]
, c.gender
FROM warehouse.dbo.dim_customer c (nolock)
INNER JOIN warehouse . dbo . dim_segmentationcodehistory ( nolock )sc
ON sc . cust_id = c . cust_id
INNER JOIN warehouse . dbo . q_dim_segments qs ( nolock )
ON sc . seg_code_ref_key = qs . seg_code_ref_key
WHERE SC.active=1 and qs.seg_type_ref_key_04 <>'N/A' AND c.active = 1
and (qs.seg_type_ref_key_02 <> 'Prospect' and qs.seg_type_ref_key_03 <> 'Prospect' and qs.seg_type_ref_key_04 <> 'Prospect' and qs.seg_type_ref_key_12 <> 'Prospect' and qs.seg_type_ref_key_01 <> 'Prospect')) A
LEFT JOIN( SELECT c.cust_id cust_dup,
SUM(fc.turnover) AS Turnover_GBP,
SUM(fc.grosswin) AS GrossWin_GBP,
SUM(fc.chip_purch_amount_gbp) AS chip_purch_amount_gbp
FROM warehouse.dbo.fact_games fc (nolock)
INNER JOIN warehouse.dbo.dim_date d (nolock)
ON d.date_key = fc.date_ref_key
INNER JOIN warehouse.dbo.dim_customer c (nolock)
ON c.cust_ref_key = fc.cust_ref_key
INNER JOIN warehouse.dbo.dim_gamesgame gg(nolock)
ON gg.games_game_ref_key = fc.game_ref_key
WHERE d.[date] between getdate()- 10 AND getdate()-9
AND gg.Game_Group_Description <> 'Bingo'
GROUP BY c.cust_id )B
ON A.cust_id = B.cust_dup)
This query takes a little more than an hour. However, I require that this completes in as little time as possible.
Below is the level to which I have been able to optimize it:
IF OBJECT_ID('tempdb..#temp_shash_A') IS NOT NULL
DROP TABLE #temp_shash_A
IF OBJECT_ID('tempdb..#temp_shash_B') IS NOT NULL
DROP TABLE #temp_shash_B
-- A
(SELECT c.cust_id ,
c.username ,
REPLACE(qs.seg_type_ref_key_02, 'N/A', 'Non Vip') casino_group,
REPLACE(qs.seg_type_ref_key_03, 'N/A', 'Non Vip') bingo_group,
REPLACE(qs.seg_type_ref_key_04, 'N/A', 'Non Vip') games_group,
REPLACE(qs.seg_type_ref_key_12, 'N/A', 'Non Vip') poker_group,
REPLACE(qs.seg_type_ref_key_01, 'N/A', 'Non ViP') sportsbook_group,
c.country
, c.contactable
, c.email
, c.dob
, c.[status]
, c.first_name
, c.last_name
, c.[master]
, c.[language]
, c.gender
INTO #temp_shash_A
FROM warehouse.dbo.dim_customer c (nolock)
INNER JOIN warehouse . dbo . dim_segmentationcodehistory ( nolock )sc
ON sc . cust_id = c . cust_id
INNER JOIN warehouse . dbo . q_dim_segments qs ( nolock )
ON sc . seg_code_ref_key = qs . seg_code_ref_key
WHERE SC.active=1 and qs.seg_type_ref_key_04 <>'N/A' AND c.active = 1
and (qs.seg_type_ref_key_02 <> 'Prospect' and qs.seg_type_ref_key_03 <> 'Prospect' and qs.seg_type_ref_key_04 <> 'Prospect' and qs.seg_type_ref_key_12 <> 'Prospect' and qs.seg_type_ref_key_01 <> 'Prospect')
)
create clustered index S_1 on #temp_shash_A (cust_id)
-- B
( SELECT c.cust_id cust_dup,
SUM(fc.turnover) AS Turnover_GBP,
SUM(fc.grosswin) AS GrossWin_GBP,
SUM(fc.chip_purch_amount_gbp) AS chip_purch_amount_gbp
INTO #temp_shash_B
FROM warehouse.dbo.fact_games fc (nolock)
INNER JOIN warehouse.dbo.dim_date d (nolock)
ON d.date_key = fc.date_ref_key
INNER JOIN warehouse.dbo.dim_customer c (nolock)
ON c.cust_ref_key = fc.cust_ref_key
INNER JOIN warehouse.dbo.dim_gamesgame gg(nolock)
ON gg.games_game_ref_key = fc.game_ref_key
WHERE d.[date] between getdate()- 10 AND getdate()-9
AND gg.Game_Group_Description <> 'Bingo'
GROUP BY c.cust_id )
create clustered index S_2 on #temp_shash_B (cust_dup)
SELECT * FROM #temp_shash_A A
LEFT JOIN #temp_shash_B B
ON A.cust_id = B.cust_dup
This took just around 5-6 minutes when ran initially. However, it took around 35 minutes when ran today. Can anyone suggest a way for me to optimize this? Any help appreciated.
PS: I’m working on SQL Server 2008 R2 DB. The query is a dataset query for an SSRS report.
Answer :
from some research I have found the <>
or does not equal
takes longer to run than =
operations
in this answer to SQL Server “<>” operator is very slow compared to “=” on table with a few million rows
the Poster goes into Detail about the <>
operator and why it takes longer than the =
operator.
the Poster suggests a LEFT JOIN
for the OP’s Query that would make it so that they shouldn’t have to use the <>
operator.
if you could do something similar to weed out the records that you don’t want using a LEFT JOIN
or something of the sort. I think the post that I linked to should set you in the right direction.
P.S.
it sounds like the same thing happens when you use NOT IN