Question :
I have an insert statement, extracting data through multiple inner joins from around 8 tables. It is getting into suspended state then runnable and vice versa forever; never coming back to running state.
Here is the estimated execution plan: https://www.brentozar.com/pastetheplan/?id=Sy6gPl-6L
While running the trace on this particular process, I see, it holds lock on tempdb and also on one user database but not running any statements on them. Simply its state is changing in between suspended & runnable. When I kill and re-run, it is getting executed normally. I see there are no processes or jobs conflicting with this insert statement.
The wait types are “io_completion” and “sos_scheduler_yield” while the query got stuck. I have used sys.syprocesses and sp_who2 active also dm_exec_requests to monitor. I have run trace while the process is running, once the process went into suspended state showing up “io_completion” to “sos_scheduler_yield” and vice versa.
I am not able to figure out why it is happening. Could you please put some light on it and advise any solutions.
Answer :
Explanation of behavior
Some of the causes of the IO_COMPLETION
wait type are:
- Writing intermediate sort buffers to disk (these are called ‘Bobs’)
- Reading and writing sort results from/to disk during a sort spill
There are two sorts that could be spilling, which uses tempdb.
A source of the slowness could also be one the unfortunate “many to many merge join” in the middle of the execution plan (which also uses tempdb, although it doesn’t cause IO_COMPLETION
waits):
The estimated plan shows ~2 GB of data coming out of that merge join – and that amount of data could be even higher if the estimates are off.
You mentioned the problem is intermittent, which could be because of tempdb contention (if other queries are running at the time).
Suggestions for improvement
Temp table rewrite
The best option I can think of would be to break the query up into smaller chunks. For instance, you could just select the trans_header
rows that meet your where clause into a #temp
table. Then use that temp table in the main query instead of trans_header
.
This could improve estimates and let the optimizer produce a better plan – potentially avoiding any spills or other tempdb activity.
That, generally, looks like this. Sorry if there are any slight typos, it’s tough not having intellisense
SELECT
a.[term_id],
a.[trans_ref_no],
a.supplier_no,
a.cust_no,
a.trans_id,
a.carrier,
a.folio_yr,
a.folio_mo,
a.folio_no,
a.transaction_date
INTO #trans_header_temp
FROM [TOPHAT].[trans_header] AS a with (nolock)
WHERE
a.folio_yr=year(getdate())-1
AND a.transaction_date <>' 00000';
INSERT INTO [TOPHAT].[terminal_volume]
([term_id]
,[terminal_name]
,[folio_yr]
,[folio_mo]
,[folio_no]
,[folio_year_month]
,[product_id]
,[prod_name]
,[product_id_name]
,[supplier_no]
,[supplier_name]
,[supplier_no_name]
,[customer_no]
,[customer_name]
,[carrier_no]
,[carrier_name]
,[bbl]
,[gallon]
,[customer_no_name]
,[trans_code]
,[transaction_description]
,[transaction_code_description]
,[transaction_year]
,[transaction_month]
,[transaction_date]
,[getdate])
select a.[term_id],
tp.name as terminal_name,
a.folio_yr,
a.folio_mo,
a.folio_no,
cast(a.folio_yr + '-' + a.folio_mo +'-01' as date) as folio_year_month,
p.prod_id as product_id,
p.prod_name as prod_name,
p.prod_id+' '+p.prod_name as product_id_name,
s.supplier_no,
s.supplier_name,
s.supplier_no+' '+s.supplier_name as supplier_no_name,
cu.cust_no as customer_no,
cu.cust_name as customer_name,
a.carrier as carrier_no,
ca.name as carrier_name,
sum((convert(decimal(12,2),b.net)*(case when b.sign is null then 1 else -1 end))/42) as 'bbl',
sum(convert(decimal(12,2),b.net)*(case when b.sign is null then 1 else -1 end)) as gallon,
cu.cust_no+' '+ cu.cust_name as customer_no_name,
tv.trans_code,
tv.trans_desc as transaction_description,
tv.trans_code+' '+tv.trans_desc as transaction_code_description,
'20'+ left(a.transaction_date, 2) as transaction_year,
substring(a.transaction_date, 3, 2) as transaction_month,
case when isdate(a.transaction_date) = 0 then null
else
cast(a.transaction_date as date) end as transaction_date,
cast(getdate() as date) as [getdate]
from
#trans_header_temp as a with (nolock)
inner join [TOPHAT].[terminal_profile] as tp with (nolock)
on a.term_id = tp.term_id
inner join [TOPHAT].[trans_products] as b with (nolock)
on a.[trans_ref_no] = b.[trans_ref_no]
inner join [TOPHAT].[product] as p with (nolock)
on p.term_id = a.term_id
and p.prod_id = b.prod_id
inner join TOPHAT.supplier as s with (nolock)
on s.supplier_no = a.supplier_no
inner join TOPHAT.customer as cu with (nolock)
on cu.cust_no = a.cust_no
and cu.supplier_no = a.supplier_no
inner join TOPHAT.trans_value as tv with (nolock)
on tv.trans_code = a.trans_id
inner join [TOPHAT].[carrier] as ca with (nolock)
on ca.term_id = a.term_id
and ca.carr_no = a.carrier
group by
a.[term_id],
tp.name ,
a.folio_yr,
a.folio_mo,
a.folio_no,
p.prod_id,
p.prod_name,
p.prod_id+' '+p.prod_name,
s.supplier_no,
s.supplier_name,
s.supplier_no+' '+s.supplier_name,
cu.cust_no,
cu.cust_name,
cu.cust_no+' '+ cu.cust_name,
a.carrier,
ca.name,
tv.trans_code,
tv.trans_desc,
tv.trans_code+' '+tv.trans_desc,
'20'+ left(a.transaction_date, 2),
substring(a.transaction_date, 3, 2) ,
a.transaction_date,
cast(a.folio_yr + '-' + a.folio_mo +'-01' as date)
Fix implicit conversions
Speaking of estimates, you have several implicit conversion warnings. It’s difficult to advise on how to deal with them, since we don’t have table and index definitions, but you should review them to see if they can be avoided. Especially the ones in the WHERE
clause:
Join hint
One low-effort approach would be to try and avoid that specific merge join with a join hint. This might cause performance to get worse, though, because it will force the order of the joins as written, which limits the optimizer can do quite a bit:
inner HASH join [TOPHAT].[product] as p with (nolock)
on p.term_id = a.term_id
and p.prod_id = b.prod_id
Note that this wouldn’t likely help with sort spills.