sys.dm_exec_query_stats – what causes total_worker_time to be reset? How to work around that?

Question :

I am monitoring stored procedure executions, mainly the execution times and the CPU usage, using sys.dm_exec_query_stats in the script below.

SET NOCOUNT ON;

declare @lastex datetime
select @lastex = max(datestamp) from tablebackups.[dbo].[_dm_exec_query_stats]

--Get proc counts since the starting point table creation
insert into DBA.dbo.SPStats
SELECT  bcp.datestamp as StartTime
        ,getdate() as EndTime
        ,@@Servername as 'Server'
    ,isnull(DB_NAME(st.dbid),'') DBName
    ,isnull(OBJECT_SCHEMA_NAME(st.objectid,st.dbid),'') [Schema]
    ,isnull(OBJECT_NAME(st.objectid,st.dbid),'') StoredProcedure
      ,max(cp.usecounts)-max(bcp.usecounts) as Execution_count
      ,sum(qs.total_worker_time)-sum(bqs.total_worker_time) AS 'total_cpu_time'
      -- earlier attempt to handle the negative values, left commented out:
      --,case
      --    when sum(qs.total_worker_time)-sum(bqs.total_worker_time) < 0
      --        then (9223372036854775807 - sum(convert(bigint,bqs.total_worker_time)) + sum(convert(bigint,qs.total_worker_time)))
      --    else sum(qs.total_worker_time)-sum(bqs.total_worker_time)
      --  end AS 'total_cpu_time'

      ,case max(cp.usecounts)-max(bcp.usecounts)
        when 0
            then NULL
        else
            (sum(qs.total_worker_time)-sum(bqs.total_worker_time))/(max(cp.usecounts)-max(bcp.usecounts))
        END
            as 'CPU_Per_execution'
      ,cast(convert(float,(max(cp.usecounts)-max(bcp.usecounts)))/convert(float,(datediff(s,bcp.DateStamp,getdate()))) as Numeric(18,3)) as 'Executions_Per_Second'
      ,sum(qs.Total_Logical_Reads)-sum(bqs.Total_Logical_Reads) as Total_Logical_Reads
      ,(sum(qs.Total_Logical_Reads)-sum(bqs.Total_Logical_Reads))/datediff(s,bcp.DateStamp,getdate()) as Total_Logical_Reads_per_sec
      ,sum(qs.Total_Logical_Writes)-sum(bqs.Total_Logical_Writes) as Total_Logical_Writes
      ,(sum(qs.Total_Logical_Writes)-sum(bqs.Total_Logical_Writes))/datediff(s,bcp.DateStamp,getdate()) as Total_Logical_Writes_per_sec
      ,datediff(n,bcp.DateStamp,getdate()) as 'Elapsed Time (m)'
 FROM sys.dm_exec_cached_plans cp 
    inner join tablebackups.dbo._dm_exec_cached_plans bcp
        on  cp.plan_handle = bcp.Plan_handle
    join sys.dm_exec_query_stats qs 
        on cp.plan_handle = qs.plan_handle
    inner join tablebackups.dbo._dm_exec_query_stats bqs
        on cp.plan_handle = bqs.plan_handle
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
where   
    cp.objtype = 'proc'
    and qs.last_execution_time > @lastex

group by DB_NAME(st.dbid)
        ,OBJECT_SCHEMA_NAME(objectid,st.dbid)
        ,OBJECT_NAME(objectid,st.dbid)
        ,bcp.DateStamp
order by max(cp.usecounts)-max(bcp.usecounts) desc

--Rebuild the baseline tables to set a new starting point
if object_id(N'tablebackups.dbo._dm_exec_query_stats') is not null
    drop table tablebackups.dbo._dm_exec_query_stats
if object_id(N'tablebackups.dbo._dm_exec_cached_plans') is not null
    drop table tablebackups.dbo._dm_exec_cached_plans

select getdate() as DateStamp,* into tablebackups.dbo._dm_exec_query_stats
from sys.dm_exec_query_stats
select getdate() as DateStamp,* into tablebackups.dbo._dm_exec_cached_plans
from sys.dm_exec_cached_plans

This basically saves the contents of the DMVs into tables (in the tablebackups database) so that I can compare the situation as it is now (straight from the DMVs) with the situation as it was (from the tablebackups tables).

It works fine.

There is only one problem, and hence this question:

Sometimes the current total_worker_time is less than what it used to be, for a reason I haven’t identified, and therefore the delta comes out negative.

I have been running this script as a stored procedure that is called by a job every 5 minutes.

So I know how many times each procedure has been executed in the last 5 minutes, and that has been accurate, but the CPU usage has not been, for the reason above.

What can I do to tackle this?

Answer :

I’m pretty terrible at these DMVs; however:

  • When you’re joining on the history, one sql_handle can have multiple plan_handle values (where the same text compiles under different SET options), so possibly this is skewing your stats; a sketch of a more granular join follows after this list.

  • The plan handle is for an entire batch, with individual statements inside it, which can be recompiled separately, and this might (I haven’t confirmed) overwrite the stats for the batch. You might be able to confirm this by capturing plan recompilations with traces/events and seeing whether the times correlate with your counter resets; the Extended Events sketch at the end shows one way to do that.
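
Building on the first point, here is a minimal sketch (assuming the same tablebackups.dbo._dm_exec_query_stats baseline table from the question): join per statement, on plan_handle plus the statement offsets, instead of summing everything per batch, and compare creation_time to spot rows whose counters were reset since the baseline.

-- Sketch only: per-statement delta with reset detection.
-- If creation_time changed (or the baseline row is missing), the
-- entry was recompiled or evicted since the snapshot, so the current
-- total is everything accumulated since that reset.
SELECT  qs.plan_handle
       ,qs.statement_start_offset
       ,qs.statement_end_offset
       ,CASE
            WHEN bqs.creation_time = qs.creation_time
                THEN qs.total_worker_time - bqs.total_worker_time -- same cached entry: normal delta
            ELSE qs.total_worker_time                             -- counters reset since baseline
        END AS worker_time_delta
FROM sys.dm_exec_query_stats qs
    LEFT JOIN tablebackups.dbo._dm_exec_query_stats bqs
        on  bqs.plan_handle            = qs.plan_handle
        and bqs.statement_start_offset = qs.statement_start_offset
        and bqs.statement_end_offset   = qs.statement_end_offset

Aggregating that per-statement delta up to the procedure level (with the same DB_NAME/OBJECT_NAME grouping as in your script) should keep the negative values out of total_cpu_time.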
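
For the second point, an Extended Events session along these lines (a sketch; the session and file names are placeholders) would log statement-level recompiles so their timestamps can be checked against the moments your counters reset:

-- Sketch only: log statement recompiles to an event file.
CREATE EVENT SESSION TrackRecompiles ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
    (ACTION (sqlserver.database_name, sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'TrackRecompiles');

ALTER EVENT SESSION TrackRecompiles ON SERVER STATE = START;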
