T-SQL Optimization

Posted on

Question :

If the answer is ‘No’, I’m OK with that…

I’m looking to see if this can be optimized any…it’s part of a much larger stored proc. CGCode is varchar(50), Year and Month are smallint, FEIN is char(9)

select max(id)
from Table
where 1=1
    and cgcode = 123
    and datefromparts(cast(year as char(4)),cast(month as char(2)),'01') < getdate()
    and totalcount > 0 
group by cgcode, year, month, fein

Logical reads from actual execution plan: 1,566,473

Source table raw data 32 million+ records

Estimated rows: 640K, actual 55K, before the Group By kicks in

Implicit conversion warnings on Year/Month/CGCode (comes in as bigint)

Execution time roughly 7.5 seconds, doing a Non-Clustered Index Seek:

enter image description here

End result set is 114 rows (for this CGCode we test with…others vary)

Performance is roughly the same in Prod on significantly better hardware than the Dev box. This is only going to get worse over time, as its pulling everything older than the current month, to populate a history chart in a UI.

What other info can I provide?

Paste the plan

The index being used currently:

CREATE NONCLUSTERED INDEX [COIX_Table_TotalCount] ON [dbo].[Table]
(
    [TotalCount] ASC
)
INCLUDE (   [ID],
    [CGCode],
    [Year],
    [Month],
    [FEIN]) 
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, 
        ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CGCode limits the data most…281 values. year is only 3-4 years worth of data, month only 12 options there of course. TotalCount is a bigint, 17K distinct values across 32 million records. No idea what the purpose of the TotalCount column is.

Answer :

First of all, I would rewrite this condition. As it is, it can’t use any index effectively.:

datefromparts(cast(year as char(4)),cast(month as char(2)),'01') < getdate()

Rewrite it so there is no functions or calculations applied to the columns:

 ( year < year(getdate()) 
or year = year(getdate()) and month <= month(getdate())
 )

The rewrite would allow an index on (cgcode, year, month) to be used – or even better a (cgcode, year, month, fein) one, considering the GROUP BY parts.

Next, if totalcount is always part of the WHERE clause or thee specific totalcount > 0 condition is always used, you might squeeze even more efficiency by adding it to the INCLUDE part or with a more targeted, filtered index:

-- option A
index (cgcode, year, month, fein, id) 
      include (totalcount)

-- option B
index (cgcode, year, month, fein, id)
      where (totalcount > 0)

I will remove cgcode from Group By since I know what is cgcode and it is not range.So removing cgcode (varchar(60)) will give tremendous push.

So if I am correct my index will change too.

Use Variable in place of getdate()

declare @CurDate date=getdate()
declare @yr int=year(@CurDate ) 
declare @month tinyint=month(@CurDate )
declare @cg varchar(60)='123'

select @cg cgcode,max(id)
from Table
where  cgcode = @cg
    ( year < @yr  
or [year] = @yr and [month] <= @month
 )
    and totalcount > 0 
group by year, month, fein

Also technically why totalcount will be covering index in option A.I don’t understand,by that logic it should within index group or option B is correct .

Found it 🙂

The clientid is being passed in as INT…causing an implicit conversion to varchar for the CGCode field.

select max(id)
from Table
where 1=1
    and cgcode = CAST(123 as varchar(50))  ---this fixed it
    and datefromparts(cast(year as char(4)),cast(month as char(2)),'01') < getdate()
    and totalcount > 0 
group by cgcode, year, month, fein

From 7-30 seconds to 100ms. I’ll take that.

Leave a Reply

Your email address will not be published. Required fields are marked *