why computed columns slowing down performance all the sudden?

Posted on

Question :

Users were able to run reports before 10 am.
After that same reports became very slow, sometimes users just didn’t have patience to wait.
After some troubleshooting I found the column that was causing the delay. It was computed column that uses function in order to bring the result.

Approximately at the same time I got another complain about slow running report, that was always working fine. After some troubleshooting I found the columns that was causing a delay:

where (Amount - PTD) <> 0

And again, the Amount column is computed column.

So my question is: why all of the sudden computed columns that were always part of the reports started to slow down the performance significantly?
What could really happen approx after 10 am?
And what is the disadvantage if I make those columns persisted?

Thank you

–FUNCTION that bring the column:

ALTER FUNCTION [dbo].[CalcInvoiceAmtPTD]
(@SInvNum INT, @entityGuid uniqueIdentifier) RETURNS MONEY
AS
BEGIN
    DECLARE @Amt MONEY

    declare @toplevel uniqueidentifier
    set @toplevel = (select dbo.gettoplevelentity(@entityguid))

    declare @t table 
    (
        Guid uniqueidentifier
    )

    insert into @t select * from dbo.getlinkedentities(@toplevel) where guid is not null
    declare @tbl table (amount money, glacctid int)

    select @amt = isNull(sum(amount), 0) from tblfin_journalpostings jp
        inner join tblfin_journal j on j.transactnum = jp.transactnum 
            and (voiderfor is null and voidedby is null)and j.transdescid <> 'I'
        inner join tblfin_glaccounttypes glt on glt.glacctid = jp.glacctid and glt.accounttype = 'p'
        inner join @t t on t.guid = jp.entityguid
    where invoicenum = @SInvNum

    RETURN ISNULL(@Amt  , 0)
END

Answer :

The performance difference with and without the computed column is good to know, and may lead to an improvement of the query overall, but it doesn’t really explain why there’s a notable performance difference for the same query, with the computed column, at different times of the day.

There are many reasons why your query might be running slower after 10 AM. And we, of course, without an awful lot of information from you, are very unlikely to be able to tell you what’s happening on your system that would cause this change at this time.

However, here are a few possibilities:

  • Increased activity – The simplest possibility is that the query is running slower after 10AM because your system is more active after 10AM. In the US, a business with offices across the country, but with servers based in the Eastern time zone, where employees usually arrive around 9AM, would have all employees in the Eastern and Central time zones in the system by 10AM. For such businesses I’ve worked at, that’s been over 2/3rd of the total employee base. So, the system often runs a little slower past that point.

    I’ve also dealt with situations where certain activities tend to occur at certain times of the day consistently, and can have a heavy impact. A slowdown every afternoon at the same time on one system I worked on was ultimately tied back to employees who worked outside the office returning to the office, all around the same time, and synching their devices to upload their days’ activities.

    Or, from a slightly different direction: Maybe the query itself is running at the same speed at 10:15AM as it does at 8:15AM – but maybe half the people in the office are streaming content by 10AM, and the available bandwidth for the report data to make it to someone’s desktop has been reduced by a factor of 5.

    If this is your problem, then you’ll need to determine exactly what the bottleneck is (CPU/memory on the server; available network bandwidth between the server and the users), and see if you can increase the limiting resources. (Or, if the last scenario, make everyone stop streaming content!)

  • Type of other activity – In addition to simple specific activity, certain types of activity happening at different times of the day may also have an impact. If people are making sales call against your DB starting around 10 AM, there could be enough INSERT and UPDATE activity against one or more tables you’re hitting that the SELECTs may have to wait for those to clear before they can run. Even worse, if the application locks a row when someone starts a sales call (to ensure that you don’t have two people trying to call the same prospect at the same time), the waits could be even longer.

    Depending on your reporting requirements and how your application works, you might be able to use READ_COMMITTED_SNAPSHOT_ISOLATION, or even WITH (NOLOCK) table hints, to resolve the situation, if this is your actual problem.

  • Usage patterns – It’s also possible that there’s a difference in how the report is used before and after 10AM. If you’ve got a query plan cached that expects one of the parameters passed in to be highly selective, and after 10AM most runs of the query are using a less selective value, that could impact things.

    As an example: Say your company is headquartered in Cincinnati, Ohio (which is near that state’s borders with two other states, Kentucky and Indiana). 90% of your customers are in Ohio, with the remaining 10% in the two adjoining states. A report is run on a by-state basis on daily sales, and procedure is to run Ohio last. If the query plan is built with state = 'KY', and is run with that or Indiana before 10AM, and with state = 'OH', then the query plan might simply not perform as well.

    On the flip side – maybe some brilliant soul had that problem in the past – and his solution was to clear the plan cache every 15 minutes from 10 AM to 5PM. That could mess with the query as well (though you’d expect it to mess with everything).

  • Wear and tear – Perhaps you’re rebuilding your statistics every night at 1AM; but activity by 10AM has changed the data enough so those statistics are off.

  • Subjective time – Users might start up the report, go grab a cup of coffee, and come back to find it’s done before 10AM, and run it while at their desks, waiting for it to finish afterwards. Not saying it’s likely the answer, but stranger things have happened.

  • “And the rest…” – the above are by no means an exhaustive list of possibilities, just some thoughts based on what I’ve encountered over the years.

When you’re getting reports of issues like this, it’s important to solicit useful feedback from your user base. Ask things like:

  • Does the performance drop really happen at 10AM every day? does it vary some from day to day, or perhaps does it only happen on Monday and Tuesday?
  • Does the performance drop actually only affect this report? or does it affect other activity on the system? (If everything takes twice as long after 10AM, but most thing still only take 5-30 seconds, while this report takes 5 minutes before 10AM and 10 minutes afterwards, you might get the behavior you describe, but it might have absolutely nothing to do with the report).
  • Is there any difference in how you use the report before 10AM and after?
  • Was there a time when the report came back as quickly as it does before 10AM all morning? all day? if so, has the change been gradual, or was there an identifiable point where the post 10AM performance went from fine to bad? If so, when?

I’ll also note that, while you’ve told us this query goes from taking 4 minutes to 4 seconds if you drop the computed column, you haven’t told us how long it takes as is at 8:15, 9:15, 10:15 and 11:15. You haven’t mentioned the query plan for the query, and whether it’s the same plan before and after 10AM; post actual plan executions before and after 10AM to PasteThePlan and people here can see what’s different, and provide informed suggestions as to what might fix things. Even if this information doesn’t point to an answer, it may allow some possibilities to be ruled out.

I’m going to throw this out there, since no one else has mentioned it.

Having a scalar valued function in a computed column is an awful idea.

  • Forces ALL queries against the table to run serially (even if you don’t select the computed column)
  • Forces maintenance (Index, CHECKDB) to run serially, even if you’re
    on Enterprise
  • Executes row by row
  • If used in the WHERE clause, can cause issues with cardinality
    estimates

Here are a few blog posts I’ve written about similar subjects.

Still Serial After All These Years

Scalar Functions In Computed Columns

Scalar Functions In Check Constraints

You should always keep computed columns persisted unless you`re having very consistent inserts/updates with low number of reads from that table.

Why am i saying this?

Because adding persisted to computed column means that the moment calculation is done (on every insert/update) value of the column is written to the database.
And whenever you need those values again, sql server simply sends them to you already calculated.
This way CPU does not need to recalculate them every time you make a query on that table and request the column.
Which can save tremendous amount of time on your reports, since you’re doing some sort of calculations again

The downside is that every time you do any kind of insert or update sql server needs to calculate the value again, which can slow down the inserts/updates on a busy table.

Now to improve your query even fruther, you need to be familiar with SARGAble queries, which in a nutshell means that you cannot do any kind of calculations/modifications on a column in where condition such as :

where (Amount - PTD) <> 0 ; 
where Ammount + ' ' = 'Something' ; 
where (Ammount * 2) = 100 

etc etc

To get more familiar with it please read this

Alternatively you could do something like this :

 where Amount > PTD or Ammount < PTD  

For even further optimizations, it depends on the columns and other where conditions but you should also read about this

  1. Using covering indexes to improve query performance
  2. Clustered & Nonclustered Indexes

Hope this helps

All discussion above are helpful.

Inline Scalar function and its benefits was missing from discussion.

Rewrite your UDF and make it “Inline Scalar function

Inline Scalar function” definitely improve the performance

Or I will not use that computed column and use CROSS apply.

But first you have to make correct and optimize query.

i) your statistics is not updated.

ii) Or your data have grown over time.

Leave a Reply

Your email address will not be published. Required fields are marked *