# Calculating Running Average using Over clause in SQL server

Posted on

### Question :

My table looks like this:

``````CREATE TABLE [dbo].[TimeSeries](
[LOCID] [int] NOT NULL,
[Date] DateTime,
[YEAR] int,
[MONTH] tinyint,
[RTT] [int]
PRIMARY KEY ( [LOCID] ASC,[Date] ASC)
)
``````

It contains 12 rows(or MONTHS) per LOCID, per YEAR.

I have the following query:

``````SELECT
LOCID,
[Year],
[Date],
AVG(cast(RTT as float)) OVER (
PARTITION BY LOCID ORDER BY Date
ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING ) AS avgRTT
FROM TimeSeries
``````

example data:

``````LOCID   Date    Year    Month   RTT
1   01/01/1982  1982    1   58
1   01/02/1982  1982    2   63
1   01/03/1982  1982    3   34
1   01/04/1982  1982    4   27
1   01/05/1982  1982    5   6
1   01/06/1982  1982    6   4
1   01/07/1982  1982    7   3
1   01/08/1982  1982    8   14
1   01/09/1982  1982    9   22
1   01/10/1982  1982    10  16
1   01/11/1982  1982    11  17
1   01/12/1982  1982    12  44
1   01/01/1983  1983    1   58
1   01/02/1983  1983    2   63
1   01/03/1983  1983    3   33
1   01/04/1983  1983    4   27
1   01/05/1983  1983    5   9
1   01/06/1983  1983    6   0
1   01/07/1983  1983    7   3
1   01/08/1983  1983    8   0
1   01/09/1983  1983    9   6
1   01/10/1983  1983    10  27
1   01/11/1983  1983    11  11
1   01/12/1983  1983    12  48
``````

Now I am trying to execute the above query for a specific MONTH in every year only, calculating the average for the 5 following dates/months/rows. But a WHERE clause to specify the MONTH value will not work because that affects which numbers are averaged.

I need to do this for tens of thousands of LOCID’s and millions of rows.

My question is if there is a way to execute the correct moving average for only a specific value for ‘MONTH’, thus hopefully reducing the processing time a number of times…

example desired output:

``````LOCID   Date    Year    Month   avgRTT
1   29952   1982    1   37.6
1   30317   1983    1   38
``````

Any pointers will be greatly appreciated.

You can apply your final filter on `@dekade` after computing the running averages.

In order to reduce the number of rows that need to be processed for the running averages, you can apply an earlier filter on `[dekade] IN (@dekade, (@dekade+1)%36, (@dekade+2)%36)` to ensure that you are processing the minimal amount of rows but still including all the rows that are necessary to including the following 11 rows in the running average. (The only reason for the `% 36` is to handle values at `@dekade` that fall at the end of the year.)

This will still result in a table scan given your current table structure, but at least the rows can be filtered out earlier on in the query plan.

``````DECLARE @dekade TINYINT = 1
SELECT *
FROM (
SELECT
LOCID,
[Year],
[Date],
AVG(cast(RTT as float)) OVER
(PARTITION BY LOCID ORDER BY Date
ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING) AS avgRTT
FROM TimeSeries
-- If you want to limit the rows that are use when computing the running average,
-- you can make sure that only the desired @dekade plus the following two @dekades
-- (which may be needed to get the following 11 rows) are used for each year
) x
``````

It would be helpful to post some sample data so that we can actually run the query though.

You might try to put the `AVG` into a `CASE`, if you got luck the optimizer is smart enough to calculate the moving average only once per year:

``````select *
from
(
SELECT
LOCID,
[Year],
[Date],
[Month],
case when Month = 1 then
AVG(cast(RTT as float)) OVER (
PARTITION BY LOCID ORDER BY Date
ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING )
end AS avgRTT
FROM TimeSeries
) as dt
where month = 1 -- or avgRTT is not null
``````