for user defined aggregate how to understand state transition function

Posted on

Question :

CREATE AGGREGATE newavg (
   sfunc = int4_avg_accum, basetype = int4, stype = _int8,
   finalfunc = int8_avg,
   initcond1 = '{0,0}'
);

select newavg(a) , pg_typeof(newavg(1)) from (values (1),(2),(3)) v(a);

return:

+--------------------+-----------+
|       newavg       | pg_typeof |
+--------------------+-----------+
| 2.0000000000000000 | numeric   |
+--------------------+-----------+

from the result, my conjecture is that newavg seems the same as avg.
to understand newavg, I need to understand int4_avg_accum, int8_avg.
after some test, I can guess what does int8_avg do.

select  int8_avg(array[6,3]); --return 0.50000000000000000000
select  int8_avg(array[3,6]); --return  2.0000000000000000

now the problem is int4_avg_accum
My guess is that newavg aggregate, sfunc function compute and return array of integer then pipe/feed to the finalfunc. Finally finalfunc return the value.

test of int4_avg_accum:

select int4_avg_accum(a,0) from (values (array[1,1]),(array[1,2]),(array[1,3])) v(a);

return:

+----------------+
| int4_avg_accum |
+----------------+
| {2,1}          |
| {2,2}          |
| {2,3}          |
+----------------+

select int4_avg_accum(a,0) from (values (array[1,1]),(array[2,1]),(array[3,1])) v(a);

return:

+----------------+
| int4_avg_accum |
+----------------+
| {2,1}          |
| {3,1}          |
| {4,1}          |
+----------------+

I found the int4_avg_accum source code, but i don’t C.

So overall the question is what does int4_avg_accum do? Based on the following query

select newavg(a) , pg_typeof(newavg(1)) from (values (1),(2),(3)) v(a);

there must a step that state transition function return array[3,6]. How can I play around get array[3,6].

Answer :

int4_avg_accum maintains the accumulator for the new value, provided it was already correct for previously seen values.

The first slot in the array it takes (and returns) is the count, the 2nd is the sum.

so int4_avg_accum(ARRAY[a,b],c) yields ARRAY[a+1,b+c].

but i don’t C.

Then it will be remarkably hard to deeply understand, or do any useful customizations with, the built in aggregates. You can write aggregates purely in SQL (and I have), but they will be slow.

Leave a Reply

Your email address will not be published. Required fields are marked *