Question :
I have a function that accepts as parameter an array like:
CREATE OR REPLACE FUNCTION my_func(some_data custom_datum[]) RETURNS VOID AS $$
BEGIN
create table foo_table as (
select
coalesce(foo_ind, bar_ind, ter_ind) as foobarter,
import_date,
-- do some stuff here
from unnest (some_data) as T
group by grouping sets ((foo_ind, import_date), (bar_ind, import_date), (ter_ind, import_date))
);
END
$$ LANGUAGE plpgsql;
The input array is generated by another function foo
. So I call everything in this way:
select my_func(array(select foo()));
where the function foo
is:
CREATE OR REPLACE FUNCTION foo() RETURNS SETOF custom_datum
The problem is that for a big amount of data array(select foo())
returns:
ERROR: array size exceeds the maximum allowed (1073741823)
What I am trying to workaround is the lack of possibility to pass different functions, as at the moment, the input array gets generated by different functions:
select my_func(array(select foo()));
select my_func(array(select bar()));
select my_func(array(select ter()));
.... etc
How can I workaround this problem?
Answer :
What your my_func
is essentially doing is creating a MATERIALIZED VIEW
— a materialized view is a cached copy of a result set stored as a table. Drop the function and use the normal MATERIALIZED VIEW
.
Skip generating an array — waste of a time and space and may even be serializing the result set to disk twice. And, instead, just use something like this:
CREATE MATERIALIZED VIEW foo_view
AS
SELECT whatever
FROM wherever
GROUP BY GROUPING SETS (
(foo_ind, import_date),
(bar_ind, import_date),
(ter_ind, import_date)
);
Now you can “refresh” this by doing REFRESH foo_view;
You can possibly limit the output of foo()
:
SELECT foo() LIMIT 10000;
Or, most probably better, avoid using arrays altogether (pass the set, not an array made of it). For this, you will, of course, have to rewrite my_func()
. Thinking about it, I can imagine that this will be beneficial from a performance point of view, too.