Is it possible to rewrite this Function for optimization?

Posted on

Question :

I need help optimizing this function as it takes more than 2 minutes currently which is far from optimal. This function basically gives you the name of column, corresponding table and list of unique values contained in the column.

Function Definition:

CREATE OR REPLACE FUNCTION get_pubcol(
    included_tables name[] default '{}',
    included_schema name[] default '{}'
)
RETURNS table(schema_name text, table_name text, column_name text, column_value text)
AS $$
begin
  FOR schema_name, table_name, column_name IN
      SELECT c.table_schema, c.table_name, c.column_name
      FROM information_schema.columns c
      JOIN information_schema.tables t ON
        (t.table_name = c.table_name AND t.table_schema=c.table_schema)
      WHERE (c.table_name=ANY(included_tables) OR included_tables='{}')
        AND (c.table_schema=ANY(included_schema) OR included_schema='{}')
        AND t.table_type='BASE TABLE'
  LOOP
    EXECUTE format('SELECT ARRAY_AGG(DISTINCT %I) AS column_value FROM %I.%I',
       column_name,
       schema_name,
       table_name
    ) INTO column_value;
    IF column_value is not null THEN
      RETURN NEXT;
    END IF;
 END LOOP;
END;
$$ language plpgsql;

Function call:

SELECT * FROM get_pubcol('{}', '{public}')

Explain Analyze Result:

"Function Scan on get_pubcol  (cost=0.25..10.25 rows=1000 width=128) (actual time=121235.178..121235.199 rows=73 loops=1)"
"Planning time: 0.049 ms"
"Execution time: 121237.436 ms"

Query Result:

schema_name table_name column_name   column_value
"public"    "MARA"  "Lab/Office"    "{" ",BL1,GL1,GP1,KL1,KP1,NaN,SL1,SP1,WL1,WL2,WP1}"

Answer :

You are running one query for each column and table. One way to optimize this, is to only run a single query for each table that retrieves the distinct values for all columns in one go. This result can be put into a JSONB which in turn can be returned as individual rows using jsonb_each:

CREATE OR REPLACE FUNCTION get_pubcol(
    included_tables name[] default '{}',
    included_schema name[] default '{}'
)
RETURNS table(schema_name text, table_name text, column_name text, column_value text)
AS 
$$
declare
  l_value jsonb;
  l_sql   text;
begin
  FOR schema_name, table_name, l_sql IN
    SELECT c.table_schema, c.table_name, 
           concat('select to_jsonb(t) from ( select ', 
                  string_agg(format('array_agg(distinct %I) as %I', c.column_name, c.column_name), ', '), 
                  format(' from %I.%I) as t', c.table_schema, c.table_name)) as sql
    FROM information_schema.columns c
      JOIN information_schema.tables t
        ON t.table_name = c.table_name AND t.table_schema=c.table_schema
    WHERE t.table_type = 'BASE TABLE'
      AND (c.table_name = ANY(included_tables) OR included_tables='{}')
      AND (c.table_schema = ANY(included_schema) OR included_schema='{}')
    group by c.table_schema, c.table_name
  LOOP
    EXECUTE l_sql into l_value;
    IF l_value is not null THEN
      return query 
        select schema_name, table_name, x.*
        from jsonb_each_text(l_value) as x(column_name, column_value);
    END IF;
 END LOOP;
END;
$$ language plpgsql;

The outer query assembles a single query for each table. E.g. if you have a table person with the columns id, firstname, lastname the expression in the query for the loop would generate something like:

select to_jsonb(t) 
from ( 
   select array_agg(distinct lastname) as lastname, array_agg(distinct id) as id, array_agg(distinct firstname) as firstname 
from public.person
) as t

That query returns a single jsonb value that contains the information for all columns. This is stored in the local variable l_value, something like:

{
    "id": [42,43,44,45],
    "lastname": ["Beeblebrox","Dent","McMillan","Prefect"],
    "firstname": ["Arthur","Ford","Tricia","Zaphod"]
}

This is then turned back into rows using jsonb_each_text.

The display will be different though.

Your example

{BL1,GL1,GP1,KL1,KP1,NaN,SL1,SP1,WL1,WL2,WP1}

will be displayed as

["BL1","GL1","GP1","KL1","KP1","NaN","SL1","SP1","WL1","WL2","WP1"]

Leave a Reply

Your email address will not be published. Required fields are marked *