Query slow when using function in the WHERE clause


Question :

This is fast (49ms):

v_cpf_numerico := ext.uf_converte_numerico(new.nr_cpf);

select cd_cliente into v_cd_cliente
from central.cliente where nr_cpf_cnpj = v_cpf_numerico;

This is slow (15 seconds):

select cd_cliente into v_cd_cliente
from central.cliente where nr_cpf_cnpj = ext.uf_converte_numerico(new.nr_cpf);


The function:

create or replace function ext.uf_converte_numerico(_input varchar(30)) returns bigint as $$
begin
    -- strip everything that is not a digit
    _input := regexp_replace(_input, '[^0-9]+', '', 'g');

    if _input = '' then
        return null;
    end if;

    return cast(_input as bigint);
end;
$$ language plpgsql;

I am using PostgreSQL 12.
Why is the second variant slow?

Answer :

Consider this simplified equivalent:

CREATE OR REPLACE FUNCTION ext.uf_converte_numerico(_input varchar(30))
  RETURNS bigint
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$$ SELECT NULLIF(regexp_replace(_input, '[^0-9]+', '', 'g'), '')::bigint $$;

The changes:

  • IMMUTABLE, because it is, and for the reasons Laurenz explained.

  • PARALLEL SAFE in Postgres 9.6 or later, because it is. Without the label, functions default to PARALLEL UNSAFE, which disables parallel queries. This may or may not affect the query on display. But the 15 seconds you reported indicate you are operating on big tables. So it can make a huge difference in other queries.

  • LANGUAGE SQL to enable function inlining, which won’t matter much for the query on display (after you labelled it IMMUTABLE), but will simplify query plans and improve overall performance.

  • NULLIF as a minor simplification.
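
A quick way to convince yourself that the NULLIF expression behaves like the original IF block is to run the expression on its own. This is only a sketch; the input strings are made-up sample values, not data from the question.

```sql
-- Digits survive, punctuation is stripped, and the result is cast to bigint.
SELECT NULLIF(regexp_replace('123.456.789-09', '[^0-9]+', '', 'g'), '')::bigint;
-- returns 12345678909

-- No digits at all: regexp_replace yields '', NULLIF turns that into NULL,
-- and the cast of NULL is simply NULL instead of a cast error.
SELECT NULLIF(regexp_replace('abc', '[^0-9]+', '', 'g'), '')::bigint;
-- returns NULL
```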

Aside: your input is varchar(30), which still allows out-of-range errors for bigint. Consider varchar(18) to be sure, or just make it text to remove the ineffective restriction.
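
To illustrate the range issue: bigint tops out at 9223372036854775807, which has 19 digits, so any 18-digit string fits while some 19-digit strings do not. The literals below are chosen only to show the boundary.

```sql
SELECT '999999999999999999'::bigint;   -- 18 digits: always fits
SELECT '9223372036854775807'::bigint;  -- 19 digits: the largest bigint, still fits
SELECT '9223372036854775808'::bigint;  -- 19 digits: fails with an out-of-range error
```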

Since the function is VOLATILE (by default), PostgreSQL doesn’t know that it will return the same value for every row in central.cliente, so it is evaluated repeatedly.

Set the volatility to IMMUTABLE and PostgreSQL knows that it has to be evaluated only once:

ALTER FUNCTION ext.uf_converte_numerico(varchar(30)) IMMUTABLE;
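
One way to check the effect is to look at the function's labels in the catalog and compare query plans before and after the change. The following is a sketch under assumptions: it assumes an index exists on central.cliente (nr_cpf_cnpj), which the 49 ms timing suggests but the question does not state, and it uses a made-up literal because new.nr_cpf is only available inside the trigger.

```sql
-- provolatile: 'v' = VOLATILE, 's' = STABLE, 'i' = IMMUTABLE
-- proparallel: 'u' = UNSAFE,   'r' = RESTRICTED, 's' = SAFE
SELECT p.proname, p.provolatile, p.proparallel
FROM pg_proc AS p
WHERE p.oid = 'ext.uf_converte_numerico(varchar)'::regprocedure;

-- Run this once while the function is VOLATILE and once after ALTER FUNCTION.
-- While VOLATILE, the call typically appears as a filter evaluated per row
-- of a sequential scan. Once IMMUTABLE, the planner can fold the call into
-- a constant and use it as an index condition.
EXPLAIN (ANALYZE, BUFFERS)
SELECT cd_cliente
FROM central.cliente
WHERE nr_cpf_cnpj = ext.uf_converte_numerico('123.456.789-09');
```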

In the first case you are using a pre-calculated value (v_cpf_numerico).

In the second case the value is calculated for each row in central.cliente during the select.
