Check whether empty strings are present in character-type columns

Posted on

Question :

I have an application that (as part of its logic) trims strings and replaces empty strings with NULL prior to insertion in the database. I guess one way to ensure that this is enforced would be to write a CHECK on every table that has a VARCHAR, TEXT (or similar) column.

Assuming that one can’t or does not want to do that is there a way to write a simple, generic, SQL query (obtaining table and column names from the database’s metadata) that would check if any textual columns in the database contain empty strings?

Answer :

Function for a single table

Returns all character-type columns of the given table with a count of empty values ('') and whether they are defined NOT NULL.

CREATE OR REPLACE FUNCTION f_tbl_empty_status(_tbl regclass)
  RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _typ      CONSTANT regtype[] := '{text, bpchar, varchar}';  -- base char types
   _sql      text;
   _col_arr  text[];
   _null_arr bool[];
BEGIN
   -- Build command
   SELECT INTO _col_arr, _null_arr, _sql
          array_agg(s.col)
        , array_agg(s.attnotnull)
        , '
   SELECT $1
        , unnest($2)
        , unnest(ARRAY [count('
                 || string_agg(s.col, ' = '''' OR NULL), count(')
                                   || ' = '''' OR NULL)])
        , unnest($3)
   FROM   ' || _tbl
   FROM  (
      SELECT quote_ident(attname) AS col, attnotnull
      FROM   pg_attribute
      WHERE  attrelid = _tbl           -- valid, visible, legal table name 
      AND    attnum >= 1               -- exclude tableoid & friends
      AND    NOT attisdropped          -- exclude dropped columns
   -- AND    NOT attnotnull            -- include columns defined NOT NULL
      AND    atttypid = ANY(_typ)      -- only character types
      ORDER  BY attnum
      ) AS s;

   -- Debug
   -- RAISE NOTICE '%', _sql;

   -- Execute
   IF _sql IS NULL THEN
      -- do nothing, nothing to return
   ELSE
      RETURN QUERY EXECUTE _sql
      USING  _tbl::text, _col_arr, _null_arr;
   END IF;
END
$func$;

Call:

SELECT * FROM f_tbl_empty_status('tbl_name'); -- optionally schema-qualified

Returns:

tbl   | col        | empty_ct | not_null
------+------------+----------+---------
tbl1  | txt        | 0        | f
tbl1  | vc         | 3        | f
tbl1  | "oDD name" | 7        | f

Works for Postgres 9.1 or later.

Output table names are automatically schema-qualified if needed, according to the current search_path.

Output table names and column names are automatically escaped if needed.

empty_ct is the count of rows where the value of the column is the empty string

not_null reports whether the column is defined NOT NULL (so you cannot convert possible empty strings to NULL!)

Input table name can optionally be schema-qualified, else defaults to the current search_path.

Your role needs privileges to actually read from the given table.

The function is highly optimized and only runs a single scan on the given table to check on all relevant columns.

Should be safe against SQL injection.

Using parallel unnest() to simplify the complex code somewhat:

You will be interested in this related answer on SO to actually replace empty strings – with more explanation:

Match strings of only space characters, too

As you commented, trim(s.col, ' ') = '' does the job just fine. But here’s a shortcut:

s.col::char = ''

How?
char is an alias for character(1), the rarely useful blank-padded type. Values are padded with space characters to the right up to the length specifier (which is 1 in this case, but irrelevant). Trailing spaces are effectively insignificant for this type. So ' ' is the same as '' or ' '. Voilá. And yes, it is faster, too, I tested.

To find strings of only space characters, too (not other white space!), add the cast to these lines above:

                  || string_agg(s.col, '::char = '''' OR NULL), count(')
                                    || '::char = '''' OR NULL)])

Wrapper function to report on a whole schema

CREATE OR REPLACE FUNCTION f_schema_empty_status(_sch text DEFAULT 'public')
  RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
  LANGUAGE plpgsql AS
$func$
DECLARE 
   _tbl regclass;
BEGIN
   FOR _tbl IN
      SELECT c.oid
      FROM   pg_class c
      JOIN   pg_namespace n ON n.oid = c.relnamespace
      WHERE  n.nspname = _sch  -- 'public' by default
   -- AND    c.relname LIKE 'my_pattern%'  -- optionally filter table names
      AND    c.relkind = 'r'  -- regular tables only
      ORDER  BY relname
   LOOP
   -- Debug
   -- RAISE NOTICE 'table: %', _tbl;

      RETURN QUERY
      SELECT * FROM f_tbl_empty_status(_tbl);
   END LOOP;
END
$func$;

Call:

SELECT * FROM f_schema_empty_status();  -- defaults to 'public' without parameter

Returns:

tbl   | col        | empty_ct | not_null
------+------------+----------+---------
tbl1  | txt        | 0        | f
tbl1  | vc         | 3        | f
tbl1  | "oDD name" | 7        | f
tbl2  | some_text  | 123      | t
...

db<>fiddle here
Old sqlfiddle

The best way to do this is with a CHECK constraint, as you noted, possibly via a DOMAIN, e.g.

CREATE DOMAIN nonempty_string AS text 
CONSTRAINT non_empty CHECK (length(VALUE) > 0);

then ALTER existing columns to use the domain.

If that’s not possible, you will need to query INFORMATION_SCHEMA to find all columns of the target type across all tables, then for each column, dynamically generate a query to check. You can use PL/PgSQL and the EXECUTE statement for this; there are many examples of its usage this way elsewhere on Stack Overflow.

You cannot write a single query to do what you want. It just won’t work. You’ll have to use query generation from information_schema.

Leave a Reply

Your email address will not be published. Required fields are marked *