Question :
I have an application that (as part of its logic) trims strings and replaces empty strings with NULL prior to insertion in the database. I guess one way to ensure that this is enforced would be to write a CHECK on every table that has a VARCHAR
, TEXT
(or similar) column.
Assuming that one can’t or does not want to do that is there a way to write a simple, generic, SQL query (obtaining table and column names from the database’s metadata) that would check if any textual columns in the database contain empty strings?
Answer :
Function for a single table
Returns all character-type columns of the given table with a count of empty values (''
) and whether they are defined NOT NULL
.
CREATE OR REPLACE FUNCTION f_tbl_empty_status(_tbl regclass)
RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- base char types
_sql text;
_col_arr text[];
_null_arr bool[];
BEGIN
-- Build command
SELECT INTO _col_arr, _null_arr, _sql
array_agg(s.col)
, array_agg(s.attnotnull)
, '
SELECT $1
, unnest($2)
, unnest(ARRAY [count('
|| string_agg(s.col, ' = '''' OR NULL), count(')
|| ' = '''' OR NULL)])
, unnest($3)
FROM ' || _tbl
FROM (
SELECT quote_ident(attname) AS col, attnotnull
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
-- AND NOT attnotnull -- include columns defined NOT NULL
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) AS s;
-- Debug
-- RAISE NOTICE '%', _sql;
-- Execute
IF _sql IS NULL THEN
-- do nothing, nothing to return
ELSE
RETURN QUERY EXECUTE _sql
USING _tbl::text, _col_arr, _null_arr;
END IF;
END
$func$;
Call:
SELECT * FROM f_tbl_empty_status('tbl_name'); -- optionally schema-qualified
Returns:
tbl | col | empty_ct | not_null
------+------------+----------+---------
tbl1 | txt | 0 | f
tbl1 | vc | 3 | f
tbl1 | "oDD name" | 7 | f
Works for Postgres 9.1 or later.
Output table names are automatically schema-qualified if needed, according to the current search_path
.
Output table names and column names are automatically escaped if needed.
empty_ct
is the count of rows where the value of the column is the empty string
not_null
reports whether the column is defined NOT NULL
(so you cannot convert possible empty strings to NULL!)
Input table name can optionally be schema-qualified, else defaults to the current search_path
.
Your role needs privileges to actually read from the given table.
The function is highly optimized and only runs a single scan on the given table to check on all relevant columns.
Should be safe against SQL injection.
Using parallel unnest()
to simplify the complex code somewhat:
You will be interested in this related answer on SO to actually replace empty strings – with more explanation:
Match strings of only space characters, too
As you commented, trim(s.col, ' ') = ''
does the job just fine. But here’s a shortcut:
s.col::char = ''
How?
char
is an alias for character(1)
, the rarely useful blank-padded type. Values are padded with space characters to the right up to the length specifier (which is 1
in this case, but irrelevant). Trailing spaces are effectively insignificant for this type. So ' '
is the same as ''
or ' '
. Voilá. And yes, it is faster, too, I tested.
To find strings of only space characters, too (not other white space!), add the cast to these lines above:
|| string_agg(s.col, '::char = '''' OR NULL), count(')
|| '::char = '''' OR NULL)])
Wrapper function to report on a whole schema
CREATE OR REPLACE FUNCTION f_schema_empty_status(_sch text DEFAULT 'public')
RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass;
BEGIN
FOR _tbl IN
SELECT c.oid
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = _sch -- 'public' by default
-- AND c.relname LIKE 'my_pattern%' -- optionally filter table names
AND c.relkind = 'r' -- regular tables only
ORDER BY relname
LOOP
-- Debug
-- RAISE NOTICE 'table: %', _tbl;
RETURN QUERY
SELECT * FROM f_tbl_empty_status(_tbl);
END LOOP;
END
$func$;
Call:
SELECT * FROM f_schema_empty_status(); -- defaults to 'public' without parameter
Returns:
tbl | col | empty_ct | not_null
------+------------+----------+---------
tbl1 | txt | 0 | f
tbl1 | vc | 3 | f
tbl1 | "oDD name" | 7 | f
tbl2 | some_text | 123 | t
...
The best way to do this is with a CHECK
constraint, as you noted, possibly via a DOMAIN
, e.g.
CREATE DOMAIN nonempty_string AS text
CONSTRAINT non_empty CHECK (length(VALUE) > 0);
then ALTER
existing columns to use the domain.
If that’s not possible, you will need to query INFORMATION_SCHEMA
to find all columns of the target type across all tables, then for each column, dynamically generate a query to check. You can use PL/PgSQL and the EXECUTE
statement for this; there are many examples of its usage this way elsewhere on Stack Overflow.
You cannot write a single query to do what you want. It just won’t work. You’ll have to use query generation from information_schema
.