Question :
My intuition is NULL
will take up less space than {}
when used as the default value of an ARRAY[]
column in PostgreSQL.
Am I correct?
Answer :
AMG’s answer worked great but it has some bugs/spelling/case errors. I tried editing it but the edits were rejected.
Here’s his recommendation cleaned up (copy/paste-able directly into SQL Editor), and the results.
1 – Create 2 tables:
CREATE TABLE TABLE_WITH_NULL (col1 int not null, col2 int[]);
CREATE TABLE TABLE_WITH_DEFAULT (col1 int not null, col2 int[] default '{}'::int[]);
2 – Make 1 million inserts for each table with:
INSERT INTO TABLE_WITH_NULL(col1) SELECT generate_series(1,1000000);
INSERT INTO TABLE_WITH_DEFAULT(col1) SELECT generate_series(1,1000000);
3 – Check table disk usage using this query :
(found in https://wiki.postgresql.org/wiki/Disk_Usage )
SELECT *, pg_size_pretty(total_bytes) AS total
, pg_size_pretty(index_bytes) AS INDEX
, pg_size_pretty(toast_bytes) AS toast
, pg_size_pretty(table_bytes) AS TABLE
FROM (
SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
, c.reltuples AS row_estimate
, pg_total_relation_size(c.oid) AS total_bytes
, pg_indexes_size(c.oid) AS index_bytes
, pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE relkind = 'r' and relname in ('table_with_null','table_with_default')
) a
) a;
4 – With answers in hand drop the tables:
DROP TABLE TABLE_WITH_NULL;
DROP TABLE TABLE_WITH_DEFAULT;
Here are my results. (PostgreSQL 10.5)……
So the direct answer to the question is empty arrays {}
consume more space than NULL
.
Further reading: How do completely empty columns in a large table affect performance?
Why don’t you check by yourself?
1 – Create 2 tables:
CREATE TABLE TABLE1 (column1 int not null, column2 int[] default {});
CREATE TABLE TABLE2 (column1 int not null, column2 int[]);
2 – make 1 million inserts for each table with:
insert into TABLE1(colulmn1) SELECT generate_series(1,1000000);
insert into TABLE2(colulmn1) SELECT generate_series(1,1000000);
3 – check table disk usage using this query :
(found in https://wiki.postgresql.org/wiki/Disk_Usage )
SELECT *, pg_size_pretty(total_bytes) AS total
, pg_size_pretty(index_bytes) AS INDEX
, pg_size_pretty(toast_bytes) AS toast
, pg_size_pretty(table_bytes) AS TABLE
FROM (
SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
, c.reltuples AS row_estimate
, pg_total_relation_size(c.oid) AS total_bytes
, pg_indexes_size(c.oid) AS index_bytes
, pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE relkind = 'r' and relname in ('TABLE1','TABLE2')
) a
) a;
4 – whenever you are convinced just drop the tables:
DROP TABLE TABLE1;
DROP TABLE TABLE2;