Is there a disk space impact when choosing NULL vs empty array {} for the default of an ARRAY[] column? [duplicate]

Question :

My intuition is NULL will take up less space than {} when used as the default value of an ARRAY[] column in PostgreSQL.

Am I correct?

Answer :

AMG’s answer worked great, but it has some bugs and spelling/case errors. I tried editing it, but the edits were rejected.

Here’s his recommendation cleaned up (copy/paste-able directly into a SQL editor), along with the results.

1 – Create 2 tables:

CREATE TABLE TABLE_WITH_NULL (col1 int not null, col2 int[]);
CREATE TABLE TABLE_WITH_DEFAULT (col1 int not null, col2 int[] default '{}'::int[]);

2 – Make 1 million inserts for each table with:

INSERT INTO TABLE_WITH_NULL(col1) SELECT generate_series(1,1000000);
INSERT INTO TABLE_WITH_DEFAULT(col1) SELECT generate_series(1,1000000);
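As a quick sanity check (my addition, not part of the original answer), you can confirm the defaults actually landed before measuring:

-- Every row in TABLE_WITH_NULL should have a NULL col2, and every row in
-- TABLE_WITH_DEFAULT should have an empty (but non-NULL) array.
SELECT count(*) FROM TABLE_WITH_NULL WHERE col2 IS NULL;     -- expect 1000000
SELECT count(*) FROM TABLE_WITH_DEFAULT WHERE col2 = '{}';   -- expect 1000000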

3 – Check table disk usage using this query
(found at https://wiki.postgresql.org/wiki/Disk_Usage):

SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r' and relname in ('table_with_null','table_with_default')
  ) a
) a;
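If you only want the two totals, a shorter check (my addition) skips the wiki query entirely:

SELECT pg_size_pretty(pg_total_relation_size('table_with_null'))    AS with_null,
       pg_size_pretty(pg_total_relation_size('table_with_default')) AS with_default;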

4 – With the answers in hand, drop the tables:

DROP TABLE TABLE_WITH_NULL;
DROP TABLE TABLE_WITH_DEFAULT;

Here are my results (PostgreSQL 10.5):

[results screenshot: table_with_default is larger on disk than table_with_null]

So the direct answer to the question is: empty arrays ({}) consume more space than NULL.
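The per-value cost is visible without building any tables at all. pg_column_size() (my addition to the answer) shows that an empty int[] still carries a fixed array header in every row, while a NULL stored in a row costs only a bit in the row's null bitmap:

SELECT pg_column_size('{}'::int[]);  -- 16: the array header alone, with zero elements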

Further reading: How do completely empty columns in a large table affect performance?

Why don’t you check for yourself?

1 – Create 2 tables:

CREATE TABLE TABLE1 (column1 int not null, column2 int[] default {});
CREATE TABLE TABLE2 (column1 int not null, column2 int[]);

2 – make 1 million inserts for each table with:

insert into TABLE1(colulmn1) SELECT generate_series(1,1000000);
insert into TABLE2(colulmn1) SELECT generate_series(1,1000000);

3 – check table disk usage using this query :
(found in https://wiki.postgresql.org/wiki/Disk_Usage )

SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r' and relname in ('TABLE1','TABLE2')
  ) a
) a;

4 – whenever you are convinced just drop the tables:

DROP TABLE TABLE1;
DROP TABLE TABLE2;
