What’s the fastest way to add a BIGSERIAL column to a huge table (~3 billion rows, ~174 GB)?
- I want the column to hold incrementing values for the existing rows.
- I didn’t set a fillfactor (which looks like a bad decision in retrospect).
- I don’t have a problem with disk space, just want it to be as fast as possible.
What’s wrong with:
ALTER TABLE foo ADD column bar bigserial;
It will be filled with unique values automatically (starting with 1).
If you want a number for every existing row, every row in the table has to be updated. Or do you not?
The table will be bloated to twice its size if it cannot reuse dead tuples or free space on the data pages. Performance of the operation might benefit a lot from a FILLFACTOR lower than 100, or just from random dead tuples spread out over the table. Else you may want to run VACUUM FULL ANALYZE afterwards to recover disk space. That won’t be quick, though.
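For example (assuming the table from the question is named foo):

```sql
-- Reclaims disk space by rewriting the table, but takes an
-- ACCESS EXCLUSIVE lock for the duration - not quick on ~174 GB.
VACUUM FULL ANALYZE foo;
```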
You may be interested in the extension pgstattuple. It helps you gather statistics on your tables, to find out about dead tuples and free space.
Install the extension once per database:
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('tbl');
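A sketch narrowing the output down to the columns relevant here (where 'tbl' stands for your table):

```sql
-- dead_tuple_percent and free_percent indicate how much of the
-- existing table an update of every row could reuse
SELECT table_len, dead_tuple_percent, free_percent
FROM   pgstattuple('tbl');
```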
If you can afford to create a new table (which would break depending views, foreign keys, …):
Do it all in a single transaction to skip writing WAL for data inserted into a table that was created in the same transaction. This saves a lot when populating a big table. But it requires wal_level set to minimal, which was the default up to Postgres 9.6. The default changed to replica with Postgres 10 to allow WAL archiving and replication out of the box. If you neither have nor need either, you can set it back to minimal for great savings. You need:
max_wal_senders = 0
wal_level = minimal
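To verify the current settings before relying on the WAL optimization (a sketch):

```sql
SHOW wal_level;        -- must report 'minimal' for the optimization
SHOW max_wal_senders;  -- must report 0
```

Both are set in postgresql.conf and require a server restart to take effect.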
-- start transaction
BEGIN;

-- create empty copy of the old table - possibly add INCLUDING ...  ①
CREATE TABLE new_tbl (LIKE old_tbl);

-- add the bigserial column ...
ALTER TABLE new_tbl ADD COLUMN bar bigserial;

-- ... or an IDENTITY column in Postgres 10 or later  ②
ALTER TABLE new_tbl ADD COLUMN bar bigint GENERATED ALWAYS AS IDENTITY;

-- insert data from the old table, automatically filling the new serial / IDENTITY column
INSERT INTO new_tbl
SELECT *             -- new column filled with its default
FROM   old_tbl
ORDER  BY something; -- or don't order if you don't care: faster

COMMIT;
① CREATE TABLE ... (LIKE ...) can take INCLUDING clauses to copy more than the bare column definitions. (You may want to hold off on costly features like indexes until after the table has been filled.)
The new serial column is omitted from the SELECT list of the INSERT, so it is filled with its default value automatically. You can spell out all columns and add nextval() to the SELECT list to the same effect.
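A sketch of that explicit variant for the bigserial column, with hypothetical column names a and b (the GENERATED ALWAYS AS IDENTITY variant would reject an explicit value without OVERRIDING SYSTEM VALUE):

```sql
INSERT INTO new_tbl (a, b, bar)
SELECT a, b, nextval(pg_get_serial_sequence('new_tbl', 'bar'))
FROM   old_tbl;
```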
Make sure you got all your data in the new table.
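A minimal sanity check, before dropping anything:

```sql
-- row counts must match
SELECT (SELECT count(*) FROM old_tbl) AS old_rows
     , (SELECT count(*) FROM new_tbl) AS new_rows;
```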
Now add the indexes, constraints, triggers, etc. you had on the old table.
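For example (hypothetical index, constraint, and column names):

```sql
CREATE INDEX new_tbl_something_idx ON new_tbl (something);

ALTER TABLE new_tbl
  ADD CONSTRAINT new_tbl_ref_fk
  FOREIGN KEY (ref_id) REFERENCES other_tbl (id);
```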
DROP TABLE old_tbl;
ALTER TABLE new_tbl RENAME TO old_tbl;
Might be quite a bit faster overall. This leaves you with a vanilla table (and indexes) without any bloat.
You need free disk space – around the size of the old table, depending on the state of the table – as wiggle room. But you may need just as much with the first simple method because of table bloat. Again, details depend on the state of your table.