Question :
What’s the fastest way to add a BIGSERIAL column to a huge table (~3 billion rows, ~174 GB)?
EDIT:
- I want the column to hold incrementing values for existing rows (NOT NULL).
- I didn’t set a fillfactor (which looks like a bad decision in retrospect).
- I don’t have a problem with disk space, just want it to be as fast as possible.
Answer :
What’s wrong with:
ALTER TABLE foo ADD COLUMN bar bigserial;
The column is filled with unique values automatically (starting with 1).
If you want a number for every existing row, every row in the table has to be updated. Or do you not?
The table will be bloated to twice its size if it cannot reuse dead tuples or free space on the data pages. Performance of the operation may benefit a lot from a FILLFACTOR lower than 100, or just from dead tuples spread out over the table. Else you may want to run VACUUM FULL ANALYZE afterwards to recover disk space. That won’t be quick, though.
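For instance, assuming the table is named foo as in the simple statement above, recovering the space afterwards might look like:

```sql
-- Rewrites the whole table and rebuilds all indexes.
-- Takes an ACCESS EXCLUSIVE lock, blocking all other access meanwhile.
VACUUM FULL ANALYZE foo;
```

ANALYZE is included so the planner has fresh statistics for the rewritten table.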
pgstattuple
You may be interested in this extension. It helps you gather statistics on your tables. To find out about dead tuples and free space:
Install the extension once per database:
CREATE EXTENSION pgstattuple;
Call:
SELECT * FROM pgstattuple('tbl');
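A minimal sketch of what to look at in the output (column names as documented for pgstattuple; 'old_tbl' is a placeholder table name):

```sql
-- High dead_tuple_percent or free_percent means the big UPDATE
-- can reuse space in place instead of growing the table.
SELECT dead_tuple_percent, free_percent
FROM   pgstattuple('old_tbl');
```

Note that pgstattuple scans the whole table, so this itself takes a while on 3 billion rows.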
Alternative
If you can afford to create a new table (which would break depending views, foreign keys, …):
Do it all in a single transaction to skip writing WAL for data inserted into a table created in the same transaction. This saves a lot of work when populating a big table. But it requires wal_level
set to minimal
, which was the default up to Postgres 9.6. The default changed to replica
with Postgres 10 to allow WAL archiving and replication by default. If you neither have nor need either of those, you can set it back to minimal
for great savings. You need:
max_wal_senders = 0
wal_level = minimal
.. and then restart the server for the settings to take effect.
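One way to change these settings is ALTER SYSTEM (available since Postgres 9.4), assuming superuser access; editing postgresql.conf directly works just as well:

```sql
-- Writes to postgresql.auto.conf; both settings still require a server
-- restart to take effect. Remember to revert them afterwards if you
-- need replication or WAL archiving again.
ALTER SYSTEM SET wal_level = minimal;
ALTER SYSTEM SET max_wal_senders = 0;
```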
Read the manual here and here. And this blog post for understanding.
-- start transaction
BEGIN;
-- create empty copy of the old table - possibly add INCLUDING ... ①
CREATE TABLE new_tbl (LIKE old_tbl);
-- add the bigserial column ...
ALTER TABLE new_tbl ADD COLUMN bar bigserial;
-- ... **or** an IDENTITY column in Postgres 10 or later ②
ALTER TABLE new_tbl ADD COLUMN bar bigint GENERATED ALWAYS AS IDENTITY;
-- insert data from the old table, automatically filling the new serial / IDENTITY column
INSERT INTO new_tbl
SELECT * -- new column filled with default
FROM old_tbl
ORDER BY something; -- or don't order if you don't care: faster
COMMIT;
① About CREATE TABLE ... (LIKE ...) with INCLUDING clause(s):
(You may want to hold off costly features until after the table has been filled.)
② About IDENTITY columns:
The new serial column is missing from the SELECT list of the INSERT, so it is filled with its default value automatically. Alternatively, spell out all columns and add nextval() to the SELECT list to the same effect.
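A sketch of that spelled-out variant for the bigserial version, assuming hypothetical payload columns a and b, and the sequence name Postgres generates by default for a serial column (new_tbl_bar_seq). A GENERATED ALWAYS AS IDENTITY column would reject an explicit value unless you add OVERRIDING SYSTEM VALUE:

```sql
-- Column names a, b and the sequence name are placeholders,
-- not taken from the question.
INSERT INTO new_tbl (a, b, bar)
SELECT a, b, nextval('new_tbl_bar_seq')
FROM   old_tbl;
```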
Make sure you got all your data in the new table.
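A simple sanity check (a full count over ~3 billion rows is slow, but definitive):

```sql
-- Both counts should match before dropping the old table.
SELECT (SELECT count(*) FROM old_tbl) AS old_rows,
       (SELECT count(*) FROM new_tbl) AS new_rows;
```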
Add indexes, constraints, triggers etc. you had in the old table now.
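For example (the index name, column name, and constraint are placeholders, not taken from the question):

```sql
-- Building indexes after the bulk load is much faster than
-- maintaining them during the INSERT.
CREATE INDEX new_tbl_something_idx ON new_tbl (something);
ALTER TABLE new_tbl ADD CONSTRAINT new_tbl_pkey PRIMARY KEY (bar);
```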
DROP TABLE old_tbl;
ALTER TABLE new_tbl RENAME TO old_tbl;
Might be quite a bit faster overall. This leaves you with a vanilla table (and indexes) without any bloat.
You need free disk space – around the size of the old table, depending on the state of the table – as wiggle room. But you may need just as much with the first simple method because of table bloat. Again, details depend on the state of your table.