Question :
I have an application that does bulk loads into a large table (100 million rows). I am using Postgres’ COPY FROM functionality to load data from a flat file. The target table has a primary key of id.
To get the bulk insert to work, I created the ids for each row in the load file using:
SELECT nextval('apps_id_seq'::regclass)
FROM "apps"
ORDER BY "apps"."id" ASC
LIMIT 1
Unfortunately, I am now seeing this query take in excess of 150 seconds. It’s causing a serious backlog, because some of these files have tens of thousands of rows in them.
Yet when I run the same query at the command line, it returns in thousandths of a millisecond. Here is an explain analyze:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..0.64 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=1)
-> Index Only Scan using apps_pkey on apps (cost=0.57..15886651.40 rows=228128608 width=4) (actual time=0.015..0.015 rows=1 loops=1)
Heap Fetches: 0
Total runtime: 0.030 ms
What could be causing the delay? The delay is being reported by the NewRelic service.
Answer :
I have studied your question hard, but can’t make sense of the procedure you describe. (You might work on the description some more.)
Why would you generate sequence numbers by hand, when you can just have Postgres generate them automatically? Per documentation:
If a list of columns is specified, COPY will only copy the data in the specified columns to or from the file. If there are any columns in the table that are not in the column list, COPY FROM will insert the default values for those columns.
Bold emphasis mine. The default value for a serial column is the next id from its sequence.
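As a minimal sketch (the table definition, column name, and file path below are assumptions, not taken from your setup): if id is a serial column, leave it out of the COPY column list and Postgres will pull the next value from apps_id_seq for every row on its own.
-- Hypothetical table; your real apps table has more columns.
CREATE TABLE apps (
    id   serial PRIMARY KEY,
    name text
);
-- Omit id from the column list; COPY FROM fills it in with the
-- column default, i.e. nextval('apps_id_seq'), for each row.
COPY apps (name)
FROM '/tmp/apps.csv'
WITH (FORMAT csv);
The flat file then only needs the data columns: no pre-generated ids and no per-row nextval() round trips.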
Are you sure you are not doing a lot of redundant work in a very expensive fashion?