Question :
Background
This problem relates to ignoring duplicate inserts using PostgreSQL 9.2 or greater. The reason I ask is because of this code:
-- Ignores duplicates.
INSERT INTO
db_table (tbl_column_1, tbl_column_2)
VALUES (
SELECT
unnseted_column,
param_association
FROM
unnest( param_array_ids ) AS unnested_column
);
The code is unencumbered by checks for existing values. (In this particular situation, the user does not care about errors from inserting duplicates — the insertion should “just work”.) Adding code in this situation to explicitly test for duplicates imparts complications.
Problem
In PostgreSQL, I have found a few ways to ignore duplicate inserts.
Ignore Duplicates #1
Create a transaction that catches unique constraint violations, taking no action:
BEGIN
INSERT INTO db_table (tbl_column) VALUES (v_tbl_column);
EXCEPTION WHEN unique_violation THEN
-- Ignore duplicate inserts.
END;
Ignore Duplicates #2
Create a rule to ignore duplicates on a given table:
CREATE OR REPLACE RULE db_table_ignore_duplicate_inserts AS
ON INSERT TO db_table
WHERE (EXISTS ( SELECT 1
FROM db_table
WHERE db_table.tbl_column = NEW.tbl_column)) DO INSTEAD NOTHING;
Questions
My questions are mostly academic:
- What method is most efficient?
- What method is most maintainable, and why?
- What is the standard way to ignore insert duplication errors with PostgreSQL?
- Is there a technically more efficient way to ignore duplicate inserts; if so, what is it?
Thank you!
Answer :
As the answers on the other question (of which this one is considered a duplicate) mention, there is (since version 9.5) a native UPSERT
functionality. For older versions, keep reading 🙂
I have set up a test for checking the options. I’ll include the code below, which can be run in psql
on a linux/Unix box (simply because for the sake of clarity in the results, I piped the output of the setup commands to /dev/null
– on a Windows box one could choose a log file instead).
I tried to make different results comparable by using more than one (that is, 100) INSERT
per type, run from a loop inside a plpgsql
stored procedure. Additionally, before each run, the table is reset by truncating and reinserting the original data.
Checking a few test runs, it looks like that using the rule and explicitly adding WHERE NOT EXISTS
ot the INSERT
statement spend similar time, while cathcing an exception takes significantlly more time to complete.
The latter isn’t that surprising:
Tip: A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one. Therefore, don’t
use EXCEPTION without need.
Personally, because of readability and maintainability, I prefer adding the WHERE NOT EXISTS
bit to the INSERT
s themselves. Just like with triggers (which could be tested here as well), debugging (or simply tracing INSERT
behaviour) is more complicated with rules present.
And the code I used (feel free to point misconceptions or other problems):
o /dev/null
timing off
-- set up data
DROP TABLE IF EXISTS insert_test;
CREATE TABLE insert_test_base_data (
id integer PRIMARY KEY,
col1 double precision,
col2 text
);
CREATE TABLE insert_test (
id integer PRIMARY KEY,
col1 double precision,
col2 text
);
INSERT INTO insert_test_base_data
SELECT i, (SELECT random() AS r WHERE s.i = s.i)
FROM
generate_series(2, 200, 2) s(i)
;
UPDATE insert_test_base_data
SET col2 = md5(col1::text)
;
INSERT INTO insert_test
SELECT *
FROM insert_test_base_data
;
-- function with exception block to be called later
CREATE OR REPLACE FUNCTION f_insert_test_insert(
id integer,
col1 double precision,
col2 text
)
RETURNS void AS
$body$
BEGIN
INSERT INTO insert_test
VALUES ($1, $2, $3)
;
EXCEPTION
WHEN unique_violation
THEN NULL;
END;
$body$
LANGUAGE plpgsql;
-- function running plain SQL ... WHERE NOT EXISTS ...
CREATE OR REPLACE FUNCTION insert_test_where_not_exists()
RETURNS void AS
$body$
BEGIN
FOR i IN 1 .. 100
LOOP
INSERT INTO insert_test
SELECT i, rnd, md5(rnd::text)
FROM (SELECT random() AS rnd) r
WHERE NOT EXISTS (
SELECT 1
FROM insert_test
WHERE id = i
)
;
END LOOP;
END;
$body$
LANGUAGE plpgsql;
-- call a function with exception block
CREATE OR REPLACE FUNCTION insert_test_function_with_exception_block()
RETURNS void AS
$body$
BEGIN
FOR i IN 1 .. 100
LOOP
PERFORM f_insert_test_insert(i, rnd, md5(rnd::text))
FROM (SELECT random() AS rnd) r
;
END LOOP;
END;
$body$
LANGUAGE plpgsql;
-- leave checking existence to a rule
CREATE OR REPLACE FUNCTION insert_test_rule()
RETURNS void AS
$body$
BEGIN
FOR i IN 1 .. 100
LOOP
INSERT INTO insert_test
SELECT i, rnd, md5(rnd::text)
FROM (SELECT random() AS rnd) r
;
END LOOP;
END;
$body$
LANGUAGE plpgsql;
o
timing on
echo
echo 'check before INSERT'
SELECT insert_test_where_not_exists();
echo
o /dev/null
timing off
TRUNCATE insert_test;
INSERT INTO insert_test
SELECT *
FROM insert_test_base_data
;
timing on
o
echo 'catch unique-violation'
SELECT insert_test_function_with_exception_block();
echo
echo 'implementing a RULE'
o /dev/null
timing off
TRUNCATE insert_test;
INSERT INTO insert_test
SELECT *
FROM insert_test_base_data
;
CREATE OR REPLACE RULE db_table_ignore_duplicate_inserts AS
ON INSERT TO insert_test
WHERE EXISTS (
SELECT 1
FROM insert_test
WHERE id = NEW.id
)
DO INSTEAD NOTHING;
o
timing on
SELECT insert_test_rule();