Optimal way to ignore duplicate inserts? [duplicate]

Posted on

Question :

Background

This problem relates to ignoring duplicate inserts using PostgreSQL 9.2 or greater. The reason I ask is because of this code:

  -- Ignores duplicates.
  INSERT INTO
    db_table (tbl_column_1, tbl_column_2)
  VALUES (
    SELECT
      unnseted_column,
      param_association
    FROM
      unnest( param_array_ids ) AS unnested_column
  );

The code is unencumbered by checks for existing values. (In this particular situation, the user does not care about errors from inserting duplicates — the insertion should “just work”.) Adding code in this situation to explicitly test for duplicates imparts complications.

Problem

In PostgreSQL, I have found a few ways to ignore duplicate inserts.

Ignore Duplicates #1

Create a transaction that catches unique constraint violations, taking no action:

  BEGIN
    INSERT INTO db_table (tbl_column) VALUES (v_tbl_column);
  EXCEPTION WHEN unique_violation THEN
    -- Ignore duplicate inserts.
  END;

Ignore Duplicates #2

Create a rule to ignore duplicates on a given table:

CREATE OR REPLACE RULE db_table_ignore_duplicate_inserts AS
    ON INSERT TO db_table
   WHERE (EXISTS ( SELECT 1
           FROM db_table
          WHERE db_table.tbl_column = NEW.tbl_column)) DO INSTEAD NOTHING;

Questions

My questions are mostly academic:

  • What method is most efficient?
  • What method is most maintainable, and why?
  • What is the standard way to ignore insert duplication errors with PostgreSQL?
  • Is there a technically more efficient way to ignore duplicate inserts; if so, what is it?

Thank you!

Answer :

As the answers on the other question (of which this one is considered a duplicate) mention, there is (since version 9.5) a native UPSERT functionality. For older versions, keep reading 🙂

I have set up a test for checking the options. I’ll include the code below, which can be run in psql on a linux/Unix box (simply because for the sake of clarity in the results, I piped the output of the setup commands to /dev/null – on a Windows box one could choose a log file instead).

I tried to make different results comparable by using more than one (that is, 100) INSERT per type, run from a loop inside a plpgsql stored procedure. Additionally, before each run, the table is reset by truncating and reinserting the original data.

Checking a few test runs, it looks like that using the rule and explicitly adding WHERE NOT EXISTS ot the INSERT statement spend similar time, while cathcing an exception takes significantlly more time to complete.

The latter isn’t that surprising:

Tip: A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one. Therefore, don’t
use EXCEPTION without need.

Personally, because of readability and maintainability, I prefer adding the WHERE NOT EXISTS bit to the INSERTs themselves. Just like with triggers (which could be tested here as well), debugging (or simply tracing INSERT behaviour) is more complicated with rules present.

And the code I used (feel free to point misconceptions or other problems):

o /dev/null
timing off

-- set up data
DROP TABLE IF EXISTS insert_test;

CREATE TABLE insert_test_base_data (
    id integer PRIMARY KEY,
    col1 double precision,
    col2 text
);

CREATE TABLE insert_test (
    id integer PRIMARY KEY,
    col1 double precision,
    col2 text
);

INSERT INTO insert_test_base_data
SELECT i, (SELECT random() AS r WHERE s.i = s.i)
FROM 
    generate_series(2, 200, 2) s(i)
;

UPDATE insert_test_base_data
SET col2 = md5(col1::text)
;

INSERT INTO insert_test
SELECT *
FROM insert_test_base_data
;



-- function with exception block to be called later
CREATE OR REPLACE FUNCTION f_insert_test_insert(
    id integer,
    col1 double precision,
    col2 text
)
RETURNS void AS
$body$
BEGIN
    INSERT INTO insert_test
    VALUES ($1, $2, $3)
    ;
EXCEPTION
    WHEN unique_violation
    THEN NULL;
END;
$body$
LANGUAGE plpgsql;



-- function running plain SQL ... WHERE NOT EXISTS ...
CREATE OR REPLACE FUNCTION insert_test_where_not_exists()
RETURNS void AS
$body$
BEGIN
    FOR i IN 1 .. 100
    LOOP
        INSERT INTO insert_test
        SELECT i, rnd, md5(rnd::text)
        FROM (SELECT random() AS rnd) r
        WHERE NOT EXISTS (
            SELECT 1
            FROM insert_test
            WHERE id = i
        )
        ;
    END LOOP;
END;
$body$
LANGUAGE plpgsql;



-- call a function with exception block
CREATE OR REPLACE FUNCTION insert_test_function_with_exception_block()
RETURNS void AS
$body$
BEGIN
    FOR i IN 1 .. 100
    LOOP
        PERFORM f_insert_test_insert(i, rnd, md5(rnd::text))
        FROM (SELECT random() AS rnd) r
        ;
    END LOOP;
END;
$body$
LANGUAGE plpgsql;



-- leave checking existence to a rule
CREATE OR REPLACE FUNCTION insert_test_rule()
RETURNS void AS
$body$
BEGIN
    FOR i IN 1 .. 100
    LOOP
        INSERT INTO insert_test
        SELECT i, rnd, md5(rnd::text)
        FROM (SELECT random() AS rnd) r
        ;
    END LOOP;
END;
$body$
LANGUAGE plpgsql;



o
timing on


echo 
echo 'check before INSERT'

SELECT insert_test_where_not_exists();

echo 



o /dev/null

timing off

TRUNCATE insert_test;

INSERT INTO insert_test
SELECT *
FROM insert_test_base_data
;

timing on

o

echo 'catch unique-violation'

SELECT insert_test_function_with_exception_block();

echo 
echo 'implementing a RULE'

o /dev/null
timing off

TRUNCATE insert_test;

INSERT INTO insert_test
SELECT *
FROM insert_test_base_data
;

CREATE OR REPLACE RULE db_table_ignore_duplicate_inserts AS
    ON INSERT TO insert_test
    WHERE EXISTS ( 
        SELECT 1
        FROM insert_test
        WHERE id = NEW.id
    ) 
    DO INSTEAD NOTHING;

o 
timing on

SELECT insert_test_rule();

Leave a Reply

Your email address will not be published. Required fields are marked *