Non-sequential Primary Key performance in PostgreSQL 9.3

Question :

Due to a requirement to support a distributed environment, each of my clients has been assigned a unique node id. The client prepends its node id to a local running number to form the primary key and stores the row in its local database. E.g.

Client A node id = 200, first row in a table will have primary key
200,000

Client B node id = 100, first row in a table will have primary key
100,000

These records are then replicated to a centralized database. Since the primary keys at the centralized database are not in sequence, will this cause any serious performance issue as the data size gets bigger?

Possible sequence of inserting new data at centralized database:

200,000
100,000
100,001
200,001
300,000
100,002

This could cause a serious performance problem in SQL Server, where the table is clustered on the PK. However, will this happen in PostgreSQL 9.3?

Notes:

  1. I can’t use composite keys, as they do not play well with my presentation layers.
  2. The 3-digit running number is just a simplified example; the real
    running number will be much bigger and sufficient.

Answer :

I have 3 solutions for you:

Direct reference of sequence and using concat

One possible solution is to reference the sequence in the insert statement directly and prepend your node id. A similar question, including an answer, can be found here:
https://stackoverflow.com/a/17925601/4206293
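
A minimal sketch of that approach, assuming a text primary-key column and a sequence named test_seq (both names are just for illustration):

```sql
-- Prepend the node id ('200' here) to the next sequence value at insert time
INSERT INTO test (id, name)
VALUES ('200' || nextval('test_seq')::text, 'test text');
```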

Using a UUID

Another possible solution, if you don’t need your node id in the primary-key field, is to use the uuid-ossp extension, which provides the type uuid and functions to generate UUIDs: http://www.postgresql.org/docs/9.4/static/uuid-ossp.html
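
A minimal sketch, assuming the extension can be installed on the database (table name is hypothetical):

```sql
-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- The database generates a globally unique key on every insert
CREATE TABLE test_uuid (
   id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
   name character varying
);

INSERT INTO test_uuid (name) VALUES ('test text');
```

Since UUIDs are generated independently on each node, no coordination between clients is needed.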

Use a trigger for these solutions

For both solutions: you can use a trigger to set the primary key.

Example:

-- table
CREATE TABLE test(
   id character varying(10) NOT NULL,
  "name" character varying,
   CONSTRAINT idx_pk PRIMARY KEY (id)
);

-- sequence
CREATE SEQUENCE test_seq
  INCREMENT 1
  MINVALUE 1
  MAXVALUE 9223372036854775807
  START 1
  CACHE 1;

-- Create Function
CREATE OR REPLACE FUNCTION insert_trigger() 
RETURNS TRIGGER    
AS $$
BEGIN
   NEW.id := '100' || nextval('test_seq')::TEXT;
   RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- add Trigger
CREATE TRIGGER insert_table_trigger
BEFORE INSERT ON test
FOR EACH ROW EXECUTE PROCEDURE insert_trigger();

If you now insert a value in the table

INSERT INTO test (name) VALUES ('test text');

you get the primary key '1001'.

Using MINVALUE and MAXVALUE

Another way is to use the MINVALUE and MAXVALUE of a SEQUENCE to define a numeric space per node:

CREATE SEQUENCE node100_seq
  INCREMENT 1
  MINVALUE 100000000000
  MAXVALUE 100999999999
  START 100000000000
  CACHE 1;

CREATE SEQUENCE node200_seq
  INCREMENT 1
  MINVALUE 200000000000
  MAXVALUE 200999999999
  START 200000000000
  CACHE 1;
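
Each node then draws its default key from its own sequence, so the ranges can never collide. A sketch for node 100 (table name is hypothetical):

```sql
-- On node 100: all ids fall in the range 100000000000..100999999999
CREATE TABLE test_node100 (
   id   bigint PRIMARY KEY DEFAULT nextval('node100_seq'),
   name character varying
);
```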

Resurrection!!
This is an old thread, but I think the same question and problem exist even more now, since people have all kinds of distributed systems. Regarding the approach shown in the question, my 2c on this:

200,000
100,000
100,001
200,001
300,000
100,002

This is exactly what I was thinking a day earlier, because I have the exact same problem. But even if I don’t see it as a DB performance issue, there is still a logical limit and an assumption that records in the first source will not exceed a million. (I had a larger number, like 20M, in mind.)

So while looking around I found this very old PG forum thread, and I like the suffix approach more than the prefix. The new ids then look like this for node 1:

101
201
301 

and for node 2

102
202
302 

This gives you a possible 100 nodes (you can always use a 001, 002 suffix for 1000 nodes).

  • Pros:
    • Very simple solution, fewer gaps in the PK index
    • Can be adjusted to add any number of nodes
  • Cons:
    • This requires transformation when bringing data to central nodes
    • I am not much in favour of using some function to generate such PKs

But in our case, since we do have other transformation rules, I think we can handle this.
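
The suffix idea above can be sketched in plain SQL; assuming a two-digit node id and a local sequence (all names are hypothetical):

```sql
CREATE SEQUENCE local_seq;

-- id = running number * 100 + node id,
-- so node 1 produces 101, 201, 301, ...
CREATE TABLE test_suffix (
   id   bigint PRIMARY KEY DEFAULT nextval('local_seq') * 100 + 1,
   name character varying
);
```

Because the node id occupies the low-order digits, the running number itself is unbounded (up to the bigint limit), which removes the "one million rows per source" assumption of the prefix scheme.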

You can create a sequence and make it global, like below:

  create sequence seq;

Create a view on this sequence:

  CREATE VIEW seq_view AS SELECT nextval('seq') AS a;

Now we will use a foreign data wrapper to access the sequence:

  CREATE EXTENSION postgres_fdw;
  CREATE SERVER global_seq FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '192.168.XX.XX', port '5432', dbname 'mydatabase');
  CREATE USER MAPPING FOR PUBLIC SERVER global_seq
    OPTIONS (user 'User', password 'xxxx');
  CREATE FOREIGN TABLE seqtable (a bigint) SERVER global_seq
    OPTIONS (table_name 'seq_view');

Now, you can create a function like below to access the sequence for each insert:

   CREATE OR REPLACE FUNCTION public.func()
   RETURNS bigint
   LANGUAGE plpgsql
   SECURITY DEFINER
   AS $function$
   DECLARE
     b bigint;
   BEGIN
     -- selecting from the foreign table calls nextval() on the remote server
     b := (SELECT a FROM seqtable);
     RETURN b;
   END;
   $function$;

Now attach the function to the table as a column default:

   CREATE TABLE mytab (
     col1 int8 NOT NULL DEFAULT func(),
     col2 int2
   );

You can follow the same steps for other tables and share the common sequence among them.

For more detailed information, refer to this post: https://medium.com/@shishulkargini/postgres-make-your-sequence-global-a80924a81e29
