To support a distributed environment, each of my clients has been assigned a unique node id. The client prepends its node id to a running number to form the primary key and stores it in its local database. E.g.
Client A node id = 200, first row in a table will have primary key 200,000
Client B node id = 100, first row in a table will have primary key 100,000
These records are then replicated to a centralized database. Since the primary keys at the centralized database do not arrive in sequence, will this cause any serious performance issue as the data size grows?
Possible order in which new data is inserted at the centralized database:
200,000 100,000 100,001 200,001 300,000 100,002
This could cause a big performance hit in SQL Server, where the table is clustered on the PK. However, will this happen in PostgreSQL 9.3?
- I can't use composite keys, as they do not play well with my presentation layer.
- The 3-digit running number is just a simplified example; the real running number will be much bigger and sufficient.
I have 3 solutions for you:
Direct reference of sequence and using concat
One possible solution is to reference the sequence directly in the INSERT statement and prepend your node id. A similar question, including an answer, can be found here:
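As a minimal sketch of this approach (the table and sequence names here are illustrative; node id 100 is assumed), the node id is concatenated with the sequence value right in the INSERT:

```sql
-- Illustrative table and per-node sequence
CREATE TABLE test (id varchar(10) PRIMARY KEY, name varchar);
CREATE SEQUENCE test_seq;

-- Prepend the node id to the next sequence value directly
INSERT INTO test (id, name)
VALUES ('100' || nextval('test_seq'), 'test text');
```

The `||` operator casts the bigint returned by nextval to text, so the first insert produces the id '1001'.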
Using a UUID
Another possible solution, if you don't need your node id in the primary-key field, is to use the uuid-ossp extension, which provides the type uuid and functions to generate UUIDs: http://www.postgresql.org/docs/9.4/static/uuid-ossp.html
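A sketch of the UUID variant (the table name is illustrative; the extension must be available on the server):

```sql
-- uuid-ossp provides uuid_generate_v4() for random UUIDs
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE test_uuid (
    id   uuid NOT NULL DEFAULT uuid_generate_v4(),
    name varchar,
    CONSTRAINT test_uuid_pk PRIMARY KEY (id)
);

-- No node id needed: collisions across nodes are practically impossible
INSERT INTO test_uuid (name) VALUES ('test text');
```

With a default in place, no trigger is required for this variant.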
Use a trigger for these solutions
For both solutions: you can use a trigger to set the primary key.
-- table
CREATE TABLE test(
  id character varying(10) NOT NULL,
  "name" character varying,
  CONSTRAINT idx_pk PRIMARY KEY (id)
);

-- sequence
CREATE SEQUENCE test_seq INCREMENT 1 MINVALUE 1
  MAXVALUE 9223372036854775807 START 1 CACHE 1;

-- create function
CREATE OR REPLACE FUNCTION insert_trigger() RETURNS TRIGGER AS $$
BEGIN
  NEW.id := '100' || nextval('test_seq')::TEXT;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- add trigger
CREATE TRIGGER insert_table_trigger
  BEFORE INSERT ON test
  FOR EACH ROW EXECUTE PROCEDURE insert_trigger();
If you now insert a value into the table
INSERT INTO test (name) VALUES ('test text');
you get the primary key '1001'.
Using MINVALUE and MAXVALUE
Another way is to use the MINVALUE and MAXVALUE of a SEQUENCE to define a numeric range per node:
CREATE SEQUENCE node100_seq INCREMENT 1
  MINVALUE 100000000000 MAXVALUE 100999999999
  START 100000000000 CACHE 1;

CREATE SEQUENCE node200_seq INCREMENT 1
  MINVALUE 200000000000 MAXVALUE 200999999999
  START 200000000000 CACHE 1;
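Such a range-restricted sequence can back a plain bigint primary key without any trigger or string concatenation. A sketch, assuming node100_seq from above exists on node 100 (the table name is illustrative):

```sql
-- On node 100: ids stay inside 100000000000..100999999999
CREATE TABLE test_node100 (
    id   bigint NOT NULL DEFAULT nextval('node100_seq'),
    name varchar,
    CONSTRAINT test_node100_pk PRIMARY KEY (id)
);

INSERT INTO test_node100 (name) VALUES ('test text');
```

A side benefit is that the key stays numeric, and exhausting the range raises an error instead of silently colliding with another node's range.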
This is an old thread, and I think the same question and problem exist even more now, since people run all kinds of distributed systems. Regarding the approach shown in the question, my 2c on this:
200,000 100,000 100,001 200,001 300,000 100,002
This is exactly what I was thinking a day earlier, because I have the exact same problem. But even if I don't see it as a DB performance issue, there is still a logical limit and the assumption that records in the first source will not exceed a million. (I had a larger number, like 20M, in mind.)
So while looking around I found this very old PG forum thread, and I like the suffix approach more than the prefix, so for node 1 the new ids look like:
101 201 301
and for node 2:
102 202 302
This gives you up to 100 possible nodes (you can always use a 001, 002 suffix for 1000 nodes).
- Very simple solution, fewer gaps in the PK index
- Can be adjusted to add any number of nodes
- This requires a transformation when bringing data to the central node
- I am not much in favour of using a function to generate such PKs
But in our case, since we do have other transformation rules, I think we can handle this.
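The suffix scheme above can be sketched with a simple column default instead of a trigger (sequence and table names are hypothetical; node id 1 and a two-digit suffix are assumed, giving up to 100 nodes):

```sql
CREATE SEQUENCE node_running_seq;

-- id = running_number * 100 + node_id,
-- so node 1 produces 101, 201, 301, ...
CREATE TABLE items (
    id      bigint NOT NULL DEFAULT nextval('node_running_seq') * 100 + 1,
    payload varchar,
    CONSTRAINT items_pk PRIMARY KEY (id)
);
```

Because the node id sits in the low-order digits, consecutive inserts on one node land close together in the PK index, which is what keeps the gaps small.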
You can create a sequence and make it global, like below:
create sequence seq;
Create a view on this sequence:
CREATE VIEW seq_view AS SELECT nextval('seq') as a;
Now, we will use a foreign data wrapper to access the sequence:
CREATE EXTENSION postgres_fdw;

CREATE SERVER global_seq FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '192.168.XX.XX', port '5432', dbname 'mydatabase');

CREATE USER MAPPING FOR PUBLIC SERVER global_seq
  OPTIONS (user 'User', password 'xxxx');

CREATE FOREIGN TABLE seqtable (a bigint) SERVER global_seq
  OPTIONS (table_name 'seq_view');
Now, you can create a function like the one below to access the sequence on each insert:
CREATE OR REPLACE FUNCTION public.func() RETURNS bigint
LANGUAGE plpgsql SECURITY DEFINER
AS $function$
DECLARE
  b bigint;
BEGIN
  b := (SELECT a FROM seqtable);
  RETURN b;
END;
$function$;
Now, attach the function to the table:
CREATE TABLE mytab (
    col1 int8 NOT NULL DEFAULT func(),
    col2 int2
);
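A quick usage sketch: each insert pulls the next value of the remote sequence through the foreign table, so ids are globally unique across all nodes that share the server:

```sql
-- col1 is filled by func(), i.e. the next value of the global sequence
INSERT INTO mytab (col2) VALUES (1);
INSERT INTO mytab (col2) VALUES (2);
```

Note that every insert now involves a round trip to the central server, so this trades insert latency for global ordering.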
You can follow the same steps for other tables and have them share the common sequence.
For more detailed information, refer to this post: https://medium.com/@shishulkargini/postgres-make-your-sequence-global-a80924a81e29