Guid vs. integer ID for unique fields

Posted on

Question :

This is really a generic question.

Suppose I have a replication pattern of a single write DB and multiple read DB replication slaves – I do not have to have GUID’s as my unique IDs fields. Is that a correct assumption?

If I have multiple DB instances where all are writeable and sync between themselves, GUID’s in unique ID fields is mandatory. Is that also correct?

I am setting up PostgreSQL on Amazon RDS.

Thanks.

Answer :

Suppose I have a replication pattern of a single write DB and multiple
read DB replication slaves – I do not have to have GUID’s as my unique
IDs fields. Is that a correct assumption?

Correct. As Basil Bourque wrote it doesn’t matter what you kind of IDs you use if you have a single master source of data.

If I have multiple DB instances where all are writeable and sync
between themselves, GUID’s in unique ID fields is mandatory. Is that
also correct?

No, it is not mandatory to use GUID in this case, but it may be convenient.

I assume a quite common scenario: You have N writable databases DB1 – DBn. They can “talk” to each other to exchange data, but connection between them is not 24/7 or not fast enough to be done “in real-time” as the data changes. So, each database accumulates its changes locally and then periodically the changes are transferred to other databases. Since, once in a while all databases have exactly same data, there must be a method to assign some identifier to each row that would be unique not just within one database, but within all databases.

There are many ways to achieve it, GUID is just one.
Another common method is to have ID that consists of two parts: ID of the database plus standard sequence of growing unique numbers.

Another method that I used for two-way sync of two databases: ID effectively consists of two numbers. First number is a standard unique sequence of the local database, second number is a sequence from remote database.

By definition, “replication” means an exact copy. So the data type of your ID (primary key) columns is irrelevant. Whatever data is written into the master database gets written into the replicants be it integers, UUIDs, or text. Think “Xerox copies of each invoice as written”, not “writing new fresh separate invoices”.

If you are exchanging data between systems, or federating multiple database systems, then your primary key becomes a critical issue.

Sequential numbers are tricky to coordinate across multiple sources of data. One option for coordinating is to assign a range of numbers to be used by each source. The Montréal office uses numbers under one billion, and the Paris office uses numbers starting at one billion. Such schemes have proved clumsy or failures in my experience.

UUIDs were invented for such purposes because they require no coordination between sources of data. The original Version 1 UUIDs eliminate any practical chance of collisions. Even the entirely-random UUIDs will have no collisions for any limited number of records if generated with a quality random generator.

Leave a Reply

Your email address will not be published. Required fields are marked *