Best solution for PostgreSQL partial table replication

Question:

We have the following architecture of our system (PostgreSQL 9.6 is used on all servers):
1. there is a bunch of small servers (“sensors”) storing data in some table
2. there is a big server (“management console”) that collects data updates from all those servers and stores them in a table with the same name and structure
3. sensors can create new rows and update existing ones; these changes must be replicated to the console
4. the console can update or delete rows that are relevant to particular sensors; these changes must be replicated to those sensors
5. sensors must store only the rows that belong to them (there is a field “source” in the table storing the sensor’s id)

My questions are:
1. This is a case of multimaster replication, right?
2. What replication technology would be most suitable for this use case, given that we don’t need automatic failover and sensors must store only part of the data, not the whole table that the console has?

Answer:

The setup that springs to mind here is a combination of logical replication from the sensor databases to the central database and a writable foreign data wrapper from the central database to the sensor databases. You would want to upgrade from 9.6 to at least 10 (but why not 11?) to do that.
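For the sensor-to-console direction, the built-in logical replication in PostgreSQL 10+ would look roughly like this (a minimal sketch; the table name “readings” and the connection details are assumptions, not from the question):

    -- On each sensor (publisher): publish the table the sensor writes to
    CREATE PUBLICATION sensor_pub FOR TABLE readings;

    -- On the console (subscriber): one subscription per sensor
    CREATE SUBSCRIPTION sensor_42_sub
        CONNECTION 'host=sensor42 dbname=telemetry user=repl'
        PUBLICATION sensor_pub;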

  4. the console can update or delete rows that are relevant to particular sensors; these changes must be replicated to those sensors

You would use the foreign data wrapper to implement this part. The central database would use the FDW to update or delete rows on the sensor database, then wait for those changes to replicate back to the central database before re-querying them. You might be able to use triggers or rules to make this more or less transparent, so that you can write it as an update or delete on the central table and it routes automatically. But I think that is more likely to cause confusion than just writing the routing logic into the app, so that the part of the app doing the updates or deletes is explicitly aware it is targeting the remote servers. Especially since, if you did use triggers, it is not clear to me how you would implement the “wait for those changes to replicate back” part. How important it is to wait for any given update to propagate is probably a business-logic question that the database cannot answer for you.
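A rough sketch of that FDW path using postgres_fdw on the console (the server, table, and column names here are illustrative assumptions):

    -- On the console: one foreign server and foreign table per sensor
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER sensor_42_srv
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'sensor42', dbname 'telemetry');

    CREATE USER MAPPING FOR console_app
        SERVER sensor_42_srv
        OPTIONS (user 'console_app', password 'secret');

    CREATE FOREIGN TABLE readings_sensor_42 (
        id      bigint,
        source  integer,
        payload jsonb
    ) SERVER sensor_42_srv OPTIONS (table_name 'readings');

    -- The console deletes directly on the sensor; the change then
    -- replicates back to the console through the sensor's publication
    DELETE FROM readings_sensor_42 WHERE id = 12345;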

Our company uses a tool called SymmetricDS to replicate our tables, both from server to client and from client to server. Take a look at their site.

OK, folks, thanks for your answers, but I’ve finally found a solution and tested it.

So, you will need to use pglogical (I didn’t test other replication tools; they might work as well) and create subscriptions on both sides – one subscription on each sensor and one subscription per sensor on the console. This solves the problem of bidirectional data updates.
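For reference, the pglogical calls for one sensor would look something like this (node names, DSNs, and the table name “readings” are assumptions; both sides need CREATE EXTENSION pglogical first, and the real setup repeats the subscription pair per sensor):

    -- On sensor 42: create its node, add the table, subscribe to the console
    SELECT pglogical.create_node(
        node_name := 'sensor_42',
        dsn       := 'host=sensor42 dbname=telemetry'
    );
    SELECT pglogical.replication_set_add_table('default', 'readings', true);
    SELECT pglogical.create_subscription(
        subscription_name := 'from_console',
        provider_dsn      := 'host=console dbname=telemetry'
    );

    -- On the console: its own node, the same table, one subscription per sensor
    SELECT pglogical.create_node(
        node_name := 'console',
        dsn       := 'host=console dbname=telemetry'
    );
    SELECT pglogical.replication_set_add_table('default', 'readings', true);
    SELECT pglogical.create_subscription(
        subscription_name := 'from_sensor_42',
        provider_dsn      := 'host=sensor42 dbname=telemetry'
    );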

To prevent sensors from receiving data that belongs to other sensors, you will need to create triggers on them that block such inserts and updates. Make sure the trigger is enabled with ALTER TABLE … ENABLE REPLICA TRIGGER; otherwise it will not fire while replicated changes are applied, and the console will still be able to push data belonging to other sensors onto a sensor.
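A minimal sketch of such a guard trigger for a sensor with id 42 (the table name “readings” is an assumption; the “source” column comes from the question):

    -- Drop incoming rows that belong to another sensor
    CREATE OR REPLACE FUNCTION reject_foreign_rows() RETURNS trigger AS $$
    BEGIN
        IF NEW.source <> 42 THEN
            RETURN NULL;  -- skip the insert/update entirely
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER readings_guard
        BEFORE INSERT OR UPDATE ON readings
        FOR EACH ROW EXECUTE PROCEDURE reject_foreign_rows();

    -- Ordinary triggers do not fire for rows applied by a replication worker;
    -- ENABLE REPLICA makes this trigger fire only during replication apply
    ALTER TABLE readings ENABLE REPLICA TRIGGER readings_guard;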
