Question :
Recently I have run some performance tests on the application I work on, and it turns out that it didn’t do really well ( the problem is mainly between the back and the DB). So as investigating the problemsolution, we have found out that using read/write datasources ( read/write master 1 or multiple reads slaves) could be a good way to go. As I found in those sources: http://fedulov.website/2015/10/14/dynamic-datasource-routing-with-spring/
To sum up the solution consists of defining the datasources, and before each transaction ( @transaction ) define which datasource should we use. But with already having a huge number of defined services and transactions ( my case) it seems too much time consuming to choose at every step which datasource to use.
Is there an automated way to split (select vs /post/update ) opreations ? or a project that serves as a proxy to route the queries. ( I have seen this question that was asked 9 years ago but I think certainly there are new solutions How to setup Hibernate to read/write to different datasources?).
Also I have read about latency problems between writing and reading, ( are we talking about ms, s latency ?) does the number of read instances influence the latency? what to do to prevent such behavior before staring to code. ( an architecture to adopt maybe ? a design pattern? )
Ps: I am using spring, spring data jpa, hibernate, postgresql, hikari connection pool. Thank your for time.
Answer :
There is no automatic way to tell a data modifying query from a reading one.
Imagine the following SQL statement performing a function call:
SELECT delete_all();
There is no way of telling what the function does without analyzing its source code.
So you have to make the decision in your code that (hopefully) knows which queries modify data and which don’t.
Using streaming replication in PostgreSQL to off-load read workload to standby servers works, but there is one caveat:
By default, replication is asynchronous, so there is a delay between the transaction commit on the primary server and the time the changes become visible on the standby. Your application has to be able to deal with this (or perform the queries that cannot on the replication primary).
The only way around that would be to use synchronous replication with synchronous_commit = remote_apply
, but that will drastically increase the time a transaction commit takes, because it has to wait for the change to be replayed on a standby before it can report success.
As Laurenz mentioned, on DB level you can’t really use load balancing (third party tools are available).
One way can be to setup middle-ware to differentiate between select queries and the rest of the queries. So if for example all select queries can be send to one dns entry(for standby replicas) with three ip addresses and get dns to round robin for load balancing.
Not a straight forward solution but not very difficult to implement.
Latency really depends on three main factors. Load, Network Connection and Standby Hardware. If they are part of same data centre and identical machines than usually its just a few ms.