Is it possible today to replicate 30 TB from multiple databases to PostgreSQL close to real-time? [closed]

Posted on

Question :

If you had 20 databases spread across 20 servers, could you replicate them all into a single PostgreSQL instance? And can this be done on a server costing less than $100,000?

The databases total 30 TB on disk. There are ~30,000 tables. The maximum number of rows in a single table is 1.2 billion. The number of rows changed per second is modest (100s or less).

Some of the servers run PostgreSQL, some MySQL. The versions are different, but they’re relatively current (PostgreSQL >= 9.1, MySQL >= 5.6).

Answer :

How fast is the network connection? Do some math — 30TB through a single pipe will take how many days? That’s just to get the data started. Then redo the calculation based on how fast the data grows.

There is some amount of overhead; I guestimate that the initial 30TB would take a few weeks. You have not provided numbers for how much traffic per hour in steady state. You can estimate that by watching the size of the binlogs on a representative server, then multiply by 20.

Once you have the 30TB on a single server, have you figured out what you can do with it? Again, some math would compute how many days or weeks it would take to do something with all that data.

Oh, I forgot about the overhead that you might incur to get the index(es) built. Are you using SSDs?

Leave a Reply

Your email address will not be published. Required fields are marked *