I have a MySQL database that contains some tables with private information and some tables with public information.
I would like to replicate only the tables containing public information from one database to another, making sure that NO confidential information ever gets stored on the slave.
I know I can use replicate-do-table to specify that only some tables are replicated, but my understanding is that the entire binlog is still transferred to the slave.
Is there a way to ensure that only the public information is transferred to the slave?
I am very hesitant to add another copy of the database to an existing server – I just don’t think the existing server has the available capacity, in either RAM or CPU.
The only way to filter on one DB server is to run multiple instances of MySQL on one DB Server.
Port 3306 would be your normal database instance for your application
Port 3307 would be a slave of Port 3306
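One way to run the two instances side by side (a sketch; the paths, data directories, and server IDs are assumptions) is mysqld_multi with two option groups in my.cnf:

```ini
[mysqld1]                          # the normal application instance
port      = 3306
socket    = /var/run/mysqld/mysqld1.sock
datadir   = /var/lib/mysql1
server-id = 1
log-bin   = mysql-bin              # 3307 replicates from these binlogs

[mysqld2]                          # the filtering slave
port      = 3307
socket    = /var/run/mysqld/mysqld2.sock
datadir   = /var/lib/mysql2
server-id = 2
```

Then `mysqld_multi start 1,2` brings both up.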
There are a few things you need to do with the 3307 MySQL instance running on DB1:
- Enable log-bin and log-slave-updates in its my.cnf, along with a replicate-do-table entry for each public table, so that only public-table events ever reach its binlogs
As an option, convert all tables in the DB1 3307 instance to the BLACKHOLE storage engine.
That way, DB1 3307 only has binlogs with the public information. No actual data.
Set up a MySQL instance on your second server and make it a slave of the 3307 instance on DB1. Why is that good?
Because the binlogs in the DB1 3307 instance should only contain the public info. Thus, all slaves from DB1 3307 will only see public info.
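Concretely, the 3307 instance's option file might look like this (a sketch; the database and table names are placeholders), with the optional BLACKHOLE conversion run once on its copies of the public tables:

```ini
[mysqld]
port               = 3307
server-id          = 2
log-bin            = mysql-bin
log-slave-updates                 # re-log replicated events into 3307's own binlogs
replicate-do-table = mydb.public_table1   # only these tables make it into the binlog
replicate-do-table = mydb.public_table2
```

```sql
-- Optional, on the 3307 instance: keep binlogs but store no row data locally
ALTER TABLE mydb.public_table1 ENGINE = BLACKHOLE;
ALTER TABLE mydb.public_table2 ENGINE = BLACKHOLE;
```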
Please see my other posts on how to use BLACKHOLE tables in replication
Apr 18, 2013: Single slave – multiple master MySQL replication
Feb 03, 2012: One slave, multiple masters MySql
Jun 01, 2011: What can we do in MySQL 5.0 Replication to address bandwidth concerns?
May 16, 2011: Is Multi Master Single Slave possible in mySQL DB?
Mar 11, 2011: MySQL in star topology
As you stated, you cannot prevent events from getting written to the master's binlogs with replicate-do-table; however, that option will determine what gets written to the slave's own binlogs if you have log-slave-updates on.
Consider setting up an intermediate slave on the same trusted machine as your private master that runs for the sole purpose of filtering binlogs, then have your “public” database replicate from that.
If you're worried about the private data being written to the binlog even in that environment, keep in mind that the file system permissions restrict read access to just the mysql system user. And if you're worried about that account getting compromised so someone can read your binlogs, keep in mind that at that point you have larger problems: they could just grab the entire data directory.
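The “public” database then points at the filtering slave rather than the real master (a sketch; the host, port, credentials, and binlog coordinates are placeholders):

```sql
-- On the public slave: replicate from the filtering instance, not the private master
CHANGE MASTER TO
    MASTER_HOST     = 'trusted-host',
    MASTER_PORT     = 3307,
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = '...',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
START SLAVE;
```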
If your architecture permits, you can try a reverse approach, using replicate-ignore-table only for your private tables:

replicate-ignore-table=<database>.pvt_table_name
replicate-ignore-table=<database>.other_pvt_table_name
...
combined with behaviour like this:

SET sql_log_bin = 0;
INSERT INTO pvt_table_name (...) VALUES (...); -- or UPDATEs
SET sql_log_bin = 1;
A more managed way is to create your own stored procedure:

DELIMITER |
CREATE PROCEDURE write_my_private_data (`id` INT, `private_data` VARCHAR(255))
BEGIN
    SET SESSION SQL_LOG_BIN = 0;
    INSERT INTO `pvt_table_name` (`id`, `private_data`) VALUES (id, private_data);
    SET SESSION SQL_LOG_BIN = 1;
END |
DELIMITER ;
Then, in your software, replace your:

INSERT INTO `pvt_table_name` (`id`, `private_data`) VALUES (id, private_data);

with:

CALL write_my_private_data(1, 'privateData');
See sql_log_bin for how to turn off binary logging for the current session.
This holds only if you are sure there is a single point of insert/update (your software, with its dynamic toggling of sql_log_bin); it fails if third parties can also write to the table, for example with manual inserts run directly against it.
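One way to close that hole (a sketch; the account and object names are assumptions) is to deny direct writes on the private table and allow access only through the procedure, which by default executes with its definer's privileges:

```sql
-- The application account may not touch the table directly...
REVOKE INSERT, UPDATE, DELETE ON mydb.pvt_table_name FROM 'app_user'@'%';
-- ...but may call the procedure, which runs as its definer (SQL SECURITY DEFINER)
GRANT EXECUTE ON PROCEDURE mydb.write_my_private_data TO 'app_user'@'%';
```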
You may want to look at MySQL::Replication – Decentralised, peer-to-peer, multi-master MySQL replication.
[ While it is Perl, it is a standalone application. Not sure if it is still being maintained. ]
Simply put, it allows selective replication such as:
- replicate only some tables
- replicate only some columns or rows
- star-schema node replication
It does quite a bit out of the box; being a separate application, you can incorporate any type of filtering you like without needing to restart your database instances. And being Perl, you can easily hack at it.
For something specialized like this, a common approach is to copy the rows you want to move into a temp table/database (or equivalent buffer) and then have replication take it from there. The industry term for this is ETL (Extract, Transform, Load). The intermediate/temp/buffer gives you an opportunity to manipulate the data (the Transform step) before it ships out (the Load step). It can be a little tedious but guarantees a consistent result.
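A minimal SQL sketch of that approach (the staging database, table, and column names are all assumptions): extract only the public columns and rows into a staging table that lives in a replicated database, while the source database is excluded from replication:

```sql
-- Extract/Transform: copy only the public columns of public rows into staging
-- (staging_db is replicated; private_db is filtered out, e.g. via replicate-ignore-db)
INSERT INTO staging_db.users_public (id, display_name)
SELECT id, display_name        -- deliberately omit the confidential columns
FROM private_db.users
WHERE is_public = 1;
-- Load: replication then ships staging_db.users_public to the slave as usual
```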