Question :
I have multiple slaves reading from one master.. But few of them are keep disconnecting due to network issue.
Is there any way that i can make sure before setting up the bin log purging script on master that all the slave have read upto certain bin logs which we can purge?
Answer :
The easiest method is the following
- Go to each Slave and run
SHOW SLAVE STATUSG
- Look for
Relay_Master_Log_File
on Each Slave - Whichever Slave has the oldest
Relay_Master_Log_File
is the one you purge to on the Master
Why Relay_Master_Log_File
? First Look at SHOW SLAVE STATUSG
mysql> show slave statusG
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.48.20.253
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000255
Read_Master_Log_Pos: 858190247
Relay_Log_File: relay-bin.066069
Relay_Log_Pos: 873918
Relay_Master_Log_File: mysql-bin.000254
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 858190247
Relay_Log_Space: 873772
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
You see Master_Log_File
and Relay_Master_Log_File
In this display they are different. So why choose Relay_Master_Log_File
?
Master_Log_File
represents the binlog with the last binlog event downloaded to the SlaveRelay_Master_Log_File
represents the binlog with the last binlog event executed on the Slave
In the event of any replication lag, Relay_Master_Log_File
is always the oldest. If they are the same, fine. You choose Relay_Master_Log_File
always.
In this case, you purge to mysql-bin.000254
on the Master. In the event the SQL thread dies, you don’t want to erase binlogs from the Master the Slave has not processed yet.
Getting back to multiple slaves, you choose the oldest Relay_Master_Log_File
of all Slaves.
To clarify, the oldest Relay_Master_Log_File
is the binary log name with the lowest number.
You should rather monitor your slaves for replication lag or if it’s actually catching-up / connected with master or not!
You may simply write a shell script to look at io_thread and sql_thread values and alert if they’re NO. There are already many scripts available for reference.
You can use Percona Monitoring Tools‘s pmp-check-mysql-replication-delay / pmp-check-mysql-replication-running. You may google for setup instructions to setup with Nagios or setup with Zabbix or this.
Btw, 7 days worth binary logs (with backups) are sufficient but again depends on policies!
If you are to avoid disconnects due to transient network issues, depending on which MySQL you’re on you may want to set combination of master-retry-count / slave-net-timeout / master-connect-retry. (but it’s better to fix network right?)
The first thing I would do is fix the network issue.
I would also set up pt-heartbeat to monitor Mysql slave delay.
Why do you need to purge the binlog on Master? If you miss the position of a binlog to purge you could affect all your slaves.
Mysql allows to set up binlog retentions.
If you are running out of space, because your binlog are too large, I would check if you can use SQL Statement binlog format or mixed. RAW does take more space.
If you have to use RAW, i would get more space. Master is the driver of all your transactions, you want to make sure it is healthy and have room to grow.
No way I think. But
1) You could just stop purging if any of the slaves are unreachable(ssh or mysql) from the server where you are running this purging script
2) If the connectivity issue is only with MySQL and not SSH, you could get the positions from master.info files of the slaves
Actually I follow this method
If you cannot afford more space, another option is to use : START SLAVE …. UNTIL
You would execute this command on ALL Slaves which will have them stop at exact same position. You would then check all slave to make sure they all stopped at same position. Then on Master you can run :
PURGE BINARY LOGS TO ‘mysql-bin.xxx’;
PURGE BINARY LOGS BEFORE ‘2015-08-27 00:00:00’; (easier to use, don’t need to find out position)