Question:
We are using a Percona Galera cluster with 3 nodes.
One node crashed because it ran out of disk space. We cleared some log files and freed up enough space, but now MySQL will not start on that node. The error log shows the messages below. I also tried innodb_force_recovery=6, but no luck. Can anybody help me fix this issue?
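For reference, this is roughly how the recovery option was set in my.cnf (a sketch; the config file path and the rest of the [mysqld] section depend on your install):

[mysqld]
# force InnoDB to start despite corrupted data/logs; 6 is the most aggressive (and most dangerous) level
innodb_force_recovery = 6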
Error log:
130919 12:55:15 [Note] WSREP: Closed send monitor.
130919 12:55:15 [Note] WSREP: view((empty))
130919 12:55:15 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
130919 12:55:15 [Note] WSREP: gcomm: closed
130919 12:55:15 [Warning] WSREP: gcomm: backend already closed
130919 12:55:15 [ERROR] WSREP: gcs/src/gcs_fifo_lite.c:gcs_fifo_lite_close():70: Trying to close a closed FIFO
130919 12:55:15 [Note] WSREP: Flow-control interval: [16, 16]
130919 12:55:15 [Note] WSREP: Received NON-PRIMARY.
130919 12:55:15 [Note] WSREP: Shifting JOINED -> OPEN (TO: 66674748)
130919 12:55:15 [Note] WSREP: Received self-leave message.
130919 12:55:15 [Note] WSREP: Flow-control interval: [0, 0]
130919 12:55:15 [Note] WSREP: Received SELF-LEAVE. Closing connection.
130919 12:55:15 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 66674748)
130919 12:55:15 [Note] WSREP: RECV thread exiting 0: Success
130919 12:55:15 [Note] WSREP: recv_thread() joined.
130919 12:55:15 [Note] WSREP: Closing replication queue.
130919 12:55:15 [Note] WSREP: Closing slave action queue.
130919 12:55:15 [ERROR] WSREP: gcs/src/gcs.c:gcs_close():1321: Failed to join recv_thread(): -3 (Unknown error 18446744073709551613)
130919 12:55:15 [Note] WSREP: Closing replication queue.
130919 12:55:15 [Note] WSREP: /usr/sbin/mysqld: Terminated.
130919 12:55:15 [ERROR] WSREP: gcs/src/gcs_fifo_lite.c:gcs_fifo_lite_close():70: Trying to close a closed FIFO
130919 12:55:15 [Note] WSREP: Closing slave action queue.
130919 12:55:16 mysqld_safe Number of processes running now: 0
130919 12:55:16 mysqld_safe WSREP: not restarting wsrep node automatically
130919 12:55:16 mysqld_safe mysqld from pid file /u01/mysql/folmobileqa2/log/mysqld.pid ended
Answer:
OK, solved my issue.
My hard drive had failed, so the MySQL/Galera logs got hosed, end of story.
The node needed a full state transfer (SST), so I had to wipe everything in the data directory except the mysql system database (the one with all those MyISAM permission/grant tables).
Stop the broken node completely:
root@my-galera-clustercontrol:~# s9s_galera --stop-node -i 1 -h10.0.1.111
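(If you are not running ClusterControl, stopping mysqld directly on the broken node should work just as well; a sketch, assuming a standard init script:)

root@my-galera-percona1:~# service mysql stop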
Rsync the mysql database folder out of the MySQL data directory:
root@my-galera-percona1:/mnt/data/mysql# rsync -av mysql ../mysqlback/
Delete everything from the MySQL data directory, including the bad logs:
root@my-galera-percona1:/mnt/data/mysql# rm -rf *
Restore the mysql database into the data directory:
root@my-galera-percona1:/mnt/data/mysql# rsync -av ../mysqlback/mysql .
Start the node back up in ClusterControl, which triggers a full state transfer (SST) from the donor:
root@my-galera-clustercontrol:~# s9s_galera --start-node -i 1 -h 10.0.1.111 -d 10.0.1.112
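(Without ClusterControl, simply starting MySQL on the wiped node also forces a full SST, since grastate.dat is gone. If you want to pin a specific donor, like the -d flag above, you can set it in my.cnf first; a sketch, where 'percona2' is a placeholder for the donor's wsrep_node_name:)

[mysqld]
# request the state snapshot from this donor instead of letting the cluster pick one
wsrep_sst_donor = percona2

root@my-galera-percona1:~# service mysql start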
From there you can watch the crazy I/O with iftop until the node eventually comes back online as Synced.
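You can also check the progress from MySQL itself with the standard wsrep status variables (a sketch; depending on the SST method the node may refuse connections until the transfer finishes, so this mostly confirms it ends up Synced and the cluster is back to 3 nodes):

root@my-galera-percona1:~# mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
root@my-galera-percona1:~# mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"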