MongoDB secondary node not recovering

Question :

A secondary node of my MongoDB cluster has entered the RECOVERING state and is not coming out of it. The log output is below. I know one way to fix this is to reinitialize the secondary by deleting its data directory and restarting it, but I don’t want to try that option because I have 2 TB of data and the primary is receiving writes continuously.

2017-06-13T12:02:14.946+0000 I REPL [replication-12569] We are too stale to use mongodb.prod.mcse-reporting-olap.services.dal1.prod.walmart.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: 59351d47:3357 is before their earliest timestamp: 593f8b97:5b11 for 1min until: 2017-06-13T12:03:14.946+0000
2017-06-13T12:02:14.946+0000 I REPL [replication-12569] could not find member to sync from
2017-06-13T12:02:14.948+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] Our newest OpTime : { ts: Timestamp 1496653127000|13143, t: 499 }
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] Earliest OpTime available is { ts: Timestamp 1497336727000|23313, t: 502 }
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] going into maintenance mode with 11386 other maintenance mode tasks in progress
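To see how far behind the secondary actually is, the two hexadecimal timestamps in the first log line can be decoded by hand. The sketch below assumes the format is `seconds:increment` with both parts in hex, which agrees with the decimal OpTime values printed further down in the same log. The helper names are made up for illustration.

```python
from datetime import datetime, timezone


def parse_hex_optime(text):
    """Split 'seconds:increment' (both hexadecimal) into two integers."""
    seconds_hex, increment_hex = text.split(":")
    return int(seconds_hex, 16), int(increment_hex, 16)


def as_utc(seconds):
    """Convert Unix seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values copied from the "We are too stale" log line.
last_fetched = parse_hex_optime("59351d47:3357")
earliest_on_source = parse_hex_optime("593f8b97:5b11")

# Seconds between the secondary's last applied entry and the oldest
# entry the sync source still has in its oplog.
gap_seconds = earliest_on_source[0] - last_fetched[0]

# The member is stale when its position is older than anything the source holds.
is_stale = last_fetched < earliest_on_source

print("secondary stopped at:  ", as_utc(last_fetched[0]).isoformat())
print("source oplog starts at:", as_utc(earliest_on_source[0]).isoformat())
print("missing window: %.1f days" % (gap_seconds / 86400))
```

For the values in this log the gap is 683600 seconds, roughly 7.9 days of operations that are no longer available from the sync source.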

Answer :

The link in the log message explains exactly what happened:

A replica set member becomes “stale” when its replication process
falls so far behind that the primary overwrites oplog entries the
member has not yet replicated. The member cannot catch up and becomes
“stale.” When this occurs, you must completely resynchronize the
member by removing its data and performing an initial sync.

To avoid this in the future:

  • Investigate why the secondary fell so far behind. Possibly there were
    more writes than normally expected.

  • Your oplog size might not be set up correctly. Once your secondary is
    behind the first entry in the primary's oplog, it will never catch up,
    as it has no way to get the operations it is missing.
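As a rough way to reason about the oplog-size point, the sketch below estimates how many hours of writes an oplog can hold and how large it would need to be for a target window. The function names and the sample figures (oplog size, write rate, target window) are hypothetical and only illustrate the arithmetic; real numbers would have to come from your own deployment, for example from `rs.printReplicationInfo()` in the mongo shell.

```python
def oplog_window_hours(oplog_size_gb, oplog_gb_per_hour):
    """Hours of history the oplog can hold at a steady write rate."""
    return oplog_size_gb / oplog_gb_per_hour


def required_oplog_gb(target_window_hours, oplog_gb_per_hour):
    """Oplog size needed to keep the given number of hours of history."""
    return target_window_hours * oplog_gb_per_hour


# Hypothetical figures, for illustration only.
size_gb = 50.0           # assumed current oplog size
rate_gb_per_hour = 10.0  # assumed oplog growth under normal write load

window = oplog_window_hours(size_gb, rate_gb_per_hour)
needed = required_oplog_gb(72, rate_gb_per_hour)

print("current window: %.1f hours" % window)
print("size for a 72-hour window: %.0f GB" % needed)
```

A secondary that is down or lagging for longer than the current window cannot catch up from the oplog, so the window should comfortably exceed the longest outage or maintenance period you expect.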

Alternatively, try stopping mongod on both the source and destination servers, then copy all the files in the data directory using the fastest copy tool you can find. Restart the servers, and the sync will pick up from the end of the copied data.
