Why does InnoDB corrupt tables upon recovery?

Posted on

Question :

When setting

innodb_fast_shutdown=0

MySQL drops all incomplete transactions before shutting down/crash. The default value is 1, which is understandable to preserve incomplete transactions.

However, in the recovery mode (e.g., innodb_force_recovery = 6), MySQL treats all incomplete transactions as committed. This obviously damages the table as the transactions are incomplete.

What is the purpose of preserving incomplete transactions when it damages the whole table?

If it is necessary in some cases, why doesn’t innodb_force_recovery have an option to simply drop all incomplete transactions (when they cannot be committed) like innodb_fast_shutdown=0 to save the table? Or am I missing something?

Answer :

innodb_fast_shutdown controls the change buffer, not the redo log. No matter what value of innodb_fast_shutdown is, if a transaction wasn’t committed at the time of shutdown it won’t survive the restart.

If a transaction was committed then whether it’s going to persist depends on innodb_flush_log_at_trx_commit. If it’s the default 1 the transaction will persist. If != 1, may not persist.

Now, innodb_force_recovery. Here’s what possible. When you run an UPDATE in a transaction, the new record is written to a page (in the buffer pool, the page becomes dirty). Later on, the data page can be flushed to disk even if the transaction isn’t committed yet. When you have to use innodb_force_recovery that means you are ready to sacrifice data consistency (is it what you call a damaged table?) just to get the some data back. With innodb_force_recovery=6 InnoDB tries to salvage as much data as possible. It cannot complete the crash recovery process, but the new record is already written to the page – why not make it available to a user?

innodb_force_recovery values 1 through 5 should be tried before resorting to “6”. They do less “damage”.

The Change Buffer is sort of a “write cache” for updating index blocks. It provide speed during INSERTs/UPDATEs/DELETEs by delaying both the read and write of blocks for secondary indexes. More specifically, neither the read nor write need be done for non-unique indexes during such queries. But they must be done eventually. And the ‘changes’ can, in some cases, be significantly “batched” for efficiency.

Flushing as you shutdown means doing all the read-modify-writes to update perhaps thousands of index blocks. But the startup will be faster.

Waiting until after restart means figuring out what got lost and reconstructing it.

A compromise that some people use is to gradually decrease innodb_change_buffer_max_size (default=25, meaning percent of buffer_pool) in order to flush the change buffer gradually. This lets both the shutdown and startup be relatively “fast”.

The dropping of incomplete transactions is independent of innodb_fast_shutdown. It only affects if they are rolled back before shutdown, or on recovery at startup.

innodb_force_recovery of >= 4 has a comment ‘This value can permanently corrupt data files’ in the manual. Once your up to this level just be grateful if you get anything back :-).

Leave a Reply

Your email address will not be published. Required fields are marked *