SQL database stuck in recovery

Posted on

Question :

I have SQL server 2017. I have one 3 TB size database there. Somehow due to long running transaction the database got stuck ‘IN Recovery’ mode after SQL server Restarted. When I checked the sql error logs it says 2 189 255 seconds remaining (Phase 2 of 3) completed which is almost 25 days. My goal is to bring the database online even if I lose some data.

So I’ve ran below commands but no luck.

  USE [master]
  GO

  RESTORE DATABASE test WITH RECOVERY
  --Msg 3101, Level 16, State 1, Line 6
  --Exclusive access could not be obtained because the database is in use.
  --Msg 3013, Level 16, State 1, Line 6
   --RESTORE DATABASE is terminating abnormally.


  ALTER DATABASE test SET EMERGENCY;
  GO
  --Msg 5011, Level 14, State 7, Line 13
  --User does not have permission to alter database 'DragonDriveConnect', 
  the database does not exist, or the database is not in a state that 
  allows access checks.
  --Msg 5069, Level 16, State 1, Line 13
  --ALTER DATABASE statement failed.

  DBCC CHECKDB (DragonDriveConnect, REPAIR_ALLOW_DATA_LOSS) WITH 
  ALL_ERRORMSGS;
  GO
  --Msg 922, Level 14, State 1, Line 22
  --Database 'DragonDriveConnect' is being recovered. Waiting until 
   recovery is finished.

At last I’ve tried to Delete the database too but that is also not working and giving me error saying Cant Delete.

How do I get out of this situation?

Answer :

The error you’re seeing in the SQL Server Error Log is this one:

Recovery of database ‘CrashTestDummy’ (9) is 0% complete (approximately 42 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required

More generically, it will say:

Recovery of database ‘{Database Name}‘ ({Database ID}) is {N}% complete (approximately {N} seconds remain). Phase {N} of 3. This is an informational message only. No user action is required

Because your database was not shut down cleanly on restart, the database must go through “crash recovery”. This is required to ensure the database remains consistent. When the database is not shut down cleanly, SQL Server must make sure that transactions written to the transaction log have been reconciled properly against the data files.

All transactions are guaranteed to be written to the transaction log. However, updating data is initially only done in memory. Updates to the physical data files are done asynchronously via a checkpoint. The asynchronous nature of the data file updates are why a crash or unclean shutdown requires extra work on startup.

As indicated by the error message, there are three phases to the recovery. Each of these is essentially a pass through the transaction log:

  1. Analysis
  2. Redo / Roll Forward
  3. Undo / Rollback

Analysis

This phase is simply to review the transaction log & determine what needs to be done. It will identify when the most recent checkpoint was, and what transactions might need to be rolled forward or back to ensure consistency.

Redo / Roll Forward

Completed transactions from the transaction log need to be reviewed to ensure the data file has been completed updated. Without this, changes that were only in-memory might have been lost.

This phase will take those transactions that were committed after the most recent checkpoint and redo them, to ensure they are persisted to the data file.

If you are using SQL Server Enterprise edition, Fast Recovery will allow the database to come online & be available after this phase of recovery. If you are not using Enterprise Edition, the database will not be available until after the Undo phase has completed.

Undo / Rollback

Transactions from the transaction log that were rolled back, or were uncommitted at the time of “crash” must be rolled back. SQL Server must verify that if uncommitted changes were made to the data file, they are undone. Without this, a rolled back change could be partially committed, violating the ACID principles of the database.

This phase will perform the rollback of any transactions that were uncommitted at the time of crash, or were rolled back after the final checkpoint.

So what can you do about it?

While the database is in recovery, attempts to bring the database online via a RESTORE command like this will fail:

RESTORE DATABASE CrashTestDummy WITH RECOVERY;

SQL Server is already attempting to do this. The RESTORE...WITH RECOVERY; will simply put the database through the exact same steps in order to bring the database online in a consistent manner.

Be patient

The right thing to do is just be patient. This part of the message from the error log is the one that you should pay attention to:

No user action is required

Note, too, that the time remaining is an estimate. In my experience, it is wildly inaccurate. Sometimes the time remaining will grow larger, rather than reducing. Sometimes it will report a very long completion time, and suddenly complete very fast. Its just an estimate.

Can you just “throw away” your transaction log and start fresh?

I advise against it. I would suggest you never, ever do this with a production database. There is a procedure to attach a database without a transaction log, and ask SQL Server to ATTACH_REBUILD_LOG. I won’t detail all the steps, but the “punchline” for that procedure is to do this:

CREATE DATABASE CrashTestDummy 
ON (FILENAME = 'C:SQLMSSQL15.MSSQLSERVERMSSQLDATACrashTestDummy.mdf') 
FOR ATTACH_REBUILD_LOG;

Running this on a crashed database may result in this error:

The log cannot be rebuilt because there were open transactions/users when the database was shutdown, no checkpoint occurred to the database, or the database was read-only. This error could occur if the transaction log file was manually deleted or lost due to a hardware or environment failure.
Msg 1813, Level 16, State 2, Line 5
Could not open new database ‘CrashTestDummy’. CREATE DATABASE is aborted.

In which case, you’re stuck. You’ll need to use the original transaction log & be patient. Just wait for it to recover.

Patience, patience!

Possible causes for the databases to stuck in “In Recovery” mode?

  • Huge size of transaction log file.
  • SQL restarted during a long running transaction.
  • Huge number of VLFs (i.e. virtual Log Files).
  • Could be a bug in SQL Server which is fixed with the help of some patches.

The very first thing that your should do is checking of ERRORLOG. In Errorlog, we should see the very first message in the database (TestMe is the name of my database):

Starting up database ‘TestMe’.

This means the files are opened and recovery is started. After sometime, you should see phase 1.

Recovery of database ‘TestMe’ (28) is 0% complete (approximately 37 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 3% complete (approximately 36 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.

Once phase 1 is complete, it would go with Phase 2 and 3.

And once it completes, you should use something similar.

3807 transactions rolled forward in database ‘TestMe’ (28). This is an informational message only. No user action is required.
0 transactions rolled back in database ‘TestMe’ (28). This is an informational message only. No user action is required.
Recovery is writing a checkpoint in database ‘TestMe’ (28). This is an informational message only. No user action is required.
Recovery completed for database TestMe (database ID 28) in 30 second(s) (analysis 1289 ms, redo 29343 ms, undo 72 ms.) This is an informational message only. No user action is required

In my case, This issue has been resolved by Restarting SQL Server Service, You can do it from SQL Server Configuration Manager.

enter image description here

If you have SQL Server 2019 you can enable accelerated database recovery which is many times faster and can help avoid this problem in the future

https://docs.microsoft.com/en-us/sql/relational-databases/accelerated-database-recovery-concepts?view=sql-server-ver15

Leave a Reply

Your email address will not be published.