Batch Delete with checkpoint still causes log growth

Question :

I have a large table (600 million rows, 60GB data) that I need to migrate to another table. This is for a few reasons:

  • The current table is a heap and I want to convert it to having a clustered index
  • I have a column which is time(7) that only really needs to be time(4)
  • The table has previously dropped columns that are still reserving space from what I can see (they are just marked as is_dropped in the metadata; see the query after this list)
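
For reference, this is roughly how the dropped columns show up. Note that sys.system_internals_partitions and sys.system_internals_partition_columns are undocumented internal views, so treat this as a sketch:

SELECT p.partition_number,
       pc.partition_column_id,
       pc.is_dropped,        -- 1 for columns that were dropped but still reserve space
       pc.max_length
FROM sys.system_internals_partitions AS p
JOIN sys.system_internals_partition_columns AS pc
    ON pc.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID(N'dbo.RunningData');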

The best way I could come up with to do all of this in the fewest operations (i.e. not rebuilding the table multiple times) was to create a new table with the column definitions I want, copy the data across into it while deleting from the old table, and finish with a name switch. I’ve split the work into batches in a loop to prevent the log file growing continuously. However, this doesn’t seem to be working: after about 70 million rows, the log file has grown to around 110GB and there is no space left on the server…
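
For illustration, the new table looks roughly like this. The column names come from the OUTPUT clause below, but the types other than time(4) are placeholders:

CREATE TABLE dbo.TmpRunningData
(
    Data     varchar(100) NOT NULL,  -- placeholder type
    [Date]   date         NOT NULL,  -- placeholder type
    [Time]   time(4)      NOT NULL,  -- narrowed from time(7)
    Category int          NOT NULL,  -- placeholder type
    SystemID int          NOT NULL   -- placeholder type
);

-- Clustered index created up front so the copied rows go straight into it
CREATE CLUSTERED INDEX CIX_TmpRunningData
    ON dbo.TmpRunningData ([Date], [Time]);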

The database is using the simple recovery model and I issue a CHECKPOINT after each batch, which I thought was meant to stop the log file growing! Am I missing something, or is there some other reason my log file continues to grow? I’m testing this on a dev server, so there are no other operations currently running on the table.
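
For what it’s worth, log usage between batches can be watched with:

DBCC SQLPERF (LOGSPACE);  -- reports log size and percent of log space used per database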

My query is:

WHILE 1 = 1
BEGIN
    -- Move one batch: the OUTPUT clause inserts the deleted rows into the new table
    DELETE TOP (1000000) FROM RunningData WITH (TABLOCK)
    OUTPUT
        deleted.[Data],
        deleted.[Date],
        deleted.[Time],
        deleted.Category,
        deleted.SystemID
    INTO [dbo].[TmpRunningData] (
        Data,
        [Date],
        [Time],
        Category,
        SystemID
    );

    IF @@ROWCOUNT = 0 BREAK;

    -- In simple recovery, a checkpoint should let the inactive part of the log be reused
    CHECKPOINT;
    RAISERROR ('Moving to next batch', 0, 1) WITH NOWAIT;
    WAITFOR DELAY '00:00:01';
END;
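
Once the loop finishes, the name switch I mentioned would be something like this (the _old suffix is just illustrative):

EXEC sp_rename 'dbo.RunningData', 'RunningData_old';
EXEC sp_rename 'dbo.TmpRunningData', 'RunningData';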

I really want to sort this table out, but I’m struggling to find a way of processing it at this size! Any suggestions on how to stop the log growing, or another way to do it, would be much appreciated, thanks! In production the database can go down for a weekend, so there should be plenty of time to do it.

Answer :

The first step is to ask SQL Server why the log grows. You should find the answer in the log_reuse_wait_desc column of the sys.databases catalog view. Since the database is in simple recovery, it shouldn’t be LOG_BACKUP. Then take it from there.
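
For example:

SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();  -- run from the database in question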

And don’t do it all in one transaction, since SQL Server can’t truncate the log past the oldest open transaction.
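
As a sketch of an alternative: under the simple recovery model, a single INSERT … SELECT with a TABLOCK hint into an empty target can qualify for minimal logging, which keeps log use far below that of fully logged deletes. The table and column names are taken from the question; whether minimal logging actually applies depends on the target’s indexes and the SQL Server version.

-- One minimally logged copy instead of delete batches (assumes TmpRunningData is empty)
INSERT INTO dbo.TmpRunningData WITH (TABLOCK)
    (Data, [Date], [Time], Category, SystemID)
SELECT Data, [Date], [Time], Category, SystemID
FROM dbo.RunningData;

-- Then drop the old table rather than deleting from it
DROP TABLE dbo.RunningData;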
