Is there a faster way of moving data between databases than CTEs?

Posted on

Question :

I use the following query on SQL Server 2012 Express to move data that is older than 4 days old out of one [primary] database and into another [secondary database] for archiving purposes. The only issue is that this takes the database offline as it is usually moving about 500k rows of data, and that figure is climbing with more data points being added to the database.

The result is that my web based app is not able to access the database for (in most cases) about 2 hours, and this stalls a lot of other processes, as well as the app.

DECLARE @4daysago datetime
SELECT @4daysago = DATEADD(d, -4, GetDate());

SET IDENTITY_INSERT [activetrackarchivedb].dbo.[Data Import] ON;

--Transfer from current (production) DB to Archive DB
    SELECT TOP 1000000 *
    FROM [activetrackdb].dbo.[Data Import] 
    WHERE [activetrackdb].dbo.[Data Import].[Receive Date] < @7daysago
    ORDER BY [Receive Date] ASC)
  DELETED.[Company id], 
  DELETED.[Site id],
  DELETED.[Site name],
  DELETED.[Receive date],
  DELETED.[Detect date],
  INTO  [activetrackarchivedb].dbo.[Data Import] 
  [Company id], 
  [Site id],
  [Site name],
  [Receive date],
  [Detect date]);

Is there a better method I could use to ‘transfer’ these rows? Even if the new method was slower, if it could at least still allow access to the database. This one took me a week to implement (I’m a novice) and involved the StackOverflow community’s help. It’s been great up until now, but as amount of data has increased, its become a very cumbersome process.

This is not for backup purposes either. The database the rows are transferred to simply have a higher capacity and the archived data is still used in the production environment (it just makes the primary database more maintainable and responsive when combined with other maintenance and indexing scripts)

Any help would be greatly appreciated.

Answer :

This doesn’t have anything to do with using a CTE specifically, as your question title implies. It is probably a combination of at least one or more of the following:

  • an expensive scan (or seek + a million lookups) to identify the rows to delete (I assume [Receive Date] is not the clustering key)
  • many underlying non-clustered indexes that also must be updated
  • blocking (both on the delete and on the target/archive table)
  • a batch size too large that causes too much transaction log activity (which can be especially bad if you have poor t-log configuration)

See this blog post but generally you may want to delete in smaller chunks and inject some artificial delays and/or checkpoints / log backups in between each individual delete.

Selecting and deleting a large number of rows will lead to blocking, which is probably why you cannot use the system for a couple of hours.

The easiest way in SQL Server Express to control blocking is to control how many rows you are deleting at one time. For example, you might read Aaron Bertrand’s notes here:

He points out that deleting (or, in your case, copying and deleting) smaller numbers of rows at one time allows the deletes to succeed, then frees up the locks so that other work can progress, and so on.

See Aaron’s code samples and the approaches that you might use. As he notes, one size does not fit all.

Leave a Reply

Your email address will not be published. Required fields are marked *