Large database partitioning and backups of only updated data (partial backups)

Posted on

Question :

I’m setting up a large SQL Server database (2-4tb), but only a small percentage of it will be updated. I’d like to partition by year so that my backups can be small. HOWEVER, some of the old data may change on occasion, so I am not going to mark any file or file group as read only.

Will I still be able to do a partial backup on just the files I choose without marking any files as read only? Or is it required that the files I want to skip be set to read only?

Now, if I have to mark the old data partitions as read only – if an old record is updated some time in the future, including the field that is used to assign it to a partition, would SQL server move the record to the read write partition, or would this update fail? (I know, read only means read only, but what would happen in this scenario?). In this instance, perhaps I would just add the record to the read write partition, and search for the latest version of the record (max modificationdate etc..)

Out of one million records, I would expect only 20,000 new or updated records per year. Only 200k of the records are “current”.

I’m looking to only have to regularly back up the read /write files, and performing less frequent backup of the old data.

This is a SQL Server 2016 installation (Standard Edition).

Answer :

I am not going to mark any file or file group as read only.

Will I still be able to do a partial backup on just the files

No. A minimal restore sequence includes all read/write filegroups. And so “a partial backup contains the data in the primary filegroup, every read-write filegroup, and, optionally, one or more read-only files”

if an old record is updated some time in the future, including the field that is used to assign it to a partition, would SQL server move the record to the read write partition, or would this update fail?

The update would fail.

You could just do a full backup less frequently, then regular differential and/or transaction log backups. These would be very small if little has changed.

The size of the diffs will grow larger and larger as more and more data changes. They are not incremental, they include all changes since the last full backup.

Restoring would take a little longer, as you would need the most recent full backup, the most recent diff which could be large, and all following log backups.

You could consider a Full File Backup strategy. This is primarily of benefit in your scenario if the database is using the full recovery model.

This alternative can be more challenging to set up and maintain, but offers advantages that seem relevant to your needs. Quoting from the documentation link above:

Benefits of File Backups

File backups offer the following advantages over database backups:

  • Using file backups can increase the speed of recovery by letting you restore only damaged files, without restoring the rest of the database.

  • For example, if a database consists of several files that are located on different disks and one disk fails, only the file on the failed disk has to be restored. The damaged file can be quickly restored, and recovery is faster than it would be for an entire database.

  • File backups increase flexibility in scheduling and media handling over full database backups, which for very large databases can become unmanageable. The increased flexibility of file or filegroup backups is also useful for large databases that contain data that has varying update characteristics.

Disadvantages of File Backups

  • The primary disadvantage of file backups compared to full database backups is the additional administrative complexity. Maintaining and keeping track of a complete set of these backups can be a time-consuming task that might outweigh the space requirements of full database backups.

  • A media failure can make a complete database unrecoverable if a damaged file lacks a backup. You must therefore maintain a complete set of file backups, and, for the full/bulk-logged recovery model, one or more log backups covering minimally the interval between the first full file backup and last full file backup.

The number of transaction log backups you need to keep to achieve complete recovery can be reduced by also taking differential file backups.

Please do read the extensive documentation, and try various restore scenarios on your test system before committing to this option. It does allow the maximum amount of flexibility around how often each file is backed up, but keeping everything consistent and recoverable can be a challenge.

Note that Standard Edition only supports offline file restores.

Leave a Reply

Your email address will not be published.