I recently attended an interview for my next job and was asked this:
How can I speed up the backup of a database that is around 800 GB to 1 TB in size in SQL Server?
My answer: we can use backup compression or a third-party tool like LiteSpeed.
But they expected this answer which I was not aware of:
Solution : Increase throughput by adding more destination files
I created this sample script by adding more backup destination files:
DBCC TRACEON (3605, -1);
DBCC TRACEON (3213, -1);

BACKUP DATABASE [AdventureWorks2008]
TO  DISK = N'E:\ADWBackup\AdventureWorks2008-Full.bak',
    DISK = N'E:\ADWBackup\AdventureWorks2008-Full-File2.bak',
    DISK = N'E:\ADWBackup\AdventureWorks2008-Full-File3.bak'
WITH NOFORMAT, INIT,
    NAME = N'AdventureWorks2008-Full Database Backup',
    SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10;

DBCC TRACEOFF(3605, -1);
DBCC TRACEOFF(3213, -1);
Can you please answer the below questions:
- What is the use of following this method instead of using a single file?
- Will there be any time difference if I do like this?
- How can I restore the DB using the multiple split files?
- Will it really increase the speed of the backup?
I think the best way to speed up the backup (and restore) of a 1 TB database is to take SAN or VMware snapshots.
Nevertheless, the answer your potential employer expected is valid, but in some circumstances it may yield no improvement at all. For this approach to work, you have to have multiple disk arrays. To simplify the idea: imagine that instead of writing to 1 SSD, you were writing your 5 different backup files to 5 different SSDs in parallel. That is what they had in mind.
Your script is perfectly fine, but if you test it on your laptop, as mentioned earlier, you may see little to no improvement. You also have to consider other factors, like network throughput.
A good article to start would be https://www.mssqltips.com/sqlservertip/935/backup-to-multiple-files-for-faster-and-smaller-sql-server-files/
Multiple targets for the backup can be very useful, but only in certain situations.
- Your source database can be read faster than your current single-file backup can be written.
- Your backup targets for each database are on separate spindles.
- You aren't using shared SAN HBA, iSCSI, or other bandwidth between the SQL Server instance and the media; i.e. the IOPS from writing Backup File A DO NOT use the same disks as writing Backup File B.
If all of these are true, then it’s possible that some degree of parallelism (many output files) will increase the speed. If any of these are not true, more than likely you’ll cause one or more sets of disks to thrash, and your multi-file backups will actually be slower. They may also cause OS filesystem or storage-level fragmentation, because you’re writing Backup File A and Backup File B at the same time!
Note that many of the same things apply that I covered in my answer to Transaction Log Backups Serial or Parallel?, since this is about writing in parallel (though not necessarily reading, depending on how your filegroups and files are laid out).
To really understand this, you may need to see about borrowing some equipment from close friends.
Obtain one (1) fast solid state disk
Any of the good SATA2 or SATA3 or mSATA Samsung, Crucial, Mushkin, etc. internal drives will do
You need to ensure that you can read faster than you can write.
Beg, borrow, or buy at least two disk drives in addition to the drive your SQL Server data and log files are on.
One (1) of these can be a USB2 external drive (or a USB3 drive) plugged into a USB2 port
One (1) of these can be a USB2 external drive plugged into a USB3 port
We can be reasonably sure the USB2 and USB3 ports aren’t sharing a bus, since they use different controllers
Don’t put a USB3 drive into a USB3 port here, because we WANT our backup devices to be slow individually, yet not contend with each other for bus bandwidth
One or more drives can be SATA spinning disks
Log into your local SQL Server Developer or SQL Server Express instance
Create a reasonably big database with data and log files on the fast SATA SSD.
Now try backing up twice in a row (two BACKUP DATABASE commands in the same SQL batch) to each backup device, one device at a time
Record the backup and verify speed only for the second backup to each device, i.e. after caches have been warmed.
Now you know how fast each is by themselves; one is likely better than the others.
Now try backing up twice in a row (two BACKUP DATABASE commands in the same SQL batch) to ALL backup devices at once (i.e. with multiple DISK arguments)
See the speed difference?
Now play around with BUFFERCOUNT, MAXTRANSFERSIZE, and if you’re on developer edition, COMPRESSION.
ALWAYS use the CHECKSUM option!
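The experiment above can be sketched roughly as follows. The database name, drive letters, and paths are placeholders for your own test setup, and the BUFFERCOUNT and MAXTRANSFERSIZE values are just starting points to vary, not recommendations:

```
-- Baseline: time a backup to a single device (run twice; measure the second).
BACKUP DATABASE [TestDB]
TO DISK = N'D:\Backups\TestDB-single.bak'
WITH INIT, CHECKSUM, STATS = 10;
GO

-- Striped across all devices at once, with tuned buffers.
BACKUP DATABASE [TestDB]
TO  DISK = N'D:\Backups\TestDB-stripe1.bak',
    DISK = N'E:\Backups\TestDB-stripe2.bak',
    DISK = N'F:\Backups\TestDB-stripe3.bak'
WITH INIT, CHECKSUM, STATS = 10,
    BUFFERCOUNT = 64,
    MAXTRANSFERSIZE = 4194304;  -- 4 MB per transfer, the maximum allowed
GO
```

Compare the elapsed times reported by the STATS messages for the single-device and striped runs; that difference is what tells you whether your hardware can actually exploit the parallelism.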
What is the use of following this method instead of using a single file?
This splits the backup across multiple non-mirrored devices: each device holds only part of the backup, and all of them are needed to restore it. Yes, this can be used to achieve a faster backup if writing to the destination is the I/O bottleneck.
Read more here: Media Sets, Media Families, and Backup Sets (SQL Server)
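If you want to see the media-family metadata for yourself, SQL Server can report it from any one stripe. The path below reuses the example from the question and is only illustrative; the FamilyCount column in the LABELONLY output tells you how many files make up the media set:

```
-- Inspect the media family of a striped backup.
RESTORE LABELONLY
FROM DISK = N'E:\ADWBackup\AdventureWorks2008-Full.bak';

-- List the backup sets recorded on the media.
RESTORE HEADERONLY
FROM DISK = N'E:\ADWBackup\AdventureWorks2008-Full.bak';
```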
Will there be any time difference if I do like this?
Two major things to know:
- You must keep all the files in a healthy state to be able to restore; you cannot restore from only part of them.
- This can lead to a faster backup if writing to destination is the slowest part of the backup flow.
How can I restore the DB using the multiple split files?
The same way you did the backup: list all of the files in the RESTORE command (if you miss one, SQL Server throws an error and does not execute the restore). If you are using the GUI, just add all the files in the backup-files window.
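A restore of the three-file backup from the question would look something like this (paths as in the original script; the REPLACE option is shown only for the case where you are overwriting an existing database):

```
RESTORE DATABASE [AdventureWorks2008]
FROM DISK = N'E:\ADWBackup\AdventureWorks2008-Full.bak',
     DISK = N'E:\ADWBackup\AdventureWorks2008-Full-File2.bak',
     DISK = N'E:\ADWBackup\AdventureWorks2008-Full-File3.bak'
WITH REPLACE, STATS = 10;
```

All three DISK clauses are required; omitting any one of them fails the restore because each file holds only part of the media set.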
Will it really increase the speed of the backup?
You must determine the slowest part of your backup process. For example, I don’t know whether you are using compression; compression adds CPU load, and if the CPU is already heavily loaded, adding even 10 destination files will not help.