Why is a .bak so much smaller than the database it’s a backup of?

Posted on

Question :

I just took a backup of a SQL Server database. The MDF and LDF files together total around 29 GB, but the .bak file was only 23 GB, about 20% smaller.

My first guess when one version of a set of data is smaller than another version containing the same data would be data compression, but compression usually yields a much better compression ratio than 20%, especially for highly-ordered data (such as database tables.) Also, compressed data can’t easily be compressed further, but I know that .bak files can be compressed.

So if the data isn’t being compressed, and nothing’s being discarded, (because the whole point of making a backup is to be able to restore it to an identical state afterwards,) then what’s that 20% that’s unaccounted for?

Answer :

The space was allocated to the database files, but not used.

You can create a new database, make it 10GB in size, and see the files allocate that amount of space on disk. However, until you put data in the database, the file is essentially empty, and your backup file size will be minimal.


For a full backup, the LDF can be ignored usually

The MDF contains the actual data

The Bak file contains only data pages that are in use inside the mdf. Some space won’t be used. This space is overhead user for index rebuilds for example.

It’s quite typical to have a 100gb backup for a DB that may have a 250gb mdf. If my mdf is the same size as my backup it would be red flag about an unexpected DB shrink or lack of disk space etc

When a DB is created, you can specify (for performance) how much space you want to allocate to the data and log files. This space is then reserved even if no data is stored in the tables. Only the extents that have data written to it are backed up.

In your case, your MDF/LDF total could have even been 100 GB but your backup would still be around 23 GB for the backup that you did. If around 1 GB of data was added, your MDF/LDF total would still be 100 GB, but your backup would now be around 24 GB.

A full backup contains all the extents that have data in them and a bit of the log file. The full backup contains all the data from the time the backup task ended, and not just from the time the backup task started; this is why a bit of the log file is also required.

Leave a Reply

Your email address will not be published. Required fields are marked *