What are the primary architectural differences between the backup formats of some of the more common RDBMS’ and what advantage do they serve?

Posted on

Question :

Of the systems with which I am familiar, I have noticed that SQL Server’s native database backup format is binary and proprietary. PostgreSQL and MySQL allow for a simpler approach, dumping the schemata and data. This leads to an interesting question.

What are the primary architectural differences between the backup formats of some of the more common RDBMS’ and what advantage do they serve?

It seems to me that the PostgreSQL and MySQL dump-style implementations would allow for one to open the backup in a text editor and get a sense for the data being restored.

Answer :

The native backup of sql server, commonly have extension of .bak is something which is proprietary of microsoft.

Different RDBMS providers have different formats and it also depends on whether they are open source (PostgreSQL and MySQL) or are patented (MS SQL Server, ORACLE or SybaseASE).

For MS SQL Server :

A backup is a page-by-page copy of the database, as it existed the moment you took the backup, a restored copy of that database will be in exactly the same state as it was the moment you took the backup.

A backup contains more than just the data. It contains the entire structure of the database including all the data structures and data along with procedures, views, functions and any other code. It also contains the settings and definitions of the database along with the users of the database. Exception is contained databases have a concept called USER WITH PASSWORD.

If you want to use something that you mentioned, have a look at SqlDump – Microsoft SQL Server database backup program

SqlDump is a program to backup a Microsoft SQL Server database as a text file. SqlDump generates SQL statements for tables, indexes, user-defined types, views, procedures, etc and also the table data. This data can be used for a variety of purposes, such as database backups, moving databases to another server, or setting up a test database based on the contents of an existing database

Note: I have not tried the SQLDump, as I would personally prefer SSIS or T-SQL along with BCP and BULK INSERT – to move data out and in of sql server.

The biggest difference is transactional consistency.

To achieve a truly transactionally consistent backup using the generate-a-script method (aka dump method), you would have to stop all write access to all tables in the database. Otherwise, while you are scripting out records in one table, changes in another table might occur that would break defined or undefined (e.g. enforced by the app layer) relationships.

With a page by page backup as SQL Server is offering you might run into the same issue as well. However, SQL Server stores the log sequence number that the page was changed at last together with each page. That way that information makes it into the backup too. SQL Server stores also enough log information with each backup to undo or redo transaction depending on their state at the time of the backup.

During a restore, first the data pages get copied from the backup to the new database file. Afterwards a redo/undo phase is executed to bring all pages into a consistent state using that additional log information that was stored within the backup.

So, when you use SQL Server you can run a true 24/7 operation and still take backups that are transactionally consistent.

There are also additional mechanisms like differential backups that backup only the data that changed since the last backup, or log backups, that can be used to restore a database to a particular point in time with sub-second precision (while still guaranteeing transactional consistency). All that would also not be possible with dump style backups.

Finally, dump style backups tend to take up a lot more space than native backups, but that probably could be mitigated through compression.

As for all the other stuff like procedures or permissions, that all could be scripted out too, so I do not think that those are a valid differentiator, but I do not know if postgres offers that option out of the box.

Leave a Reply

Your email address will not be published. Required fields are marked *