I’m looking for a way to reorganize an MS SQL Server database. By reorganize I mean leaving both tables and indexes with no fragmentation.
The resulting database will only be used in read-only mode. One aim of the exercise is to release space in the database; the other is to make it as fast as possible. (I’m assuming here that a contiguous layout of both tables and indexes actually has a positive impact on performance. I understand there’s some debate about this in the modern SSD world, but at the very least it shouldn’t have a negative impact.) All in all, the result will be a frozen copy of its source. The clone-and-compact operation must take place periodically and must be able to run unattended.
The size of the actual data (tables and indexes) is approximately 500 GB, but there’s a lot of unused space, so the on-disk size is almost 2 TB. I don’t care if the clone is inaccessible during the reorganization operation, nor do I care if the reorg takes days to execute. But it must be automated (meaning it must run unattended; please don’t point me to SSMS).
I come from a PostgreSQL background. In PostgreSQL I would use pg_dump/pg_restore to export/import the whole database, and such an operation would by its very nature rewrite tables and indexes to be contiguous. In the Oracle world, imp would achieve the same, if I remember correctly. Isn’t there a similar export/import tool in the MS SQL world?
I’ve looked at:
RESTORE. Not applicable. This method retains table and index fragmentation.
DBCC SHRINKDATABASE. This releases unused space at the tail of the files. Not quite what I’m after, and it is also insanely slow, as it is meant to execute while the database is operational. I appreciate the reorg-while-online feature, but it is not what I need. (I haven’t been able to find any docs on whether such operations can be made to go faster if I put the DB into SINGLE_USER mode or the like.)
bcp. Can only export table data, not a full database with all its tables, constraints, indexes, users, triggers, stored procedures, etc.
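For reference, the shrink operation from the second item above is the following (the database name is a placeholder):

```sql
-- Shrink the whole database, leaving 10 percent free space.
-- This releases space at the end of the files but fragments indexes.
DBCC SHRINKDATABASE (MyDb, 10);
```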
I’m looking for a Microsoft way of doing this. Buying a third-party tool just to do this type of export/import doesn’t seem fair to me. SQL Server is expensive enough as it is.
Perhaps I’m just hung up on how this is done in other database engines and therefore looking for information in the wrong place?
Edit 1: Aaron noted in a comment that compression can be beneficial here. Good point. Compression options include columnstore indexes (clustered or nonclustered – let’s not go into details at this point), row compression and page compression.
As you know, backup and restore will give you a binary copy of the database, so it won’t help you.
Shrink will fragment the indexes even more, since it moves pages towards the beginning of the file without regard to the page ordering of an index.
So you are left with index reorganize or rebuild. Rebuild is more thorough, since it creates a new index, but it needs space to do so. So if you shrink and then rebuild, the rebuild will create the new index “at the end of the file”, possibly making the file grow (depending on how much free space you left when you did the shrink).
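For a single table, such a rebuild looks roughly like this (the table name is illustrative; SORT_IN_TEMPDB moves the sort work out of the target file):

```sql
-- Rebuild every index on the table. FILLFACTOR = 100 packs pages full,
-- which suits a read-only target. Needs working space for the new copy.
ALTER INDEX ALL ON dbo.BigTable
    REBUILD WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = ON);
```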
So it is a bit of a contradiction to both have everything nicely aligned and have no free space. What you can hope for is to have free space corresponding to your largest allocation (index/heap) in the file, i.e., “working space” for the largest one.
If that isn’t acceptable, then you are left with deciding which allocations are too large to keep spare room for (remember, you only need spare room for one at a time). For indexes you can drop them, shrink, and then re-create them. For the data (clustered index or heap) you’d have to export it outside SQL Server, truncate the table, shrink, and then import the data.
Edit 2: Nikita added a good point about moving the data to a different filegroup as an alternative to exporting it outside SQL Server. Definitely worth considering if you decide to go the “get the data out of the filegroup” route.
How to go about this? Use Ola’s scripts as suggested by Nikita for your B-trees. And for your heaps (I hope you don’t have that many) you can for instance use my stored procedure.
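If you prefer plain T-SQL over a stored procedure, a heap can also be rebuilt directly (SQL Server 2008 and later; the table name is illustrative):

```sql
-- Rewrites the heap (and any nonclustered indexes on it), removing
-- forwarded records and reclaiming space within the allocation.
ALTER TABLE dbo.MyHeap REBUILD;
```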
So: shrink first, then rebuild your B-trees and heaps. And it can take a very long time, especially the shrink (depending on whether you have LOBs, heaps, etc.).
Come to think about it, we have two options for exporting and importing the data:
BACPAC. This will give you a zipped file containing an XML file describing your tables plus BCP files with the data, and you then use a tool to import based on this. You can create BACPACs from SSMS (right-click a database or the Databases folder), from sqlpackage.exe, and of course there are a few PowerShell options. BACPACs are designed to work with Azure SQL Database, so they will choke on anything that isn’t available there, AFAIK.
SSMS Generate Script. You find this if you right-click a database. This will create a .sql file containing DDL and INSERT. Cross your fingers that it does it right. And for 500 GB…, I have my doubts. Anyhow, you can also access this functionality using SSIS, if that is beneficial for you.
As noted previously there is no native tool that will give a database with zero fragmentation and zero free space. Rebuilding to remove fragmentation will add unused space; shrinking to remove unused space will cause fragmentation. My suggestion to achieve this is to build a new database from scratch.
Much of this is T-SQL. The rest, and command orchestration, can be done through Powershell. The schema can be obtained using SMO if you need the fine-grained control. I’ve done this, and it works but can be fiddly. The same can be achieved with a single call to DBCC CLONEDATABASE. There’s a walk-through at MSSQLTips.
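The clone call itself is a one-liner (the source and clone names are placeholders; the clone is schema-only, containing no user data):

```sql
-- Creates a schema-only copy of the database, including statistics,
-- with no table data.
DBCC CLONEDATABASE (SourceDb, SourceDb_Clone);
```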
In the destination DB drop all the non-clustered indexes. If we leave them they’ll be populated with fragmentation as the data arrives. They’ll be rebuilt later. ALTER the clustered indexes to set PAD_INDEX = OFF and FILLFACTOR = 100. (This may not be necessary, but won’t do any harm.) Set the size and growth of data files appropriately. Set the recovery model and take a full backup if necessary.
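A sketch of that step with illustrative object names — in practice you would generate the DROP statements from sys.indexes rather than hard-coding them:

```sql
-- Drop a nonclustered index in the destination; it will be re-created
-- after the data load so it is built unfragmented in one pass.
DROP INDEX IX_Orders_CustomerId ON dbo.Orders;

-- Rebuild the (empty) clustered index with the fill settings we want
-- the incoming data to be packed at.
ALTER INDEX PK_Orders ON dbo.Orders
    REBUILD WITH (PAD_INDEX = OFF, FILLFACTOR = 100);
```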
That takes care of the schema but there is no data. The tricky part is FKs. My approach would be to use system tables and dynamic SQL. Cycle through all foreign keys; create dynamic SQL to disable each. Cycle through data tables; create dynamic SQL to transfer data from source DB to new DB in clustered key sequence. By transferring in this sequence the destination tables are built with no fragmentation. The order in which tables are processed does not matter since there are no active FK constraints.
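A minimal sketch of the FK-disable pass and one table transfer (names are illustrative; the real version would loop over sys.tables with dynamic SQL, and tables with IDENTITY columns need SET IDENTITY_INSERT handling):

```sql
-- Generate one NOCHECK statement per foreign key in the destination DB.
SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(parent_object_id))
     + '.' + QUOTENAME(OBJECT_NAME(parent_object_id))
     + ' NOCHECK CONSTRAINT ' + QUOTENAME(name) + ';'
FROM sys.foreign_keys;

-- Transfer one table in clustered key order; TABLOCK enables minimal
-- logging under the SIMPLE or BULK_LOGGED recovery model.
INSERT INTO NewDb.dbo.Orders WITH (TABLOCK) (OrderId, CustomerId, OrderDate)
SELECT OrderId, CustomerId, OrderDate
FROM   SourceDb.dbo.Orders
ORDER BY OrderId;  -- the clustered key
```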
If the new DB really is read-only I would be tempted to leave the FK constraints disabled. The data will be no less reliable than it is in the source system.
If there will be writes iterate through the FKs once more, creating dynamic SQL to re-enable them. This may be slow if large tables have many FKs. A better approach would be to leave the FKs enabled and transfer data in dependency sequence. That would entail determining an arbitrary dependency graph at run time, which is beyond the scope of this answer.
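Re-enabling is the mirror image; WITH CHECK makes SQL Server revalidate the existing data so the constraints end up trusted (this is the slow part on large tables):

```sql
-- Generate one re-enable statement per foreign key.
SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(parent_object_id))
     + '.' + QUOTENAME(OBJECT_NAME(parent_object_id))
     + ' WITH CHECK CHECK CONSTRAINT ' + QUOTENAME(name) + ';'
FROM sys.foreign_keys;
```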
Finally the non-clustered indexes can be re-created. Obtain the script from the source DB (SMO or MSSQLTips), change the FILLFACTOR and PAD_INDEX then execute against the new DB. The algorithm to build an index from scratch is different to that which maintains it as data arrives so these non-clustered indexes will be unfragmented too.
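An individual re-create, with the fill settings adjusted, would look something like this (index and column names are illustrative):

```sql
-- Built from scratch over sorted input, so it comes out unfragmented.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    WITH (PAD_INDEX = OFF, FILLFACTOR = 100, SORT_IN_TEMPDB = ON);
```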
Lastly, shrink the DB file using TRUNCATEONLY to return allocated but unused space. A further full backup would be prudent.
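TRUNCATEONLY releases only the free space at the end of the file, so it does not move pages and cannot introduce fragmentation (the logical file name is a placeholder):

```sql
DBCC SHRINKFILE (N'NewDb_Data', TRUNCATEONLY);
```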
I haven’t tested the above in toto but I have implemented each part at one time or another. It isn’t trivial but the resulting PS script would be generic and robust across many SQL Server versions.
But as @MichaelGreen points out, using SQL Server’s cross-database query capability it’s pretty simple to just create new tables and INSERT … SELECT from one database into the other.
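A minimal sketch of that, assuming both databases are on the same instance (object names are illustrative; SELECT … INTO is used here for brevity, but it only copies the column definitions, so real DDL would be needed to carry keys, defaults, and so on):

```sql
-- Create the table in the new database and load it across databases
-- in one statement.
SELECT *
INTO   NewDb.dbo.Customers
FROM   SourceDb.dbo.Customers;
```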