Clustered index size requirement

Posted on

Question :

I have a HEAP table with no indexes (on SQL Server 2012 Std). I need to create a clustered index on an INT column. There are ~1.2 billion rows and the int column is nullable and not unique. I’m struggling to figure out how much space this will take up in the filegroup. Could someone advise please?

Edit: I’ll sort in the tempdb by the way

Answer :

When you create a clustered index this “replaces” the heap so there are no extra index pages as such (aside from the upper levels of the b-tree), and as you have absolutely no other indexes there is nothing else to affect the size of.

Therefore if your target column already exists I wouldn’t expect very much extra space to be permanently taken unless you have fill factor set to something other than full (for example with a fill factor of 80% the new index would leave approximately 20% of each leaf page spare for later additions).

Some extra space may be used as the heap is rebuilt as a clustered index so your database may need to grow to accommodate this, but any such extra space will be free for future allocation once the process is complete.

For an example, see Index Disk Space Example in the product documentation.

There is even more detailed information in the Online Indexing Operations in SQL Server 2005 technical article (despite the name, it covers offline operations as well).

Extra

It could take less space than before as you have noticed. You may have had a lot of part-used or empty pages due to deletes and updates over time. Heaps do not release the space used by empty pages created by deletes unless TABLOCK was specified at the time of the deletion, and even then not always. From DELETE (Transact-SQL):

When rows are deleted from a heap the Database Engine may use row or page locking for the operation. As a result, the pages made empty by the delete operation remain allocated to the heap. When empty pages are not deallocated, the associated space cannot be reused by other objects in the database.

There may also have been a small amount of space used by forward record pointers (which don’t exist in clustered indexes), and if your rows are of variable length blocks of space in pages that are too small to use for new rows. Your newly reorganised rows will be more tightly packed (fill factor settings permitting) so may well use fewer pages.

As mentioned in MSDN:

You can use the following steps to estimate the amount of space that is required to store data in a clustered index:

  1. Calculate the space used to store data in the leaf level of the clustered index.

  2. Calculate the space used to store index information for the clustered index.

  3. Total the calculated values.

Also, you can refer to this article Heaps in SQL Server: Performance, Maintenance, and Your Sanity, which helps how to identify heaps, how to measure their size, and how to plan a change to convert your heaps to clustered indexes.

Leave a Reply

Your email address will not be published. Required fields are marked *