Would a table benefit if it was a heap

Posted on

Question :

I’ve a logging table with approx 1.500.000 rows, the primary key is an ascending identity and the clustered index is on the primary key. The identity value is auto-generated => records are always added at the end. The average row size is 1570 bytes.

There are a lot of page splits because new rows are added frequently. No rows get updates/deleted and there is a non-clustered index on the table so rows can be selected.
Due to the page splits, the clustered index is always fragmented > 65%.

I wonder my table would benefit of removing the clustered index and make it a heap table?

This is how my table + non-clustered index looks like:

CREATE TABLE [dbo].[LogEntry](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Application] [varchar](20) NOT NULL,
[EntityFullName] [varchar](80) NOT NULL,
[Action] [int] NOT NULL,
[UserName] [varchar](25) NOT NULL,
[TimeStamp] [datetime] NOT NULL,
[EntityId] [varchar](50) NOT NULL,
[WhatChanged] [nvarchar](max) NULL,
CONSTRAINT [PK_LogEntry] PRIMARY KEY CLUSTERED(     
   [Id] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY] ) 
ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


CREATE NONCLUSTERED INDEX [ID_Application_Entity_FullName_TimeStamp] ON [dbo].[LogEntry]
(
    [Application] ASC,
    [EntityFullName] ASC,
    [TimeStamp] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]
GO

Update:
Someone had enabled auto-shrink behind my back => this will be the cause of the fragmentation

Answer :

This started as a comment/questions but it got to long so I moved it here:

I’m really thrown by this question. 1.5mil rows isn’t really all that big. And the point behind an identity is that it’s ever increasing. If that’s your CL you shouldn’t be doing inserts into the middle of a page, certainly not often enough to cause the level of fragmentation you’re seeing.

Couple of questions:

Are you doing IDENTIY_INSERTS? Basically specifying what the identity value should be? Or have you re-set the identity at some point so that you are inserting into the middle of the range?

Typically if you are doing inserts it looks like this:

5 6 7 8 < Next insert goes here >

But if you have something like this (assume your next identity value is 4)

 1 2 3 < Next insert goes here > 100 101  

Then you could be seeing quite a few page splits. But in the normal course of things you shouldn’t be.

Is there any chance you are shrinking your database? Auto_shrink or a maint plan/job that does shrinks? If so it’s the shrink that’s causing your fragmentation not the clustered index.

In general there is nothing wrong with a HEAP and they can be faster for INSERTs. My biggest concern with them tends to be if you are doing large numbers of deletes or updates (which you say you aren’t). In those cases you can get a space leak and end up with a table that is multiple GBs in size but has 0 rows.

Actual answer

Given you have a log file, and are only ever inserting, you could try dropping the PK and see how performance goes (in a test environment first of course). Once you’ve run some tests using your workload and seeing how things go then make your change in production and monitor there for a while. You might even consider dropping the identity column entirely.

Do check that SHRINK thing though. That’s a killer.

You might find it interesting to check Thomas Kejser’s take on indexes in SQL Server. Although clustered indexes are very useful, there can be reasons to keep a heap. For example, read this post:

http://kejser.org/clustered-indexes-vs-heaps/

Particularly look at the topic: Fragmentation Prone tables with lots of INSERT activity

This topic seems to exactly describe the issue that you are facing with fragmentation. So…

Consider the alternative: an indexing strategy where you pick a
sufficiently “random” key (like a GUID) and instead of blindly
declaring that key a cluster index
, you instead leave the table as a
heap and just put a good old fashioned unique, non-clustered key index
on the GUID
. The majority of the table growth from INSERTs now goes to
the heap which will nicely fill up and stay un-fragmented.

Another comment from technet: http://technet.microsoft.com/en-us/library/hh213609.aspx

“Sometimes data architects use heaps when data is always accessed
through nonclustered indexes and the RID is smaller than a clustered
index key. “

Those posts may give you something to think about on your issue.

The FILLFACTOR on the table is 100, thus there are no free pages in the cluster index. If you’re doing lots of inserts try setting FILLFACTOR to something like 80. And read the SQL Server books online regarding FILLFACTOR. 😉

Leave a Reply

Your email address will not be published. Required fields are marked *