index maintenance strategy if few insertion points relative to # of rows

Question :

In SQL Server 2008 R2, I have nonclustered covering indexes on several tables with 100M+ rows. Each table has a few thousand “insertion points” where all new inserts happen. This means that regardless of fill factor, I’ll quickly end up with page splits and fragmentation at every insertion point, and no fragmentation or splits anywhere else in the table. Unfortunately, queries always include new rows and hence the fragmented areas of the index.

  • what happens when there’s a page split but inserts continue sequentially after the split? Is there a way to tell SQL Server to do the split with lots of extra room for subsequent inserts, without wasting space on existing pages with a large fill factor that for most pages will never be filled?
  • what are good index maintenance strategies to use for indexes like this?
  • is there a good automated way to identify tables like this where fragmentation is severe but not uniform? These tables don’t show up as more than 5% fragmented overall.
  • are there index schema changes I should be considering?
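
For reference on the “5% fragmented overall” figure: that number comes from the standard physical-stats DMV, which averages fragmentation across the whole index and so hides localized hot spots. Roughly like this (the table name `dbo.Foo` is just the simplified example from below):

    SELECT index_id, avg_fragmentation_in_percent, fragment_count, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Foo'), NULL, NULL, 'LIMITED');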

Here’s more info about the problem. The indexes all follow this pattern (simplified for clarity):

    CREATE TABLE Foo (
        id int identity(1,1) PRIMARY KEY CLUSTERED,
        foreign_key int,
        log_time datetime,
        ...
    )

    CREATE NONCLUSTERED INDEX on Foo (foreign_key, log_time) INCLUDE (...)

Queries on this table are always in this form:

    WHERE log_time > getdate()-70 AND foreign_key IN (select ...)

Other facts:

  • there are about 5,000 foreign_key values, each with tens of thousands of rows.
  • average row size is 55 bytes, meaning around 150 rows per page
  • the IN filter usually matches 10%-50% of the foreign_key values, and the date filter matches 20%-40% of the rows. On average, about 15% of total rows are selected.
  • the index is a covering index for the queries, so no clustered index access is needed.

Answer :

The answer by ThomasStringer is very good, and using partitioning here would likely help overall performance and decrease your maintenance cost. However, it won’t do anything for your page split scenario.

Can you verify that your foreign keys are trusted? This won’t decrease or otherwise affect your page splits, but it will help your query execution time.

    -- Foreign key check: builds ALTER statements to CHECK (validate) any untrusted foreign keys.
    SELECT 'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(o.object_id)
           + ' WITH CHECK CHECK CONSTRAINT ' + i.name + ';'
    FROM sys.foreign_keys i
    INNER JOIN sys.objects o ON i.parent_object_id = o.object_id
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    WHERE i.is_not_trusted = 1 AND i.is_not_for_replication = 0;

    -- Check constraint check: builds ALTER statements to CHECK (validate) any untrusted check constraints.
    SELECT 'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(o.object_id)
           + ' WITH CHECK CHECK CONSTRAINT ' + i.name + ';'
    FROM sys.check_constraints i
    INNER JOIN sys.objects o ON i.parent_object_id = o.object_id
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    WHERE i.is_not_trusted = 1 AND i.is_not_for_replication = 0;

    -- Run each query with text output; once you have the output, paste the generated
    -- ALTER statements into a query window and execute them.

Another idea you already hit on is fill factor. If you use a low fill factor and combine it with either row- or page-level compression, your data density per page will increase. Again, this won’t eliminate page splits, BUT it can decrease the quantity of page splits, because the higher data density means more rows fit on each page before a split is required.
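
As a sketch of what that might look like (the index and table names are hypothetical; estimate the space savings first with sp_estimate_data_compression_savings):

    ALTER INDEX IX_Foo_fk_logtime ON dbo.Foo
    REBUILD WITH (DATA_COMPRESSION = ROW, FILLFACTOR = 90, SORT_IN_TEMPDB = ON);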

Lastly, if you can, look at changing your index to lead with the log_time column instead of the foreign key column. Depending on how the log_time data arrives (hopefully it is more “in order” than the foreign key values), this change could decrease your page splits significantly. If you pair it with ROW compression and a good partition scheme, you might see some significant improvements.
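
A sketch of the re-keyed index (the index name is hypothetical, and the INCLUDE list stands in for the same included columns as the original index):

    CREATE NONCLUSTERED INDEX IX_Foo_logtime_fk
    ON dbo.Foo (log_time, foreign_key)
    INCLUDE (...);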

A page split is a page split. You can’t control what SQL Server does when it needs to split a page; that is not a configurable setting. When SQL Server splits a page, it puts roughly half of the rows on the original page and half on the new page.

If you have “multiple” insertion points into the index, causing hot spots that undergo many page splits and heavy fragmentation, there isn’t much you can do to alleviate that with the current design. The way your data is dispersed sounds like the root of your problem: if you insert into the middle of an index, you will cause page splits and fragmentation.

There are a couple of things you can do, depending on how predictable your “insertion points” are. You could use table partitioning, which would let you rebuild the index on a particular partition (or partitions). This keeps cold data from receiving unnecessary maintenance and confines the duration of the rebuild to the hot, fragmented partitions.
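
A minimal sketch of that approach, assuming you partition on log_time by month (all names and boundary values here are illustrative):

    CREATE PARTITION FUNCTION pf_LogTime (datetime)
        AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-02-01', '2013-03-01');

    CREATE PARTITION SCHEME ps_LogTime
        AS PARTITION pf_LogTime ALL TO ([PRIMARY]);

    -- Rebuild only the hot partition, leaving cold partitions untouched:
    ALTER INDEX IX_Foo_fk_logtime ON dbo.Foo
        REBUILD PARTITION = 4;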

Depending on your edition, you could also utilize online index rebuild operations to reduce the availability impact on this maintenance procedure.
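
For example (Enterprise edition; note that in 2008 R2, ONLINE = ON applies to whole-index rebuilds, not single-partition rebuilds):

    ALTER INDEX IX_Foo_fk_logtime ON dbo.Foo
        REBUILD WITH (ONLINE = ON);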
