A fragmented clustered index does not perform well, even after a REBUILD of the index. If the index is instead REORGANIZED, performance for the given table/index improves.
I am seeing this unusual behavior only on SQL Server 2016 and above. I have tested this scenario on different hardware and different versions (all personal machines, all with conventional rotating hard disks). Let me know if you need any more information.
Is this a bug in SQL Server 2016 and onwards?
I can provide the complete details and analysis with the script if anybody wants, but I am not including them right now because the script is quite large and would take up a lot of space in the question.
Please test the shorter version of the sample script (taken from the link provided below) in your DEV environment if you have SQL Server 2016 or above.
-- SECTION 1
/* Create a TEST folder on the machine and specify the drive on which you created it */
USE MASTER;

CREATE DATABASE RebuildTest
ON  ( NAME = 'RebuildTest',
      FILENAME = 'F:\TEST\RebuildTest_db.mdf',
      SIZE = 200MB, MAXSIZE = UNLIMITED, FILEGROWTH = 50MB )
LOG ON
    ( NAME = 'RebuildTest_log',
      FILENAME = 'F:\TEST\RebuildTest_db.ldf',
      SIZE = 100MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10MB );
GO

BEGIN TRAN;

USE RebuildTest;

SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
INTO Numbers
FROM sys.all_columns a CROSS JOIN sys.all_columns;

CREATE TABLE [DBO].FRAG3
( Primarykey int        NOT NULL,
  SomeData3  char(1000) NOT NULL );

ALTER TABLE DBO.FRAG3 ADD CONSTRAINT PK_FRAG3 PRIMARY KEY (Primarykey);

-- Insert only even-numbered keys (n/2 = n/2.0 holds only when n is even),
-- then shift the upper half down so the pages interleave in key order,
-- leaving the clustered index heavily fragmented.
INSERT INTO [DBO].FRAG3
SELECT n, 'Some text..'
FROM Numbers
WHERE n/2 = n/2.0;

UPDATE DBO.FRAG3
SET Primarykey = Primarykey - 500001
WHERE Primarykey > 500001;

COMMIT;

-- SECTION 2
SELECT @@VERSION;

------------- BEGIN PART FRAG1.1: TEST THE FRAGMENTED TABLE
----- BEGIN: CLEAN BUFFERS AND MEASURE TIME
CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET STATISTICS TIME ON;
SELECT COUNT_BIG(*) FROM [DBO].[FRAG3] WHERE Primarykey > 0 OPTION (MAXDOP 1);
SET STATISTICS TIME OFF;
----- END: CLEAN BUFFERS AND MEASURE TIME

------------- BEGIN PART FRAG1.2: REBUILD THE INDEX AND TEST AGAIN
ALTER TABLE [DBO].[FRAG3] REBUILD;

CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET STATISTICS TIME ON;
SELECT COUNT_BIG(*) FROM [DBO].[FRAG3] WHERE Primarykey > 0 OPTION (MAXDOP 1);
SET STATISTICS TIME OFF;

------------- BEGIN PART FRAG1.3: REORGANIZE THE INDEX AND TEST AGAIN
ALTER INDEX ALL ON [DBO].[FRAG3] REORGANIZE;

CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET STATISTICS TIME ON;
SELECT COUNT_BIG(*) FROM [DBO].[FRAG3] WHERE Primarykey > 0 OPTION (MAXDOP 1);
SET STATISTICS TIME OFF;

------------- BEGIN PART FRAG1.4: REBUILD THE INDEX AND TEST AGAIN
ALTER TABLE [DBO].[FRAG3] REBUILD;

CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET STATISTICS TIME ON;
SELECT COUNT_BIG(*) FROM [DBO].[FRAG3] WHERE Primarykey > 0 OPTION (MAXDOP 1);
SET STATISTICS TIME OFF;
------------- END PART FRAG1.4: REBUILD THE INDEX AND TEST AGAIN
Crystal Disk Mark Test Results
I am seeing unusual behavior of the storage engine (maybe) on SQL Server 2016 and above. For demo purposes (read: problems with fragmentation), I created a highly fragmented table and then rebuilt it.
Even after rebuilding the index, performance does not improve as expected.
To make sure the data access pattern is in key order rather than IAM-driven (an allocation-order scan), I used a range predicate.
Initially, I thought SQL Server 2016 and above might simply be more aggressive with large scans. To check that, I adjusted the page count and row count, but the performance pattern did not change. I tested everything on a personal system, so I can say no other user activity was going on.
I have tested this behavior on other hardware as well (all with traditional rotating hard disks). The performance patterns are almost the same.
I have checked wait stats (using Paul Randal's script); all seem normal, with only PAGELATCH_IO showing up. I also checked the data pages using the DMV sys.dm_db_database_page_allocations, and they seem OK as well.
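For anyone reproducing this, a sketch of those checks against the FRAG3 table from the script above (run in the RebuildTest database; the column list for the undocumented DMV is my illustration of what I inspected, not an exact transcript):

-- Fragmentation summary for the clustered index (index_id = 1) on FRAG3.
SELECT index_level, page_count, fragment_count,
       avg_fragmentation_in_percent, avg_fragment_size_in_pages
FROM sys.dm_db_index_physical_stats(DB_ID(N'RebuildTest'),
                                    OBJECT_ID(N'dbo.FRAG3'),
                                    1, NULL, 'DETAILED');

-- Undocumented page-level view of the same allocations (DEV use only):
-- compare each data page's next/previous pointers with its physical page id
-- to see whether the key order matches the on-disk order.
SELECT allocated_page_file_id, allocated_page_page_id,
       previous_page_page_id, next_page_page_id
FROM sys.dm_db_database_page_allocations(DB_ID(N'RebuildTest'),
                                         OBJECT_ID(N'dbo.FRAG3'),
                                         1, NULL, 'DETAILED')
WHERE is_allocated = 1 AND page_type_desc = 'DATA_PAGE'
ORDER BY allocated_page_page_id;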
If I reorganize the table, or move all the data to a new table with the same index definition, disk IO performance increases. I have checked this with perfmon, and it seems the reorganized index / new table can leverage sequential IO, while the rebuilt index still uses random reads, despite the fact that both have almost the same data page counts and internal/external fragmentation.
I am attaching the complete query with the results I captured on my system.
If you have a SQL Server 2016 (or above) DEV box, please test this and share your results.
WARNING: This test contains some undocumented commands and DROPCLEANBUFFERS, so do not run it on a production server.
If this is really a bug, I think I should file it.
So the question is: is it really a bug, or am I missing something? 😉
The queries in question exercise the SQL Server read-ahead feature. With the read-ahead performance optimization, the storage engine prefetches data during scans so that pages are already in the buffer cache when the query needs them, and less time is spent waiting for data during query execution.
The difference in execution times with read-ahead reflects how well (or not) the storage system and Windows APIs handle large IO sizes, along with differences in SQL Server read-ahead behavior that vary by version. Older SQL Server versions (SQL Server 2008 R2 in the aforementioned article) limit prefetch to 512K IO sizes, whereas SQL Server 2016 and later issue read-ahead IO in larger sizes to leverage the capabilities of modern production-grade commodity hardware (RAID and SSD). Keep in mind that SQL Server is generally optimized to run on current-generation hardware at the time of release, exploiting larger processor caches, NUMA architectures, and storage system IOPS/bandwidth capability. Furthermore, Enterprise/Developer editions perform prefetch more aggressively than lesser editions to maximize throughput even more.
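A quick way to see read-ahead at work on your own copy of the test table: SET STATISTICS IO reports "read-ahead reads" separately from ordinary physical reads. This is a minimal sketch against the FRAG3 table from the question's script, not the instrumentation used for the timings below:

-- Flush clean pages so the scan must hit disk, then compare read-ahead
-- reads (prefetched pages) with plain physical reads for the scan.
CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET STATISTICS IO ON;
SELECT COUNT_BIG(*) FROM dbo.FRAG3 WHERE Primarykey > 0 OPTION (MAXDOP 1);
SET STATISTICS IO OFF;
-- The Messages tab reports, per table: scan count, logical reads,
-- physical reads, and read-ahead reads.

When read-ahead is effective, most pages arrive via "read-ahead reads" and "physical reads" stays near zero; the exact numbers will vary by version, edition, and storage.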
To better understand the reason for the different performance of SQL 2008 R2 compared to later versions, I executed modified versions of your scripts on an older physical machine with different versions of SQL Server Developer Edition. This test box has both a 7200 RPM HDD and a SATA SSD, allowing the same test to be run on the same machine against different storage systems and SQL versions. I captured file_read and file_read_completed events during each test with an Extended Events trace for more detailed analysis of IO and timings.
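For reference, an Extended Events session along these lines can capture the individual read requests; the session name, target path, and database filter here are my own illustration, not the exact trace definition I used:

-- Illustrative XE session capturing storage-engine reads for one database.
-- Adjust the database filter and target path for your environment.
CREATE EVENT SESSION [ReadAheadTrace] ON SERVER
ADD EVENT sqlserver.file_read(
    WHERE (sqlserver.database_name = N'RebuildTest')),
ADD EVENT sqlserver.file_read_completed(
    WHERE (sqlserver.database_name = N'RebuildTest'))
ADD TARGET package0.event_file(SET filename = N'F:\TEST\ReadAheadTrace.xel');
GO

ALTER EVENT SESSION [ReadAheadTrace] ON SERVER STATE = START;
-- ... run the timed scans here, then:
ALTER EVENT SESSION [ReadAheadTrace] ON SERVER STATE = STOP;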
The results show roughly comparable performance with all SQL Server versions and storage system types except for SQL Server 2012 and later versions on a single HDD spindle following the clustered index rebuild. Interestingly, the XE trace showed “Contiguous” mode during read-ahead scans in SQL Server 2008 R2 only; the trace showed “Scatter/Gather” mode was used in all other versions. I can’t say if this difference contributes to the faster performance.
Also, analysis of the trace data shows SQL 2016 issues much larger reads during read-ahead scans, and the average IO size varies by storage type. This doesn't necessarily mean SQL Server adjusts the read-ahead IO size based on the physical hardware; it may instead adjust the size based on measurements it does not expose. The heuristics used by the storage engine are not documented and may vary by version and patch level.
Below is a summary of test timings. I'll add more information gathered from the traces when I have some more time (unfortunately, IO size is not available in the SQL Server 2008 R2 XE data). In summary, the IO profile differs by version and storage type. The average IO size for versions through SQL Server 2014 never exceeded 512K, whereas SQL Server 2016 read over 4MB in a single IO in these tests. The number of outstanding reads was also much lower in the SQL 2016 test because SQL Server issued fewer IO requests to accomplish the same work.
SQL Version    Storage Device   Test                      Duration
-----------    --------------   -----------------------   ------------
SQL 2008 R2    HDD              initial table             00:00:03.686
SQL 2012       HDD              initial table             00:00:03.725
SQL 2014       HDD              initial table             00:00:03.706
SQL 2016       HDD              initial table             00:00:03.654
SQL 2008 R2    HDD              fragmented table          00:00:07.796
SQL 2012       HDD              fragmented table          00:00:08.026
SQL 2014       HDD              fragmented table          00:00:07.837
SQL 2016       HDD              fragmented table          00:00:06.097
SQL 2008 R2    HDD              after rebuild             00:00:06.962
SQL 2012       HDD              after rebuild             00:00:21.129
SQL 2014       HDD              after rebuild             00:00:19.501
SQL 2016       HDD              after rebuild             00:00:21.377
SQL 2008 R2    HDD              after reorg               00:00:04.103
SQL 2012       HDD              after reorg               00:00:03.974
SQL 2014       HDD              after reorg               00:00:04.076
SQL 2016       HDD              after reorg               00:00:03.610
SQL 2008 R2    HDD              after reorg and rebuild   00:00:07.201
SQL 2012       HDD              after reorg and rebuild   00:00:21.839
SQL 2014       HDD              after reorg and rebuild   00:00:20.199
SQL 2016       HDD              after reorg and rebuild   00:00:21.782
SQL 2008 R2    SATA SSD         initial table             00:00:02.083
SQL 2012       SATA SSD         initial table             00:00:02.071
SQL 2014       SATA SSD         initial table             00:00:02.074
SQL 2016       SATA SSD         initial table             00:00:02.066
SQL 2008 R2    SATA SSD         fragmented table          00:00:03.134
SQL 2012       SATA SSD         fragmented table          00:00:03.129
SQL 2014       SATA SSD         fragmented table          00:00:03.129
SQL 2016       SATA SSD         fragmented table          00:00:03.113
SQL 2008 R2    SATA SSD         after rebuild             00:00:02.065
SQL 2012       SATA SSD         after rebuild             00:00:02.097
SQL 2014       SATA SSD         after rebuild             00:00:02.071
SQL 2016       SATA SSD         after rebuild             00:00:02.078
SQL 2008 R2    SATA SSD         after reorg               00:00:02.064
SQL 2012       SATA SSD         after reorg               00:00:02.082
SQL 2014       SATA SSD         after reorg               00:00:02.067
SQL 2016       SATA SSD         after reorg               00:00:02.072
SQL 2008 R2    SATA SSD         after reorg and rebuild   00:00:02.078
SQL 2012       SATA SSD         after reorg and rebuild   00:00:02.087
SQL 2014       SATA SSD         after reorg and rebuild   00:00:02.087
SQL 2016       SATA SSD         after reorg and rebuild   00:00:02.079
I also ran these tests on a VM with an HDD-backed SAN and observed performance similar to the SATA SSD. So the bottom line is that this performance issue occurs only with a single-spindle HDD data file, something that is common only on PCs, not on modern production systems. Whether or not this should be considered a performance regression bug is questionable, but I'll reach out to see if I can get more information.
I reached out to my contacts, and Erland Sommarskog pointed out the higher fragment count reported by sys.dm_db_index_physical_stats in later versions. Digging deeper, I noticed the REBUILD statement runs as a parallel query. The implication is that a parallel rebuild may actually increase the fragment count (and even introduce fragmentation in a table with no fragmentation) because space allocations are done in parallel. This will generally not affect performance except with read-ahead scans, and it is particularly a problem on single-spindle spinning media, as the tests show. This is a consideration in all SQL versions.
Paul Randal pointed this out as being by design and referenced this document for more information. A best practice for index rebuilds for a workload that will leverage read-ahead scans (e.g. data warehousing) is to rebuild WITH (MAXDOP = 1);
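Putting the recommendation together, a minimal sketch against the FRAG3 table from the question's script (the index name PK_FRAG3 comes from that script); comparing fragment_count after a default rebuild versus a serial one shows the effect:

-- Serial rebuild: space allocations happen on a single thread, producing
-- fewer, larger fragments that read-ahead can turn into sequential IO.
ALTER INDEX PK_FRAG3 ON dbo.FRAG3 REBUILD WITH (MAXDOP = 1);

-- Leaf-level fragment statistics for the clustered index; run this after a
-- default (possibly parallel) rebuild and again after the MAXDOP 1 rebuild.
SELECT index_id, index_level, fragment_count,
       avg_fragment_size_in_pages, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.FRAG3'),
                                    1, NULL, 'DETAILED')
WHERE index_level = 0;

On my tests the serial rebuild reported fewer, larger fragments, which is exactly what a single-spindle HDD needs for read-ahead to stay sequential.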