Does SQL Server use pointers instead of storing duplicate rows?

Posted on

Question :

I’m using the built in sp_spaceused stored procedure before and after performing a operation in our software to see which tables have row inserts and how the size of each table changes.

What I’m seeing is that out of all the tables that get rows written to them, only a handful show that the table has increased in side. The others that show rows were added show no change in size from this stored procedure.

The only case this is not true is on the first transaction after performing a truncate on all the tables. So to me it appears that instead on storing duplicate data SQL Server is showing rows are inserted but must be just storing pointers to previous identical rows.

Can anyone confirm this please?

Answer :

No, SQL Server does not detect duplicate rows

SQL Server is filling up empty or partially empty pages within the allocated pages.

So if I have a very narrow row (2 columns say), I can add a few hundred more rows on the same page without increasing space used.

Quick and dirty demo (without duplicate rows, but you can play with this if you want)

IF OBJECT_ID('dbo.Demo') IS NOT NULL
    DROP TABLE dbo.Demo;
GO
CREATE TABLE dbo.Demo (DemoID int NOT NULL IDENTITY(1,1), Demo char(1) NOT NULL)
GO
SELECT 'zero rows, zero space', SUM(ps.reserved_page_count)/128.0 AS ReservedMB, SUM(ps.used_page_count)/128.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
GO

INSERT dbo.Demo VALUES ('a');
GO
SELECT 'one row. Peanuts', SUM(ps.reserved_page_count)/128.0 AS ReservedMB, SUM(ps.used_page_count)/128.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
GO

INSERT dbo.Demo VALUES ('b');
GO 100
SELECT '101 rows. All on one page', SUM(ps.reserved_page_count)/128.0 AS ReservedMB, SUM(ps.used_page_count)/128.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
GO

INSERT dbo.Demo VALUES ('b');
GO 1899
SELECT '2000 rows. More than one page', SUM(ps.reserved_page_count)/128.0 AS ReservedMB, SUM(ps.used_page_count)/128.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
GO

TRUNCATE TABLE dbo.Demo
GO
SELECT 'zero rows, zero space. TRUNCATE deallocates pages', SUM(ps.reserved_page_count)/128.0 AS ReservedMB, SUM(ps.used_page_count)/128.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
GO

INSERT dbo.Demo VALUES ('c');
GO 500
SELECT '500 rows. Some space used', SUM(ps.reserved_page_count)/128.0 AS ReservedMB, SUM(ps.used_page_count)/128.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
GO

DELETE dbo.Demo
GO
SELECT 'zero rows after delete. Space still allocated', SUM(ps.reserved_page_count)/128.0 AS ReservedMB, SUM(ps.used_page_count)/128.0 AS UsedMB
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
GO

IF OBJECT_ID('dbo.Demo') IS NOT NULL
    DROP TABLE dbo.Demo;
GO

Does SQL Server use pointers instead of storing duplicate rows?

It depends on SQL Server’s version and data compression options:

  • Starting from SQL Server 2008, there is a compression option at row or page level.
  • Page level compression uses many algorithms/techniques for compression. Regarding your question (pointers for duplicated data), page compression uses (also) prefix compression and dictionary compression:

Prefix Compression […]The repeated prefix values in the column are
replaced by a reference to the corresponding prefix […]

Dictionary Compression After prefix compression has been completed,
dictionary compression is applied. Dictionary compression searches for
repeated values anywhere on the page, and stores them in the CI area.
Unlike prefix compression, dictionary compression is not restricted to
one column. Dictionary compression can replace repeated values that
occur anywhere on a page. The following illustration shows the same
page after dictionary compression.

So, for prefix and dictionary compression (page compression), SQL Server uses pointers to store (partially or fully) duplicated values (not duplicated rows) within same column or within diff. columns.

CREATE DATABASE TestComp;
GO

USE TestComp;
GO

CREATE TABLE Person1 (
    PersonID INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(100) NOT NULL,
    LastName NVARCHAR(100) NOT NULL
);
GO

DECLARE 
    @f NVARCHAR(100) = REPLICATE('A',100), 
    @l NVARCHAR(100) = REPLICATE('B',100);

INSERT Person1 (FirstName, LastName)
VALUES (@f, @l);
GO 1000

CREATE TABLE Person2 (
    PersonID INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(100) NOT NULL,
    LastName NVARCHAR(100) NOT NULL
);
GO

ALTER TABLE Person2
REBUILD
WITH (DATA_COMPRESSION=PAGE);
GO

DECLARE 
    @f NVARCHAR(100) = REPLICATE('A',100), 
    @l NVARCHAR(100) = REPLICATE('B',100);

INSERT Person2 (FirstName, LastName)
VALUES (@f, @l);
GO 1000

SELECT  f.page_count AS PageCount_Person1_Uncompressed
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Person1'), 1, DEFAULT, DEFAULT) f
SELECT  f.page_count AS PageCount_Person2_Compressed
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Person2'), 1, DEFAULT, DEFAULT) f
GO

Results:

PageCount_Person1_Uncompressed
------------------------------
53

PageCount_Person2_Compressed
----------------------------
2

Leave a Reply

Your email address will not be published. Required fields are marked *