I have two tables containing exactly the same data. Both tables have a bigint primary key identity column, 60 columns and 300 000 rows. The difference is that all of the columns of the second table are of type sql_variant.
I am creating a temporary table and importing the data from the two tables into it. When data is extracted from the sql_variant columns it is cast to the corresponding SQL type.
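The population step amounts to something like the following sketch (the column names Col01/Col02 and their target types are placeholders standing in for the real 60-column list):

```sql
-- Sketch only: the real script generates this for all 60 columns.
-- Col01/Col02 and their target types are hypothetical names.
CREATE TABLE #Extracted
(
    ID     BIGINT        NOT NULL,
    Col01  BIGINT        NULL,
    Col02  NVARCHAR(100) NULL
);

INSERT INTO #Extracted (ID, Col01, Col02)
SELECT ID,
       CAST(Col01 AS BIGINT),        -- sql_variant -> bigint
       CAST(Col02 AS NVARCHAR(100))  -- sql_variant -> nvarchar
FROM [dbo].[SQLVariant01_60Cols];
```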
The data extraction from the first table completes in about 1 second, while the extraction from the second table takes several times longer.
Basically, the executions differ in the estimated row size and in the read-ahead reads count.
I am wondering why SQL Server cannot load in advance the data read from the sql_variant fields (there are almost no read-ahead reads).
Also, the storage size of the two tables is almost the same, yet SQL Server thinks it should read far more data from the sql_variant table.
The column types are:
16 x BIGINT, 8 x DECIMAL(9,2), 36 x NVARCHAR(100)
The DBCC DROPCLEANBUFFERS command is run each time before the data extraction and population.
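The measurement pattern described above amounts to something like this sketch (the commented INSERT stands in for the actual population script):

```sql
-- Remove clean pages from the buffer pool so every run starts cold
-- (requires sysadmin; avoid on a production server).
DBCC DROPCLEANBUFFERS;

-- Report I/O and CPU/elapsed time for the statements that follow.
SET STATISTICS IO, TIME ON;

-- The actual extraction/population statement goes here, e.g.:
-- INSERT INTO #Extracted (...) SELECT ... FROM [dbo].[SQLVariant01_60Cols];
```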
In order to test, you can download the sample data files from here. Then:
- Execute the Tech05_03_TestingInsertionInTempTable.sql file, once like this:
DECLARE @TableStructured SYSNAME = '[dbo].[NormalDataTypes_60Cols]';
DECLARE @TableFromWhichDataIsCoppied SYSNAME = '[dbo].[SQLVariant01_60Cols]';
and once like this:
DECLARE @TableStructured SYSNAME = '[dbo].[NormalDataTypes_60Cols]';
DECLARE @TableFromWhichDataIsCoppied SYSNAME = '[dbo].[NormalDataTypes_60Cols]';
There are a few issues with the Tech05_03_TestingInsertionInTempTable.sql file. It uses a function called [dbo].[ConcatenateWithOrder] which isn’t defined. I think I understood your intention and adjusted it to produce a few insert statements.
To answer your first question, it is not true that SQL Server cannot use read-ahead reads to get sql_variant data. If I run
DBCC DROPCLEANBUFFERS; immediately before the insert I get the following output from statistics io:
Table 'SQLVariant01_60Cols'. Scan count 1, logical reads 66311, physical reads 3, read-ahead reads 66307, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
To answer your second question about the estimated row size in the plan, SQL Server does not look at the space used by the table and divide by the number of rows. Instead the query plan contains an estimation based on the datatypes in the table. For more details read this post by Martin Smith.
SQL Server estimates your table to have a row size of 235 KB. I can get close to that:
240484.0 = 0.5 * (8016 * 60 + 8)
(60 sql_variant columns at their maximum size of 8,016 bytes each, plus the 8-byte bigint, with the total estimated at half its maximum.)
That’s a little off from 235 KB but I don’t know how the space estimate for sql_variants are calculated exactly. But you can see that you are likely to get overestimates when using a datatype that can store many different datatypes. For the record I don’t think that estimate has anything to do with the performance issue that you observed.
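You can reproduce that arithmetic directly, 8,016 bytes being the maximum size of a sql_variant value:

```sql
-- 60 sql_variant columns at their 8016-byte maximum, plus the bigint key,
-- assumed to fill half the maximum size on average.
SELECT 0.5 * (8016 * 60 + 8)        AS EstimatedRowBytes, -- 240484.0
       0.5 * (8016 * 60 + 8) / 1024 AS EstimatedRowKB;    -- ~234.8
```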
To answer the question that I think you wanted to ask (why is the query with sql_variant so much slower than other one?), I tested it a bit and there appears to be a large CPU overhead when selecting from the sql_variant table. I selected the rows instead of inserting them (discarding them before they reached the client) and found a significant difference in cpu time. The sql_variant query had a CPU time of 3151 ms and a total elapsed time of 3268 ms. The other query had a CPU time of 686 ms and a total elapsed time of 744 ms. The difference in CPU time almost fully explains the difference in run time.
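One common way to force the full scan and all the sql_variant conversions without sending anything to the client is to assign the columns to variables; a sketch along those lines (Col01/Col02 are placeholders for the real 60-column list):

```sql
-- Every column is read and converted, but no rows reach the client,
-- so the measurement isolates server-side CPU cost.
DECLARE @b BIGINT, @n NVARCHAR(100);

SELECT @b = CAST(Col01 AS BIGINT),
       @n = CAST(Col02 AS NVARCHAR(100))
FROM [dbo].[SQLVariant01_60Cols];
```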
It doesn’t seem unreasonable for there to be an overhead when working with sql_variant columns, right? I don’t know of any tricks to avoid paying the overhead. If you feel that your data model needs sql_variant columns I can only recommend testing to make sure that you can get good enough performance.