Question :
I just noticed that in our of our databases we have a view that converts a BIT
column to an INT
. The underlying table has a computed BIT
column which is defined like this:
IncludeInJobTotals as (CONVERT(bit,
case
when FL.[Status]=(5)
or FL.[Status]=(3)
or FL.[Status]=(2)
then (1)
else (0)
end,(0)))
In a view that uses IncludeInJobTotals
we are casting it as an INT
. Are there any performance problems that this can cause? What other issues might there be? Are there any reasons to use INT
over BIT
?
Answer :
To answer the general question first, yes, data type in views absolutely can affect performance. Let’s mock up some data to illustrate a trivial example:
DROP TABLE IF EXISTS X_162798;
CREATE TABLE X_162798 (
NUM_VARCHAR VARCHAR(10),
NUM_INT INT,
[Status] INT,
IncludeInJobTotals as (CONVERT(bit,
case
when [Status]=(5)
or [Status]=(3)
or [Status]=(2)
then (1)
else (0)
end,(0)))
);
-- insert 1 M rows with 999990 of them having a value of 1 for IncludeInJobTotals
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO X_162798 WITH (TABLOCK)
SELECT
CAST(n AS VARCHAR(10))
, n
, CASE WHEN n <= 10 THEN 0 ELSE 5 END
FROM Nums
WHERE n <= 1000000;
UPDATE STATISTICS X_162798 WITH FULLSCAN;
-- index all the columns
CREATE INDEX IX_162798_NUM_VARCHAR ON X_162798 (NUM_VARCHAR);
CREATE INDEX IX_162798_NUM_INT ON X_162798 (NUM_INT);
CREATE INDEX IX_162798_STATUS ON X_162798 ([Status]);
CREATE INDEX IX_162798_JOB ON X_162798 (IncludeInJobTotals);
DROP VIEW IF EXISTS V_162798;
GO
CREATE VIEW V_162798
AS
SELECT
CAST(NUM_VARCHAR AS INT) NUM_INT_CASTED
, NUM_VARCHAR
, IncludeInJobTotals
, CONVERT(INT, IncludeInJobTotals) IncludeInJobTotals_INT
FROM X_162798;
GO
The following two queries return the same results but the performance is very different:
-- this query does a table scan and does an estimate based on density
SELECT *
FROM V_162798
WHERE NUM_INT_CASTED = '1';
-- this query does an index seek and does an estimate based on the histogram
SELECT *
FROM V_162798
WHERE NUM_VARCHAR = '1';
If I do the wrong type of data type conversion the filter is no longer sargable. It cannot use an index on the column and cardinality estimates offer suffer because the histogram cannot be used.
Does it matter in your case where you cast a BIT
column to an INTEGER
? We can filter on the IncludeInJobTotals
and IncludeInJobTotals_INT
columns in the view to run a few tests. Based on data type precedence we should expect that a BIT
will be converted to an INTEGER
in any comparison between a BIT
and an INTEGER
. SQL Server should be able to do an implicit conversion between them so I think that it will be difficult to find a performance problem.
Since we’re dealing with a bit column there aren’t that many possible test cases for simple filters. If I filter on either column of the view looking for rows with a 0
value I always get index seeks and a correct cardinality estimate:
-- all queries do an index seek and estimate 10 rows will be returned
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals = 0;
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals_INT = 0;
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals = CAST(0 AS BIT);
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals_INT = CAST(0 AS BIT);
If I filter on either column of the view looking for rows with a 1
value I always get a table scan and a correct cardinality estimate:
-- all queries do a table scan and estimate 999990 rows will be returned
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals = 1;
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals_INT = 1;
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals = CAST(1 AS BIT);
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals_INT = CAST(1 AS BIT);
Looking for out-of-range values works as expected too:
-- these queries do an index seek and estimate 1 row will be returned
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals = 2;
SELECT NUM_VARCHAR
FROM V_162798
WHERE IncludeInJobTotals_INT = 2;
I wouldn’t worry about performance for this type of data type change. It’s hard to imagine a scenario in which you would run into performance issues. However, it is important to note that queries using the view can return different results depending on how you use the column. The BIT data type behaves differently from INT
:
Converting to bit promotes any nonzero value to 1.
Going back to the data type precedence list, a VARCHAR
value will be converted to a BIT
value. This means that one way to get different results is to filter or join on an out-of-range VARCHAR
value:
CREATE TABLE TABLE_WITH_THE_NUMBER_2 (NUM VARCHAR(1));
INSERT INTO TABLE_WITH_THE_NUMBER_2 VALUES ('2');
-- this query returns 999990
SELECT COUNT(*)
FROM V_162798 v
INNER JOIN TABLE_WITH_THE_NUMBER_2 t ON v.IncludeInJobTotals = t.NUM;
-- this query returns 0
SELECT COUNT(*)
FROM V_162798 v
INNER JOIN TABLE_WITH_THE_NUMBER_2 t ON v.IncludeInJobTotals_INT = t.NUM;
Perhaps your application will never do this, but what do you want to happen in that scenario? I would focus on correctness here over performance.