Question :
Windows Server 2012, Microsoft SQL Server.
I have a stored procedure (see below) that creates a view that I need to query.
The stored procedure part works great and takes like 5 seconds to finish, and the view is created.
The view has about 30-35k rows.
My problem is that running a simple query against the created view takes about 20 minutes! A simple query such as:
SELECT COUNT(*) FROM MY_VIEW
The above query takes about 20 minutes to finish until it returns me the number of rows. Running the same query against the actual table (that the view contains) returns the results instantly!
I am not sure if the stored procedure is even related since the views are created instantly and querying them is what I am having issue with, but I am posting it just in case.
I’d like to mention that other views created by the same stored procedures, that contains small amount of rows (several hundreds) are responding to queries fairly fast… so the amount of rows is surely a factor here.
What I don’t get is why querying a 30k rows table returns results in 2 seconds, and the same query when executed against a 30k rows view, takes 20 minutes.
Stored procedure
USE [QUARTERLY_SEC_REPORT]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[DynamicView_QR_VisitsDistSummary]
AS
BEGIN
DECLARE @CurrentView nvarchar(MAX) = null
DECLARE @SchemaName nvarchar(400)
DECLARE @TableName nvarchar(400)
DECLARE @DynSQL nvarchar(MAX)
DECLARE @DateModifier nvarchar(400)
DECLARE @DynDROP nvarchar(MAX) = 'DROP VIEW Unified_QR_VisitsDistSummary'
DECLARE @InclusionTable nvarchar(MAX) = '[dbo].[QUARTERLY_VIEW]'
Set @DynSQL = 'CREATE VIEW Unified_QR_VisitsDistSummary AS '
set @CurrentView = (select VIEW_DEFINITION from INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA='dbo' and TABLE_NAME='Unified_QR_VisitsDistSummary')
DECLARE cursor1 CURSOR FOR
select TABLE_SCHEMA,TABLE_NAME
from INFORMATION_SCHEMA.TABLES
where
TABLE_SCHEMA='dbo' AND
TABLE_NAME like 'visits_dist_summary_ACC_%'
OPEN cursor1
FETCH NEXT FROM cursor1 INTO @SchemaName, @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
-- Add the select code.
Set @DateModifier = '( SELECT MAX([retrieved_at]) FROM '+ @SchemaName +'.' + @TableName + ')'
Set @DynSQL = @DynSQL + 'Select * from ' + @SchemaName +'.' + @TableName +' INNER JOIN '+ @InclusionTable+ ' ON '+ @InclusionTable +'.AccountID = ' + @SchemaName +'.' + @TableName+ '.Account_ID WHERE ' + @InclusionTable +'.Appear_In_View =''True'' AND (retrieved_at =' + @DateModifier +' OR retrieved_at = DATEADD (MINUTE, -1, '+@DateModifier+ ')'+' OR retrieved_at = DATEADD (MINUTE, -2, '+@DateModifier+ ')' +' OR retrieved_at = DATEADD (MINUTE, -3, '+@DateModifier+ '))'
FETCH NEXT FROM cursor1
INTO @SchemaName, @TableName
-- If the loop continues, add the UNION ALL statement.
If @@FETCH_STATUS = 0
BEGIN
Set @DynSQL = @DynSQL + ' UNION ALL '
END
END
IF @CurrentView = @DynSQL
PRINT 'VIEW IS THE SAME, NEW VIEW WASN''T CREATED'
ELSE
BEGIN
if @CurrentView is not null
BEGIN
print @DynDROP
exec sp_executesql @DynDROP
END
PRINT @DynSQL
exec sp_executesql @DynSQL
END
END
View definition
SELECT *
FROM dbo.visits_dist_summary_ACC_12345 INNER JOIN
[dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_12345.Account_ID
WHERE [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345) OR
retrieved_at = DATEADD(MINUTE, - 1,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345)) OR
retrieved_at = DATEADD(MINUTE, - 2,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345)) OR
retrieved_at = DATEADD(MINUTE, - 3,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345)))
UNION ALL
SELECT *
FROM dbo.visits_dist_summary_ACC_22222 INNER JOIN
[dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_22222.Account_ID
WHERE [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222) OR
retrieved_at = DATEADD(MINUTE, - 1,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222)) OR
retrieved_at = DATEADD(MINUTE, - 2,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222)) OR
retrieved_at = DATEADD(MINUTE, - 3,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222)))
UNION ALL
SELECT *
FROM dbo.visits_dist_summary_ACC_77777 INNER JOIN
[dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_77777.Account_ID
WHERE [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777) OR
retrieved_at = DATEADD(MINUTE, - 1,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777)) OR
retrieved_at = DATEADD(MINUTE, - 2,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777)) OR
retrieved_at = DATEADD(MINUTE, - 3,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777)))
Table Structure (sp_help output)
(The view “aggregates” the same table , for multiple accounts)
Name
visits_dist_summary_ACC_12345
account_id,varchar,no,12, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
siteid,varchar,no,12, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
countryCode,varchar,no,50, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
countryCount,float,no,8,53 ,NULL,yes,(n/a),(n/a),NULL
agentCode,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
agentCount,float,no,8,53 ,NULL,yes,(n/a),(n/a),NULL
retrieved_at,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
relevant_month,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
domain_name,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
Identity Seed Increment Not For Replication
No identity column defined. NULL NULL NULL
RowGuidCol
No rowguidcol column defined.
Data_located_on_filegroup
PRIMARY
Here is the execution plan.
Answer :
The view has about 30-35k rows.
A view (without a clustered index) is simply a stored query definition. It does not contain any rows directly.
My problem is that running a simple query against the created view takes about 20 minutes
This requires executing the stored query definition. The base tables (and view query) suffer from some data type issues and a lack of useful indexing, which result in an extraordinary amount of work being performed each time the view is accessed (explained below).
Data type and correctness issues
The column retrieved_at
is currently typed as varchar(100)
. You should use a proper date/time type instead. Aside from the performance considerations, you are almost certainly getting incorrect results right now:
MAX(retrieved_at)
will find the string that sorts highest, not the most recent value in datetime
terms. The comparisons that involve a DATEADD
end up converting to a datetime
, but only after the MAX
has been found (as a string).
Ideally, you would convert the base tables so that correct data types are used, for example using:
ALTER TABLE dbo.visits_dist_summary_ACC_12345
ALTER COLUMN retrieved_at datetime NOT NULL;
Overall strategy
It’s not clear from the question, but it is possible that you are intending to store a snapshot of the data from time to time. If that is the case, it would be much more efficient to write the result of your dynamic query to a permanent table than a view.
Current execution plan
The plan you provided highlights several issues. Some of this may be irrelevant given the points already mentioned, so this is provided for interest.
The hash join on account ID produces 36,222 rows, where only 1 was expected. This indicates that statistics are out of date on one or both of the tables involved in that join.
Updating statistics may improve that estimate, though you might need to go further, e.g. creating a filtered index (or statistics) on the [QUARTERLY_VIEW]
table with an Appear_In_View = 'true'
predicate. As a side note, if that column is true/false, a better data type choice than varchar
would be bit
.
The rest of the plan is driven by a nested loops left semi join. For each of the 36,222 rows coming from the hash join, SQL Server:
- Reads every row from the base table (Table Scan)
- Finds the
MAX()
retrieved_at` (Stream Aggregate) - Tests to see if the result matches the outer
retrieved_at
value.
Note that this process (full scan, aggregate, filter) occurs for every single one of the 36,222 rows produced by the initial hash join.
Worse, if the first scan-aggregate-filter branch does not find a match (satisfying the semi join), SQL Server goes on to run the same process again in full, for the -1, -2, and -3 minute cases.
The numbers shown in the execution plan above (using SQL Sentry Plan Explorer) indicate the total number of rows produced by each operator over all iterations of the nested loops join. In SSMS, you would need to look at the Actual Number of Rows property for each operator.
For the uppermost scan-aggregate-filter branch, this total is 1,312,033,284 rows. The second branch contributes an additional 436,185,324 rows. It would be worse if the -2 and -3 minute cases were ever needed to find a matching row. Hopefully, you can see why the ‘simple query’ runs for 20 minutes.
Actions
- Correct the data types
- Update statistics
- Use a table to store a static snapshot, if that meets your needs
-
Create an index on
retrieved_at
e.g.CREATE NONCLUSTERED INDEX i2 ON dbo.visits_dist_summary_ACC_12345 (retrieved_at);
-
Evaluate a clustered index on
account_id
e.g.CREATE CLUSTERED INDEX i1 ON dbo.visits_dist_summary_ACC_12345 (account_id);
The above steps should improve performance very significantly, especially the index on retrieved_at
(typed as datetime).
A query rewrite may be needed to avoid computing the MAX
four times, due to optimizer limitations/priorities, but the index should make that operation trivial (reading one row from the end of the index), so ought not to be
necessary in practical terms.
In case it is useful, one query rewrite approach is:
SELECT
QV.AccountID,
QV.Appear_In_View,
QV.Report_Name,
VDSA.account_id,
VDSA.siteid,
VDSA.countryCode,
VDSA.countryCount,
VDSA.agentCode,
VDSA.agentCount,
VDSA.retrieved_at,
VDSA.relevant_month,
VDSA.domain_name
FROM dbo.QUARTERLY_VIEW AS QV
JOIN dbo.visits_dist_summary_ACC_12345 AS VDSA
ON VDSA.account_id = QV.AccountID
WHERE
QV.Appear_In_View = 'True'
AND VDSA.retrieved_at IN
(
SELECT
V.max_date_candidates
FROM
(
-- Compute maximum date once
SELECT TOP (1)
VDSA2.retrieved_at
FROM dbo.visits_dist_summary_ACC_12345 AS VDSA2
ORDER BY
VDSA2.retrieved_at DESC
) AS Q (max_retrieved_at)
CROSS APPLY
(
-- Generate four rows based on the maximum date
VALUES
(DATEADD(MINUTE, -0, Q.max_retrieved_at)),
(DATEADD(MINUTE, -1, Q.max_retrieved_at)),
(DATEADD(MINUTE, -2, Q.max_retrieved_at)),
(DATEADD(MINUTE, -3, Q.max_retrieved_at))
) AS V (max_date_candidates)
);
The estimated plan for this query with the data type and index changes above: