Question :
We have a view which includes a sub-query like this:
SELECT
id AS OrderNo,
timestamp AS OrderDate,
email,
(SELECT TOP (1) reasonId
FROM dbo.pr_cancel WITH (NOLOCK)
WHERE (orderId = dbo.pr_orders.id)
ORDER BY id DESC) AS OrderReason
FROM dbo.pr_orders WITH (NOLOCK)
WHERE (pay_canceled IS NULL) AND (pay_error IS NULL)
If I use a “select * from view”, I get 700k results in a matter of seconds, if I request the results in chunks of 25’000, the second, third, fourth chunks take longer and longer:
declare @NumRows int
declare @i int
SET @NumRows = 25000 --return 10 rows at a time
SET @i = 3 --i want the 3rd resultset, 30-40
SELECT * FROM (
SELECT
row_number() OVER(ORDER BY BestellNr) AS RW,
dbo.dp_bestellungen.*
FROM dbo.dp_bestellungen ) myAlias
WHERE myAlias.RW BETWEEN (@NumRows * @i) AND ((@NumRows * @i)+ @NumRows)
Setting “i” to 5 or 10 makes the query extremely slow (up until 20 minutes).
Removing the sub-query results in fast answer times.
Is there a way to optimize the first query? I cannot change the second query as this is executed by an external system, where I don’t have access to.
PS: “dp_bestellungen” refers to the name of the view in the second query.
Answer :
The main reason for your slow query is the fact that you are using row_number()
.
Your subselect needs to create a row number over all rows in your table.
SELECT
row_number() OVER(ORDER BY BestellNr) AS RW,
dbo.dp_bestellungen.*
FROM dbo.dp_bestellungen
In fact, if you are using SQL Server 2012 or above, you have a quite better solution for this use case.
You can use OFFSET/FETCH NEXT
to achieve the same behaviour. OFFSET
defines how many rows should be skipped. FETCH NEXT
defines how much rows to return, after skipping the offset.
You can try the code below:
declare @NumRows int
declare @i int
SET @NumRows = 25000 --return 10 rows at a time
SET @i = 3 --i want the 3rd resultset, 30-40
SELECT dbo.dp_bestellungen.*
FROM dbo.dp_bestellungen
ORDER BY BestellNr OFFSET (@i*@NumRows) ROWS FETCH NEXT @NumRows ROWS ONLY;
In fact this is quite faster if you have a good index on BestellNr
, in the best case BestellNr
is your CLUSTERED INDEX
for this usecase.
If you can’t use OFFSET/FETCH
, I would try to reduce the load for ROW_NUMBER
using a selfdefined OFFSET
using this:
declare @NumRows int
declare @i int
SET @NumRows = 25000 --return 10 rows at a time
SET @i = 3 --i want the 3rd resultset, 30-40
SELECT * FROM (
SELECT TOP ((@i+1) * @NumRows)
row_number() OVER(ORDER BY BestellNr) AS RW,
dbo.dp_bestellungen.*
FROM dbo.dp_bestellungen
) myAlias
WHERE myAlias.RW >= (@NumRows * @i)
Your second query has a sort within it:
row_number() OVER(ORDER BY BestellNr)
causes the result to be sorted by BestellNr.
To optimize it you can save the whole result (your query without WHERE myAlias.RW BETWEEN (@NumRows * @i) AND ((@NumRows * @i)+ @NumRows)
) into a temp table and create a PK on RW.
But if the underlying tables change constantly, it’s not a solution because the temp table will store only a “snapshot” of your tables included in view at the moment you’ve cteated a temp table
First
select *
from ( SELECT id AS OrderNo,
, timestamp AS OrderDate
, email
, reasonId
, row_number() over (partion by orderId order by id desc) as rn
FROM dbo.pr_orders WITH (NOLOCK)
JOIN dbo.pr_cancel WITH (NOLOCK)
on orderId = dbo.pr_orders.id
and pay_canceled IS NULL
AND pay_error IS NULL
) tt
where tt.rn = 1