Question :
I have tables which holds transactions for properties that are stored as lat long pairs. (There are more columns and datapoints then in my example schema).
A common request is to find the transactions that happened within X miles of a particular point, and only retrieve the 5 most recent transactions that happened for each nearby property.
To make this work, I decided to add a view that encapsulates this logic of most recent:
create or alter view dbo.v_example
with schemabinding as
select example_id
,transaction_dt
,latitude
,longitude
,latlong
,most_recent= iif(row_number() over (partition by latitude,longitude order by transaction_dt desc) < 5,1,null)
from dbo.example;
So a query might look like this::
select *
from dbo.v_example
where latlong.STDistance(geography::Point(40,-74,4326)) <=1609.344e1
and most_recent = 1
Unfortunately, SQL Server does not want to use the spatial index when I query through the view. If I remove schemabinding
and try to add a hint on the view, I get that the query processor cannot create a plan.
How can I encapsulate the logic and still get it to use my spatial index?
Here’s a db<>fiddle with example data and plan shapes.
The table is much larger, and it is much slower to scan it then to do a clustered index seek and then find the near by point.
Answer :
Window functions & Views
Recently I answered a different question regarding views & window functions but not all answers given there apply here.
Two differences are that this example uses geography
datatypes and filters on the window function. The positive part here is that you are not bound to the window function and could use something like CROSS APPLY
to get the closest results for one latitude
& longitude
combination.
The previous question + Answer
In short, the window function is calculated before the filtering is applied.
E.G.
where latlong.STDistance(geography::Point(40,-74,4326)) <=1609.344e1
and most_recent = 1
Extra information can be found in this blogpost from 2013 by Paul White on window functions and views.
Testing
As a first test, the result is clear when omitting the ROW_NUMBER()
from the view
create or alter view dbo.v_example2
with schemabinding as
select example_id
,transaction_dt
,latitude
,longitude
,latlong
from dbo.example;
set statistics xml on;
select *
from dbo.v_example2
where latlong.STDistance(geography::Point(40,-74,4326)) <=1609.344e1
Resulting in the expected performant query plan.
But you obviously still want the additional filtering
Keeping the view + window function
You could choose to add the window function afterwards like so:
create or alter view dbo.v_example2
with schemabinding as
select example_id
,transaction_dt
,latitude
,longitude
,latlong
from dbo.example;
&
set statistics xml on;
;WITH CTE AS
(
select most_recent= iif(row_number() over (partition by latitude,longitude order by transaction_dt desc) < 5,1,null)
,*
from dbo.v_example2
where latlong.STDistance(geography::Point(40,-74,4326)) <=1609.344e1
)
SELECT
*
FROM CTE
WHERE most_recent is not null;
Again resulting in the expected execution plan.
Using an inline table valued function + window function
CREATE FUNCTION dbo.F_Example
(
@P1 INT
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
(
SELECT example_id
,transaction_dt
,latitude
,longitude
,latlong
,most_recent FROM
(
select example_id
,transaction_dt
,latitude
,longitude
,latlong
,most_recent= iif(row_number() over (partition by latitude,longitude order by transaction_dt desc) < 5,1,null)
from dbo.example
WHERE latlong.STDistance(geography::Point(40,-74,4326)) <= 1609.344e1
) AS A
WHERE most_recent= @P1
);
EXEC SP_EXECUTESQL
N'SELECT * FROM dbo.F_Example(@P1)',N'@P1 INT',@P1 = 1
Resulting in the query plan you expected.
Using a CTE + window function
;WITH CTE AS
(
select example_id
,transaction_dt
,latitude
,longitude
,latlong
,most_recent= row_number() over (partition by latitude,longitude order by transaction_dt desc)
FROM dbo.example
WHERE latlong.STDistance(geography::Point(40,-74,4326)) <=1609.344e1
)
SELECT example_id
,transaction_dt
,latitude
,longitude
,latlong
,most_recent
FROM CTE2
WHERE most_recent < 5;
Using CROSS APPLY
and TOP(4)
without the window function
SELECT example_id
,a.transaction_dt
,latitude
,longitude
,latlong
from dbo.example e1
CROSS APPLY(
SELECT TOP(4) transaction_dt
FROM dbo.example e2
WHERE e1.latitude = e2.latitude and e1.longitude = e2.longitude
GROUP BY latitude,longitude,transaction_dt
ORDER BY transaction_dt desc
) as a
WHERE latlong.STDistance(geography::Point(40,-74,4326)) <=1609.344e1
AND e1.transaction_dt = a.transaction_dt
ORDER BY transaction_dt desc;
I used top(4)
since the resultset was based on < 5
not <= 5
When using the CROSS APPLY
method there is one more index to be added, to remove the index spool & sort:
CREATE INDEX IX_latitude_longitude_transaction_dt
ON dbo.example(latitude,longitude,transaction_dt);
It goes without saying that you could add the CROSS APPLY
solution to a view and query it the same way as you used to, an example of this here.
All above examples in one DB<>Fiddle here.
End solution comment by MichaelB
Thanks Randi, I was able to rewrite my logic into a select subquery
using the idea of your cross apply. As an added bonus, if I don’t
mention the field in the columns I select, I do not get a performance
hit.
create or alter view dbo.v_example2
with schemabinding as
select example_id
,transaction_dt
,latitude
,longitude
,latlong
,most_recent= (
select a.transaction_dt
intersect
select top 5 b.transaction_dt
from dbo.example b
where a.latitude=b.latitude
and a.longitude=b.longitude
order by transaction_dt desc
)
from dbo.example a;
&
select example_id
,transaction_dt
,latitude
,longitude
,latlong
from dbo.v_example2
where latlong.STDistance(geography::Point(40,-74,4326)) <=1609.344e1
and most_recent is not null