SQL Server spatial filter with additional conditions performance issue

Posted on

Question :

I have a table Event that has 30 million rows and a table Location that has 35 million rows in SQL Server. An event can have multiple locations.

I have a spatial index, geometry auto grid (cells per object: 4000), on the Location table. The bounding box of all the locations covers and area of about 500 000 km2.

The performance is fine when only using STContains in the WHERE clause. If I try to filter on a additional column(s) the performance takes a big hit.

The following query takes 171ms:

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
WHERE 
    --[E].[Price] >= @Price
    --AND
    @BoundingBox.STContains([L].[Position]) = 1

If I add the filter for Price, the same query takes 10 seconds to run. If I remove the spatial index the query takes 500ms. Depending on the size of the bounding box and the value of price the query takes either like second or less or anywhere between three to thirty seconds. Only filtering on the spatial column gives good performance for all bounding box sizes.

Every article I’ve found about spatial queries in SQL Server never says anything about additional column(s) filters. They only describes the spatial filter and how to tune performance for the spatial index/query and it this case the spatial filter works fine with different sized bounding boxes. The problem is the additional filter columns I need.

Is it possible to get performance from a query with a spatial filter and with additional column(s) filters?

EDIT

95% of the price values are below 5000. So filtering on a high value will only return around 100 rows. I’ve tried with a bounding box that is 24000 sq km and contains around 4 million events. So the issue might be with the distribution of the price values. I have a second column Date which has a better distribution of values and filtering on that column is fast as long as we don’t get to close the the max date.

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
WHERE 
    [E].[Date] >= @Date
    AND
    @BoundingBox.STContains([L].[Position]) = 1

Also the geometry have srid 4326 if that makes any difference.

Answer :

The answer is yes it is possible to get performance with filters over multiple columns including spatial columns.

I’ve tried to replicate your tables and closely as I can to your brief description using RAND to create prices, location points and link them to events. But I can’t duplicate you performance problem. Have I got the basic table details right?

--  Event table
SELECT TOP 30000000 
    N EventID, 
    RAND(CAST(NEWID() AS VARBINARY)) * 10000 + 5000 Price
    into Event
FROM Tally;

ALTER TABLE Event ALTER COLUMN EventID INT NOT NULL;
GO
ALTER TABLE Event ADD CONSTRAINT Event_PK PRIMARY KEY (EventID);
GO
CREATE INDEX Event_Price_IDX ON Event(Price);
GO

-- Location Table
SELECT TOP 35000000
    N LocationID, 
    CEILING(RAND(CAST(NEWID() AS VARBINARY)) * 30000000) EventID,
    Geometry::Point(RAND(CAST(NEWID() AS VARBINARY)) * 710000 + 40000, RAND(CAST(NEWID() AS VARBINARY)) * 710000 + 40000, 0) Location
    into Location
FROM Tally;

Running the 3 versions of your query with hopefully sensible parameters

-- Test Query
DECLARE @price FLOAT = 13000.00 -- ~20% of records
-- 10 sq km query window
DECLARE @BoundingBox Geometry = Geometry::STGeomFromText('POLYGON(( 300000 300000, 310000 300000, 310000 310000, 300000 310000, 300000 300000))',0)

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[EventId] = [L].[EventId]
WHERE 
    --[E].[Price] >= @Price
    --AND
    @BoundingBox.STContains([L].[Location]) = 1

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[EventId] = [L].[EventId]
WHERE 
    [E].[Price] >= @Price
    --AND
    --@BoundingBox.STContains([L].[Location]) = 1

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[EventId] = [L].[EventId]
WHERE 
    [E].[Price] >= @Price
    AND
    @BoundingBox.STContains([L].[Location]) = 1

I get 93ms, 4ms and 93ms respectively, removing the TOP 2000 from the queries causes the price filtered query to blow out to a minute or so.

Are you able to give more details (execution plans, DDL, etc) to help us replicate your issue. My initial thought was that the TOP 2000 was hiding the underlying problem. I think that may still be the case.

Would you please try the following with spatial index on the Location-table and report back:

SELECT *
FROM (
  SELECT TOP 2000 [E].*
  FROM [dbo].[Event] AS [E]
  INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
  WHERE [E].[Price] >= @Price
  ) AS INNER_QUERY
WHERE @BoundingBox.STContains(INNER_QUERY.[Position]) = 1

We probably encountered the same problem and the execution plan seems ridiculous. The exact circumstances our problem occured under provided the main part of this workaround, that is using regular filters in an inner query, spatial filters in the outer query and using top to limit the inner queries’ results (any number will do).

EDIT:

To expand on the issue, it seems to be an adverse decision of the query optimizer under certain conditions (might be table size, i can’t reproduce the issue with ~1m entries with index on filter column). The execution plan looks like it would do the spatial filter over the whole table instead of the filtered subset but the actual time suggests both filters actually do interact (A query just using the spatial filter takes a lot longer). No idea what exactly is going on. In our case query cost was times 100 of what it should be. Other than the weird workaround i posted you can also force a query to use a certain index. so that would probably be a better idea:

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E] WITH (INDEX(<INDEX_NAME>))
INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
WHERE [E].[Price] >= @Price
AND @BoundingBox.STContains([L].[Position]) = 1

SQL Server Version used: Microsoft SQL Server 2014 (SP2-GDR)

Leave a Reply

Your email address will not be published. Required fields are marked *