Composite multicolumn index for geopoint range and numeric range query

Posted on

Question :

I am building an app where the server needs to select rows based on some criteria/filters. One of them is the location of the user and the radius at which the user want’s to see posts and other filters such date range and filter for a value of another column. This is going to be for an ad-hoc event discovery app.

I have read about PostGIS, its geometry,geography types and I know there is a native point datatype. Based on this answer I understood that it is better to order from equality to range columns, even though I feel like geo point column should be the first.

Suppose the following few rows of a simplified events table (disregard the validity position data):

id  event_title                  event_position   event_type  is_public  start_date
    (varchar)                    (point lat/lon)  (smallint)  (boolean)  (timestamptz)
--  ---------------------------  ---------------  ---------   ---------  ----
 1  "John's Party"               (122,35)         0           0          2020-07-05
 2  "Revolution then Starbucks"  (123,30)         1           1          2020-07-06
 3  "Study for math exam"        (120,36)         2           1          2020-07-07
 4  "Party after exam"           (120,36)         1           1          2020-07-08
 5  "Hiking next to the city"    (95,40)          3           1          2020-07-09
 6  "Football match"             (-42,31)         4           1          2020-07-10

Imagine the table contains several thousand records at least, obviously not only 6.

So in this table a user would be able to query public events close to (122,34) by 100km (suppose first three rows fall into this area) and of event types 0, 1 or 2 falling between dates 2020-07-05 and 2020-07-07. The user would get the rows with ID 2 and 3.

This is the query I want to optimize with an appropriate index. My question is, how is it possible to create such an index? I thought about GiST or GIN index but not sure how these could help. Thanks!

Answer :

@jjanes’ answer on stackoverflow that I accepted

With the help of the btree_gist extension, you can include the
event_type and start_date columns into the GiST index along with the
event_position. However, the event_type won’t be very useful in the
index as long as the restriction clause is something like event_type in (0, 1, 2). (But if the list only contains one element, it will be
rewritten into an equality, and in that case it can use that column in
the index efficiently.) So using the other two columns would be my
starting point. I would put the usually more selective one first. If
you are going to leave past events in the table rather than clearing
them out, then the date is probably going to eventually be the more
selective one.

If the location condition is not very selective at all (e.g. most of
your events are in NYC, and most of your users are in NYC, and almost
everything is within 100km of almost everything else), then you might
want a different approach. Just make a default BTREE index on
(event_type, start_date). Unlike GiST, such a BTREE can effectively
use a condition like event_type in (0, 1, 2) along with AND start_date between x and y.

I don’t think that there is a fundamental reason GiST can’t use the
in-list effectively while BTREE can. Maybe that will be fixed for
GiST in some future version.

Leave a Reply

Your email address will not be published. Required fields are marked *