Question :
I have a large Postgres table contain geographic positions (latitude and longitude) both fields are indexed, and both are defined as NUMERIC(9,6)
.
If I run a query looking for an exact position match, something like this:
WHERE latitude = 1.234567890123456789
AND longitude = 9.876543210987654321
Then get a very fast response, but I get very few results because the database is searching for a very precise match.
For my purposes, I’m looking for positions that match to within a few meters so a match to 4 or 5 decimal places should be fine. This gives me the results I’m looking for:
WHERE ABS(latitude - 1.234567890123456789) < 0.0001
AND ABS(longitude - 9.876543210987654321) < 0.0001
But NOT the performance (it can take 5 minutes to run, compared to a fraction of a second for the exact search)
Next I tried rounding the precision down:
WHERE ROUND( latitude, 4) = ROUND( 1.234567890123456789, 4)
AND ROUND( longitude,4) = ROUND( 9.876543210987654321, 4)
Again, same problem. Got the results I wanted, but took far too long.
So, my question is how can I search for a close match between two numbers, without losing performance?
UPDATE – SOLVED:
As a couple of commenters have observed, using BETWEEN
seems to work fine.
Answer :
The smart and fast solution for this class of problems is an index-backed “nearest neighbor” search.
For the record: if you want precise results with spatial data use PostGis and operate with geometry
or geography
types. Here is a starting point. And operate with ST_DWithin()
. Examples:
Sticking to your setup (2-D points, no PostGis), and ignoring the additional approximation error of handling spatial data in a 2-D plain, which seems negligible for the case at hand – I suggest a space-partitioned GiST index (on an expression in your case):
CREATE INDEX tbl_location_spgist_idx ON tbl USING SPGIST (point(longitude, latitude));
Why SP-Gist? See:
To get a maximum of 10 “nearest neighbors” in next to no time:
SELECT *
FROM tbl
ORDER BY point (longitude, latitude)
<-> point '(9.876543210987654321,1.234567890123456789)'
LIMIT 10;
You can then filter the ones close enough. To get a maximum of 10 closest within a maximum distance:
SELECT *, ((longitude - 9.876543210987654321) ^ 2
+ (latitude - 1.234567890123456789) ^ 2) AS squared_dist
FROM tbl
WHERE ((longitude - 9.876543210987654321) ^ 2
+ (latitude - 1.234567890123456789) ^ 2) < 0.000000001 -- or whatever
ORDER BY point (longitude, latitude)
<-> point '(9.876543210987654321,1.234567890123456789)'
LIMIT 10;
db<>fiddle here
I use a squared distance to get nearest neighbors based on simple Pythagorean theorem. The beauty of it: the calculation is only performed on the nearest neighbors, so it’s still very fast when the calculation gets more expensive – even in big tables.