Postgres query very slow when using index

Posted on

Question :

So I’ve run into this issue with a slow Postgres query. I’ve added an index but it doesn’t seem to help. It takes 8 minutes to get 100 records from a table with 2822968 records.

Here’s the query that I run:

SELECT  "organisations"."id", "organisations"."url"
FROM "organisations" 
WHERE "organisations"."url" IS NOT NULL 
AND "organisations"."domain" IS NULL 
ORDER BY "organisations"."id" ASC

And here is the explain analyze

Limit  (cost=17468.48..17468.73 rows=100 width=38) (actual time=517448.061..517449.431 rows=100 loops=1)
  ->  Sort  (cost=17468.48..17482.23 rows=5501 width=38) (actual time=517448.057..517448.791 rows=100 loops=1)
        Sort Key: id
        Sort Method: top-N heapsort  Memory: 38kB
        ->  Bitmap Heap Scan on organisations  (cost=193.31..17258.23 rows=5501 width=38) (actual time=93.370..516988.919 rows=1096205 loops=1)
              Recheck Cond: ((url IS NOT NULL) AND (domain IS NULL))
              Heap Blocks: exact=59205
              ->  Bitmap Index Scan on index_organisations_on_url_and_domain  (cost=0.00..191.93 rows=5501 width=0) (actual time=82.242..82.243 rows=1098765 loops=1)
Planning Time: 0.202 ms
Execution Time: 517453.118 ms

I use rails – here is the partial multi column index from the schema file:

t.index ["url", "domain"], name: "index_organisations_on_url_and_domain", where: "((url IS NOT NULL) AND (domain IS NULL))"

If it helps this is running on Postgres RDS on a small instance. There are no other connections running as I turned off all background jobs, and there are no users connected using the app at the time of running.

I’m not sure how I can get this to return faster.

Answer :

It seems that the statistics for that index are way off, and PostgreSQL may be faster performing a sequential scan.

Try to

ANALYZE organizations;

and see if that improves matters.

Another idea may be an index on (if you don’t already have that) with the idea to fetch rows in sort order and discard the ones that don’t meet the criterion.

The perfect index would be

CREATE INDEX ON organizations (id)

You could also include the url column in the index (using the INCLUDE clause in v11) and aim for an index only scan.

a better index would be

CREATE INDEX ON organizations (domain,url);

as that index will group all the null domains together and under that all the not-null urls will also be in one block.

Leave a Reply

Your email address will not be published. Required fields are marked *