postgres_fdw performance is slow

Posted on

Question :

The following query on a foreign takes about 5 seconds to execute on 3.2 million rows:

SELECT x."IncidentTypeCode", COUNT(x."IncidentTypeCode") 
FROM "IntterraNearRealTimeUnitReflexes300sForeign" x 
WHERE x."IncidentDateTime" >= '05/01/2016' 
GROUP BY x."IncidentTypeCode" 

When I execute the same query on normal table, it returns in .6 seconds. The execution plans are quite different:

Normal Table

Sort  (cost=226861.20..226861.21 rows=4 width=4) (actual time=646.447..646.448 rows=7 loops=1) 
  Sort Key: "IncidentTypeCode" 
  Sort Method: quicksort  Memory: 25kB 
  -> HashAggregate (cost=226861.12..226861.16 rows=4 width=4) (actual  time=646.433..646.434 rows=7 loops=1)
     Group Key: "IncidentTypeCode"
     -> Bitmap Heap Scan on "IntterraNearRealTimeUnitReflexes300s" x  (cost=10597.63..223318.41 rows=708542 width=4) (actual time=74.593..342.110 rows=709376 loops=1) 
        Recheck Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone) 
        Rows Removed by Index Recheck: 12259 
        Heap Blocks: exact=27052 lossy=26888
        -> Bitmap Index Scan on idx_incident_date_time_300  (cost=0.00..10420.49 rows=708542 width=0) (actual time=69.722..69.722 rows=709376 loops=1) 
           Index Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone) 

Planning time: 0.165 ms 
Execution time: 646.512 ms

Foreign Table

Sort  (cost=241132.04..241132.05 rows=4 width=4) (actual time=4782.110..4782.112 rows=7 loops=1)   
  Sort Key: "IncidentTypeCode" 
  Sort Method: quicksort  Memory: 25kB
  -> HashAggregate  (cost=241131.96..241132.00 rows=4 width=4) (actual time=4782.097..4782.100 rows=7 loops=1)
     Group Key: "IncidentTypeCode"
     -> Foreign Scan on "IntterraNearRealTimeUnitReflexes300sForeign" x  (cost=10697.63..237589.25 rows=708542 width=4) (actual time=1.916..4476.946 rows=709376 loops=1) 

Planning time: 1.413 ms 
Execution time: 4782.660 ms

I think I’m paying a high price for the GROUP BY clause, which isn’t passed to the foreign server when I EXPLAIN VERBOSE:

    PUBLIC ."IntterraNearRealTimeUnitReflexes300s"
            "IncidentDateTime" >= '2016-05-01 00:00:00' :: TIMESTAMP WITHOUT TIME ZONE

This returns 700k rows. Is there a way around this?

I spent a lot of time reading this documentation page yesterday, and thought I had found my answer with setting use_remote_estimate to true, but it had no effect.

I do have access to the foreign server to create objects if necessary. The timestamp value in the WHERE clause can be anything; it does not come from a list of predefined values.

Answer :

If you use use_remote_estimate be sure to run ANALYZE the foreign table (I see estimations pretty close with the returned, you’d probably did it). Also, the pushdown improvements are not available in <9.5 version. I also assume that you have the same table structure on the remote server either (including indexes). If a bitmap is needed due to the low cardinality, it won’t use the index due to the limitations on the pushdown mechanism. You may want to reduce the amount of returned rows to force a BTREE index scan (timestamp ranges). Unfortunately, there is no clean way to avoid the SeqScan on the remote server if the filter returns +10% of the rows of the table (may vary this percentage if the planner considers that scanning the whole table is cheaper than seek reads). If you are using SSD, you will probably find useful to tweak random_page_cost).

You can use CTE to isolate the GROUP BY behavior:

WITH atable AS (
    SELECT "IncidentTypeCode"
    FROM PUBLIC ."IntterraNearRealTimeUnitReflexes300s"
              BETWEEN '2016-05-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE 
                  AND '2016-05-02 00:00:00'::TIMESTAMP WITHOUT TIME ZONE)
SELECT atable."IncidentTypeCode", COUNT(atable.IncidentTypeCode) 
FROM atable
GROUP BY atable."IncidentTypeCode" 
ORDER BY atable."IncidentTypeCode";

Leave a Reply

Your email address will not be published. Required fields are marked *