PostgreSQL: Another seq scan where an index scan is expected

Posted on

Question :

Here’s the query:

SELECT DISTINCT ON(points.track_id)
  participants.contest_number,
  points.track_id,
  points.lat,
  points.lon,
  points.alt,
  points.timestamp,
  points.h_speed,
  points.v_speed,
  points.distance
FROM
  tracks_competitiontrackpoint points
  JOIN tracks_competitiontrack tracks ON points.track_id=tracks.track_ptr_id
  JOIN competitions_contestparticipant participants ON tracks.participant_id=participants.id
WHERE
  tracks.task_id = 24 AND points.timestamp <= 1409663400 AND tracks.online = true
ORDER BY
  points.track_id, points.timestamp desc, points.id;

Query plan:

Unique  (cost=103411.24..104217.95 rows=527 width=38) (actual time=2507.814..2558.081 rows=113 loops=1)
  ->  Sort  (cost=103411.24..103814.60 rows=161342 width=38) (actual time=2507.805..2541.076 rows=82022 loops=1)
        Sort Key: points.track_id, points."timestamp", points.id
        Sort Method: external merge  Disk: 3984kB
        ->  Hash Join  (cost=69.66..85041.85 rows=161342 width=38) (actual time=2253.401..2408.140 rows=82022 loops=1)
              Hash Cond: (points.track_id = tracks.track_ptr_id)
              ->  Seq Scan on tracks_competitiontrackpoint points  (cost=0.00..73678.24 rows=2581475 width=34) (actual time=0.711..1398.816 rows=2575197 loops=1)
                    Filter: ("timestamp" <= 1409663400)
              ->  Hash  (cost=69.10..69.10 rows=45 width=8) (actual time=1.831..1.831 rows=122 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 5kB
                    ->  Hash Join  (cost=12.60..69.10 rows=45 width=8) (actual time=0.522..1.738 rows=122 loops=1)
                          Hash Cond: (participants.id = tracks.participant_id)
                          ->  Seq Scan on competitions_contestparticipant participants  (cost=0.00..48.74 rows=974 width=8) (actual time=0.008..0.679 rows=976 loops=1)
                          ->  Hash  (cost=12.04..12.04 rows=45 width=8) (actual time=0.383..0.383 rows=122 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 5kB
                                ->  Bitmap Heap Scan on tracks_competitiontrack tracks  (cost=5.30..12.04 rows=45 width=8) (actual time=0.194..0.297 rows=122 loops=1)
                                      Filter: (online AND (task_id = 24))
                                      ->  Bitmap Index Scan on tracks_competitiontrack_online  (cost=0.00..5.29 rows=139 width=0) (actual time=0.144..0.144 rows=159 loops=1)
                                            Index Cond: (online = true)
  Total runtime: 2559.586 ms

Or highlighted results: http://explain.depesz.com/s/ENi

I use DISCTINCT ON to query the latest result for each tracks_competitiontrackpoint.track_id ordered by tracks_competitiontrackpoint.timestamp. Index is enabled on tracks_competitiontrackpoint.track_id, tracks_competitiontrackpoint.timestamp, tracks_competitiontrack.participant_id, tracks_competitiontrack.task_id – pretty much everything that. Still there’a sequential scan for two tables, one of which is taking a lot of time. Could you please explain why?

Answer :

Since you are on 9.3, can you try this rewriting:

SELECT 
    participants.contest_number,
    points.track_id,
    points.lat,
    points.lon,
    points.alt,
    points.timestamp,
    points.h_speed,
    points.v_speed,
    points.distance
FROM
    ( SELECT track_ptr_id, participant_id
      FROM tracks_competitiontrack 
      WHERE task_id = 24
        AND online = true
    ) AS tracks 
  LEFT JOIN 
    competitions_contestparticipant AS participants 
      ON tracks.participant_id = participants.id
  JOIN LATERAL
    ( SELECT p.*
      FROM tracks_competitiontrackpoint AS p
      WHERE p.track_id = tracks.track_ptr_id
        AND p.timestamp <= 1409663400
      ORDER BY p.track_id, p.timestamp desc, p.id
      LIMIT 1
    ) AS points ON TRUE
ORDER BY
    points.track_id ;

I think that with indexes on tracks (task_id, online, track_ptr_id, participant_id) and on pooints (track_id, timestamp desc, id), it would preform better, assuming that the condition (WHERE task_id = 24 AND online = true) restricts to a small number of rows.

Leave a Reply

Your email address will not be published. Required fields are marked *