Why is this simple query making up 50% execution time?

Posted on

Question :

I’m using postgresql 9.5. I installed the pg_stat_statements extension to analyze queries and got the following surprising result:

scheduler=# SELECT substring(query, 1, 30) AS query,                                                                                                                              calls,                                                                                                                                                                      round(total_time::numeric, 2) AS total_time,                                                                                                                                round(mean_time::numeric, 2) AS mean_time,
      round((100 * total_time / sum(total_time) OVER ())::numeric, 2) AS percentage
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
             query              | calls | total_time  | mean_time | percentage
--------------------------------+-------+-------------+-----------+------------
 SELECT "auth_user"."id", "auth | 41468 | 14663721.55 |    353.62 |      50.84
 SELECT "rsvp_event"."id", "rsv | 24052 |  1405270.05 |     58.43 |       4.87
 SELECT "scheduler_playofffixtu |  1722 |  1282956.76 |    745.04 |       4.45
 SELECT "teammgr_team"."id", "t |  1651 |   943087.30 |    571.22 |       3.27
 SELECT "scheduler_match"."id", |  2206 |   828515.56 |    375.57 |       2.87
 SELECT "scheduler_match"."star |  1455 |   745773.01 |    512.56 |       2.59
 SELECT "auth_user"."id", "auth |  2085 |   526799.42 |    252.66 |       1.83
 SELECT "scheduler_match"."id", |   781 |   482996.54 |    618.43 |       1.67
 SELECT (?) AS "a" FROM "schedu |  3389 |   471169.19 |    139.03 |       1.63
 SELECT "scheduler_playofffixtu |   641 |   470971.65 |    734.75 |       1.63

The query in question is:

SELECT "auth_user"."id", 
       "auth_user"."password", 
       "auth_user"."last_login", 
       "auth_user"."is_superuser", 
       "auth_user"."username", 
       "auth_user"."first_name", 
       "auth_user"."last_name", 
       "auth_user"."email", 
       "auth_user"."is_staff", 
       "auth_user"."is_active", 
       "auth_user"."date_joined" 
FROM "auth_user" 
  LEFT OUTER JOIN "schedule_maker_association_admins" ON ( "auth_user"."id" = "schedule_maker_association_admins"."user_id" ) 
WHERE ("auth_user"."is_staff" = ? 
       OR "schedule_maker_association_admins"."association_id" = ?)

It checks whether a user is staff or the admin of a given association. This is a very common query in our app.

I am not a db expert but I don’t see anything wrong with it. The involved columns are indexes:

scheduler=# d schedule_maker_association_admins
                             Table "public.schedule_maker_association_admins"
     Column     |  Type   |                                   Modifiers
----------------+---------+--------------------------------------------------------------------------------
 id             | integer | not null default nextval('schedule_maker_association_admins_id_seq'::regclass)
 association_id | integer | not null
 user_id        | integer | not null
Indexes:
    "schedule_maker_association_admins_pkey" PRIMARY KEY, btree (id)
    "schedule_maker_association__association_id_4724c38a38e0bdb_uniq" UNIQUE CONSTRAINT, btree (association_id, user_id)
    "schedule_maker_association_admins_association_id" btree (association_id)
    "schedule_maker_association_admins_user_id" btree (user_id)
Foreign-key constraints:
    "association_id_refs_id_6974cbf10541a1a3" FOREIGN KEY (association_id) REFERENCES schedule_maker_association(id) DEFERRABLE INITIALLY DEFERRED
    "user_id_refs_id_4b393f8cdfbf03cc" FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED

And here’s auth_user:

Indexes:
    "auth_user_pkey" PRIMARY KEY, btree (id)
    "auth_user_username_key" UNIQUE CONSTRAINT, btree (username)

I’ve analyzed it and found nothing suspcious, but it does take 300-400 ms. I would expect such simple query to be much faster.

EXPLAIN ANALYZE SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" LEFT OUTER JOIN "schedule_maker_association_admins" ON ( "auth_user"."id" = "schedule_maker_association_admins"."user_id" ) WHERE ("auth_user"."is_staff" = true OR "schedule_maker_association_admins"."association_id" = 123456);



Merge Left Join  (cost=0.84..59074.80 rows=2 width=174) (actual time=0.012..308.119 rows=15 loops=1)
   Merge Cond: (auth_user.id = schedule_maker_association_admins.user_id)
   Filter: (auth_user.is_staff OR (schedule_maker_association_admins.association_id = 123456))
   Rows Removed by Filter: 439229
   ->  Index Scan using auth_user_pkey on auth_user  (cost=0.42..47174.48 rows=419703 width=174) (actual time=0.005..135.908 rows=425904 loops=1)
   ->  Index Scan using schedule_maker_association_admins_user_id on schedule_maker_association_admins  (cost=0.42..7426.66 rows=228294 width=8) (actual time=0.004..46.421 rows=227838 loops=1)
 Planning time: 0.168 ms
 Execution time: 308.140 ms
(8 rows)

Possible explanation and solution?

Edit: here’s EXPLAIN (ANALYZE, BUFFERS)

Merge Left Join  (cost=0.84..59074.80 rows=2 width=174) (actual time=0.009..406.478 rows=15 loops=1)
   Merge Cond: (auth_user.id = schedule_maker_association_admins.user_id)
   Filter: (auth_user.is_staff OR (schedule_maker_association_admins.association_id = 123456))
   Rows Removed by Filter: 439231
   Buffers: shared hit=147747
   ->  Index Scan using auth_user_pkey on auth_user  (cost=0.42..47174.48 rows=419703 width=174) (actual time=0.004..217.342 rows=425906 loops=1)
         Buffers: shared hit=116728
   ->  Index Scan using schedule_maker_association_admins_user_id on schedule_maker_association_admins  (cost=0.42..7426.66 rows=228294 width=8) (actual time=0.003..57.262 rows=227840 loops=1)
         Buffers: shared hit=31019
 Planning time: 0.947 ms
 Execution time: 406.536 ms

Answer :

With the right indexes (you haven’t told us what indexes the other table has) you could get this to be efficient by writing the query this way:

SELECT whatever FROM "auth_user" where "auth_user"."is_staff" = true 
  UNION 
SELECT whatever FROM "auth_user" 
JOIN "schedule_maker_association_admins" ON ( "auth_user"."id" = "schedule_maker_association_admins"."user_id" ) 
WHERE ("schedule_maker_association_admins"."association_id" = 79)

But you will have to think carefully about whether you want UNION, or UNION ALL, or something which differs from both of those.

If someone is both staff and the admin of the given association, should they be returned once or more than once, or doesn’t it matter? If they are staff and also admin of several different associations (none of which are the one you queries for), how many times should they be returned? Does it matter, as long as it is at least once?

Leave a Reply

Your email address will not be published. Required fields are marked *