Question :
I’m evaluating an approach for reporting in a tag based time tracking system. I’m struggling to understand varying query runtimes or coming up with a different system to produce the desired query results.
Schema:
CREATE TABLE intervals_tags (
interval_id integer NOT NULL,
interval_duration integer NOT NULL,
tag_type character varying NOT NULL,
tag character varying NOT NULL
);
CREATE INDEX index_interval_id ON intervals_tags(interval_id int4_ops);
CREATE INDEX index_tag_type ON intervals_tags(tag_type text_ops);
CREATE INDEX index_tag ON intervals_tags(tag text_ops);
At the planned full capacity the intervals_tags
table holds around 2 million rows and the goal is to aggregate intervals based on different tag based criterions. One such query is “Duration per user for a specific client”:
SELECT
tag AS user,
SUM(interval_duration)
FROM intervals_tags
INNER JOIN (
SELECT
interval_id
FROM intervals_tags
WHERE tag = 'client:60'
) tt1
ON intervals_tags.interval_id = tt1.interval_id
WHERE tag_type = 'user'
GROUP BY tag;
With my dataset the query returns in 20ms. When I instead pick the client tag client:84
the query takes over 500ms to complete.
Questions
-
Why is there such a big difference? (The only difference is the amount of rows per tag.
client:60
has around 2.000 rows in the table, andclient:84
stands at 10.000 rows.) -
What can I do to improve performance? (The goals is a P99 of 500ms.)
-
What schema would be better suited for such tag based queries?
Explain for client:60
GroupAggregate (cost=7681.25..7695.20 rows=797 width=19) (actual time=18.695..19.373 rows=15 loops=1)
Group Key: intervals_tags.tag
-> Sort (cost=7681.25..7683.24 rows=797 width=15) (actual time=18.638..18.925 rows=1987 loops=1)
Sort Key: intervals_tags.tag
Sort Method: quicksort Memory: 142kB
-> Nested Loop (cost=21.99..7642.84 rows=797 width=15) (actual time=0.779..14.715 rows=1987 loops=1)
-> Bitmap Heap Scan on intervals_tags intervals_tags_1 (cost=21.57..2207.74 rows=663 width=4) (actual time=0.766..3.191 rows=1987 loops=1)
Recheck Cond: ((tag)::text = 'client:60'::text)
Heap Blocks: exact=1848
-> Bitmap Index Scan on index_tag (cost=0.00..21.40 rows=663 width=0) (actual time=0.458..0.458 rows=1987 loops=1)
Index Cond: ((tag)::text = 'client:60'::text)
-> Index Scan using index_interval_id on intervals_tags (cost=0.43..8.19 rows=1 width=19) (actual time=0.005..0.005 rows=1 loops=1987)
Index Cond: (interval_id = intervals_tags_1.interval_id)
Filter: ((tag_type)::text = 'user'::text)
Rows Removed by Filter: 3
Planning time: 0.348 ms
Execution time: 19.427 ms
Explain for client:84
HashAggregate (cost=61532.59..61544.24 rows=1165 width=19) (actual time=551.934..551.987 rows=15 loops=1)
Group Key: intervals_tags.tag
-> Hash Join (cost=41533.65..61466.55 rows=13208 width=15) (actual time=444.025..547.858 rows=9958 loops=1)
Hash Cond: (intervals_tags_1.interval_id = intervals_tags.interval_id)
-> Bitmap Heap Scan on intervals_tags intervals_tags_1 (cost=289.58..14421.85 rows=10987 width=4) (actual time=4.059..22.508 rows=9958 loops=1)
Recheck Cond: ((tag)::text = 'client:84'::text)
Heap Blocks: exact=7210
-> Bitmap Index Scan on index_tag (cost=0.00..286.83 rows=10987 width=0) (actual time=2.478..2.478 rows=9958 loops=1)
Index Cond: ((tag)::text = 'client:84'::text)
-> Hash (cost=32230.19..32230.19 rows=490951 width=19) (actual time=439.674..439.674 rows=500000 loops=1)
Buckets: 65536 Batches: 8 Memory Usage: 3449kB
-> Bitmap Heap Scan on intervals_tags (cost=11813.30..32230.19 rows=490951 width=19) (actual time=77.849..258.581 rows=500000 loops=1)
Recheck Cond: ((tag_type)::text = 'user'::text)
Heap Blocks: exact=14280
-> Bitmap Index Scan on index_tag_type (cost=0.00..11690.56 rows=490951 width=0) (actual time=74.719..74.719 rows=500000 loops=1)
Index Cond: ((tag_type)::text = 'user'::text)
Planning time: 0.359 ms
Execution time: 552.223 ms
Answer :
SELECT tag AS user,
SUM(interval_duration)
FROM intervals_tags
WHERE interval_id in (
select interval_id from intervals_tags
WHERE tag = 'client:60' )
and tag_type = 'user'
GROUP BY tag;