I have millions of things in a table. I’d like to assign groups to the things based on theirs prefixes.
I tried to add all the indexes but the planner still does nested loop left join.
A demo case SQL:
create table things as select i as id, left(md5(random()::text), 8) as name from generate_series(1, 100000) as i; create table match_group_rules as select i as id, trunc(random() * 5 + 1) as group_id, left(md5(random()::text), 2) as rule from generate_series(1, 100) as i; create extension if not exists pg_trgm; create index match_group_rules_rule on match_group_rules (rule); create index match_group_rules_rule_pattern on match_group_rules (rule text_pattern_ops); create index things_name_idx on things (name); create index things_name_pattern_idx on things (name text_pattern_ops); create index things_name_gin_trgm_idx on things using gin (name gin_trgm_ops); create index things_name_gist_trgm_idx on things using gist (name gist_trgm_ops); explain select * from things t left join match_group_rules r on t.name like r.rule || '%';
The demo case planner output:
Nested Loop Left Join (cost=0.00..176543.25 rows=100000 width=57) Join Filter: (t.name ~~ (r.rule || '%'::text)) -> Seq Scan on things t (cost=0.00..1541.00 rows=100000 width=13) -> Materialize (cost=0.00..2.50 rows=100 width=44) -> Seq Scan on match_group_rules r (cost=0.00..2.00 rows=100 width=44)
- What makes the planner ignore the indexes?
- Is there a quick way to compute the groups for the things?
The initial problem is the left join. That restricts it to using
things as the driving table, but once you do that there is no way to use the indexes, as
text_pattern_ops offer ways to index for texts matching a given pattern, not for patterns matching a given text.
In order to use an index on the left join, you would have to first do a nested loop with
match_group_rules driving, memorize all the rows of
things retrieved, and then go back and make up NULL rows for the
things not found. There is no inherent reason that I can see that PostgreSQL wouldn’t be able to do this complex operation, it just isn’t implemented. You can do it yourself with a CTE and a UNION ALL:
with foobar as (select t.id as t_id, name, r.id, r.group_id, r.rule from things t join match_group_rules r on t.name like r.rule || '%' ) select * from foobar union all select id, name, NULL, NULL, NULL from things t2 where not exists (select 1 from foobar where t2.id=foobar.t_id);