Question :
I have this Postgres function. It forces text to be lowercase and all nulls to become empty strings, so that I more cleanly perform searches for things that don’t match something, etc:
CREATE OR REPLACE FUNCTION magic_text(txt text) RETURNS text
IMMUTABLE PARALLEL SAFE
LANGUAGE SQL AS $$
SELECT lower(coalesce(txt,''))
$$;
This function is heavily used in many different query types, so I created all of the varieties of text index for it:
CREATE INDEX index_magic_cards_on_oracle_text_magic
ON magic_cards
USING BTREE (magic_text(oracle_text)) WITH (fillfactor = 100);
CREATE INDEX index_magic_cards_on_oracle_text_magic_text_pattern
ON magic_cards
USING BTREE (magic_text(oracle_text) text_pattern_ops) WITH (fillfactor = 100);
CREATE INDEX index_magic_cards_on_oracle_text_magic_gist_trgm
ON magic_cards
USING GIST (magic_text(oracle_text) gist_trgm_ops);
CREATE INDEX index_magic_cards_on_oracle_text_magic_gin_trgm
ON magic_cards
USING GIN (magic_text(oracle_text) gin_trgm_ops)
These indexes are interfering somehow with some (but not all) kinds of complex regular expression searches. I haven’t been able to determine if specific regex symbols or features cause the problem.
Here’s an example (explain.depesz):
SELECT card_name
FROM magic_cards
WHERE magic_text(oracle_text) ~ '***:(?n)eldrazi (?!scion)';
This returns nothing, and according to the query planner, it performs a bitmap index scan on index_magic_cards_on_oracle_text_magic_gin_trgm
.
This expanded query also scans the same index, and fails to find anything (explain.depesz):
SELECT card_name
FROM magic_cards
WHERE lower(coalesce(oracle_text, '')) ~ '***:(?n)eldrazi (?!scion)';
However, if I force Postgres to not use my index, this query has results! (explain.depesz) The following has multiple rows of results and performs a sequential scan on the table. All I did below was change the coalesce fallback to ⌘
, which shouldn’t affect the results:
SELECT card_name
FROM magic_cards
WHERE lower(coalesce(oracle_text, '⌘')) ~ '***:(?n)eldrazi (?!scion)';
Why does using the index change the results for regular expression searches?
(Postgres 9.6.1, pg_trgm 1.3)
Answer :
This looks like a live bug in pg_trgm to me. I can strip away much of the stuff and still reproduce it with this simple test case:
create table foobar (x text);
insert into foobar values ('eldrazi scion'),('eldrazi scio');
create extension pg_trgm ;
create index on foobar using gin (x gin_trgm_ops);
select * from foobar where x ~ 'eldrazi (?!scion)'; -- returns 1 row
set enable_seqscan TO off;
select * from foobar where x ~ 'eldrazi (?!scion)'; -- returns 0 rows
Note that a similar bug was fixed recently (in yet-to-be-released code, commit 16500d2278ab3dd), but that fix does not fix this bug.
I’ve reported this bug myself, as bug #14623.
And it has now been fixed. When 9.6.3 is released, it should contain the bug-fix.