postgres: index to SELECT strings like ‘%foo%’;

Posted on

Question :

I have an expensive, periodic PostgreSQL 9.3 query of the form:

SELECT * from mytable where name NOT LIKE '%foo%';

foo is actually a constant here that never changes. The query is expensive because it requires a table scan. I expect that only a few rows will actually match the query, so I want to use a partial index to speed up the query.

CREATE INDEX foo_idx ON mytable ((name NOT LIKE '%foo%')) WHERE name NOT LIKE '%foo%';

However, when I do an EXPLAIN on my query, it still is using a sequential scan instead of relying on the new index. What am I doing wrong?

I have tried several different expressions inside the parenthesized expression clause of the CREATE INDEX statement, but nothing helps:

CREATE INDEX foo_idx ON mytable (name) WHERE name NOT LIKE '%foo%';
CREATE INDEX foo_idx ON mytable ((name LIKE '%foo%')) WHERE name NOT LIKE '%foo%';

I have also tried removing the WHERE clause for the index, and that doesn’t help either.

Answer :

The index should work as is. As pointed out by @jjanes, you just needed to run ANALYZE. However, I suggest you modify the index. The expression in your definition is not useful.

CREATE INDEX foo_idx ON intentions ((data LIKE '%foo%')) WHERE data NOT LIKE '%foo%';

The expression is always FALSE and useless noise. Either simplify to the slightly cheaper equivalent index:

CREATE INDEX foo_idx ON intentions ((FALSE)) WHERE data NOT LIKE '%foo%';

Or, better yet, put something useful there. Since each index entry is going to allocate MAXALIGN (typically 8 bytes) for data anyway, and the overhead per index row is another 12 bytes (and the default fillfactor for btree indexes is 90%) you might as well put the index to good use and help other queries. If you have any other queries that might use the same partial index?

If you put something worth up to 8 bytes (like one or two int4 columns), the index retains exactly the same size and performance, but offers more possible use cases. If (one of) the column(s) is involved in updates, you have some additional maintenance cost. Still, if you have anything useful, use it. Any additional column that’s used in a JOIN / WHERE or ORDER BY clause. Or column(s) that enable index-only scans:

CREATE INDEX foo_idx ON intentions (<useful_column(s)>)
WHERE data NOT LIKE '%foo%';

You could try a slightly different approach.

CREATE INDEX foo_idx ON mytable (strpos(name,'foo'));

SELECT * from mytable where strpos(name,'foo') = 0;

Index on Expression

The NOT LIKE will be problematic. Imagine this query without the NOT. The index could be walked to find the interval of values (indexes are sorted lists) which match. The NOT means you want everything else, both before and after the matching interval. It’s a pretty special optimiser which will do that.

You may be able to invert the indexes to identify all the rows which do contain “foo”, then use that result in a NOT EXISTS (or equivalent) to find your desired output. I think you’ll still be seeing a table scan, however.

Alternatively add the column IsFoo to the table. Populate this on table writes (perhaps use a persisted calculated column?). Include IsFoo in your indexes and filters.

I needed to run ANALYZE for the proper query plan to take effect. Once this was done, everything worked like a charm. My final index looks like this:

CREATE INDEX foo_idx ON intentions ((data LIKE '%foo%')) WHERE data NOT LIKE '%foo%';

Leave a Reply

Your email address will not be published. Required fields are marked *