How to make MYSQL smart enough to understand when it should take into account stop word and when it shouldn’t?

Posted on

Question :

Ok, here is the problem. I have a table

Text
text nature ....
text2 nature ....
a lot of nature ... rows here..
....
go with nature
with nature her go it

Now, i select:

select * From MyFullText where MATCH(text) AGAINST ('+go +with +nature' IN BOOLEAN MODE)

As a human being, we will bring out only 2 rows:

Text
go with nature
with nature her go it

However, MySql is not human & it shows all rows that have the word “nature” since “go” & “with” are stop words so it doesn’t take into account these words.

What if we have 1000 rows containing the word “nature”? if that is the case then user will waste a lot of time for searching.

In this case like '%go with nature%' does a much better job than fulltext.

So, can you find out an elegant solution that meets these requirements:

-Use “Fulltext ability for all non-stop words”

-If there are “stop words” in the search term then it should go back to use LIKE as in like '%go with nature%'

-Generally, it will be very smart to bring the best result depending on the search term.

Answer :

I state you have two possibilities with MySQL:

Deactivate stop word list

You can deactivate the stop word list completely.

To override the default stopword list, set the ft_stopword_file system
variable. […] The variable value should be the path name of the file
containing the stopword list, or the empty string to disable stopword
filtering.

http://dev.mysql.com/doc/refman/5.5/en/fulltext-fine-tuning.html

(You might want to change ft_min_word_len too, since its default value is 4 and “go” is only 2 characters long.)

This would allow you to use the fulltext index even in case of presence of stop words, which offers by far more possibilities than any search with LIKE and is a lot faster.

Control the query

Your application can scan the search words before the query to actually change the query depending on “do the search words contain a stop word”. If yes, query with LIKE, otherwise query with AGAINST. This way still leaves you with a very slow LIKE based query in the stop word case!

I do not know of further possibilities with MySQL as its capabilities concerning fulltext searches are rather limited.

Non-MySQL-Options (Take a look around)

There is other software around, which eases problems of search&find, for example the modified MySQL code Sphinx. Sphinx obviously has a different approach than MySQL since its default stop word list is empty. If there even is the possibility to add another software to your infrastructure, you might take a look at its possibilities or search for other software which might fulfill your needs (e.g. Apache Lucene).

Leave a Reply

Your email address will not be published.