I have a database with around 1.9 million rows. My DB details:
Server: Localhost via UNIX socket
Server type: Percona Server
Server version: 5.5.42-37.1 – Percona Server (GPL), Release 37.1, Revision 727
Protocol version: 10
User: ****@localhost
Server charset: UTF-8
I’m using the following algorithm on my FULLTEXT-indexed column. First, I match my required string with the query below:
SELECT title FROM my_db WHERE MATCH (`Title`) AGAINST ('my string' IN BOOLEAN MODE)
Then I apply the levenshtein() distance function to the result rows, server side through PHP, to get the closest match to the string.
- Would it be faster to compute Levenshtein distance over the entire table instead of a MATCH ... AGAINST on a FULLTEXT index for such a huge database?
- Is there any implementable algorithm that is better than Levenshtein, or for that matter, what is the best such algorithm available right now?
- Is there any other workaround instead of this approach?
- How will each of the search modifiers below
IN NATURAL LANGUAGE MODE,
IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION,
IN BOOLEAN MODE,
WITH QUERY EXPANSION,
enhance or optimize my search and its results? And hence, which one would be best?
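For reference, the four variants are just different modifier clauses on the same MATCH ... AGAINST call. A sketch using the table and column names from the question (my_db, Title):

```sql
-- Natural language mode (the default): rows ranked by relevance.
SELECT Title FROM my_db
WHERE MATCH (Title) AGAINST ('my string' IN NATURAL LANGUAGE MODE);

-- Natural language mode with query expansion: runs a second search
-- using words found in the most relevant rows of the first pass.
SELECT Title FROM my_db
WHERE MATCH (Title) AGAINST ('my string' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION);

-- Boolean mode: supports operators such as +, -, *, and "" phrases.
SELECT Title FROM my_db
WHERE MATCH (Title) AGAINST ('my string' IN BOOLEAN MODE);

-- WITH QUERY EXPANSION alone is shorthand for natural language mode
-- with query expansion.
SELECT Title FROM my_db
WHERE MATCH (Title) AGAINST ('my string' WITH QUERY EXPANSION);
```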
A FULLTEXT index is very efficient at pulling a small result set out of a huge table. Using levenshtein involves checking every row, so I agree with your approach of doing the FULLTEXT search as a first step.
I would suggest:
- Remove short words, etc., from the string. (Not necessary for MyISAM, since it ignores them; necessary for InnoDB, else it returns nothing.)
- Use IN BOOLEAN MODE, but without any “+” on the words.
- Add ORDER BY MATCH ... DESC and a LIMIT. This keeps edge cases from producing thousands of rows for the next step.
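Putting those suggestions together, the first step might look something like this sketch (the LIMIT of 200 is an arbitrary cap, tune it for your data):

```sql
SELECT Title,
       MATCH (Title) AGAINST ('my string' IN BOOLEAN MODE) AS relevance
FROM   my_db
WHERE  MATCH (Title) AGAINST ('my string' IN BOOLEAN MODE)  -- no '+' on the words
ORDER  BY relevance DESC
LIMIT  200;  -- cap the candidate rows handed to the PHP levenshtein() pass
```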
Keep in mind that the end result will be less than perfect, but at least it should be “fast enough”.