In my MariaDB database I store combined metadata from multiple other sources (face databases). There is a 'main' table containing the great majority of the data; each row is identified both by its own ID and by an additional Source_ID (a foreign key) pointing to another table whose rows describe the individual face databases. The main table currently holds nearly 17 million rows and is expected to grow at least several times larger.
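For reference, a minimal sketch of the schema as I understand it; the table and column names other than Source_ID are illustrative, not my exact DDL:

```sql
-- Illustrative sketch, not the real DDL; only Source_ID matches my schema
CREATE TABLE sources (
    ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(255) NOT NULL              -- one row per face database
) ENGINE=InnoDB;

CREATE TABLE main_table (
    ID BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Source_ID INT UNSIGNED NOT NULL,        -- which face database the row came from
    -- ... metadata columns ...
    INDEX idx_source (Source_ID),           -- the index my queries rely on
    FOREIGN KEY (Source_ID) REFERENCES sources (ID)
) ENGINE=InnoDB;
```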
Naturally, `SELECT * FROM main_table WHERE Source_ID = x` is my most used and most important statement.
If I select data from a source that contains more than 10k rows, the query's duration grows with the number of rows belonging to that source. For example, data from a source with 20k rows is usually returned in a couple of seconds, while 500k rows take 15-30 seconds to complete.
However, here is the phenomenon I cannot explain:
When I select data from a source that has between 100 and 10,000 rows, the query takes more than 2 (sometimes 5-7) minutes(!) to complete. During this query the server's RAM reaches its maximum allocated amount and the CPU shows a noticeable load as well. When I perform the SELECT not by the foreign key but by the main table's ID instead (effectively selecting the very same data), the query's duration decreases 5-10 times.
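To illustrate the two ways of fetching the same rows (the Source_ID value and the ID range below are made-up examples; the range form assumes, purely for illustration, that one source's rows happen to be contiguous by ID):

```sql
-- Slow: filter by the foreign key (2-7 minutes for a 100-10,000 row source)
SELECT * FROM main_table WHERE Source_ID = 42;

-- Fast: the very same rows selected by the main table's own ID
-- (hypothetical range; assumes the source's rows are contiguous by ID)
SELECT * FROM main_table WHERE ID BETWEEN 1000000 AND 1005000;

-- Comparing the plans shows whether the Source_ID index is actually used
EXPLAIN SELECT * FROM main_table WHERE Source_ID = 42;
```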
I have tried doing a fresh reinstall of MariaDB, with the same issue persisting.
I am using MariaDB 10.7; the server has an i7-2600 and 16 GB RAM, running on Windows 10. The issue occurs whether I run the SELECT query through HeidiSQL or through Python code using the mariadb module.
I would greatly appreciate any ideas on what could possibly cause such an issue, or at least on how I should approach testing and probing it in order to discover its actual root cause. Thank you in advance.
After reading @J.D's response, I went in and ran ANALYZE on the SELECTs I had previously tried. It became quite apparent that for a few specific queries of the form `WHERE indexed_column = X`, MariaDB wasn't using the index at all; instead it scanned the whole set of rows and only then filtered them by the WHERE criteria.
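This is roughly how the check looked (the Source_ID value is a made-up example). ANALYZE actually executes the statement and reports estimated versus actual row counts, so a missed index shows up as a near-table-sized row count with access type ALL:

```sql
-- ANALYZE executes the statement and reports estimated vs. actual rows;
-- a full scan is visible as type = ALL with rows ≈ the whole table
ANALYZE SELECT * FROM main_table WHERE Source_ID = 42;
```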
What ended up solving this issue was the `ANALYZE TABLE table_name` statement, which, to my understanding, recalculates the table's index statistics (key distribution) used by the optimizer. After that, all of my queries successfully use their indexes and complete in a reasonable amount of time.
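The fix itself, with `main_table` standing in for my actual table name:

```sql
-- Recompute the key distribution statistics for the table; afterwards
-- the optimizer picked the Source_ID index again
ANALYZE TABLE main_table;
```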
I've found this post to be helpful, especially in case the index-ignoring issue has deeper roots:
mySQL query optimizer not using indexes