Question:
I’m trying to query our entire database to build a report, and I’ve run into the problem that queries like the one below take nearly 2 hours to run, and some take even longer. I’ve changed some of the names, but the structure is the same.
Person_table has about 5 million rows
check_table is a subset of Person_table and has about 3 million rows
event_table has about 20 million rows
UPDATE check_table AS CT
INNER JOIN
    ( SELECT B1.id
        FROM Person_table AS PT
        INNER JOIN event_table AS B1  ON PT.id = B1.person_id
        INNER JOIN event_type_table AS E  USING (code_id)
        INNER JOIN
            ( SELECT *
                FROM event_table AS ET
                WHERE ET.code_id IN ('XYZ','ABC','DEF','ETC.')
            ) AS F  ON ET.id = F.person_id
                   AND B1.Event_date = F.event_date
    ) AS B2 USING (id)
SET flag_1 = 1;
I’ve tried to get an EXPLAIN on many of the queries I have to run, but it always spits back “Explain Data not available for statement”.
Answer:
It may (or may not) be possible to add some composite indexes to speed up the query. It may (or may not) be possible to reformulate the query to speed it up, perhaps by turning subqueries into JOINs.
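For example, since the derived table F filters event_table on code_id and is then joined on person_id and event_date, composite indexes covering those columns might help. These are only guesses; the right indexes depend on the actual table definitions:

    -- Hypothetical; verify against SHOW CREATE TABLE before adding.
    ALTER TABLE event_table
        ADD INDEX code_person_date (code_id, person_id, event_date),  -- lets F be resolved from the index alone
        ADD INDEX person_date (person_id, event_date);                -- helps the joins to B1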
But it seems like 3M possible updates will lead to a very slow query in any case. I suggest chunking the UPDATE into no more than 1000 rows at a time. This will avoid various locks that interfere with other traffic. And it may actually run faster, since it avoids hanging onto millions of rows for a potential ROLLBACK.
Discussion of chunking: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
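One way to structure that, as a sketch only; it assumes check_table.id is the PRIMARY KEY and that flag_1 is defined NOT NULL DEFAULT 0:

    -- Run the expensive joins once, saving the target ids:
    CREATE TEMPORARY TABLE ids_to_flag (PRIMARY KEY(id))
        SELECT DISTINCT B1.id
            FROM ...;       -- the body of your derived table B2 goes here

    -- Then flag at most 1000 rows per pass; loop in the client,
    -- stopping when ROW_COUNT() reports 0 rows changed:
    UPDATE check_table
        SET flag_1 = 1
        WHERE flag_1 <> 1
          AND id IN ( SELECT id FROM ids_to_flag )
        LIMIT 1000;

Building the id list up front also means the 4-table join runs once instead of being re-evaluated on every pass.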
If you want me to check your indexes, please provide SHOW CREATE TABLE.
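That is, the output of:

    SHOW CREATE TABLE Person_table;
    SHOW CREATE TABLE check_table;
    SHOW CREATE TABLE event_table;
    SHOW CREATE TABLE event_type_table;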