I have requirement to identify customers using multiple emails for the same billing address. There are 414375 records. I let the query run for 20 minutes and it still did not return. How do I speed it up?
select so.customer_email from sales_order so inner join sales_order_address soab on soab.parent_id = so.entity_id and soab.address_type ='billing' where exists ( select 1 FROM sales_order so2 inner join sales_order_address soab2 on soab2.parent_id = so2.entity_id and soab2.address_type ='billing' where so2.customer_email <> so.customer_email and soab.firstname = soab2.firstname and soab.lastname = soab2.lastname and soab.street = soab2.street and soab.city = soab2.city and soab.region = soab2.region )
See if this gives what you need, but faster:
SET @@group_concat_max_len = 8000; SELECT GROUP_CONCAT(customer_email SEPARATOR ' ') FROM sales_order AS so WHERE EXISTS ( SELECT 1 FROM sales_order_address WHERE address_type = 'billing' AND parent_id = so.entity_id ) GROUP BY first_name, last_name, street, city, region HAVING COUNT(*) > 1;
(The output won’t be formatted the same.)
SELECT GROUP_CONCAT(customer_email SEPARATOR ' ') FROM ( SELECT so.first_name, so.last_name, so.street, so.city, so.region, so.customer_email FROM sales_order AS so JOIN sales_order_address AS soa WHERE soa.address_type = 'billing' AND soa.parent_id = so.entity_id ) AS x GROUP BY first_name, last_name, street, city, region HAVING COUNT(*) > 1;
Indexes that may help:
sales_order_address: INDEX(address_type, parent_id) sales_order: INDEX(entity_id) -- unless it is the PRIMARY KEY