Question :
I have requirement to identify customers using multiple emails for the same billing address. There are 414375 records. I let the query run for 20 minutes and it still did not return. How do I speed it up?
select so.customer_email
from sales_order so
inner join sales_order_address soab on soab.parent_id = so.entity_id and soab.address_type ='billing'
where exists
(
select 1 FROM sales_order so2
inner join sales_order_address soab2 on soab2.parent_id = so2.entity_id and soab2.address_type ='billing'
where so2.customer_email <> so.customer_email
and soab.firstname = soab2.firstname
and soab.lastname = soab2.lastname
and soab.street = soab2.street
and soab.city = soab2.city
and soab.region = soab2.region
)
Answer :
See if this gives what you need, but faster:
SET @@group_concat_max_len = 8000;
SELECT GROUP_CONCAT(customer_email SEPARATOR ' ')
FROM sales_order AS so
WHERE EXISTS ( SELECT 1 FROM sales_order_address
WHERE address_type = 'billing'
AND parent_id = so.entity_id )
GROUP BY first_name, last_name, street, city, region
HAVING COUNT(*) > 1;
(The output won’t be formatted the same.)
Another possibility:
SELECT GROUP_CONCAT(customer_email SEPARATOR ' ')
FROM ( SELECT so.first_name, so.last_name, so.street, so.city, so.region,
so.customer_email
FROM sales_order AS so
JOIN sales_order_address AS soa
WHERE soa.address_type = 'billing'
AND soa.parent_id = so.entity_id ) AS x
GROUP BY first_name, last_name, street, city, region
HAVING COUNT(*) > 1;
Indexes that may help:
sales_order_address: INDEX(address_type, parent_id)
sales_order: INDEX(entity_id) -- unless it is the PRIMARY KEY