How to speed up query with exists clause?

Posted on

Question :

I have requirement to identify customers using multiple emails for the same billing address. There are 414375 records. I let the query run for 20 minutes and it still did not return. How do I speed it up?

select so.customer_email
from sales_order so
inner join sales_order_address soab on soab.parent_id = so.entity_id and soab.address_type ='billing'
where exists
(
    select 1 FROM sales_order so2
    inner join sales_order_address soab2 on soab2.parent_id = so2.entity_id and soab2.address_type ='billing'
    where so2.customer_email <> so.customer_email
    and soab.firstname = soab2.firstname
    and soab.lastname = soab2.lastname
    and soab.street = soab2.street
    and soab.city = soab2.city
    and soab.region = soab2.region
)

Answer :

See if this gives what you need, but faster:

SET @@group_concat_max_len = 8000;

SELECT GROUP_CONCAT(customer_email SEPARATOR '  ')
    FROM sales_order AS so
    WHERE EXISTS ( SELECT 1 FROM sales_order_address 
                       WHERE address_type = 'billing'
                         AND parent_id = so.entity_id )
    GROUP BY first_name, last_name, street, city, region
    HAVING COUNT(*) > 1;

(The output won’t be formatted the same.)

Another possibility:

SELECT GROUP_CONCAT(customer_email SEPARATOR '  ')
    FROM ( SELECT so.first_name, so.last_name, so.street, so.city, so.region,
                  so.customer_email
            FROM sales_order AS so
            JOIN sales_order_address AS soa
            WHERE soa.address_type = 'billing'
              AND soa.parent_id = so.entity_id ) AS x
    GROUP BY first_name, last_name, street, city, region
    HAVING COUNT(*) > 1;

Indexes that may help:

sales_order_address:  INDEX(address_type, parent_id)
sales_order:          INDEX(entity_id)   -- unless it is the PRIMARY KEY

Leave a Reply

Your email address will not be published. Required fields are marked *