How to profile a particularly slow or inefficient query

Posted on

Question :

I’m currently writing queries for my database but this is the first time I’ve dealt with large data sets of any sort, and I’m now finding that I need to improve the performance of my queries.


I have a series of tables for purchases and customers as described below

users (user_id, forename, surname) 20k+ records

customers (customer_id, user_id, etc etc) 20k+ records

purchases (purchase_id, item_id, customer_id, price, timestamp, store_id) 150k+ records

deliveries (purchase_id, delivered_time etc) 150k+ records

store (store_id, name etc) <50 records

The query I am currently trying to perform should retrieve all customer names, the date of their last order, when that order was delivered and their total number of orders for all customers at a particular store whos most recent order was in the last 6 months. Here is what I’m currently using.

SELECT `customers`.`customer_id`, `forename`, `surname`, `delivered_time`, MAX(`purchases`.`timestamp`) as lastPurchase, COUNT(purchases.purchase_id) as totalPurchases FROM (`purchases`) 
JOIN `deliveries` ON `purchases`.`purchase_id` = `deliveries`.`purchase_id` 
JOIN `customers` ON `customers`.`customer_id` = `purchases`.`customer_id` 
JOIN `users` ON `users`.`user_id` = `customers`.`user_id` 
WHERE `store_id` = '1' GROUP BY `customer_id` HAVING lastPurchase >= '1372086055'

However, this is pretty slow and takes 1-2 seconds each time, and I can only assume this time would increase as the amount of data increases.

In an attempt to profile the query I have used EXPLAIN which seems to indicate it is searching through around 20k records in the deliveries table. What should my next steps be to optimize this query? How can I go about reducing the execution time?

Answer :

You didn’t provide CREATE TABLE scripts; thus, it’s hard to give you a specific advice; I’ll try to describe general approach….

If you want to speed up this query, first of all, make sure you have indexes on the columns that define join condition, used as a filter (WHERE), or column[s] you have in GROUP BY or ORDER BY. In general, a good sign that some indexes are missed is when EXPLAIN shows “NULL” in key,key_len and “ALL” in type column (not always though; for instance if you select all rows from the table, or table is small enough, so full scan is faster than index seek + lookup). Then you may want to tweak some indexes to make them covering for this query.

Side note. The query seems to me like a BI query which normally is not executed against OLTP database. When dealing with a big data, it makes sense to build a few cubes based on data from operational db and query them instead of original data. The nature of OLTP implies high level of normalization and optimization for INSERT/UPDATE/DELETE, not for SELECTs (still possible, but queries can be very long, not clear, and quite slow).

Leave a Reply

Your email address will not be published. Required fields are marked *