I apologize if this is the wrong place to ask, but I really need some opinion on whether or not my query is good or not.
Basically, I need to get a count of old and invalid entries per group, which is identified by a value of NULL or simply a month old. The selected data is going to be stored in a smaller table called “statistics”. PHP is used for a cron script.
Now, the problem is I’m still a novice so I can’t tell if the query is fast or slow. To finish the following query, it takes roughly 3 minutes, which according to my boss is slow and needs to be at least 50% faster, but I myself am not sure if anything can be done to speed it up. “cat_01” table has 50 mil row, the others have less then 2000.
I did add indexes to all used columns, which lowered the time from 6 to 3 minutes.
The query I’m using is as follows (I used pseudonyms for tables and columns, so don’t mind the logic):
SELECT 'cat_01' as category, m.shop_id, o.name, count(m.shop_id) as total, tr2.traffic, IF(mm.bought IS NULL,0,mm.bought) as bought FROM cat_01 m JOIN shops o ON m.shop_id = o.id JOIN ( select mmm.shop_id, max(mmm.buy_date) as bought FROM cat_01 mmm GROUP BY mmm.shop_id ) mm ON o.id = mm.shop_id LEFT JOIN ( SELECT tr.shop_id, sum(tr.sales) AS traffic FROM ( SELECT mmmmm.shop_id, bs.sales, bs.order_id, bs.id FROM cat_01 mmmmm JOIN orders bs ON mmmmm.order = bs.order_id ) tr group by tr.shop_id ) tr2 ON tr2.shop_id = mm.shop_id WHERE (m.buy_date IS NULL) OR (m.buy_date < UNIX_TIMESTAMP()) GROUP BY m.shop_id
- Create table “cat01”:
CREATE TABLE `cat_01` ( `id` int(11) NOT NULL AUTO_INCREMENT, `shop_id` int(11) DEFAULT '0', `buy_date` int(11) DEFAULT NULL, `order` int(11) DEFAULT '0', PRIMARY KEY (`id`), KEY `shop_id` (`shop_id`), KEY `buy_date` (`buy_date`), KEY `order` (`order`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
- Create table “shops”:
CREATE TABLE `shops` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
- create table “orders”:
CREATE TABLE `orders` ( `id` int(11) NOT NULL AUTO_INCREMENT, `order_id` int(11) NOT NULL, `sales` int(11) NOT NULL DEFAULT '0', `status` tinyint(1) NOT NULL DEFAULT '1', PRIMARY KEY (`id`), KEY `order_id` (`order_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
Here is the explain (I don’t have access to the live data from home, so I used dummy rows, but rest is the same)
EDIT: added UNIX_TIMESTAMP() as an example
EDIT2: I ended up removing the subqueries from the SELECT and used them seperately, then using a PHP loop, I created the appropriate INSERT query by mixing the results together.
P.S. I can’t seem to find a direct example of how sql code formatting on sctackechange looks, so I apologize if it ends up garbled, I’ll try to fix it shortly.
I do have an EXPLAIN result but not sure how to format that, might end up posting an image later.
If you need some particular info, feel free to ask, I myself am not sure what I should be looking at.
FROM ( SELECT ... ) JOIN ( SELECT ... )
does not optimize well. Think of a better way to write the query. If that fails, put one of the subqueries into a
TEMPORARY TABLE and add an index to it.
Consider using the datatype
INT, for dates.
OR is a performance killer (because it prevents use of an index). Consider other ways to deal with
order_id is unique in
orders, then get rid if
id and user
order_id as the
PRIMARY KEY. (Plus any changes needed to other tables?) Lookup by PK is faster than lookup by secondary key.
If the information in some of these tables is “static” (written once, but not updated or deleted), then consider building and maintaining a “Summary table”.
For doing this query 13 times for 13 categories, it may be faster to do it once, with a
GROUP BY category_id.
UNION may speed things up, especially if it lets a better index be used.
First, add these indexes:
ALTER TABLE cat_01 ADD INDEX `ShopBuyDates` (`shop_id`,`buy_date`) ALTER TABLE orders ADD INDEX `OrderSales` (`order_id`,`sales`)
Then try this query, and report the results. Make sure to run it at least twice, and discard the first result’s performance, to flush out the effects of populating the cache.
SELECT 'cat_01' as category, m.shop_id, o.name, count(m.shop_id) as total, tr2.traffic, IF(mm.bought IS NULL,0,mm.bought) as bought FROM cat_01 m JOIN shops o ON m.shop_id = o.id JOIN ( select mmm.shop_id, max(mmm.buy_date) as bought FROM cat_01 mmm GROUP BY mmm.shop_id ) mm ON o.id = mm.shop_id LEFT JOIN ( SELECT mmmmm.shop_id, sum(bs.sales) AS traffic FROM cat_01 mmmmm JOIN orders bs ON mmmmm.order = bs.order_id group by mmmmm.shop_id ) tr2 ON tr2.shop_id = mm.shop_id WHERE (m.buy_date IS NULL) OR (m.buy_date = 0) OR (m.buy_date < 1) GROUP BY m.shop_id
After we see the improvement from this (a new EXPLAIN would be good too, from this query), we can iterate to see what else is slowing down your query.