Is it possible to optimize this query? Or any recommendations to speed it up?

Posted on

Question :

After looking at my mysql slow log file I see that almost every entry there is related to this query:

SELECT 
    n.nid AS nid, AVG(v.value) AS rating
FROM 
  node n
  LEFT OUTER JOIN votingapi_vote v 
    ON  v.entity_id = n.nid 
    AND v.timestamp < UNIX_TIMESTAMP()-304800 
    AND v.timestamp > UNIX_TIMESTAMP()-1524000
WHERE 
    ( (n.type = 'gallery') AND (n.status = '1') )
GROUP BY 
    n.nid
ORDER BY 
    rating DESC, n.created DESC
LIMIT 20 OFFSET 0;

Slow log shows this stats: Query_time: 1 Lock_time: 0 Rows_sent: 20 Rows_examined: 1981034

As you may see it joins 2 tables – node (45000 rows) and votingapi_vote (only 705 rows for now). votingapi_vote will grow in future and will have thousands rows.
Node table has a content rows like n.nid, n.title, n.created, n.type, n.status.
Votingapi_vote table has a voting rows like v.entity_id, v.value, v.timestamp

So I need to get list of 20 nodes with type gallery who has the best average rating for selected period of time.

Can you pls help me to optimize this query?

Thanks!

UPDATE: This is EXPLAIN results:

+----+-------------+-------+------+----------------------------+------------------+---------+-------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys              | key              | key_len | ref         | rows | Extra                                        |
+----+-------------+-------+------+----------------------------+------------------+---------+-------------+------+----------------------------------------------+
|  1 | SIMPLE      | n     | ref  | node_status_type,node_type | node_status_type | 102     | const,const |  830 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | v     | ALL  | NULL                       | NULL             | NULL    | NULL        |  759 |                                              | 
+----+-------------+-------+------+----------------------------+------------------+---------+-------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

UPDATE 2: This is EXPLAIN for query where I changed LEFT OUTER JOIN to INNER JOIN:

mysql> EXPLAIN SELECT      n.nid AS nid, AVG(v.value) AS rating FROM    node n   INNER JOIN votingapi_vote v      ON  v.entity_id = n.nid      AND v.timestamp < UNIX_TIMESTAMP()-304800      AND v.timestamp > UNIX_TIMESTAMP()-1524000 WHERE      ( (n.type = 'gallery') AND (n.status = '1') ) GROUP BY      n.nid ORDER BY      rating DESC, n.created DESC LIMIT 20 OFFSET 0;
+----+-------------+-------+--------+------------------------------------+---------+---------+-----------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                      | key     | key_len | ref                         | rows | Extra                                        |
+----+-------------+-------+--------+------------------------------------+---------+---------+-----------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | v     | ALL    | NULL                               | NULL    | NULL    | NULL                        |  653 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | n     | eq_ref | PRIMARY,node_status_type,node_type | PRIMARY | 4       | drupal.v.entity_id |    1 | Using where                                  | 
+----+-------------+-------+--------+------------------------------------+---------+---------+-----------------------------+------+----------------------------------------------+
2 rows in set (0.01 sec)

Is INNER JOIN better? Btw it returned 0 results (without EXPLAIN). So looks like INNER JOIN wrong in this case.

Answer :

If presented with this situation I would test the performance improvement of this:

  1. Add two fields to the node table: vote_count, vote_sum
  2. Add a generated field to the node table vote_avg which is stored and indexed with a descending sort on the index key.
  3. Add a trigger to the votingapi_vote table so that when row is inserted the respective vote_count is incremented by one and the value of the vote is added to the vote_sum. (Likewise for UPDATE or DELETE.)

The drawbacks to this approach are that now your writes take longer so that you can do faster reads, and this might not be suitable if the filter that you use in your query on the timestamp column isn’t predictable.

It’s tough to optimise Aggregation (AVG), Group By and Order by. These operations even if well indexed will generally cost internal temp tables and regularly on-disk temp tables for grouping and sorting. It’s widely accepted to cache the results for re-use rather then rerunning this at any level of concurrency. You may have heard of summary tables to record such values. Worth a read, google is your friend.

Leave a Reply

Your email address will not be published. Required fields are marked *