selecting next todo item from child table

Posted on

Question :

I’ve a table with millions of records, relationship is one (object) to many (object_items).

object_items:

CREATE TABLE `object_items` (
  `item_name` varchar(50) NOT NULL DEFAULT '',
  `object_id` int(10) unsigned NOT NULL DEFAULT '0',
  `sequence` int(10) unsigned NOT NULL,
  `completed` tinyint(1) NOT NULL DEFAULT '0',
  `is_active` tinyint(1) NOT NULL DEFAULT '0',
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uni_seq_object_id` (`sequence`,`object_id`),
  KEY `idx_object_id` (`object_id`),
  KEY `idx_seq` (`sequence`)
) ENGINE=InnoDB AUTO_INCREMENT=3408237 DEFAULT CHARSET=utf8mb4  

Sample Data:

+-----------+-----------+----------+-----------+------+
| item_name | object_id | sequence | completed | id   |
+-----------+-----------+----------+-----------+------+
| ABCD      |        10 |        1 |         1 |    1 |
| BCDE      |        10 |        2 |         1 |    2 |
| CDEF      |        10 |        3 |         1 |    3 |
| DEFG      |        10 |        4 |         0 |    4 |
| ABCD      |        11 |        1 |         1 |    5 |
| BCDE      |        11 |        2 |         1 |    6 |
| CDEF      |        11 |        3 |         0 |    7 |
| DEFG      |        11 |        4 |         0 |    8 |
| ABCD      |        12 |        1 |         1 |    9 |
| BCDE      |        12 |        2 |         1 |   10 |
+-----------+-----------+----------+-----------+------+

Desired Result:

+-----------+-----------+----------+-----------+------+
| item_name | object_id | sequence | completed | id   |
+-----------+-----------+----------+-----------+------+
| DEFG      |        10 |        4 |         0 |    4 |
| CDEF      |        11 |        3 |         0 |    7 |
+-----------+-----------+----------+-----------+------+

The Query I run:

select
  a.*
from object_items a
where a.sequence = (
  select min(sequence)
  from object_items b
  where a.object_id = b.object_id
    and b.completed = 0
)

Which actually works but when I use limit, but if I run count(*) it just dies.

Explaining the query:

+----+--------------------+-------+------------+------+---------------+---------------+---------+-----------------+---------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key           | key_len | ref             | rows    | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+---------------+---------+-----------------+---------+----------+-------------+
|  1 | PRIMARY            | a     | NULL       | ALL  | NULL          | NULL          | NULL    | NULL            | 3268598 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | b     | NULL       | ref  | idx_object_id | idx_object_id | 4       | db.a.object_id  |      21 |    10.00 | Using where |
+----+--------------------+-------+------------+------+---------------+---------------+---------+-----------------+---------+----------+-------------+

Is there a better way to get next TODO item that’s not complete yet, by sequence, only for the objects those have at least one item to be done, with heavy database like this?

Thanks

Answer :

Step by step. First look for not completed object_items, and the min(sequence). This is done with

-- Select object_items that are *not* completed, find min(sequence)
SELECT
    object_id, min(sequence) AS sequence
FROM
    object_items a
WHERE
    completed = 0 
GROUP BY
    object_id
ORDER BY
    object_id, sequence ;
object_id | sequence
--------: | -------:
       10 |        4
       11 |        3

You can JOIN back with the original table to retrieve the rest of the corresponding row data:

SELECT
    item_name, q.object_id, q.sequence, completed, id
FROM
    (
    -- Select object_items that are *not* completed
    SELECT
        object_id, min(sequence) AS sequence
    FROM
        object_items a
    WHERE
        completed = 0 
    GROUP BY
        object_id
    ) AS q 
    JOIN object_items oi
         ON oi.object_id  = q.object_id 
        AND oi.sequence   = q.sequence
ORDER BY
    q.object_id, q.sequence ;
item_name | object_id | sequence | completed | id
:-------- | --------: | -------: | --------: | -:
DEFG      |        10 |        4 |         0 |  4
CDEF      |        11 |        3 |         0 |  7

The query plan looks reasonable enough to scale well (it does not have a DEPENDENT SUBQUERY that needs reevaluation every time the DB loops):

id | select_type | table      | type   | possible_keys                           | key               | key_len | ref                    | rows | Extra                                       
-: | :---------- | :--------- | :----- | :-------------------------------------- | :---------------- | :------ | :--------------------- | ---: | :-------------------------------------------
 1 | PRIMARY     | <derived2> | ALL    | null                                    | null              | null    | null                   |   10 | Using where; Using filesort                 
 1 | PRIMARY     | oi         | eq_ref | uni_seq_object_id,idx_object_id,idx_seq | uni_seq_object_id | 8       | q.sequence,q.object_id |    1 |                                             
 2 | DERIVED     | a          | ALL    | null                                    | null              | null    | null                   |   10 | Using where; Using temporary; Using filesort

Adding the following index makes the plan even better, because it allows for covering the query named q:

CREATE INDEX idx_object_items_completed_object_id_sequence 
    ON object_items (completed, object_id, sequence) ; 
id | select_type | table      | type   | possible_keys                                 | key                                           | key_len | ref                    | rows | Extra                      
-: | :---------- | :--------- | :----- | :-------------------------------------------- | :-------------------------------------------- | :------ | :--------------------- | ---: | :--------------------------
 1 | PRIMARY     | <derived2> | ALL    | null                                          | null                                          | null    | null                   |    3 | Using where; Using filesort
 1 | PRIMARY     | oi         | eq_ref | uni_seq_object_id,idx_object_id,idx_seq       | uni_seq_object_id                             | 8       | q.sequence,q.object_id |    1 |                            
 2 | DERIVED     | a          | ref    | idx_object_items_completed_object_id_sequence | idx_object_items_completed_object_id_sequence | 1       | const                  |    3 | Using where; Using index   

You can check everything at dbfiddle here

Leave a Reply

Your email address will not be published. Required fields are marked *