InnoDB performance implications of one 3 column index VS three 2 column indexes

Posted on

Question :

I’m using MariaDB with InnoDB but this would apply with MySql as well.

I have a list of records that belong to categories and would like to order them by 3 different criteria.

I have come up with two approaches for the table schema and indexing.

Approach 1

CREATE TABLE `document` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `category_id` MEDIUMINT NULL,
  `nosql_document_id` VARCHAR NULL,
  `sort_criteria_1` MEDIUMINT NULL,
  `sort_criteria_2` MEDIUMINT NULL,
  `sort_criteria_3` MEDIUMINT NULL,
  PRIMARY KEY (`id`),
  INDEX `idx_cat_sc1` (`category_id` ASC, `sort_critetia_1` ASC),
  INDEX `idx_cat_sc2` (`category_id` ASC, `sort_critetia_2` ASC),
  INDEX `idx_cat_sc3` (`category_id` ASC, `sort_criteria_3` ASC)
) ENGINE=InnoDB CHARACTER SET utf8;

Query for each sort criteria would look like this:

SELECT id FROM document WHERE category_id = 1 ORDER BY sort_criteria_1 LIMIT 50,50;
SELECT id FROM document WHERE category_id = 1 ORDER BY sort_criteria_2 LIMIT 50,50;
SELECT id FROM document WHERE category_id = 1 ORDER BY sort_criteria_3 LIMIT 50,50;

Approach 2

CREATE TABLE `document` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `category_id` MEDIUMINT NULL,
  `nosql_document_id` VARCHAR NULL,
  `sort_criteria_id` TINYINT NULL,
  `sort` MEDIUMINT NULL,
  PRIMARY KEY (`id`),
  INDEX `idx_cat_combo` (`category_id` ASC, `sort_criteria_id` ASC, `sort` ASC)
) ENGINE=InnoDB CHARACTER SET utf8;

Query for each sort criteria would look like this:

SELECT id FROM document WHERE category_id = 1 AND sort_criteria_id = 1 ORDER BY sort LIMIT 50,50;
SELECT id FROM document WHERE category_id = 1 AND sort_criteria_id = 2 ORDER BY sort LIMIT 50,50;
SELECT id FROM document WHERE category_id = 1 AND sort_criteria_id = 3 ORDER BY sort LIMIT 50,50;

With 1 Million documents I would need 1M records in first approach, and 3M records in second approach, which uses:

Approach 1: 129M total table size.

Approach 2: 193M total table size.

Assuming I will not need more than 3 sort criterias:

Questions I have:

  1. Can I assume that Approach 1 is better in speed because of overall smaller data and index size?
  2. Would it make sense to have covering index here so that nosql_document_id is always in clustered index? (id, nosql_document_id)
  3. Are there any better approaches to accomplishing above that I should consider? I could not think of a way to have a clustered covering index including ordering. Is there a way?

UPDATE 1

It would probably make sense to replace Approach 2 with following:

Approach 3

CREATE TABLE `category` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARACTER SET utf8;

CREATE TABLE `list` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `category_id` MEDIUMINT UNSIGNED NOT NULL,
  `sort_criteria_id` TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (`id`),
  INDEX `idx_combo` (`category_id` ASC, `sort_criteria_id` ASC)
) ENGINE=InnoDB CHARACTER SET utf8;

CREATE TABLE `document` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `list_id` INT UNSIGNED NOT NULL,
  `nosql_document_id` VARCHAR NULL,
  `sort` MEDIUMINT UNSIGNED NULL,
  PRIMARY KEY (`id`),
  INDEX `idx_combo` (`list_id` ASC, `sort` ASC)
) ENGINE=InnoDB CHARACTER SET utf8;

However this till does not address the issue of having 3 duplicate records for same data as @rick-james correclty points out. It feels like it comes down to flexibility to add new sort options in Approaches 2 and 3 vs more constrained but effective method of Approach 1.

Further investigating size implications, I can tell that index size itself is almost identical, between approach 1 and 3. Only the size of data grows 3x due to primary id replicated. I wonder how this affects memory usage in innodb buffer. I do want all my indexes to be drawn from memory, does this mean that in terms of memory the approaches are same, and the difference is only in disk usage?

Here’s results:

show table status from approach1;
+----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+
| Name     | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment |
+----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+
| category | InnoDB |      10 | Dynamic    |    100 |            163 |       16384 |               0 |            0 |         0 |            101 |
| document | InnoDB |      10 | Dynamic    | 943703 |             39 |    37289984 |               0 |     83558400 |   6291456 |        1000001 |
+----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+
2 rows in set (0.001 sec)

show table status from approach3;
+----------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+
| Name     | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment |
+----------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+
| category | InnoDB |      10 | Dynamic    |     100 |            163 |       16384 |               0 |            0 |         0 |            101 |
| document | InnoDB |      10 | Dynamic    | 2986718 |             33 |    99205120 |               0 |     79314944 |   5242880 |        3000001 |
| list     | InnoDB |      10 | Dynamic    |     300 |             54 |       16384 |               0 |        16384 |         0 |            301 |
+----------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+
3 rows in set (0.001 sec)

Answer :

A basic database principle is to avoid duplication of data. Option 2 has 3 rows with essentially the same basic data. So, a vote for Option 1.

Option 1 is smaller overall, so another vote for it.

If you get much past 3 sorts, the problem gets messier.

You are doing SELECT id. Was that a simplification for this discussion? If you really are selecting only id, then we should discuss that inefficiency.

“Covering” should always be considered, but since your queries do not mention nosql_document_id, there is no need for it to be in the index. (This harks back to my previous comment.)

A composite secondary index starting with the PRIMARY KEY is almost always useless.

Is some combination of columns unique and not-null? If so, can you get rid of id? If so, then this entire discussion changes.

Leave a Reply

Your email address will not be published. Required fields are marked *