Shouldn’t mysql be using both indexes in a join?

Posted on

Question :

I have the following tables:

CREATE TABLE `tx_in` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `txid` char(64) DEFAULT NULL,
  `hashPrevOut` char(64) DEFAULT NULL,
  `address` char(34) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `hashPrevIndex` (`hashPrevOut`(8))
) ENGINE=InnoDB AUTO_INCREMENT=641391046 DEFAULT CHARSET=latin1


CREATE TABLE `tx_out` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `txid` char(64) DEFAULT NULL,
  `btc_value` double DEFAULT NULL,
  `address` char(34) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `txidIndex` (`txid`(8))
) ENGINE=InnoDB AUTO_INCREMENT=695391886 DEFAULT CHARSET=latin1

And I am running the following query:

UPDATE tx_in LEFT JOIN tx_out ON tx_in.hashPrevOut = tx_out.txid SET 
tx_in.address = tx_out.address;

I modified it into a SELECT statement so that I could run EXPLAIN on it.

EXPLAIN SELECT * FROM tx_in LEFT JOIN tx_out ON tx_in.hashPrevOut = tx_out.txid;

And got this output

+----+-------------+--------+------------+------+---------------+-----------+---------+------------------------------+-----------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key       | key_len | ref                          | rows      | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+-----------+---------+------------------------------+-----------+----------+-------------+
|  1 | SIMPLE      | tx_in  | NULL       | ALL  | NULL          | NULL      | NULL    | NULL                         | 635293613 |   100.00 | NULL        |
|  1 | SIMPLE      | tx_out | NULL       | ref  | txidIndex     | txidIndex | 9       | blockchain.tx_in.hashPrevOut |         2 |   100.00 | Using where |
+----+-------------+--------+------------+------+---------------+-----------+---------+------------------------------+-----------+----------+-------------+

But the execution plan doesn’t have any reference to the index on hashPrevOut. Should I be concerned about this? It seems like it should just be using the two indexes to do the join. As you can see the tables are quite large so this could make a big difference in execution time.

Answer :

I think this is due to the partial index. i.e. the index has only 8 char of hashPrevOut

It estimated that the number of matching first 8 char will lead to a second read from the table to compare the full value from first table with the full value in the other table, so the optimizer decided to skip the index, and deal with the table directly.

Consider this suggestion,
A) in the query, tx_in.hashPrevOut = tx_out.txid
refers to two columns that are NOT indexed.
B) VARCHAR vs CHAR would conserve storage in the file (and HDD or SSD)
C) add INDEXES for the complete column content
for a predictable 1 to 1 match.
EXPLAIN SELECT original query to confirm your ndx’s are working and ROWS should be much lower as well as time to complete.

Leave a Reply

Your email address will not be published. Required fields are marked *