Why index is not used with order by?

Posted on

Question :

I trying to get info about why MySQL does not using my index when I create a inner join and trying to ORDER BY on the end.

I have my SQL query here:

SELECT
    *           
FROM
    product p INNER JOIN productStore ps ON p.productUUID = ps.productUUID       
ORDER BY
    ps.storeTitle 
LIMIT 50;

When I’m using order by this select take over 3,5 sec, when i remove order by its taking like 1,6ms to run the same SQL, my explain SQL is follow

With ORDER BY:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  ps  ALL PRIMARY NULL    NULL    NULL    942187  Using filesort
1   SIMPLE  p   eq_ref  PRIMARY PRIMARY 16  foeniks_core.ps.productUUID 1   NULL

Without ORDER BY:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  ps  ALL PRIMARY NULL    NULL    NULL    942187  NULL
1   SIMPLE  p   eq_ref  PRIMARY PRIMARY 16  foeniks_core.ps.productUUID 1   NULL

the field there not are indexing right is a varchar on 282 length.

My table design is here:

CREATE TABLE `productStore` (
  `productUUID` binary(16) NOT NULL,
  `storeUUID` binary(16) NOT NULL,
  `distributorLastUsed` binary(16) DEFAULT NULL,
  `storeTitle` varchar(282) DEFAULT NULL,
  `storeUrl` varchar(282) DEFAULT NULL,
  `storeDescription` text,
  `storeDescriptionDemo` text,
  `storePrice` int(11) NOT NULL DEFAULT '0',
  `storePriceNext` int(11) NOT NULL DEFAULT '0',
  `storePriceCost` int(11) NOT NULL DEFAULT '0',
  `overwrites` int(11) NOT NULL DEFAULT '0',
  `updated` datetime NOT NULL DEFAULT '1000-01-01 00:00:00',
  `added` datetime NOT NULL DEFAULT '1000-01-01 00:00:00',
  `allowDisplay` tinyint(1) NOT NULL DEFAULT '0',
  `activated` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`productUUID`,`storeUUID`),
  KEY `productStoreLanguageToStore_idx` (`storeUUID`),
  KEY `productStoreToDistributor_idx` (`distributorLastUsed`),
  KEY `storeUrl` (`storeUrl`(180)) USING BTREE,
  KEY `testStoreTitle` (`storeTitle`(182)),
  CONSTRAINT `productStoreToDistributor` FOREIGN KEY (`distributorLastUsed`) REFERENCES `distributor` (`distributorUUID`) ON DELETE SET NULL ON UPDATE CASCADE,
  CONSTRAINT `productStoreToProduct` FOREIGN KEY (`productUUID`) REFERENCES `product` (`productUUID`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `productStoreToStore` FOREIGN KEY (`storeUUID`) REFERENCES `store` (`storeUUID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Product table:

CREATE TABLE `product` (
  `productUUID` binary(16) NOT NULL,
  `productManufacturerUUID` binary(16) NOT NULL,
  `productManufacturerSKU` varchar(40) DEFAULT NULL,
  `productEan` varchar(40) DEFAULT NULL,
  `cnetID` varchar(10) DEFAULT NULL,
  `edbID` int(10) DEFAULT NULL,
  `overwrites` int(10) NOT NULL DEFAULT '0',
  `updated` datetime NOT NULL DEFAULT '1000-01-01 00:00:00',
  `added` datetime NOT NULL DEFAULT '1000-01-01 00:00:00',
  `activated` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`productUUID`),
  KEY `manufacturerSKU` (`productManufacturerSKU`(16)),
  KEY `productToManufacturer_idx` (`productManufacturerUUID`),
  KEY `cnetID` (`cnetID`),
  KEY `productEAN` (`productEan`),
  CONSTRAINT `productToManufacturer` FOREIGN KEY (`productManufacturerUUID`) REFERENCES `manufacturer` (`manufacturerUUID`) ON DELETE NO ACTION ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Answer :

David Spillett’s answer is correct in all points, except for the “encouraging” suggestion.

Here’s a way to not only encourage but (in almost all versions) force the optimizer to choose a plan that uses the wanted index to find the 50 rows – and only after that perform the join. It can’t always be used but the FOREIGN KEY constraint assures that in this case the two queries will produce identical results.
I call this technique “first LIMIT, then JOIN:

SELECT     p.*, ps.*           
FROM       ( SELECT     *
             FROM       productStore 
             ORDER BY   storeTitle 
             LIMIT 50 
           ) ps 
    INNER JOIN product p 
        ON p.productUUID = ps.productUUID       
ORDER BY   ps.storeTitle ;

The real answer is that “prefix” indexes are virtually useless. I am referring to

KEY `testStoreTitle` (`storeTitle`(182))

Since the index contains only the truncated values, it does not have a completely ordered list of Titles, hence cannot easily be used to do the ORDER BY.

InnoDB has a limit of 767 bytes (the max for a utf8 VARCHAR(255)). That can be increased by a complex set of steps:

  1. Get 5.6.3 (or later).
  2. SET GLOBAL innodb_file_format=Barracuda;
  3. SET GLOBAL innodb_file_per_table=ON;
  4. ALTER TABLE tbl
    DROP INDEX testStoreTitle,
    ADD INDEX(storeTitle)
    ROW_FORMAT=DYNAMIC;
    — (or COMPRESSED)

I agree with “LIMIT (or GROUP BY) before JOIN” as ypercube suggests. That solution is mostly orthogonal to this one. My solution will probably be significantly faster since it won’t need to scan 942187 of anything.

The “why it takes so long” is Using filesort – this means that it is spooling all the results out to sort them without the index.

The problem is that the SQL engine can only use one index per table reference. In this case it will be using the index best suited for your join predicate (the one for your primary key).

IIRC mySql’s query planner still follows the order of your query rather than trying to be clever and reordering for efficiency, so even though you’ve limited the results to 50 it is going to have to join everything first (using the other key), sort, then pick the first 50. You might be able to get it to scan down productStore in order then lookup what is needed in product by doing:

SELECT     *           
FROM       productStore ps 
INNER JOIN product p 
        ON p.productUUID = ps.productUUID       
ORDER BY   ps.storeTitle 
LIMIT 50;

That might encourage it scan down the productStore.storeTitle index to get the first 50 then join each to the relevant rows in product. If it doesn’t, then ypercube’s derived table based solution might have the desired effect (I tend to use MS SQL Server and I think its query planner would be bright enough to so both my query & his as equivalent and do both the same way, but mySQL’s probably knows less tricks like that).

Your columns will come out in a different order, so be careful if the calling code is sensitive to this. * in select clauses is generally recommended against witht it is practical to avoid it for this reason amongst others.

Indicies are used only to select (find) the rows. Once they are put into a final resultset, it is simply a set of rows and columns, regardless of where they came. Order by (sort) is applied as the last operation to this resultset. In your case, you are getting a very large resultset. Obviously, any effort to sort it would add up to the query time.

Leave a Reply

Your email address will not be published. Required fields are marked *