MySQL query very slow with multiple concurrent users in a short time

Posted on

Question :

I have a site with a MySQL 5.5 database where most of the traffic happens at a fixed time every single day when thousands of users are loading the site in a period of 2-3 hours.

My slow query log (tracking >10 sec) gets hundreds (or even a thousand) of log entries on a daily basis, where the queries can be even up to 150 seconds long. In these cases the site is obviously very unusable for some of the users.

In 90% of the cases the problem is a query for fetching items which are shared to the user (but not created by user). Items can be shared either globally, per user, per group or not at all.

The problem happens only during that traffic peak. At slower times of the day the query usually only takes around 60 milliseconds.

Tables

CREATE TABLE `users` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `locationId` int(10) unsigned NOT NULL,
    -- *snip*
    PRIMARY KEY (`id`),
    KEY `fk_users1_idx` (`locationId`),
    CONSTRAINT `fk_users1` FOREIGN KEY (`locationId`) REFERENCES `locations` (`id`) ON UPDATE CASCADE
);

CREATE TABLE `items` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `ownerUserId` int(10) unsigned NOT NULL,
    `type` varchar(45) DEFAULT NULL,
    `shareToEveryone` tinyint(1) unsigned NOT NULL DEFAULT '0',
    -- *snip*
    PRIMARY KEY (`id`),
    KEY `fk_items1_idx` (`ownerUserId`),
    KEY `index_type` (`type`),
    CONSTRAINT `fk_items1` FOREIGN KEY (`ownerUserId`) REFERENCES `users` (`id`) ON UPDATE CASCADE
);

CREATE TABLE `itemgroupshares` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `itemId` int(10) unsigned NOT NULL,
    `objectId` varchar(255) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `itemId` (`itemId`,`objectId`),
    CONSTRAINT `ItemGroupShares_ibfk_3` FOREIGN KEY (`itemId`) REFERENCES `items` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE TABLE `itemusershares` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `itemId` int(10) unsigned NOT NULL,
    `objectId` varchar(255) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `itemId` (`itemId`,`objectId`),
    CONSTRAINT `ItemUserShares_ibfk_1` FOREIGN KEY (`itemId`) REFERENCES `items` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
);

users table has 15k rows, items has 5k rows, itemgroupshares has 1k rows and itemusershares has 3k rows.

Sizes:

  • users: data 4.5MB, index 3.4MB
  • items: data 1.5MB, index 448KB
  • igs: data 96KB, index 64KB
  • ius: data 352KB, index 224KB

Slow query

SELECT i.*
FROM items i
JOIN users u ON u.id = i.ownerUserId
LEFT JOIN itemgroupshares igs ON igs.itemId = i.id
LEFT JOIN itemusershares ius ON ius.itemId = i.id
WHERE
    u.id != :userId
    AND
    u.locationId = :locationId
    AND
    i.type = 'CUSTOM'
    AND
    (
        i.shareToEveryone = 1
        OR
        ius.objectId = :userGuid
        OR
        igs.objectId IN (:groupGuid1, :groupGuid2, :groupGuidN)
    )
GROUP BY i.id
ORDER BY i.id

:userId is current user’s id, :locationId is current user’s location id, :userGuid is current user’s guid and :groupGuid# are current user’s group guids.

Other details

Guids are obtained (and saved in session) on login from 3rd party API. Users can have up to 20 groupGuids so the IN-clause can get quite long.

EXPLAIN for an example user with 20 group guids returns this:

enter image description here

I am not very good at reading the result of that, but I think the problem lies with that 1072 rows and using temporary + filesort.

Is there any way to optimize this query?

  1. I have tried selecting only a subset of columns, it doesn’t improve the speed. I still should do it.
  2. Indexes on items: ownerUserId, type; itemgroupshares: itemId+objectId (composite to maintain uniqueness), itemusershares: itemId+objectId (same as previous).
  3. I could do ordering in application. Removing the ORDER BY changes the EXPLAIN slightly, but it still has the first entry with 1072 rows etc.
  4. I have tried thinking about how to reduce the where clause without returning wrong results but haven’t come up with anything yet.

New EXPLAIN after trying out Rick James’ suggestions (EXISTS):

enter image description here

And the improved query so far:

SELECT i.onlyRequiredColumnsHere 
FROM items i
JOIN users u ON u.id = i.ownerUserId
WHERE u.id != :userId
    AND u.locationId = :locationId
    AND i.type = 'custom'
    AND (
        i.shareToEveryone = 1
        OR EXISTS(SELECT 1 FROM itemusershares ius WHERE ius.itemId = i.id AND ius.objectId = :userGuid)
        OR EXISTS(SELECT 1 FROM itemgroupshares igs WHERE igs.itemId = i.id AND igs.objectId IN (:groupGuid1, :groupGuid2, :groupGuidN))
        )

The suggested indexes on users and items didn’t seem to change anything, and INDEX(itemId, objectId) is already on igs and ius.

I also updated MySQL server from 5.5 to 8.0.

Answer :

(just reformatting)

SELECT  i.*
    FROM  items i
    JOIN  users u  ON u.id = i.ownerUserId
    LEFT JOIN  itemgroupshares igs  ON igs.itemId = i.id
    LEFT JOIN  itemusershares ius  ON ius.itemId = i.id
    WHERE  u.id != :userId
      AND  u.locationId = :locationId
      AND  i.type = 'CUSTOM'
      AND  ( i.shareToEveryone = 1
              OR  ius.objectId = :userGuid
              OR  igs.objectId IN (:groupGuid1, :groupGuid2, :groupGuidN) 
           )
    GROUP BY  i.id
    ORDER BY  i.id

===

Instead of LEFT JOIN ius ON .. ... OR ugs..., do

    OR EXISTS( SELECT 1 FROM ius WHERE ... )

This will (1) look into ius only after failing the previous test(s) in the OR, and (2) will do a faster “semi-join”. (Ditto for igs.)

Text columns that you don’t need hinder performance when doing SELECT *.

The Optimizer might like these indexes:

users:  INDEX(locationId, id)  -- covering and composite
items:  INDEX(type, ownerUserId)  -- composite  (and DROP KEY(type))
igs:    INDEX(itemId, objectId)  -- covering and composite
ius:    INDEX(itemId, objectId)  -- covering and composite

(If you are using InnoDB, you effectively have (locationId, id).)

If igs and ius are many-to-many mapping tables, see advice here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

How big is the data? UUIDs/GUIDs have a preformance problem when the tables are much bigger than RAM. I have softened the impact of that some by (1) decreasing how often those tables are accessed and (2) making them “covering” (aka “Using index”)

Expanding on Rick James’ suggestions. I suggest rewriting the query as follows:

SELECT i.onlyRequiredColumnsHere 
FROM items i
JOIN users u ON u.id = i.ownerUserId
WHERE u.id != :userId
AND u.locationId = :locationId
AND i.type = 'custom'
AND i.shareToEveryone = 1

UNION DISTINCT

SELECT i.onlyRequiredColumnsHere 
FROM items i
JOIN users u ON u.id = i.ownerUserId
WHERE u.id != :userId
AND u.locationId = :locationId
AND i.type = 'custom'
AND EXISTS (
  SELECT 1 FROM itemusershares ius 
  WHERE ius.itemId = i.id 
  AND ius.objectId = :userGuid
)

UNION DISTINCT

SELECT i.onlyRequiredColumnsHere 
FROM items i
JOIN users u ON u.id = i.ownerUserId
WHERE u.id != :userId
AND u.locationId = :locationId
AND i.type = 'custom'
AND EXISTS (
  SELECT 1 FROM itemgroupshares igs 
  WHERE igs.itemId = i.id 
  AND igs.objectId IN (:groupGuid1, :groupGuid2, :groupGuidN)
)

With this combined query, the first 2 UNIONed queries both have all OR or IN clauses eliminated, which allows them to be much more efficient.

To further speed up the first query, add the following index to the items table (you can drop other indexes that match the leftmost columns as they will be redundant at that point): (type,ownerUserId,shareToEveryone)

This will allow the first query to use this new index as a covering index for the items table, at least before the SELECT is used. Ideally, you will include all SELECTed columns on the end of this index as well, but you are obscuring them in your question so I’ll just leave that aside.

You may be able to even further optimize this (potentially) by adding another UNIONed query for each groupGuid value, instead of using the IN clause in the 3rd query. However, this is highly dependent upon your data distribution and will require some trial and error.

For now, start with my rewritten query and see how it performs. You can even run each of the 3 queries separately to see if one is being a major bottleneck, which will help us narrow down where to focus further efforts.

1) Do you need every column from items table? Is there anything you could leave out?

2) What do your indexes look like on items, itemgroupshares and itemusershares?

3) Can the application handle the ordering of the results instead of the order by in SQL?

4) That where clause is really convoluted. I am about to go to meeting but I’d see if you could rewrite that in a way that doesn’t have so many subqueries.

I also like to break it apart piece by piece and see rowcounts being returned as well as how long each section is taking to see the bottlenecks. Execution plan shows you.

Leave a Reply

Your email address will not be published. Required fields are marked *