I have following trivial SELECT statement:
select * from `web_notifications` where `user_id` = 123456 order by `created_at` limit 500;
The query is pretty simple (no join statements, no aggregate functions). The query is being executed on a high-class AWS RDS Aurora machine (db.r5.4xlarge). Yet, the query executes for 0.70 seconds the first time and for 0.13 seconds afterwards constantly.
Here is my EXPLAIN result (as you can see I am using an index):
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | | -- | ----------- | ----------------- | ---------- | ---- | ------------------------------------------------------------------------ | ------------------------------------- | ------- | ----- | ---- | -------- | --------------------- | | 1 | SIMPLE | web_notifications | NULL | ref | web_notification_users_seen_at_idx,web_notification_users_created_at_idx | web_notification_users_created_at_idx | 4 | const | 912 | 100.00 | Using index condition |
Here is the create table statement:
CREATE TABLE `web_notifications` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `text` varchar(255) NOT NULL, `url` varchar(255) NOT NULL, `seen_at` timestamp NULL DEFAULT NULL, `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `deleted_at` timestamp NULL DEFAULT NULL, `user_id` int(10) unsigned NOT NULL, `operation` varchar(64) DEFAULT NULL, `start_year` smallint(5) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `web_notification_users_seen_at_idx` (`user_id`,`seen_at`), KEY `web_notification_users_created_at_idx` (`user_id`,`created_at`), CONSTRAINT `fk_web_notification_users1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION ) ENGINE=InnoDB AUTO_INCREMENT=2000000000171904234 DEFAULT CHARSET=utf8
The table has 170 million rows.
Am I doing something wrong or I should look for an alternative storage solution (probably, noSQL) for such highly-loaded and highly-intensive table (as my table
web_notifications)? 0.70 is absolutely unacceptable execution time for such straightforward query.
P.S. I removed the
order by phrase and the query time didn’t fell down significantly.
The best thing you could do to improve query performance for that query is to change your clustered index to an actual key and not an auto-incrementing row identifier.
Right now, all records for a given
user are spread out across the pages of your table roughly based on order of insert. This is not optimal as to locate these records you will have to:
- Traverse a b-tree on a secondary index (not a huge deal)
- Read all the pages with entries for that
user, which could be a great deal more than just 1 or 2.
So assuming your key is
(user_id, created_at)1, you would set your table up as:
CREATE TABLE `web_notifications` ( `user_id` int(10) unsigned NOT NULL, `text` varchar(255) NOT NULL, `url` varchar(255) NOT NULL, `seen_at` timestamp NULL DEFAULT NULL, `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `deleted_at` timestamp NULL DEFAULT NULL, `operation` varchar(64) DEFAULT NULL, `start_year` smallint(5) unsigned NOT NULL, CONSTRAINT PK_webnotifications PRIMARY KEY (`user_id`,`created_at`), /* KEY `web_notification_users_seen_at_idx` (`user_id`,`seen_at`), */ /* This index would probably not be necessary or even utilzed with the new primary key */ CONSTRAINT `fk_web_notification_users1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION ) ENGINE=InnoDB DEFAULT CHARSET=utf8
This has the added benefits of:
- Allowing the engine to directly access the rows via the clustered index
- Saving a 8 bytes per row
- Actual uniqueness
- Potentially eliminating an additional index
However, there will be the need to additional maintenance, namely periodically rebuilding the table if performance degrades due to index fragmentation2. Fragmentation may be managed by tweaking the
fill_factor, but that will depend on the maturity of the table, the distribution of activity by
user_id, and how long you retain data in the table3.
Other things to consider:
- Don’t do
SELECT *– don’t waste I/O on columns you don’t need.
- If you haven’t already, wrap the query in a stored procedure. Better from a security standpoint, and having the plan cached should save some compilation time.
- You probably don’t need
deleted_at. Upon delete, just change the value for
updated_atand set a flag IsDeleted = 1. It will be understood that the value for
updated_atwas the time the record was deleted.
operationlooks like it might be something that has a limited number of values – consider creating a shorthand code for each of those and store that in a reference table.
start_yearis derived from another column, there is no need to persist it. If it doesn’t change depending on information in
web_notificationsconsider moving it to another table.
- You still may be limited by network I/O if you’re returning a large amount of data, this is just the reality of being in the cloud. But at least you’ll be eliminating all other bottlenecks.
1 If there is an additional column required for uniqueness, just add it to the key. Depending on the queries you run (and how many records each user has), it may or may not make sense to keep
created_at as the last column in the index or the one immediately following
2 Or don’t bother as Rick James mentions in the comments. He has more MySQL experience so I’ll defer to him.
3 Tweaking fill factor in other DBMSs with clustered indexes can reduce the rate at which a table fragments, but again I’ll defer to Rick.
If you don’t need to use the
ORDER BY clause then yes that should help a little bit, additionally stop using
SELECT * and provide the actual column list of only the columns you’ll need. It’ll reduce the amount of data that needs to be loaded, and potentially result in a faster execution plan being generated. Providing the
EXPLAIN might be helpful to include in your question as well.
What kind of application is this where sub-second return time is unacceptable for 500 rows for a particular user (especially 0.13s once cached)?… depending on the width of your table, there’s a certain point where it’s just an amount of data being returned problem (which again selecting only the columns you need) should help and reducing the data any other way or upping your provisioned server’s hardware become your only choices. (You can also look into optimizing the busyness of your server from other concurrent queries, but I doubt that’s the issue if you’re already sub-second.)
If it is the case this is an amount of data problem then even a NoSQL database won’t help you (and isn’t a system intended to solve performance problems anyway).