I notice that when I am executing queries the first time (i.e., just after
mysqld_safe --user=mysql &), the queries are slower than when they are executed second time. When the query (source) is:
select s_name, s_address from SUPPLIER, NATION where s_suppkey in ( select ps_suppkey from PARTSUPP where ps_partkey in ( select p_partkey from PART where p_name like 'green%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from LINEITEM where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1993-01-01' and l_shipdate < date '1993-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'ALGERIA' order by s_name;
And I got:
> SELECT .....; > 41.255s -------- first time after startup > SELECT .....; > 6.242s -------- second time after startup > SELECT .....; > 6.104s -------- third time after startup
However, in some other times, the difference does not exist. For example, this query(source):
SELECT s_name, count(*) as numwait FROM SUPPLIER, LINEITEM l1, ORDERS, NATION WHERE s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( SELECT * FROM LINEITEM l2 WHERE l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( SELECT * FROM LINEITEM l3 WHERE l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'EGYPT' GROUP BY s_name ORDER BY numwait desc, s_name LIMIT 100;
> SELECT .....; > 3m9.264s -------- first time after startup > SELECT .....; > 3m9.377s -------- second time after startup > SELECT .....; > 3m7.287s -------- third time after startup > > ------ This time, the difference do not exist.
MySQL version is 8.0.22
All configurations are default.
Why it is the case?
As others have mentioned in the comments, likely the performance improvement on subsequent runs that you saw in your first example is due to the data being cached in memory after your first run. There are multiple types of caches in MySQL, so even if you’re not using the Query Cache, your MySQL instance uses the Table Cache and the
buffer_pool which means the data from your query is actually cached in memory so that subsequent runs of that query don’t need to fetch the data from disk again. This is beneficial if the size of the tables in your query are large.
The reason your second query didn’t see much of a difference in performance on subsequent runs (though I would assume the raw table data was still cached) was because most of the runtime you’re experiencing is probably due to the complexity of the query itself. Some of those complexities are from the
NOT EXISTS that can be written in a more relationally efficient way, multiple inequality predicates, and the fact you’re doing a COUNT() which may result in scans instead of seek operations, for example. The
EXPLAIN ANALYZE / query plan for your second query would probably give more insight on specifically which part of it is the complexity that is causing most of the runtime.
Query cache — should not be in use, especially in 8.0
Table cache — this is unlikely to impact performance more than a few milliseconds per table involved in the query.
innodb_buffer_pool_size) — This is the main caching for data and indexes. The first time after restart none of the desired stuff is in RAM, hence the first run is slow. (If the system is busy, the stuff could be bumped out later, making it slow again.)
A Rule of Thumb: The second run will be about 10 times as fast as the first. (Your example was about 7x; this is “close enough”.) This applies to HDD; SSD shows much less difference.
Your 3-minute run was probably touching so much data that it could not keep it all in RAM (in the buffer_pool) even to finish the query! Hence, the second running of the query had to do lots of the same I/O, hence take just as long. Meanwhile, it was hurting the performance of most of the other queries.
LINEITEM: (l_orderkey, l_suppkey, l_commitdate, l_receiptdate) NATION??: (n_name, n_nationkey) SUPPLIER??: (s_nationkey, s_suppkey, s_name) ORDERS??: (o_orderstatus, o_orderkey) ORDERS??: (o_orderkey, o_orderstatus)
JOIN...ON syntax and use aliases:
NATION ... n_name
NATION AS n ... n.name
(For the first query, I recommend avoiding
IN ( SELECT ... ) and trying to use
INDEX(partkey, suppkey, shipdate) may be beneficial.)