Question :
I notice that when I am executing queries the first time (i.e., just after mysqld_safe --user=mysql &
), the queries are slower than when they are executed second time. When the query (source) is:
select
s_name, s_address
from
SUPPLIER, NATION
where
s_suppkey in (
select
ps_suppkey
from
PARTSUPP
where
ps_partkey in (
select
p_partkey
from
PART
where
p_name
like
'green%'
) and
ps_availqty > (
select
0.5 * sum(l_quantity)
from
LINEITEM
where
l_partkey = ps_partkey and
l_suppkey = ps_suppkey and
l_shipdate >= date '1993-01-01' and
l_shipdate < date '1993-01-01' + interval '1' year
)
) and
s_nationkey = n_nationkey and
n_name = 'ALGERIA'
order by
s_name;
And I got:
> SELECT .....;
> 41.255s -------- first time after startup
> SELECT .....;
> 6.242s -------- second time after startup
> SELECT .....;
> 6.104s -------- third time after startup
However, in some other times, the difference does not exist. For example, this query(source):
SELECT
s_name, count(*) as numwait
FROM
SUPPLIER, LINEITEM l1, ORDERS, NATION
WHERE
s_suppkey = l1.l_suppkey and
o_orderkey = l1.l_orderkey and
o_orderstatus = 'F' and
l1.l_receiptdate > l1.l_commitdate and
exists (
SELECT *
FROM LINEITEM l2
WHERE
l2.l_orderkey = l1.l_orderkey and
l2.l_suppkey <> l1.l_suppkey
) and
not exists (
SELECT *
FROM LINEITEM l3
WHERE
l3.l_orderkey = l1.l_orderkey and
l3.l_suppkey <> l1.l_suppkey and
l3.l_receiptdate > l3.l_commitdate
) and
s_nationkey = n_nationkey and
n_name = 'EGYPT'
GROUP BY
s_name
ORDER BY
numwait desc, s_name
LIMIT 100;
I got:
> SELECT .....;
> 3m9.264s -------- first time after startup
> SELECT .....;
> 3m9.377s -------- second time after startup
> SELECT .....;
> 3m7.287s -------- third time after startup
>
> ------ This time, the difference do not exist.
MySQL version is 8.0.22
All configurations are default.
Why it is the case?
Thanks.
Answer :
As others have mentioned in the comments, likely the performance improvement on subsequent runs that you saw in your first example is due to the data being cached in memory after your first run. There are multiple types of caches in MySQL, so even if you’re not using the Query Cache, your MySQL instance uses the Table Cache and the buffer_pool
which means the data from your query is actually cached in memory so that subsequent runs of that query don’t need to fetch the data from disk again. This is beneficial if the size of the tables in your query are large.
The reason your second query didn’t see much of a difference in performance on subsequent runs (though I would assume the raw table data was still cached) was because most of the runtime you’re experiencing is probably due to the complexity of the query itself. Some of those complexities are from the EXISTS
& NOT EXISTS
that can be written in a more relationally efficient way, multiple inequality predicates, and the fact you’re doing a COUNT() which may result in scans instead of seek operations, for example. The EXPLAIN ANALYZE
/ query plan for your second query would probably give more insight on specifically which part of it is the complexity that is causing most of the runtime.
Query cache — should not be in use, especially in 8.0
Table cache — this is unlikely to impact performance more than a few milliseconds per table involved in the query.
buffer_pool (cf innodb_buffer_pool_size
) — This is the main caching for data and indexes. The first time after restart none of the desired stuff is in RAM, hence the first run is slow. (If the system is busy, the stuff could be bumped out later, making it slow again.)
A Rule of Thumb: The second run will be about 10 times as fast as the first. (Your example was about 7x; this is “close enough”.) This applies to HDD; SSD shows much less difference.
Your 3-minute run was probably touching so much data that it could not keep it all in RAM (in the buffer_pool) even to finish the query! Hence, the second running of the query had to do lots of the same I/O, hence take just as long. Meanwhile, it was hurting the performance of most of the other queries.
Suggested indexes:
LINEITEM: (l_orderkey, l_suppkey, l_commitdate, l_receiptdate)
NATION??: (n_name, n_nationkey)
SUPPLIER??: (s_nationkey, s_suppkey, s_name)
ORDERS??: (o_orderstatus, o_orderkey)
ORDERS??: (o_orderkey, o_orderstatus)
Please use JOIN...ON
syntax and use aliases:
NATION ... n_name
–>
NATION AS n ... n.name
(For the first query, I recommend avoiding IN ( SELECT ... )
and trying to use JOIN
. INDEX(partkey, suppkey, shipdate)
may be beneficial.)