what is the difference between first and second execution of a query?

Posted on

Question :

I notice that when I am executing queries the first time (i.e., just after mysqld_safe --user=mysql &), the queries are slower than when they are executed second time. When the query (source) is:

select 
    s_name, s_address 
from 
    SUPPLIER, NATION 
where 
    s_suppkey in ( 
        select 
            ps_suppkey 
        from 
            PARTSUPP 
        where 
            ps_partkey in (
                select 
                    p_partkey 
                from 
                    PART 
                where 
                    p_name 
                like 
                    'green%'
            ) and 
            ps_availqty > (
                select 
                    0.5 * sum(l_quantity) 
                from 
                    LINEITEM 
                where 
                    l_partkey = ps_partkey and 
                    l_suppkey = ps_suppkey and 
                    l_shipdate >= date '1993-01-01' and 
                    l_shipdate < date '1993-01-01' + interval '1' year
            )
    ) and 
    s_nationkey = n_nationkey and 
    n_name = 'ALGERIA' 
order by 
    s_name;

And I got:

> SELECT .....;
>   41.255s  -------- first time after startup
> SELECT .....; 
>   6.242s   -------- second time after startup
> SELECT .....;
>   6.104s   -------- third time after startup

However, in some other times, the difference does not exist. For example, this query(source):

SELECT 
    s_name, count(*) as numwait 
FROM 
    SUPPLIER, LINEITEM l1, ORDERS, NATION 
WHERE 
    s_suppkey = l1.l_suppkey and 
    o_orderkey = l1.l_orderkey and 
    o_orderstatus = 'F' and 
    l1.l_receiptdate > l1.l_commitdate and 
    exists ( 
        SELECT * 
        FROM LINEITEM l2 
        WHERE 
            l2.l_orderkey = l1.l_orderkey and 
            l2.l_suppkey <> l1.l_suppkey
    ) and 
    not exists (
        SELECT * 
        FROM LINEITEM l3 
        WHERE 
            l3.l_orderkey = l1.l_orderkey and 
            l3.l_suppkey <> l1.l_suppkey and 
            l3.l_receiptdate > l3.l_commitdate
    ) and 
    s_nationkey = n_nationkey and 
    n_name = 'EGYPT' 
GROUP BY 
    s_name
ORDER BY 
    numwait desc, s_name 
LIMIT 100;

I got:

> SELECT .....;
>   3m9.264s -------- first time after startup
> SELECT .....;
>   3m9.377s -------- second time after startup
> SELECT .....;
>   3m7.287s -------- third time after startup
>
> ------ This time, the difference do not exist.

MySQL version is 8.0.22
All configurations are default.

Why it is the case?

Thanks.

Answer :

As others have mentioned in the comments, likely the performance improvement on subsequent runs that you saw in your first example is due to the data being cached in memory after your first run. There are multiple types of caches in MySQL, so even if you’re not using the Query Cache, your MySQL instance uses the Table Cache and the buffer_pool which means the data from your query is actually cached in memory so that subsequent runs of that query don’t need to fetch the data from disk again. This is beneficial if the size of the tables in your query are large.

The reason your second query didn’t see much of a difference in performance on subsequent runs (though I would assume the raw table data was still cached) was because most of the runtime you’re experiencing is probably due to the complexity of the query itself. Some of those complexities are from the EXISTS & NOT EXISTS that can be written in a more relationally efficient way, multiple inequality predicates, and the fact you’re doing a COUNT() which may result in scans instead of seek operations, for example. The EXPLAIN ANALYZE / query plan for your second query would probably give more insight on specifically which part of it is the complexity that is causing most of the runtime.

Query cache — should not be in use, especially in 8.0

Table cache — this is unlikely to impact performance more than a few milliseconds per table involved in the query.

buffer_pool (cf innodb_buffer_pool_size) — This is the main caching for data and indexes. The first time after restart none of the desired stuff is in RAM, hence the first run is slow. (If the system is busy, the stuff could be bumped out later, making it slow again.)

A Rule of Thumb: The second run will be about 10 times as fast as the first. (Your example was about 7x; this is “close enough”.) This applies to HDD; SSD shows much less difference.

Your 3-minute run was probably touching so much data that it could not keep it all in RAM (in the buffer_pool) even to finish the query! Hence, the second running of the query had to do lots of the same I/O, hence take just as long. Meanwhile, it was hurting the performance of most of the other queries.

Suggested indexes:

LINEITEM:  (l_orderkey, l_suppkey, l_commitdate, l_receiptdate)
NATION??:  (n_name, n_nationkey)
SUPPLIER??:  (s_nationkey, s_suppkey, s_name)
ORDERS??:  (o_orderstatus, o_orderkey)
ORDERS??:  (o_orderkey, o_orderstatus)

Please use JOIN...ON syntax and use aliases:

NATION ... n_name

–>

NATION AS n ... n.name

(For the first query, I recommend avoiding IN ( SELECT ... ) and trying to use JOIN. INDEX(partkey, suppkey, shipdate) may be beneficial.)

Leave a Reply

Your email address will not be published. Required fields are marked *