mysql optimization for intensive schema-altering queries (DDL)

Question :

In the context of CI, I would like to massively parallelise the running of tests. The code under test requires a database to be present, so MySQL is used (as that is what the application currently runs on).

If using one (big) database server and splitting the test cases into e.g. 50 bins, with each bin going to a different runner, each runner will first run migrations that set up a dedicated schema for itself. One thing to note is that the tables are not expected to hold much data in their short lifespan.

The aim is to make the test runs as fast as possible, but currently, for various reasons (some unclear, but related to the DB/querying), the timings are not very deterministic.

Therefore my questions are:

  • how can the DB configuration be optimised for such a massively parallel schema-creation scenario? e.g. can schemas, tables, indexes, etc. be created with a small “size”, so that all those CREATE/ALTER queries execute faster?
  • can the queries themselves be optimised somehow? Or rather, can some parameter-setting queries be run at the very beginning, from the client (with admin access), to tweak the server the right way?
  • any other strategy one can think of? Even a different DB flavour that might be better suited is a good suggestion, at least for the purpose of the discussion – but strategy here can relate to anything else, not just the DB software or engine
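For concreteness, each runner’s set-up looks roughly like this (a sketch; the schema name and table are illustrative, with the name derived from the runner’s bin index):

```sql
-- Hypothetical per-runner set-up: each of the ~50 runners creates its own
-- throwaway schema and then runs the full set of migrations inside it.
CREATE DATABASE IF NOT EXISTS ci_runner_17;
USE ci_runner_17;

-- ...followed by the application's migrations, e.g.:
CREATE TABLE users (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
```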

Answer :

See pt-online-schema-change, gh-ost, and newer versions with better “instant” ALTER.
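For instance, in MySQL 8.0 many column additions are metadata-only changes. A sketch, assuming a hypothetical table `t`; specifying the algorithm explicitly makes the statement fail loudly instead of silently falling back to a table copy:

```sql
-- MySQL 8.0+: ADD COLUMN can often be an "instant" (metadata-only) change.
-- ALGORITHM=INSTANT forces an error if the change cannot be done instantly.
ALTER TABLE t ADD COLUMN created_at DATETIME NULL, ALGORITHM=INSTANT;
```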

Query optimization is a big topic. Let’s discuss specific queries. Use the slow query log to find the “worst” queries.
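To capture everything during a test run, the slow query log can be opened wide from a client session with admin privileges (the threshold here is illustrative):

```sql
-- Illustrative settings: log every query taking longer than 100 ms,
-- plus queries that use no index at all.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0.1;
SET GLOBAL log_queries_not_using_indexes = ON;
```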

MySQL’s default configuration is virtually optimal. Again, let’s see specific queries to see if there is some subtle tunable that might help.
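That said, for throwaway CI databases where durability does not matter, a couple of well-known tunables trade crash-safety for speed, and can be set from an admin client at the start of the run (a sketch, not a general recommendation):

```sql
-- Trade durability for speed -- acceptable only because the CI schemas
-- are disposable. Do NOT use these settings on a production server.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;  -- flush redo log ~once per second
SET GLOBAL sync_binlog = 0;                     -- let the OS decide when to flush the binlog
```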

MySQL has virtually no intra-query parallelism. You can, for example, load multiple tables simultaneously in separate threads; ditto for fetching. These things prevent much parallelism from being very useful:

  • Sorting — DISTINCT, GROUP BY, and ORDER BY tend to be choke points.
  • I/O — this tends to be a single funnel that all processing has to pass through. Shrinking data sizes helps (e.g. an 8-byte BIGINT is rarely worth the extra space when a smaller integer type would do).
  • “Composite” and/or “covering” indexes may provide the best performance boost.
  • PARTITIONing rarely provides any performance benefit. Again, let’s see specifics.
  • The “size” of a table does matter for performance, but whether the cost is only slightly more than O(1), or O(log N), O(N), etc., depends on the query and the indexes.
  • Do not manually split a table into multiple subtables; this is almost always a bad idea, both for code and for performance.
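As an illustration of the composite/covering point above (the table and columns are hypothetical):

```sql
-- A composite index that also "covers" the query: the WHERE columns come
-- first, and the selected column is appended, so the query can be answered
-- from the index alone without touching the table rows.
ALTER TABLE orders
  ADD INDEX idx_cust_date_total (customer_id, order_date, total);

-- This query can then be satisfied entirely from idx_cust_date_total:
SELECT total
  FROM orders
 WHERE customer_id = 42
   AND order_date >= '2024-01-01';
```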

You can ask for a tuning review; instructions here: http://mysql.rjweb.org/doc.php/mysql_analysis#tuning

See also “ShardQuery” — it is useful in some situations.
