Why use innodb_file_per_table?

Question:

There are many articles exaggerating (IMHO, of course) the need for innodb_file_per_table. I understand that with innodb_file_per_table you get better control over individual tables; for example, you can back up each table separately. However, the claim of better performance is questionable.

In my test, there was no difference in performance between innodb_file_per_table and a single ibdata1 for a database of 60GB. Of course, it was a simple test with normal queries, and the situation can be different for complicated queries in real life (which is the reason I asked this question). 64-bit Linux with ext4 can effectively handle large files.

With innodb_file_per_table, more disk I/O operations are needed, and this is significant for complicated JOINs and FOREIGN KEY constraints.

The tablespace is shared in a single ibdata file; how can dedicated tablespaces for separate tables save disk space? Of course, it is easier to free the space of each table with ALTER, but that is still an expensive process (with a table lock).

QUESTION: Does innodb_file_per_table improve MySQL performance? If so, why?

Answer:

I don’t think it’s a matter of performance but of management.

With a separate file per table, you can, for example, store different databases (or even individual tables) on different storage devices (sketched below).

You can deal with very large databases on file systems that can’t handle big files (or at least postpone the problem until a single table reaches the file size limit).

You don’t have uncontrolled tablespace growth. If you have some big tables that you drop, the ibdata file stays small.

One aspect that may have some effect on performance is the fragmentation of table data and indexes, which will be limited per table. But that needs testing to be confirmed.
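On the storage-device point above, here is a minimal sketch, assuming MySQL 5.6 or later where InnoDB supports CREATE TABLE ... DATA DIRECTORY (the path and table name are just placeholders):

    -- Only works with innodb_file_per_table enabled; the table's .ibd file
    -- is created under the given directory instead of the default datadir.
    SET GLOBAL innodb_file_per_table = ON;

    CREATE TABLE archive_data (
        id BIGINT NOT NULL PRIMARY KEY,
        payload BLOB
    ) ENGINE=InnoDB
      DATA DIRECTORY = '/mnt/slow_disk/mysql';  -- hypothetical mount point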

Why use innodb_file_per_table?

Because it is easier to manage individual tables, since it can be done at the file level. This means that even if the server is down, you can still copy data by copying the table files, whereas using a shared tablespace means either copying everything (which can be unnecessarily massive) or finding some way to get the server running to extract the data (you really don’t want to manually extract the data with a hex editor).

Someone warned that you cannot simply copy and paste .ibd files from one server to another. This may be true, but it should not apply to backups on the same server (I am using the term backup here in the traditional sense of making a copy; i.e., not drastically changing the whole thing). Moreover, ibdata1 is automatically recreated on startup (as seen in the delete ibdata1 step of most “converting to file-per-table” guides). As such, you do not need to copy ibdata1 in addition to your .ibd files (and their corresponding .frm, etc. files).
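For completeness: since MySQL 5.6 there is a supported way to move .ibd files between servers (transportable tablespaces). A rough sketch, with the table name t1 as a placeholder:

    -- On the source server: quiesce the table so its .ibd file is consistent.
    FLUSH TABLES t1 FOR EXPORT;
    -- (copy t1.ibd and t1.cfg out of the datadir while the lock is held)
    UNLOCK TABLES;

    -- On the destination server: create the same table definition, then:
    ALTER TABLE t1 DISCARD TABLESPACE;
    -- (copy t1.ibd and t1.cfg into the destination datadir)
    ALTER TABLE t1 IMPORT TABLESPACE;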

If trying to recover a lost table, it should be sufficient to copy its .ibd and .frm files, as well as information_schema (which is much smaller than ibdata1). That way, you can put them in a dummy server and extract your table without having to copy the whole, massive thing.

However, the claim of better performance is questionable. … With innodb_file_per_table, more disk I/O operations are needed, and this is significant for complicated JOINs and FOREIGN KEY constraints.

Not surprisingly, the performance will depend entirely on the specific database(s) in use. One person will have (even vastly) different results from another.

It is true that there will be more disk I/O operations with file-per-table, but only slightly more. Think about how the system works.

  • For a monolithic database:

    1. Server is started
    2. ibdata1 is opened
    3. Header and meta-data are read
    4. Structures and meta-data are cached in memory
    5. Queries happen
      1. Server accesses the disk and reads the data from the already opened ibdata1
      2. Server may cache the data in memory
  • For a per-table database:

    1. Server is started
    2. ibdata1 is opened
    3. Header and meta-data are read
    4. Each individual .ibd file is opened
    5. Header and meta-data are read from each .ibd file
    6. Structures and meta-data are cached in memory
    7. Queries happen
      1. Server accesses the disk and reads the data from the already opened .ibd file
      2. Server may cache the data in memory

You will notice that when the server is running, you cannot move the data files because the server has open handles to them. This is because when it starts up, it opens them and leaves them open. It does not open and close them for each individual query.

As such, there are only a few more I/O operations at the beginning, when the server starts up; not while it is running. Further, while each individual .ibd file has its own separate overhead (file signatures, structures, etc.), they are cached in memory and not re-read for each query. Moreover, the same structures are read even with a shared tablespace, so there is barely any (if any at all) additional memory required.
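You can check this on a running server; a quick sketch (variable and status names as in MySQL 5.5 and later, though they may vary slightly between versions):

    -- Upper limit on how many .ibd files InnoDB may keep open at once.
    SHOW GLOBAL VARIABLES LIKE 'innodb_open_files';

    -- How many files InnoDB currently holds open.
    SHOW GLOBAL STATUS LIKE 'Innodb_num_open_files';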

Does innodb_file_per_table improve MySQL performance?

Actually, if anything, the performance may in fact be worse.

When using a shared tablespace, read and write operations can sometimes/often be combined so that the server reads a swath of data from multiple tables in one go from ibdata.

However, if the data is spread out amongst multiple files, then it has to perform a separate I/O operation for each one individually.

Of course this is again entirely dependent on the database in question; the real-world performance impact would depend on size, query frequency, and internal fragmentation of the shared table-space. Some people may notice a large difference while others may not see any impact at all.

The tablespace is shared in a single ibdata file; how can dedicated tablespaces for separate tables save disk space?

It does not. If anything, it increases disk usage somewhat.

I don’t have a 60GB database to test with, but my “paltry” personal database, which contains my WordPress installation and a few small tables for personal use and development testing, weighed in at ~30MB while using a shared tablespace. After converting it to file-per-table, it bloated to ~85MB. Even after dropping everything and re-importing, it was still >60MB.

This increase is due to two factors:

  • The absolute minimum size for ibdata1 is—for some reason—10MB, even if you have nothing but information_schema stored in it.

  • With a shared table-space, only ibdata1 has overhead like file signatures, meta-data, etc., but with per-table, each individual .ibd file has all of that. This means that the total (even with a hypothetical <10MB ibdata1) would be somewhat larger by at least:

    GetTotalSizeofOverhead() * GetNumTables()
    

Obviously these are not going to be huge increases (unless you are using a host that limits your database size, or storing them on a flash drive, etc.), but they are increases nonetheless, and while switching (every) table to file-per-table lets you shrink ibdata1 down to 10MB, the overall total will invariably be larger than it was.
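If you want to estimate this on your own data, here is a hedged sketch: the query reports logical table sizes from information_schema (which exclude per-file overhead), so comparing its total against the actual on-disk size of your .ibd files or ibdata1 shows roughly how much overhead you are paying:

    -- Logical data + index size per InnoDB table.
    SELECT TABLE_SCHEMA,
           TABLE_NAME,
           ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1) AS size_mb
    FROM   information_schema.TABLES
    WHERE  ENGINE = 'InnoDB'
    ORDER  BY size_mb DESC;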

This is my reason for ALWAYS using innodb_file_per_table:

Without file-per-table, the ibdata file never compresses, shrinks, or lessens in space, ever. Not when you delete a row, drop a table, or drop a database. 2GB of data can become a 20GB file in no time if you have an active queuing system.

Let’s say you want to make a backup of your current 1GB table before an alter, then drop it afterward. You’re stuck with a GB of now unused space in your ibdata. Bummer.
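A sketch of that scenario (table names are placeholders):

    -- Keep a copy of the table before a risky ALTER.
    CREATE TABLE mytable_backup LIKE mytable;
    INSERT INTO mytable_backup SELECT * FROM mytable;

    -- ... run the ALTER, verify everything is fine ...

    DROP TABLE mytable_backup;
    -- With innodb_file_per_table, mytable_backup.ibd is deleted and the
    -- space goes back to the OS. With a shared tablespace, that ~1GB stays
    -- inside ibdata1: reusable by InnoDB, but never returned to the OS.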

There are probably endless examples of instances where temporary measures inflate the single data file, but suffice it to say that, in my opinion, there is never a reason NOT to use innodb_file_per_table.

Also, here’s a good post to read:
http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table

My reason not to use innodb_file_per_table is performance.

I did some tests on our database with 450 tables, on MySQL 5.5.45, Linux CentOS release 6.7.

For unit tests which insert fixtures into the database before each test (not using all tables every time), and which themselves also work with the database a lot (inserts, updates, deletes, selects), performance was 3-5x better when the database tables were not separated into more files.

I recommend testing your database with the queries you want to use, and comparing the results, before you decide whether to use innodb_file_per_table.

You may find that for the production server you can use innodb_file_per_table, but that for a CI (continuous integration) environment that runs unit tests (and uses the DB a lot), as well as for developers who run unit tests frequently, it is better not to use it, because of the performance.

It makes the data more manageable because you can reclaim unused space, which is nice.
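For example, with file-per-table you can rebuild a table to hand the freed space back to the operating system (table name is a placeholder; on InnoDB, OPTIMIZE TABLE rebuilds the table):

    -- Rebuilds the table. With innodb_file_per_table the .ibd file shrinks
    -- to fit the remaining data; with a shared tablespace the freed pages
    -- stay inside ibdata1 and are only reused internally.
    OPTIMIZE TABLE mytable;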

I think if your database is used mostly for select queries it won’t affect the performance much. It still has to read about the same amount of data. I don’t think it matters much what files it’s reading the data from.

However, it may make performance worse on a database that does a lot of inserts and updates. This is because MySQL calls fsync() on the storage file after you commit a transaction. If there is a single file, it makes one call and waits for it to complete. If there are many files, it has to make the call multiple times and wait for all of those calls to return before the commit command can return.

Here is a post from someone that experienced this issue: http://umangg.blogspot.com/2010/02/innodbfilepertable.html
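One way to check whether this matters for your workload is to compare InnoDB’s fsync counter before and after a representative batch of writes, once with a shared tablespace and once with file-per-table; a rough sketch:

    -- Number of fsync() calls InnoDB has made so far.
    SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs';

    -- ... run your insert/update workload ...

    SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs';
    -- The difference between the two readings under each configuration
    -- shows how many extra syncs the extra files actually cost you.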

As per the article below, the performance impact is not about managing data (the CRUD operations themselves) but rather about creating and dropping objects.

innodb_file_per_table makes mass creation and dropping of objects slower than ibdata storage; this isn’t really applicable to production, but it should be relevant for continuous testing.

https://www.percona.com/blog/2015/02/24/mysqls-innodb_file_per_table-slowing/
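If you want to reproduce this kind of test yourself, here is a minimal, hypothetical sketch: a stored procedure that creates and drops many small tables, which you can time with innodb_file_per_table on and off:

    -- Hypothetical helper for timing DDL churn under both settings.
    DELIMITER //
    CREATE PROCEDURE churn_tables(IN n INT)
    BEGIN
        DECLARE i INT DEFAULT 0;
        WHILE i < n DO
            SET @sql = CONCAT('CREATE TABLE churn_', i,
                              ' (id INT PRIMARY KEY) ENGINE=InnoDB');
            PREPARE s FROM @sql; EXECUTE s; DEALLOCATE PREPARE s;
            SET @sql = CONCAT('DROP TABLE churn_', i);
            PREPARE s FROM @sql; EXECUTE s; DEALLOCATE PREPARE s;
            SET i = i + 1;
        END WHILE;
    END //
    DELIMITER ;

    CALL churn_tables(1000);  -- compare wall-clock time under each setting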
