Question :
I have a database with a number of tables.
I want to delete some records from these tables; say, more than 20K or 50K records per table.
All the tables are InnoDB, and innodb_file_per_table is off.
When I delete records from a number of tables, the tables will become fragmented.
Is there any way to remove the fragmentation?
Update On 17th April
mysql> select TABLE_NAME, TABLE_SCHEMA, Data_free from information_schema.TABLES where TABLE_SCHEMA NOT IN ('information_schema', 'mysql') and Data_Free >0;
+-----------------+--------------+-----------+
| TABLE_NAME      | TABLE_SCHEMA | Data_free |
+-----------------+--------------+-----------+
| City            | world_innodb |   5242880 |
| City_Copy       | world_innodb |   5242880 |
| Country         | world_innodb |   5242880 |
| CountryLanguage | world_innodb |   5242880 |
| a               | world_innodb |   5242880 |
| t1              | world_innodb |   5242880 |
| t2              | world_innodb |   5242880 |
+-----------------+--------------+-----------+
7 rows in set (0.00 sec)
So now my question is: how do I decide whether or not my tables are fragmented?
Answer :
I addressed this on StackOverflow back in October 2010.
Keep in mind the busiest file in the InnoDB infrastructure: /var/lib/mysql/ibdata1
This file normally houses four types of information:
- Table Data
- Table Indexes
- MVCC (Multiversioning Concurrency Control) Data
- Table Metadata (List of tablespace IDs)
Running OPTIMIZE TABLE against an InnoDB table stored in ibdata1 does two things:
- Makes the table’s data and indexes contiguous inside ibdata1, thus faster to access
- It makes ibdata1 grow because the contiguous data and index pages are appended to ibdata1
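If you want to see this effect for yourself, here is a minimal sketch. mydb.mytable is just a placeholder table that lives inside ibdata1, and the datadir is assumed to be /var/lib/mysql:
ls -l /var/lib/mysql/ibdata1                # note the size before
mysql -e "OPTIMIZE TABLE mydb.mytable;"     # rebuilds the table inside ibdata1
ls -l /var/lib/mysql/ibdata1                # same size or bigger, never smaller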
While you can segregate Table Data and Table Indexes from ibdata1 and manage them independently using innodb_file_per_table, the big gaping hole of disk space in ibdata1 simply won't disappear and cannot be reclaimed. You must do more.
To shrink ibdata1 once and for all you must do the following:
1) MySQLDump all databases into a SQL text file (call it /root/SQLData.sql)
2) Drop all databases (except mysql schema)
3) Shutdown mysql
4) Add the following lines to /etc/my.cnf
[mysqld]
innodb_file_per_table
innodb_flush_method=O_DIRECT
innodb_log_file_size=1G
innodb_buffer_pool_size=4G
Sidenote: Whatever you set for innodb_buffer_pool_size, make sure innodb_log_file_size is 25% of innodb_buffer_pool_size.
5) Delete ibdata1, ib_logfile0 and ib_logfile1
At this point, there should only be the mysql schema in /var/lib/mysql
6) Restart mysql
This will recreate ibdata1 at 10MB or 18MB (depending on the version of MySQL), and ib_logfile0 and ib_logfile1 at 1G each
7) Reload /root/SQLData.sql into mysql
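For reference, here is the same sequence condensed into shell commands. This is only a sketch: the service names, paths, and credentials are assumptions for a typical Linux install, and steps 2 and 4 still have to be done by hand.
mysqldump --all-databases --routines --triggers > /root/SQLData.sql    # step 1
# step 2: DROP DATABASE for each application schema (keep the mysql schema)
service mysql stop                                                     # step 3 (or: service mysqld stop)
# step 4: edit /etc/my.cnf and add the [mysqld] settings shown above
rm -f /var/lib/mysql/ibdata1 /var/lib/mysql/ib_logfile0 /var/lib/mysql/ib_logfile1   # step 5
service mysql start                                                    # step 6
mysql < /root/SQLData.sql                                              # step 7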
ibdata1 will grow, but it will only contain table metadata. In fact, it will grow very slowly over the years. The only way ibdata1 grows quickly is if you have one or more of the following:
- A lot of DDL (CREATE TABLE, DROP TABLE, ALTER TABLE)
- A lot of transactions
- A lot of changes to commit per transaction
Each InnoDB table will exist outside of ibdata1.
Suppose you have an InnoDB table named mydb.mytable. If you go into /var/lib/mysql/mydb, you will see two files representing the table:
- mytable.frm (Storage Engine Header)
- mytable.ibd (Home of Table Data and Table Indexes for mydb.mytable)
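For example, for the hypothetical mydb.mytable:
ls /var/lib/mysql/mydb/
# mytable.frm  mytable.ibd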
ibdata1 will no longer contain InnoDB table data and indexes.
With the innodb_file_per_table option in /etc/my.cnf, you can run OPTIMIZE TABLE mydb.mytable;
and the file /var/lib/mysql/mydb/mytable.ibd will actually shrink.
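As a minimal sketch of that, again with the placeholder table mydb.mytable:
ls -l /var/lib/mysql/mydb/mytable.ibd       # size before the rebuild
mysql -e "OPTIMIZE TABLE mydb.mytable;"
ls -l /var/lib/mysql/mydb/mytable.ibd       # size after: the dead space is gone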
I have done this many times in my career as a MySQL DBA.
In fact, the first time I did this, I collapsed a 50GB ibdata1 file into 500MB.
Give it a try. If you have further questions on this, email me. Trust me. This will work in the short term and over the long haul !!!
UPDATE 2012-04-19 09:23 EDT
After running the above steps, how can you determine which tables need to be defragmented? It is possible to find out, but you will have to script it.
Here is an example: suppose you have the table mydb.mytable. With innodb_file_per_table enabled, you have the file /var/lib/mysql/mydb/mytable.ibd
You will have to retrieve two numbers:
FILESIZE FROM OS: You can ascertain the filesize from the OS like this:
ls -l /var/lib/mysql/mydb/mytable.ibd | awk '{print $5}'
FILESIZE FROM INFORMATION_SCHEMA: You can ascertain the filesize from information_schema.tables like this:
SELECT (data_length+index_length) tblsize FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';
Just subtract the INFORMATION_SCHEMA value from the OS value, and divide the difference by the INFORMATION_SCHEMA value.
From there, you would decide what percentage warrants defragmenting the table. Of course, you defrag it using one of the following commands:
OPTIMIZE TABLE mydb.mytable;
or
ALTER TABLE mydb.mytable ENGINE=InnoDB;
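Putting the two measurements and a threshold together, a rough sketch of such a script might look like this. The database name, table name, the 25% threshold, and passwordless login via ~/.my.cnf are all assumptions; adjust them to taste:
#!/bin/bash
DB=mydb
TB=mytable
IBD=/var/lib/mysql/${DB}/${TB}.ibd

# filesize from the OS
OS_SIZE=$(ls -l ${IBD} | awk '{print $5}')
# filesize from INFORMATION_SCHEMA
IS_SIZE=$(mysql -N -e "SELECT data_length + index_length FROM information_schema.tables WHERE table_schema='${DB}' AND table_name='${TB}';")

# overhead of the file relative to the logical size, in percent
FRAG_PCT=$(( 100 * (OS_SIZE - IS_SIZE) / IS_SIZE ))
echo "${DB}.${TB} is ${FRAG_PCT}% larger than its logical size"

# rebuild the table once the overhead crosses the threshold
if [ ${FRAG_PCT} -gt 25 ] ; then
    mysql -e "OPTIMIZE TABLE ${DB}.${TB};"
fi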
If you frequently delete rows (or update rows with variable-length data types), you can end up with a lot of wasted space in your data file(s), similar to filesystem fragmentation.
If you’re not using the innodb_file_per_table option, the only thing you can do about it is export and import the database, a time-and-disk-intensive procedure.
But if you are using innodb_file_per_table, you can identify and reclaim this space!
Prior to 5.1.21, the free space counter is available from the table_comment column of information_schema.tables. Here is some SQL to identify tables with at least 100M (actually 97.65M) of free space:
SELECT table_schema, table_name, table_comment
FROM information_schema.tables
WHERE ENGINE LIKE 'InnoDB'
  AND table_comment RLIKE 'InnoDB free: ([0-9]{6,}).*';
Starting with 5.1.21, this was moved to the data_free column (a much more appropriate place):
SELECT table_schema, table_name, data_free / 1024 / 1024 AS data_free_MB
FROM information_schema.tables
WHERE ENGINE LIKE 'InnoDB'
  AND data_free > 100 * 1024 * 1024;
You can reclaim the lost space by rebuilding the table. The best way to do this is using ALTER TABLE without actually changing anything:
ALTER TABLE `TableName` ENGINE=InnoDB;
This is what MySQL does behind the scenes if you run OPTIMIZE TABLE on an InnoDB table. It will result in a read lock, but not a full table lock. How long it takes is completely dependent on the amount of data in the table (but not the size of the data file). If you have a table with a high volume of deletes or updates, you may want to run this monthly, or even weekly.
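If you want to apply this across the board rather than one table at a time, one way is to let information_schema generate the statements for you. This is only a sketch: the 100MB threshold is carried over from the query above, it assumes table names that need no quoting, and you should review the generated file before running it:
mysql -N -e "SELECT CONCAT('ALTER TABLE ', table_schema, '.', table_name, ' ENGINE=InnoDB;')
             FROM information_schema.tables
             WHERE engine = 'InnoDB' AND data_free > 100 * 1024 * 1024;" > rebuild_list.sql
# inspect rebuild_list.sql, then:
mysql < rebuild_list.sql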