I have a frequently accessed table containing 3 columns of BLOBs and 4 columns of extra data that are not used in the queries, but are just returned to PHP. There are also 6 small columns (big int, small int, tiny int, medium int, medium int, medium int) that are used in the WHERE/ORDER BY/GROUP BY clauses.
The server has very little memory, around 1 GB, so the cache is not enough to improve performance on the large table. I've indexed the 6 small columns, but it doesn't seem to be helping.
Would it be a good solution to split this large table into two?
One table containing the 6 small columns, and the other containing the BLOBs and extra data, linked to the first table with a foreign key in a one-to-one relationship?
I'd then run the queries on the small table, and join the small number of rows remaining after filtering to the table with the BLOBs and extra data, to return them to PHP.
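A minimal sketch of the split I'm describing, with hypothetical table and column names (`items`, `items_data`, `col_a` … `col_e` stand in for my real schema):

```sql
-- "Hot" table: only the 6 small columns used in WHERE/ORDER BY/GROUP BY.
-- The BIGINT serves as the primary key.
CREATE TABLE items (
    id     BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    col_a  SMALLINT  NOT NULL,
    col_b  TINYINT   NOT NULL,
    col_c  MEDIUMINT NOT NULL,
    col_d  MEDIUMINT NOT NULL,
    col_e  MEDIUMINT NOT NULL,
    INDEX idx_filter (col_a, col_b, col_c)
) ENGINE=InnoDB;

-- "Cold" table: BLOBs and extra data, one-to-one with items.
CREATE TABLE items_data (
    id     BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    blob1  BLOB,
    blob2  BLOB,
    blob3  BLOB,
    extra1 VARCHAR(255),
    CONSTRAINT fk_items_data
        FOREIGN KEY (id) REFERENCES items (id)
) ENGINE=InnoDB;

-- Filter and sort on the small table first, then join the few
-- surviving rows to fetch their BLOBs and extra data.
SELECT d.*
FROM items i
JOIN items_data d ON d.id = i.id
WHERE i.col_a = 42
ORDER BY i.col_c
LIMIT 20;
```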
Please note, I've already done this, and I managed to decrease the query time from 1.2–1.4 seconds to 0.1–0.2 seconds. However, I'm not sure whether the solution I've tried is considered good practice, or whether it's even advisable at all.
Using InnoDB with the Barracuda file format lets you use ROW_FORMAT=DYNAMIC. This will help because DYNAMIC stores long BLOB values fully off-page; queries that don't touch the BLOB columns shouldn't need to read those pages.
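For example (using a hypothetical table name `items`; note the `innodb_file_format` variable only exists on MySQL versions before 5.7, where DYNAMIC became the default):

```sql
-- Barracuda requires file-per-table tablespaces.
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_file_format = 'Barracuda';  -- MySQL < 5.7 only

-- Rebuild the table so BLOBs are stored entirely off-page.
ALTER TABLE items ROW_FORMAT=DYNAMIC;
```

With the older COMPACT/Antelope format, InnoDB keeps a 768-byte prefix of each BLOB in the row itself, so even queries that ignore the BLOB columns pull that data through the buffer pool.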
Splitting BLOB fields into another table is generally a pretty good idea – so you are on the right track. It benefits you by making sure that the stuff you care about keeping in memory (the index and the small filtering columns) stays there and doesn't get pushed out by the oversized BLOB pages.
How large are your BLOBs? Might it be worth storing them outside the database in a document NoSQL like store?
What do your queries look like?
Are you sure that I/O isn't the bottleneck? Do you read the data sequentially, or are all selects random? If there are a lot of multi-row selects, it may be possible to speed up retrieval with a well-chosen primary key – unless your DB resides on an SSD. Ordinary HDDs are limited to roughly 150–200 random accesses per second.
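To illustrate the primary-key point: InnoDB clusters rows physically by primary key, so if your multi-row selects mostly fetch rows sharing one column value, putting that column first in the key turns random page reads into sequential ones. A sketch with hypothetical names (`events`, `user_id`), assuming the common query fetches many rows per `user_id`:

```sql
CREATE TABLE events (
    user_id INT UNSIGNED    NOT NULL,
    id      BIGINT UNSIGNED NOT NULL,
    payload BLOB,
    -- InnoDB stores rows in primary-key order, so all rows for one
    -- user_id end up in adjacent pages instead of scattered across disk.
    PRIMARY KEY (user_id, id)
) ENGINE=InnoDB;

-- Now this reads a handful of contiguous pages rather than one
-- random page per row:
SELECT payload FROM events WHERE user_id = 123;
```

Whether this is worth doing depends entirely on your access pattern; on an SSD the random-read penalty mostly disappears.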