Question :
I have a table for users in a social networking site. I want it to be efficient from ground up. If I put mentioned fields on the table then I have to update user’s table on each changes. Should I have another table for fields like these that change frequently? Should I be concerned about fragmentation?
Is there any better approach that I’m unaware of?
Answer :
It most certainly will cause fragmentation, but you must compare it with normalization.
FRAGMENTATION
Fragmentation is introduced in a heavy-write environment.
- DELETEs automatically create empty space by at least the size of the row at the time of deletion.
- UPDATEs can also cause fragmentation, mostly notably on variable-length data.
Further Links on InnoDB Fragmentation and how to Eliminate it from InnoDB data and system tablespace
- http://www.bluegecko.net/mysql/innodb-tablespace-fragmentation-find-it-and-fix-it/
- How do you remove fragmentation from InnoDB tables? (April 11, 2012)
- Howto: Clean a mysql InnoDB storage engine? (My StackOverflow Post From Oct 29, 2010)
NORMALIZATION
Any columns you have in a user table that are immutable (i.e., will never experience changes) should act as the main table. Things such as
- Address Change
- Spelling Correction
- Last Name Change (in case a lady get married)
will produce very little fragmentation since changes of those kinds are rare.
Any information that logs frequent changes to user information should go into a userinfo
table. This will separate fragmentation issues from the immutable user data. You can easily defrag an InnoDB userinfo
table with one of the following
ALTER TABLE userinfo ENGINE=InnoDB;
OPTIMIZE TABLE userinfo;
CONCLUSION
You should split up the user data into user
and userinfo
tables. A simple INNER JOIN will combine them as needed.
I suggest you follow normalization rules. This will minimize the write and update processing (and errors) and simplifies coding of the logic of write and update routines. You can tune reads in many ways. You could use materialized views or indexed views to keep track of counts without queering the records one by one. Don’t sacrifice normalization for no obvious reasons.