Question :
I’m trying to decide whether to normalize a table further, and need to know how small a row in InnoDB can be (I’m on MySQL 5.6).
This is not exactly my case, but for simplicity, let’s assume I just have a surrogate PK (BIGINT) and two TINYINT columns, neither of which can be NULL. How many bytes can I expect each row to need for storage if I’m going to have 1 billion of these?
I’d imagine each row requires some metadata (I know it does at least when there are NULLable columns), but I can’t seem to find documentation on how to calculate it.
Then, how much space can I expect to be eaten by free space in the page, and by the page’s own metadata (or is that all counted together)?
I want to know so that I can better judge when I would have a long table that is narrow enough to avoid proper normalization.
Answer :
Each row in InnoDB (let’s assume the COMPACT row format) has these headers:
- Offsets – one or two bytes per variable-length field.
- NULL flags – one bit per NULLable field, rounded up to a whole number of bytes.
- So-called “extra bytes” – a set of flags such as the deleted flag, a pointer to the next record, etc. A fixed five bytes in the COMPACT format.
Then come the primary key field(s).
Then a 6-byte transaction ID and a 7-byte rollback pointer.
Then the remaining field values.
For the given row (BIGINT, TINYINT, TINYINT) its size will be:
- 0 bytes offsets
- 0 bytes NULL flags
- 5 bytes “Extra bytes”
- 8 bytes BIGINT
- 6 bytes trx_id
- 7 bytes roll_ptr
- 1 byte TINYINT
- 1 byte TINYINT
So, 28 bytes per record.
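For illustration, the arithmetic above can be sketched as follows (the constants are just the COMPACT-format sizes listed above):

```python
# Per-row size for (BIGINT PK, TINYINT, TINYINT) in COMPACT row format.
# No variable-length columns -> 0 offset bytes;
# no NULLable columns       -> 0 NULL-flag bytes.
EXTRA_BYTES = 5   # fixed record header ("extra bytes") in COMPACT format
TRX_ID = 6        # transaction ID stored with each clustered-index record
ROLL_PTR = 7      # rollback pointer
BIGINT = 8
TINYINT = 1

row_size = EXTRA_BYTES + BIGINT + TRX_ID + ROLL_PTR + 2 * TINYINT
print(row_size)  # 28
```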
But to calculate how much space the table will take you need to take into account:
- Page header (38-byte FIL header + 56-byte page header)
- Page trailer (8 bytes)
- Page fill factor
- Infimum and supremum records
- space used by non-leaf pages
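Putting those overheads together, here is a rough estimate of the data (leaf) pages only, assuming the default 16 KiB page, a 15/16 fill factor, and COMPACT-format infimum/supremum records of 13 bytes each; the ~2-byte page-directory slots and non-leaf pages are ignored, so treat the numbers as illustrative, not exact:

```python
# Rough leaf-page capacity for 28-byte rows in a default 16 KiB InnoDB page.
PAGE_SIZE = 16 * 1024
PAGE_HEADER = 38 + 56        # FIL header + page header
PAGE_TRAILER = 8             # FIL trailer
INF_SUP = 13 + 13            # infimum + supremum records (COMPACT format)
FILL_FACTOR = 15 / 16        # sequential inserts leave 1/16 free

usable = (PAGE_SIZE - PAGE_HEADER - PAGE_TRAILER - INF_SUP) * FILL_FACTOR
rows_per_page = int(usable // 28)
pages = -(-1_000_000_000 // rows_per_page)   # ceiling division
size_gib = pages * PAGE_SIZE / 2**30
print(rows_per_page, round(size_gib, 1))     # 544 28.0
```

This counts only the leaf level of the clustered index; non-leaf pages and other bookkeeping push the real total higher.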
Check these slides for more details
http://www.slideshare.net/akuzminsky/data-recovery-talk-on-pluk
(I’m adding an “answer”, not a comment for formatting reasons.)
@akuzminsky covered most things quite thoroughly.
Why a BIGINT? Won’t an INT UNSIGNED (max value ~4 billion) suffice? That would save 4GB.
The fill factor comes in two flavors:
If the writes are cleanly done (e.g., inserts in PRIMARY KEY order), then the blocks are 15/16 full. The grand total might be 40-45GB.
If the INSERTs are random, then the blocks average about 69% full.
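To see what those two fill factors mean for a billion 28-byte rows (raw row data only, page overhead ignored):

```python
# Effect of fill factor on 1e9 rows of 28 bytes (raw row data only).
raw_gib = 28 * 1_000_000_000 / 2**30   # ~26.1 GiB of row data
sequential = raw_gib / (15 / 16)       # clean, ordered inserts: pages 15/16 full
random = raw_gib / 0.69                # random inserts: pages ~69% full
print(round(sequential, 1), round(random, 1))  # 27.8 37.8
```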
Now, let’s ask why this layout. The two TINYINTs will be repeated a lot across a billion rows. (In particular, by the pigeonhole principle, at least one of the 65,536 possible pairs must be repeated at least 1B/(256*256) ≈ 15,000 times.) Maybe you could have only 65,536 rows, with counters in them? Maybe some other summarization technique would work for your application? (OK, maybe this table is artificial.)
BTW, MyISAM would take about 11GB for the data in that table (assuming no holes), plus another 15-20GB for the PRIMARY KEY.
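As a sanity check on the MyISAM figure, here is a sketch assuming the fixed-length (static) row format with no NULLable columns, which uses one header byte per row:

```python
# MyISAM static (fixed-length) format: 1 header byte (deletion flag),
# then the column data; no page structure, rows are packed back to back.
row_bytes = 1 + 8 + 1 + 1            # header + BIGINT + 2 x TINYINT
data_gb = row_bytes * 1_000_000_000 / 10**9
print(data_gb)  # 11.0
```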