Do varchar/text/longtext columns in MySQL occupy full size in filesystem even for partially filled cells?

Posted on

Question :

Do varchar/text/longtext columns in MySQL occupy full length size in filesystem even for incomplete cells?

Referring to this blog article for example I have the following columns and i have put in a given string in it.

CHAR (72) - can store upto 72 characters - i put in a string "abcd"
VARCHAR (72) - can store upto 72 characters - i put in a string "abcd"
TINYTEXT - can store upto 255 characters - i put in a string "abcd"
TEXT - can store upto 65,535 characters - i put in a string "abcd"
LONGTEXT - can store upto 4,294,967,295 characters - i put in a string "abcd"

How much actual physical space will be occupied by each of these columns for a row? The full size upto which each column is capable of storing? Or only as much as required for storing “abcd”?

Answer :

‘abcd’ in CHAR(72) CHARACTER SET ascii occupies 72 bytes on disk.
‘abcd’ in CHAR(72) CHARACTER SET utf8 occupies 3*72 bytes on disk.
‘abcd’ in CHAR(72) CHARACTER SET utf8mb4 occupies 4*72 bytes on disk.
‘abcd’ in VARCHAR(72) occupies 1+4 bytes on disk.
‘abcd’ in TINYTEXT occupies 1+4 bytes on disk.
‘abcd’ in TEXT occupies 2+4 bytes on disk.
‘abcd’ in LONGTEXT occupies 4+4 bytes on disk.

Update: In some versions, InnoDB will store only 4 bytes for the CHAR cases.

Notes:
The 1,2,4 is for length; VARCHAR could be 2 in some situations.
The CHARACTER SET is important in all the cases, but does not impact the space occupied by ‘abcd’ except for CHAR.

@akuzminsky — You are wrong about the *3. CHAR(N) and VARCHAR(N) can hold up to N characters in the declared CHARACTER SET. That will become up to 3*N bytes for CHAR or 1+3*N for VARCHAR.

TEXT is limited to 65535 bytes (plus 2-byte length).

There is overhead on top of the lengths described above. In InnoDB, there are record flags, transaction ids, blocking factors, pointers to overflow areas, etc, etc. So, if you use this kind of arithmetic, it will underestimate the disk space used.

MyISAM is more frugal; it has at least 1 byte of overhead per record; in some cases, only 1.

And now, from the horse’s mouth – the MySQL documentation – all you’ve ever wanted to know about how much storage is used per type. AFAIK, there is no function which will return the storage used – see below for length and char_length. You could possibly devise a UDF to do this, though I fail to see any pressing interest.

Storage Requirements for String Types (InnoDB tables).

In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.

Data Type and Storage Required

  • CHAR(M) M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set. See Section 14.2.6.7, “Physical Row Structure” for information about CHAR data type storage requirements for InnoDB tables.

  • BINARY(M) M bytes, 0 <= M <= 255

  • VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 − 255 bytes, L + 2 bytes if values may require more than 255 bytes

  • TINYBLOB, TINYTEXT L + 1 bytes, where L < 28

  • BLOB, TEXT L + 2 bytes, where L < 216

  • MEDIUMBLOB, MEDIUMTEXT L + 3 bytes, where L < 224

  • LONGBLOB, LONGTEXT L + 4 bytes, where L < 232

  • ENUM('value1','value2',...) 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum)

  • SET('value1','value2',...) 1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum)

You might also be interested in this

LENGTH() returns the length of the string measured in bytes. 
CHAR_LENGTH() returns the length of the string measured in characters.

if you want to find the length in either bytes or characters of a given field.

Leave a Reply

Your email address will not be published. Required fields are marked *