Varchar indexing

Question :

I am very rusty with MySQL and I’d like some advice.

I have several software tools that are deployed throughout the company and I am writing a DLL to keep track of the usage, errors and stuff of some of these tools.

One of the things I want to keep track of is information about the computer’s processor, specifically the brand string returned by the CPUID instruction, something like: Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz

We don’t have a huge variety of processors (I think), and we have 6000+ workstations. My questions are:

  1. In this scenario, if I create a VARCHAR(64) column in the table and create an index on it, is this a good solution?

  2. How will MySQL store this? Say I have only 10 different processors, so only 10 different strings, will each row store the whole of the string, or just an ‘id’?

  3. Would GROUP BY be efficient using this indexed column?

I’m using MySQL 5.1 (I know it’s old…) and InnoDB tables.

Thanks!

Answer :

You seem to be thinking of column-based storage systems (or you may come from that background).

MySQL (InnoDB, in particular) stores its rows clustered around the primary key, and that index, along with any secondary indexes, is stored in a B+Tree structure.

So, if you store the text in a VARCHAR(64) column and index it wholly, it will take up to (the storage is in fact dynamic) 64 × the charset byte size (3 for MySQL’s utf8), plus some extras (length byte, null flag, the primary key appended to each secondary index entry, etc.), times 2 (stored once “in-row” and a second time in the secondary index), times the number of rows.
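As a rough worked example of that upper bound (the per-row extras are an assumed figure, and InnoDB page overhead is ignored):

```python
# Rough upper bound for a fully indexed VARCHAR(64) utf8 column.
CHARS = 64
BYTES_PER_CHAR = 3   # MySQL's utf8 charset: up to 3 bytes per character
EXTRAS = 8           # length byte, null flag, PK copy, etc. (assumed)
COPIES = 2           # once in the row, once in the secondary index
ROWS = 6000

total_bytes = (CHARS * BYTES_PER_CHAR + EXTRAS) * COPIES * ROWS
print(total_bytes)   # 2400000 bytes, about 2.3 MiB
```

Tiny by today’s standards, but every one of those bytes is duplicated in the index and compared during lookups, which is why the numeric-id substitution below pays off.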

The first thing that I would recommend, if you are only going to have very few distinct values, is to do manually the “id” substitution that you mention: store a small numeric id and link it with a foreign key to a separate lookup table.
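A minimal sketch of that normalization, with hypothetical table and column names (valid for MySQL 5.1 / InnoDB):

```sql
-- Lookup table: one row per distinct CPUID brand string
CREATE TABLE cpu (
    cpu_id  TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name    VARCHAR(64) NOT NULL,
    PRIMARY KEY (cpu_id),
    UNIQUE KEY (name)
) ENGINE=InnoDB;

-- The tracking table stores only the 1-byte id
CREATE TABLE tool_usage (
    usage_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    cpu_id     TINYINT UNSIGNED NOT NULL,
    logged_at  DATETIME NOT NULL,
    PRIMARY KEY (usage_id),
    KEY (cpu_id),
    CONSTRAINT fk_usage_cpu FOREIGN KEY (cpu_id) REFERENCES cpu (cpu_id)
) ENGINE=InnoDB;
```

With ~10 distinct processors, a TINYINT id replaces up to ~200 bytes of indexed text per row with a single byte.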

GROUP BYs are, in general, not efficient in MySQL beyond the possibility of performing the operation in memory by using a covering index. There are tricks using denormalization and precomputed values, and you can even set up external software for more appropriate indexing. But with 6000 rows you may not need a more advanced solution. Just using numeric ids instead of text may make your dataset fit into memory, making it fast enough for your purposes.
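A sketch of that covering-index idea using the numeric id (hypothetical table and column names):

```sql
-- With a secondary index on cpu_id, this aggregation can be
-- resolved from the index alone (a covering index), without
-- touching the clustered rows:
SELECT c.name, COUNT(*) AS uses
FROM tool_usage u
JOIN cpu c ON c.cpu_id = u.cpu_id
GROUP BY u.cpu_id;
```

EXPLAIN should show "Using index" for the tool_usage access when the index covers every column the query needs from that table.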
