Question :
I’ve inherited a database that uses sets of 32-byte hex strings to attempt to uniquely identify objects. Each string is called a fingerprint and each object can have up to 20 fingerprints.
I’m migrating the database across servers and I’ve noticed that re-populating the Fingerprint table takes hours. It seems that each fingerprint is given an integer id and is stored as a varchar, there’s an index on the fingerprint column and hence the slow inserts.
My question is: is it worth my while to refactor this design and remove the Fingerprint table altogether? I would guess that indexing on numbers rather than strings would be faster, is this correct?
If the design looks like this: (hopefully this make sense)
Object(1,*)<----->(*,*)ObjectFingerprint(*,*)<------>(1,*)Fingerprint
id object_id id
fingerprint_id fingerprint
Would it be better to use:
Object(1,*)<----->(*,*)Fingerprint
id object_id
fingerprint
I’m using a MySQL server with INNODb tables and I have roughly 1.4 million objects.
Answer :
I would recommend using the BINARY type; strings waste a lot of space, and integers require conversion.
http://dev.mysql.com/doc/refman/5.0/en/binary-varbinary.html
For the other part: why do you need the associative table? If objects and fingerprints are 1 to 1, you could either store them directly in the object table, or just have single table with the id, fingerprint, and object id FK.