Question :
I’m designing a relational database to use in MySQL. I have the next situation: In one table there are fields that in the most of situations are NULL
. The fields would be like:
Table name: tabla
Fields:
idtabla not null,
text (varchar(n)) not null,
image (mediumblob) {this can be null}
The doubt: Is better to create another table and when be necessary to use the image, query the new table?
Why is your solution, better for design? MySQL’s time response, ease to make queries, etc?
Answer :
You are touching on an almost philosophical argument here: should NULLable values be allowed at all given they violate the “closed world” assumption of the model relational databases are derived from (see relevant sections of http://en.wikipedia.org/wiki/Null_(SQL) and a number of other questions here such as Why shouldn’t we allow NULLs? for more notes on that). To avoid unknown values splitting properties that might be unknown (or simply not applicable in all circumstances) into their own relation (table) so that you don’t have values that are unknown but instead if a value is unknown it simply doesn’t exist. While this satisfies the theory, there are practical implications that make it less desirable:
- A
JOIN
operation is not free in most RDBMSs, searching for the properties in the extra table(s) increases the work the engine needs to do to satisfy your queries. - If the existence of properties is not codependent (i.e. each proerty can be unknown irrespective of the others) then to carry this implementation to its conclusion you sometimes end up needing a table for each property.
- The extra joins add to the complexity of your queries and can reduce maintainability.
- Updates also become less intuitive in places: blanking a property now becomes a DELETE operation and updating one may be either an
INSERT
or anUPDATE
.
Of course there are circumstances where that first point is reversed and breaking properties off improves efficiency:
- Most RBDMSs use page based storage and breaking optional information off from core data means you can fit more core data rows in a given page. Depending on your data size, RAM, and storage infrastructure this could be significant in reducing the amount of IO required for some large queries (but often isn’t: try not to “over optimise” around this point, at least without running good benchmarks to make sure things are improving not worsening).
This is of course assuming that your queries are designed so they pick up only what they need so the engine won’t need to care about the drawing in the extra information anyway. - Similarly there is less writing to do if you are only updating core information in an
INSERT
orUPDATE
, and if you have constraints or triggers around those extra properties this processing is potentially avoided on each row change.
Your example looks like it is storing images (or at least large bits of data in blob type columns. There are two extra considerations here:
- Most database engines store large data “off page” anyway, so if you avoid
SELECT *
you get the rows-per-page bonuses “for free” anyway. - Does moving them elsewhere allow you to share the large data between multiple entities rather than storing the same blob more than once, or is the information (if present) unique to a given row?
tl;dr: So no hard and fast answer I’m afraid. My advice would be to do whatever best fits your conceptual model of your data and therefore requires less thought on your part to maintain. This usually, but not always, means using NULLable columns rather than separate tables. Unless your data is truly massive the performance differences are going to be negligible.
Whether a separate table for images is helpful depends on the cardinality of the relationship between the images and the rest of the attributes in the original table:
A separate image-table is justifiable if either:
- multiple rows may contain the same image: enhances normalization since the image only has to be stored once in the image table instead of multiple times in the original table
- an image can exist without the rest of the attributes; e.g. if the image for a row is exchanged/deleted, this could allow you to keep a history
If the images are unique per row, I would not worry about null fields, as you would have null foreign keys in those cases with a separate image table.
If you decide to keep the images in the original table, it may be beneficial to only retrieve them if needed (as is the case with any data, so preferably do not use SELECT *
unless you need all the data available).
If you create a new table then you would probably need a “JOIN” to get the image data. Joins can decrease performance of RDBMS. But in your case as there would be only one join so it won’t have that much effect on over-all query performance.