Question :
(I’m particularly interested in answers that view the question below as a special case of the question: how should a RDBMS enforce structural constraints that are more specific than “one-to-many” and “many-to-many”?)
Much experimental data in biomedical research is collected in “plates” of rectangularly arrayed “wells”. These well array plates are commercially available a few standardized sizes: 2 × 3, 4 × 6, 8 × 12, 16 × 24, and 32 × 48.
Consider the following two alternatives for storing measurements from wells of 2 × 3 plates in an RDB:
-- alternative 1
CREATE TABLE measurement_foo (
plate_id FOREIGN KEY REFERENCES plate(plate_id),
plate_row CHAR(1),
plate_column INTEGER,
value REAL
);
-- alternative 2
CREATE TABLE measurement_foo (
plate_id FOREIGN KEY REFERENCES plate(plate_id),
a1 REAL,
a2 REAL,
a3 REAL,
b1 REAL,
b2 REAL,
b3 REAL
);
My instinct is to go with alternative 1: it generalizes to plates of any size, and it can be modified in a straightforward way to record multiple different measurements per well, as in
CREATE TABLE measurement (
plate_id FOREIGN KEY REFERENCES plate(plate_id),
plate_row CHAR(1),
plate_column INTEGER,
foo FLOAT,
bar FLOAT,
baz FLOAT
);
In contrast, to adapt alternative 2 to the 16 × 24-well format (for example), would entail defining a table with 1 + 384 columns: plate_id
, a01
, …, a24
, b01
, … , b24
, …, p01
, …, p24
. Moreover, a different such table would need to be defined for each type of measurement.
My main reason for even considering alternative 2 is that it alone enforces the constraint that each plate contains exactly 6 wells; alternative 1 does not.
(Also, one record of the table in alternative 2 would hold as much information as up to 6 records of the one in alternative 1, which makes me wonder whether the table in alternative 1 is fully normalized.)
For alternative 1, I can imagine ways to enforce the weaker constraint that a plate contains at most 6 wells, namely, by imposing a uniqueness constraint on (plate_id, plate_row, plate_column), and restricting columns plate_row and plate_column to the domains (‘a’, ‘b’) and (1, 2, 3), respectively. But it is still possible for wells to be missing from the data.
How should the database enforce the structural relationship between plates and wells?
Answer :
All current RDBMS’ tables can have CONSTRAINTS
on columns. These constraints are checked every time data is inserted into the table. It can also check data against other tables.
We know that each Plate Type has certain number of Rows and Columns. We can enumerate all Rows and Columns for each Plate Type. So, when data is inserted, the DB can check if a certain row/column combination exists for a given Plate Type.
Lets create a set of tables:
create table Plate_Types (
Plate_Type_id int,
Plate_Size int,
Plate_row int,
Plate_col int)
This table holds description of every Plate size like this:
Id Size Row Col
1 6 1 1 -- 2x3
1 6 1 2
...
1 6 2 3
5 1536 1 1 -- 32x48
...
5 1536 32 32
Then, in your main table from Alternative 1 we introduce a Foreign Key – a “link” to another table to check if row and column are valid for this Plate size.
create table MyTable (
well_id int,
plate_id int,
plate_size int,
row_id int,
col_id int,
value real);
ALTER TABLE MyTable
ADD CONSTRAINT FK_Plate_SizeCheck
FOREIGN KEY (Plate_size_id, Row_id, Column_id)
REFERENCES Plate_Types (Plate_type_id, Plate_row, Plate_col);
This Constraint here does the following: for every inserted row DB goes to table Plate_Types and looks for combination of Plate_size_id, Plate_row
and Plate_col
. In other words, it checks if this Plate Size can have row I and column J. If there is no match, then the DB fires an error.
Please note that this is one of several possible solutions for data integrity enforcement for your example. Medical data often comes in huge volumes and performance of this particular design is different question.
PS. This is a shortened explanation for non-developers. Code as well as table design is for concept illustration only.
I would commend alternative 1 to you.
Just as having cells as columns a01, a02 etc. is poor normalisation so, too, is having measurements as m01, m02 and so on. In your example you have disguised this by calling them foo, bar and baz, but that is what they are. To address this what you call measurement
whould more usefully be called Well
:
CREATE TABLE Well (
plate_id FOREIGN KEY REFERENCES plate(plate_id),
plate_row CHAR(1),
plate_column INTEGER,
);
and there would be new tables:
CREATE TABLE MeasurementType (
measurement_id CHAR(3) -- values "foo", "bar" and "baz"
);
Measured values are in the intersection of these two:
CREATE TABLE measurement (
-- Primary key of Well
plate_id,
plate_row CHAR(1),
plate_column INTEGER,
-- Primary key of MeasurementType
measurement_id CHAR(3),
-- the value measured from this well
measured_value FLOAT
);