I am looking for a design pattern that would allow me to capture metadata relating to columns on a table. More specifically, here is my situation.
I have a database with fields in a
field table, crops in a
crop table and prediction models in a
model table. Each
field can grow one
crop and subscribe to multiple
models. Each model offers different kind of predictions for a specific
crop allows a particular set of inputs from the farmer. For instance, all crops will expect a planting date while only some of them will expect a flowering date. Therefore, I create a
generic_input table and crop-specific tables (
soy_input, etc.) to capture that data. Each column of those table is an input that needs to be captured.
Here comes the interesting part. Each
model also specifies a set of required crop inputs. That is, inputs that need to be set in order for that model to be able to compute its predictions. That means I would essentially need to reference particular columns of my input tables which is not really possible.
So far, the only real solution I found would be the Entity-Attribute-Value pattern. However, I’ve read so many posts warning against its use (example) that I am quite reluctant to go down that road. Another option would be to create an enum or a table listing all my inputs but then if new inputs need to be added or if we add a new crop to the system, we need to update that list as well: that results in some sort of data duplication or poor relational consistency.
Are there any other pattern that could support my use case? What would be an appropriate solution? Is this a case where EVA would actually be a good solution?
EAV is a solution when the values to be stored cannot be identified during design-time but are captured at run-time. It sounds like your situation is slightly different – you know all the values to be captured but not how to store them.
One relational approach is to use entity inheritance. This is what you describe with
potato_input etc. Since each crop will have exactly one row in the generic table and exactly one row in a single specific table this inheritance can be flattened into a view for consumption by the models.
If you don’t know until run-time which values each model will consume you can introduce a
model_attribute table to list each model’s requirements. Use dynamic SQL to construct the SELECT from the above flattened view.
The data could be stored in general-purpose columns. The input table then becomes
create table input(crop, value_1, value_2, value_3, .. value_n). Further meta-data maps crops to columns’ meanings e.g. “for wheat value_1 is planting date; for poppies value_1 is flowering date”. This design is good if you really like logic puzzles and debugging.
It is only a small step to unpivot that table to get
input(crop, value_number, value). Then substitue the column meanings for value_number to give
input(crop, attribute, value) and you have EAV.
My experience with EAV is that it is really rubbish for queries with predicates (“select planting_date where flowering_date = 1st August”) but are fine for retrieving all values for a thing (“select * from EAV where entity_id = xyz”). It’s left to the application to make sense of the bucket of values that are returned, but it will do a better job than SQL.
EAV is a dynamic schema solution. Others that are better supported by DBMS are XML and JSON. These allow each row of a table to hold values that other rows don’t. Depending on the DB software the JSON/XML can be parsed as part of the SQL query. You don’t say, but maybe your models accept inputs in one of these formats, too? Of the two JSON is more compact and XML has a richer parsing & query ecosystem.