Database design for varying amount of relations

Posted on

Question :

I am currently trying to design a database model to store all the possible nutrition data, but for many products there isn’t as much data available as for the others, as a result many field could end up as NULL values (I do not want to use any placeholder/default values as I will need to be able to separate which ones are actually missing)

I found that PostgreSql might be a good solution for me, as I do not want to build a EAV (Entity-Attribute-Value) tables, because I have had my experience with it, it doesn’t scale very well. (If “products” could have anywhere from 0 to 200 attributes, it would quickly get into tens of millions rows).

PostgreSql has composite types and you can also store them as an array, which allows me to store key value pairs easily within a single column, as a result it is easy to lookup everything related to the “product”, I can easily do a join between tables and have the data available.

Now the question is:

Is it a good solution, what kind of obstacles could I face, is there a better solution to my problem?

Would having separate tables for example:

CREATE TABLE products (
    proteins_id FOREIGN KEY,
    lipids_id FOREIGN KEY,
    minerals_id FOREIGN KEY,
    vitamins_id FOREIGN KEY)

each foreign key referencing a table with possibly up to 50 null values in a row

OR

CREATE TYPE custom_type (
   field_1 INTEGER,
   field_2 INTEGER
)

CREATE TABLE products (
    proteins custom_type[],
    lipids custom_type[],
    minerals custom_type[],
    vitamins custom_type[]
)

I understand that the optimal solution might be somewhere in-between those two solution by finding a balance of some sort, but I really need some pointers on that, thank you guys! 🙂

EDIT: Plus another things to consider: The tables might need to be expanded quite often and it would be great if the process were least painful.

I also face the same kind of problem with other tables such as (food) “recipes” as the number of ingredients can be very varying plus I need to store the measures/quantity of the components.

Answer :

What you are describing is called a subtype. A subtype is a common data structure used for things that are very similar but have some differences. For example, and object Named Clothing can be used to describe many types of personal coverings but T-Shirts and Shorts have very different properties. With a Subtype, you store all the common properties in one table and have a related table of the properties that are different. That main table will have a FK to a Type table that will denote which of the subtype tables this object is.

Entitty: Clothing
ClothingID
ClothingTypeID -> ClothingType
Name
Size
Color

Type Entitty: ClothingType
ClothingTypeID
Name

Now for the subtype tables. You will have one for each value in the Type table.

Entity: TShirt
ClothingID (Subtypes always have the same primary key value as the parent object)
SleeveLength
ColorType

Entity: Shorts
ClothingID
HasPockets
LegLength
IsSwimSuit

If you want to model an easy-to-use database for “varying amount of relations” between tables (in the RELATIONAL model), then a ONE-TO-MANY relationship is what you should consider first (just to get “your foot in the door”). Eg if your reasoning is that each PRODUCT contains one or more (chemical) COMPOUNDS, then all you need are 2 tables. You also do NOT have to store NULLs – you just insert as many COMPOUNDS as you want/need for each PRODUCT.

However, chances are that you find some compounds that are contained in MORE than ONE product – in which case you will end up with a MANY-TO-MANY relationship between PRODUCT and COMPOUND. Thus, you need to use an intersection table. This will actually do the trick. No storing of NULLs! Just PRODUCTS, COMPOUNDS, and COMPOUNDSINPRODUCT (bad name, I know! I’m sure you will find a better one).

EXAMPLE

-- ---------------------------------------------------------------------------
--  table PRODUCTS:  
--  store all products you want to track (one entry per product)
-- ---------------------------------------------------------------------------
TABLE:PRODUCTS 
pid  name
1    first product
2    second product
3    third product
4    fourth product

-- ---------------------------------------------------------------------------
--  table COMPOUNDS
--  store all known compounds (one entry per compound)
--  NOTE: you can just use a number as cid, the letters are just for
--  adding "clarity"
-- ---------------------------------------------------------------------------
TABLE:COMPOUNDS
cid  name
L1   lipid_1 (this is supposed to be the ACTUAL name of the lipid here!)
M1   mineral_1 (this is supposed to be the ACTUAL name of the mineral here!)
P1   protein_1 (this is supposed to be the ACTUAL name of the protein here!)
V1   vitamin_1 (this is supposed to be the ACTUAL name of the vitamin here!)
L2   lipid_2 (this is supposed to be the ACTUAL name of the lipid here!)
M2   mineral_2 (this is supposed to be the ACTUAL name of the mineral here!)
P2   protein_2 (this is supposed to be the ACTUAL name of the protein here!)
V2   vitamin_2 (this is supposed to be the ACTUAL name of the vitamin here!)

-- ---------------------------------------------------------------------------
--  table: "compounds in product" CIP (or: COMPOUNDMAP 
--  store known/relevant compound information
--  the combinations of pid and cid are UNIQUE!
-- ---------------------------------------------------------------------------
TABLE:CIP 
pid  cid
1    L1    -- product 1 contains L1
1    M1    -- product 1 contains M1
1    M2    -- product 1 contains M2
2    L2    -- product 2 contains L1
2    V1
3    P2
3    P1
3    V1
4    P2

When modelling it this way, you can always add on more compounds, (or compounds that are contained in a product – in case future research shows that a product contains more compounds than we thought). And: you don’t need JSON, TYPEs, or arrays.

You can also combine this layout with the SUBTYPEs as @Joshua Guttman suggested. However, I think that you will get quite far with this approach.

Leave a Reply

Your email address will not be published. Required fields are marked *