When to use a “middle table” instead of storing a JSON with IDs?

Posted on

Question :

I’m working on a store for a mobile app, the database is mysql.
I have the store items in a table called items and that table has a column called price_point_id (one item has 1 price point) which references another table called price_points.
Now there is one last table called templates which has a one-to-many relationship with the items table (one template has many items).

I could create a middle table with 2 columns item_id and template_id and join them, but wouldn’t it be much faster to avoid a 4 tables join (I need data from items, price points and the middle table, but I only have the template ID to start) like in this example:

enter image description here

But what if I kept the item IDs in the templates table, that would avoid a join with big table (not huge but hundreads of thousands), In this case I could query the template first then use an IN statement to get the items + price points, sure it’s still a join but in much smaller tables and I’d be querying by primary keys only. Like in this example:
enter image description here

Do you guys see any draw backs with the second approach?

Answer :

As nbk mentions, the middle table is known as a bridge table or linking table. JSON is best for structureless or variable structure situations, but not so good when needed to be used in predicates within the context of a relational system.

By stuffing a bunch of values in a single JSON field and then using the IN clause, you’re telling MySQL to treat the values of that JSON field as a predicate which is very similar to as if you wrote a bunch of OR clauses with each value in it. Having a large number of predicates in this form is generally sub-performant.

Furthermore, you’re still dealing with roughly the same amount of data regardless if you store it in a JSON column or in a bridge table, because your templates table will still be the same number of rows times the same number of item_ids in the one-to-many relationship they naturally share. (I know there’s some data redundancy in practice, but the optimizations of the database system under the hood with normalized data probably make that a moot point from a performance perspective.)

Joining multiple tables together is not a bad thing for performance, especially only a few tables as you mentioned and especially with small tables of only a few hundred thousand rows at most. It’s more so an issue if you over-normalize the data and always need to rejoin all the tables back together in every query. You probably wouldn’t see any issues even in that case until your data grows significantly more and/or you’re joining around 10+ tables together every query (rough estimates).

So at the end of the day you’d be at best ending up with the same performance on average by leveraging a JSON column, and potentially worse off when querying it as a predicate, by sidestepping one of the key things a relational database is meant to do – join tables. 🙂

Regarding the purpose of a JSON column, it’s better used for data points that are variable in nature and structure, that generally need to be returned as part of the SELECT list (i.e. are commonly returned as part of your queries’ data lookups) as opposed to as being part of the JOIN, WHERE, or HAVING clause predicates.

Leave a Reply

Your email address will not be published. Required fields are marked *