Question :
I’m working on a store for a mobile app, the database is mysql.
I have the store items in a table called items
and that table has a column called price_point_id
(one item has 1 price point) which references another table called price_points
.
Now there is one last table called templates
which has a one-to-many relationship with the items
table (one template has many items).
I could create a middle table with 2 columns item_id
and template_id
and join them, but wouldn’t it be much faster to avoid a 4 tables join (I need data from items, price points and the middle table, but I only have the template ID to start) like in this example:
But what if I kept the item IDs in the templates table, that would avoid a join with big table (not huge but hundreads of thousands), In this case I could query the template first then use an IN statement to get the items + price points, sure it’s still a join but in much smaller tables and I’d be querying by primary keys only. Like in this example:
Do you guys see any draw backs with the second approach?
Answer :
As nbk mentions, the middle table is known as a bridge table or linking table. JSON is best for structureless or variable structure situations, but not so good when needed to be used in predicates within the context of a relational system.
By stuffing a bunch of values in a single JSON field and then using the IN
clause, you’re telling MySQL to treat the values of that JSON field as a predicate which is very similar to as if you wrote a bunch of OR
clauses with each value in it. Having a large number of predicates in this form is generally sub-performant.
Furthermore, you’re still dealing with roughly the same amount of data regardless if you store it in a JSON column or in a bridge table, because your templates
table will still be the same number of rows times the same number of item_ids
in the one-to-many relationship they naturally share. (I know there’s some data redundancy in practice, but the optimizations of the database system under the hood with normalized data probably make that a moot point from a performance perspective.)
Joining multiple tables together is not a bad thing for performance, especially only a few tables as you mentioned and especially with small tables of only a few hundred thousand rows at most. It’s more so an issue if you over-normalize the data and always need to rejoin all the tables back together in every query. You probably wouldn’t see any issues even in that case until your data grows significantly more and/or you’re joining around 10+ tables together every query (rough estimates).
So at the end of the day you’d be at best ending up with the same performance on average by leveraging a JSON column, and potentially worse off when querying it as a predicate, by sidestepping one of the key things a relational database is meant to do – join tables. 🙂
Regarding the purpose of a JSON column, it’s better used for data points that are variable in nature and structure, that generally need to be returned as part of the SELECT
list (i.e. are commonly returned as part of your queries’ data lookups) as opposed to as being part of the JOIN
, WHERE
, or HAVING
clause predicates.