Question :
I’m in the process of developing a web application, and need some help with some basic db design. Currently, I am unsure of the following three tables:
### User ###
- Id
- DetailId
- etc... (About 6 more columns)
### Contact ###
- Id
- UserId
- DetailId
- etc... (About 6 more columns)
### Detail ###
- Id
- FirstName
- LastName
- etc... (About 10 more columns)
I am starting to wonder if it may be best to just eliminate the Details table and simply denormalize those columns back into the User and Contact tables. Being that these tables will be read much more than they will be edited or updated, this denormalization should improve read performance by avoiding a table join. Is this a sound judgement or am I just falling for the root of all evil (Premature Optimization). Either way, which would be considered a better practice?
Edit/Update
Thanks for the great feedback! I realize I should have been a bit more descriptive, so here is some additional information on the use case for these tables.
-
The Details table contains mainly display information, like name, address, phone number, email, but this is NOT account information. Instead it is more like print information, that will be used when the user prints quotes, invoices, etc…
-
Details table contains only columns that are duplicated in both a user and contact in terms of printing data. So if I denormalize those columns back to the User and Contact table, any edits in column definition (or adding/removing columns) will have to be made twice.
-
The
Users
table may only have 100 records, but theContacts
andDetails
tables could potentially hold hundred of thousands, because each user can have hundreds of contacts.
But to be clear, aDetail
“should” only belong to oneContact
orUser
. However, aUser
can have manyContacts
. So If there are 100 Users and each User has 100 Contacts, that’s 10,000 Contacts, and 10,100 Details.
Answer :
I would say that having FirstName
and LastName
in the User
table is still normalized. Having the data in different tables seems excessive, and in practice you’ll have a 1-1 relation between the two.
If you expect a lot of read traffic on the User table, add appropriate indexes to that table instead.
When I think how to design such case, I prefer two-table design. General data and additional/optional information. The general table will contain only not-null columns that are required on most read operations. The additional table would include columns, which are rarely used and serve as a ‘display only’ data.
In your case, I would move 'FirstName'
and 'LastName'
to the 'user'
table, but I wouldn’t eliminate the 'Details'
table. Wide tables do have an impact on performance and readability, so I recommend to avoid them.
Since no one is advising 3 tables I will advise 3 table
Join on pk to indexed fk is very fast
OP said would search details some times
It would avoid dual updates of the data (if in two tables)
If you’re talking 100s of users, and 100s of contacts and details each…
100 x 100 x 2 x 4K of data each = 80MB of data. Heck, I’ll throw you Unicode and make it 160MB. Any RDBMS will serve any read query on this amount of data out of RAM, even when running on a Raspberry PI.
You do not have to worry about performance. Keep the design clean, i.e. 3 tables.
Added after the update to the question:
I don’t think you need 3 tables in this design (i.e. Contact and Detail should be merged), but again, with this amount of data, it really doesn’t matter. Everything after the initial query is going to come out of cache anyway.