Should I normalize a relational database design if I know the data will never be updated?

Posted on

Question :

I’m designing a PostgreSQL database that will be storing an archive of data that I am absolutely certain will never be updated, only read (these are archives will be pointing to real data at a point in the past, and since the past can’t be changed, these rows won’t either).

There will be a lot of data (up to terabytes), so I am trying to optimize the design as much as I can.

Should I worry about normalizing the data if I know I will never have to update it? Consider I could simplify the design substantially by keeping data that would be stored in two tables in a normalized version in one single table.


Answer :

It is not clear what the source of your data is, but your comment “these are archives pointing to real data at a point in the past, and since the past can’t be changed, these rows won’t either!” implies that your “new” database is a copy of subset of the “current” data.

Assuming that your are coping/moving a subset of old production data to a designated archive location, I would suggest not making any changes in the database design.

I would expect users have existing queries that they use to extract required production data. As an end user I would expect my existing queries to work on the archive data. If you make any design changes you put my existing solutions at risk.

TL:DR; Don’t make changes to archive copies of data, existing solutions for retrieving specific data, are dependent on existing structure.


Per your most recent comment This data exists but not in database format. Like say in giant CSV files in a hard disk, that can’t be queried You have a large flat file. In this case you designing a database structure makes sense. As Jonathan says in his comment You may get space savings by normalizing the data

You also need to consider the cost of your time; Is the potential space savings worth the cost of your time to achieve it? Are you moving the data so it can be queried, will the amount of time you spend result in notable resource saving for the end users? There are several business case questions that you would need to decide internally.

Now if this is a spare time project with no deadlines, it could be a great learning experience, go for it.

If you do decide get excentric with the design, I strongly suggest that you don’t make design changes after the import AND you keep your import code. Business user have a habit of finding just one more data set (disk) that needs go with the other that they promised was the only one that would ever exist.

Leave a Reply

Your email address will not be published. Required fields are marked *