Storing data in PostgreSQL: One table or two?

Posted on

Question :

I’ve just started using PostgreSQL 9.2 and my data consists of product prices at various points in time, usually a different price every month.

Question: Because every product can have different prices over time, I am thinking of storing all the data in a single table, each row having full details of the product, its price at that time, and the time.

id  | name           | color  | manufacturer | date      | price
123   Socks 123        black    nikki          12-12-2012   123.70
124   Socks 123        black    nikki          12-01-2013   145.67
125   Graphic Tee xXx  red      departed       13-01-2013   250.80

My reason for storing everything in one table and have duplicate data in certain columns like name, color, manufacturer is to facilitate doing queries like fetching all products’ prices in a particular month (can return 10k rows) will avoid costly table joins if I have a table with the product details and a foreign key in another table with just the time-cost data.

Is this a good reason for doing what I’ve described? Will using hstore make it easier to do such queries? Or did I get it all wrong?

Thanks!!

Answer :

If you mix the history data in with the current like that in order to speed up queries over a time period, you do so at the expense of slowing down queries for current data. You can add an extra column to explicitly mark the relevant rows as the current prices (and have id+flag as the PK) but that adds extra work to your business logic to both keep it maintained and respect it in all reports.

Of course if you move the price completely out of the main table you have a similar problem in finding the latest price becoming more expensive unless you have a “latest” flag or do-normalise slightly and keep a copy of the current price in the main table as well as the price history table. Personally I would do the latter, and use a trigger on that product table to automatically update the price history table when a new product is added or the price updated (I’m assuming the products tables does no see massive write activity most of the time so the performance impact of using a trigger here should be minimal) in order to remove that auditing task from your other logic to avoid bugs caused by new code turning up that forgets to update the history/audit.

Caveat: I’m answering this from a general PoV as I’ve not used postgres much in anger, so do some benchmarks before taking anything I’ve said regarding performance as fact in that system!

Table joins are not costly, compared with the alternative, which is incorrect data due to storing the same data in multiple places.

Split this into two tables, and if you need to find the most recent price for a set of products you can find the latest price efficiently with something like:

select
  items.item_id ,
  items.name    ,
  (select   price
   from     prices
   where    prices.item_id = items.item_id and
            prices.date    <= date_to_lookup_prices_for
   order by prices.date desc
   limit    1) price
 from
   items

Leave a Reply

Your email address will not be published. Required fields are marked *