Question :
I’ve just started using PostgreSQL 9.2 and my data consists of product prices at various points in time, usually a different price every month.
Question: Because every product can have different prices over time, I am thinking of storing all the data in a single table, each row having full details of the product, its price at that time, and the time.
id | name | color | manufacturer | date | price
123 Socks 123 black nikki 12-12-2012 123.70
124 Socks 123 black nikki 12-01-2013 145.67
125 Graphic Tee xXx red departed 13-01-2013 250.80
My reason for storing everything in one table and have duplicate data in certain columns like name, color, manufacturer
is to facilitate doing queries like fetching all products’ prices in a particular month (can return 10k rows) will avoid costly table joins if I have a table with the product details and a foreign key in another table with just the time-cost data.
Is this a good reason for doing what I’ve described? Will using hstore
make it easier to do such queries? Or did I get it all wrong?
Thanks!!
Answer :
If you mix the history data in with the current like that in order to speed up queries over a time period, you do so at the expense of slowing down queries for current data. You can add an extra column to explicitly mark the relevant rows as the current prices (and have id+flag as the PK) but that adds extra work to your business logic to both keep it maintained and respect it in all reports.
Of course if you move the price completely out of the main table you have a similar problem in finding the latest price becoming more expensive unless you have a “latest” flag or do-normalise slightly and keep a copy of the current price in the main table as well as the price history table. Personally I would do the latter, and use a trigger on that product table to automatically update the price history table when a new product is added or the price updated (I’m assuming the products tables does no see massive write activity most of the time so the performance impact of using a trigger here should be minimal) in order to remove that auditing task from your other logic to avoid bugs caused by new code turning up that forgets to update the history/audit.
Caveat: I’m answering this from a general PoV as I’ve not used postgres much in anger, so do some benchmarks before taking anything I’ve said regarding performance as fact in that system!
Table joins are not costly, compared with the alternative, which is incorrect data due to storing the same data in multiple places.
Split this into two tables, and if you need to find the most recent price for a set of products you can find the latest price efficiently with something like:
select
items.item_id ,
items.name ,
(select price
from prices
where prices.item_id = items.item_id and
prices.date <= date_to_lookup_prices_for
order by prices.date desc
limit 1) price
from
items