Optimize schema and query for overlapping date ranges query

Posted on

Question :

We have a database where we track the price of products from different sellers. If I strip everythin else it looks like this:

DB diagram

I need to make a report where given a date range I have to return a row for every day and the prices for every seller. There will be days that some seller don’t have a price for a product and there will be days that a seller will have more than one price for a given product.

I’m doing it by creating a temp table with the date ranges then join them with the sellers and prices for this but once product_price have a couple thousands rows it get VERY slow (about 4” to return a 200 rows) and I don’t know how to optimize it to get it fast enought.

I can’t think a workaround for the temp table nor which index should I create.

schema and query here: DB-fiddle

Answer :

You have only primary keys for tables and indexes related to the FK restrictions but no indexes related to the queries you run.

SELECT qtm.date_i
     , s.name
     , pp.* 
  FROM query_time_window   AS qtm
  JOIN seller              AS s   -- some ON condition is strictly recommended
  LEFT JOIN product_price  AS pp  ON pp.seller_id = s.id -- most conditions are moved to WHERE
 WHERE ( pp.product_id = 1 OR pp.product_id IS NULL )
   AND ( pp.created_at <= qtm.date_f   -- I've changed '!a>b' to 'a<=b'
         AND ( pp.cancelled_at IS NULL 
               OR pp.cancelled_at >= qtm.date_i) -- I've changed '!a<b' to 'a>=b'
 ORDER BY qtm.date_i, s.id;     

Table query_time_window should have an index (date_i, date_f) or (date_f, date_i)
Table product_price should have an additional index (product_id, seller_id, created_at, cancelled_at) – an exact order of the columns depend on the real data. You can try different combinations and choose the fastest one.

Leave a Reply

Your email address will not be published. Required fields are marked *