How can I increase performance for a select-in-select statement?

Question :

The items table is like this:

id    category_id    name    registered

I have to show the total number of items from the beginning. Suppose the range is 2013-05-01 to 2013-05-05: I have to count all the items registered before 2013-05-01 and add that to the count of 2013-05-01's items. The result has to be grouped by registered and category_id, and I have to do it in one query. This is my query:

SELECT registered, me.category_id,
    COUNT(*) + (SELECT COUNT(*)
                FROM items
                WHERE me.category_id = category_id
                  AND registered < '2013-05-01 00:00:00') AS Value
FROM items me
WHERE category_id IS NOT NULL
  AND me.registered BETWEEN '2013-05-01 00:00:00' AND '2013-05-02 00:00:00'
GROUP BY registered, category_id

It takes almost 28 seconds to generate the result. How can I improve the performance?

Update
I am using PostgreSQL. The registered and category_id columns are indexed. There are 568,814 rows. I would also be very grateful if anyone could tell me whether my SQL is correct.
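For reference, "indexed" presumably means single-column indexes along these lines; the names are assumptions, and the composite index is a suggestion rather than something stated in the post:

CREATE INDEX idx_items_registered  ON items (registered);
CREATE INDEX idx_items_category_id ON items (category_id);

-- Suggestion, not from the original post: a composite index covering both
-- filter columns may serve this particular query better.
CREATE INDEX idx_items_category_registered ON items (category_id, registered);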

Update
Sorry for the late reply.

CREATE TABLE items
(
  id uuid NOT NULL DEFAULT uuid_generate_v4(),
  registered timestamp without time zone NOT NULL,
  modified timestamp without time zone NOT NULL,
  published timestamp without time zone,
  owner_id uuid,
  status_id integer NOT NULL,
  category_id uuid NOT NULL,
  location_id uuid,
  title text,
  description text,
  quantity integer NOT NULL DEFAULT 1,
  unit text,
  price_low bigint,
  price bigint,
  currency text NOT NULL DEFAULT 'BDT'::text,
  ip text,
  attributes_json text,
  from_mobile boolean NOT NULL DEFAULT false,
  CONSTRAINT items_pkey PRIMARY KEY (id),
  CONSTRAINT items_fk_category_id FOREIGN KEY (category_id)
      REFERENCES categories (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT items_fk_location_id FOREIGN KEY (location_id)
      REFERENCES locations (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT items_fk_owner_id FOREIGN KEY (owner_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT items_fk_status_id FOREIGN KEY (status_id)
      REFERENCES item_statuses (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE
)

Answer :

  1. What kind of column is [registered]: datetime or varchar?

  2. Why are you grouping by the full value of the [registered] column?
    For example, if you have the following values for this column:

        2013-05-01 00:00:00
        2013-05-01 00:00:01
        2013-05-01 00:00:02
        2013-05-01 00:00:03
        2013-05-01 00:00:04
    

    … and you group by them, you will have 5 rows in the output.

    Grouping at the minute or hour level will give you fewer rows and therefore a faster query; see the sketch after this list.
    If you have 568,814 rows, how many rows does your query return? I suspect the result set is very close to 568,814 rows 🙂

  3. In its current form, the subquery has to be run once for every row generated by the primary query. Try to avoid that.
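As a minimal sketch of point 2 (the table definition confirms registered is a timestamp), grouping at the hour level with PostgreSQL's date_trunc could look like this:

SELECT
  date_trunc('hour', registered) AS registered_hour, -- 00:00:01, 00:00:02, ... fall into one bucket
  category_id,
  COUNT(*)
FROM items
WHERE registered BETWEEN '2013-05-01 00:00:00'
                     AND '2013-05-02 00:00:00'
GROUP BY date_trunc('hour', registered), category_id;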

Here is my version of your query. Given the same input, the output will be the same:

SELECT
  t1.registered,
  t1.category_id,
  (COUNT(*) + COALESCE(q2.c2, 0)) AS Value -- c2 is NULL for categories with no rows before the range
FROM
  items t1
  LEFT JOIN
  (
    -- pre-aggregate the "before 2013-05-01" counts once per category,
    -- instead of running a correlated subquery for every output row
    SELECT
      t2.category_id,
      COUNT(*) AS c2
    FROM
      items t2
    WHERE
      t2.registered < '2013-05-01 00:00:00'
    GROUP BY
      t2.category_id
  ) q2
  ON  t1.category_id = q2.category_id
WHERE
      t1.registered BETWEEN '2013-05-01 00:00:00'
                        AND '2013-05-02 00:00:00'
GROUP BY
  t1.registered,
  t1.category_id,
  q2.c2 ;

Output

REGISTERED              CATEGORY_ID     VALUE
---------------------------------------------
May, 01 2013 00:00:00+0000  1           5
May, 01 2013 01:00:00+0000  1           5
May, 01 2013 02:00:00+0000  2           6
May, 01 2013 07:00:00+0000  3           3
May, 01 2013 08:00:00+0000  5           3
May, 01 2013 09:00:00+0000  1           5

Query Plan

HashAggregate  (cost=35.27..35.32 rows=4 width=20)
  ->  Hash Right Join  (cost=33.63..35.23 rows=4 width=20)
        Hash Cond: (t2.category_id = t1.category_id)
        ->  HashAggregate  (cost=20.82..21.49 rows=67 width=4)
              ->  Bitmap Heap Scan on items t2  (cost=6.27..19.52 rows=260 width=4)
                    Recheck Cond: (registered < '2013-05-01 00:00:00'::timestamp without time zone)
                    ->  Bitmap Index Scan on idx_reg  (cost=0.00..6.20 rows=260 width=0)
                          Index Cond: (registered < '2013-05-01 00:00:00'::timestamp without time zone)
        ->  Hash  (cost=12.76..12.76 rows=4 width=12)
              ->  Bitmap Heap Scan on items t1  (cost=4.29..12.76 rows=4 width=12)
                    Recheck Cond: ((registered >= '2013-05-01 00:00:00'::timestamp without time zone) AND (registered <= '2013-05-02 00:00:00'::timestamp without time zone))
                    ->  Bitmap Index Scan on idx_reg  (cost=0.00..4.29 rows=4 width=0)
                          Index Cond: ((registered >= '2013-05-01 00:00:00'::timestamp without time zone) AND (registered <= '2013-05-02 00:00:00'::timestamp without time zone))
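The plan above references an index named idx_reg on registered. The fiddle presumably created it along these lines (the name comes from the plan; the exact definition is an assumption):

CREATE INDEX idx_reg ON items (registered);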

For anyone interested, here is the SQL Fiddle page with sample data: SQL-Fiddle-2

SQL Fiddle

with items as (
    -- NB: this CTE shadows the items table for the rest of the statement;
    -- inside this definition, "items" still refers to the real table
    select registered, category_id
    from items
    where
        category_id is not null
        and
        registered < '2013-05-02'
), items_restricted as (
    -- only the rows inside the requested range
    select registered, category_id
    from items
    where registered >= '2013-05-01'
), category_id as (
    -- the distinct categories that actually occur inside the range
    select distinct category_id
    from items_restricted
)
select distinct ir.registered, ir.category_id, "value"
from
    items_restricted ir
    inner join
    (
        -- one count per category over everything before 2013-05-02,
        -- computed once rather than per output row
        select
            category_id, count(*) as value
        from items
        where exists (
            select category_id
            from category_id
            where category_id = items.category_id
        )
        group by category_id
    ) s using(category_id)
order by registered, category_id
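For completeness, on PostgreSQL 8.4 or later a window function can express a running count without any self-join. This sketch is not part of either answer above, and note the difference in semantics: it is a true running total, so when a category has several timestamps inside the range, later rows also include the earlier in-range rows, whereas the queries above count only the rows at each exact timestamp plus everything before 2013-05-01.

select registered, category_id, value
from (
    select
        registered,
        category_id,
        -- running total per category: the default window frame includes
        -- every row up to and including the current timestamp
        sum(count(*)) over (partition by category_id order by registered) as value
    from items
    where category_id is not null
      and registered < '2013-05-02'
    group by registered, category_id
) t
where registered >= '2013-05-01'
order by registered, category_id;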
