Question :
We have a query that runs on a view and its execution time is about 6 min.
SELECT COUNT(*)
FROM someschema.item_total_violations
WHERE channel_id = 122507833
AND item_date >= TIMESTAMP'2015-02-01 00:00:00';
Its not a materialized view so there is no index on the item_date
.
The base query of the view is below:
SELECT i.id
,i.title
,i.body
,i.description
,i.item_date
,i.created_date
,i.channel_id
,i.community_id
,i.person_id
,i.message_type
,ch.client_id
,upper(p.NAME) AS person_name_upper
,upper(ch.NAME) AS channel_name_upper
,p.NAME AS person_name
,ch.NAME AS channel_name
,ch.connector_type AS channel_type
,cm.NAME AS community_name
,cm.network_id
,nvl(ct2.cnt, 0) + nvl(ct4.cnt, 0) AS total_violations_count
,nvl(ct1.cnt, 0) + nvl(ct3.cnt, 0) AS quarantined_violations_count
,nvl(ct5.cnt, 0) AS quarantined_attributes_count
,nvl(lh_info.cnt_lh, 0) AS legal_holds_count
,nvl(lh_info.cnt_lh_w_no_date, 0) AS legal_holds_count_w_no_date
,nvl(lh_info.max_hold_until, to_date('1970-01-01', 'yyyy-mm-dd')) AS legal_hold_max_hold_until
FROM item i
,channel ch
,person p
,community cm
,(
SELECT /*+ NO_MERGE */ pv.item_id
,count(*) AS cnt
FROM policy_violation pv
WHERE pv.item_id IS NOT NULL
AND pv.quarantine_status = 'QUARANTINED'
GROUP BY pv.item_id
) ct1
,(
SELECT /*+ NO_MERGE */ pv.item_id
,count(*) AS cnt
FROM policy_violation pv
WHERE pv.item_id IS NOT NULL
GROUP BY pv.item_id
) ct2
,(
SELECT /*+ NO_MERGE */ aa.item_id AS item_id
,COUNT(*) AS cnt
FROM item_attachment aa
,policy_violation pv
WHERE aa.id = pv.item_attachment_id
AND pv.quarantine_status = 'QUARANTINED'
GROUP BY aa.item_id
) ct3
,(
SELECT /*+ NO_MERGE */ aa.item_id AS item_id
,COUNT(*) AS cnt
FROM item_attachment aa
,policy_violation pv
WHERE aa.id = pv.item_attachment_id
AND pv.item_id IS NULL
GROUP BY aa.item_id
) ct4
,(
SELECT /*+ NO_MERGE */ pv.item_id
,count(DISTINCT (pv.item_attribute_id)) AS cnt
FROM policy_violation pv
WHERE pv.item_id IS NOT NULL
AND pv.item_attribute_id IS NOT NULL
AND pv.quarantine_status = 'QUARANTINED'
GROUP BY pv.item_id
) ct5
,(
SELECT lh_rel.item_id AS item_id
,count(DISTINCT (lh.id)) AS cnt_lh
,max(lh.HOLD_UNTIL) AS max_hold_until
,sum(CASE
WHEN lh.hold_until IS NULL
THEN 1
ELSE 0
END) AS cnt_lh_w_no_date
FROM LEGAL_HOLD_RELATION lh_rel
,LEGAL_HOLD lh
WHERE lh.id = lh_rel.legal_hold_id
AND lh.STATUS = 'A'
GROUP BY lh_rel.item_id
) lh_info
WHERE ch.STATUS = 'A'
AND i.channel_id = ch.id
AND i.person_id = p.id
AND i.community_id = cm.id(+)
AND i.id = ct1.item_id(+)
AND i.id = ct2.item_id(+)
AND i.id = ct3.item_id(+)
AND i.id = ct4.item_id(+)
AND i.id = ct5.item_id(+)
AND i.id = lh_info.item_id(+)
The item
table is the biggest and we have tried partitioning the table in chaannel_id
but that didn’t help our query. I understand that there is no index on item_date
in original query and it has effect on performance. Apart from trying out materialized view, is there something that I can improve on the base query of the view?
Answer :
Why do you use this view at all for this query?
The view collects violations and other stuff, but your query does not care about those at all, just the number of items. The view lists all items regardless of these because of the outer joins, so you basically perform a lot of unnecessary extra work to collect violations and other stuff (the NO_MERGE hint makes this even worse), then you throw it away and count the items based on a totally different criteria.
The query should be something like this:
select
count(*)
from
item i
join channel ch on (i.channel_id = ch.id)
join person p on (i.person_id = p.id)
where
ch.status = 'A'
and i.channel_id = 122507833
and i.item_date >= timestamp'2015-02-01 00:00:00';
Additionally, you don’t need materialized views for indexing a column. Just create the index on the column(s) of the base table(s).
On top of the must-have indexes on primary and foreign keys, another possible candidate is item_date column depending on your data distribution.
You can improve efficiency of your query by (for example) replacing:
,(
SELECT /*+ NO_MERGE */ pv.item_id
,count(*) AS cnt
FROM policy_violation pv
WHERE pv.item_id IS NOT NULL
AND pv.quarantine_status = 'QUARANTINED'
GROUP BY pv.item_id
) ct1
,(
SELECT /*+ NO_MERGE */ pv.item_id
,count(*) AS cnt
FROM policy_violation pv
WHERE pv.item_id IS NOT NULL
GROUP BY pv.item_id
) ct2
,
with:
,(
SELECT /*+ NO_MERGE */ pv.item_id
,count(*) AS cnt
,count(case when pv.quarantine_status = 'QUARANTINED' then 1 end) as cnt_quarantined
FROM policy_violation pv
WHERE pv.item_id IS NOT NULL
GROUP BY pv.item_id
) ct2
,
This rewrite can be applied for other sub-selects in your query as well. That should give us:
[...]
,nvl(ct2.cnt_quarantined, 0) + nvl(ct4.cnt_quarantined, 0) AS quarantined_violations_count
[...]
,(
SELECT /*+ NO_MERGE */ pv.item_id
,count(*) AS cnt
,count(case when pv.quarantine_status = 'QUARANTINED' then 1 end) as cnt_quarantined
FROM policy_violation pv
WHERE pv.item_id IS NOT NULL
GROUP BY pv.item_id
) ct2
,(
SELECT /*+ NO_MERGE */ aa.item_id AS item_id
,COUNT(*) AS cnt
,count(case when pv.quarantine_status = 'QUARANTINED' then 1 end) as cnt_quarantined
FROM item_attachment aa
,policy_violation pv
WHERE aa.id = pv.item_attachment_id
AND pv.item_id IS NULL
GROUP BY aa.item_id
) ct4
You might be able to further merge these two rewrite with the use of a left join, but it is difficult to say without knowing more details of you model. Anyhow, this rewrite should reduce the amount of work that your query has to do.
As an advise I would suggest that you for such large queries use ansi joins since it makes it easier to read and understand the query