Help understanding an explain plan in Oracle

Question:

I am running a query on some big tables, and although it runs fine even though there is a lot of data, I’d like to understand which part of it weighs on the execution. Unfortunately I am not too good with explain plans, so I’m calling for help.

Here is some data about these tables:

  • history_state_table 7.424.65 rows (of which only 13.412 remain after the filter t1.alarm_type = 'AT1')
  • costumer_price_history 448.284.169 rows
  • cycle_table 215 rows

This would be the query (don’t mind the logic, it’s just for reference):

SELECT t1.id_alarm, t2.load_id, t2.reference_date
  FROM history_state_table t1,
       (SELECT   op_code, contract_num,
                 COUNT (DISTINCT id_ponto) AS num_pontos,
                 COUNT
                    (DISTINCT CASE
                        WHEN vlr > 0
                           THEN id_ponto
                        ELSE NULL
                     END
                    ) AS bigger_than_zero,
                 MAX (load_id) AS load_id,
                 MAX (reference_date) AS reference_date
            FROM costumer_price_history
           WHERE load_id IN
                            (42232, 42234, 42236, 42238, 42240, 42242, 42244) /* arbitrary IDs depending on execution*/
             AND sistema = 'F1'          /* Hardcoded filters */
             AND rec_type = 'F3'         /* Hardcoded filters */
             AND description = 'F3'      /* Hardcoded filters */
             AND extract_type IN
                    ('T1', 'T2', 'T3')
        GROUP BY op_code, contract_num) t2
 WHERE t1.op_code = t2.op_code
   AND t1.contract_num = t2.contract_num
   AND t1.alarm_type = 'AT1'
   AND t1.alarm_status = 'DONE'
   AND (   (    t1.prod_type = 'COMBO'
            AND t2.bigger_than_zero = t2.num_pontos - 1
           )
        OR (    t1.prod_type != 'COMBO'
            AND t2.bigger_than_zero = t2.num_pontos
           )
       )
       /* arbitrary filter depending on execution*/
   AND t1.data_tratado BETWEEN (SELECT data_inicio
                                  FROM cycle_table
                                 WHERE id_ciclo = 160)
                           AND (SELECT data_fim
                                  FROM cycle_table
                                 WHERE id_ciclo = 160)

And finally the explain plan:

Plan
SELECT STATEMENT  ALL_ROWS  Cost: 5,485
    13 NESTED LOOPS                         
        7 NESTED LOOPS  Cost: 5,483  Bytes: 115  Cardinality: 1                     
            5 VIEW  Cost: 12  Bytes: 59  Cardinality: 1                 
                4 SORT GROUP BY  Cost: 12  Bytes: 85  Cardinality: 1            
                    3 INLIST ITERATOR       
                        2 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.COSTUMER_PRICE_HISTORY Cost: 11  Bytes: 85  Cardinality: 1    
                            1 INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_COSTUMER_PRICE_HISTORY_2 Cost: 10  Cardinality: 3  
            6 INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM Cost: 662  Cardinality: 102,068               
        12 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.HISTORY_STATE_TABLE Cost: 5,471  Bytes: 56  Cardinality: 1                   
            9 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.CYCLE_TABLE Cost: 1  Bytes: 12  Cardinality: 1                
                8 INDEX UNIQUE SCAN INDEX (UNIQUE) RAIDPIDAT.PK_CYCLE_TABLE Cost: 0  Cardinality: 1             
            11 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.CYCLE_TABLE Cost: 1  Bytes: 12  Cardinality: 1               
                10 INDEX UNIQUE SCAN INDEX (UNIQUE) RAIDPIDAT.PK_CYCLE_TABLE Cost: 0  Cardinality: 1    

Mind that I am not asking “how to rewrite it more efficiently”, but rather how to find, with the explain plan, which is the most costly operation there. Meanwhile I am reading up on it, but I’d appreciate some help.

Answer:

Explain plan does not tell you what is actually the most costly “operation”. The “Cost” column is a guess – a value estimated by the optimizer. So are the “Cardinality” and “Bytes” columns. http://docs.oracle.com/cd/B28359_01/server.111/b28274/ex_plan.htm#i18300
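
For reference, a plan like the one above is typically produced with EXPLAIN PLAN and formatted with DBMS_XPLAN (both standard Oracle features); a minimal sketch, with a placeholder query:

EXPLAIN PLAN FOR
SELECT t1.id_alarm
  FROM history_state_table t1
 WHERE t1.alarm_type = 'AT1';   -- substitute your full query here

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

The Cost, Bytes and Cardinality columns in that output are exactly these optimizer estimates.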

In your example, the optimizer is telling you: I decided to use this plan because I guess that the looping will cost about 5,483. I expect this to be the most costly part of the execution, but I can’t guarantee it.

The same applies recursively to all the depths of the tree.

If you go down to the lowest levels (which are, intuitively, the most-looped, most-executed levels), you see that the operation that especially sticks out, both in terms of expected cost and expected number of rows, is the

6 INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM Cost: 662  Cardinality: 102,068 

So, the optimizer guessed that the optimal execution of this query is to loop a lot around the poor workhorse RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM. I really cannot see which part of your query directly relates to it, but I suspect the t1.data_tratado condition. And, again, I cannot tell whether it is really the most costly part.
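
As an aside (the explain plan itself does not show this), you can check which columns that index covers, and therefore which of your predicates can drive the range scan, by querying the data dictionary; ALL_IND_COLUMNS is a standard Oracle view:

SELECT column_name, column_position
  FROM all_ind_columns
 WHERE index_owner = 'RAIDPIDAT'
   AND index_name  = 'IDX_HISTORY_STATE_TABLE_1TPALM'
 ORDER BY column_position;

If data_tratado (or alarm_type) shows up there, that tells you which part of the query this step relates to.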

I’ll try to translate the loop syntax of the explain plan into procedural pseudo-code:

/* begin step 13 (by "step 13" I mean the line that reads "   13 NESTED LOOPS") */
  /* begin step 7 */
    do step 5
    myresult = rows from step 5
    for each row in myresult {
       do step 6
       for each matching row from step 6 {
           join that row from step 6 to the current row of myresult
       }
    }
  /* end step 7 */
  for each row in myresult {
     do step 12
     for each matching row from step 12 {
         join that row from step 12 to the current row of myresult
     }
  }
/* end step 13 */
return myresult

Seems complicated, but really the aim of each “nested loop” is to create a join (a single table made of two tables) in the most naive way: a loop inside a loop.

The explain plan is just a prediction of the join methods that will be used when executing the query. This can make it hard to infer which step is the most time-consuming, as a different plan may be followed when the statement is actually executed.

To get actual stats about how long each step takes you’ll need to run a SQL trace of the statement and review the trace file – manually or using a tool such as tkprof. This will show you how many rows each step processed and how long it took.
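
A minimal sketch of such a trace, assuming you can trace your own session from SQL*Plus (DBMS_MONITOR is the supported interface; the trace file name and location are environment-specific):

-- Make the timings in the trace meaningful:
ALTER SESSION SET timed_statistics = TRUE;

-- Start tracing the current session, including wait events:
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => FALSE);

-- ... run the statement here ...

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE;

-- Then, on the database server, format the raw trace file, e.g.:
-- tkprof <your_trace_file>.trc report.txt sys=no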

That said, looking at the Cardinality listed at the end of each line will give an indication of how many rows are to be processed. Steps processing more rows are likely to take longer to execute as there’s more work to do.

So in your example, step 6 (INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM Cost: 662 Cardinality: 102,068), which is expected to process 102,068 rows, is likely to be the most expensive, as the other steps predict one row each. This is only true if the cardinality estimates are accurate, however; you’ll need to verify that they match the actual rows returned. The easiest way to do this is via a SQL trace, as stated above.
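
A lighter-weight alternative to a full trace is to run the statement once with rowsource statistics collected and then display the cursor’s plan with actual row counts; the GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR are standard Oracle features. A sketch, shown on a small stand-in query so the snippet is self-contained (the hint goes into the real query in exactly the same way):

-- Run once with rowsource statistics collected:
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
  FROM cycle_table
 WHERE id_ciclo = 160;

-- In the same session, immediately afterwards, show the last executed plan
-- with actual row counts next to the optimizer's estimates:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

In the resulting output, compare the E-Rows (estimated) and A-Rows (actual) columns per step; a large mismatch on step 6 would mean the 102,068 estimate is off.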
