Using Oracle hierarchical queries for transposing field value into rows

Posted on

Question :

One column in my table stores list of values separated by comma (I know it’s a poor design, but refactoring is not an option at the current moment). I seem to come up with a working solution but I’m afraid it’s not very efficient :

select distinct test_id , regexp_substr(str_data,'[^,]+', 1, level)  
from 
(
select 1 as test_id, '1,2,3' as str_data from dual 
union all 
select 2 as test_id, '4,5,6' as str_data from dual
)test_data 
connect 
by 
regexp_substr(str_data, '[^,]+', 1, level) is not null

I don’t like that I have to use distinct . Is there a better approach ?

Thank you.

Answer :

Your CONNECT BY joins every row to every other (including itself) N times. This is a multi-auto-cross join. I can see how it could behave poorly performance-wise when the number of records increases.

I suggest a query like this instead:

SQL> SELECT test_id, regexp_substr(str_data, '[^,]+', 1, lvl)
  2    FROM (SELECT 1 AS test_id, '1,2,3' AS str_data
  3            FROM dual
  4          UNION ALL
  5          SELECT 2 AS test_id, '4,5,6' AS str_data FROM dual) test_data
  6    JOIN (SELECT LEVEL lvl FROM dual CONNECT BY LEVEL <= 1000)
  7      ON lvl <= length(str_data) - length(REPLACE(str_data, ',')) + 1;

TEST_ID REGEXP_SUBSTR(STR_DATA,'[^,]+',1,LVL)
------- -------------------------------------
      1                                     1
      2                                     4
      1                                     2
      2                                     5
      1                                     3
      2                                     6

Replace 1000 by the maximum number of elements (or calculate it with an inner COUNT subquery). Each row will only be queried n times (where n is the number of commas).

Also if performance is critical, use INSTR and SUBSTR instead of REGEX.

Leave a Reply

Your email address will not be published. Required fields are marked *