Question :
One column in my table stores list of values separated by comma (I know it’s a poor design, but refactoring is not an option at the current moment). I seem to come up with a working solution but I’m afraid it’s not very efficient :
select distinct test_id , regexp_substr(str_data,'[^,]+', 1, level)
from
(
select 1 as test_id, '1,2,3' as str_data from dual
union all
select 2 as test_id, '4,5,6' as str_data from dual
)test_data
connect
by
regexp_substr(str_data, '[^,]+', 1, level) is not null
I don’t like that I have to use distinct
. Is there a better approach ?
Thank you.
Answer :
Your CONNECT BY
joins every row to every other (including itself) N times. This is a multi-auto-cross join. I can see how it could behave poorly performance-wise when the number of records increases.
I suggest a query like this instead:
SQL> SELECT test_id, regexp_substr(str_data, '[^,]+', 1, lvl)
2 FROM (SELECT 1 AS test_id, '1,2,3' AS str_data
3 FROM dual
4 UNION ALL
5 SELECT 2 AS test_id, '4,5,6' AS str_data FROM dual) test_data
6 JOIN (SELECT LEVEL lvl FROM dual CONNECT BY LEVEL <= 1000)
7 ON lvl <= length(str_data) - length(REPLACE(str_data, ',')) + 1;
TEST_ID REGEXP_SUBSTR(STR_DATA,'[^,]+',1,LVL)
------- -------------------------------------
1 1
2 4
1 2
2 5
1 3
2 6
Replace 1000 by the maximum number of elements (or calculate it with an inner COUNT
subquery). Each row will only be queried n times (where n is the number of commas).
Also if performance is critical, use INSTR
and SUBSTR
instead of REGEX
.