Question :
I have a comma separated list of values (as one single string), which contains duplicates, and I want to remove them. I tried with REGEXP_REPLACE
but for certain values it gives me a wrong result:
/* CORRECT */
select '308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' as str,
regexp_replace('308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001',
'([^,]+)(,*1)+',
'1') as str_uq
from dual
union all
/* FAULTY */
select '106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' as str,
regexp_replace('106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701',
'([^,]+)(,*1)+',
'1') as str_uq
from dual
Running the above example, will provide the expected result for the first query, but for the second one, it just concatenates all the values without removing duplicates. I guess the fact that the values are starting with 1
has something to do with it.
Can you please help me out with the regex pattern? I cannot find the mistake…
Thank you!
Answer :
In your 2nd example the values are unique, but you’re searching for any repeated match and there’s multiple matches:
‘106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701′
It’s not concatenating, it’s removing some characters 🙂
Usually adding a word boundary b
should work, but not in Oracle.
I found this is working '(.+?,)1+'
, but you need to add a final comma to the string (fiddle)
/* CORRECT */
select '308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' as str,
regexp_replace('308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' || ',',
--'(b[^,]+)(,*1)+', --should work, doesn't in Oracle
'(.+?,)1+',
'1') as str_uq
from dual
union all
/* FAULTY */
select '106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' as str,
regexp_replace('106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' || ',',
--'(b[^,]+)(,*1)+', --should work, doesn't in Oracle
'(.+?,)1+',
'1') as str_uq
from dual
Indeed, as dnoeth said, it seems that I cannot achieve the expected behavior using only REGEXP, so I built a function that will do this. I’ll post it here, maybe it is useful to somebody else too.
create or replace function str_remove_duplicates (str in varchar2, separator in varchar2)
return varchar2
is
regex_pattern varchar2(5);
result varchar2(255);
begin
regex_pattern := '[^' || nvl(separator, ',') || ']+'; -- '[^,]+'
select listagg(acc, separator) within group (order by acc) into result
from (
select distinct acc from (
select regexp_substr(str, regex_pattern, 1, level) as acc from dual
connect by regexp_substr(str, regex_pattern, 1, level) is not null
)
);
return result;
end str_remove_duplicates;
All the best!
Daniel