Remove a character from a records enclosed between double quotes

Posted on

Question :

I have a table in which every row have data like:

0150566115,"HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014

What I want is to remove every comma (,) that is enclosed between double quotes ” “. And then split the rest of the string with comma (,)

I don’t want to do it checking every single character setting flags for start and end of double quotes.

Can i implement some sort of regex?

Is there a simple way?

What I have tried so far is just to split the string based on the comma (,) but it is also splitting values inside the quotes.

This is serving the purpose: how can I implement this in complete table (currently it is working if I have only one instance of double quotes block)?

Declare @Query nvarchar(max) 

Set @Query = 'Item1,Item2,"Item,Demo,3",New'

Declare @start int, @len int
SELECT @start = PATINDEX('%"%"%', @Query) + 1

print @start

select @len = CHARINDEX('"', SUBSTRING(@Query, @start, LEN(@Query))) - 1

select 
        SUBSTRING(@Query, 1, @start - 2) +
        REPLACE((SUBSTRING(@Query, @start, @len)), ',', '') +
        SUBSTRING(@Query, @start + @len + 1, LEN(@Query))

This is the function I am using to split

ALTER FUNCTION [dbo].[fnSplit](
    @sInputList VARCHAR(8000) -- List of delimited items
  , @sDelimiter VARCHAR(8000) = ',' -- delimiter that separates items
) RETURNS @List TABLE (id int, item VARCHAR(8000))

BEGIN
DECLARE @sItem VARCHAR(8000)
Declare @ID as int
Set @ID=0
WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
 BEGIN
 SELECT
  @sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,CHARINDEX(@sDelimiter,@sInputList,0)-1))),
  @sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),LEN(@sInputList))))
 Set @ID=@ID+1
 IF LEN(@sItem) > 0
  INSERT INTO @List SELECT @ID,@sItem
 END

IF LEN(@sInputList) > 0
 INSERT INTO @List SELECT @ID,@sInputList -- Put the last item in
RETURN
END

The problem is that this is an application in MS Access, which narrows down my possibilities. All I can do is passing the name of the file which is going to get imported by SQL Server.

Answer :

Well, this is really awful, but okay, if you are going to refuse to even consider better alternatives… first, create a set-based split function that will track the order of the string (your looping function is really not optimal):

CREATE FUNCTION [dbo].[SplitStrings_Ordered]
(
    @List       VARCHAR(8000),
    @Delimiter  VARCHAR(255)
)
RETURNS TABLE
AS
    RETURN (SELECT [Index] = ROW_NUMBER() OVER (ORDER BY Number), Item 
    FROM (SELECT Number, Item = SUBSTRING(@List, Number, 
      CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
     FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
      FROM sys.all_columns) AS n(Number)
      WHERE Number <= CONVERT(INT, LEN(@List))
      AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
    ) AS y);

(If you have a numbers table in this database, use that instead of sys.all_columns, and add WITH SCHEMABINDING to the function definition.)

Now, let’s look at a few examples of strings with commas embedded inside double quotes, and removing those before splitting and re-concatenating:

DECLARE @x TABLE(n VARCHAR(8000));

INSERT @x VALUES
('0150566115,"HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014'),
('0150566115,HEALTH 401K,"IC,ON","ICON HEALTH 401K",,,1,"08/21/2014"'),
('"01505,66115,","HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014');

;WITH x AS
(
  SELECT x.n, s.[Index], s = REPLACE(s.Item, ',', 
    CASE s.[Index]%2 WHEN 0 THEN '' ELSE ',' END)
  FROM @x AS x 
  CROSS APPLY dbo.SplitStrings_Ordered(x.n, '"') AS s
)
SELECT x.n, fixed = (SELECT x2.s 
  FROM x AS x2 
  WHERE x2.n = x.n
  ORDER BY [Index]
  FOR XML PATH, TYPE).value(N'.[1]',N'varchar(max)')
FROM x
GROUP BY x.n;

Results in the fixed column for all three strings:

0150566115,HEALTH 401K,ICON,ICON HEALTH 401K,,,1,08/21/2014
0150566115,HEALTH 401K,ICON,ICON HEALTH 401K,,,1,08/21/2014
0150566115,HEALTH 401K,ICON,ICON HEALTH 401K,,,1,08/21/2014

Now, you can feed those results back into the split function, using the comma this time, depending on your ultimate goal. The question seemed to resolve only around being able to ignore the double quotes and any commas contained only inside double-quote pairs.

For more on splitting and concatenating strings:

For more on numbers tables and generating sets without loops:

RegEx is the only serious way to do this. Here’s a simple example using the RegEx functions that come installed for free if you have Master Data Services installed. Otherwise google around for SQL Server CLR RegEx, the expression used should be transferrable:

-- Find and replace commas in quotes
USE MDS
GO

DECLARE @t TABLE( yourText VARCHAR(MAX), newText VARCHAR(MAX) );

INSERT @t ( yourText ) 
VALUES
    ( '0150566115,"HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014' ),
    ( '0150566115,HEALTH 401K,"IC,ON","ICON HEALTH 401K",,,1,"08/21/2014"' ),
    ( '"01505,66115,","HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014' )

-- Remove commas within quotes
UPDATE @t
SET newText = mdq.RegexReplace( yourText, '("[^",]+),([^"]*")', '${1}${2}', 0 )

UPDATE @t
SET newText = REPLACE( mdq.RegexReplace( newText, '("[^",]+),([^"]*")', '${1}${2}', 0 ), '"', '' )

-- Show the results
SELECT *, DATALENGTH( yourText ) dl_before, DATALENGTH( newText ) dl_after
FROM @t
GO

enter image description here

Leave a Reply

Your email address will not be published. Required fields are marked *