Question :
I have a table in which every row have data like:
0150566115,"HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014
What I want is to remove every comma (,
) that is enclosed between double quotes ” “. And then split the rest of the string with comma (,
)
I don’t want to do it checking every single character setting flags for start and end of double quotes.
Can i implement some sort of regex?
Is there a simple way?
What I have tried so far is just to split the string based on the comma (,
) but it is also splitting values inside the quotes.
This is serving the purpose: how can I implement this in complete table (currently it is working if I have only one instance of double quotes block)?
Declare @Query nvarchar(max)
Set @Query = 'Item1,Item2,"Item,Demo,3",New'
Declare @start int, @len int
SELECT @start = PATINDEX('%"%"%', @Query) + 1
print @start
select @len = CHARINDEX('"', SUBSTRING(@Query, @start, LEN(@Query))) - 1
select
SUBSTRING(@Query, 1, @start - 2) +
REPLACE((SUBSTRING(@Query, @start, @len)), ',', '') +
SUBSTRING(@Query, @start + @len + 1, LEN(@Query))
This is the function I am using to split
ALTER FUNCTION [dbo].[fnSplit](
@sInputList VARCHAR(8000) -- List of delimited items
, @sDelimiter VARCHAR(8000) = ',' -- delimiter that separates items
) RETURNS @List TABLE (id int, item VARCHAR(8000))
BEGIN
DECLARE @sItem VARCHAR(8000)
Declare @ID as int
Set @ID=0
WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
BEGIN
SELECT
@sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,CHARINDEX(@sDelimiter,@sInputList,0)-1))),
@sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),LEN(@sInputList))))
Set @ID=@ID+1
IF LEN(@sItem) > 0
INSERT INTO @List SELECT @ID,@sItem
END
IF LEN(@sInputList) > 0
INSERT INTO @List SELECT @ID,@sInputList -- Put the last item in
RETURN
END
The problem is that this is an application in MS Access, which narrows down my possibilities. All I can do is passing the name of the file which is going to get imported by SQL Server.
Answer :
Well, this is really awful, but okay, if you are going to refuse to even consider better alternatives… first, create a set-based split function that will track the order of the string (your looping function is really not optimal):
CREATE FUNCTION [dbo].[SplitStrings_Ordered]
(
@List VARCHAR(8000),
@Delimiter VARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT [Index] = ROW_NUMBER() OVER (ORDER BY Number), Item
FROM (SELECT Number, Item = SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_columns) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
) AS y);
(If you have a numbers table in this database, use that instead of sys.all_columns
, and add WITH SCHEMABINDING
to the function definition.)
Now, let’s look at a few examples of strings with commas embedded inside double quotes, and removing those before splitting and re-concatenating:
DECLARE @x TABLE(n VARCHAR(8000));
INSERT @x VALUES
('0150566115,"HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014'),
('0150566115,HEALTH 401K,"IC,ON","ICON HEALTH 401K",,,1,"08/21/2014"'),
('"01505,66115,","HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014');
;WITH x AS
(
SELECT x.n, s.[Index], s = REPLACE(s.Item, ',',
CASE s.[Index]%2 WHEN 0 THEN '' ELSE ',' END)
FROM @x AS x
CROSS APPLY dbo.SplitStrings_Ordered(x.n, '"') AS s
)
SELECT x.n, fixed = (SELECT x2.s
FROM x AS x2
WHERE x2.n = x.n
ORDER BY [Index]
FOR XML PATH, TYPE).value(N'.[1]',N'varchar(max)')
FROM x
GROUP BY x.n;
Results in the fixed
column for all three strings:
0150566115,HEALTH 401K,ICON,ICON HEALTH 401K,,,1,08/21/2014
0150566115,HEALTH 401K,ICON,ICON HEALTH 401K,,,1,08/21/2014
0150566115,HEALTH 401K,ICON,ICON HEALTH 401K,,,1,08/21/2014
Now, you can feed those results back into the split function, using the comma this time, depending on your ultimate goal. The question seemed to resolve only around being able to ignore the double quotes and any commas contained only inside double-quote pairs.
For more on splitting and concatenating strings:
- http://sqlperformance.com/2012/07/t-sql-queries/split-strings
- http://sqlperformance.com/2012/08/t-sql-queries/splitting-strings-follow-up
- http://sqlperformance.com/2012/08/t-sql-queries/splitting-strings-now-with-less-t-sql
- http://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation
- http://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation-2
- https://www.sentryone.com/blog/aaronbertrand/grouped-concatenation-use-cases
For more on numbers tables and generating sets without loops:
RegEx is the only serious way to do this. Here’s a simple example using the RegEx functions that come installed for free if you have Master Data Services installed. Otherwise google around for SQL Server CLR RegEx, the expression used should be transferrable:
-- Find and replace commas in quotes
USE MDS
GO
DECLARE @t TABLE( yourText VARCHAR(MAX), newText VARCHAR(MAX) );
INSERT @t ( yourText )
VALUES
( '0150566115,"HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014' ),
( '0150566115,HEALTH 401K,"IC,ON","ICON HEALTH 401K",,,1,"08/21/2014"' ),
( '"01505,66115,","HEALTH 401K","IC,ON","ICON HEALTH 401K",,,1,08/21/2014' )
-- Remove commas within quotes
UPDATE @t
SET newText = mdq.RegexReplace( yourText, '("[^",]+),([^"]*")', '${1}${2}', 0 )
UPDATE @t
SET newText = REPLACE( mdq.RegexReplace( newText, '("[^",]+),([^"]*")', '${1}${2}', 0 ), '"', '' )
-- Show the results
SELECT *, DATALENGTH( yourText ) dl_before, DATALENGTH( newText ) dl_after
FROM @t
GO