STRING_SPLIT with a multiple-character separator?

Posted on

Question :

SQL Server 2016 introduced STRING_SPLIT which is very fast and an excellent replacement for any homemade implementation people would roll before 2016.

Unfortunately, STRING_SPLIT only supports a single-character separator, which isn’t always enough. Does anyone know of a good implementation that allows for using multiple characters in the separator?

Answer :

Well, you can always use REPLACE to add a single-character delimiter to the argument before passing it in. You just need to choose a character that is unlikely/impossible to appear in the actual data. In this example, let’s say your original data uses three pipes as a delimiter; I chose a Unicode character at random to substitute:

DECLARE 
  @olddelim nvarchar(32) = N'|||', 
  @newdelim nchar(1)     = NCHAR(9999); -- pencil (✏)

DECLARE @x nvarchar(max) = N'foo|||bar|||blat|||splunge';

SELECT * FROM STRING_SPLIT(REPLACE(@x, @olddelim, @newdelim), @newdelim);

I blogged about this in more detail here:


Addressing a comment:

bad solution. what if original string is like ‘abc||pqr|||rst||123’ (dynamic and can contain anything). desired o/p is ‘abc||pqr’ and ‘rst||123’ but your solution will give ‘abc’ ‘pqr’ ‘rst’ ‘123’

Okay, let’s take your input and see if my solution gets the wrong output.

DECLARE 
  @olddelim nvarchar(32) = N'|||', 
  @newdelim nchar(1)     = NCHAR(9999); -- pencil (✏)

DECLARE @x nvarchar(max) = N'abc||pqr|||rst||123';

SELECT * FROM STRING_SPLIT(REPLACE(@x, @olddelim, @newdelim), @newdelim);

Result is:

abc||pqr
rst||123

And not what you must have assumed (but didn’t test) this:

abc
pqr
rst
123

If your data is in a table, you could create a view so that you don’t have to factor that expression into all of your queries.


If that doesn’t work, because you might have a pencil () in your data, and you can’t find a single character in the 1,111,998 available Unicode characters that won’t be in your data set, you’ll have to skip STRING_SPLIT(), since it is hard-coded to accept a single character delimiter (separator
Is a single character expression
).

Alternatives have been answered here dozens of times before, many before STRING_SPLIT() existed. Those methods still work.

I go over many alternatives, and also discuss the limitations in STRING_SPLIT(), in this series (I also discuss why you might consider not doing this in T-SQL using any method at all):

Leave a Reply

Your email address will not be published. Required fields are marked *