Revisiting STRING_SPLIT. This won’t be the last time, I’m sure. I am so thankful that STRING_SPLIT is native to SQL Server as of SQL Server 2016. I still, however, feel that it has room for improvement. There are so many various ways to split strings, as Aaron Bertrand has excellently detailed in his blog post: Split strings the right way – or the next best way.
Enough about the different ways, let’s discuss the problem with multi-character delimiters.
That “separator” argument…
Microsoft’s documentation currently states the following regarding
Is a single character expression of any character type (for example, nvarchar(1), varchar(1), nchar(1), or char(1)) that is used as separator for concatenated substrings.
What happens if you need to work with strange separators, like “<|>” for example? I’m not sure this exact separator would ever be used, but it could!
Time to do some replacing.
Let’s go ahead and create a table to test this.
CREATE TABLE dbo.SplitThese ( ID int NOT NULL IDENTITY, String varchar (32) NOT NULL ); INSERT INTO dbo.SplitThese (String) VALUES ('Red<|>Green<|>Blue'), ('Purple<|>Yellow');
So those two rows are “perfect” to do replacing. By that, I mean none of the characters that are part of the delimiters exist within the string. With this, technically we could just use the “<” as the
separator argument and replace the “|>” on the result.
What if we throw in something like this?
INSERT INTO dbo.SplitThese (String) VALUES ('Ora<nge<|>Mage|n>ta');
The extracted text from that is going to look crazy, but with the REPLACE method it will still split properly.
I’ve found quite a few posts demonstrating this solution using NCHAR(9999) I feel that it’s almost becoming the standard for this. NCHAR(9999) is a pencil ✏. The reason this is used is because it’s not likely to ever show up as a legitimate character within the data we’re attempting to split.
We can execute the following to split these rows correctly:
SELECT * FROM dbo.SplitThese CROSS APPLY STRING_SPLIT(REPLACE(String, '<|>', NCHAR(9999)), NCHAR(9999));
Here’s the results:
So that’s it?
I like strange workarounds like this, but I know they can be very confusing to someone reviewing the code. I’m not sure how many people automatically associate NCHAR(9999) with a pencil and are able to understand your intentions.
Hopefully this helps you in the future, if STRING_SPLIT doesn’t gain support for multi-character delimiters anytime soon.
Please consider voting for this feature (and many more) to be native to
Thanks for reading!