Finding char “x00” in a varchar field

Posted on

Question :

Im trying to find rows that contain certain unprintable characters and encounter some inconsistent behaviour of various string functions. The field is of type VARCHAR. Here are the queries:

select ascii(substring(AddressTail, 17, 1)) from Address where ID = 1934484 returns 0

select charindex(char(0), substring(AddressTail, 17, 1)) from Address where ID = 1934484 returns 1 which is expected

but

select charindex(char(0), AddressTail) from Address where ID = 1934484 returns 0 which is quite strange.

So, how to find the rows that contain byte zero? SQL Server version is Microsoft SQL Server 2016 (SP2-CU9) (KB4515435) - 13.0.5479.0 (X64)

Answer :

You can use a binary COLLATE clause – below example shows examples of using it with CHARINDEX and LIKE

SELECT  CHARINDEX(char(0) COLLATE Latin1_General_100_BIN2, txt)
FROM test
WHERE txt COLLATE Latin1_General_100_BIN2 LIKE '%' + char(0)  + '%'

NB: The documentation for CHARINDEX does state

0x0000 (char(0)) is an undefined character in Windows collations and
cannot be included in CHARINDEX.

but nonetheless it does work in practice. Perhaps a SQL binary collation (such as SQL_Latin1_General_CP850_BIN) would be a safer choice due to that however.

how to find the rows that contain byte zero?

SQL server somehow badly understands x00…

Maybe this trick will help you slightly?

create table test (id int, txt varchar(8));

insert into test values 
(1, '123' + char(0) + '456'),                -- in the middle
(2, '123' + char(0)),                        -- at the end
(3, char(0) + '456'),                        -- at the beginning
(4, '12' + char(0) + '34' + char(0) + '56'), -- in the middle
(5, '123456');                               -- absent
select id, pos
from test
cross apply ( select number + 1 pos 
              from master..spt_values 
              where type='P'
                and number < len(txt) ) numbers
where substring(txt,pos,1) = char(0)
order by id;

Output

id | pos
-: | --:
 1 |   4
 2 |   4
 3 |   1
 4 |   3
 4 |   6

Instead of a collate on the searched for ascii null, @MartinSmith suggested, you can as well SQL collate the string potentially holding it:

CHARINDEX(CHAR(0), s COLLATE SQL_Latin1_General_CP1_CI_AS) AS ascii_null_pos

See it in action: SQL Fiddle.

Please comment, if and as this requires adjustment / further detail.

Leave a Reply

Your email address will not be published.