search a string to create dummy variables

Posted on

Question :

I have a string variable I have created that is a concatenation of 12 columns, each of the 12 columns contains 4 or 5 digit character. I want to create a series of dummy variables, based on the contents of this string (named diag_string) . My code is as follows:

case when diag_string like '%A0[0-9]%' THEN 1 ELSE 0 END AS  [A00_A09] 
case when diag_string like '%A[15-19]%' THEN 1 ELSE 0 END AS  [A15_A19] 
case when diag_string like '%A[20-49]%' THEN 1 ELSE 0 END AS  [A20_A49] 
case when diag_string like '%A[50-64]%' THEN 1 ELSE 0 END AS  [A50_A64] 
case when diag_string like '%A[65-79]%' THEN 1 ELSE 0 END AS  [A65_A79] 
case when diag_string like '%A[80-89]%' THEN 1 ELSE 0 END AS  [A80_A89] 
case when diag_string like '%A[90-99]%' THEN 1 ELSE 0 END AS  [A90_A99] 

So the first dummy variable, [A00_A09], should be set to 1 where the string contains any values that are A01 through to A09.

The above is only a subset of the code, there are other dummy variables beginning with B all the way through to Z, but are split up in different ways, for example:

case when diag_string like '%Z0[0-13]%' THEN 1 ELSE 0 END AS  [Z00_Z13] 
case when diag_string like '%Z[20-29]%' THEN 1 ELSE 0 END AS  [Z20_Z29] 
case when diag_string like '%Z[30-39]%' THEN 1 ELSE 0 END AS  [Z30_Z39] 
case when diag_string like '%Z[40-54]%' THEN 1 ELSE 0 END AS  [Z40_Z54] 
case when diag_string like '%Z[55-65]%' THEN 1 ELSE 0 END AS  [Z55_Z65] 
case when diag_string like '%Z[70-76]%' THEN 1 ELSE 0 END AS  [Z70_Z76] 
case when diag_string like '%Z[80-99]%' THEN 1 ELSE 0 END AS  [Z80_Z99

There is an error somewhere, in that, for example, a record is being flagged as having a code A50 to A64, but inspection of the record shows that this is not the case.

I am not sure why this code is not working. Is anyone able to advise what the issue might be?

Answer :

I think I see your issue. [] represent a single character. The way you have it written %A[50-64]%', the character immediately following the A needs to be 5, or anything from 0-6 or a 4. Basically, any number 0 to 6.

If A will always be a leading charator? You can try the below code. Otherwise, put the ‘%’ back before the ‘A’. I like to avoid leading ‘%’ to make the code more efficient.

case when diag_string like 'A0[0-9]%' THEN 1 ELSE 0 END AS  [A00_A09] 
case when diag_string like 'A1[5-9]%' THEN 1 ELSE 0 END AS  [A15_A19] 
case when diag_string like 'A[2-4][0-9]%' THEN 1 ELSE 0 END AS  [A20_A49] 
case when diag_string like 'A5[0-9]%' OR diag_string like 'A6[0-4]%' THEN 1 ELSE 0 END AS  [A50_A64] 
case when diag_string like 'A6[5-9]%' OR diag_string like 'A7[0-9]%' THEN 1 ELSE 0 END AS  [A65_A79] 
case when diag_string like 'A8[0-9]%' THEN 1 ELSE 0 END AS  [A80_A89] 
case when diag_string like 'A9[0-9]%' THEN 1 ELSE 0 END AS  [A90_A99]

Further documentation on how LIKE interprits string can be found here.

Leave a Reply

Your email address will not be published. Required fields are marked *