Question :
I have a string variable I have created that is a concatenation of 12 columns, each of the 12 columns contains 4 or 5 digit character. I want to create a series of dummy variables, based on the contents of this string (named diag_string) . My code is as follows:
case when diag_string like '%A0[0-9]%' THEN 1 ELSE 0 END AS [A00_A09]
case when diag_string like '%A[15-19]%' THEN 1 ELSE 0 END AS [A15_A19]
case when diag_string like '%A[20-49]%' THEN 1 ELSE 0 END AS [A20_A49]
case when diag_string like '%A[50-64]%' THEN 1 ELSE 0 END AS [A50_A64]
case when diag_string like '%A[65-79]%' THEN 1 ELSE 0 END AS [A65_A79]
case when diag_string like '%A[80-89]%' THEN 1 ELSE 0 END AS [A80_A89]
case when diag_string like '%A[90-99]%' THEN 1 ELSE 0 END AS [A90_A99]
So the first dummy variable, [A00_A09], should be set to 1 where the string contains any values that are A01 through to A09.
The above is only a subset of the code, there are other dummy variables beginning with B all the way through to Z, but are split up in different ways, for example:
case when diag_string like '%Z0[0-13]%' THEN 1 ELSE 0 END AS [Z00_Z13]
case when diag_string like '%Z[20-29]%' THEN 1 ELSE 0 END AS [Z20_Z29]
case when diag_string like '%Z[30-39]%' THEN 1 ELSE 0 END AS [Z30_Z39]
case when diag_string like '%Z[40-54]%' THEN 1 ELSE 0 END AS [Z40_Z54]
case when diag_string like '%Z[55-65]%' THEN 1 ELSE 0 END AS [Z55_Z65]
case when diag_string like '%Z[70-76]%' THEN 1 ELSE 0 END AS [Z70_Z76]
case when diag_string like '%Z[80-99]%' THEN 1 ELSE 0 END AS [Z80_Z99
There is an error somewhere, in that, for example, a record is being flagged as having a code A50 to A64, but inspection of the record shows that this is not the case.
I am not sure why this code is not working. Is anyone able to advise what the issue might be?
Answer :
I think I see your issue. [] represent a single character. The way you have it written %A[50-64]%'
, the character immediately following the A
needs to be 5, or anything from 0-6 or a 4. Basically, any number 0 to 6.
If A will always be a leading charator? You can try the below code. Otherwise, put the ‘%’ back before the ‘A’. I like to avoid leading ‘%’ to make the code more efficient.
case when diag_string like 'A0[0-9]%' THEN 1 ELSE 0 END AS [A00_A09]
case when diag_string like 'A1[5-9]%' THEN 1 ELSE 0 END AS [A15_A19]
case when diag_string like 'A[2-4][0-9]%' THEN 1 ELSE 0 END AS [A20_A49]
case when diag_string like 'A5[0-9]%' OR diag_string like 'A6[0-4]%' THEN 1 ELSE 0 END AS [A50_A64]
case when diag_string like 'A6[5-9]%' OR diag_string like 'A7[0-9]%' THEN 1 ELSE 0 END AS [A65_A79]
case when diag_string like 'A8[0-9]%' THEN 1 ELSE 0 END AS [A80_A89]
case when diag_string like 'A9[0-9]%' THEN 1 ELSE 0 END AS [A90_A99]
Further documentation on how LIKE
interprits string can be found here.