SQL Server collation mismatch reducing the amount of data visible to an application?

Posted on

Question :

One of our servers died recently, it was rebuilt with a wrong collation and so it looks as follows:

  • SQL Server – SQL_Latin1_General_CP1_CI_AS
  • master db – SQL_Latin1_General_CP1_CI_AS
  • model db – SQL_Latin1_General_CP1_CI_AI
  • msdb db – SQL_Latin1_General_CP1_CI_AI
  • temp db – SQL_Latin1_General_CP1_CI_AS
  • every user db – SQL_Latin1_General_CP1_CI_AI

The databases are accessible, my question is: will the data visible to users via an app be limited because of the mismatching collation? Some users have said that they think they have lost a lot of data even though the database was restored from a very recent backup.

It is SQL Server 2008 R2 with SP3 on Windows Server 2012 R2 (compatible only due to service packs)

Thank you

Answer :

will the data visible to users via an app be limited because of the mismatching collation?

Assuming you are speaking of scenarios that do not error (because an error does not cause people to wonder if something is missing), but instead just return data that might not be ALL of the expected data, then it depends on how the queries are being done. If ALL of the following conditions are true, then yes, some data can be filtered out now when it wasn’t before:

  1. a query uses a temporary table (local or global)
  2. the Collation for string columns in the CREATE TABLE #TempTable statement did not explicitly specify the Collation via the COLLATE keyword
  3. the query is filtering at least one such string column in the temporary table on a string literal or local variable
  4. there is an accent mismatch between the data in that column and the string used for filtering (in the literal or variable)

Meaning, previously someone could have executed:

SELECT * FROM #TempTable WHERE Name = 'a';

and gotten back rows containing the following: A a À Á Â Ã Ä Å à á â ã ä å. That same query, with the new tempdb Collation, will now return only rows containing: A a.

Also, outside of the unexpected filtering that is happening in your user-facing code, there are other potential problems waiting for you. If master is a different Collation, then names of system-level entities (e.g. Databases, Logins, etc) will behave outside of expectations and differently than the rest of the system and could cause odd behavior similar to what your users are reporting. Meaning, if you have accents in Database and/or Login and/or variable/parameter/cursor names and have code that works because it is expecting accent-insensitive comparisons, then you / your application might experience what your users are experiencing (though it could take a while to notice, and even longer to debug). AND, it is assumed that all four system DBs are the same Collation, and some stored procedures and views in msdb JOIN to master on string columns and do break in this scenario. Hence this does really need to be fixed.

SO, since it is just the system DBs that are “incorrect” (it is best to have master and msdb match as there are some stored procedures that JOIN between them and error when their Collations don’t match), you should change only those back to the expected Collation of SQL_Latin1_General_CP1_CI_AI. You can do this easily enough by using SETUP.EXE /ACTION=Rebuilddatabase. It allows you to change just the system DBs and does not touch user Databases.

Also see the MSDN documentation for: Rebuild System Databases and Set or Change the Server Collation.

At the very least the command-line should look like (but all as one line, or you need to use ^ at the end of each line in order to continue the command on the following line):

Setup /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME={InstanceName}
      /SQLSYSADMINACCOUNTS={accountsToAddToSysadminRole} 
      /SQLCOLLATION=SQL_Latin1_General_CP1_CI_AI [/otherOptions]

For this situation you do not need to use the other recommendation of the undocumented -q switch for sqlservr.exe. That option rebuilds ALL databases, and there is no need to make modifications to databases that are already correct. The less you touch (especially with an undocumented feature) the better off you are.

Some users have said that they think they have lost a lot of data

That’s not true.
You’ve restored your databases from backups and the databases have the original collation, i.e. SQL_Latin1_General_CP1_CI_AI, so nothing data is lost.

The problem you can have is related to tempdb that now has a different collation, so any code that creates a temp table with string data and then tries to join it to some table from your database on a string field or tries any comparison between string fields will fail with error

Cannot resolve the collation conflict between
“SQL_Latin1_General_CP1_CI_AS” and “SQL_Latin1_General_CP1_CI_AI” in
the equal to operation.

So it’s preferable to have the same collation.

The thing I cannot understand is how you’ve got the situation when msdb and tempdb databases have a collation that differs from master’s collation.
I can imagine that you’ve restored your msdb and model on the new server, but tempdb??? It is recreated at every restart and should inherit model’s collation, or you just never restarted yor server since model database was restored?

Change the collation of the instance back to what it was i.e SQL_Latin1_General_CP1_CI_AI. This will fix the issue. You can go through the below article on how to change the instance collation, this will need a downtime so plan it properly and go ahead.

https://www.mssqltips.com/sqlservertip/3519/changing-sql-server-collation-after-installation/

Leave a Reply

Your email address will not be published. Required fields are marked *