I’m trying to find what query (or queries) need to be run in order to determine if there is any sort of a problem with an availability group. I have been using:
SELECT @@SERVERNAME AS 'primary_replica', cs.database_name, rs.synchronization_health_desc FROM sys.dm_hadr_database_replica_states rs join sys.dm_hadr_database_replica_cluster_states cs on rs.replica_id = cs.replica_id and rs.group_database_id = cs.group_database_id WHERE rs.is_local = 1
It turns out that the synchronization health (
select * from sys.dm_hadr_availability_group_states) can be NOT_HEALTHY, but the query above does not show any problem with the replica. I have tried looking through all of the various DMs that contain information, but it is a maze or words and I can’t find my way to the SQL cheese that must be there somewhere.
What is the one query that will indicate that there is a problem that needs to be addressed? I don’t need it to be specific or to designate the cause, I just need to know that there is a problem. I know that during failover some of the states will be NOT_HEALTHY, and we account for that by requiring x minutes of unhealthiness before sending an alert.
Following is a SQL query that can be run against all nodes in the cluster to check all of the AG-related health states. The output could be formatted differently depending on your needs, but this one returns either a list of the objects that are not healthy, or a row that includes the server name and that all objects are healthy.
DECLARE @HealthStates AS TABLE (objectname VARCHAR(50), [type] VARCHAR(20), health_desc tinyint) /* Collect local database replica states */ INSERT INTO @HealthStates SELECT cs.[database_name], 'database_replica', rs.synchronization_health FROM sys.dm_hadr_database_replica_states rs join sys.dm_hadr_database_replica_cluster_states cs ON rs.replica_id = cs.replica_id and rs.group_database_id = cs.group_database_id WHERE rs.is_local = 1 /* Collect local AG states */ INSERT INTO @HealthStates SELECT ag.[name], 'availability_group', gs.synchronization_health FROM sys.dm_hadr_availability_group_states gs join sys.availability_groups_cluster ag ON gs.group_id = ag.group_id WHERE gs.primary_replica = @@SERVERNAME /* Collect replica states */ INSERT INTO @HealthStates SELECT cs.replica_server_name, 'server_replica', rs.synchronization_health FROM sys.dm_hadr_availability_replica_states rs join sys.availability_replicas cs ON rs.replica_id = cs.replica_id and rs.group_id = cs.group_id /* States to grid */ --SELECT objectname, [type], health_desc FROM @HealthStates /* Test for not healthy */ IF EXISTS (SELECT health_desc from @HealthStates where health_desc <> 2) BEGIN SELECT objectname, type, 'NOT_HEALTHY' as health_desc from @HealthStates where health_desc <> 2 END ELSE BEGIN SELECT @@SERVERNAME AS objectname, 'All Objects' AS type, 'HEALTHY' as health_desc END