Question :
I’m trying to find what query (or queries) need to be run in order to determine if there is any sort of a problem with an availability group. I have been using:
SELECT @@SERVERNAME AS 'primary_replica', cs.database_name, rs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states rs
join sys.dm_hadr_database_replica_cluster_states cs on rs.replica_id =
cs.replica_id and rs.group_database_id = cs.group_database_id
WHERE rs.is_local = 1
It turns out that the synchronization health (select * from sys.dm_hadr_availability_group_states
) can be NOT_HEALTHY, but the query above does not show any problem with the replica. I have tried looking through all of the various DMs that contain information, but it is a maze or words and I can’t find my way to the SQL cheese that must be there somewhere.
What is the one query that will indicate that there is a problem that needs to be addressed? I don’t need it to be specific or to designate the cause, I just need to know that there is a problem. I know that during failover some of the states will be NOT_HEALTHY, and we account for that by requiring x minutes of unhealthiness before sending an alert.
Answer :
Following is a SQL query that can be run against all nodes in the cluster to check all of the AG-related health states. The output could be formatted differently depending on your needs, but this one returns either a list of the objects that are not healthy, or a row that includes the server name and that all objects are healthy.
DECLARE @HealthStates AS TABLE (objectname VARCHAR(50), [type] VARCHAR(20), health_desc tinyint)
/* Collect local database replica states */
INSERT INTO @HealthStates
SELECT cs.[database_name], 'database_replica', rs.synchronization_health
FROM sys.dm_hadr_database_replica_states rs
join sys.dm_hadr_database_replica_cluster_states cs ON rs.replica_id = cs.replica_id and rs.group_database_id = cs.group_database_id
WHERE rs.is_local = 1
/* Collect local AG states */
INSERT INTO @HealthStates
SELECT ag.[name], 'availability_group', gs.synchronization_health
FROM sys.dm_hadr_availability_group_states gs
join sys.availability_groups_cluster ag ON gs.group_id = ag.group_id
WHERE gs.primary_replica = @@SERVERNAME
/* Collect replica states */
INSERT INTO @HealthStates
SELECT cs.replica_server_name, 'server_replica', rs.synchronization_health
FROM sys.dm_hadr_availability_replica_states rs
join sys.availability_replicas cs ON rs.replica_id = cs.replica_id and rs.group_id = cs.group_id
/* States to grid */
--SELECT objectname, [type], health_desc FROM @HealthStates
/* Test for not healthy */
IF EXISTS (SELECT health_desc from @HealthStates where health_desc <> 2)
BEGIN
SELECT objectname, type, 'NOT_HEALTHY' as health_desc from @HealthStates where health_desc <> 2
END
ELSE
BEGIN
SELECT @@SERVERNAME AS objectname, 'All Objects' AS type, 'HEALTHY' as health_desc
END