We have a SQL Server setup with Availability Groups. In the cluster we have four nodes, two in one datacenter and two in another datacenter.
There are four AGs on the cluster, each with its primary replica on a different node. For each AG, one other node in the same datacenter is a synchronous secondary for HA, and the two nodes in the other datacenter are asynchronous secondaries for DR.
We have one listener for each AG.
One AG automatically fails over at certain times due to various network issues. The failover itself seems to work fine: the HA secondary becomes primary and all databases come online. However, applications are then unable to connect, and we see various connection errors such as:
Cannot connect to SQL Server instance ‘HH-SQL-D11’ :
A transport-level error has occurred when receiving results from the
server. (provider: TCP Provider, error: 0 –
The semaphore timeout period has expired.) : The semaphore timeout
period has expired  (requires acknowledgement)
named pipes provider error 40 the network path was not found
Yet when we fail over manually, connections work as expected. Other AGs on the cluster have failed over automatically and worked fine. I have checked that the SQL instance is up, that TCP/IP and Named Pipes are enabled, and that remote connections are enabled. I have asked the network team to check things like firewalls, and they say everything is OK.
I have tried to read the cluster logs, but I am not sure what to look for and I am unsure how to fix this issue. It’s a problem because it means HA is not really operating: if the AG fails over automatically, the applications connecting to it stop working.
If you cannot reach the listener from SSMS, then it seems to be an issue with networking or name resolution (DNS).
You can try an NSLOOKUP of your listener name and make sure it resolves to the IP address that is online on the current primary node. If it does not, you have an issue with the DNS update, and your Windows admin should be able to help you.
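If you want to repeat the name-resolution check from an application server right after a failover, a small script can do the same comparison as the NSLOOKUP step. This is only a rough sketch: the function names are made up for illustration, and the listener name and IP address in the usage note are placeholders, not values from this thread. Note that it goes through the operating system resolver, so unlike NSLOOKUP it also reflects the local DNS cache and hosts file, which is what the application actually sees.

```python
import socket

def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses for hostname."""
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

def check_listener_dns(listener, expected_ips, resolver=socket.getaddrinfo):
    """Compare the addresses resolved for the listener with the expected ones.

    expected_ips is the set of listener IPs you expect (an assumption you
    supply yourself, e.g. taken from the listener configuration).
    """
    try:
        resolved = resolve_ipv4(listener, resolver)
    except socket.gaierror as exc:
        # The name did not resolve at all.
        return {"ok": False, "resolved": [], "unexpected": [], "error": str(exc)}
    unexpected = [ip for ip in resolved if ip not in expected_ips]
    return {
        "ok": bool(resolved) and not unexpected,
        "resolved": resolved,
        "unexpected": unexpected,
        "error": None,
    }
```

For example, `check_listener_dns("ag1-listener.example.local", {"10.0.0.50"})` (both values hypothetical) returns a dictionary whose `ok` field is false when the name does not resolve or resolves to an address you did not expect.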
If it points to the right server, but you cannot ping it or connect to it, then I would provide that information to the network team, asking them to assist with this specific situation.
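To give the network team something concrete, you could also record whether a plain TCP connection to the listener port succeeds from the affected client. The sketch below assumes the listener is on the default SQL Server port 1433 (yours may differ), and the function name is just illustrative. It only tests that the port accepts a TCP connection; it does not log in to SQL Server.

```python
import socket

def can_connect(host, port=1433, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, reason) where reason
    describes the error (timeout, refused, name not resolved, ...).
    Port 1433 is only the SQL Server default, assumed here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # socket.timeout, ConnectionRefusedError and socket.gaierror
        # are all subclasses of OSError.
        return False, f"{type(exc).__name__}: {exc}"
```

Running `can_connect("ag1-listener.example.local")` (a placeholder name) right after an automatic failover, and again after a manual one, shows whether the failure is a timeout, a refusal, or a name-resolution error, which is useful detail to pass on.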