Saturday, September 5, 2020

Always On AG - Connection Timeout on Multi Subnet Cluster

If your customer is seeing intermittent timeouts when connecting to the listener of an Always On availability group (AOAG), the steps below may help, provided all of the following conditions apply.

Conditions:


  1. The AOAG is set up on a multi subnet failover cluster. You can recognize this by looking at the IP addresses assigned to the listener: one of them shows as offline. Refer to the screenshot below.


  2. The customer's application is legacy and its drivers do not support multi subnet AOAG failover. The link below lists the Microsoft supported drivers, with their minimum versions, that can be used to connect to multi subnet AOAG listeners.
  3. Connections to a multi subnet AOAG listener require the MultiSubnetFailover option in the client application's connection string. If the client isn't using this option, this document applies to you.
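
For reference, here is a rough sketch of a client connection that does set the option in condition 3, using the .NET SqlClient provider from PowerShell. The port 1433, the database placeholder <YourDatabase> and Windows authentication are assumptions for illustration only; they are not taken from the customer's setup.

```powershell
# Assumed values: port 1433, <YourDatabase> and integrated security are placeholders.
$connectionString = 'Server=tcp:<YourListenerName>,1433;Database=<YourDatabase>;' +
                    'Integrated Security=SSPI;MultiSubnetFailover=True'

$connection = New-Object System.Data.SqlClient.SqlConnection $connectionString
try {
    # With MultiSubnetFailover=True the driver tries the listener's IP addresses in parallel.
    $connection.Open()
    $connection.State
}
finally {
    $connection.Dispose()
}
```

If the application can be changed to connect like this, the cluster changes in this post may not be needed.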

Cause:


Both IP addresses assigned to the listener are registered in DNS under the listener name, and the DNS server returns both of them, in random or round robin order. A client that does not use MultiSubnetFailover tries the addresses one at a time. If it gets the online IP first, it connects immediately. If it gets the offline IP first, that attempt has to time out, and the customer sees a connection timeout.
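
One way to confirm this from a client machine is to look at what DNS returns for the listener. This is a minimal check, assuming the Resolve-DnsName cmdlet is available (Windows 8 / Windows Server 2012 or later).

```powershell
# Replace <YourListenerName> with the listener's DNS name.
# Two A records for one listener name suggest that all provider IPs are registered.
Resolve-DnsName -Name '<YourListenerName>' -Type A |
    Select-Object Name, IPAddress, TTL
```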


Resolution:


To resolve this, log in to any node that is part of the AOAG, open an administrative PowerShell window and run the commands below. Replace <YourListenerName> with the currently configured listener name.
Setting RegisterAllProvidersIP to 0 makes the listener register only its active IP address with the DNS server, so clients are only ever directed to the address that is online.
HostRecordTTL is the time to live, in seconds, of the listener's DNS record. Its default value is normally 1200 (20 minutes); 300 is the recommended value for multi subnet failover listeners, and you can change it to suit the application's or customer's requirements. A lower value lets clients pick up the new IP address sooner after a failover.

Import-Module FailoverClusters
Get-ClusterResource <YourListenerName> | Set-ClusterParameter RegisterAllProvidersIP 0
Get-ClusterResource <YourListenerName> | Set-ClusterParameter HostRecordTTL 300
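
To see the values before and after the change, the two parameters can be read back from the listener resource. This is a small sketch that lists all parameters and filters them on the client side; the -in operator requires PowerShell 3.0 or later.

```powershell
Import-Module FailoverClusters

# Show only the two parameters this post changes.
Get-ClusterResource '<YourListenerName>' |
    Get-ClusterParameter |
    Where-Object { $_.Name -in 'RegisterAllProvidersIP', 'HostRecordTTL' } |
    Format-Table Name, Value
```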

If you get an error saying: Parameter 'RegisterAllProvidersIP' does not exist on the cluster object '<YourListenerName>', use the commands below instead.


Get-ClusterResource <YourListenerName> | Set-ClusterParameter -Create RegisterAllProvidersIP 0
Get-ClusterResource <YourListenerName> | Set-ClusterParameter -Create HostRecordTTL 300

Now do the same for the cluster's own network name resource. "Cluster Name" is the default name of that resource, so run these commands as they are, without any changes.

Get-ClusterResource "cluster name" | Set-ClusterParameter RegisterAllProvidersIP 0
Get-ClusterResource "cluster name" | Set-ClusterParameter HostRecordTTL 300

Repeat the process for your network name resource. To find it, run "Get-ClusterResource" in the PowerShell window and look for "Network Name" in the ResourceType column. Note the value in the "Name" column and use it in place of <NetworkName> in the commands below.

Get-ClusterResource <NetworkName> | Set-ClusterParameter RegisterAllProvidersIP 0
Get-ClusterResource <NetworkName> | Set-ClusterParameter HostRecordTTL 300
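
If the cluster has many resources, the lookup described above can be narrowed with a filter. This sketch assumes the resource type object exposes its name through a Name property; if the filter returns nothing, fall back to reading the full Get-ClusterResource output.

```powershell
Import-Module FailoverClusters

# List only resources of type "Network Name", with their group and state.
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -eq 'Network Name' } |
    Format-Table Name, OwnerGroup, State
```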

Once you are done with the above steps, stop and start your listener resource, either from Failover Cluster Manager or with the commands below.
Stop-ClusterResource <YourListenerName>
Start-ClusterResource <YourListenerName>
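
After the restart it is worth checking that everything in the listener's cluster group is back online. This sketch assumes the availability group resource depends on the listener, in which case stopping the listener may take it offline too. It also assumes the resource's OwnerGroup property gives the group name. Treat it as a starting point rather than a definitive procedure.

```powershell
Import-Module FailoverClusters

$listener  = Get-ClusterResource '<YourListenerName>'
$groupName = $listener.OwnerGroup.Name

# Bring the whole group online in case dependent resources went offline.
Start-ClusterGroup -Name $groupName

# Every resource in the group should report Online.
Get-ClusterGroup -Name $groupName | Get-ClusterResource |
    Format-Table Name, State, ResourceType

# DNS should now return a single A record with the new TTL.
Resolve-DnsName -Name '<YourListenerName>' -Type A |
    Select-Object Name, IPAddress, TTL
```

Clients and DNS servers may keep serving the old records until the previous TTL expires, so the DNS result can lag behind the change.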


This should resolve the connection timeouts your customer is facing. Hope you find this helpful.
