Is there a more detailed document illustrating why a SQL availability group needs a Load Balancer in Azure IaaS but not On-Prem?
Is there a way around it in higher versions of SQL or Windows Server?
@Ayanmullick Thanks for the question! We are currently investigating and will update you shortly.
@Ayanmullick I assume you are referring to the following statement:
An availability group requires a load balancer when the SQL Server instances are on Azure virtual machines.
I believe this statement is misleading. Nothing stops you from creating a SQL Server VM in an availability set without a load balancer. However, when creating the load balancer you can opt to balance traffic between the VMs in an availability set. In addition, it is best practice to combine an availability set with a load balancer to manage high availability.
@MikeRayMSFT can you confirm and determine if this needs to be made more clear?
Yes. So what challenges could one face if one deploys a 2-node SQL AlwaysOn Cluster in an availability Set and connects directly to the SQL listener IP without any Load Balancer in between?
@Ayanmullick if you don't use a load balancer and one of your SQL machines goes down, traffic will not be automatically directed to the running machine. This could impact production, because any service using the machine that went down would stop working.
One of the main purposes of a load balancer is to distribute traffic to available machines and redirect it when there is a problem. So although you could consider your environment "Always On", it would not meet the criteria for high availability.
But why don't I face the same issue without a load balancer on-prem? We've deployed Always On on-prem without load balancers, and it works fine.
@Ayanmullick I believe that is just the difference between running in the cloud and running on-prem. The environment is different and requires different settings to work. I'm not 100% sure I can provide a complete answer, as it is just how it was designed to work. @MikeRayMSFT might be able to elaborate further than I can.
In a regular on-prem WSFC (Windows Server Failover Cluster) setup, creating the AG listener also creates a DNS record for the listener with the IP address(es) provided. That IP address then has to map to the MAC address of the current primary node in the ARP tables of the switches/routers on the network. The cluster does this with Gratuitous ARP (GARP): whenever a new primary is elected after a failover, it broadcasts the latest IP-to-MAC mapping to the network. Here, the IP is the listener's and the MAC is the current primary's. This GARP should force an update of the ARP table entries on the switches/routers, so a user connection to the listener IP address goes seamlessly to the current primary.
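To make the on-prem behavior concrete, here is a minimal Python sketch of the idea. It is a toy model, not how WSFC is implemented: the `Subnet` class stands in for every ARP table on the network segment, and the node names, MAC addresses, and listener IP are made-up values for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    mac: str


@dataclass
class Subnet:
    """Toy network segment: one dict stands in for all ARP tables on it."""
    arp_cache: dict = field(default_factory=dict)

    def gratuitous_arp(self, ip: str, mac: str) -> None:
        # An unsolicited broadcast: every listener overwrites its entry for this IP.
        self.arp_cache[ip] = mac

    def resolve(self, ip: str) -> str:
        # Which MAC address would a frame for this IP be sent to right now?
        return self.arp_cache[ip]


def fail_over(subnet: Subnet, listener_ip: str, new_primary: Node) -> None:
    # The new primary announces that the listener IP now lives at its MAC.
    subnet.gratuitous_arp(listener_ip, new_primary.mac)


LISTENER_IP = "10.0.0.50"                # hypothetical listener IP
node_a = Node("sql-a", "00:15:5d:00:00:0a")  # hypothetical nodes
node_b = Node("sql-b", "00:15:5d:00:00:0b")

subnet = Subnet()
fail_over(subnet, LISTENER_IP, node_a)   # sql-a comes online as primary
first_owner = subnet.resolve(LISTENER_IP)

fail_over(subnet, LISTENER_IP, node_b)   # failover: sql-b becomes primary
second_owner = subnet.resolve(LISTENER_IP)

print(first_owner, "->", second_owner)
```

The point of the sketch is that clients never change the IP they connect to; only the IP-to-MAC mapping behind it changes, and it changes because of a broadcast.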
GARP (even ARP) is not supported on public clouds (Azure, GCP, and, I believe, AWS as well) for security reasons. In short, no kind of broadcast is supported in a cloud setup.
So, in a public cloud's network infrastructure, load balancers provide the traffic routing. In short, the load balancer is set up with a frontend IP that corresponds to the listener, and it is assigned a probe port that it periodically polls for status. Incoming traffic is forwarded to the VM that responds successfully to the probe on this port. At any one time, only one SQL VM (the primary) responds to this TCP probe. There is also configuration at the WSFC level: the same probe port is set on the cluster IP resource, which ensures that the primary node does respond to TCP probe requests on this port.
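The probe-based routing described above can be sketched the same way. Again this is only a toy model under stated assumptions: `SqlVm`, `LoadBalancer`, the frontend IP, and the probe port value are all hypothetical names and numbers chosen for illustration, and a real load balancer opens TCP connections rather than calling a method.

```python
from dataclasses import dataclass


@dataclass
class SqlVm:
    name: str
    owns_listener: bool = False  # True on the node where the cluster IP resource is online

    def answers_probe(self) -> bool:
        # Only the node that owns the listener IP resource answers on the probe port.
        return self.owns_listener


class LoadBalancer:
    def __init__(self, frontend_ip: str, probe_port: int, backend_pool: list):
        self.frontend_ip = frontend_ip    # corresponds to the listener IP
        self.probe_port = probe_port      # must match the port set on the cluster IP resource
        self.backend_pool = backend_pool
        self.healthy = []

    def run_probe(self) -> None:
        # Stand-in for the periodic TCP probe against every VM in the pool.
        self.healthy = [vm for vm in self.backend_pool if vm.answers_probe()]

    def route(self) -> str:
        # Forward traffic for the frontend IP to a VM that passed the last probe.
        if not self.healthy:
            raise RuntimeError("no backend answered the probe")
        return self.healthy[0].name


vm1 = SqlVm("sql-1", owns_listener=True)   # current primary
vm2 = SqlVm("sql-2")
lb = LoadBalancer("10.0.0.50", 59999, [vm1, vm2])  # example values only

lb.run_probe()
before_failover = lb.route()

# Failover: the cluster moves the listener IP resource to the other node.
vm1.owns_listener, vm2.owns_listener = False, True
lb.run_probe()
after_failover = lb.route()

print(before_failover, "->", after_failover)
```

Note that in this model the load balancer only learns about the failover on its next probe, which is why the probe has to keep polling.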
Does that help?
Yes. Thanks. So we'd continue to need an additional load balancer for Azure IaaS even in later versions of SQL Server or Windows Server, right?
Also, Azure private DNS just went 'in Preview'. Could that make a difference in the process of associating the new IP after failover?
@Ayanmullick - first let me apologize for not replying to this conversation sooner.
You will still need a load balancer for IaaS on Azure VMs (or any other cloud service that hosts VMs) in the future.
I don't think the Private DNS will change that because GARP is still not available on the network.
@mikerayMSFT Exactly what I have been looking for over the past two weeks, as I was struggling to set up an Always On AG on Google Cloud Platform. Could you please post the above info "In regular WSFC (Windows server failover cluster) on-prem setup, when AG listener is created, it will create a DNS record for AG listener with the IP(s) provided. ...." as a post on TechNet? There is not a single post explaining the difference between running an Always On AG on-prem vs. in the cloud and how the listener IP is broadcast. I wish you allowed search crawling on the comments as well :)