Azure – Design for virtual machines with high availability

In this article, we will see how to create highly available VMs in Azure. We will look at the concepts of fault domains and update domains, and at how we can use availability sets and VM scale sets to ensure high availability for applications running on Azure VMs.

Background

High availability is one of the most important aspects of any application. When our application is deployed on a VM in the cloud, high availability becomes even more important, because we are relying on VMs and hardware that are managed by our cloud vendor rather than by us. If we are designing an application where high availability is key and we plan to deploy it on Azure VMs, we need to know the various constructs and configurations Azure provides to achieve high availability.

When it comes to hardware reliability, one thing is certain: systems will fail, wires will break, and connections will drop. So what happens to our application if it is hosted on a VM where such a failure occurs? Azure acknowledges these facts and provides ample options for designing our VM infrastructure so that our application stays available even when these failures happen.

Design for Highly available VMs

Before getting into how we can ensure high availability in Azure, let's try to understand what types of failures can happen:

  • Planned maintenance: Azure updates the platform our VMs run on, for example to provide new features.
  • Unplanned maintenance: Azure detects (based on health checks) that the hardware or network hosting our VMs is about to fail and performs maintenance activities.
  • Unexpected downtime: A failure caused by unforeseen circumstances.

To guard against these kinds of failures with on-premises servers, we used to keep redundant servers on separate networks. In Azure, the choice of physical hardware for a VM is a black box to us, so we need some other way to keep redundant servers on separate hardware and networks.

Availability Sets

To achieve this redundancy, Azure has the concept of availability sets. When we put two or more VMs in an availability set, Azure distributes them across multiple pieces of hardware that are isolated from each other in terms of power and network. Azure ensures this redundancy using the concepts of fault domains and update domains. Let's try to understand them in detail.

Fault Domain: A fault domain is a group of machines in a data center (essentially a rack) that share a common power source and network switch. An electrical or network fault can therefore impact all the VMs in a fault domain at once.

Update Domain: An update domain is a group of machines that receive OS or Azure platform updates together and may be rebooted at the same time. It has nothing to do with racks; it is just a logical grouping. During planned maintenance, only one update domain is rebooted at a time.

When we put our VMs in an availability set, Azure places them in separate fault domains and update domains. As a result, a rack fault takes down only the VMs in one fault domain, and a platform update reboots only the VMs in one update domain, while the other VMs in the availability set keep running and our application remains available.

By default, when we create an availability set, our VMs are spread across 2 fault domains and 5 update domains.
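To make the effect of fault domains and update domains concrete, here is a small Python model of an availability set. It assumes a simple round-robin assignment of VMs to the default 2 fault domains and 5 update domains; Azure performs the real placement itself, and the helper names (`place_vms`, `survivors_after_fault`, `survivors_during_update`) are hypothetical, not part of any Azure SDK. The model checks that, with two or more VMs, neither a single rack fault nor a single update domain reboot takes all of them down.

```python
# Defaults quoted in this article: 2 fault domains, 5 update domains.
FAULT_DOMAINS = 2
UPDATE_DOMAINS = 5


def place_vms(vm_count, fault_domains=FAULT_DOMAINS, update_domains=UPDATE_DOMAINS):
    """Assign each VM a (fault domain, update domain) pair round-robin.

    Hypothetical, simplified model for illustration only; Azure does the
    real placement and does not expose a function like this.
    """
    return {
        f"vm{i}": (i % fault_domains, i % update_domains)
        for i in range(vm_count)
    }


def survivors_after_fault(placement, failed_fd):
    """VMs still running when every VM in one fault domain (rack) goes down."""
    return [vm for vm, (fd, _) in placement.items() if fd != failed_fd]


def survivors_during_update(placement, updating_ud):
    """VMs still running while one update domain is being rebooted."""
    return [vm for vm, (_, ud) in placement.items() if ud != updating_ud]


if __name__ == "__main__":
    placement = place_vms(4)
    for vm, (fd, ud) in placement.items():
        print(f"{vm}: fault domain {fd}, update domain {ud}")
    # With 2 or more VMs, no single rack fault or update-domain reboot stops them all.
    for fd in range(FAULT_DOMAINS):
        assert survivors_after_fault(placement, fd)
    for ud in range(UPDATE_DOMAINS):
        assert survivors_during_update(placement, ud)
```

With 4 VMs, this model puts vm0 and vm2 in fault domain 0 and vm1 and vm3 in fault domain 1, while each VM gets its own update domain, so losing either rack still leaves two VMs serving the application.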

VM Scale Sets

Choosing an availability set ensures that our VMs are highly available, but since we are running multiple VMs for the sake of high availability, we also have to manage each of these VMs independently. Is it possible to manage identical VMs as a group and still have high availability?

This is exactly where Azure VM scale sets come into the picture. Azure VM scale sets let us create and manage a group of identical, load-balanced VMs together. Scale sets provide high availability for our VMs and, at the same time, let us manage all the VMs as a group. The scale set can also be configured to scale automatically based on load, which further helps us achieve our goal of high availability.
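The idea of scaling automatically based on load can be sketched as a threshold rule evaluated against average CPU usage. This is a simplified Python model under stated assumptions: the thresholds, the minimum and maximum instance counts, and the `AutoscaleRule` and `desired_capacity` names are hypothetical and illustrative, not Azure defaults or an Azure API. Real autoscale settings also consider things such as the time window the metric is averaged over and cool-down periods, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleRule:
    """Hypothetical threshold-based rule; values are illustrative, not Azure defaults."""
    min_instances: int = 2        # keep at least 2 VMs for redundancy
    max_instances: int = 10       # upper bound on scale-out
    scale_out_cpu: float = 75.0   # average CPU % above which we add VMs
    scale_in_cpu: float = 25.0    # average CPU % below which we remove VMs
    step: int = 1                 # VMs added or removed per evaluation


def desired_capacity(current: int, avg_cpu: float, rule: AutoscaleRule) -> int:
    """Return the scale set's new instance count after one evaluation."""
    if avg_cpu > rule.scale_out_cpu:
        target = current + rule.step
    elif avg_cpu < rule.scale_in_cpu:
        target = current - rule.step
    else:
        target = current
    # Never drop below the minimum or exceed the maximum instance count.
    return max(rule.min_instances, min(rule.max_instances, target))


if __name__ == "__main__":
    rule = AutoscaleRule()
    count = 2
    for cpu in [80.0, 90.0, 50.0, 10.0, 10.0, 10.0]:
        count = desired_capacity(count, cpu, rule)
        print(f"avg CPU {cpu:5.1f}% -> {count} instances")
```

Starting from 2 instances and feeding average CPU readings of 80, 90, 50, 10, 10 and 10 percent, the count goes 3, 4, 4, 3, 2, 2. The minimum of 2 keeps the scale set from shrinking below the level needed for redundancy, which is how autoscaling and high availability work together.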

Here are a few of the benefits of using VM scale sets:

  • Easy management of multiple load-balanced VMs
  • High availability
  • Automatic scaling
  • Automatic load balancing
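Automatic load balancing means incoming traffic is spread across the healthy VMs in the scale set. Azure Load Balancer's default distribution mode is based on a hash of each flow's 5-tuple (source IP, source port, destination IP, destination port, protocol). The sketch below models that idea in Python; the SHA-256 hash and the `pick_instance` helper are assumptions for illustration, not Azure's actual hashing algorithm. It shows two properties: packets of the same flow always land on the same VM, and unhealthy VMs are skipped.

```python
import hashlib


def pick_instance(flow, instances, healthy):
    """Map a flow's 5-tuple to one healthy instance.

    Hypothetical model only: SHA-256 stands in for whatever hash Azure uses.
    """
    candidates = [vm for vm in instances if vm in healthy]
    if not candidates:
        raise RuntimeError("no healthy instances available")
    key = "|".join(str(part) for part in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


if __name__ == "__main__":
    instances = ["vm0", "vm1", "vm2"]
    # (source IP, source port, destination IP, destination port, protocol)
    flow = ("203.0.113.5", 51000, "10.0.0.4", 80, "TCP")
    print(pick_instance(flow, instances, set(instances)))
    # If vm0 and vm1 fail their health checks, the flow goes to vm2.
    print(pick_instance(flow, instances, {"vm2"}))
```

Hashing the whole 5-tuple keeps one connection pinned to one VM, while different client ports or addresses spread across the scale set.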

So whenever we need the capabilities listed above, we should choose VM scale sets rather than creating and managing multiple VMs manually.

Point of Interest

In this short article, we got acquainted with the concepts of availability sets and VM scale sets in Azure. We learned about fault domains and update domains, and how we can use availability sets to manage our VMs effectively and ensure high availability. I hope this has been somewhat informative.

History

  • 31 July 2018: First version