A company’s IT systems are critical to everyday business operations. There is an expectation that these systems are always working and there will never be any downtime.

Downtime is impossible to rule out entirely, but the IT team is responsible for minimising the risk of it happening. One way to manage this risk is with high availability (HA), which aims to give a line-of-business application or service the maximum potential uptime.

To do this, you need to eliminate single points of failure so that if one element, such as a server, goes down, the service is still available. Disruption to the availability of applications and services can affect business operations and ultimately result in additional expense or losses.

It’s worth remembering that high availability doesn’t remove the threat of downtime entirely, even with 99.999% service availability, but it does mean that you will have taken the necessary steps to ensure continuity. In this blog, we examine high availability in the cloud, why you need it and how it should be deployed.
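To put an availability percentage into perspective, it helps to convert it into the downtime it still allows. The short Python sketch below does that arithmetic. It assumes a 365-day year, and the function name is purely illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.2f} minutes of downtime a year")
```

Even at 99.999%, the target still leaves room for a little over five minutes of downtime a year.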

Why High Availability is Important For Your Business

High availability protects companies from lost revenue when access to their data resources and critical business applications is disrupted. What kind of outage is your business trying to protect itself against? Both planned outages, such as backup windows and maintenance, and unplanned outages should be considered when planning high availability in the cloud.

Planned outages are the primary reason to have high availability in the cloud. These outages take systems or data offline to allow maintenance tasks, such as deploying new hardware or upgrading software. As your business grows, so does the importance of uptime and, as a result, your maintenance windows have to shrink. How many hours can your systems be offline before they impact your business? The impact of planned outages can be minimised using a high availability solution.

Upgrades and fixes can be applied to the backup server while the primary server is running. The workload can then be switched to the backup server and fixes can be applied to the original primary server. After the upgrade has finished, production can be switched back to the original server.
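The switchover sequence described above can be sketched in a few lines of Python. This is a simplified model rather than a real deployment tool: the Server class, its fields and the rolling_upgrade function are all hypothetical names used for illustration, and a real upgrade would include health checks before each switch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    name: str
    version: str
    active: bool = False  # True while this server carries the production workload


def rolling_upgrade(primary: Server, backup: Server, new_version: str) -> list[str]:
    """Upgrade both servers while one of them is always serving; return a log of steps."""
    log = []

    # 1. Upgrade the idle backup while the primary keeps running.
    backup.version = new_version
    log.append(f"upgraded {backup.name} to {new_version}")

    # 2. Switch the workload to the upgraded backup.
    primary.active, backup.active = False, True
    log.append(f"switched workload to {backup.name}")

    # 3. Upgrade the original primary, which is now idle.
    primary.version = new_version
    log.append(f"upgraded {primary.name} to {new_version}")

    # 4. Switch production back to the original primary.
    primary.active, backup.active = True, False
    log.append(f"switched workload back to {primary.name}")

    return log
```

At every step exactly one server is marked active, which is the point of the exercise: the service itself never has to go offline.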

Unplanned outages bring risks that can be mitigated by deploying a high availability solution. While cloud platforms are unlikely to fail, other factors can cause disruptive unplanned outages. Unfortunately, human error cannot be engineered out of any system: procedures are not always followed, and poor communication causes misunderstandings.

The resulting outages to operating systems, middleware or databases can be embarrassing and costly. With high availability infrastructure, your business can continue to operate on a failover server while the problem is diagnosed and resolved.

How High Availability Works

There are three areas to consider in exploring high availability: redundancy, monitoring and failover.

Firstly, redundancy means you have multiple components, such as multiple servers, that can perform the same task, removing single points of failure. Because several components handle the same task, you need replication to ensure they all hold the same data at the same time.

For example, if you were running a website and the database server went offline, PHP would be unable to perform queries, and the site would not display. With a high availability solution, this problem is avoided because the databases are replicated across multiple servers, so when one server becomes unavailable, the data can still be read from the others. Monitoring and failover complete the picture: the redundant infrastructure needs to monitor the availability of the replicated components to know when to fail over from the primary to the secondary element.
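As a rough illustration of how monitoring and failover fit together, the Python sketch below promotes a standby node once the active node has failed several consecutive health checks. The FailoverMonitor class, the max_failures threshold and the is_healthy callback are assumptions made for this example. In practice the health check would be a real probe, such as a database ping, and the threshold would be tuned to avoid failing over on a brief network glitch.

```python
from typing import Callable


class FailoverMonitor:
    """Track one active node and one standby, failing over after repeated failed checks."""

    def __init__(
        self,
        primary: str,
        secondary: str,
        is_healthy: Callable[[str], bool],
        max_failures: int = 3,
    ) -> None:
        self.active = primary
        self.standby = secondary
        self.is_healthy = is_healthy      # hypothetical probe supplied by the caller
        self.max_failures = max_failures  # consecutive failures tolerated before failover
        self.failures = 0

    def check(self) -> str:
        """Run one health check and return the name of the node that is now active."""
        if self.is_healthy(self.active):
            self.failures = 0
        else:
            self.failures += 1
            # Only fail over if the standby is itself able to take the load.
            if self.failures >= self.max_failures and self.is_healthy(self.standby):
                self.active, self.standby = self.standby, self.active
                self.failures = 0
        return self.active
```

Calling check() on a schedule, for example every few seconds, gives a simple monitoring loop. Requiring several consecutive failures is a design choice that trades slightly slower failover for fewer false alarms.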

High availability clusters explained

High availability in the cloud is achieved by creating clusters. A high availability cluster is a group of servers that acts as a single server to provide continuous uptime. These servers have access to the same shared storage for data, so if a server is unavailable, the other servers pick up the load. A high availability cluster can be anything from two to dozens of servers. As well as providing failover, high availability clusters allow load balancing of workloads, so that no one server within the cluster gets overloaded and you can provide more consistent performance.
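One simple way to spread load across a cluster is round-robin distribution that skips any server currently marked as unavailable. The sketch below is a minimal Python model of that idea, not a description of how any particular cloud load balancer works. The class and method names are hypothetical.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers: Iterable[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)  # endless iterator over the cluster members

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self) -> str:
        """Return the next healthy server, or raise if the whole cluster is down."""
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # One full pass over the ring is enough to find a healthy member.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

With three servers, requests rotate through all of them. If one is marked down, the remaining two share the load until it is marked up again.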

Determining your high availability requirements

To assess your high availability requirements, you need to define what length of downtime is acceptable to your business. Most businesses want downtime, whether planned or unplanned, to be as close to zero as possible.

Not all applications need to be available at all times, which may mean a longer outage window is acceptable, but for others, downtime lasting more than a few seconds can be disastrous.

You need to carry out a Business Impact Analysis that considers the ramifications of downtime and calculates the cost of this downtime to your company.

Once you understand the impact of downtime, you can determine what level of high availability you need within your infrastructure by defining two things. Firstly, your Recovery Time Objective (RTO), which is the maximum time it should take to get your systems up and running again. Secondly, your Recovery Point Objective (RPO), which is the point in time to which data must be recoverable, in other words how much recent data you can afford to lose.
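Once you have an RTO and RPO, you can sanity-check a proposed setup against them. The Python sketch below is one hedged way to do that. It assumes the worst-case data loss equals the interval between backups or replication points, and that you have an estimate of how long a restore takes. The names and figures are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable age of the data you recover to


def meets_objectives(
    objectives: RecoveryObjectives,
    backup_interval: timedelta,
    estimated_restore_time: timedelta,
) -> dict[str, bool]:
    """Compare a backup schedule and restore estimate against the RTO and RPO."""
    return {
        # Worst case, a failure happens just before the next backup is taken.
        "rpo_met": backup_interval <= objectives.rpo,
        "rto_met": estimated_restore_time <= objectives.rto,
    }


if __name__ == "__main__":
    # Hypothetical targets: back within 4 hours, losing at most 1 hour of data.
    targets = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
    print(meets_objectives(targets, timedelta(hours=24), timedelta(hours=2)))
```

In this made-up case a nightly backup restores quickly enough to meet the RTO but misses the RPO, which would point towards more frequent backups or replication.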

For example, a fairly static system could be restored from a day-old backup, but if you were handling a high volume of transactions, anything more than a slight blip in uptime may be unacceptable.

While we’ve loosely outlined how to go about determining your requirements, there are lots of considerations for building robust, highly available solutions. Our team can help you define this, or even just talk through your options. Organisations often begin this process feeling they need the shortest RTOs and the most recent data backups. However, once you get into the detail, it often becomes clear that this simply isn’t needed.

Posted in Cloud Development, Managed Cloud on Apr 22, 2020