Deploy Fault Tolerant, Load Balanced Web Apps on Alibaba Cloud

Original Source:

This article was originally published on Alibaba Cloud. Thank you for supporting the partners who make SitePoint possible.

High Availability (HA), Fault Tolerance (FT), and Horizontal Scale Friendly (HSF) are equally important to functionality for web applications to run and succeed today. Existing or new web applications should be designed and provisioned with such underlying architecture. Fortunately, you can easily and promptly deploy the aforementioned architecture in the Cloud era today (compared to the on-premises bare-metal machine era)!

However, this flexibility comes with a caveat – how do you choose the right cloud provider? We are spoiled for choice and it can be really challenging (and hectic!) when evaluating and choosing the right one.

This post is intended to discuss and provide a walkthrough on deploying web applications on Alibaba Cloud from the ground up, including HA, FT, and HSF. Throughout this post, I will briefly introduce several services and tools provided in Alibaba Cloud. Yes, briefly! If you wish to learn more about particular services or tools, please visit the Documentation Center. In addition, this post will highlight the concerns and considerations when deploying such services.

WordPress is used as the demo web application that would be deployed on Alibaba Cloud in this post. The same deploying principle shall apply to many other web applications. This post is not intended to discuss on WordPress configuration at all. It shall not (and not able to) serves as reference for WordPress configuration. There are tons and tons of good resources out there regarding best practices on WordPress administrative.

1. High-level Architecture

Like many other web applications, the demo web application consists of an application layer (WordPress) and a database layer (MySQL).

Goal: Ultimately, we want an always-on web application (WordPress)!

In order to achieve such a “simple” goal, the demo web application must be deployed with the following minimum requirements:

A single main site.
A minimum of two physically separate WordPress instances on each site for redundancy and load balancing purposes.
Auto-spawning the other WordPress instance when the existing instance stops or experiences a failure.
The database instance (MySQL) must also be running in redundancy mode. It should automatically failover to the active standby instance when necessary.
Centralized dataspace. Shared resources must be accessible and available to all running WordPress instances. For example, a document uploaded by a user via WordPress should be synced across all running WordPress instances.

Fortunately, Alibaba Cloud provides a list of services and tools for us to fulfil these requirements. In this post specifically, we’ll utilize Cloud DNS (DNS), Auto Scaling Group (ASG), Server Load Balancer (SLB), Elastic Compute Service (ECS), Relational Database System (RDS), Object Storage Service (OSS), and Object Storage File System (OSSFS) tools to achieve our goal. The high-level architecture diagram for the deployed WordPress would be as following:

2. Deployment Procedures

We’ll briefly introduce the components shown in Figure 1.0 before diving into each individual configuration. As stated earlier, you would have to refer to other sources such as Alibaba Cloud online documentation for detailed explanation. The following table summarizes the description and usage of such components according to our deployment context:

Table 1: Cloud Components in Demo Deployments

Site / Region
Geographical area of the data center
1. Site for deployments

Physically isolated data center within a region
2. Used for redundancy purpose for Database

Cloud DNS
Domain name resolution and management service
3. Purchase new Domain Name4. Route traffic to WordPress instance

VPC (Virtual Private Cloud)
Virtual isolated network built for private usage
5. To group and separate resources6. To setup security control7. Assign network IP range

Virtual routing table
8. To configure network route for provisioned resources

Segment virtual networks into subnets
9. To separate resources into group within specify Zone via subnet

Server Load Balancer
Distribute traffic to instances according to configured profile
10. To load balance (round robin) request among provisioned WordPress instances

Auto Scaling Group
Automatically adjust computing resources based on scaling configuration
11. Serves as watchdog to maintain the defined healthy running WordPress instances

Elastics Computing Service (WordPress instance)
Compute and process unit provided by Alibaba Cloud
12. To install and run WordPress. This is the application layer of demo deployment

Relational Database Service (MySQL)
On-demand managed database service
13. The DB for WordPress application

Object Storage Service
High availability and fault tolerance object storage
14. Centralized storage for files/objects uploaded by user via WordPress application

The workflow below describes the general steps involved in deploying a web application on Alibaba Cloud.

2.1. Identify Service Region

It’s important to decide on the region where an application should be deployed. The general considerations shall include the following:

Cost: The mother of all considerations. Yes, the cost may vary according region.
Service availability in the region? It’s not uncommon that some regions provide additional services that aren’t available in another region — you have to test to find out!
Main target users’ geographical location. It’s definitely better for user experience if the application is physically closer to the customer, resulting in shorter latency.
Rules & Regulations. Is it legally OK for the application to be hosted in the selected region?
Number of Availability Zonez. Occasionally, we need to improve application availability by deploying redundant applications in a different zone. Since I’m based in Southeast Asia, I will be looking at the Singapore and Kuala Lumpur data centers. At the time of writing, “Asia Pacific SE 3 (Kuala Lumpur)” has only a single zone while “Asia Pacific SE 1 (Singapore)” has dual zones.

After consideration, we’ve decided “Asia Pacific SE 1 (Singapore)” will be the main region for our demo deployment.`

2.2. Plan for Network Configuration

We have to consider the number of nodes that might potentially be running in the deployment. Each running node is subject to one private IP, and we don’t want to end up running out of private IPs for nodes in the future!

There are three type of CIDR blocks allowed by Alibaba Cloud for a VPC:,, According to Alibaba Cloud documentation, the first & last three IPs of CIDR block would be reserved by system usage, and hence the maximum number of private IPs for each CIDR block are: = 16777212 (16777216 – 4) 1048572 (1048576 – 4) = 65532 (65536 – 4)

You may also wonder, why don’t we just use the biggest CIDR block allowed to avoid potentially running out of private IP in future? The following might help you to reconsider that thought:

Bigger CIDR block may increase the complexity when dealing with IP-related configuration, such as subnet creation, route configuration, security group configuration, and etc.
If the above is not a valid show-stopper for you, then consider this: “VPC peering (interconnect)” with other VPCs doesn’t allow overlapping CIDR block. In other words, it’s not possible to peer with other VPC once you using as CIDR block!

After consideration, we’ll use “” for our demo deployment as there will only be a few running nodes within the VPC.

II. Subnet

In Alibaba Cloud, VSwitch could be used to further segment the VPC CIDR block into a subnet with a smaller CIDR block. The general consideration for segmenting subnets includes the following:

Logical grouping of instances according to functionality. E.g. grouping the application in one group and RDS in another group for easier maintainability. For example, disabling a group of instances by deleting VSwitch attached to the group.
Simplify security group profile configuration. Security rules based on the subnet CIDR block level rather than the individual instance’s IP are cleaner.
Enable Auto-scaling and Server Load Balancer monitoring and actions on a specific subnet.
Redundancy on resources. It’s possible to seamlessly failover to a different subnet that’s based in a different zone when the existing subnet’s zone encounters failure.

After consideration, we’re grouping WordPress in one subnet ( and the RDS instance in another subnet (

2.3. Configure Firewall (Security Group)

Network access at the instance level could be limited via Security Group in Alibaba Cloud. The Security Group Rule configuration could be very granular, up to the per-protocol, per-port, per-client IP level. Hence, to avoid unauthorized access to the instance, we need to consider the following:

Always comply with least privilege practice. Restrict access to the required client only.
Intranet or/and internet connectivity. You can use Security Group to create a “private subnet” (no internet usage) by only allowing access for inbound intranet. In addition, a NAT gateway could be used to allow the instance in the private network to access outbound internet services.

Since we are running WordPress on Linux instances, we would at least allow an inbound rule for Port 80 (HTTP) and 22 (SSH) in Security Group. Besides that, all outbound traffic would be allowed since there’s no specific requirement on that.

2.4. Configure the Application Layer

This could be the trickiest and most uncertain decision we have to make when deploying web applications. As stated earlier, this post will not discuss an application’s capacity requirements and hence, choosing a proper instance type is out of scope of this post. Anyhow, the following considerations may assist in deciding on an instance type generally:

Always start with the Pay-As-You-Go model if you have no idea on the instance type performance nor the actual capacity requirement. This pricing model allows you to experiment with different instance types freely without a lock-in period.
You have to understand the nature of the to-be deployed application’s constraint. Is the application CPU-bound or IO-bound? You have to answer that in order to determine a proper instance type with the best cost efficiency.
Deploy with one step down instance whenever possible. If an application’s capacity requirement could be satisfied with a ‘X’ instance of a instance family type Y, it might be better if we deploy the application with two one step down instances (e.g. X/2) from the same family type for the same amount of workload. This will increase the availability of the application. For example, we can still process 50% of the workload if any the X/2 instance goes down compared with 100% downtime if the X instance is down. Of course, this approach is subject to the design and usage of the application.
Decide on other usage parameters e.g. network type, network bandwidth, operating system image, and etc. accordingly.

Since this is a demo deployment without any real production usage, we’ll go for the lowest (cheapest) ECS instance configuration. For example: General Type n1: 1-core, 1GB, Ubuntu 16.04 OS, Ultra Cloud Disk 40GB, and 1Mbps network bandwidth.

2.5. Configure the Database Layer

Generally, we have to decide between using self-managed DB instances (self-install DB at ECS instance) like what we usually do for on-premises solutions, or using fully managed RDS DB services like ApsaraDB. Again, it’s out of this post’s scope in comparing or benchmarking the two variants of database services. These guidelines may assist in choosing database variants generally:

Do you have available resources for managing and operating database instances? The management and operational tasks may include backing up data files, OS/DB patching, access control on the host machine, etc. If the answer is no, then maybe a fully managed RDS DB is preferable.
Do you need a dedicated database instance? If your database is small and the workload is minimal and able to co-exist with the application (e.g. in the development environment), perhaps the self-managed variant is preferable due to cost efficiency.
Do you need access to the underlying host for the database instance? For example, if you need to perform specific OS/DB configuration for performance-tuning purposes, then the self-managed variant shall be employed.
Does the fully-managed database service provide the DB type that you required? If no, then the answer is straightforward, go for a self-managed DB variant.
If you are concerned about possible cloud vendor lock-in, then you might want to avoid the fully-managed variant as some RDS implementations could be cloud vendor specific.

Since there is neither manpower to maintain the demo database nor any specify DB configuration, we’ll deploy the demo DB with ApsaraDB RDS – MySQL. In addition, this variant allows us to make a redundancy (active standby) database easily (with just a click!).

2.6. Identify Centralized Storage

Eventually, there could be multiple concurrent WordPress applications running on physically separate ECS instances. Each instance might generate and store certain files/image/media resulting from users’ operations. Obviously, objects that are generated by any instance would have to be synchronized across all other running application instances. One of the approaches to achieve this mentioned synchronization is through centralized storage. Objects generated shall be synchronzied to centralized storage and followed by synchronization between centralized objects and other running instances. Additionally, the centralized storage must always be available and any failure of any instance shouldn’t impact the availability and durability of centralized storage.

Alibaba Cloud provides a couple of fully managed services which could serve as centralized storage:

Object Storage Service for objects: It’s ideal as centralized object storage due to the guaranteed high availability (99.9%), scalability, and fully-managed nature. Specifically to this demo deployment, each running WordPress instance shall sync with a dedicated common Object Storage Service’s bucket. By employing such a syncing mechanism, all the running WordPress instances would have an identical set of created objects.
ApsaraDB Redis for application state: Sharing state (e.g. shared value, parameter) among running instances is possible via fully-managed ApsaraDB Redis.

A dedicated bucket in Object Storage Service would be created and used to store objects created as a result of user operations. All running WordPress instances shall sync with the relevant bucket for the list of created objects.

2.7. Plan for HA, FT, and HSF

To achieve HA, FT, and HSF in Alibaba Cloud, a web application shall be fundamentally designed as stateless and horizontally scalable. Any dependent application’s state or data shall be decoupled from the web application and migrated to centralized storage as discussed in the earlier section.

Services listed below could be employed for deploying a HA, FT, and HSF web application:

Cloud DNS: It’s possible to configure ‘A’ record types for instances hosted in different regions. It’s really useful during failover scenarios whereby an ‘A’ record of a standby instance could be enabled with one click, resulting in network traffic diversion to the standby instance.
Auto Scaling: It can be used to auto-spawn instances in a desired Zone when running instances go down or become unhealthy.
Server Load Balancer: This service would provide a health check on configured instances and report their status to the Auto Scaling service for further action. Besides that, this service would also load balance workload among running instances.
ApsaraDB RDS: RDS MySQL provides the multi-zone availability feature with just a click. It will really ease the effort required to provide HA and FT for the database.

The demo deployment will utilize DNS to route traffic to WordPress instances, Auto Scaling to ensure a minimum of two running instances in each region, and Server Load Balancer to provide a health check as well as to load balance workload. Last but not least, the Multi-Zone availability feature on RDS MySQL is enabled to provide HA and FT for the database.

2.8. Testing and Run

To test the HA and FT behavior, we may stop a running ECS manually and observe the action triggers by the auto-scaling service. If the auto-scaling has been configured properly, a new instance would be spawned automatically. Besides that, we may also manually turn off the RDS DB instance to observe the Multi-Zone redundancy failover happening. The best thing is that these actions are automatically handled by the respective services without any manual intervention. Shown below is our deployed WordPress:

Continue reading %Deploy Fault Tolerant, Load Balanced Web Apps on Alibaba Cloud%

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *