Before you get started, you might like to check out this short video on High Availability.
Takeover is the Engine Yard failover process for recovering from failure of an application master instance.
Takeover requires that you have at least one application slave in your environment.
The environment's application takeover preference determines how a takeover occurs. For example, you can disable automated takeovers; in that case, you must initiate app takeovers manually.
If your application master has failed and takeover is happening, you:
Receive automated email notification from Engine Yard.
See a takeover message for the environment on your dashboard.
Takeover occurs when Engine Yard detects that your application master is unable to reliably respond to requests. For example, this can happen because of an Amazon EC2 issue or because the instance froze. If the instance does not recover within a short time, Engine Yard does the following:
Promotes an app slave to app master. If your environment does NOT have elastic IP (EIP) addresses on your app slaves, Engine Yard assigns the old master’s IP address to the new master. If you have EIP addresses on your app slaves, see EIP Addressing.
Replaces the app slave instance that was promoted. (The new application slave uses the same version of the stack as the other instances in that environment.)
Deletes the old app master (unless you have configured the environment to detach and create a utility instance instead; see app master takeover preference for more information).
If you have EIP addresses on your app slaves and an app slave is promoted to an app master, the new app master will retain the IP address of the promoted app slave. Below is an example of this EIP assignment.
Important: If your domain points to the old app master's EIP address, your application becomes unreachable after the takeover, because that EIP no longer points to any instance.
Before takeover:
App master IP address: 126.96.36.199
App slave IP address: 188.8.131.52
After takeover:
App master IP address: 188.8.131.52
The Engine Yard dashboard shows 126.96.36.199 as available and unused.
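One way to catch the situation described in the Important note is to compare the address your domain resolves to with the address of the current app master. The following is a minimal sketch, not an Engine Yard tool: the function names resolve_ipv4 and domain_points_at are hypothetical, and it assumes a Linux host where getent is available.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Engine Yard's tooling.

# resolve_ipv4 NAME: print the first IPv4 address the system resolver returns for NAME.
# Assumes a Linux host with getent available.
resolve_ipv4() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# domain_points_at DOMAIN EXPECTED_IP: succeed only if DOMAIN resolves to EXPECTED_IP.
domain_points_at() {
    resolved="$(resolve_ipv4 "$1")"
    if [ "$resolved" = "$2" ]; then
        echo "OK: $1 resolves to $2"
    else
        echo "MISMATCH: $1 resolves to ${resolved:-nothing}, expected $2"
        return 1
    fi
}
```

For example, after a takeover you might run domain_points_at with your own domain and the new app master's IP address as shown on the dashboard. A mismatch suggests that the DNS record still points to the old, now unused EIP.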
Action required to apply cron jobs and custom Chef recipes
The new application master is the same as the old one with two important exceptions:
Cron jobs are not set up.
Custom Chef recipes are not applied.
To apply cron jobs and custom Chef recipes to the new application master
In your dashboard, click the environment name.
Click Apply. This applies/re-applies configuration, including cron jobs and custom Chef recipes as appropriate, to all instances in the environment.
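If you want to confirm from the command line whether cron jobs are present on the new application master before or after clicking Apply, a rough check is to count the active entries in a crontab. This is a sketch under assumptions: count_cron_entries is a hypothetical helper, and which user's crontab holds your jobs depends on your setup.

```shell
#!/bin/sh
# Hypothetical helper; not part of Engine Yard's tooling.

# count_cron_entries: read crontab text on stdin and print the number of
# lines that are neither blank nor comments. Variable assignments such as
# MAILTO=... are counted too, so treat the result as a rough indicator.
count_cron_entries() {
    awk '!/^[ \t]*(#|$)/ { n++ } END { print n + 0 }'
}
```

On the instance, you could run something like crontab -l 2>/dev/null | count_cron_entries as the user that owns the jobs. A count of 0 on a freshly promoted master is consistent with cron jobs not having been applied yet.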
Do the following to prepare your environment in case of application master failover.
To prepare your environment
Do keep a spare application server. Make sure that your cluster has one more application server than it needs. For example, if you need three application servers to serve everyday traffic, put four application servers in your cluster. Without a spare, your site might fail or slow down under load during the takeover.
Don’t keep important data on the application server alone. For example, if your application stores user-generated content, consider storing it in an online storage web service (such as Amazon S3).
In some situations you might want to manually initiate a takeover. For example, simulating a takeover lets you test your recovery processes.
To manually trigger a takeover
From the application slave, send several notifications to stonith; repeated notifications indicate a failed connection to the application master. The instance you run this on attempts to become the new application master.
i-abcd1234 ~ # for i in $(seq 1 6); do stonith notify; done
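The one-liner above can also be wrapped in a small function that reports progress as it goes. This is only a sketch: request_takeover and NOTIFY_CMD are hypothetical names introduced here, while the stonith notify command and the count of six come from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the loop shown above; run it on the app slave
# that should become the new application master.

# Command used for each notification. Overridable so the loop can be
# rehearsed with a harmless command (for example, NOTIFY_CMD="echo notify").
NOTIFY_CMD="${NOTIFY_CMD:-stonith notify}"

# request_takeover [COUNT]: run the notification command COUNT times (default 6).
request_takeover() {
    count="${1:-6}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "notification $i of $count"
        # Intentionally unquoted so the command and its argument split into words.
        $NOTIFY_CMD
        i=$((i + 1))
    done
}
```

Calling request_takeover with no arguments sends six notifications, matching the original loop. Setting NOTIFY_CMD to a harmless command first lets you rehearse the procedure without triggering a real takeover.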