asked this on March 11, 2012 07:18 PM
No, failover to a database slave is not currently an option on Engine Yard Cloud. If you do run into issues with your master database, we suggest you recreate your master database instance from a snapshot. This is quicker than failing over and does not pose the risk of any data loss should replication be out-of-sync.
Engine Yard takes fresh snapshots of your /db volume when you terminate an instance. If you have to rebuild your database instance, you can terminate it and recreate it from the most recent snapshots.
Why does EY recommend adding a database slave to production environments? See:
If you can't fail over to the slave, what's the benefit?
The primary benefit of database slaves is to be able to offload database reads and dump style backups (eybackup) to this host. Neither of these occurs automatically on adding a slave but the slave database does automatically get added to your database.yml and there is a cookbook in the ey-cloud-recipes repository for managing the backups.Expanding on this important issue a bit, while database failover is not currently a customer facing option it is available as manual option to EY Data Engineers. This is mainly because MySQL replication is both asynchronous and statement based which carries its own risks. In most cases the terminate/rebuild from snapshot option is the safest, and fastest method to bring your database back online. The primary area that is an exception to this is for pre-planned instance size or disk upgrades in environments with either very large disk volumes or a large number of server instances.
Tyler, can you expand on your comments a bit in regards to the difficulty/risk of cutting over auto-magically?
Specific to MySQL
The asynchronous part of the risk is that the replica may not be caught up with the master at any given time. To add to this, the value of `Seconds_Behind_Master` is actually only the difference between the timestamp of the statement currently being processed by the SQL_thread and the timestamp of the statement being read from the master by the IO_thread. The actual way to see if a replica is caught up is to compare the master_log_position and master_log_file (from the master) with the exec_master_log_pos and relay_master_log_file (from the slave); but when comparing these there isn't a "true" way to see how long the slave might take to catch up if there is a difference.
The statement based risk is related to the fact that a replica runs the exact same statement that the master instance runs when changing data. In addition to certain functions like UUID() it is possible to write non-deterministic SQL which could result in the replica changing data differently than the master does. For example `update widgets set valve='open' where valve='closed' and name like 'op%' limit 32`. The master and replica can each follow their own access path for this where clause and if there are more than 32 widgets that match the replica could update a completely different set of widgets. On builds 5.1 and higher we do enable mixed mode replication by default to help mitigate these issues somewhat but the risk is still present. The MySQL manual also goes into these issues in greater depth.
I'd like to have more informations about postgresql, specifically. The following pages seem to indicate we can failover from a master db to a slave db:
Failing over from a master DB to a slave DB is possible, for both PostgreSQL and MySQL. Our Support teams can assist you with this process or provide tools for your team. You can reach out to them through a Support ticket and they'll answer any additional questions about best practices.
Support Ticket System by Zendesk