asked this on March 11, 2012, 7:18 PM
See updated answer below.
Why does EY recommend adding a database slave to production environments? See:
If you can't fail over to the slave, what's the benefit?
The primary benefit of database slaves is to be able to offload database reads and dump style backups (eybackup) to this host. Neither of these occurs automatically on adding a slave but the slave database does automatically get added to your database.yml and there is a cookbook in the ey-cloud-recipes repository for managing the backups.Expanding on this important issue a bit, while database failover is not currently a customer facing option it is available as manual option to EY Data Engineers. This is mainly because MySQL replication is both asynchronous and statement based which carries its own risks. In most cases the terminate/rebuild from snapshot option is the safest, and fastest method to bring your database back online. The primary area that is an exception to this is for pre-planned instance size or disk upgrades in environments with either very large disk volumes or a large number of server instances.
Tyler, can you expand on your comments a bit in regards to the difficulty/risk of cutting over auto-magically?
Specific to MySQL
The asynchronous part of the risk is that the replica may not be caught up with the master at any given time. To add to this, the value of `Seconds_Behind_Master` is actually only the difference between the timestamp of the statement currently being processed by the SQL_thread and the timestamp of the statement being read from the master by the IO_thread. The actual way to see if a replica is caught up is to compare the master_log_position and master_log_file (from the master) with the exec_master_log_pos and relay_master_log_file (from the slave); but when comparing these there isn't a "true" way to see how long the slave might take to catch up if there is a difference.
The statement based risk is related to the fact that a replica runs the exact same statement that the master instance runs when changing data. In addition to certain functions like UUID() it is possible to write non-deterministic SQL which could result in the replica changing data differently than the master does. For example `update widgets set valve='open' where valve='closed' and name like 'op%' limit 32`. The master and replica can each follow their own access path for this where clause and if there are more than 32 widgets that match the replica could update a completely different set of widgets. On builds 5.1 and higher we do enable mixed mode replication by default to help mitigate these issues somewhat but the risk is still present. The MySQL manual also goes into these issues in greater depth.
I'd like to have more informations about postgresql, specifically. The following pages seem to indicate we can failover from a master db to a slave db:
Failing over from a master DB to a slave DB is possible, for both PostgreSQL and MySQL. Our Support teams can assist you with this process or provide tools for your team. You can reach out to them through a Support ticket and they'll answer any additional questions about best practices.
Failing over from a master to a slave DB is now available as a GA feature. Additional details on this feature and its use can be found here: https://support.cloud.engineyard.com/entries/90444706-Promote-Datab.... If you have any questions or need additional assistance please open a ticket with our support team.
Support Software by Zendesk