To be clear, we don’t host any of our clients’ sites. We recommend hosting providers we trust and respect, and our clients often go along with our recommendation. So when a site goes down we often aren’t technically responsible or liable for much of anything. That said, if I can provide some form of service that provides a solution for when a server goes down for an extended period of time, I want to do just that. It's never a matter of if a site goes down. It's always a matter of when.
I looked into a few options for automating this process. As with any technical endeavor, there are many paths to the same result. As such, I’d like to provide you, Dear Reader, with the key ingredients to a failover solution. How you automate the ingredients will be up to you. Toward the end here I’ll give some suggestions that should get you started in the right direction.
Before diving in, here are some assumptions I am making in this article:
- Your website or app is set up to be portable, so file paths that differ across environments won’t break anything.
- You are somewhat comfortable with the command line. (I’d consider myself a novice and I get around this stuff just fine.)
- You will try figuring this out and solving problems on your own—before posting comments here asking why something doesn’t work.
- You don’t mind clicking a few affiliate links along the way.
Okay, let’s dive in.
At the risk of being overly obvious, I have to list the first key ingredient as the Production Server and the Failover Server. After all, without the Production Server we have nothing to back up, and without the Failover Server we have nowhere to back up to. Your Production Environment is already decided, as you have a live site you want to set up a failover for. When it comes to the Failover Server, we currently use Digital Ocean. Digital Ocean is an awesome tool for quickly provisioning test or temporary environments. I’ve only used them in this context so far.
Second is ssh access to both environments. Most of this setup (for us) is done over ssh connections via the command line. If you aren’t comfortable with the command line, you might want to brush up on that before moving forward. This article won’t teach you anything about the command line.
The third ingredient is the commands or scripts we’ll be running. At their core they are very simple. The possible variations are many though. So understanding the commands will be helpful to you. When in doubt, learn more by Googling the command or reading the man pages.
With most websites that we build and support, there are two main things to back up. The first is the files and the second is the databases.
Our first step will be to create a simple export of the necessary databases. In most of our work we only have between 1 and 3 databases to back up. We use the following command to do that:
mysqldump -u[username] -p[password] --add-drop-table --add-drop-database --databases dbname [db2name ...] | gzip > [/path/to/project/fileName].gz
Let’s go one piece of this at a time just to make sure it makes sense.
mysqldump
This is simply the database backup program that ships with MySQL. More information is available in the MySQL documentation on mysqldump, where all of the flags we’re using are documented.
-u[username]
The -u flag is where you put the MySQL username that has access to the databases you need. Replace [username] with, well, the username.
-p[password]
This flag is exactly what you would expect, assuming you read what the -u flag does above. There must be no space between -p and the password.
--add-drop-table --add-drop-database
These flags do exactly what they say. They add statements that drop each table and database the dump encounters before recreating it. We do this because we want a carbon copy of the database, rather than lingering old data that may have been removed in Production.
As I write this I realize that using --add-drop-table is probably redundant considering we use --add-drop-database. I’ll let you decide though.
--databases dbname [db2name ...]
The final piece of the mysqldump command is to specify the databases we need. If you need one database you just put the single name. Any additional database names should be separated by a space.
| gzip > [/path/to/project/fileName].gz
The mysqldump command cranks out a text stream of SQL statements, most commonly saved as a .sql file. We pipe that output through gzip and write the compressed file into the root of our project files. Name this file whatever you like. We’ll use it later. I recommend you save it one level above your web root directory so it is not accessible through the browser or to unwanted users.
That’s the extent of our work on the Production server. Run this command and you should see your new compressed database export sitting next to the rest of your site’s files.
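If you plan to run the export unattended later, it may help to wrap it in a small function that fails loudly. The following is only a sketch under a few assumptions: `export_databases` is a made-up helper name, and `DB_USER` and `DB_PASS` are environment variables I'm assuming you set yourself. It writes the dump to a temporary file first, so a failed run never overwrites the last good export.

```shell
#!/bin/sh
# Sketch: export databases to a gzipped file and verify the result.
# export_databases is a hypothetical helper; DB_USER and DB_PASS are
# assumed environment variables holding the MySQL credentials.
#   $1      path of the compressed export to create (e.g. ending in .gz)
#   $2...   one or more database names
export_databases() {
    out_file=$1
    shift
    tmp_sql="${out_file}.tmp.sql"
    part_file="${out_file}.part.gz"

    # Dump to a temporary file first so a failed dump never replaces a good export.
    if ! mysqldump -u"$DB_USER" -p"$DB_PASS" \
        --add-drop-table --add-drop-database --databases "$@" > "$tmp_sql"; then
        echo "mysqldump failed" >&2
        rm -f "$tmp_sql"
        return 1
    fi

    # An empty dump usually means something went wrong upstream.
    if [ ! -s "$tmp_sql" ]; then
        echo "mysqldump produced no output" >&2
        rm -f "$tmp_sql"
        return 1
    fi

    # Compress, test the archive's integrity, then move it into place.
    if gzip -c "$tmp_sql" > "$part_file" && gzip -t "$part_file"; then
        mv "$part_file" "$out_file"
        rm -f "$tmp_sql"
        return 0
    fi
    echo "compression or integrity check failed" >&2
    rm -f "$tmp_sql" "$part_file"
    return 1
}
```

Calling it would look something like `export_databases /path/to/project/fileName.gz dbname db2name`, with the path and database names swapped for your own.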
Our next step is to get the files backed up to the Failover Server. I mentioned earlier that we use Digital Ocean for this at Focus Lab. I won’t go into setting up a box at Digital Ocean, so get to googling if you need help there. My buddies at ClearFire wrote a little something about this if you want to start there.
Once you have your Failover Server ready and accessible, log in over ssh and change directories over to your project’s root. Our next command is going to knock on the digital door of our Production Server and copy everything over.
rsync -rltpvh --ignore-existing [user]@[host]:[remotePath] [localPath]
Similar to our last section let’s break this down piece by piece.
rsync
This program is for file syncing and transferring. It’s a beautiful thing. It does the heavy lifting of the process once it’s automated. Google around about rsync if you aren’t familiar with it.
-rltpvh
This is the collection of flags I chose for this syncing script. It’s almost identical to using -a (“archive”) but there are a few differences. In the interest of this being a learning experience, I’ll let you dig around to see what these flags actually do.
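--ignore-existing
This flag skips any file that already exists on the Failover Server, which saves bandwidth and transfer time. Be aware that it also skips files that have changed on Production since the last sync, including a database export that reuses the same file name. rsync already skips unchanged files by default, so drop this flag if you need updates to carry over.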
[user]@[host]:[remotePath] [localPath]
This is pretty self-explanatory. Replace [user] with the username you’re using to access the Production Server and [host] with that server’s hostname or IP address. The [remotePath] is the absolute path to the directory you’re copying, while [localPath] is the destination path on the Failover Server.
Once you run this command you’ll be prompted for the password of the [user] for the [host]. Enter the password and your syncing begins. Watch and enjoy.
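Before letting rsync copy anything for real, it can be reassuring to preview what it would do. Here is a rough sketch that wraps the same command and optionally adds rsync's `--dry-run` flag. The name `sync_from_production` and the `preview` keyword are my own inventions for illustration.

```shell
#!/bin/sh
# Sketch: wrap the rsync call so it can be previewed before it copies anything.
# sync_from_production is a hypothetical helper name.
#   $1  source, in the form user@host:/remote/path/
#   $2  destination path on the Failover Server
#   $3  optional: pass "preview" to add rsync's --dry-run flag
sync_from_production() {
    if [ "${3:-}" = "preview" ]; then
        # --dry-run lists what would be transferred without writing any files.
        rsync -rltpvh --ignore-existing --dry-run "$1" "$2"
    else
        rsync -rltpvh --ignore-existing "$1" "$2"
    fi
}
```

One detail worth knowing: a trailing slash on the source path tells rsync to copy the contents of that directory, while leaving it off copies the directory itself into the destination.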
When the transfer completes you’ll have a carbon copy of your Production Server files on your Failover Server. This includes the database backup we created previously. Our final step is to import that data. We need to unzip the compressed file and dump the data into MySQL.
zcat [textfile] | mysql -u[dbuser] -p[password]
Again, let’s look at both pieces separately.
zcat [textfile]
This unzips the compressed file and writes out the SQL commands we need to run, so we pipe it into MySQL.
| mysql -u[dbuser] -p[password]
Similar to our export, we’re just defining connection details to MySQL. These are probably different credentials from your Production Server.
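A half-transferred export is worse than none, so you may want to test the file before feeding it to MySQL. This is a sketch, not a finished tool: `import_dump` is a hypothetical name, `DB_USER` and `DB_PASS` are assumed environment variables, and I've used `gzip -dc`, which writes the decompressed data to standard output just like zcat.

```shell
#!/bin/sh
# Sketch: refuse to import a compressed dump that fails gzip's integrity test.
# import_dump is a hypothetical helper; DB_USER and DB_PASS are assumed
# environment variables holding the Failover Server's MySQL credentials.
#   $1  path to the gzipped SQL export
import_dump() {
    dump_file=$1

    # gzip -t reads the whole archive and fails if it is missing or corrupt.
    if ! gzip -t "$dump_file" 2>/dev/null; then
        echo "Refusing to import: $dump_file is missing or corrupt" >&2
        return 1
    fi

    # Decompress to standard output and pipe the SQL into MySQL.
    gzip -dc "$dump_file" | mysql -u"$DB_USER" -p"$DB_PASS"
}
```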
Step by Step
Considering the main ingredients and the commands above, this is what the step-by-step process would look like:
- MySQL dump/backup on the Production Server
- Run rsync from the Failover Server to sync files from the Production Server
- Unzip and import the Production database(s)
So far, everything I’ve shared is assuming you’re manually running these commands. That’s not horrible, but certainly not ideal. If you have a site go down for an extended period of time, you want an automated recovery process when possible.
Automating this process goes deeper than I intended for this article. It varies by person, by team, and by environment. That said, I’d like to give some suggestions to help you move toward automating this process.
The first step in automating this process is to know when a site goes down. The simplest way to monitor this is to set up a service such as Pingdom to notify you when a site isn’t accessible in certain ways. Most of these service providers give you the ability to send a request to a specified URL endpoint which could trigger any part of the process you design.
Alternatively you can use a DNS based service that checks for your site to return specific content or status codes. Upon enough downtime, the service would automatically change the DNS to point to your Failover Server instead of your Production Server.
We’ve used DNS Made Easy for this in the past.
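To make the "enough downtime" idea concrete, here is a rough home-grown check using curl. To be clear, this is not how Pingdom or DNS Made Easy work internally. It is only a sketch of the concept, and `check_site`, the state file, and the threshold are all things I made up for the example. It counts consecutive failed requests and signals when the count reaches the threshold.

```shell
#!/bin/sh
# Sketch of a home-grown uptime check, meant to be run on a schedule.
# check_site is a hypothetical helper name.
#   $1  URL to request
#   $2  state file that stores the number of consecutive failures
#   $3  number of consecutive failures that should trigger a failover
# Returns 0 when the site is up, 1 when it is down but below the
# threshold, and 2 when the threshold has been reached.
check_site() {
    url=$1
    state_file=$2
    threshold=$3

    failures=0
    if [ -f "$state_file" ]; then
        failures=$(cat "$state_file")
    fi

    # -f treats HTTP error responses (400 and up) as failures,
    # --max-time gives up after 10 seconds.
    if curl -fsS -o /dev/null --max-time 10 "$url" 2>/dev/null; then
        echo 0 > "$state_file"
        return 0
    fi

    failures=$(( ${failures:-0} + 1 ))
    echo "$failures" > "$state_file"

    if [ "$failures" -ge "$threshold" ]; then
        echo "FAILOVER: $url failed $failures checks in a row"
        return 2
    fi
    return 1
}
```

What you do when it returns 2 is up to you: send a notification, kick off the import, or call whatever your DNS provider offers for switching records.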
Production Server Cron Job
The database backup should probably be automated to run at an interval that adequately copies data for your particular project. Most of the sites we operate are sufficiently backed up nightly.
If you don’t know how to set up a cron job, ask the mostly-trusty Google. I’m sure you’ll find some articles and/or StackOverflow threads that will get you started. If you can’t figure it out, try reaching out to your hosting provider’s support team for some assistance.
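As a starting point, a nightly crontab entry has five schedule fields followed by the command to run. The little helper below (a hypothetical `nightly_cron_entry`, with made-up script and log paths in the example) just prints such an entry so you can see its shape. It doesn't install anything.

```shell
#!/bin/sh
# Sketch: print a crontab entry that runs a script once a day.
# nightly_cron_entry is a hypothetical helper name.
#   $1  hour of the day to run at (0-23)
#   $2  absolute path to the script to run
#   $3  log file that collects the script's output and errors
nightly_cron_entry() {
    # Fields: minute hour day-of-month month day-of-week command
    printf '0 %s * * * %s >> %s 2>&1\n' "$1" "$2" "$3"
}

# Example with hypothetical paths:
#   nightly_cron_entry 2 /home/deploy/export-db.sh /home/deploy/export-db.log
# prints:
#   0 2 * * * /home/deploy/export-db.sh >> /home/deploy/export-db.log 2>&1
```

You would paste the printed line into the editor that `crontab -e` opens on the Production Server.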
You may recall that we had to manually type the password of the Production Server ssh user when using rsync. You can automate this step by setting up SSH keys so the two environments can comfortably talk to one another without saying the secret password each time.
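Here is a minimal sketch of that key setup, run from the Failover Server. The helper name `setup_sync_key` is hypothetical, and I'm assuming the standard OpenSSH tools `ssh-keygen` and `ssh-copy-id` are installed. Note the tradeoff: the key is created without a passphrase so a scheduled job can use it unattended, which means anyone who can read the key file can log in as that user.

```shell
#!/bin/sh
# Sketch: create a dedicated key on the Failover Server and install its
# public half on the Production Server.
# setup_sync_key is a hypothetical helper name.
#   $1  path for the private key file
#   $2  Production login in the form user@host
setup_sync_key() {
    key_file=$1
    remote=$2

    if [ ! -f "$key_file" ]; then
        # -N "" creates the key with no passphrase so cron can use it unattended.
        ssh-keygen -t ed25519 -N "" -f "$key_file" || return 1
    fi

    # Appends the public key to the remote user's authorized_keys file.
    # This asks for the remote password one last time.
    ssh-copy-id -i "$key_file.pub" "$remote"
}
```

After that, you can point rsync at the key with its `-e` option, for example `-e "ssh -i /path/to/key"`, where the path is whatever you passed as the first argument.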
Continuous Integration Servers
An alternative to running multiple servers would be to run your own Continuous Integration (CI) Server that runs checks and automates the creation of a Failover Server as needed. I’m not well versed in the CI options here, but we have used Jenkins CI at Focus Lab. Going the CI route would give you the ability to put your checks and subsequent actions in a single place.
For example, rather than relying on a nightly rsync process for all files, you could:
- Use rsync to transfer the database backup somewhere nightly
- Use something like Pingdom to monitor uptime
- Automatically provision a new server at Digital Ocean when a site goes down for too long
- Deploy a git repository to the new server
- Dump the latest database into it
- Manually switch the DNS once that’s all complete
This process is a different article in itself though. One I'm not well equipped to write.
Nothing is Failproof
Automation can be dangerous if you don’t implement the proper “checks” along the way. What if your database backup script never returns the result you expected? What if rsync is failing every time it’s run? If you automate this process you would be wise to go a bit further in the scripts to confirm the results are what you expected.
The last thing you want is a beautiful automated setup that you think works great, only to find out during a site outage that it was never running properly.
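One cheap check along those lines is to confirm that the latest backup file exists, isn't empty, and was written recently. The sketch below assumes a `find` that supports `-mmin` (GNU and BSD versions do, though it isn't part of POSIX), and `backup_is_fresh` is a name I made up.

```shell
#!/bin/sh
# Sketch: confirm a backup file exists, is non-empty, and was written recently.
# backup_is_fresh is a hypothetical helper name.
#   $1  path to the backup file
#   $2  maximum acceptable age in minutes
backup_is_fresh() {
    # -s is true only if the file exists and is larger than zero bytes.
    [ -s "$1" ] || return 1

    # find prints the path only if the file was modified less than $2 minutes ago.
    [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}
```

For a nightly export, something like `backup_is_fresh /path/to/export.gz 1500` (a little over a day) would catch a job that silently stopped running. How you alert yourself when it fails is your call.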
I hope this helps you get started in setting up your own failover solution. There are so many possibilities out there. I know where we landed. But how about you? What's your plan?