Website Backup to Windows Machine with Cygwin, rsync, and SSH

Overview

  1. Dump all databases to be backed up to file.
  2. Transfer database files and web files to local machine.
  3. Profit!

Databases

Dreamhost will let you log in to a shell account, which lets you dump your databases to files using mysqldump. Most webhosts will let you do this, but each is a bit different. I created a directory named db in my home directory on my webserver to hold the DB dumps. In that directory I created a shell script with one line per database to dump, each looking something like this:

mysqldump --opt -uusername -ppassword -h db.scandalon.com dbname | bzip2 > /home/username/db/techblog-`date +%Y-%m-%d`.sql.bz2

A few things:

  • The password is visible as plain text. This is not very secure, but only root users can get at it, and they would be able to get at my databases anyway. I found that I couldn’t create new MySQL users and change their privileges in my databases on Dreamhost for some reason, otherwise I would have created a read-only user with a simple (or no) password and used that to do the dump.
  • The SQL dump is piped into bzip2 to greatly reduce its size. It’s like gzip, but it usually squeezes text smaller, at the cost of some speed.
  • The filename has a weird-looking string at the end. That’s just the date command in backticks, which the shell replaces with today’s date. The result looks like this: techblog-2009-10-01.sql.bz2
  • If you’re having trouble executing your script, make sure it is executable (chmod 700).
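With several databases, the one-line-per-database script can be folded into a loop. This is only a sketch under my own assumptions: the script name dump-dbs.sh, the dump_path and dump_db helpers, and the DUMP_DIR variable are all made up for illustration, and each dump file is named after its database. The credentials and host are the same placeholders as in the line above.

```shell
#!/bin/sh
# dump-dbs.sh (hypothetical name): dump every database named on the command line.
# Assumption: all databases share the placeholder credentials and host used above.

DUMP_DIR="${DUMP_DIR:-$HOME/db}"   # where the dumps land

# Build the dated output path for one database,
# e.g. /home/username/db/techblog-2009-10-01.sql.bz2
dump_path() {
  printf '%s/%s-%s.sql.bz2\n' "$DUMP_DIR" "$1" "$(date +%Y-%m-%d)"
}

# Dump one database and compress it with bzip2 on the way to disk.
dump_db() {
  mysqldump --opt -uusername -ppassword -h db.scandalon.com "$1" | bzip2 > "$(dump_path "$1")"
}

# Loop over the database names given as arguments (does nothing without any).
for db in "$@"; do
  dump_db "$db"
done
```

It would be run as something like ./dump-dbs.sh techblog otherdb, and that is the command the cron job would call.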

Next you want to schedule a cron job to run the DB dump script every day. Dreamhost has an easy cron interface under the “Goodies” menu. It’s an improvement over the asterisks and crontab and crap.

Transfer

In order to transfer the DB dumps and web directories, I used rsync and OpenSSH. These are two ubiquitous Linux programs that for some reason haven’t been effectively ported to Windows. Most people run them under Cygwin, a pseudo-Linux environment that runs on top of Windows. SSH is known to be secure, and rsync is the standard for automated backups for a number of reasons, the most important to me being reliability. If your connection drops in the middle of a 2GB file, rsync can resume where it left off the next time it is run (given the P switch described below).

Cygwin

Download the Cygwin installer from here. When you run it, it will ask you where to install it. For some reason, Windows seems to get confused sometimes if you install it in Program Files, so just install it to C:\Cygwin. It will list a thousand “packages” (programs) that you may or may not wish to install. Click the “View” button to list them all in alphabetical order. Find these packages and click on the “Skip” next to them (thereby selecting them for installation): openssh, rsync. It may auto-select some other packages. Don’t sweat it; they’re necessary.

When launched, Cygwin starts you at a command prompt in “/home/Tremelune” (with your own username in place of mine). The root directory (/) translates to C:\Cygwin, so that home directory should live at C:\Cygwin\home\Tremelune on your Windows drive. To access your C: drive or X: drive, Cygwin has a mount point at /cygdrive/c or /cygdrive/x or whatever the drive letter is. Network shares are available using forward slashes in place of backslashes, i.e., \\Harrier\Backup mounts to //Harrier/Backup in Cygwin.
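The drive-letter and network-share mapping can be written down as a small shell function. Cygwin ships its own cygpath utility that does this conversion properly. The function below is only an illustration of the rule, and the name win_to_cyg is made up.

```shell
# win_to_cyg (hypothetical helper): translate a Windows path to the form Cygwin uses.
#   C:\backups        becomes  /cygdrive/c/backups
#   \\Harrier\Backup  becomes  //Harrier/Backup
win_to_cyg() {
  # Turn every backslash into a forward slash.
  p=$(printf '%s\n' "$1" | tr '\\' '/')
  case "$p" in
    [A-Za-z]:*)
      # Drive letter: lowercase it and hang the rest off /cygdrive/<letter>.
      drive=$(printf '%s' "${p%%:*}" | tr 'A-Z' 'a-z')
      printf '/cygdrive/%s%s\n' "$drive" "${p#*:}"
      ;;
    *)
      # Network shares and anything else only need the slashes flipped.
      printf '%s\n' "$p"
      ;;
  esac
}
```

Single quotes around the Windows path keep the shell from eating the backslashes, as in win_to_cyg 'C:\backups'.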

rsync

To get rsync configured (as well as SSH keys), I used this guide and this Dreamhost guide. Here’s a summation…In Cygwin, run a command similar to this:

rsync -avzP username@scandalon.com:/home/username/db /cygdrive/c/backups

That’ll sync the db directory on the remote server with the db directory locally (in the backups dir). If it’s not there, it will be created. If it is there, new and changed files are copied into it and nothing already there is deleted. Here’s what the switches do (there are many more you can see by running rsync --help, but these are probably the ones you want):

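  • a – Archive mode. Recurses into directories and preserves timestamps, permissions, and symlinks. (Sending only the changed parts of a file, rather than the entire file, is something rsync does by default.)
  • v – Verbose. Prints out what it’s doing.
  • z – Compresses file data on-the-fly for transfer. Usually speeds things up on a slow connection.
  • P – Keeps partially transferred files if the transfer is interrupted, so the next run can pick up from them, and shows progress of the download.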
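The same switches can be spelled out with rsync’s long option names, which is easier to read back in a script months later. A minimal sketch: the pull function and the RSYNC override variable are my own invention, there so that the command can be printed instead of run.

```shell
# pull (hypothetical wrapper): rsync with the long spellings of -a -v -z -P.
# Set RSYNC=echo to print the arguments instead of transferring anything.
pull() {
  # -a is --archive, -v is --verbose, -z is --compress,
  # and -P is shorthand for --partial --progress.
  "${RSYNC:-rsync}" --archive --verbose --compress --partial --progress "$@"
}
```

It takes the same source and destination as before: pull username@scandalon.com:/home/username/db /cygdrive/c/backups. Adding --dry-run as the first argument makes rsync list what it would transfer without copying anything, which is a cheap way to check the paths.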

SSH Keys

Since we intend to automate this backup, we don’t want to have to type our SSH password in every time this runs. In order to accomplish this and maintain security, we’re going to generate public and private SSH keys. The idea is that the remote server has the public key and the local machine has the private key, and instead of asking for a password, SSH just matches them and calls it friendly.

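To create the keys, run something like this in Cygwin (don’t enter a passphrase when it asks you for one, or the automated run will stop and wait for it). Newer OpenSSH releases reject DSA keys by default; there, swap in -t rsa -b 4096: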

ssh-keygen -t dsa -b 1024 -f /cygdrive/c/dreamhost-key

This will create two key files in the root of your C: drive: dreamhost-key (the private key) and dreamhost-key.pub (the public key). You can name them whatever you want. The two files simply contain scrambled text (the key).

The SSH server on the remote site (i.e., Dreamhost) looks for public keys in a special text file in a hidden directory in your home directory. Probably something like this:

/home/username/.ssh/authorized_keys

You want to add the text in dreamhost-key.pub to authorized_keys on the server. If you don’t have an authorized_keys file in your ~/.ssh directory, you can simply upload your public key file, rename it to authorized_keys, and be done. If you have no .ssh directory in your ~ directory, just create one. ~ means home, i.e., /home/username.
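On the server side, those steps come down to a handful of commands. Here is a rough sketch, assuming the .pub file has already been uploaded to the server. The function name install_pubkey and its optional second argument are hypothetical. The chmod lines are there because SSH servers are commonly configured to ignore key files that other users could write to.

```shell
# install_pubkey (hypothetical helper): append a public key to authorized_keys.
#   $1 = path to the uploaded .pub file
#   $2 = home directory to install into (defaults to $HOME)
install_pubkey() {
  pubkey=$1
  home_dir=${2:-$HOME}

  # Create the hidden .ssh directory if it is missing; keep it private.
  mkdir -p "$home_dir/.ssh"
  chmod 700 "$home_dir/.ssh"

  # Append rather than overwrite, so any keys already there keep working.
  cat "$pubkey" >> "$home_dir/.ssh/authorized_keys"
  chmod 600 "$home_dir/.ssh/authorized_keys"
}
```

Run on the server, it would look like install_pubkey ~/dreamhost-key.pub.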

Now we must modify the above rsync command slightly to utilize the private key we just created. Looks like this now:

rsync -avzP -e "ssh -i /cygdrive/c/dreamhost-key" username@scandalon.com:/home/username/db /cygdrive/c/backups

You can put the key anywhere, just make sure you’re pointing to it correctly in the command above.

Automation

At this point, we’re doing everything we need to do to back this stuff up, but we’re doing it manually. The easiest way to automate the transfer is to add the rsync command(s) to a bash shell script, run the bash shell script in Cygwin from a Windows batch file, and then create a Windows task to run the batch file. Can you believe no one’s written a program to do all this automagically yet?

  1. Create a shell script (backup-sites.sh) that rsyncs your db and web directories down to download all the files you want to back up, i.e., just add the rsync command from above to the .sh file. It can connect to any number of remote sites/directories. Just add a new line for each.
    Note: Cygwin wants this to be a Unix file with Unix line endings (LF only, not the CR+LF that Windows editors write). If you see Cygwin freaking out about a missing “\#015” directory, then your text file is PC, not Unix. You can fix it by doing a Save As… with TextPad.
  2. Create a batch file (backup-sites.bat) to launch Cygwin and run your shell script. It should look like this (I modified the standard Cygwin batch file):
    @echo off
    C:
    chdir C:\Cygwin\bin
    bash --login -i /cygdrive/c/backup-sites.sh
  3. Create a Windows task: Start > Control Panel > Scheduled Tasks > Add Task. Should be pretty straightforward. Just select the batch file (backup-sites.bat, perhaps) to run.
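The line-ending problem from the note in step 1 can also be fixed from inside Cygwin, without a Windows editor, by deleting the carriage returns with tr. A small sketch; the function name strip_cr is made up.

```shell
# strip_cr (hypothetical helper): convert a file from PC (CR+LF) to Unix (LF)
# line endings in place by deleting every carriage return.
strip_cr() {
  # Write the cleaned text to a temporary copy first, then pour it back into
  # the original file so its permissions are left alone.
  tr -d '\r' < "$1" > "$1.tmp" && cat "$1.tmp" > "$1" && rm "$1.tmp"
}
```

For the script above that would be strip_cr /cygdrive/c/backup-sites.sh.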

Snapshots

If a site of mine gets bungled and I don’t notice it for a few days, then my backup process will back up the bungled site. I don’t want that. I’d like to be able to get a pre-bungled copy and use that to restore my site.

To achieve this I do a daily backup of my web directory (as described above), but then I bundle it up into a compressed archive (with tar/bzip2) with a timestamp. I leave the synced directory as-is so that rsync doesn’t need to download every file again, and now I have a preserved copy of the site’s files for that given day. I can keep as many around for as long as I like.

The commands are in the backup-sites.sh file on my local machine that is running Cygwin. After the rsync commands, I have something that looks like this:

cd /cygdrive/c
# today's date, e.g. 2009-10-01
timestamp=`date +%Y-%m-%d`
mkdir snapshot-$timestamp
tar -cjvf snapshot-$timestamp/web.tar.bz2 backups

This will create a timestamped directory, and create an archive with all of the day’s synced files in it. It will also leave the rsynced backups directory alone, so that it will be there for syncing the next time. Tar and bzip2 are installed with Cygwin automagically. The switches are for:

  • c – Create file. You need it any time you intend to create an archive.
  • j – Compress the archive with bzip2.
  • v – Verbose. Tell me what you’re doing.
  • f – File. The next argument is the name of the archive file to write. Without it, tar doesn’t know which file the archive should go to.
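Getting a pre-bungled copy back out of a snapshot uses the same switches with t (list) or x (extract) in place of c. A minimal sketch, with made-up function names and a restore directory of your choosing:

```shell
# list_snapshot (hypothetical helper): show what is inside a snapshot archive.
list_snapshot() {
  tar -tjf "$1"
}

# restore_snapshot (hypothetical helper): unpack a snapshot into a scratch
# directory, leaving the rsynced backups directory untouched.
#   $1 = path to the .tar.bz2 archive
#   $2 = directory to unpack into (created if missing)
restore_snapshot() {
  mkdir -p "$2"
  tar -xjf "$1" -C "$2"
}
```

For example, restore_snapshot /cygdrive/c/snapshot-2009-10-01/web.tar.bz2 /cygdrive/c/restore would unpack that day’s files under C:\restore\backups, ready to be uploaded back to the server.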

Posted in Techmology.