S3 Backup Utility!

TL;DR

Get the S3 Backup Utility using:

pip install s3backup

Example configurations and scripts can be found on GitHub.

Full Description

As part of our backups at Audiu, I wanted a simple but flexible cross-platform backup script to prepare backups and ship them to S3. I require email notifications to let me know whether each run worked, and I don’t want it to upload duplicate backups when nothing in the backup set has changed, which would waste storage space (and money!) on S3.

I couldn’t find a small scripted utility to fulfil these requirements, so I decided to implement one myself. As a bonus, it gave me a chance to improve my Python skills while providing something back to the open source community. It took a couple of evenings to create the s3backup utility to meet my needs.

In essence, any number of backup “plans” can be defined. A plan is defined as:

  1. Name
  2. (Optional) Command to be run first
  3. Source files
  4. Output file prefix

Name – This is self-explanatory, but it is worth mentioning that the file hashes of previous backups (used to detect changes) are keyed against the name.

Command – This can be used to invoke any simple command directly, but for more advanced use cases it can execute a script that performs any number of steps to prepare the files prior to the backup, e.g. telling a program to dump its data via command-line utilities, then copying that data into a folder to be collected by the backup script.

Source files – This can be an array of strings or just a string, and it supports ant-style patterns (/some/directory/**/*).

Output file prefix – What the output archive will be prefixed with; when it is uploaded, the timestamp of when the backup archive was created is appended.
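
To make this concrete, here is a rough Python sketch of how a plan’s Src patterns and output naming could be interpreted. The function names and the .zip extension are my assumptions for illustration, not the utility’s actual internals.

import glob
import os
from datetime import datetime

def resolve_sources(src):
    # A Src entry may be a single string or a list of ant-style patterns.
    patterns = [src] if isinstance(src, str) else src
    files = []
    for pattern in patterns:
        # recursive=True lets ** match nested directories, as in /some/directory/**/*
        files.extend(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
    return files

def output_name(prefix):
    # The archive name is the prefix with the creation timestamp appended.
    return "{}-{}.zip".format(prefix, datetime.now().strftime("%Y%m%d%H%M%S"))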

Cross Platform

Obviously one of the challenges with anything cross-platform is managing file path formats. Linux and Mac OSX paths are easy to enter into JSON as they use forward slashes. Windows-style backslashes, however, must be escaped in JSON. To ease the pain, forward slashes can also be used for Windows paths in the Src entry, but unfortunately not within the Command.
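
As a quick illustration, normalising a forward-slash path to the native format is a one-liner in Python; this is the general approach, not necessarily what s3backup does internally.

import os

# On Windows, os.path.normpath converts forward slashes into native
# backslashes; on Linux and Mac OSX it leaves the path untouched.
src = "C:/users/bob/backups/data"
print(os.path.normpath(src))  # prints C:\users\bob\backups\data on Windows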

File Hashing

After the output file has been generated for a backup set, the MD5 digest of the archive is calculated. This is compared against the previous hash (if present), and the backup archive is only uploaded to S3 if they differ. The hash file is a simple flat store of Plan Name=Hash entries:

Main Backup=a3b3419cdabea423
Websites Backup=dbca397d9a3e09dc
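
A minimal sketch of that check in Python (the function names are hypothetical; the utility’s actual internals may differ):

import hashlib

def md5_digest(path):
    # Calculate the MD5 digest of the archive, reading in chunks.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    return md5.hexdigest()

def load_hashes(hash_file):
    # Parse the flat "Plan Name=Hash" store into a dict.
    hashes = {}
    try:
        with open(hash_file) as f:
            for line in f:
                name, _, digest = line.rstrip("\n").partition("=")
                hashes[name] = digest
    except FileNotFoundError:
        pass  # no previous backups yet
    return hashes

def needs_upload(plan_name, archive_path, hash_file="plan_hashes.txt"):
    # Upload only when the digest differs from the previous run (or is new).
    return load_hashes(hash_file).get(plan_name) != md5_digest(archive_path)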

Example Configurations

Linux Configuration

{
  "AWS_KEY": "this is a key",
  "AWS_SECRET": "this is a secret",
  "AWS_BUCKET": "this is a bucket",
  "AWS_REGION": "this is a region",
  "EMAIL_FROM": "[email protected]",
  "EMAIL_TO": "[email protected]",
  "HASH_CHECK_FILE": "plan_hashes.txt",
  "Plans": [
    {
      "Name": "Main Backup",
      "Command": "/home/bob/backups/backup_prep_script.sh",
      "Src": "/home/bob/backups/data/**/*",
      "OutputPrefix": "main_backup"
    },
    {
      "Name": "Websites Backup",
      "Src": [
        "/var/www/html/somesite.com/**/*", 
        "/var/www/html/anothersite.com/**/*"
      ],
      "OutputPrefix": "websites"
    }
  ]
}


Windows Configuration

{
  "AWS_KEY": "this is a key",
  "AWS_SECRET": "this is a secret",
  "AWS_BUCKET": "this is a bucket",
  "AWS_REGION": "this is a region",
  "EMAIL_FROM": "[email protected]",
  "EMAIL_TO": "[email protected]",
  "HASH_CHECK_FILE": "plan_hashes.txt",
  "Plans": [
    {
      "Name": "Main Backup",
      "Command": "C:\\users\\bob\\backups\\backup_prep_script.cmd",
      "Src": "C:/users/bob/backups/data/**/*",
      "OutputPrefix": "main_backup"
    },
    {
      "Name": "Websites Backup",
      "Src": [
        "C:/inetpub/somesite.com/**/*",
        "C:/inetpub/anothersite.com/**/*"
      ],
      "OutputPrefix": "websites"
    }
  ]
}


Example Scripts

MySQL Database Exporting
MySQL needs to have its databases dumped via the mysqldump command:

#!/bin/bash
/usr/bin/mysqldump -u root -ppassword --all-databases --single-transaction | gzip > /home/bob/backups/db_backup.sql.gz

Src would be set to /home/bob/backups/db_backup.sql.gz

TeamCity Data Exporting
TeamCity needs to have a small script executed, maintainDB, which is documented in the TeamCity documentation. It can be configured to back up the dataset to a specific location, which the Src can be pointed at.

:: The -D option can also be used for the database if required!
C:\TeamCity\bin\maintainDB.cmd backup -C -L -P -F C:\Backups\TeamCity\TeamCityBackup

Src would be set to C:/Backups/TeamCity/*

Octopus Deploy Backup Collection
Octopus Deploy outputs backups every 4 hours into a folder, and each file is timestamped. Unfortunately, if we were to point the Src location at this folder, our backup would continually back up every Octopus Deploy backup ever created. Instead, it’s better to use a small Command script to take the latest file from the folder and copy it to another location; the Src can then be pointed at that location.

:: Variables
SET BackupPath=C:\Octopus\Backups
SET StoreToFile=C:\Backups\OctopusDeploy\OctopusDeployBackup.octobak

:: DIR /O:D lists oldest first, so the last file iterated is the newest
FOR /F "delims=|" %%I IN ('DIR "%BackupPath%\*.octobak" /B /O:D') DO SET NewestFile=%%I
copy "%BackupPath%\%NewestFile%" "%StoreToFile%"

Src would be set to C:/Backups/OctopusDeploy/*

Final word

Anyway, contributions and ideas are always appreciated. Find me here, or open a PR against the repository on GitHub!
