Saturday, May 25, 2013

Monitor S3 file ages with Nagios


I have started using Amazon S3 storage for a for a couple different things like static image hosting and storing backups. My backup scripts tar and gzip files and then upload the tarball to S3. Since I don't have a central backup system to alert me of failed backups or to delete old backups I needed to handle those tasks manually. S3 has built in lifecycle settings which I do utilize but as with everything AWS it doesn't always work perfectly. As for alerting on failed backups I decided to handle that by watching the age of the files stored in S3 bucket. I ended up writing a Nagios plugin that can monitor both the minimum and maximum age of files stored in S3. In addition to monitoring the age of backup files I think this could also be useful in monitoring the age of files if you use an S3 bucket as a temporary storage area for batch processing. In this case old files would indicate a missed file or possibly a damaged file that couldn't be processed.

I wrote this my favorite new language Python and used the boto library to access S3. The check looks through every file stored in a bucket and checks the file's last_modified property against the supplied min and/or max. The check can be used for either min age, max age or both. You will need to create a .boto file in the home directory of the user executing the Nagios check with credentials that have at least read access to the S3 bucket.

The check_s3_file_age.py file is available on my github nagios-checks repository here: https://github.com/matt448/nagios-checks.

To use this with NRPE add an entry something like this:

command[check_s3_file_age]=/usr/lib/nagios/plugins/check_s3_file_age.py --bucketname myimportantdata --minfileage 24 --maxfileage 720

Here is output from --help:

./check_s3_file_age.py --help

usage: check_s3_file_age.py [-h] --bucketname BUCKETNAME
                            [--minfileage MINFILEAGE]
                            [--maxfileage MAXFILEAGE] [--listfiles] [--debug]

This script is a Nagios check that monitors the age of files that have been
backed up to an S3 bucket.

optional arguments:
  -h, --help            show this help message and exit
  --bucketname BUCKETNAME
                        Name of S3 bucket
  --minfileage MINFILEAGE
                        Minimum age for files in an S3 bucket in hours.
                        Default is 0 hours (disabled).
  --maxfileage MAXFILEAGE
                        Maximum age for files in an S3 bucket in hours.
                        Default is 0 hours (disabled).
  --listfiles           Enables listing of all files in bucket to stdout. Use
                        with caution!
  --debug               Enables debug output.


I am a better sys admin than I am a programmer so please let me know if you find bugs or see ways to improve the code. The best way to do this is to submit an issue on github.

Here is sample output in Nagios