This check is written in Python and uses the boto library. It includes perfdata output so you can graph the number of messages in each queue. The AWS API for SQS matches queue names by prefix, so you can monitor a bunch of queues with a single check if they share a common name prefix. The way I use this is to have several individual checks using the complete, explicit name of each queue, plus a catch-all using a prefix and higher thresholds that will catch any queues added later. Make sure you have a .boto file for the user that will be running this Nagios check; it only requires read permissions.
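If you haven't set one up before, a .boto file is an INI-style credentials file in that user's home directory. A minimal one looks like this (placeholder values shown):

[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY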
Some queues may be more time sensitive than others; that is the case in my setup. For time-sensitive queues I set the warning and critical counts to low values, while less time-sensitive queues get higher thresholds. This screenshot shows an example of that:
Config
Here is the command definition I use for Nagios:
# 'check_sqs_depth' command definition
define command{
        command_name    check_sqs_depth
        command_line    /usr/lib/nagios/plugins/check_sqs_depth.py --name '$ARG1$' --region '$ARG2$' --warn '$ARG3$' --crit '$ARG4$'
        }
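Before wiring this into Nagios you can run the plugin by hand as the Nagios user; the arguments map one-to-one to the $ARGn$ macros above:

/usr/lib/nagios/plugins/check_sqs_depth.py --name example_name --region us-east-1 --warn 150 --crit 300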
And here is the service definition I'm using:
define service{
        use                     generic-service
        host_name               sqs.us-east-1
        service_description     example_name SQS Queue
        contact_groups          admins,admins-page,sqs-alerts
        check_command           check_sqs_depth!example_name!us-east-1!150!300
        }
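To implement the catch-all I mentioned earlier, point a second service at a common name prefix with higher thresholds. The myapp_ prefix and the threshold numbers here are just placeholders; substitute whatever prefix your queues share:

define service{
        use                     generic-service
        host_name               sqs.us-east-1
        service_description     catchall SQS Queues
        contact_groups          admins,admins-page,sqs-alerts
        check_command           check_sqs_depth!myapp_!us-east-1!500!1000
        }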
Code
The code is available in my nagios-checks repository on GitHub: https://github.com/matt448/nagios-checks. I have also posted it as a gist, but my git repository will have the most up-to-date version.
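For reference, here is a minimal sketch of the approach described above, not the full plugin; see the repository for the real thing. It assumes boto 2.x with credentials in ~/.boto, uses the prefix matching built into get_all_queues(), and emits Nagios-style perfdata:

#!/usr/bin/env python
# Minimal sketch of an SQS depth check (boto 2.x, credentials in ~/.boto).
# The real plugin in the nagios-checks repository handles more edge cases.
import argparse
import sys

import boto.sqs

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--name', required=True, help='queue name or prefix')
    parser.add_argument('--region', required=True, help='e.g. us-east-1')
    parser.add_argument('--warn', type=int, required=True)
    parser.add_argument('--crit', type=int, required=True)
    args = parser.parse_args()

    conn = boto.sqs.connect_to_region(args.region)
    if conn is None:
        print('UNKNOWN: could not connect to region %s' % args.region)
        sys.exit(UNKNOWN)

    # get_all_queues() matches by name prefix, so one check can cover
    # every queue that shares a common prefix.
    queues = conn.get_all_queues(prefix=args.name)
    if not queues:
        print('UNKNOWN: no queues match prefix %s' % args.name)
        sys.exit(UNKNOWN)

    # Compare the deepest matching queue against the thresholds and
    # report every queue's depth as perfdata so it can be graphed.
    worst = 0
    perfdata = []
    for q in queues:
        depth = q.count()  # ApproximateNumberOfMessages
        worst = max(worst, depth)
        perfdata.append('%s=%d;%d;%d' % (q.name, depth, args.warn, args.crit))

    if worst >= args.crit:
        status, label = CRITICAL, 'CRITICAL'
    elif worst >= args.warn:
        status, label = WARNING, 'WARNING'
    else:
        status, label = OK, 'OK'

    print('%s: max queue depth is %d | %s' % (label, worst, ' '.join(perfdata)))
    sys.exit(status)

if __name__ == '__main__':
    main()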