Tuesday, December 31, 2013

Simple way to integrate Nagios with Slack messaging

At work we recently switched messaging applications from Skype to a new platform called Slack, which launched in August 2013. I have read that it is similar to Campfire, but I've never used that platform so I can't really comment. It is much more useful than a basic chat client like Skype. With Slack you can share files, easily search message history for text or files, and integrate with third-party applications. Plus it is private to just your team or company.

Slack has quite a few preconfigured integrations plus the ability to create your own custom integrations. First we set up the GitHub integration, which dumps all of our commit messages into a channel. Next we set up the Trello integration to dump card changes from our main board into another channel. Then I went to set up the Nagios integration and ran into problems. They have a prebuilt integration for Nagios but I could not get it to work. It would post alert messages into the channel but the messages contained no information:

I mucked with their provided Perl script quite a bit but I simply could not get it to work. It just kept posting empty messages. Being impatient and a do-it-yourselfer, I set about trying to find another way to accomplish this. I looked through the list of integrations and noticed a custom one called Incoming WebHooks, which is an easy way to get messages from external sources posted into Slack. The simplest way to use Incoming WebHooks is to post the message to Slack's API with curl. I wrote a little bash script that provides a detailed Nagios alert, a link back to the Nagios web page and conditional emojis! Each alert state (OK, WARNING, CRITICAL and UNKNOWN) has its own emoji icon. Here are some example messages in my Slack client:

Here is my bash script that posts to Slack. I placed it in /usr/local/bin

Here are the Nagios config lines that are added to commands.cfg

And finally lines I added to contacts.cfg

I'm not sure why Slack's prebuilt Nagios integration didn't work for me but I really like what I came up with. No Perl modules to install and the only outside dependency is curl. It's also pretty easy to modify the info in the alert message by adding or removing NAGIOS_ env variables in the curl statement.
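For readers who would rather not use bash, here is a rough Python sketch of the same idea: read the NAGIOS_ environment variables, pick an emoji for the alert state, and build the message for an Incoming WebHook. This is not the script from this post. The emoji choices, the channel name, the Nagios URL and the function names are all assumptions made up for illustration, and the webhook URL you pass in would be the one Slack generates for your own integration.

```python
import json
import urllib.request

# Illustrative emoji per state (assumption: not the icons used in the original script).
STATE_EMOJI = {
    "OK": ":white_check_mark:",
    "WARNING": ":warning:",
    "CRITICAL": ":x:",
    "UNKNOWN": ":question:",
}


def build_payload(env, channel="#alerts",
                  nagios_url="http://nagios.example.com/nagios"):
    """Build a webhook message body from Nagios environment macros.

    `env` is a dict such as os.environ. The channel and nagios_url
    defaults are hypothetical placeholders.
    """
    state = env.get("NAGIOS_SERVICESTATE", "UNKNOWN")
    emoji = STATE_EMOJI.get(state, STATE_EMOJI["UNKNOWN"])
    text = "{emoji} {state}: {host} / {service}\n{output}\n<{url}|See Nagios>".format(
        emoji=emoji,
        state=state,
        host=env.get("NAGIOS_HOSTNAME", "unknown host"),
        service=env.get("NAGIOS_SERVICEDESC", "unknown service"),
        output=env.get("NAGIOS_SERVICEOUTPUT", ""),
        url=nagios_url,
    )
    return {"channel": channel, "username": "nagios", "text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Adding or removing fields works the same way as in the curl version: pull another NAGIOS_ variable out of the environment and add it to the text.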

Monday, November 4, 2013

Upgrading existing Solr installation to new version of Jetty

At work we have been running into a problem with Apache Solr crashing. Depending on how much it was used we would get several weeks of usage out of it before it crashed. Now it is only running for five days at a time. So this fire has started burning hot enough to be at the top of my to-do list.
When it crashes it throws errors saying "Too many open files". Running lsof showed it wasn't actually open files but thousands of orphaned sockets left open. The sockets looked like this in the lsof output:

java 2428 root 2173u sock 0,7 0t0 123291433 can't identify protocol

There won't be anything listed in netstat. These sockets don't have open connections to anything. The Solr log file will start showing errors similar to this:

SEVERE: java.io.FileNotFoundException: /usr/local/apache-solr-3.5.0/example/solr/data/index/_dgf.frq (Too many open files)

SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock

Initially we dealt with this problem by monitoring the number of open files for the java process and running a reindex when it got close to the limit. Not a great solution but at the time there weren't enough hours in the day to put a bunch of effort into figuring this out. In my case the limit when Solr blew up was 4000 open sockets. Once Solr had that many sockets open it would just throw 500 errors.
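The kind of monitoring described above can be sketched in a few lines of Python. This is only an illustration, not the monitoring we actually ran: it assumes a Linux host where /proc/<pid>/fd lists the open descriptors (sockets included), and the 90% threshold and function names are made up for the example.

```python
import os


def count_open_fds(pid):
    """Count open file descriptors (files and sockets) for a process.

    Linux only: relies on the /proc/<pid>/fd directory.
    """
    return len(os.listdir("/proc/{}/fd".format(pid)))


def fd_status(open_fds, limit, threshold=0.9):
    """Return "NEAR_LIMIT" once usage reaches `threshold` of the limit."""
    if open_fds >= limit * threshold:
        return "NEAR_LIMIT"
    return "OK"
```

With the 4000 limit mentioned above, fd_status(count_open_fds(solr_pid), 4000) would flip to NEAR_LIMIT at 3600 open descriptors, where solr_pid is whatever PID the Solr java process has on your box.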

Usually the answer to a situation like this is to upgrade Solr to a newer version. Unfortunately I couldn't do that in this case because we have a Ruby gem that depends on Solr version 3.5. My research pointed to Jetty, not Solr, as the source of the problem. Once I found this post I knew for sure Jetty was causing the orphaned sockets. Solr 3.5.0 is packaged with Jetty 6.1.26, which has a bug that causes the orphaned sockets under certain conditions. Because Jetty 6 is fairly old the developers are not going to fix it. At this point I set about upgrading Jetty to version 7.

The first thing I had to figure out was what stuff was Solr and what stuff was Jetty. Turns out most of the package is Jetty. Solr is contained in apache-solr-3.5.0/example/solr and apache-solr-3.5.0/example/webapps/solr.war. So I decided to try and stuff Solr 3.5.0 into Jetty 7.6.13. Later I may try moving to the latest version of Jetty 9 but I'm just trying to solve this orphaned socket problem right now and was worried the older version of Solr might have problems with a newer Jetty.

Upgrading Jetty

Here are the steps I took to upgrade Solr 3.5.0 to Jetty 7

Download latest Jetty 7 (jetty-distribution-7.6.13.v20130916.tar.gz at the time this was written) from here http://download.eclipse.org/jetty/7.6.13.v20130916/dist/

Untar jetty-distribution-7.6.13.v20130916.tar.gz
tar xfvz jetty-distribution-7.6.13.v20130916.tar.gz

Create destination directory for all the new files
mkdir /usr/local/apache-solr-3.5.0-jetty-7.6.13
mkdir /usr/local/apache-solr-3.5.0-jetty-7.6.13/example

Copy the contents of jetty-distribution-7.6.13.v20130916 to the new directory
cp -a jetty-distribution-7.6.13.v20130916/* /usr/local/apache-solr-3.5.0-jetty-7.6.13/example

Copy solr files from old solr installation to new Jetty directory
cp -a /usr/local/apache-solr-3.5.0/example/solr  /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
cp -a /usr/local/apache-solr-3.5.0/example/webapps/solr.war /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/webapps/

Edit the jetty.xml config file to change the listening port
vi /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc/jetty.xml
Change this line
 <Set name="port"><Property name="jetty.port" default="8080"/></Set>
To this
 <Set name="port"><Property name="jetty.port" default="8983"/></Set>

At this point solr will run but there are some example war files and config files that aren't needed for Solr and should be cleaned up. 

- Edit /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
   vi /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
   Comment out the line
   so it reads 

- Clean up example war files
  cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/webapps
  mkdir BAK
  mv test.war spdy.war BAK

- Clean up example config files
  cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc
  mkdir BAK
  mv jetty-spdy.xml jetty-spdy-proxy.xml jetty-testrealm.xml BAK
  cd /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/contexts
  mkdir BAK
  mv test.xml BAK

I use a symbolic link for the installation directory so the start script doesn't have to be modified. Before restarting I have to switch that sym link.
  service solr stop
  cd /usr/local
  rm solr
  ln -s apache-solr-3.5.0-jetty-7.6.13 solr
  service solr start

Then you can test hitting the service locally.
  curl localhost:8983/solr/

It should return HTML that says something like this:
  <title>Welcome to Solr</title>

  <h1>Welcome to Solr!</h1>

You will probably need to run a reindex if transactions have been taking place while solr was down for the upgrade.

Resources used to compile this post

Thursday, October 24, 2013

Arduino - Using digital potentiometers (AD8403)

I have seen several blog posts covering the use of digital potentiometers with an Arduino but I haven't seen any that demonstrate digital pots that contain a shutdown circuit. I'm working on a project that needs this particular feature.

The shutdown circuit is of interest to me because I plan on interfacing with something that controls movement. In that application any value on the digital pot will cause a motor to move. Lower resistive values move forward and higher resistive values move back. I need a way to completely disable the potentiometer. My first prototype used two relays, two transistors and two resistors with different values. The Arduino would turn the relays off and on to cause movement. It worked because I could simply turn off both relays, but it was quite bulky and the mechanical relays were noisy. A friend recommended a digital potentiometer as a replacement. A single digital pot chip would replace six components. I began researching them and found that not all of them include a shutdown circuit. I settled on the Analog Devices AD8403 chip. It is a quad channel, 256 position digitally controlled variable resistor. Communication is done through an SPI interface.

So let's start with a demonstration of what happens when there is no shutdown circuit on a digital potentiometer and the Arduino is reset. The circuit and code cycle through each digital pot, varying the brightness of an LED by running up and down all 256 positions of the pot (values 0 to 255). While this is running I press the reset button on the Arduino.

In this situation the digital pot chip still has power and the value of one of the pots is stuck while the Arduino resets (demonstrated by the LED staying lit). In my real application this would cause movement to continue until the board finished restarting, which is highly undesirable. This effect could be minimized somewhat by including code to reset the values of the four pots on startup, but there would still be a second or two of uncontrolled movement while the board restarts.

Next is a demonstration of the same circuit but with the shutdown circuit enabled. The shutdown pin is connected to a pull down resistor. This causes the chip to go into shutdown mode if the Arduino resets or loses power. Again I press the reset button on the Arduino as it is running the code.

This is getting close to what I want but there is still the problem of the pot being set to some unknown value when the AD8403 is taken out of shutdown mode. To handle this I added code to the setup function that sets the pots to a known value (zero in this case) before taking the chip out of shutdown mode.


Here is a photo and Fritzing diagram of the circuit.

The connections are:
  * All A pins of AD8403 connected to +5V
  * All B pins of AD8403 connected to ground
  * An LED and a 220-ohm resistor in series connected from each W pin to ground
  * RS - to +5v
  * SHDN - to digital pin 7 and a 10k ohm pull down resistor
  * CS - to digital pin 10  (SS pin)
  * SDI - to digital pin 11 (MOSI pin)
  * CLK - to digital pin 13 (SCK pin)

The AD8403 is the 10k ohm version.


The most up-to-date version of the code is available here: https://github.com/matt448/arduino/blob/master/SPI_Digital_Pot_AD8403/SPI_Digital_Pot_AD8403.ino

The code to control this isn't very complex. This is my first time using an SPI device and I found it to be very straightforward. The Arduino IDE has an example sketch for controlling SPI digital pots under File > Examples > SPI > DigitalPotControl. I started with the example and modified it to include shutdown control. The example code also ran a bit slow because of the serial console messages, so I removed all the serial communication code.
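To show what actually goes over the wire, here is a small Python model of the SPI write. It is a sketch based on my reading of the AD8403 datasheet, which describes a 10-bit word made of two address bits followed by eight data bits; check the datasheet before relying on it. The function names are made up for illustration. The Arduino example style of sending an address byte and then a value byte still works under that reading, because the chip keeps only the last 10 bits clocked in.

```python
def ad8403_word(channel, value):
    """Build the 10-bit word: 2 address bits followed by 8 data bits."""
    if not 0 <= channel <= 3:
        raise ValueError("channel must be 0-3")
    if not 0 <= value <= 255:
        raise ValueError("value must be 0-255")
    return (channel << 8) | value


def as_spi_bytes(channel, value):
    """The two bytes an Arduino-style sketch would send: address, then value."""
    return bytes([channel, value])


def last_ten_bits(spi_bytes):
    """Model the chip's shift register keeping the final 10 bits received."""
    return int.from_bytes(spi_bytes, "big") & 0x3FF


# Channel 2 at mid-scale: both views describe the same 10-bit word.
print(bin(ad8403_word(2, 128)))               # 0b1010000000
print(last_ten_bits(as_spi_bytes(2, 128)))    # 640
```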

Comment on shutdown pin

So after getting through all this testing I noticed a limitation of the shutdown pin: it shuts down all four pots at the same time. In my real application I plan on controlling four separate motors and moving each one to a certain position. Once a motor reaches its position I want it to stop. But the only way to stop the motor is to activate the shutdown, which deactivates all four pots. This is a problem because the other three motors may not have reached their positions when I need to shut down. I did some more looking and found that Microchip makes a digital pot that has individually controllable software shutdown for each pot. I'm going to order a couple of MCP4251 chips and I'll write up another post when I test them out.

UPDATE 3/17/2014: Post about the MCP4251 is available here
UPDATE 3/27/2014: Added list of connections and resistor values. Also added links to github repo.

Thursday, October 10, 2013

Arduino - Sending data over a CAN bus

I have been tinkering with CAN buses due to my interest in cars. It's fascinating to me that packets are flying around a modern vehicle controlling nearly everything: gauges, lights, locks, engine sensors, etc. To get a better understanding of the basics of a CAN bus I wanted to build the simplest possible setup to send and receive CAN messages. I chose two Arduino Unos with a Seeed Studio CAN-BUS shield attached to each Uno. The Seeed shield is very straightforward and inexpensive. The Sparkfun CAN-BUS shield has an SD card slot, LCD connector and GPS connector, all of which are cool but drive up the price and complexity. The Seeed shield only does CAN bus and includes screw terminals, which are handy for testing.

Arduino Uno R3 and Seeed CAN-BUS Shield

What I wanted to do with this experiment was transmit the value of an analog pin hooked up to a linear potentiometer. The value would be sent from one Arduino to another over a CAN bus and then displayed on an LCD connected to the second Arduino. Here is a picture of my setup. (Ignore the Mega2560 above the LCD. It's not used here.)

And here is a Fritzing diagram minus the CAN-BUS shields.

CAN bus termination

A CAN bus requires 120 ohm termination resistors at each end of the bus. The Seeed Studio shields have built-in termination resistors. When you connect two Seeed CAN bus shields together like I did in this example you will have a properly terminated CAN bus. If you plan on connecting into an existing CAN bus that already has termination you can disable the built-in termination resistors. To disable termination you can cut trace P1 or you can desolder resistor R1.

Close up view of the Seeed CAN bus shield termination resistors.
Note: I have recently discovered the Seeed Studio CAN-BUS shield v1.0 uses a 60 ohm termination resistor for R3. While that worked for this small demo I later ran into issues when trying to use this shield with other nodes on a CAN bus. This 60 ohm resistor caused me many hours of frustration. If you are going to use this shield on a bus with multiple nodes I would recommend desoldering R3 and using the correct 120 ohm resistance at the ends of your bus.

Connecting into an existing CAN bus

If you are planning on connecting into an existing CAN bus (like in a car) you need to remove/disable the termination resistor on the shield as explained above. The CAN bus in a vehicle already has termination resistors. Adding a new node with a termination resistor will cause errors and disrupt communication on the bus.

Another important step is to connect a common ground between your Arduino board and the vehicle. If you are connecting at the OBD2 port, pin 5 provides a signal ground. If you can't find a signal ground wire, a chassis ground will suffice.

CAN bus messages

So I should probably explain a bit about CAN bus messages. Each message is made up of an ID and some data. Standard IDs are 11 bits, so in hex they run from 0x000 to 0x7FF, or 0 to 2047 in decimal. In most systems lower ID values are considered more important, and the bus resolves collisions by letting the lower ID win. The data can be up to 8 bytes per message. Each byte can have a value from 0 to 255, or 0x00 to 0xFF in hex. When you send a CAN bus message you transmit the ID, the number of data bytes you are sending (this is called the DLC) and the actual data. The receiver will only read the number of bytes you said should be in the message, so if you send a DLC of 4 but supply 8 bytes the receiver will only read the first 4.

Eight bytes per message is a bit limiting but the tradeoff is the high reliability of the bus, so sometimes you have to be creative with stuffing data into those bytes. If the value you are sending is 255 or less you can just use a single byte. Larger numbers require multiple bytes. ASCII codes can be sent but only eight characters per message. Whatever method you use to stuff the data in also has to be used to un-stuff it on the receiver. In my simple example here I did some math to limit the range of values to 0-255. An Arduino analog pin produces values between 0 and 1023, so I simply divided the reading by four to get a value I could send in a single byte.

CAN buses can operate at several different speeds up to 1 Mbit/s. Typical rates are 100 kbit/s, 125 kbit/s and 500 kbit/s. Slower rates allow for longer buses. All devices on a bus must transmit at the same speed. The CAN bus Wikipedia page is a good place to start if you want to learn more about the CAN protocol.
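The byte stuffing described above is easy to try out on a desktop before touching the Arduino. The Python below is only a sketch of the arithmetic, not the sender or receiver code from this post. The function names and the dict used to represent a frame are made up for illustration.

```python
def scale_adc_to_byte(raw):
    """Squeeze a 10-bit analog reading (0-1023) into one byte by dividing by four."""
    if not 0 <= raw <= 1023:
        raise ValueError("expected a 10-bit reading")
    return raw // 4


def pack_u16(value):
    """Split a 0-65535 value into two bytes, high byte first."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in two bytes")
    return [(value >> 8) & 0xFF, value & 0xFF]


def unpack_u16(data):
    """Reverse pack_u16 on the receiving side."""
    return (data[0] << 8) | data[1]


def build_frame(can_id, data):
    """Describe a standard CAN frame: 11-bit ID, DLC and up to 8 data bytes."""
    if not 0 <= can_id <= 0x7FF:
        raise ValueError("standard IDs are 0x000-0x7FF")
    if len(data) > 8:
        raise ValueError("a CAN frame carries at most 8 data bytes")
    return {"id": can_id, "dlc": len(data), "data": list(data)}


# One byte, losing the two low bits of resolution:
print(build_frame(0x100, [scale_adc_to_byte(1023)]))
# Two bytes, keeping the full reading:
print(build_frame(0x100, pack_u16(1023)))
```

Dividing by four is the simplest option but throws away resolution. Spending a second byte keeps the full 0-1023 range, at the cost of the receiver having to know the byte order.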


I started with the example code provided by Seeed and modified it to add in the LCD output on the 'receiver' device and added reading of the potentiometer on A0 for the value that is transmitted. They have basic examples for send and receive. You can find some good info on their wiki page. Their libraries are available here. On my Mac I created the directory ~/Documents/Arduino/libraries/CAN_BUS_Shield for the library files. I unzipped the file and copied over the .h and .cpp files into that new directory. The zip file also contains the send and receive examples.

Note that normally devices on a CAN bus are both receivers and transmitters of data. This is a simplified example where each device is only doing one task.

Sender code

Receiver code


[Updated 2014-05-25: Noted value of A0 potentiometer in the Fritzing diagram]
[Updated 2014-07-21: Added section about termination resistors]
[Updated 2014-09-25: Added note about incorrect value of resistor R3 on Seeed's shield]
[Updated 2015-03-10: Added additional notes about termination resistors]
[Updated 2017-03-27: Added new section 'Connecting into an existing CAN bus']
[Updated 2018-06-15: Fixed broken links for Seeed-Studio wiki and libraries]

Monday, October 7, 2013

Nagios monitoring for Amazon SQS queue depth

I have found that a bunch of messages stacking up in my SQS queues can be the first sign of something breaking. Several things can cause messages to stack up in the queue. I have seen malformed messages, slow servers and dead processes all cause this at different times. So to monitor the queue depth I wrote this Nagios check / plug-in. The check simply queries the SQS API and finds out the count of messages in each queue. Then it compares the count to the warning and critical levels.

This check is written in Python and uses the boto library. It includes perfdata output so you can graph the number of messages in the queue. The AWS API for SQS matches queue names by prefix, so you can monitor a bunch of queues with one check if their names share a common prefix. The way I use this is to have several individual checks that use the complete explicit name of a queue, plus a catchall check that uses just the shared prefix with higher thresholds to catch any queues that have been added since. Make sure you have a .boto file for the user that will be running this Nagios check. It only requires read permissions.
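The comparison and perfdata steps look roughly like the sketch below. It is not the code from my repository: the function name, the message wording and the choice of "greater than or equal" for the thresholds are assumptions for illustration, and the boto call that fetches the count is left out so the example runs without AWS credentials.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_depth(queue_name, count, warn, crit):
    """Compare a queue's message count to the thresholds.

    Returns (exit_code, status_line). Everything after the "|" is
    perfdata in the label=value;warn;crit form, which graphing
    add-ons can pick up.
    """
    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    line = "{label} - {name} has {count} messages | {name}={count};{warn};{crit}".format(
        label=label, name=queue_name, count=count, warn=warn, crit=crit
    )
    return code, line


code, line = evaluate_depth("example_name", 200, 150, 300)
print(line)
```

A real plugin would print the line and then call sys.exit(code) so Nagios sees the right state.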

Some queues may be more time sensitive than others. That is the case for my setup. For queues that are time sensitive I set the warning and critical counts to low values. Less time sensitive queues are set to higher count values. This screenshot is an example of that:


Here is the command definition I use for Nagios:

# 'check_sqs_depth' command definition
define command{
        command_name    check_sqs_depth
        command_line    /usr/lib/nagios/plugins/check_sqs_depth.py --name '$ARG1$' --region '$ARG2$' --warn '$ARG3$' --crit '$ARG4$'
}

and here is the service definition I'm using

define service{
        use                     generic-service
        host_name               sqs.us-east-1
        service_description     example_name SQS Queue
        contact_groups          admins,admins-page,sqs-alerts
        check_command           check_sqs_depth!example_name!us-east-1!150!300!
}


The code is available in my nagios-checks repository on GitHub here: https://github.com/matt448/nagios-checks and I have posted it as a gist below. My git repository will have the most up-to-date version.

Monday, August 5, 2013

Using static IP's on Verizon 4G

First a little background info on IP addresses and cellular data service. 3G data connections use publicly accessible valid internet IP addresses. While this is nice if you want remote access to a device it does needlessly use up increasingly valuable IPv4 addresses. When carriers rolled out their next generation 4G service they switched to using private 10.x.x.x networks and NATed the traffic out to the internet somewhere within their network. It is possible to get publicly accessible static IP's from Verizon but they don't make the process very easy.

Requesting static IP's

I am using Verizon 4G service with Cradlepoint routers as a backup internet connection at my remote offices. I wanted to use static IP addresses so I could get access to these offices if the primary internet connection went down. We have a Verizon business sales rep and he was the person that handled our static IP request. Verizon charges a one time $500 fee to add static IP's to your account. The first step was to authorize the one time $500 charge. My accounting department handled that and then my Verizon sales rep sent the request somewhere deep into the bowels of the Verizon bureaucracy. A month later we were approved. Next someone from Verizon called to ask what 'sub-group' or 'level' we wanted these static IP's to be attached to. It took a while on the phone to figure out what exactly they were asking. It turns out we have our Verizon devices set up in two different groups. One group is devices with phone and data service and the other group is data only (things like iPads, hotspots or Cradlepoints). So we applied the static IP's to our data only group of devices. From the conversation I had on the phone with this Verizon person, my understanding is that we would need to pay $500 for each group of devices.

Assigning static IP's to 4G devices

It would be really nice if there was some sort of website to assign static IP's to devices but sadly there is not. The process for attaching a static IP to a certain device is to e-mail your Verizon business sales rep the device IMEI and/or the phone number assigned to the device. The sales rep then handles assigning the static IP and will e-mail back the static IP address once one has been assigned to the device.

Configuring 4G devices to use static IP's

This is where information got really nebulous. I asked my Verizon sales rep if I needed to do any configuration to my Cradlepoint router for the static IP. He said "nope, it should just work". Well that is definitely not the case. To be able to use static IP's you must change a setting for something called the APN. The APN is used to identify what network the device should attach to. The ability to change the APN of a device varies depending on the carrier. My AT&T iPhone does not present any options to change that setting but this Apple knowledge base document shows the option does exist. On Cradlepoint routers this option is easily accessible because it is a somewhat common thing to modify on those devices. The APN menu location on Cradlepoints depends on the device but it is usually under either modem settings or the Connection Manager.
Now what should the APN be set to for Verizon devices? Well, this took a bit of searching. I found a few blog posts that said it should be set to "mw01.vzwstatic". I tried this and the modem kept dropping its connection with an error saying carrier rejected. So after more searching I found this list of Verizon APN's:

1. ne01.vzwstatic (NorthEast)
2. nw01.vzwstatic (NorthWest)
3. so01.vzwstatic (South)
4. mw01.vzwstatic (MidWest)
5. we01.vzwstatic (West)

The correct APN depends on where you are in the country. I did not find any more specific information than this and since Texas spans a few different regions I wasn't exactly sure which one I should use. I took a guess at so01.vzwstatic and it turned out to be the correct one. After setting this option the Cradlepoint 4G modem cap restarted and it grabbed the correct static IP from Verizon. Success!

If you switch back to dynamic IP's you should use the APN "vzwinternet" or use the default setting for the device. I found once my device was assigned static IP service and I restarted the modem I could not use vzwinternet. Seems like the APN has to match whatever Verizon has assigned on their backend or they will reject the device.

Saturday, May 25, 2013

Monitor S3 file ages with Nagios

I have started using Amazon S3 storage for a couple of different things like static image hosting and storing backups. My backup scripts tar and gzip files and then upload the tarball to S3. Since I don't have a central backup system to alert me of failed backups or to delete old backups I needed to handle those tasks manually. S3 has built-in lifecycle settings which I do utilize, but as with everything AWS it doesn't always work perfectly. As for alerting on failed backups, I decided to handle that by watching the age of the files stored in the S3 bucket. I ended up writing a Nagios plugin that can monitor both the minimum and maximum age of files stored in S3. In addition to monitoring the age of backup files I think this could also be useful for monitoring the age of files if you use an S3 bucket as a temporary storage area for batch processing. In this case old files would indicate a missed file or possibly a damaged file that couldn't be processed.

I wrote this in my favorite new language, Python, and used the boto library to access S3. The check looks through every file stored in a bucket and checks the file's last_modified property against the supplied min and/or max. The check can be used for either min age, max age or both. You will need to create a .boto file in the home directory of the user executing the Nagios check with credentials that have at least read access to the S3 bucket.
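The age comparison can be sketched in plain Python as below. This is not the code from check_s3_file_age.py, and the meaning of the two thresholds here is one plausible reading stated as an assumption: the newest file must be no older than the min age (so a missed backup is noticed) and no file may be older than the max age (so cleanup failures are noticed). The boto listing is left out. The function takes a list of timezone-aware last-modified datetimes instead.

```python
from datetime import datetime, timedelta, timezone


def check_file_ages(last_modified, min_age_hours=0, max_age_hours=0, now=None):
    """Return (nagios_exit_code, message) for a list of file timestamps.

    A threshold of 0 disables that test, mirroring the defaults in the
    --help output. The threshold semantics are an assumption, see above.
    """
    now = now or datetime.now(timezone.utc)
    if not last_modified:
        return 3, "UNKNOWN - no files found"
    ages = [(now - stamp).total_seconds() / 3600 for stamp in last_modified]
    problems = []
    if min_age_hours and min(ages) > min_age_hours:
        problems.append("newest file is older than {} hours".format(min_age_hours))
    if max_age_hours and max(ages) > max_age_hours:
        problems.append("oldest file is older than {} hours".format(max_age_hours))
    if problems:
        return 2, "CRITICAL - " + "; ".join(problems)
    return 0, "OK - {} files checked".format(len(ages))


# Example: one fresh backup and one that is a few days old.
now = datetime(2013, 5, 25, 12, 0, tzinfo=timezone.utc)
stamps = [now - timedelta(hours=2), now - timedelta(hours=100)]
print(check_file_ages(stamps, min_age_hours=24, max_age_hours=720, now=now))
```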

The check_s3_file_age.py file is available on my github nagios-checks repository here: https://github.com/matt448/nagios-checks.

To use this with NRPE add an entry something like this:

command[check_s3_file_age]=/usr/lib/nagios/plugins/check_s3_file_age.py --bucketname myimportantdata --minfileage 24 --maxfileage 720

Here is output from --help:

./check_s3_file_age.py --help

usage: check_s3_file_age.py [-h] --bucketname BUCKETNAME
                            [--minfileage MINFILEAGE]
                            [--maxfileage MAXFILEAGE] [--listfiles] [--debug]

This script is a Nagios check that monitors the age of files that have been
backed up to an S3 bucket.

optional arguments:
  -h, --help            show this help message and exit
  --bucketname BUCKETNAME
                        Name of S3 bucket
  --minfileage MINFILEAGE
                        Minimum age for files in an S3 bucket in hours.
                        Default is 0 hours (disabled).
  --maxfileage MAXFILEAGE
                        Maximum age for files in an S3 bucket in hours.
                        Default is 0 hours (disabled).
  --listfiles           Enables listing of all files in bucket to stdout. Use
                        with caution!
  --debug               Enables debug output.

I am a better sys admin than I am a programmer so please let me know if you find bugs or see ways to improve the code. The best way to do this is to submit an issue on github.

Here is sample output in Nagios

Saturday, March 30, 2013

Compiling libhid for Raspbian Linux on a Raspberry Pi

My son and I are working on a project using a Raspberry Pi and I needed to be able to talk to a USB HID device. This requires a software library called libhid but unfortunately it is not available as a package on Raspbian Linux. I downloaded the source and attempted to compile it but ran into an error:

lshid.c:32:87: error: parameter ‘len’ set but not used [-Werror=unused-but-set-parameter]
cc1: all warnings being treated as errors
make[2]: *** [lshid.o] Error 1
make[2]: Leaving directory `/root/libhid-0.2.16/test'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/libhid-0.2.16'
make: *** [all] Error 2

After some googling I found a couple others on the Raspberry Pi forums that ran into the same problem. One of the commenters came up with a simple fix that requires a quick edit of the source code. In ~/libhid-0.2.16/test you need to edit the file lshid.c

Here is the code before making the edit:

39 /* only here to prevent the unused warning */
40 /* TODO remove */
41 len = *((unsigned long*)custom);
43 /* Obtain the device's full path */

Here is the code after the edit.
You need to comment out line 41 and then add len = len; and custom = custom;

39 /* only here to prevent the unused warning */
40 /* TODO remove */
41 //len = *((unsigned long*)custom);
42 len = len;
43 custom = custom;
45 /* Obtain the device's full path */

After editing the file simply run configure, make and make install like normal. The library will be put into /usr/local. Make sure you run sudo ldconfig before trying to compile any software that uses libhid. Thanks Raspberry Pi forums!

Wednesday, March 20, 2013

Bandwidth limits for guest wifi on an ASA 5505

At work we have free wifi for our customers as a nicety and so they can download our smartphone app if needed. Initially I set it up with no bandwidth limits with the idea of keeping an eye on it and locking it down if there was abuse. Over the past few weeks my MRTG graphs showed several spikes where the free wifi hit 10mbps. That is a big chunk of our internet connection so I decided it was time to limit the bandwidth. Since I'm not a Cisco expert it took some Googling to find the best way to do this. I found a couple of resources that helped me put together what I needed. The free wifi network is on a separate VLAN with its own IP subnet.

Here is the interface definition for the VLAN

interface Vlan92
  nameif freewifi
  security-level 50
  ip address

Here is the syntax I used to limit the freewifi VLAN to 2mbps. The limit is applied to the subnet used by the freewifi VLAN.

access-list ip-qos extended permit ip any
access-list ip-qos extended permit ip any

class-map qos
  description qos policy
  match access-list ip-qos

policy-map qos
  class qos
    police output 2000000 2000000
    police input 2000000 2000000

service-policy qos interface freewifi


My initial thought for testing the bandwidth limits was to connect to the freewifi VLAN and simply use one of the internet speed testing web sites. The speed test web sites worked fine for download speeds but the upload tests kept reporting that they were getting the full bandwidth of the connection. It seemed like the upload limit wasn't being enforced. I tried all of the popular speed testing sites and got the same result. Downloads were limited to 2mbps and uploads were running at the full speed of the connection. Hmmm...

I reviewed my settings on the ASA and everything seemed like it was correct. I decided to do a different type of test to see if I would get a different result. I created a 10MB file and then tested uploading and downloading it to and from a server out on the internet using scp. This test gave me the results I was expecting. Both upload and download of this test file took about 35 seconds, which is in line with a 2mbps connection. I then tested transferring the same file on the inside VLAN, which has no bandwidth limits, and the scp transfer time was 4 seconds. I'm not sure what was going on with the speed test sites but the upload speeds were not reporting accurately for me.
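A quick back-of-the-envelope check of that 35 second figure, as a Python sketch. The model is deliberately rough and partly an assumption on my part: it ignores protocol overhead, and it treats the policer's burst size (the second 2000000 in the police lines, shown as bc in bytes in the show output) as data that gets through immediately before the 2mbps rate applies. How the ASA actually credits the burst may differ.

```python
def transfer_seconds(size_bytes, rate_bps, burst_bytes=0):
    """Ideal transfer time through a policer.

    Assumes `burst_bytes` pass immediately and the remainder is
    limited to `rate_bps` bits per second. Overhead is ignored.
    """
    remaining = max(size_bytes - burst_bytes, 0)
    return remaining * 8 / rate_bps


# 10 MB file at 2 Mbit/s with no burst allowance:
print(transfer_seconds(10_000_000, 2_000_000))             # 40.0
# Same file if the first 2,000,000 bytes ride on the burst:
print(transfer_seconds(10_000_000, 2_000_000, 2_000_000))  # 32.0
```

The measured 35 seconds falls between the two numbers, which is about what you would hope for if the limit is being enforced.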

Monitoring status

You can watch the bandwidth limits in action using the 'show service-policy police' command. If the limit is exceeded the output will show the number of packets and bytes that have exceeded the bandwidth limit.

This is the command output before sending any traffic:

asa5505# show service-policy police

Interface freewifi:
  Service-policy: qos
    Class-map: qos
      Output police Interface freewifi:
        cir 2000000 bps, bc 2000000 bytes
        conformed 1306 packets, 907993 bytes; actions:  transmit
        exceeded 0 packets, 0 bytes; actions:  drop
        conformed 0 bps, exceed 0 bps
      Input police Interface freewifi:
        cir 2000000 bps, bc 2000000 bytes
        conformed 1072 packets, 192021 bytes; actions:  transmit
        exceeded 0 packets, 0 bytes; actions:  drop
        conformed 0 bps, exceed 0 bps

This is the output after transmitting several test files:

asa5505# show service-policy police

Interface freewifi:
  Service-policy: qos
    Class-map: qos
      Output police Interface freewifi:
        cir 2000000 bps, bc 2000000 bytes
        conformed 149813 packets, 127878453 bytes; actions:  transmit
        exceeded 10273 packets, 14716462 bytes; actions:  drop
        conformed 3384 bps, exceed 360 bps
      Input police Interface freewifi:
        cir 2000000 bps, bc 2000000 bytes
        conformed 157493 packets, 123699017 bytes; actions:  transmit
        exceeded 15083 packets, 21214456 bytes; actions:  drop
        conformed 4928 bps, exceed 760 bps


Monday, March 18, 2013

Template Nagios check for a JSON web service

I wrote two different custom Nagios checks for work last week and realized I could make a useful template out of them. After writing the first check I was able to reuse most of the code for the second check. The only changes I had to make had to do with the data returned. So I decided to make this into a generic template that I can reuse in the future. The check first verifies that the web service is responding correctly and then checks various data returned in JSON format. While writing this template I found this really cool service (www.jsontest.com) that let me code against a service available to anyone who wants to try out this Nagios check before customizing it. This is the first time I have used Python's argparse module and I have to say it is fantastic. It makes adding command line arguments very easy and the result is professional looking.
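To give a feel for the shape of such a template, here is a minimal sketch. It is not the code in my repository: the --url, --key and --expected flags, the function names and the message wording are all hypothetical, and the HTTP fetch is left out so the JSON checking part can be tried on its own.

```python
import argparse
import json


def build_parser():
    """Command line arguments for a hypothetical JSON service check."""
    parser = argparse.ArgumentParser(
        description="Template Nagios check for a JSON web service.")
    parser.add_argument("--url", required=True, help="URL of the web service")
    parser.add_argument("--key", required=True, help="JSON key to inspect")
    parser.add_argument("--expected", required=True,
                        help="Value the key is expected to have")
    return parser


def check_json(body, key, expected):
    """Return (nagios_exit_code, message) for a response body."""
    try:
        data = json.loads(body)
    except ValueError:
        return 2, "CRITICAL - response is not valid JSON"
    if not isinstance(data, dict) or key not in data:
        return 2, "CRITICAL - key '{}' missing from response".format(key)
    actual = str(data[key])
    if actual != expected:
        return 2, "CRITICAL - {} is {} (expected {})".format(key, actual, expected)
    return 0, "OK - {} is {}".format(key, actual)


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(
    ["--url", "http://example.com/service", "--key", "status", "--expected", "ok"])
print(check_json('{"status": "ok"}', args.key, args.expected))
```

Customizing the template for a new service then mostly means changing which keys are inspected and what counts as a bad value.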

My github repo can be found here: https://github.com/matt448/nagios-checks

Here is the code in a gist:

Wednesday, March 13, 2013

Nagios file paths on Ubuntu and simple backup script

This is more of a note to myself than anything else but might be helpful to others. Here are the config and data directories for Nagios when installed using packages on Ubuntu 12.04.

Config files

Plugin executables

Graphing (pnp4nagios)


Here is a very simple backup script for Nagios on Ubuntu 12.04

Monday, January 28, 2013

Relaying Postfix through AuthSMTP on an alternate port

AuthSMTP is an authenticated SMTP relay service that you can use with web applications or in any situation where you need to send outbound e-mail. Because it is an authenticated service it is a little trickier to configure Postfix to relay through it. I found this post really helpful in configuring the SASL options, but one thing I couldn't find a clear answer on was how to use a port other than 25 for the relay host. AuthSMTP offers alternative ports (23, 26, 2525) for SMTP because some ISPs block port 25. To use an alternative port just put a colon after the host name and add the port number. Like this (in main.cf):

relayhost = mail.authsmtp.com:2525

The entry in your sasl-passwords file must match the relayhost name like this:

mail.authsmtp.com:2525 username:secretpassword

Just a quick tip about something that wasn't obvious to me and hopefully this helps out someone else.