Monday, August 1, 2011

Using Twitter to assist in troubleshooting


This might already be common practice for some people, but I have recently added Twitter to my sysadmin toolbox. I have never been a Twitter user and have no real use for it on a day-to-day basis, but it is useful for quick information updates. If there is one thing people love to do on Twitter, it is instantly complain about stuff.

Today I was trying to track down a strange internet issue. My company is a SaaS provider, so I am very sensitive to any internet hiccups. We were seeing packet loss to various locations on the internet, and I was seeing it from two different locations on two different internet providers. My monitoring system was alerting that some of our European-hosted web sites were not responding. During my testing I was able to reach www.google.com but not www.cnn.com, and testing other sites gave me about a 50% success rate. By the time I had picked some sites to start running traceroutes against, the issues had cleared up.
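A quick way to put a number on that kind of hit-or-miss reachability is to attempt a TCP connection to port 80 on each site and count the successes. Below is a minimal Python sketch, assuming a plain TCP connect is a good enough proxy for "can I reach this site"; the site list, port, and timeout are illustrative assumptions, not the exact tests described above. It doesn't replace traceroute, but it quickly tells you whether the problem is one site or a lot of them.

```python
import socket

# Illustrative list of sites to probe (assumption); substitute your own hosts.
SITES = ["www.google.com", "www.cnn.com", "www.example.com"]


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts all subclass OSError.
        return False


def success_rate(results):
    """Return the fraction of successful checks in a {host: bool} mapping (0.0 if empty)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


if __name__ == "__main__":
    results = {host: is_reachable(host) for host in SITES}
    for host, ok in results.items():
        print(f"{host}: {'OK' if ok else 'FAILED'}")
    print(f"Success rate: {success_rate(results):.0%}")
```

Running it from each of your internet connections and comparing the results makes it easier to tell a local problem from a wider upstream one.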

I decided to search Twitter to see if other people were having the same problems. My first search was for the phrase 'internet issues'. Of course, there was a lot of garbage, but a few results were of interest:

"Update: A major internet backbone carrier had issues that caused problems for internet users across the country."

"Internet Issues - We are currently experiencing some issues trying to get to some web sites including... http://tumblr.com/xpn3u0wug7"

"Looks like internet in #austin is having issues. My cell phone and work internet are not pulling up certain sites."


Our colocation provider told us they suspected Level 3 was having issues, so I searched Twitter again for 'Level 3'. Bingo:

"Looks like Level3 was the cause (trending topic #level3) - backbone outage broke a lot of links"

"#level3 seems to be having issues at this hop: ae-63-63.ebr3.atlanta2.level3.net"

"Seeing network outage #level3 #losangeles traceroutes fail from San Diego after ae-72-72.ebr2.LosAngeles1.Level3.net @level3 any status?"


Telecom and data providers aren't always forthcoming when they have problems, especially when they screw up internally. They will quickly admit it when a construction crew digs up a fiber cable, but if one of their engineers fat-fingers a config, they clam up. Twitter has helped me track down the source of problems that aren't directly related to us, which lets me tell our customers that a particular ISP is having issues and not us.




http://twitter.com/#!/search?q=%23level3