Twitter Goes Down #PathViewCloudFindsProblem
by Greg Zammuto, June 22, 2012

Filed under: Case Studies

I don’t think I was alone on June 21, 2012, when I went to check in on Twitter during lunch and, for some reason, could not connect. Hmmm, how is that possible? The Twittersphere can’t actually be down… or can it? My office in Boston was also experiencing some network problems, so I just assumed the Twitter issue was related to the Comcast problem. But when I checked our PathView Cloud service, which monitors www.twitter.com, I realized I was wrong and the carrier was not to blame.

Twitter is one of the many news, social media, and other high-traffic websites we monitor 24/7 with PathView Cloud for network performance. My engineering team and I have placed a couple of PathView microAppliances on each coast so our PathView Cloud service can monitor our network traffic and the performance between various locations. This is designed to catch performance issues and let us know when and why large websites go down.

Twitter Outage Analysis Overview

So, What Caused Twitter To Go Down?

Looking at PathView Cloud’s visual reporting charts (from 09:30 to 17:00 EST), the cause of the outage appears to be network related. The charts point to excessive packet loss, round-trip time (RTT), and latency. We identified six complete disconnects (1 hr 11 min), five latency events (24 min), and six packet loss events (25 min). Additionally, before each disconnect, packet loss jumped to ~60%.
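To make the event counts above more concrete, here is a minimal sketch of how disconnect and packet loss events could be tallied from a series of loss measurements. It is not how PathView Cloud computes its reports; the `Sample` type, the one-minute sampling interval, and the thresholds (100% loss for a disconnect, 5% for a loss event) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    minute: int      # minutes since the start of the observation window (assumed 1-minute spacing)
    loss_pct: float  # packet loss in percent; 100.0 means no replies at all


def find_events(samples: List[Sample],
                predicate: Callable[[Sample], bool]) -> List[Tuple[int, int]]:
    """Group consecutive samples that satisfy `predicate` into (start, end) events."""
    events = []
    start = None
    last = None
    for s in samples:
        if predicate(s):
            if start is None:
                start = s.minute      # a new event begins here
            last = s.minute
        elif start is not None:
            events.append((start, last))  # the event ended on the previous sample
            start = None
    if start is not None:
        events.append((start, last))      # event still open at the end of the window
    return events


def total_minutes(events: List[Tuple[int, int]], interval: int = 1) -> int:
    """Sum event durations, counting each sample as one `interval` long."""
    return sum(end - start + interval for start, end in events)


# Hypothetical data: loss climbs to 60% just before a disconnect, as described above.
samples = [Sample(m, loss) for m, loss in
           enumerate([0, 0, 60, 100, 100, 0, 30, 100, 0, 0])]
disconnects = find_events(samples, lambda s: s.loss_pct >= 100)
loss_events = find_events(samples, lambda s: 5 <= s.loss_pct < 100)
```

With real data, the number of events is `len(disconnects)` and the time spent disconnected is `total_minutes(disconnects)`; the same two calls work for latency events if the predicate checks a latency field instead.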

Twitter Downtime Analysis with PathView

Now To The Application Traffic…

Twitter Downtime Analysis Application Traffic

When it came to HTTP requests to the service, nothing out of the ordinary jumped out at me. My DNS was not experiencing any problems. Additionally, when we were able to connect to the web service successfully, HTTP first byte, last byte, and total response times were not a problem. One observation stood out, though: there appears to have been a period when the network was up, running, and responding, yet HTTP requests could not be processed. This leads me to believe the network was operational, though Apache (or whatever web server they use) was not running at that time.
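The distinction drawn here, between a host that is reachable and a web server that actually answers HTTP, can be sketched with a simple two-step probe. This is only an illustration using Python's standard library, not PathView Cloud's method. The function names `classify` and `probe`, the 5-second timeout, and the rule that a refused connection or a 5xx status counts as an application problem are assumptions.

```python
import http.client
import socket
from typing import Optional


def classify(host_reachable: bool, http_status: Optional[int]) -> str:
    """Map probe results to the layer that appears to be failing."""
    if not host_reachable:
        return "network"        # no answer at all from the host
    if http_status is None or http_status >= 500:
        return "application"    # host answers, but the web server does not serve requests
    return "ok"


def probe(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Check TCP reachability first, then try a lightweight HTTP request."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
    except ConnectionRefusedError:
        # The host replied and refused the connection: the path works,
        # but nothing is listening on the port.
        return classify(True, None)
    except OSError:
        # Timeouts, unreachable routes, and DNS failures all land here.
        return classify(False, None)

    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        status = None           # connected, but no valid HTTP response
    finally:
        conn.close()
    return classify(True, status)
```

Calling `probe("www.twitter.com")` on a schedule would yield one label per check. A run of `"application"` results while network-level measurements look healthy would match the pattern described above.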

Based on the performance reporting from PathView Cloud, the Twitter network was responsible for this outage.  This certainly shows us that all websites – even the most popular and highly visited ones – are very susceptible to network problems and performance interruptions!

Article Written By: Greg Zammuto