At 2PM EST yesterday (Tuesday, October 16th), I noticed something was a bit off with our virtual network performance management appliance hosted at GoGrid in California’s Bay Area. I say “a bit off,” as in the appliance appeared completely offline. I tried logging into GoGrid’s management portal, to no avail. By looking at the network paths we monitor to the virtual appliance, we were able to determine that the GoGrid access router was up, but the data center network connecting our host to that router was down. Hmmm. Although the broken connectivity meant we could no longer monitor to and from the data center, we were able to spot the blackout right as it was happening.
Normally, a loss of connectivity is preceded by adverse performance indicators such as jitter and packet loss. This incident was different because it was a complete network connectivity outage, as if someone had cut the cable or powered off a non-redundant element. A total GoGrid blackout. Nothing we saw in our monitoring before 2:00PM EST indicated that a blackout was on the way.
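To make the distinction between a gradual degradation and an abrupt blackout concrete, here is a minimal Python sketch of how one might label an outage from a series of path measurements. It is an illustration only, not how our appliance works: the `Sample` record, the `classify_outage` function, the five-interval look-back window, and the warning thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One monitoring interval for a network path (hypothetical record shape)."""
    loss_pct: float   # packet loss over the interval, 0 to 100
    jitter_ms: float  # jitter over the interval, in milliseconds


def classify_outage(samples: List[Sample], window: int = 5,
                    loss_warn_pct: float = 1.0,
                    jitter_warn_ms: float = 30.0) -> str:
    """Label the first total-loss interval as 'abrupt' or 'degraded'.

    Returns 'none' if no interval shows 100% packet loss.
    The window and thresholds are assumed values chosen for illustration.
    """
    for i, sample in enumerate(samples):
        if sample.loss_pct >= 100.0:
            # Look at the intervals immediately before the blackout.
            lead_up = samples[max(0, i - window):i]
            warned = any(p.loss_pct >= loss_warn_pct or p.jitter_ms >= jitter_warn_ms
                         for p in lead_up)
            return "degraded" if warned else "abrupt"
    return "none"
```

With clean readings right up to the moment of total loss, as in this incident, the sketch would report `abrupt`. If elevated loss or jitter showed up in the preceding intervals, it would report `degraded`.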
Connectivity was regained at 2:45PM EST, and we were once again able to begin monitoring to and from the data center.
Was your organization impacted by this outage? If so, what happened on your end?