CDNs, POPs and SSDs: How the Fastly outage happened
by Paul Davenport

Reddit, Spotify, Hulu, Stripe, the New York Times, and countless other online properties went dark with a 503 error for almost an hour this week, showing how delicate the Internet’s physical infrastructure can really be.

The error brought down everything from streamers to fintech to news outlets. Fastly, the Content Delivery Network (CDN) provider, called the issue a “global CDN disruption” that wasn’t limited to a single data center.

Fastly’s senior vice president of engineering Nick Rockwell said in a blog post that the hour-long outage happened because a customer pushed a configuration change that triggered an undiscovered software bug. While Rockwell offered few specifics, the bug has been traced back to a May 12 software update that could cause CDN failure with a “specific customer configuration under specific circumstances.”

On June 8, a configuration change that met those bug-triggering conditions was pushed, causing 85 percent of Fastly’s network to return 503 errors to users.

“Even though there were specific conditions that triggered this outage, we should have anticipated it. We provide mission-critical services, and we treat any action that can cause service issues with the utmost sensitivity and priority,” Rockwell said.

So how do CDNs work in the first place, and why are their failures so far-reaching?

CDNs are a critical component of the larger Internet. CDN companies operate solid state drive (SSD) servers around the globe that improve the performance and availability of web services by caching data as close to end users as possible. Each data center hosting these servers acts as a regional point-of-presence (POP) that, combined with the other POPs, forms an “edge” network (in this case, operated by Fastly) that speeds up the delivery of sites to regional users.

For instance, the media content you consume (say, your New York Times front page) may be cached at a CDN POP server near you so that it doesn’t have to be retrieved from a far-flung origin server every time you load a web page.

So while a page could take hundreds of milliseconds to load when it’s being retrieved from a server on the other side of the world, a CDN can usually start sending the content of a page in less than 25 milliseconds when it’s already been cached. (This, in part, is how apps have continued to grow more complex without impacting the responsiveness for the end user.)
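To make the caching idea concrete, here is a minimal sketch of how a single POP might decide between serving a cached copy and going back to the origin. It is a toy model, not Fastly's implementation: the EdgeCache class, the fetch_from_origin callback, and the 60-second time-to-live are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one POP: serve from local cache, fall back to origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],  # hypothetical origin fetcher
        ttl_seconds: float = 60.0,                  # assumed time-to-live
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps URL -> (expiry time, cached body).
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            # Fresh copy at the edge: no round trip to the origin.
            self.hits += 1
            return entry[1]
        # Missing or expired: pay the slow trip once, then cache the result.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body


origin_calls = []


def slow_origin(url: str) -> bytes:
    """Stand-in for a far-away origin server; records each request."""
    origin_calls.append(url)
    return b"front page"


pop = EdgeCache(slow_origin)
pop.get("/front-page")  # miss: fetched from the origin
pop.get("/front-page")  # hit: served from the POP's cache
print(pop.hits, pop.misses, len(origin_calls))
```

Only the first request reaches the origin; every later request inside the time-to-live is answered locally, which is where the latency savings described above come from.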

Another way to understand CDNs is in relation to edge computing: in many enterprise contexts, CDNs are the WAN edge.

To avoid congestion at key points in the network, teams can use subnets (or VLANs) to segment traffic, which lets them route it more intelligently (and predictably) and reduce the load on the larger network. In a similar fashion, enterprises can deploy CDNs that serve external requests directly without impacting the performance of the larger WAN.
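As a rough illustration of the subnet side of that analogy, the sketch below uses Python's standard ipaddress module to carve one address block into four segments and work out which segment a host belongs to. The 10.20.0.0/22 block and the segment names are invented for the example.

```python
import ipaddress

# Hypothetical campus block, split into four /24 segments.
campus = ipaddress.ip_network("10.20.0.0/22")
segment_names = ["voice", "video", "guest", "office"]  # illustrative labels
segments = dict(zip(segment_names, campus.subnets(new_prefix=24)))


def segment_for(host: str) -> str:
    """Return the name of the segment containing host, or 'unassigned'."""
    addr = ipaddress.ip_address(host)
    for name, network in segments.items():
        if addr in network:
            return name
    return "unassigned"


for name, network in segments.items():
    print(name, network)
print(segment_for("10.20.2.17"))
```

Because each class of traffic lives in its own range, routing and capacity rules can be applied per segment instead of to the network as a whole.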

So the big takeaway is that despite all of the built-in resiliency that goes into making the Internet just work, it’s still largely supported by physical infrastructure that needs to be monitored and managed. While many enterprises strive to go “internet-first” in an attempt to reduce the amount of physical hardware their IT teams manage directly, these teams still need visibility into the environments that route and deliver traffic across the enterprise footprint to ensure end-user experience stays consistent.

When issues like this arise, understanding the scope of the outage from an enterprise perspective allows IT teams to identify the impact on their users and business. Gaining this visibility requires a comprehensive monitoring tool that can take a granular look at the network while placing minimal load on the network itself.
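One simple way to think about "scope" is the share of failed checks per service across all of an enterprise's locations. The sketch below assumes probe results have already been collected as (office, service, HTTP status) tuples; the outage_scope function, the office codes, and the sample data are hypothetical and do not represent any particular monitoring product.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Probe = Tuple[str, str, int]  # (office, service, HTTP status code)


def outage_scope(probes: Iterable[Probe]) -> Dict[str, float]:
    """Return, per service, the fraction of probes that saw a 5xx status."""
    total: Dict[str, int] = defaultdict(int)
    failed: Dict[str, int] = defaultdict(int)
    for _office, service, status in probes:
        total[service] += 1
        if 500 <= status <= 599:
            failed[service] += 1
    return {service: failed[service] / total[service] for service in total}


# Made-up sample: three offices checking two services.
sample = [
    ("nyc", "news", 503),
    ("nyc", "payments", 200),
    ("lon", "news", 503),
    ("lon", "payments", 200),
    ("sgp", "news", 200),
]
scope = outage_scope(sample)
for service, fraction in sorted(scope.items()):
    print(f"{service}: {fraction:.0%} of probes failing")
```

A summary like this separates "one office has a problem" from "a provider is failing almost everywhere," which is the distinction that mattered during the Fastly incident.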


