IN PRACTICE | This article is part of a series of posts sharing examples of how AppNeta users have leveraged the service to solve performance problems.
Transient issues can be a huge problem on large network infrastructures. Not only are they hard to catch, but isolating the root cause is even harder. A large US telecommunications customer continuously monitors the VoIP performance of its 200+ call centers distributed around the US to look for these types of issues. When six call centers experienced intermittent outages using their Avaya voice system, the Director of Voice Technology used the AppNeta Performance Manager to quickly pinpoint the cause. In this post we’ll take a look at his workflow to isolate these ghost issues.
This is exactly why we bought this product, it took us 45 minutes to fix this issue that would have probably taken 2 hours without it. We would have had to open a ticket with Avaya and wait for their response, but we didn’t have to do that because of AppNeta.
-Director, Voice Technology
It’s important to remember that it is far more difficult to isolate these types of issues if you don’t have some sort of continuous monitoring in place. The Director we spoke with has been using AppNeta for a while, and a big reason he was able to investigate this issue so quickly is the forethought he put into setting up his AppNeta deployment to monitor the exact routes his VoIP traffic takes through the production environment.
To avoid a single point of failure, the voice system for each group of call centers gets its network connectivity from a pair of load-balanced data switches. This was set up so that if either switch goes down, only half the call centers in the group are impacted.
Two AppNeta Monitoring Points (both with dual LAN interfaces) are racked in the datacenter and connect to the voice system’s A and B switches. Each of the call centers also has a monitoring point, which enables continuous monitoring of the inbound and outbound routes that production traffic takes through each data switch, between the call center and the VoIP server. Depending on which port a particular path is targeting within the AppNeta Performance Manager, the voice team can quickly identify which of the load-balanced switches is being monitored.
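To illustrate how a target port can identify the switch a path monitors, here is a minimal Python sketch. It is not AppNeta's API or configuration format: the port numbers, switch labels, and path names are all hypothetical, and the path list is assumed to be available as simple (name, target port) pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping: which target port corresponds to which data switch.
# These port numbers are made up for illustration only.
PORT_TO_SWITCH: Dict[int, str] = {3001: "switch-A", 3002: "switch-B"}


def group_paths_by_switch(paths: List[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Group path names by the switch implied by each path's target port."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, target_port in paths:
        # Ports missing from the mapping are collected under "unknown".
        switch = PORT_TO_SWITCH.get(target_port, "unknown")
        groups[switch].append(name)
    return dict(groups)


# Hypothetical monitored paths as (path name, target port).
example_paths = [
    ("callcenter-01 to voice", 3001),
    ("callcenter-02 to voice", 3001),
    ("callcenter-07 to voice", 3002),
]
grouped = group_paths_by_switch(example_paths)
```

With a grouping like this, a cluster of failing paths that all fall under one switch label stands out immediately.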
How AppNeta Helped Troubleshoot
AppNeta can be configured to proactively alert if a path’s network performance degrades or the path loses connectivity to the target.
Since his paths are grouped within the AppNeta Performance Manager by the relevant VoIP switch, it was easy to quickly see that a group of six call centers had lost connectivity to the voice server.
Clear naming conventions quickly indicated that the paths all went through the same data switch.
Reviewing the route analysis for each of the disconnected paths confirmed that they all traversed the same data switch and provided its hostname and IP address.
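The route comparison described here boils down to finding the hops that every disconnected path shares and that no healthy path uses. The following is a minimal Python sketch of that idea, not AppNeta's implementation. It assumes route data has already been exported as ordered lists of (hostname, IP) hops; the path names, hostnames, and addresses are invented for illustration, with the addresses drawn from reserved documentation ranges.

```python
from typing import Dict, List, Set, Tuple

Hop = Tuple[str, str]  # (hostname, ip address)


def suspect_hops(failed: Dict[str, List[Hop]],
                 healthy: Dict[str, List[Hop]]) -> Set[Hop]:
    """Return hops on every failed route that appear on no healthy route."""
    failed_sets = [set(hops) for hops in failed.values()]
    if not failed_sets:
        return set()
    # Hops common to all failed routes.
    common = set.intersection(*failed_sets)
    # Hops that healthy routes still traverse are unlikely culprits.
    seen_healthy: Set[Hop] = set()
    for hops in healthy.values():
        seen_healthy.update(hops)
    return common - seen_healthy


# Hypothetical route data for illustration only.
failed_routes = {
    "callcenter-01 to voice": [("cc01-gw", "192.0.2.1"),
                               ("data-sw-a", "198.51.100.10"),
                               ("voice-srv", "203.0.113.5")],
    "callcenter-02 to voice": [("cc02-gw", "192.0.2.2"),
                               ("data-sw-a", "198.51.100.10"),
                               ("voice-srv", "203.0.113.5")],
}
healthy_routes = {
    "callcenter-07 to voice": [("cc07-gw", "192.0.2.7"),
                               ("data-sw-b", "198.51.100.11"),
                               ("voice-srv", "203.0.113.5")],
}
suspects = suspect_hops(failed_routes, healthy_routes)
```

Subtracting the hops seen on healthy routes keeps the shared voice server out of the result, so what remains is the one device the failing paths have in common.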
Reduced Time to Resolution
This customer’s voice and data networks are supported by different internal groups that do not share visibility into infrastructure.
The voice network team opened a ticket with the data network team for further investigation into the impacted data switch. When the data network team looked at the switch, they didn’t see any errors or port issues. Before AppNeta monitoring data was available, the data network team would typically instruct the voice team to open a ticket with Avaya, believing the issue must be with the VoIP infrastructure. This would involve a lengthy back-and-forth process between the voice team, Avaya, and the network team.
AppNeta was able to significantly improve time to resolution as it empowered the voice team with route history from the perspective of the AppNeta monitoring point behind the data switch. Armed with this data, the data team was convinced to restart the data switch, which resolved the problem!
The voice team still doesn’t know what the underlying cause of the data switch failure was, but an issue that would usually take two or more hours of back and forth between the teams was fixed in under 45 minutes. Having visibility into the data network “negates a lot of the finger pointing because we have more of the actual data to work with.”
Not every troubleshooting story is going to end with infrastructure-changing revelations, but at big enterprises sometimes knowing what isn’t at fault can be as important as knowing what is.