Ensuring the Performance of Today’s Distributed Systems
Since performance is our niche, we’re always on the lookout for any articles, reports or blogs on the topic. A recent one that got our attention is “The Expanding Role of Application Performance Management,” written by Sebastian Kruk and published in CRM magazine. The article came with the sub-head, “Understanding and monitoring the end-user experience” – our bailiwick.
The article mainly focuses on the monitoring of ERP systems and why that’s important. Its most interesting point is that as an ERP system expands, serves more users around the world and becomes more complex in order to accommodate that workload, monitoring becomes increasingly important for all the applications and components that are part of that system. Although the ERP system is typically a packaged application, it has become a more complicated, distributed backend system, and not just a single app that’s on a server somewhere.
Distributed Systems Everywhere
You could say this is true of all systems, not just ERP systems. A consumer application such as Amazon or Google, or an e-commerce or media store, isn’t considered a single application from the perspective of the development and engineering teams; it tends to be multiple systems that are all interrelated and all have specialized functions that come together to produce an end-user experience.
These systems become complicated as a result of their need to scale up. When the system has to handle a large number of users at the same time (e.g. 10,000 concurrent connections), you want to break out specific pieces of functionality and have different engineering teams support them. Breaking up the system into services can be a huge win for maintainability, because individual components can change as long as they don’t break their interface contracts. As a whole, though, the system gets enormously more complex, especially as different use cases emerge. The app admin, average user and power user are each going to have different usage patterns and create different loads on the system. Fortunately, these loads tend to show up in different services, so it makes sense to architect the system to let each team focus on the use case that’s most important to it.
A System-Level View
An important consideration that the article didn’t touch on is that the performance characteristics of distributed systems are also different: they tend to have more long-tail effects. If you rely on six different components that all talk to each other and all are mostly stable, then your end-user experience will be stable. If one of those components is slow, then your end-user experience will be a little slow, which isn’t surprising. However, if all six of those components are slow at once, which can happen when you have 10,000 users, then the end-user experience will be incredibly slow. The trick to resolving that is to notice when you have a particularly poor end-user experience and then backtrack to determine which underlying component or system caused it. It’s truly madness to try to keep every piece performing at 100% all of the time; you don’t have time to fix problems that never affect an end user.
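To make the backtracking idea concrete, here is a minimal sketch in Python. It assumes you already collect per-request timings for each component; the component names, the numbers and the 500 ms threshold are all made up for illustration, and summing the timings assumes the components are called one after another. The second function shows why more components mean more long-tail effects, under the simplifying assumption that components slow down independently.

```python
from collections import Counter

# Hypothetical per-request timings in milliseconds, keyed by component name.
# The component names and numbers are illustrative assumptions, not real data.
TRACES = [
    {"web": 20, "auth": 15, "catalog": 30, "cart": 25, "payment": 40, "db": 35},
    {"web": 22, "auth": 14, "catalog": 900, "cart": 27, "payment": 45, "db": 30},
    {"web": 25, "auth": 16, "catalog": 35, "cart": 24, "payment": 38, "db": 1200},
    {"web": 21, "auth": 15, "catalog": 850, "cart": 26, "payment": 42, "db": 33},
]


def slow_request_culprits(traces, threshold_ms):
    """For each request slower than the threshold, count its slowest component."""
    culprits = Counter()
    for trace in traces:
        # Assumes sequential calls, so end-user time is the sum of the parts.
        total = sum(trace.values())
        if total > threshold_ms:
            worst = max(trace, key=trace.get)
            culprits[worst] += 1
    return culprits


def chance_of_slow_component(p_slow, n_components):
    """Probability that at least one of n independent components is slow."""
    return 1 - (1 - p_slow) ** n_components


culprits = slow_request_culprits(TRACES, threshold_ms=500)
print(culprits.most_common())  # [('catalog', 2), ('db', 1)]

# If each of six components is slow 1% of the time, roughly 5.85% of
# requests touch at least one slow component.
print(round(chance_of_slow_component(0.01, 6), 4))  # 0.0585
```

The point of the sketch is the order of operations: start from the requests that were slow for the end user, then attribute each one to a component, rather than tuning every component in isolation.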
Start with the End User
All things considered, you need to start and end with the end-user experience. There is so much complexity in these systems that by looking at each individual component or backend app, you’re going to lose sight of what’s important. One of the trends that’s making this true, and making it easy to move to these distributed systems, is that technologies no longer sit in very distinct silos where they talk only to each other. It used to be that you bought SAP and SAP talked to SAP, or you wrote a piece of software and used open source libraries to create it and write its different services.
Today, systems no longer stand or function on their own. With people using an ERP system that’s backed by an open source Hadoop database for user analytics, which in turn interacts with custom code that does BI reporting on top of it, it’s increasingly important to do full-stack monitoring. Monitoring all components (network, web, application, etc.) ties everything together and lets you focus on improving the end-user experience of the entire system.