The TraceView team has been busy spending time with our users, especially our users in operations who are responsible for making sure their apps can deal with fluctuations in traffic. We are happy to announce that on top of recent UI updates, we are rolling out a chart of the total number of requests to your app. You’ll see it appear beneath the latency charts on the main app server page once you’ve updated your instrumentation.
This chart exposes a new total request count that helps you see how increases in traffic affect your app. It lets you correlate latency in your app with traffic spikes and identify resources that deteriorate under load.
- New Users: Relax, there will be no work to be done! You’ll automatically be set up with the new instrumentation.
- Existing Users: Upgrade your instrumentation from the Settings (gear icon) > Instrumentation menu item in TraceView and then restart the services that make up the application.
Updated: Users of all supported languages can now use updated instrumentation to see total request counts.
Any time period that includes the new request count will begin to show data in a few places, detailed below.
For apps that have request counts, the total requests will be shown in place of the call volume on the Overview page.
Below the Layer Breakdown chart for each app, you will now see the total requests above the call volume, once the setup steps have been followed. Please note that filtering data to specific layers will hide this chart, because total requests are counted at the app level.
As with the Layer Breakdown, unfiltered results will display both the total request count and the number of traces collected below the heatmap.
I’d like to discuss a point that is bound to arise with the juxtaposition of the total requests and number of traces collected. To keep overhead to a minimum, TraceView samples your data. To explain, it’s best to look more deeply at the way our instrumentation agents work. Our language agents report data by randomly sampling requests coming into the system.
Most monitoring tools report data differently: they view every request and, once a minute, send a group of statistics and roughly 10 interesting traces. Getting those “interesting traces” requires deciding in real time what to trace on your system, which leads to increased overhead and far less data to review.
TraceView aims to trace a statistically representative subset of total requests, which typically means closer to 1,000 traces in that same one-minute period. TraceView can also stream this data to the reporting dashboards much faster, because we don’t have to wait on minute-by-minute sampling decisions.
For high-throughput apps, meaning those with a high request count, TraceView’s model scales to provide thousands of traced requests while remaining at less than 1% overhead.
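To make the gap between total requests and traces collected concrete, here is a minimal Python sketch of uniform random sampling. It is an illustration only, not TraceView's agent logic: the function name `simulate_minute`, the 2% sample rate, the 50,000 requests per minute, and the latency distribution are all assumptions chosen for the example.

```python
import random
import statistics


def simulate_minute(total_requests, sample_rate, seed=0):
    """Simulate one minute of traffic with uniform random sampling.

    Returns (all_latencies, traced_latencies). Every request is counted,
    but only a random subset is traced.
    """
    rng = random.Random(seed)
    all_latencies = []
    traced_latencies = []
    for _ in range(total_requests):
        # Hypothetical latency distribution in milliseconds (assumption).
        latency_ms = rng.lognormvariate(4.0, 0.5)
        all_latencies.append(latency_ms)
        # Each request is traced independently with probability sample_rate.
        if rng.random() < sample_rate:
            traced_latencies.append(latency_ms)
    return all_latencies, traced_latencies


# Assumed numbers: 50,000 requests in a minute, 2% of them traced.
all_lat, traced = simulate_minute(total_requests=50_000, sample_rate=0.02)

print("total requests:  ", len(all_lat))
print("traces collected:", len(traced))
print("mean latency, all requests (ms):", round(statistics.mean(all_lat), 1))
print("mean latency, traces only (ms): ", round(statistics.mean(traced), 1))
```

Under these assumptions the trace count comes out far smaller than the total request count, while the mean latency of the traced subset stays close to the mean over all requests. That is why the two numbers differ on the charts without the traces being unrepresentative.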
We hope the new graph helps you identify any resources that struggle under high traffic even more quickly. If you have any feedback, we’d love to hear it: you can email email@example.com, tweet @AppNeta, or leave a comment below!