Filed under: Performance Monitoring
When you are tasked with finding and fixing those pesky slow pages in your web app, you might have a hard time deciding where to start. Identifying the top performance bottleneck is difficult, since many factors contribute to slow server response time: slow application logic, poorly indexed database queries, a myriad of frameworks and libraries, or even network conditions. If you do not have access to tools that give you insight into the core layers of your server stack, you might have to rely on experience to come up with a list of suspects. An alternative "poor man's" approach I have used is to examine any complicated logic you are already familiar with: business logic in the application stack, and lengthy DB queries. Obviously this approach is subpar, and it depends on the range of skill and expertise one possesses. Anyway, let's set that aside and focus on what you should do once you have applied a few optimization tricks to your code.
Once you've taken a pass over the code, now what? The page seems to be running faster, but how fast? How should you quantify the improvement? Browsers today expose various timing information about loading your web app. Subtracting the network latency from the time it takes to load enough HTML to begin rendering should net you the server response time. However, a single measurement is not enough to determine the actual impact, especially when you are trying out more than one optimization. Ideally you would measure the response time over a reasonably large number of requests, and within a few minutes, so you can iterate through changes and find the combination that maximizes the gain.
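Before reaching for a benchmarking tool, you can get that one-off measurement from the command line: curl exposes per-request timing through its -w write-out variables (time_connect, time_starttransfer, time_total). A minimal sketch, where the URL argument is whatever endpoint you are tuning:

```shell
#!/bin/sh
# Print connection and time-to-first-byte stats for a single request.
# Usage: ./ttfb.sh <url>
# With no argument it falls back to a local no-op target so the script
# runs anywhere; substitute the page you actually care about.
url="${1:-file:///dev/null}"

# time_starttransfer is the time until the first byte of the response
# arrives (TTFB); time_connect covers connection setup only.
curl -s -o /dev/null \
     -w 'connect: %{time_connect}s\nTTFB:    %{time_starttransfer}s\ntotal:   %{time_total}s\n' \
     "$url"
```

TTFB minus connect is a rough stand-in for server response time, but as noted above, one sample tells you very little; that is where Apache Bench comes in.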
$ ab -c 5 -t 60 http://my-test-vm:8080/webapp/slow-end-point
Firing 5 concurrent requests non-stop for 60 seconds
Apache Bench is a simple command-line tool that fires requests at a URL and reports how fast the web application can process them [Documentation]. It was originally designed to measure the performance of the Apache HTTP Server, but the tool is equally handy for looking at the server response time of any particular URL. Here is the result from one of the slowest pages I have worked on in the past:
Concurrency Level:      5
Time taken for tests:   300.386 seconds
Complete requests:      383
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      4809673395 bytes
HTML transferred:       4809544035 bytes
Requests per second:    1.28 [#/sec] (mean)
Time per request:       3921.491 [ms] (mean)
Time per request:       784.298 [ms] (mean, across all concurrent requests)
Transfer rate:          15636.36 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4   1.9      4       8
Processing:  2080 3901 835.9   3871    6720
Waiting:      444 1817 714.7   1697    4216
Total:       2086 3905 835.1   3872    6723

Percentage of the requests served within a certain time (ms)
  50%   3871
  66%   4248
  75%   4460
  80%   4587
  90%   4960
  95%   5301
  98%   5886
  99%   6204
 100%   6723 (longest request)
The first section shows a summary of the test run. One thing to watch under this section is the number of failed requests. Note that Apache Bench counts a response as a "Length" failure whenever its size differs from the first response it received, which is common for dynamic pages; newer versions of the tool accept an -l flag to suppress that check. The second section contains the timing information: "Connect", "Processing" and "Waiting". "Connect" is the network latency plus the overhead of establishing a connection with the remote host. "Waiting" is the time between sending the last byte of the request and receiving the first byte of the response. "Processing" is the time between sending the request and receiving the last byte of the response, so it includes the transfer of the response body. The last line, "Total", is the complete wall-clock time of a request, and can be derived by summing "Connect" and "Processing".
What we really care about here is "Processing", which approximates the amount of time the server takes to handle a single request. The mean (average) is the key metric for tracking your progress as you improve performance, but the other metrics are equally important because they validate your test. The standard deviation measures how far apart the data points are: a smaller value implies that the server handles the bursts of requests sent from AB fairly consistently. If you run a test that lasts more than ten minutes, you might encounter a few outliers that greatly exceed the average, and they can have a drastic effect on both the mean and the standard deviation. To catch rare outliers that might skew the mean, verify that the maximum is not too far off from the average. The two remaining metrics, the median and the minimum, are best looked at together: dividing the median by the minimum yields a rough idea of how much the typical request varies. A low median/minimum ratio (with an ideal value of 1) confirms that the data points are clustered around the median, a good sign that the server is operating normally.
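The two sanity checks above are quick arithmetic on the "Connection Times" row. As a sketch, here they are computed with awk from the Processing line of the slow run (in practice you would pipe ab's output in, e.g. `ab ... | awk ...`):

```shell
# Column order in ab's Connection Times rows: label min mean [+/-sd] median max.
# The sample line below is copied from the slow run shown earlier.
printf 'Processing:  2080 3901 835.9   3871    6720\n' |
awk '/^Processing:/ {
    # median/min close to 1  => typical requests are consistent
    printf "median/min ratio: %.2f\n", $5 / $2
    # max/mean far above 1   => outliers are dragging the mean around
    printf "max/mean ratio:   %.2f\n", $6 / $3
}'
```

For this run the median/min ratio is about 1.86 and max/mean about 1.72, consistent with the wide spread visible in the percentile table.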
If you see abnormal values in these supporting metrics, you should tone down the concurrency level. Another common problem is running AB on the same machine that hosts the web server. I recommend running AB on a separate host, so that the benchmark tool itself does not affect the performance of the server.
Here is the final measurement from the same page, after I made a few improvements to both the application layer and the database queries. The average time per request dropped from roughly 4 seconds to 1.4 seconds. The standard deviation is significantly lower, implying that the server processes all requests in a timely manner, as evidenced by the smaller gap between the 50th and 90th percentiles.
Concurrency Level:      5
Time taken for tests:   300.225 seconds
Complete requests:      1092
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      13702081248 bytes
HTML transferred:       13701851088 bytes
Requests per second:    3.64 [#/sec] (mean)
Time per request:       1374.657 [ms] (mean)
Time per request:       274.931 [ms] (mean, across all concurrent requests)
Transfer rate:          44569.68 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.9      1      12
Processing:   759 1369 184.1   1358    2401
Waiting:        2    5   2.8      4      30
Total:        760 1371 184.4   1361    2401

Percentage of the requests served within a certain time (ms)
  50%   1361
  66%   1425
  75%   1479
  80%   1510
  90%   1602
  95%   1681
  98%   1767
  99%   1871
 100%   2401 (longest request)
- Make sure to test the target URL first with a tool like curl, to avoid running your tests against a 404 or a redirect.
- If the target URL is behind an authentication wall, sign in to your application through a browser first to obtain the session cookie, then use the -C option to attach that cookie to each request.
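Both tips can be sketched in a couple of commands. The cookie name and value below (JSESSIONID=abc123) are placeholders: copy the real pair from your browser's dev tools after signing in, and substitute your own endpoint for the example URL.

```shell
# 1. Check the status code first; anything other than 200 means the
#    benchmark would be measuring a 404 page or a redirect instead.
url="http://my-test-vm:8080/webapp/slow-end-point"   # substitute your endpoint
code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
echo "status: $code"

# 2. Once the URL checks out, attach the session cookie and re-run.
#    ab's -C option adds a "Cookie:" header with the given name=value.
# ab -c 5 -t 60 -C 'JSESSIONID=abc123' "$url"
```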