
Full Stack Load Testing in Boston


We had a great turnout last night at the Boston Web Performance Meetup, where we talked about full-stack load testing. Performance testing is funny because, unlike a lot of correctness testing, it frequently relies on having a full replica of production, with production-level data, servers, and more.

The inspiration for this talk was really tcpreplay, a packet replay tool used for testing networks and packet monitoring tools like FlowView. Unlike most web app load tools, which rely on scripted interactions, tcpreplay uses packet captures to play back real traffic at a variable rate. It even includes an option for rewriting IPs on the fly, which can stress data collection by increasing the number of distinct dimensions it has to track.
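To make that concrete, here's a minimal sketch of driving a replay from Python. It assumes tcpreplay and tcprewrite are installed, and the capture file, interface name, seed, and rates are all placeholders; this is one way to ramp load rather than the exact workflow from the talk.

```python
"""Sketch: replay a packet capture at increasing rates with tcpreplay.

Assumes tcpreplay/tcprewrite are on the PATH. 'capture.pcap' and 'eth0'
are placeholders for your own capture and test interface.
"""
import subprocess

CAPTURE = "capture.pcap"   # hypothetical capture of real traffic
INTERFACE = "eth0"         # hypothetical replay interface

# Optional pre-processing: randomize IPs so the system under test sees many
# distinct endpoints (this is what stresses per-IP dimensions downstream).
subprocess.run(
    ["tcprewrite", "--seed=42",
     f"--infile={CAPTURE}", "--outfile=rewritten.pcap"],
    check=True,
)

# Replay the rewritten capture, ramping the rate each round.
for mbps in (10, 50, 100):
    print(f"Replaying at {mbps} Mbps...")
    subprocess.run(
        ["tcpreplay", f"--intf1={INTERFACE}", f"--mbps={mbps}",
         "rewritten.pcap"],
        check=True,
    )
```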

Check out the slides here:

There are a couple of key features of this approach that I think are critical in load testing web apps:

  • Control the load directly: Many tools only do this incidentally (by number of connections), so it's important to decide which primary scaling metric you actually care about. Is it number of users? Logged-in users? Searches per minute? This depends on your application, so you can't skimp on thinking about it.
  • Use real traffic: Scripting is easy the first time, but it’s fragile over time, which can lead to blind spots. Sure, your ecommerce site performs like a champ when there are 10,000 concurrent users viewing products, comparing recommendations, and checking out … but it falls over when there are 15 people searching at once. Whoops.

    If you have RUM data available (e.g., from http://appneta.com/products/appview), you can use it as the basis for your load tests. That gives you both a stable set of data and something you can trust is representative; a simple replay loop along these lines is sketched after this list.

  • Push on the sticky points: Knowing the secondary metrics that may cause you to fail is important. We found recently that while we're comfortable with the scalability of our trace-processing pipeline as you add more data, we had less experience with what happens when individual events take longer. In this specific case, S3 was slow and/or unavailable, which caused issues in the pipeline. Knowing that this is a failure mode can inform future load tests; a simple way to simulate it is sketched below.
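Here's a rough sketch of the first two ideas: replay real request paths (exported from RUM data or access logs) at a rate you control directly. The target host, log file name, and requests-per-minute knob are placeholders; swap in whatever primary scaling metric matters for your app.

```python
"""Sketch: replay real request paths against a test environment at a fixed rate.

'requests.log' (one URL path per line, captured from production traffic),
the target host, and the rate are all placeholders.
"""
import time
import urllib.request

TARGET = "http://staging.example.com"   # hypothetical test environment
REQUESTS_PER_MINUTE = 600               # the load knob you control directly

interval = 60.0 / REQUESTS_PER_MINUTE

with open("requests.log") as f:
    for line in f:
        path = line.strip()
        if not path:
            continue
        start = time.time()
        try:
            with urllib.request.urlopen(TARGET + path, timeout=10) as resp:
                status = resp.status
        except Exception as exc:
            status = f"error: {exc}"
        print(f"{path} -> {status} in {time.time() - start:.3f}s")
        # Pace requests so the replay rate, not the client, sets the load.
        time.sleep(max(0.0, interval - (time.time() - start)))
```

And for pushing on sticky points, one way to simulate a slow dependency is to wrap it and add latency per call. This uses boto3 purely as an illustration of the idea; the delay value and the wiring into your pipeline are assumptions, not our actual setup.

```python
"""Sketch: make individual dependency calls slower to probe a failure mode.

Requires boto3; the extra latency is an arbitrary placeholder.
"""
import time
import boto3

EXTRA_LATENCY_S = 2.0   # artificial slowness added to every call


class SlowS3:
    """Delegates to a real S3 client but adds latency to each get_object call."""

    def __init__(self, delay=EXTRA_LATENCY_S):
        self._client = boto3.client("s3")
        self._delay = delay

    def get_object(self, **kwargs):
        time.sleep(self._delay)              # simulate a slow backend
        return self._client.get_object(**kwargs)


# During the load test, hand SlowS3() to the pipeline instead of the real
# client and watch queue depths and event latency as the delay grows.
```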
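The same wrapping trick works for any dependency, not just S3: a database, a cache, or an internal service can be slowed down the same way to see which secondary metric gives out first.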

Thanks to everybody who came out last night, and we look forward to seeing you next month!

TR Jordan: A veteran of MIT’s Lincoln Labs, TR is a reformed physicist and full-stack hacker – for some limited definition of full stack. TR still harbors a not-so-secret love for Matlab-esque graphs and half-baked statistics, as well as elegant and highly-performant code.