TraceView by Example: LAMP Edition
January 23, 2014

Filed under: Application Performance Management, Performance Management Tech


You have a web app, and you care about its performance. How does TraceView look at what it’s doing and help you optimize, troubleshoot, and save time and money? In this blog post, we’ll show you how to review the structure of a simple LAMP stack, what insights you can get right out of the box with TraceView, and what kinds of problems you can easily diagnose and resolve.

Our Guest Today: LAMP

You may be familiar with this acronym: Linux, Apache, MySQL, and PHP. Of course, TraceView works with many different stacks, but LAMP is a very common web application engine. Here’s what the stack looks like, assuming that your database is running on a separate host, either physically or virtually:

[Figure: basic LAMP stack]

Hosts + Layers = Apps

Before we get our hands dirty, let’s take a moment to break down the LAMP stack into various parts.

  • Hosts: Web applications are software running on machines, be they physical or virtual. In this example, the Hosts are alice and bob. Hosts form the underpinning of performance with the resources they provide for the app to consume.
  • Layers: Each component that is involved in servicing requests for your app is a Layer of the stack. In this example, Apache, PHP, and MySQL are each considered Layers. Typically, you might think of each service that runs in a distinct process as a different Layer.
  • Apps: Finally, your entire production LAMP stack is a logical grouping of software components and hosts that work together to serve requests. Let’s call this an App. In the simple LAMP case, everything is considered part of the same App. If you have a separate staging environment, that’s a separate App.
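To make the vocabulary concrete, here is a minimal sketch of the Hosts, Layers, and Apps model in Python. The class names and fields are illustrative only and are not TraceView's API. The placement of PHP on alice and MySQL on bob is an assumption based on the two-host layout described above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Layer:
    """One component that services requests, e.g. Apache, PHP, or MySQL."""
    name: str
    host: str  # name of the Host this Layer runs on


@dataclass
class App:
    """A logical grouping of Layers running on Hosts."""
    name: str
    layers: List[Layer] = field(default_factory=list)

    def hosts(self) -> Set[str]:
        """Return the names of all Hosts this App depends on."""
        return {layer.host for layer in self.layers}


# Assumption: the web tier (Apache + PHP) runs on alice, the database on bob.
production = App(
    name="production",
    layers=[
        Layer("apache", "alice"),
        Layer("php", "alice"),
        Layer("mysql", "bob"),
    ],
)

print(sorted(production.hosts()))
```

A separate staging environment would simply be a second `App` instance with its own layers and hosts.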

How Does TraceView Work?

How does TraceView gather data about Hosts, Layers, and Apps? Each is handled slightly differently. The following diagram illustrates automated instrumentation points in green:

[Figure: basic LAMP stack with automated instrumentation points shown in green]

  • Hosts: You want to know whether your app is starved for resources or overprovisioned. For this reason, the Tracelyzer agent, installed on each host, gathers metrics about utilization of CPU, memory, I/O, etc. The Tracelyzer is a lightweight daemon, provided as an APT or YUM package, that you can install on every host in your deployment that you want to monitor.
  • Layers: TraceView presents each Layer slightly differently depending on which features are interesting. For the MySQL Layer, we are probably interested in queries, tables, and databases; for Apache, we will want to see response codes and HTTP methods. TraceView follows requests through your stack, starting at the top layer, which is usually some kind of web server. To get visibility into the request starting at the web server, you install instrumentation in the form of a plugin module (e.g., an Apache or Nginx module). In the LAMP stack, this instrumentation watches for when requests arrive at Apache from clients, when Apache passes the request along to PHP, when PHP replies with content, and when Apache finally passes that back to the end user.

What about getting more detail about PHP and MySQL? You can go even deeper if you install instrumentation for the next layer down the stack: PHP. That instrumentation, provided as a PHP extension, listens for incoming requests (e.g., when Apache hands them off), records interesting events such as calls to MySQL, errors, and more, and then notes when PHP finishes processing the request and returns control to Apache. Because the PHP instrumentation records the queries it executes, there’s no need to run a modified MySQL server.

  • Apps: An app is a logical grouping of Layers running on Hosts. It’s defined by you, the user, to help you keep track of everything you care about. By default, all new layers are added to an app called “default.” User-created apps are defined by their entry point: the point at which requests enter the stack. In our example, that is Apache running on alice. After you have defined the entry points for an app, TraceView will follow requests passing through the entry point and automatically populate the remaining hosts and layers in your stack. Because the discovery happens automatically, TraceView understands when multiple Apps are sharing Host or Layer resources (e.g., a database). A TraceView account can support an unlimited number of apps, which can be useful for segmenting your performance data across different environments.
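The layer-by-layer handoffs described above can be pictured as a sequence of entry and exit events. The sketch below is a simplified illustration in Python, not TraceView's actual event format: the tuple layout, the timestamps, and the function name `self_time_by_layer` are all made up for the example. It shows how such events are enough to work out how long a request spent in each Layer.

```python
from collections import defaultdict

# Hypothetical events for one request: (timestamp in ms, layer, "entry" or "exit").
# Assumed to be listed in time order.
trace = [
    (0, "apache", "entry"),   # request arrives from the client
    (40, "php", "entry"),     # Apache hands the request to PHP
    (55, "mysql", "entry"),   # PHP issues a query
    (95, "mysql", "exit"),    # query result comes back
    (120, "php", "exit"),     # PHP returns content to Apache
    (125, "apache", "exit"),  # Apache responds to the end user
]


def self_time_by_layer(events):
    """Return ms spent in each layer, excluding time spent in layers it called."""
    totals = defaultdict(int)
    stack = []          # layers currently open, outermost first
    previous_ts = None  # timestamp of the previous event
    for ts, layer, kind in events:
        if stack:
            # Time since the previous event belongs to the innermost open layer.
            totals[stack[-1]] += ts - previous_ts
        if kind == "entry":
            stack.append(layer)
        else:
            stack.pop()
        previous_ts = ts
    return dict(totals)


print(self_time_by_layer(trace))
# {'apache': 45, 'php': 40, 'mysql': 40}
```

In this made-up trace, 40 of Apache's 45 ms elapse before PHP is entered. That kind of gap is only visible because measurement starts at the web server.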

Solving Performance Problems

All of this information comes together in TraceView to help you solve web application performance problems. Here are a few examples of finding problems in a LAMP stack.

  • Request Queuing: One common tuning problem is getting the correct number of worker processes/threads in the application layer. When this number is too low, it can cause requests to queue in the web server, a bottleneck that adds latency to every request. Because TraceView starts monitoring requests at the web server level, it’s easy to catch this problem: if too much time is being spent in Apache before the request is handed off to PHP, this is the likely culprit.

    In the picture above, we can see that a significant amount of time was spent in the webserver before it was able to pass the request to a worker thread at the app layer.


  • Bottlenecks: What’s the worst leg of the critical path for a given request? We watch every query made in each trace, which lets TraceView show you the low-hanging fruit in terms of slowest queries, frequently executed queries, and queries that keep users waiting the longest overall. The same goes for remote service calls, methods in your app, and more!
  • Query Loops: It’s generally faster to retrieve a set of rows from a table all at once rather than making a separate query round trip for each. However, an unfortunately common design pattern is to fetch a list of items, then execute some other query (e.g., to get an attribute of the item) once for each. This can be hard to catch if you’re just looking at slow query logs, because each individual query is generally quick. But they start to add up. Because TraceView watches each call that a request makes, it’s easy to isolate and optimize query loops.
  • Underprovisioning: Easy juxtaposition of latency graphs and host-level metrics allows you to quickly determine if bad latency is related to lack of machine resources. That’s why we pull in host data along with tracing data:

    Look at how the latency rise coincides with increased disk latency. Perhaps you shouldn't be backing up from your web nodes?

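To illustrate the query loop pattern from the list above, here is a small, self-contained sketch. It uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, purely so the example runs anywhere. The `items` and `prices` tables and both function names are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE prices (item_id INTEGER, price REAL);
    INSERT INTO items VALUES (1, 'apple'), (2, 'pear'), (3, 'plum');
    INSERT INTO prices VALUES (1, 0.5), (2, 0.75), (3, 0.25);
""")


def prices_with_query_loop(conn):
    """Fetch the item list, then run one extra query per item (N + 1 queries)."""
    items = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
    result = {}
    for item_id, name in items:
        row = conn.execute(
            "SELECT price FROM prices WHERE item_id = ?", (item_id,)
        ).fetchone()
        result[name] = row[0]
    return result


def prices_with_join(conn):
    """Fetch the same data with a single query."""
    rows = conn.execute(
        "SELECT items.name, prices.price FROM items "
        "JOIN prices ON prices.item_id = items.id ORDER BY items.id"
    ).fetchall()
    return dict(rows)


# Both approaches return the same data; only the number of queries differs.
print(prices_with_query_loop(conn) == prices_with_join(conn))
```

With three items the loop issues four queries and the join issues one. Against a database on a separate host, each extra query also costs a network round trip, which is why many individually fast queries can still add up to a slow request.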

With all this, what are you waiting for? Sign up for your free account today, and start monitoring the full stack!