
Tracing Black Boxes I: JMX Insight Into JVM Performance


As an APM sales engineer at AppNeta, I’m constantly exposed to new technologies, and over the past few months I’ve gained a lot of insight into some fairly obscure parts of the Java web ecosystem. Where else would I have the opportunity to learn about Cocoon, Felix, or Railo?

I’m primarily a Python developer, though – even some well-known features of Java are new to me! That’s why I’ve found myself so surprised by the amount of interest in our JMX support and the level of excitement over our recent release of it. What the heck is JMX, and why should I care that TraceView now supports it?

The first question was easy to answer: JMX stands for Java Management Extensions, and while Wikipedia’s page about JMX is a bit sparse, this IBM developerWorks guide provides a much better overview of the concept. The one-sentence takeaway about JMX is that it’s a set of tools for objects to expose monitoring or manipulation methods to external applications by registering themselves with a management server.
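
To make that concrete, here’s a minimal sketch of the standard-MBean pattern. The `RequestCounter` bean and its attributes are made up for illustration, but the registration call against the platform MBean server is the standard JMX mechanism: once registered, any JMX client can read the bean’s attributes or invoke its operations.

```java
// RequestCounterMBean.java: a hypothetical management interface. By JMX's
// standard-MBean convention, the interface name is the implementation's
// class name plus "MBean"; getters become attributes, other methods become operations.
public interface RequestCounterMBean {
    long getRequestCount();  // exposed as a readable attribute
    void reset();            // exposed as an invokable operation
}

// RequestCounter.java: the managed object, registered with the JVM's
// built-in management server so external tools can see it.
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

public class RequestCounter implements RequestCounterMBean {
    private volatile long count = 0;

    public void increment() { count++; }

    @Override public long getRequestCount() { return count; }
    @Override public void reset() { count = 0; }

    public static void main(String[] args) throws Exception {
        RequestCounter counter = new RequestCounter();
        ManagementFactory.getPlatformMBeanServer()
                .registerMBean(counter, new ObjectName("com.example:type=RequestCounter"));
        Thread.sleep(Long.MAX_VALUE);  // keep the JVM alive so a client can connect
    }
}
```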

What about the second question, though?

We collect detailed data about Tomcat, Spring, Webflow, and JBoss on requests to our PathView Cloud site.

Java applications are typically run as persistent processes that respond to requests. Within TraceView, we already collect a huge amount of data about how each layer handles the requests that pass through it. The request-level data found in a trace is best interpreted alongside contextual information like request volume or server load. But it turns out that if we try to monitor a Java application without JMX statistics, we’ll have a hard time providing anything beyond limited context. Check out this graph:

Using TraceView to monitor application memory usage on an AWS EC2 server running a Tomcat application.

Here we’re looking at a Java server’s memory usage holding steady at just under 200 MB. That’s because Java was the most resource-intensive application running on this server, and it didn’t request any additional memory from the operating system or free any requested memory for other applications. But while this graph is accurate, it’s missing out on any activity that takes place beneath the surface of the JVM. Check out this diagram taken from this excellent primer on JVM memory regions and tunable parameters:

Memory regions within the Java HotSpot JVM.

What we were seeing in the previous graph was OS memory allocation to the JVM, with no information on any of the layers beneath that. In this particular case, the memory allocated to the JVM remained constant, so we didn’t see any change in the graph over time. As it turns out, this is pretty common: it’s standard practice for Java applications to set their initial and maximum heap sizes to the same value so that all OS memory allocations occur on startup. The upshot of this is that until I start or stop a Tomcat application, the OS memory usage on my server will never change!
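
You can see the same thing from inside the process, because the heap’s committed and maximum sizes are themselves exposed through the platform’s `MemoryMXBean`. Here’s a rough sketch (the heap size in the flags is just an example): with `-Xms` and `-Xmx` set to the same value, the committed size matches the maximum from startup, which is exactly why the OS-level graph above is a flat line.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Run with matching initial/maximum heap sizes, e.g.:
//   java -Xms512m -Xmx512m HeapSizeCheck
public class HeapSizeCheck {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        // "committed" is memory the OS has actually handed to the JVM;
        // with -Xms == -Xmx it matches "max" for the life of the process.
        System.out.printf("heap used:      %d MB%n", heap.getUsed() / (1024 * 1024));
        System.out.printf("heap committed: %d MB%n", heap.getCommitted() / (1024 * 1024));
        System.out.printf("heap max:       %d MB%n", heap.getMax() / (1024 * 1024));
    }
}
```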

The JVM has control over further allocations within its memory region. One of its main levers is garbage collection. This process controls both the deallocation of expired objects and the movement of surviving objects between the various intra-JVM memory regions. The JVM’s choice of garbage collection strategy can have a significant impact on the performance of the applications built on top of it. Some strategies temporarily interrupt program execution (the ‘stop-the-world’ approach), while others avoid this problem but are less efficient as a result (the incremental and concurrent approaches). Different GC strategies can also have dramatically different scaling properties in heavily parallelized environments.

There are a number of JVM configuration options that provide control over memory allocation and garbage collection strategies, but turning dials in the wrong direction can be far worse than not making use of them at all. If we want to make intelligent decisions, we’ll need evidence to base them on, and that means we’ll need a tool capable of collecting data about the JVM’s operations – and it just so happens that this is one of the most common uses of JMX statistics.
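
That evidence is exactly what the platform MXBeans provide. As a sketch of what the garbage-collection side looks like when you pull it programmatically (the collector names you’ll see depend on which GC strategy the JVM was started with):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative collection counts and total pause time per collector.
// Names like "PS Scavenge" or "ConcurrentMarkSweep" vary with the GC strategy.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```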

Using `jconsole` to monitor changes in heap memory use over time.

We could monitor JMX statistics using `jconsole`, a GUI tool bundled with the JDK that’s capable of connecting to local and remote JVMs. But aside from my doubts about the healthiness of long-term exposure to Swing GUIs, the eagle-eyed among you might have noticed from my screenshot that I’m actually running `jconsole` remotely. Astoundingly enough, using the notoriously slow method of X11 forwarding over SSH was the least painful of the variety of approaches (Stack Overflow, Daniel Kunnath, Gabe Nell) that I tried for gathering JMX statistics from a remote EC2 server.
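
For the record, the programmatic route looks roughly like the sketch below: start the target JVM with the standard `com.sun.management.jmxremote.*` system properties, then connect over RMI. The host and port here are placeholders, and on EC2 you also get to fight with RMI hostname settings and security group rules, which is where the pain really starts.

```java
import java.lang.management.MemoryMXBean;
import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// The target JVM must be started with something like:
//   -Dcom.sun.management.jmxremote.port=9010
//   -Dcom.sun.management.jmxremote.authenticate=false
//   -Dcom.sun.management.jmxremote.ssl=false
// (host and port below are placeholders)
public class RemoteJmxClient {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://ec2-host.example.com:9010/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();

            // Read the same heap statistics jconsole would show, via an MXBean proxy.
            MemoryMXBean memory = JMX.newMXBeanProxy(conn,
                    new ObjectName("java.lang:type=Memory"), MemoryMXBean.class);
            System.out.println("heap used: " + memory.getHeapMemoryUsage().getUsed());
        } finally {
            connector.close();
        }
    }
}
```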

Once I remembered that I actually value my time, I switched over to TraceView to inspect my server’s JMX statistics. As a matter of pride I won’t say how long it took to get `jconsole` running, but let me just note that even reading the articles I linked would take more time than it did for me to deploy the newest version of the TraceView package (`sudo apt-get update`) and restart Tomcat (`sudo service tomcat6 restart`).

Using TraceView to monitor several JVM memory regions on the same Tomcat server over the same time period.

These graphs depict the size of the JVM’s memory regions. Objects are initially given memory allocated from the Eden Space, and when its usage passes a threshold, a minor garbage collection is triggered. During the minor garbage collection, the JVM either deallocates objects or moves them into the Survivor Space, depending on whether they are still in use. This is why the `Eden_Space.Usage.used` graph has long periods of steady growth followed by sudden, sharp declines. Changes in the `HeapMemoryUsage.used` graph mirror those in the `Eden_Space.Usage.used` graph because that’s the only portion of heap memory being modified by the minor garbage collection. Finally, since minor garbage collections primarily affect heap memory, the other (non-heap) graphs remained flat over this period.
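
Those per-region numbers come straight from the JVM’s `MemoryPoolMXBean`s. A small sketch of reading them directly (pool names such as "PS Eden Space" vary with the collector in use):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Prints the current usage of each JVM memory pool (Eden, Survivor,
// Old/Tenured Gen, Perm Gen, ...). Pool names depend on the collector in use.
public class MemoryPoolDump {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("%-20s type=%s used=%d committed=%d%n",
                    pool.getName(), pool.getType(), usage.getUsed(), usage.getCommitted());
        }
    }
}
```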

Convenience and aesthetics aren’t the only reasons to use TraceView to monitor your JVMs. Made some changes to your Tomcat configuration? If you were using `jconsole` for your monitoring, you’d be disconnected when you restarted Tomcat – and all of your graphs would reset! And unlike `jconsole`, we support adding annotations to track what changes you’ve made so that you can more efficiently monitor their performance impact. Best of all, any interesting data you find can be communicated to other members of your organization just by sharing a link.

In the end, it’s the flexibility offered by JMX statistics that has really convinced me of their value. I’ve focused on describing their use in JVM monitoring because this is one of the most common use cases. But while TraceView collects JVM-level JMX statistics by default, adding more collection points is just a matter of modifying `monitor.jmx.scopes` in the JSON configuration file.

Whether you wrote your MBeans in-house or inherited them from a development team you’ve never met, you can use TraceView to monitor them – and if you can monitor them, that means you can graph them, alert on them, and otherwise use them to manage your application’s performance. For a Java developer, there’s no reason that JMX statistics shouldn’t be as accessible as system load or disk latency, and TraceView makes that a reality.

James Meickle: James started as a hobbyist web developer, even though his academic background is in social psychology and political science. His favorite language is Python, his favorite editor is Sublime, and his favorite game is Dwarf Fortress.