Analyzing system performance with YourKit Java Profiler
Among the problems that are the hardest to diagnose in a web application installation are performance issues. They may arise from either software issues, a back-end system that is not behaving as expected, a network issue or any other internal or external factor (don't forget to check available disk space :)).
Fortunately there are a lot of different tools available, ranging from built-in JVM tools to Java Profilers. Java Profilers are usually a "last resort" type of solution, as they are the tools that have the most effect on performance, but when all other tools have failed to properly diagnose a problem, usually the profiler should raise the issues properly.
We at Jahia have been using for a long time the YourKit Java Profiler. It was initially designed to be used in pre-production systems and we have used it successfully even when benchmarking our software under high loads. Of course the profiler has an impact on performance (and also on memory usage), but if well configured, these may be controlled and taken into account.
Quick disclaimer
In this guide we will talk about some of the important options used to diagnose performance issues on pre-production or production systems, but of course we do not give any guarantees to the stability nor the performance of the production system when running with the profiler activated, and will not be help liable if you break anything. Using these tools should be a last-resort type of solution, once you have tried alternative diagnosis scenarios and once you have practiced the methods and tools presented here at least a few times. So once again, we will not be held responsible if you break anything, but you might learn some interesting things in the process so it might be worth it.
Also this guide is in no way an exhaustive user manual to the YourKit profiler tool but simply serves as a starting point on diagnosing server issues with this tool. We highly recommend you use the YourKit online documentation to go more in-depth into how the product works.
Performance Impact of the profiler
In order to test the performance of the profiler, I did a quick JMeter test hitting on the Jahia home page of a default Jahia 6.6 installation with 400 threads on a laptop. This is the performance I got without YourKit profiling activated:
So we get 56.2 pages per second, which will serve as our reference performance benchmark. Now let's see what we get with the profiler activated (CPU Profiling with sampling and NO J2EE profile, Stack telemetry activated, nothing else):
As you can see there is an impact as we get 54.6 pages per second, but on the overall throughput the difference is about 2 pages/sec over 20'000 samples. You'll notice that the average page serving time has also increased and this is expected, but despite this we are still seeing acceptable performance. So it is indeed perfectly possible to use this profiling configuration during load tests or even in pre-production or production scenarios where we can accept the performance impact. If we calculate this as a percentage, using the throughput values, this amounts to a 3% slowdown due to profiling, which seems perfectly reasonable (as well as predictable).
Of course this quick benchmark is not very complex and there are cases where the profiler may have a heavier impact, but usually it should average in the same ballpark (and our own experience confirms this).
How to configure the Tomcat server
In the tomcat/bin directory:
1. Copy your startup.sh (Linux, Unix, Mac OS X) or startup.bat (Windows) file to a file we will call startup_with_yjp.sh/bat.
2. Modify the end of the file to look like this (this is the UNIX version, for Windows you will have to adapt it accordingly):
JAVA_OPTS="-agentpath:/Applications/YourKit_Java_Profiler_9.5.6.app/bin/mac/libyjpagent.jnilib=disablestacktelemetry,disableexceptiontelemetry,builtinprobes=none,delay=10000,sessionname=Tomcat $JAVA_OPTS"
export JAVA_OPTS
exec "$PRGDIR"/"$EXECUTABLE" start "$@"
Note the agent path that must be absolute path and point to the relevant Agent library for your OS. Here is the list of agents for YourKit 9.5.6 taken from this page: http://www.yourkit.com/docs/10/help/agent.jsp
Windows, 32-bit Java |
-agentpath:<profiler directory>\bin\win32\yjpagent.dll |
Windows, 64-bit Java |
-agentpath:<profiler directory>\bin\win64\yjpagent.dll |
Mac OS X |
-agentpath:<profiler directory>/bin/mac/libyjpagent.jnilib |
Linux x86, 32-bit Java |
-agentpath:<profiler directory>/bin/linux-x86-32/libyjpagent.so |
Linux x86, 64-bit Java |
-agentpath:<profiler directory>/bin/linux-x86-64/libyjpagent.so |
Solaris SPARC, 32-bit Java |
-agentpath:<profiler directory>/bin/solaris-sparc-32/libyjpagent.so |
Solaris SPARC, 64-bit Java |
-agentpath:<profiler directory>/bin/solaris-sparc-64/libyjpagent.so |
Solaris x86, 32-bit Java |
-agentpath:<profiler directory>/bin/solaris-x86-32/libyjpagent.so |
Solaris x86, 64-bit Java |
-agentpath:<profiler directory>/bin/solaris-x86-64/libyjpagent.so |
HP-UX IA64, 32-bit Java |
-agentpath:<profiler directory>/bin/hpux-ia64-32/libyjpagent.so |
HP-UX IA64, 64-bit Java |
-agentpath:<profiler directory>/bin/hpux-ia64-64/libyjpagent.so |
AIX ppc, 32-bit Java |
-agentpath:<profiler directory>/bin/aix-ppc-32/libyjpagent.so |
AIX ppc64, 64-bit Java |
-agentpath:<profiler directory>/bin/aix-ppc-64/libyjpagent.so |
Also you can note that there are some options after the agent path. These are the recommended defaults and it is not recommended to change them unless you know exactly what you are doing. You are free to change the session name though, which will also influence the default name of snapshots, so this might be very useful. You might want to use the machine host name instead of the default “Tomcat” name.
You should then be able to start the Tomcat server by using the new script :
startup_with_yjp.sh/.bat
If you make an error in this setup, Tomcat will probably not start, so be sure to check the log files in tomcat/logs if you have any problems.
If it startups properly, you should see a line like this in the logs:
[YourKit Java Profiler 9.5.6] Loaded. Log file: /Users/john/.yjp/log/1521.log
You might want to have a quick look at the contents of this log file, it contains notably information such as the port on which the YourKit agent will listen to incoming client connection.
This concludes installation of the YourKit profiler agent on the server. YourKit does provide a wizard to help install the agent on local servers, but usually for pre-production or production serves that are usually distant text-only servers it is not possible, and also not recommended, to install the YourKit client GUI on the server.
How to setup a profiling session for minimal performance impact
As the YourKit agent actually awaits instructions from the client, it will not slow down the Tomcat server until the client is connected and profiling options are activated. Of course on a production server where every bit of memory and CPU counts, it is best not to keep it installed all the time, but for on and off sessions you might want to keep it in place. Also, as we use a different script to start the server with or without the YourKit profiler agent, this makes it very easy to choose at startup time which configuration to use.
You can therefore start on your local machine the YourKit Java Profiler GUI. If you are running a Tomcat locally with the YourKit profiler agent, it will show up in the Monitor Local Applications are illustrated in the following screenshot:
If instead you want to connect to a remote server, you will need to use the following section of the UI :
Despite it asking for a port number, it is not actually required, and it will use the default (port 10001). Once connected, you should see the CPU Time and Threads telemetry appear :
Let’s activate some profiling options, making sure we only activate those that have a minimal performance impact. First, let’s activate Stack Telemetry:
Then, let’s activate the CPU Profiling, but here is it very important to uncheck the “Profile J2EE” checkbox, as it has an very important performance impact on high load systems.
Warnings :
- DO NOT ACTIVATE "Profile J2EE" in CPU Profiling
- DO NOT ACTIVATE Monitor Profiling
- DO NOT ACTIVATE Exception Telemetry
These are all the options we recommend you activate at this time. Of course it might be interesting to activate more when diagnosing specific problems, but this is not recommended on systems that have high loads. It is best to try to reproduce the scenarios in controlled copies of the environment. The purpose of this document is to show the options that may be used with a minimal overhead.
Another important warning is that when profiling is activated, the agent will start collecting data and therefore use more memory on the server. While YourKit does use memory very efficiently, you should avoid keeping profiling activated for long periods of time, or at least monitor the memory usage carefully to make sure the server is not running out of memory because of profiling options (actually even when no option is activated the telemetry is still being recorded, but should not consume too much memory. But if you want zero impact, you should restart the server without the agent).
To check that your profiling session is properly configured, your toolbar should look like this :
Once the profiling options are activated, you can see some data appear in near real-time, in the “Call Tree”, “Method List”, and other tabs. These are quick estimates and are not as precise as the calculations performed when retrieving a snapshot.
Despite these limitations, it is useful to look at the “Call Tree” or “Method list” while the profiling is active to get an immediate idea of where bottlenecks might be.
Threads view
As we have activated stack telemetry, the threads view is therefore available. This view is very useful for identifying blocked or waiting threads. If you click on a location in the graph it will print out the full stack trace for that moment. It is also possible to activate the “Show stack of selected thread only” check box to limit the stack trace to only the selected thread, making it easy to see where in the code the thread is blocked, waiting or active.
Quick tip:
Avoid running the profiler on the same machine that the Tomcat server is running on.
Snapshots
Snapshot are basically the equivalent of a recording. They start when a CPU profiling is started, and they end when a snapshot is captured. To capture a snapshot, you can click the following button:
Note that you have the option to leave profiling active after capturing the snapshot, which might be useful in some cases.
Snapshots, once captured, are retrieved from the distant server and may be directly opened in the GUI, or sent to us for analysis. Make sure, if you intend to send the snapshot, to compress the files as they might compress very well.
If instead you want to discard previously recorded data, you can use the following button:
Interpreting results (analyzing CPU snapshots)
Analyzing snapshots is closer to an art form than science. It can take precise analysis, persistence and sometimes sheer luck to pinpoint the source of a bottleneck.
We won’t got into much detail on how to interpret results but will just give some useful pointers for things to look at (or to ignore).
Wall time
First, something you will need to understand when looking at CPU profiling results, is that the profiler actually also counts the time some methods have spent waiting on I/O. This is called Wall time, and usually should be ignored in terms of areas to optimize. Of course if an external system such as a database or the hard disk is systematically involved, it might be interesting to note that the I/O performance must be improved to help the overall performance. But for most analysis you need to be aware of Wall time and “ignore” or at least factor it out. Here is an example of a Wall time report:
As you can see in the above example, Wall time for waiting on data to be available in the socket accounts accounts for 97% of the profiled time ! This is perfectly normal as a server such as Tomcat spends a lot of time waiting for clients to connect or to send data, so you can safely ignore this time.
You need to be careful that by default the “Call tree” view aggregates time spent in methods that may include methods with Wall time, so it is important to open them up to check if you can ignore this time or not. Of course if you do so you will have to remember that you have “ignored” 97% of the reported time, and therefore will need to take that into account when looking at the percentage reports. Anything that, in the above example, uses 1% of the CPU time suddently becomes very important !
Using the different CPU views
The CPU tab has interesting sub reports:
- Call tree (all threads together)
- Call tree (by thread)
- Hot spots
- Method list
These are all useful, despite what you might think initially. It is very tempting to use only the first call tree view, but if a same method is called from different locations in the code it might be hard to get the accumulated time spent in this method. For this the “Method list” will be much more useful, as it does aggregate all the calls in the whole code of a single method. The “Hot spots” is a specific case of the method list where the profiler has built a reduced list of the methods that amount to the most time. Unfortunately a lot of the Wall methods appear here so this view is not always useful. But if one of your methods appears in the “Hot spots” view this might indicate a serious problem which should be addressed.
Using the thread view
The thread view, as presented previously, can be very useful to detect threading issues. It is a good idea to browse the chart to look at places where threads might be blocked and to look at the associated stack traces to understand what might be wrong at that point in time.
Garbage collection
The garbage collection graphs are also very useful to check for any large pauses that might have slowed down the system. You should always review the graphs to check if that was the case. If you suspect a memory consumption issue, it might be interesting to generate memory dumps (either with YourKit or with the jmap JVM tool) and analyze them with YourKit.
Memory statistics (only available on memory dumps)
The memory statistics view is quite complex to interpret, so it might require a bigger learning curve. Again we won’t pretend to present an exhaustive description of its capabilities but rather give a few pointers on how best to use it.
It is generally useful to use the “merged paths” and “object explorer” views to understand how the objects are structured, what they contain and how they are related to each other.
As in the CPU reports, there are overlaps in the reported data as Java objects “contain each other”. So usually you will see the largest objects contain the second largest and so forth. So you need to be careful in the analysis as these are not entirely different objects but actually objects containing other objects. Keeping this in mind will make it easier to understand and analyze the report.
Indexing large memory dumps on the server
If you have large memory dumps on your server that you intend to analyze, it is highly recommended that you create an index file for them directly on the server that has generated them. The reason for this is that servers usually have a lot more available RAM than the desktop or laptops used to analyze the memory dump. For example opening up an 8GB memory dump will be impossible to do on a desktop or laptop because it will require a lot of swapping (as usually they will have 4GB of RAM), whereas a server might have 24GB or more available and it therefore makes more sense to do it there. Once the index file is created, YourKit requires a lot less memory to open the dump and this makes it then possible to transfer the memory dump and index files to the other computer for analysis (again remember to compress them, they compress very well).
To generate an index on the server, follow these steps:
- Download and install YourKit on the server
- Create the snapshot index file using the following command:
java -jar <Profiler Installation Directory>/lib/yjp.jar -create-index <snapshot_file_path> - Transfer all the files to the desktop
- Open the snapshot with YourKit, it will automatically use the index file.
Summary
As you have seen, using a profiler can be very useful to diagnose the internal workings of a Java system. We have presented all this in the context of the YourKit Java profiler because that is the tool that we use at Jahia but of course it is possible to use these procedures with other tools (remember to test the performance impact of the tools though !).
It is also a good idea to use a profiler when running validation load tests, so that you can verify that the code is running nominally.