Five tricks to analyze Java thread dumps
In this post, we will present 5 tricks to help you read and understand Java Virtual machine thread dumps. When properly analyzed, thread dumps can be an incredible tool to understand what is happening inside a Java Virtual machine at a given point in time, or if misread, could lead you to false conclusions that might make you waste a lot of time. The point of this post is to present some information that is not yet available in resources available online (see the “Further reading” section at the bottom of this post) to help you navigate the information contained in these thread dumps to hopefully help you improve performance, detect bottlenecks or even simple application bugs. If you are new to thread dump analysis it might however be better that you started with one of the articles linked below.
First of all, it is important to remember that a thread dump is just that: an instantaneous image of what is going on in the list of threads currently executing inside the Java virtual machine.
Let’s look at an extract of a thread dump:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.162-b12) "Atmosphere-Scheduler-3162" nid=137911 state=TIMED_WAITING [] java.lang.Thread.State: TIMED_WAITING - waiting on <0x2db6a729> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x2db6a729> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:1129) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "http-nio-8080-exec-10" nid=1044 state=RUNNABLE [] java.lang.Thread.State: RUNNABLE at sun.management.ThreadImpl.getThreadInfo1(Native Method) at sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:176) at org.jahia.tools.jvm.ThreadMonitor.dumpThreadInfo(ThreadMonitor.java:352) at org.jahia.tools.jvm.ThreadMonitor.getFullThreadInfo(ThreadMonitor.java:500) at org.jahia.tools.jvm.ThreadMonitor.generateThreadInfo(ThreadMonitor.java:472) at org.jahia.bin.errors.ErrorFileDumper.outputSystemInfo(ErrorFileDumper.java:594) at org.apache.jsp.threadDump_jsp._jspService(org.apache.jsp.threadDump_jsp:139) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109) at javax.servlet.http.HttpServlet.service(HttpServlet.java:729) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:384) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380) at javax.servlet.http.HttpServlet.service(HttpServlet.java:729) at org.jahia.bundles.extender.jahiamodules.JspServletWrapper$3.call(JspServletWrapper.java:143) at org.jahia.bundles.extender.jahiamodules.JspServletWrapper$3.call(JspServletWrapper.java:140) at org.ops4j.pax.swissbox.core.ContextClassLoaderUtils.doWithClassLoader(ContextClassLoaderUtils.java:60) at org.jahia.bundles.extender.jahiamodules.JspServletWrapper.service(JspServletWrapper.java:139) at org.apache.felix.http.base.internal.handler.ServletHandler.doHandle(ServletHandler.java:163) at org.apache.felix.http.base.internal.handler.ServletHandler.handle(ServletHandler.java:150) at org.apache.felix.http.base.internal.dispatch.ServletPipeline.handle(ServletPipeline.java:69) at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:49) at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:33) at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:78) at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:47) at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:33) at org.apache.felix.http.base.internal.dispatch.FilterPipeline.dispatch(FilterPipeline.java:48) at org.apache.felix.http.base.internal.dispatch.Dispatcher.dispatch(Dispatcher.java:39) at org.apache.felix.http.base.internal.DispatcherServlet.service(DispatcherServlet.java:67) at javax.servlet.http.HttpServlet.service(HttpServlet.java:729) at org.apache.felix.http.proxy.ProxyServlet.service(ProxyServlet.java:60) at javax.servlet.http.HttpServlet.service(HttpServlet.java:729) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176) at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145) at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92) at org.jahia.services.seo.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:125) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:343) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.jahia.bin.filters.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:133) at org.jahia.bin.filters.CompositeFilter.doFilter(CompositeFilter.java:114) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:343) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.jahia.bin.filters.jcr.JcrSessionFilter.doFilter(JcrSessionFilter.java:133) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:343) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.jahia.bin.filters.spnego.SpnegoHttpFilter.doFilter(SpnegoHttpFilter.java:105) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:343) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61) at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108) at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137) at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:81) at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66) at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108) at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137) at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:81) at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66) at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:359) at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:275) at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90) at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83) at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:344) at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:272) at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:81) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:343) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.jahia.bin.filters.MaintenanceFilter.doFilter(MaintenanceFilter.java:95) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:343) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.jahia.bin.filters.HttpHeadRequestFilter.doFilter(HttpHeadRequestFilter.java:86) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:106) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:617) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1521) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1478) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) "http-nio-8080-exec-9" nid=1043 state=WAITING [] java.lang.Thread.State: WAITING - waiting on <0x4d32e193> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x4d32e193> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748)
We can see that the Java Virtual Machine thread dump lists all the threads one by one along with the stack trace dump of what each thread is currently executing.
Each thread is dumped in the following format:
“THREAD_NAME” nid=THREAD_ID state=THREAD_STATE [THREAD_LOCK_INFO] java.lang.Thread.State: TIMED_WAITING
Where:
- THREAD_NAME is a name for the thread. Actually this can already be a good source of information if the developers gave it a meaningful name. In our example, the “http-nio” prefix indicates that this thread is used to process HTTP requests using Java NIO non-blocking input/output system.
- THREAD_ID is a thread unique identifier in the JVM. This is useful notably when studying monitor locking between threads, because the thread dump will indicate which thread currently holds the lock and will refer to it by its nid value.
- THREAD_STATE is the actual state the thread is in (directly from the official Javadoc) :
- NEW : A thread that has not yet started is in this state.
- RUNNABLE : A thread executing in the Java virtual machine is in this state.
- BLOCKED : A thread that is blocked waiting for a monitor lock is in this state.
- WAITING : A thread that is waiting indefinitely for another thread to perform a particular action is in this state.
- TIMED_WAITING : A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
- TERMINATED : A thread that has exited is in this state.
- THREAD_LOCK_INFO : This value may differ on different JVM implementation, but will usually indicate whether the thread is holding a monitor lock or if it is waiting to lock a specific monitor lock object.
As we can also see in our example, more information about locks held and waiting information is listed before the actual stack trace starts. This information is very useful to check for deadlocks or any kind of long-held locks that may be abnormal.
Now that we’ve gone through these basics about thread dumps, let's dive right into the tricks.
Trick #1 : Know what to ignore in a Thread Dump
As there is a lot of information in thread dumps, it might quickly be overwhelming, especially in server systems that manage hundreds or sometimes even thousands of simultaneous threads. As you can imagine in such a case the thread dump output is very long, and can be tricky to analyze.
Our first trick is knowing what to ignore safely. In our example, if we look at the first thread again, it looks something like this:
"Atmosphere-Scheduler-3162" nid=137911 state=TIMED_WAITING [] java.lang.Thread.State: TIMED_WAITING - waiting on <0x2db6a729> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x2db6a729> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:1129) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
If you look at the stack trace you will notice the usage of the LockSupport.parkNanos method that is documented here. We are here actually in the case of a parked thread. A parked thread is disabled for scheduling until it gets a “permit” from a another thread, that will allow it to start executing again, or it will restart when the timeout value has been reached.
If we look further down the stack trace we see that we are actually inside a ThreadPoolExecutor, that is basically waiting for a new task. So in effect this thread is just waiting for a task to execute, this means that we can basically ignore it from our analysis since it is not going to affect our performance in any way.
This is a good first example of threads that can be ignored in our analysis. Another example are threads that are waiting for a request, such as a pool of HTTP threads that are just waiting for an incoming request. Although it looks like they are holding locks and waiting on monitors they are actually just parked threads and we can safely ignore them.
Trick #2 : Look at your code first
In the first trick we have looked at stack traces that include code that is either included in the JDK or coming from a library. But in the first steps of analysis it is usually better to analyse the stack traces by looking at the parts that contain your own application code. These sections will make a lot more sense to you and you will be able to quickly identify which part of the application logic is currently executing. This context will make it easier for you to check if the behavior is normal or not.
This trick is actually implemented by some tools such as the YourKit Profiler (which has a CPU profiling tool that actually uses thread dumps to perform CPU analysis) that will by default “hide” stack traces coming from the JDK and known libraries to help you focus on your application code.
However, if everything looks ok in your code, you should then start looking at the libraries, server and JDK code to understand what could be going on. Of course in this case it helps a lot if you are working with open source code, as you will be able to directly look at the source code to understand how the code’s logic actually works. In the case of closed source (or even worse obfuscated code), you will probably not be able to do any kind of analysis and will have to focus mostly on your application code hoping you can figure out if anything is properly behaving.
Trick #3 : The longer the thread stack, the more suspicious it is
This trick might seem strange, but it is based on my personal experience, and has proven to be a reasonably good indicator of potential problems. Basically, the longer the stack trace, the more probable it is that something is misbehaving.
However I should point out quickly that in some complex applications long stack traces might be normal, and so this trick is not 100% reliable and requires a good understanding of the application being analyzed to make sure that you are not falsely flagging a thread as a problem.
Despite this warning, long stack traces mean that your application has a lot of code it needs to execute to perform its task, so usually that could imply that the performance of the application could be improved somehow.
In the worst cases, stack traces might become infinite if there is an infinite loop, but that problem will usually be caught early at the development phase so it is usually not that relevant for production analysis, although it might still possible in some obscure edge cases.
Trick #4 : Look at multiple thread dumps that are done in short intervals
I haven’t seen this recommendation very often, mostly because a lot of people already struggle analyzing a single thread dump, but it can actually be very helpful to generate multiple thread dumps close to one another (for example at a 2-5 second interval) to have a better understanding of the threads evolutions over time.
As stated in the introduction, thread dumps are an instantaneous snapshot of the threads being executed in the JVM, and in a lot of cases it is not easy to understand if a thread is behaving normally or not because a single snapshot could either indicate normal or abnormal behavior.
With multiple thread dumps close to one another, we can start checking if threads have indeed continued executing or if they are “stuck” on a specific piece of code, for example waiting for a network response or waiting on a monitor.
You do need to be careful with this type of analysis because false negatives are also possible, because if two stack traces are identical between two thread dumps it could mean either:
The thread is stuck at this particular part of the code
The thread has continued executing normally but has again been executing the same part of the code
It is important to keep in mind these two possibilities while performing the analysis or false conclusions might be reached which might cause confusion and in the worst cases a lot of time wasted on fixing a problem that isn’t there to begin with.
Trick #5 : Tools might help
Because of the length and complexity of thread dumps, it can be very useful to use tools to help you analyze them. One of the most recent newcomers to these types of tools is also quite powerful and available as an online service: http://fastthread.io The flip side of this easy availability is the fact that you will have to send thread dumps to a third party company and this might be a huge confidentiality or data privacy issue. So make sure that any thread dumps you test with this service are safe to send out (They also have a commercial on-premise version).
TDA Thread Dump Analyzer is a Java Swing application that also is capable of analyzing thread dumps, including parsing log files to find multiple thread dumps. As an application you can run locally on your computer you will therefore not have to upload any thread dumps to a third party and remain in full control of your data.
You will probably also find older tools that are no longer maintained, such as IBM Thread and Monitor Dump Analyzer but I wouldn’t recommend them anymore as they will probably not work with recent JVM thread dumps. However if you need to analyze old IBM JVM thread dumps this tool might still be useful.
Another interesting tool, Spotify’s Java Thread Dump Analyzer, although still quite young, works directly in a browser because it is implemented in Javascript.
It is worth mentioning that some IDEs also have tools to analyze thread dumps, such as Eclipse or JetBrain’s Intellij IDEA.
This brings us to the big problem with JVM thread dumps is that the output format of these has been changing over Java releases, and it means that if tools are not regularly updated they will not be able to parse more recent JVM format.
In general, you will probably find a few tools when researching, but you will probably have to test multiple ones to not only find one that fits your needs but the one that is also properly maintained and still works with recent JVM thread dump formats.
In a nutshell
These few tricks should help you analyze your thread dumps in a more productive way, but they are far from exhaustive. There are also a lot of other nice articles available on the internet on the subject of thread dump analysis. You will find a list of them at the bottom of this article.
I would also love to hear from you; what are your best tricks to analyze thread dumps ? Did I miss any important tools that you could recommend ? Please don’t hesitate to help out the community by sharing them in the comments below.
Further reading
Other good articles on the subject of thread dump analysis :
https://dzone.com/articles/how-analyze-java-thread-dumps
https://sites.google.com/site/threaddumps/java-thread-dumps