OpenJ9: Improve jitdump functionality

Created on 3 Apr 2020 · 17 Comments · Source: eclipse/openj9

Background

The jitdump is a dump agent [1] that collects JIT trace logs to help investigate OpenJ9 issues. This dump agent is enabled by default for general protection faults and aborts [2].
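As a rough illustration of how this agent is configured, the sketch below builds a command line that registers a jit dump agent explicitly through the `-Xdump` framework, using the `gpf` and `abort` events, and adds `-Xdump:what` so that an OpenJ9 JVM lists its registered dump agents at startup. The class name `JitDumpLaunch`, the helper `buildCommand`, and the `HelloWorld` main class are hypothetical names chosen for this example; the sketch only prints the command and does not launch anything.

```java
import java.util.ArrayList;
import java.util.List;

public class JitDumpLaunch {
    // Hypothetical helper: assembles a java command line with explicit -Xdump options.
    static List<String> buildCommand(String javaBinary, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBinary);
        // Register a jit dump agent for the gpf and abort events.
        cmd.add("-Xdump:jit:events=gpf+abort");
        // Ask an OpenJ9 JVM to list the registered dump agents at startup.
        cmd.add("-Xdump:what");
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // "HelloWorld" is a placeholder main class used when no argument is given.
        String mainClass = args.length > 0 ? args[0] : "HelloWorld";
        System.out.println(String.join(" ", buildCommand("java", mainClass)));
    }
}
```

Running the printed command on an OpenJ9 JVM should show the jit agent among the registered agents. On other JVMs the `-Xdump` options are not expected to be recognized.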

A jitdump can typically help under two scenarios:

  1. A crash during a JIT compilation
  2. A crash in a JIT compiled method

For both of these scenarios we typically require a JIT trace log of the method in question for further investigation. Sometimes this is an iterative process, especially for case 2, as we may not know which area of the JIT compiler was responsible for generating the faulty logic in the JIT compiled method assembly. The iterative process may require us to learn more about the problem from every log, and suggest additional tracing options until we can pinpoint the problem.

For case 1, we often need additional tracing enabled for the area of the JIT in which we crashed, in addition to having the JIT IL trees at hand.

Due to the dynamic nature of the JVM runtime environment, and the fact that the JIT compiler is guided by profiling information, a JIT compilation of a method in one JVM invocation may behave differently than a JIT compilation of the same method in a subsequent invocation of the JVM, even when the same environment and application are being run. This is a problem for servicing such issues if the first incident data collection did not capture enough information to diagnose the issue and provide a resolution.

The typical result of failing to obtain useful logging on first incident is that developers/service engineers must work with the stakeholder to reproduce the issue with additional tracing. This can take time and resources for both parties. A properly generated jitdump has a very high chance of reproducing the exact same compilation as the original, but with tracing enabled, because it runs in the same JVM process which produced the original faulty compilation. Therefore it is highly desirable to generate a useful jitdump on first incident to speed up the investigation of issues in the JIT.

[1] https://www.eclipse.org/openj9/docs/xdump/#dump-agents
[2] https://www.eclipse.org/openj9/docs/xdump/#default-dump-agents

Problems

There are several limitations when jitdump trace files are created:

  1. The jitdump file is empty
  2. The jitdump only contains a partial trace due to a recursive crash unrelated to the original problem
  3. The jitdump file fails to trace the right area of the JIT for finer grained information
  4. The jitdump does not trace the full backtrace of interesting methods
  5. The jitdump trace does not reproduce the original trace file
  6. The options used for the jitdump generation were different than the options used for the original compilation
  7. The jitdump compilation fails to complete due to JVM shutdown

Goal

The goal of this effort is to resolve the problems outlined in the previous section, and to always generate a useful jitdump so that developers/service engineers can make use of the trace information obtained during first incident data collection. Success will be measured by the reduction in the amount of time it takes for developers/service engineers to obtain a JIT trace log which contains enough information to make progress on fixing a defect. Another goal of this effort is to improve documentation and code quality of the jitdump process in the JIT compiler.

Issues / PRs

  • [x] #8803 - Assert "should be unreachable" in TR_CHTable::commitVirtualGuard()
  • [x] #9133 - Consolidate jitdump functionality into JitDump.cpp
  • [x] #9136 - Relax condition to require no exclusive VM access when generating jitdump
  • [ ] #9137 - Ensure the same options are used for jitdump compilations
  • [x] #9201 - Disable upgrades/downgrades on jitdump compilations
  • [ ] #9227 - Avoid recursive crashes when dumping current IL during jitdump generation
  • [x] eclipse/omr#5135 - Write out newline character after the message in verbose output
  • [x] #9386 - Enable paranoid opt. check for jitdumps resulting from crashes in the optimizer
  • [x] #9387 - Improve jitdump diagnostic messages and function names
  • [x] #9391 - Update verbose log write API calls to use new newline convention
  • [x] #9428 - Improve tracing options for jitdump scenarios
  • [ ] #9479 - Add support for jitdump agent sub-options
  • [x] #9522 - Avoid compilation interruptions when generating jitdump
  • [ ] #10962 - Ensure AOT compilation in jitdump
  • [x] #11765 - TR_ASSERT_FATAL uses abort() to terminate the program but SIGABRT cannot be caught via j9sig_protect
  • [x] #11770 - j9jni_deleteLocalRef assumes we have VM access but we may not have it if the compilation crashes
  • [x] #11772 - Only the diagnostic threads should be executing JitDump compilations
  • [ ] eclipse/omr#5774 - Add TR_EnableSIGSEGVOnTrap option
  • [ ] #11860 - Make diagnostic thread compilations uninterruptible

All 17 comments

Adding new PR (eclipse/omr#5135) to the list which will address some newline issues seen when jitdumps are generated.

I've reopened #9227 as we'll need to avoid printing snippets after a crash since we cannot reliably print them before binary encoding. This is further explained in eclipse/omr#5111 which will be addressed at some point in the future. For now, we still want to avoid recursive crashes so that we get a proper jitdump out, so I'll be addressing that issue in the next few days.

Adding new issue (#9386) on a proposal to enable paranoid opt. check for jitdump recompilations.

Adding new PR (#9387) to address inconsistency in generation of jitdump vs. javacore and other dump triggers. That is, the messages reported and how they are reported are now consistent with javacore, Snap dump, heapdump, etc., and there are no redundant prefixes in the messages.

In addition we use the same function naming convention as javacore and snap dumps to remain consistent with other parts of the JVM.

Adding new issue (#9428) to improve programmatic setting of tracing options for jitdump compiles.

Adding new issue (#9479) to support specifying sub-options to jitdump through the -Xdump framework, so as to enable custom tracing for arbitrary failures.

Adding new issue (#9522) to avoid compilation interruptions, such as the JVM wanting to shut down, when generating jitdumps. This is often seen in JUnit type tests where for example a crash in the JIT will happen, or an exception is thrown in a test which reaches main. In such scenarios JUnit will report this error and it may terminate the rest of the tests at that point. The JVM will then want to shut down but jitdumps are still being generated. This results in truncated jitdumps which are not useful for diagnosing the problem.

Just a quick update on where things stand. I currently have several PRs up which I'm waiting to get merged before forging on. I think the most important issue to work on following this bulk of PRs getting merged is #9136.

Another update on #9136. I've gotten to the bottom of the major issue for one of the deadlocks. Still need to investigate the other much less common, and more artificial deadlock described in the latest comment in #9136. I'd like to fix them both to close off that item which is a major milestone in this work.

Back to trying to finish this off in the next month or so. Trying to knock off the easier items first, so I'm resuming #9428.

Another update from me. I do still have this on my radar but have been distracted by some machine migration that must be performed by end of September. I hope to get back to working on this in the next few weeks. I will post an update once I get back to doing something meaningful in this area.

The changes delivered here are already starting to show their benefit. For example, a 0/420 defect was able to produce a useful jitdump on first failure data capture over in #10630, which will aid in debugging the assert there.

I found one problem in a crash. The original crash is in AOT compilation, but the replay is for JIT compilation which finishes without error.

It would be good if the trace log and jitdump told us whether it is an AOT compile. Another problem is that, if the crash is in ilgen, no trees will be printed out before replay. I guess if replay happens in the right context, that is not a problem, but it would still be good if we could print some information.

I found one problem in a crash. The original crash is in AOT compilation, but the replay is for JIT compilation which finishes without error.

Opened https://github.com/eclipse/openj9/pull/10852

Getting back to this work in the last few days as I'm trying to polish this off given we are so close to completing everything. I started back looking at #9522 and that problem is mostly fixed, but during my stress testing around that area I discovered several issues which I've documented in #11765, #11770, and #11772. I have a firm understanding of the various problems now and I have solutions for each of them which I will try to deliver in the next few days. We are much closer to having robust JitDump generation.

I've dug myself out of the hole and have emerged with a ton of goodies. I've opened up #11825 which addresses what I believe to be all issues revolving around generation of JitDumps from crashed compilations. It will also help in the case of application thread crashes. This is the area I am going to stress test next to ensure _every_ JIT compiled body on the stack of an application crash gets a JitDump recompilation. This will be the final step in this saga, after which I expect every single JIT defect to have a useful JitDump accompanying it.
