openj9_sigabrt_issue.zip [provided by @imkabir]
There are two files in the testcase:
1) test_sigabrt.c (native app): Registers an application signal handler for SIGABRT. Then, it executes the Java code and sleeps.
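/* Register the application's SIGABRT handler before the JVM is created. */
sigaction(SIGABRT, &shndlr, NULL);
/* Create the JVM. */
jenv = createJVM(&jvm);
if (jenv == NULL)
    return 1;
/* Run the Java code from javasample.java. */
runjclass(jenv);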
2) javasample.java: Contains the Java code.
When SIGABRT is sent to the native app, OpenJ9 generates the JVM core dump using JVM's SIGABRT handler, and the application signal handler is not invoked. On the other hand, Hotspot does not generate the JVM core dump using JVM's SIGABRT handler, and the application signal handler is correctly invoked. OpenJ9 needs to match Hotspot's (reference implementation's) behavior.
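The registration step of the testcase can be reproduced without a JVM. The following is a minimal sketch, assuming a POSIX system; `app_sigabrt_handler`, `register_app_sigabrt_handler` and `app_handler_ran` are hypothetical names chosen for illustration and are not taken from test_sigabrt.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler so that the caller can observe that it ran. */
static volatile sig_atomic_t app_handler_ran = 0;

/* Hypothetical application handler; it only uses async-signal-safe calls. */
static void app_sigabrt_handler(int signo)
{
    static const char msg[] = "In application SIGABRT handler\n";
    if (signo == SIGABRT) {
        app_handler_ran = 1;
        ssize_t written = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)written;
    }
}

/* Registers the application handler for SIGABRT.
 * Returns 0 on success and -1 on failure (errno is set by sigaction). */
int register_app_sigabrt_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = app_sigabrt_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGABRT, &sa, NULL);
}
```

With no JVM in the process, calling `register_app_sigabrt_handler()` and then `raise(SIGABRT)` runs the application handler. The issue described above is that this stops being true once OpenJ9 has been created in the same process.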
Refer to readme.txt in the attached openj9_sigabrt_issue.zip on how to run the test.
Output with Hotspot (the application signal handler is invoked):
JVM has launched
Hello, World
1.8
1.8.0_242-b08
ADD 10+20= 30
In sig handler
received SIGABRT
Killed
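Output with OpenJ9 (the JVM's SIGABRT handler generates dumps and the application signal handler is not invoked):
JVM has launched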
Hello, World
1.8
1.8.0_242-b08
ADD 10+20= 30
JVMDUMP039I Processing dump event "abort", detail "" at 2020/03/03 14:44:36 - please wait.
JVMDUMP032I JVM requested System dump using '/team/babsing/projects/kabir_signal_issue/testcase/core.20200303.144436.11620.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h" specifies that the core dump is to be piped to an external program. Attempting to rename either core or core.11654.
JVMDUMP010I System dump written to /team/babsing/projects/kabir_signal_issue/testcase/core.20200303.144436.11620.0001.dmp
JVMDUMP032I JVM requested Java dump using '/team/babsing/projects/kabir_signal_issue/testcase/javacore.20200303.144436.11620.0002.txt' in response to an event
JVMDUMP010I Java dump written to /team/babsing/projects/kabir_signal_issue/testcase/javacore.20200303.144436.11620.0002.txt
JVMDUMP032I JVM requested Snap dump using '/team/babsing/projects/kabir_signal_issue/testcase/Snap.20200303.144436.11620.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /team/babsing/projects/kabir_signal_issue/testcase/Snap.20200303.144436.11620.0003.trc
JVMDUMP007I JVM Requesting JIT dump using '/team/babsing/projects/kabir_signal_issue/testcase/jitdump.20200303.144436.11620.0004.dmp'
JVMDUMP010I JIT dump written to /team/babsing/projects/kabir_signal_issue/testcase/jitdump.20200303.144436.11620.0004.dmp
JVMDUMP013I Processed dump event "abort", detail "".
Use -Xrs:sync (JVM option). Then, J9 won't register a signal handler for SIGABRT, and the application signal handler can be correctly invoked in the above testcase. It should also be noted that J9 won't register a signal handler for other signals such as SIGBUS, SIGILL, SIGFPE, SIGTRAP and SIGABEND when -Xrs:sync is specified. -Xrs should be used if the native application wants to completely disable JVM signal handling.
[1] -Xrs and -Xrs:sync documentation: https://www.eclipse.org/openj9/docs/xrs/.
[2] OpenJ9 signal handling: https://eclipse.github.io/openj9-docs/openj9_signals/
fyi @imkabir, The above issue covers the incorrect SIGABRT behavior reported by you. I will ping you once the issue is resolved.
@babsingh Does this behaviour change on the RI in later versions? Do JDK 11 & 14 have similar results?
OpenJ9-JDK11+ (tested JDK11 and JDK13) behaves similarly to OpenJ9-JDK8.
What about the RI's JDK11 & JDK13?
What does RI stand for?
RI = Reference Implementation which is OpenJDK Hotspot.
Summary:
- installAbortHandler and abortHandler.
- pushDumpFacade invokes installAbortHandler.
- Commit history for rasdump.c till v0.11.0 release.
- installAbortHandler, abortHandler and pushDumpFacade haven't been changed since OpenJ9 was open-sourced. So, the issue may have been introduced before open-sourcing in J9-JDK8.
@babsingh any update on a fix for this? M2 for the 0.20 release is just over a week away so I expect we'll be moving this out to the 0.21 release.
@pshipton I didn't get to spend much time on this task due to other high priority tasks. So, the fix won't make M2 for the 0.20 release. We should move it to the 0.21 release.
@DanHeidinga What priority should be given to this task?
void sig_handler(int signo)
{
    printf("In sig handler \n");
    if (signo == SIGABRT)
        printf("received SIGABRT\n");
    raise(SIGKILL);
    _exit(-1);
}
-Xrs:sync disables the JVM signal handler for SIGABRT. This allows OpenJ9 JDK8 to match the RI for SIGABRT. But, it also disables the JVM signal handler for other synchronous signals such as SIGSEGV, SIGFPE, etc. Here OpenJ9 does not match the RI for other synchronous signals.
Disabling the OpenJ9 signal handler for SIGABRT by default to match the RI does not seem like a good idea since a lot of OpenJ9 users rely upon it. I think OpenJ9 needs a command line option to either enable or disable the JVM signal handler for SIGABRT. This will allow OpenJ9 users to match the RI by disabling the JVM signal handler for SIGABRT ... when needed. @pshipton @DanHeidinga thoughts?
Why does OpenJ9 JDK8 have signal chaining disabled?
For JDK8, the core signal chaining functionality was moved to OMR and completely re-implemented (omrsigcompat). There is no one-to-one code-mapping for signal chaining between JDK7 and JDK8. I couldn't derive the reason for disabling signal chaining by looking at the code. Also, I couldn't find a change-set or documentation which explicitly specifies the reason for disabling signal chaining in JDK8+.
My best guess ... _We wanted to match the RI during re-implementation_.
@babsingh, have you arrived at any solution for this issue? I am not sure which changes that went into JRE 8 are causing these issues, but -Xrs:sync would completely disable signal handling and even the rest of the signals would be masked.
Could you please let us know the latest update on this as the end customer is waiting for a resolution on this.
[URGENT ^^^] @DanHeidinga @pshipton; fyi, @smudigon
Have you arrived at any solution for this issue?
Yes, the solution is proposed in https://github.com/eclipse/openj9/issues/8735#issuecomment-607304957:
_The reference implementation (RI) doesn't install a JVM signal handler for SIGABRT. OpenJ9 needs a command line option to either enable or disable the JVM signal handler for SIGABRT. This will allow OpenJ9 users to match the RI by disabling the JVM signal handler for SIGABRT._
I am awaiting @DanHeidinga's OR @pshipton's decision on whether to take this approach.
-Xrs:sync would completely disable signal handling
-Xrs:sync doesn't completely disable signal handling. With -Xrs:sync, JVM signal handlers are not registered for synchronous signals (SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGTRAP, SIGABEND and SIGABRT). All asynchronous signals should still work.
Changes that went into JRE 8 that are causing these issues
In JRE 8, signal chaining is disabled in native apps until the jsig library is linked or preloaded. These changes are not at fault since they were introduced to match the RI.
Existing solution doesn't work
You can also resolve the issue by linking or preloading the jsig library: Linking a native code driver to the signal-chaining library. But, the customer was not able to link the jsig library since it doesn't own the native driver. Also, the customer was not able to preload the jsig library without crashing other application components. Now, the only solution is to match the RI by disabling the JVM signal handler for SIGABRT via a new OpenJ9 command line option.
@babsingh What does the RI do for SIGABRT if no user handler is installed? I'm wondering if their behaviour is to only install their handler if there isn't already a user one installed.
Command line options are the solution of last resort as they introduce additional code paths that need to be tested and kept working. New options also increase the complexity of an already complex piece of code.
What do we hook SIGABRT for? Just to create the RAS dumps or for other reasons (JIT?) as well?
Finding a solution that allows applications to "just work" is preferable if we can determine reasonable semantics for it.
What does the RI do for SIGABRT if no user handler is installed?
RI never registers a signal handler for SIGABRT. It relies upon the default OS handler which generates a core file.
SIGABRT: The HotSpot VM does not handle this signal.
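The default OS action for SIGABRT can be observed without any JVM. The following is a minimal sketch, assuming a POSIX system; the function name is hypothetical. It forks a child that restores the default disposition, raises SIGABRT, and reports which signal terminated the child. The sketch sets the core size limit to zero in the child only to avoid leaving core files behind; whether a core file is normally written depends on system settings such as ulimit and core_pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of the signal that terminated the child,
 * 0 if the child exited normally, or -1 if fork/waitpid failed. */
int signal_that_killed_child_on_default_abort(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: suppress core files for this sketch only. */
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);

        /* Make sure SIGABRT is unblocked and uses the default OS action. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGABRT);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        signal(SIGABRT, SIG_DFL);

        raise(SIGABRT);
        _exit(0); /* not reached when the default action applies */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system the function returns SIGABRT, which corresponds to a process where neither the application nor the JVM has registered a handler.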
What do we hook SIGABRT for?
Referring to the abortHandler, it is mostly used to generate RAS dumps. I did not notice any JIT-related use. On z/OS, abortHandler also triggers an abend, in addition to the RAS dumps, if -Xsignal:posixSignalHandler=cooperativeShutdown is specified.
fyi @DanHeidinga
I'm wondering if their behaviour is to only install their handler if there isn't already a user one installed.
Is this a behaviour we can adopt when signal chaining isn't enabled? We only install our RAS handler if there isn't an existing user handler?
For a native app that uses the JVM, enabling signal chaining refers to replacing the OS signal and sigaction calls with the function calls of the same name from the jsig library. This is achieved by either linking the jsig library at compile time or preloading the jsig library.
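To illustrate what chaining means in general, the following is a minimal sketch, assuming a POSIX system. It is not how the jsig library is implemented; it only shows the underlying idea that a later handler keeps the previously installed one and forwards the signal to it. All names (`first_handler`, `second_handler`, `install_chained_handlers`) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Action that was installed before the second handler replaced it. */
static struct sigaction previous_action;

static volatile sig_atomic_t first_ran = 0;
static volatile sig_atomic_t second_ran = 0;

/* Stands in for the handler that was registered first. */
static void first_handler(int signo)
{
    (void)signo;
    first_ran = 1;
}

/* Stands in for a handler registered later; it chains to the saved one. */
static void second_handler(int signo)
{
    second_ran = 1;
    /* Forward only to a real one-argument handler. */
    if (!(previous_action.sa_flags & SA_SIGINFO)
        && previous_action.sa_handler != SIG_DFL
        && previous_action.sa_handler != SIG_IGN) {
        previous_action.sa_handler(signo);
    }
}

/* Installs first_handler, then second_handler on top of it while saving
 * the previous action. Returns 0 on success and -1 on failure. */
int install_chained_handlers(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = first_handler;
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = second_handler;
    return sigaction(signo, &sa, &previous_action);
}
```

Without the forwarding step in `second_handler`, the second registration would simply replace the first handler, which is the situation described in this issue for SIGABRT when signal chaining is not enabled.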
Can we find out if a native app has enabled signal chaining i.e. if a native app has invoked the sigaction or signal functions via the jsig library? Currently, there is no means to derive this information.
So, it is not feasible to adopt the suggested behavior:
Install the RAS handler for SIGABRT in the OpenJ9 JVM only if no user handler is registered and signal chaining is disabled in the native app.
@babsingh Can we tell if the handler for SIGABRT is not the default handler? If we can tell that, we can avoid installing our handler.
Something vaguely like:
struct sigaction query_action = {0};
if ((sigaction(SIGABRT, NULL, &query_action) >= 0)
    && (query_action.sa_handler == SIG_DFL)
) {
    /* install our handler */
} else {
    /* user handler already installed - do nothing */
}
The biggest question would be does that work in the presence of sigchaining?
Yes, the new behavior can be implemented in OpenJ9. sigaction is equivalent to OMRSIG_SIGACTION, and it can be used to retrieve the current handler. SIGABRT should have the default OS handler installed if the current handler is equal to SIG_DFL.
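The query described above can be sketched as a small helper. This is a minimal sketch, assuming a POSIX system and using plain `sigaction` rather than OMRSIG_SIGACTION; `has_default_disposition` is a hypothetical name, not an OpenJ9 or OMR function.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Returns 1 if signo still has the default OS disposition,
 * 0 if a handler (or SIG_IGN) is installed, and -1 on error. */
int has_default_disposition(int signo)
{
    struct sigaction current;
    memset(&current, 0, sizeof(current));

    /* A NULL new action queries the current one without changing it. */
    if (sigaction(signo, NULL, &current) != 0)
        return -1;

    /* A three-argument handler registered with SA_SIGINFO is never SIG_DFL. */
    if (current.sa_flags & SA_SIGINFO)
        return 0;

    return (current.sa_handler == SIG_DFL) ? 1 : 0;
}
```

Under the suggested behavior, the JVM abort handler would be installed only when `has_default_disposition(SIGABRT)` returns 1. Note that the helper treats SIG_IGN as "not default", which is an assumption of this sketch.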
JAH: "JVM Abort Handler"
UH: "User Handler"
| Scenario | Signal Chaining Enabled | UH Installed | Current OpenJ9 Behavior | Proposed OpenJ9 Behavior |
|:----------------:|:----------------:|:----------------:|:----------------:|:----------------:|
| 1 | No | No | JAH invoked | JAH invoked|
| 2 | Yes | No | JAH invoked | JAH invoked|
| 3 | No | Yes | JAH invoked | UH invoked|
| 4 | Yes | Yes | UH + JAH invoked | UH invoked|
In the above table, OpenJ9 behavior changes for scenarios 3 and 4. Some OpenJ9 customers may be dependent on scenario 4, where they expect both the UH and JAH to be invoked. With the proposed behavior, OpenJ9 will no longer invoke the JAH in scenario 4, and it will break functionality for some customers. The same may also apply for scenario 3.
The JVM/Java spec doesn't define the behavior for SIGABRT. Over time, users have come to accept the SIGABRT behavior of their preferred JVM.
But, the previous suggestion(s) will cause irreversible change in OpenJ9's functionality. This will lead to user issues from those who are dependent on OpenJ9's current SIGABRT behavior.
I feel that adding a command line option to disable OpenJ9's SIGABRT handler is the only solution. It won't impact existing users who rely upon OpenJ9's current SIGABRT behavior. At the same time, it will let others attain the RI's behavior if needed. We already support -Xrs / -Xrs:sync to disable JVM signal handlers. Adding an exclusive option to disable OpenJ9's SIGABRT handler will be a minor, low-impact change.
@DanHeidinga Shall we add a cmdline option to disable OpenJ9's SIGABRT handler?
Discussed with Babneet: he's going to propose a design for how the new command line option should interact with the existing -Xrs options. Once we agree on that, and on the spelling of the new option, he'll put together a PR.
-XX:DisableAbortHandler
Disable the JVM abort handler registered with SIGABRT.
-Xrs and -Xrs:sync
- -Xrs and -Xrs:sync both disable the JVM abort handler along with disabling other signal handlers ... -Xrs and -Xrs:sync documentation. -XX:DisableAbortHandler will only disable the JVM abort handler.
- -Xrs* and -XX:DisableAbortHandler since these options provide similar functionality.
- -XX:[+|-]AbortHandler since -XX:+AbortHandler will conflict with -Xrs. The JVM abort handler shouldn't be enabled in the presence of -Xrs. Otherwise, we will differ in -Xrs w.r.t. the RI.
fyi - @DanHeidinga
We should always have enable and disable variations for options, to allow an earlier command line option to be overridden by a later one.
The proposal should be -XX:+DisableAbortHandler and -XX:-DisableAbortHandler. Although perhaps -XX:+AbortHandler and -XX:-AbortHandler would be less confusing.
We can just document that -Xrs overrides any -XX:+AbortHandler option, and it can only be used to reverse a previous -XX:-AbortHandler option.
I'd like it to error out if both -Xrs & -XX:+AbortHandler are specified. This will be a very rare corner case that I'd prefer to have the user correct rather than build in additional rules for.
There is an existing option: -XX:[+|-]HandleSIGXFSZ. It is similar to the proposed new option, -XX:[+|-]AbortHandler. The existing option handles SIGXFSZ whereas the new option will handle SIGABRT.
- To be consistent with the existing option's naming, shall we change the new option's name from -XX:[+|-]AbortHandler to -XX:[+|-]HandleSIGABRT?
- -XX:+HandleSIGXFSZ works with -Xrs. We chose to throw an error when both -Xrs & -XX:+AbortHandler are specified. I feel that the two options should have similar behavior in the presence of -Xrs. Shall we remove the error requirement for the new option? Adding the error requirement to the existing option will break functionality for existing users. Using the documentation to express the relation between -XX:[+|-]HandleSIG* and -Xrs will be a better option.
- To be consistent with the existing option's naming, shall we change the new option's name from -XX:[+|-]AbortHandler to -XX:[+|-]HandleSIGABRT?
+1 to having consistent naming for the options.
-XX:+HandleSIGXFSZ works with -Xrs. We chose to throw an error when both -Xrs & -XX:+AbortHandler are specified. I feel that the two options should have similar behavior in the presence of -Xrs. Shall we remove the error requirement for the new option? Adding the error requirement to the existing option will break functionality for existing users. Using the documentation to express the relation between -XX:[+|-]HandleSIG* and -Xrs will be a better option.
I would prefer that we error in the case that both options are specified. This is the safest default given the expectation of limited use of this new option and minimizes the complexity in the code to manage interactions.
We can always revisit this in the future if there is demand from users.
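The option semantics discussed above can be sketched as follows. This is a minimal sketch and not OpenJ9's actual option parsing: the enum, the function name and the scanning logic are hypothetical. It assumes that the rightmost -XX:[+|-]HandleSIGABRT wins, that the handler is enabled by default, that -Xrs and -Xrs:sync both disable it, and that -Xrs:sync conflicts with -XX:+HandleSIGABRT in the same way as -Xrs (the thread only states the conflict for -Xrs).

```c
#include <string.h>

enum abort_handler_decision {
    ABORT_HANDLER_ENABLED,
    ABORT_HANDLER_DISABLED,
    ABORT_HANDLER_OPTION_CONFLICT
};

/* Scans the JVM options left to right and decides whether the JVM abort
 * handler should be installed. Hypothetical helper for illustration only. */
enum abort_handler_decision resolve_abort_handler(int argc,
                                                  const char *const argv[])
{
    int xrs_seen = 0;         /* -Xrs or -Xrs:sync was specified */
    int explicit_setting = 0; /* 0 = none, 1 = "+", -1 = "-"; rightmost wins */

    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-Xrs") == 0 || strcmp(argv[i], "-Xrs:sync") == 0) {
            xrs_seen = 1;
        } else if (strcmp(argv[i], "-XX:+HandleSIGABRT") == 0) {
            explicit_setting = 1;
        } else if (strcmp(argv[i], "-XX:-HandleSIGABRT") == 0) {
            explicit_setting = -1;
        }
    }

    /* Error out instead of inventing precedence rules for this corner case. */
    if (xrs_seen && explicit_setting == 1)
        return ABORT_HANDLER_OPTION_CONFLICT;

    if (xrs_seen || explicit_setting == -1)
        return ABORT_HANDLER_DISABLED;

    return ABORT_HANDLER_ENABLED;
}
```

Reporting a conflict keeps the interaction rules small, which matches the preference stated above for having the user correct the command line rather than building additional rules into the JVM.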
@babsingh can this be closed now?
@pshipton Yes, this can be closed.
@imkabir @smudigon The fix is -XX:-HandleSIGABRT. This option disables J9's abort handler and prevents J9 from overwriting a native user signal handler for SIGABRT, thus matching the reference implementation. It should be available in the upcoming 0.21 release.
@babsingh Do we have an APAR for this?
@manqingl No APAR was created because I don't think an IBM J9 Jazz workitem exists for this issue. Please send me a direct-message if an APAR is needed.
@babsingh, will this be made available as part of JDK / JRE 1.8? If so, could you please confirm the fixpack in which it would be included?
@smudigon the change will be in the OpenJ9 0.21.0 release, delivering in July. It sounds like you are asking about an IBM release, which we try to avoid discussing in this open source project. I expect if you ask this question inside IBM you can get an answer.
@pshipton, thank you. I did not realize I was posting this question in an open forum.