Hi OpenJ9 team.
Today I tried out the 0.9.0-rc2 on Java 8 (built from AdoptOpenJDK):
https://github.com/AdoptOpenJDK/openjdk8-openj9-nightly/releases/download/jdk8u181-b13-201808010217/OpenJDK8-OPENJ9_x64_Linux_201808010217.tar.gz
My problem now is that for my application (an Akka clustered application) which adds shutdownHooks by itself those are not run and as a consequence the graceful cluster shutdown is not working.
Really puzzling for me is that in simple reproducers (Java with main method registering one or multiple shutdownHooks) the hooks are working as expected and are called both on SIGINT and SIGTERM.
I also tried a simple Scala program in which a shutdown hook is registered and that also worked.
When starting the same Akka programs with OpenJDK8 Hotspot, the graceful shutdown works as expected, the shutdown hooks are correctly invoked.
I even tried debugging into the java.lang.Shutdown#runHooks when sending SIGTERM, but this was never called when running with 0.9.0-rc2. It was called when running with Hotspot.
Do you have any ideas or suggestions how I could investigate further?
Would be a shame if our application still can't handle signals like in Hotspot.
Will try to reproduce tomorrow with an Akka application.
@babsingh Can you take a look at this?
Still can't reproduce with a simple Akka starter.
Our code in which that doesn't work is here:
https://github.com/eclipse/ditto/
For example, start the following class (e.g. in IntelliJ, running this without any config is possible):
https://github.com/eclipse/ditto/blob/master/services/policies/starter/src/main/java/org/eclipse/ditto/services/policies/starter/PoliciesService.java
Killing the process won't run shutdown hooks and won't terminate the JVM.
I ran the below micro-test to verify if shutdown hooks are executed using OpenJ9 Java 8 0.9.0 rc2. OK... exiting is printed from the shutdown hook. The shutdown hook ran with both SIGINT and SIGTERM.
I will try to run eclipse/ditto (PoliciesService.java) to reproduce the issue.
java -version:
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
Eclipse OpenJ9 VM (build openj9-0.9.0-rc2, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20180801_267 (JIT enabled, AOT enabled)
OpenJ9 - 24e53631
OMR - fad6bf6e
JCL - b911d4c2 based on jdk8u181-b13)
Micro-test:
import sun.misc.Signal;
public class QuickTest extends Thread {
private static Object monitor = new Object();
public static void main(String args[]) {
Runtime.getRuntime().addShutdownHook(new QuickTest());
System.out.println("Press CTRL-C to exit");
synchronized (monitor) {
try {
monitor.wait();
} catch (Exception e) {
System.out.println("Main thread interrupted");
}
System.out.println("Terminating main thread correctly");
}
}
public void run() {
System.out.println("OK... exiting");
synchronized (monitor) {
monitor.notify();
}
}
}
Output:
Press CTRL-C to exit
OK... exiting
Terminating main thread correctly
It appears that for our services, sun.misc.Signal#dispatch is not even invoked when the process receives a SIGTERM
That's unfortunately the point where I cannot debug further why it is not invoked
I finally have something.
This is a minimal reproducer with only one dependency:
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-actor_2.12</artifactId>
<version>2.5.14</version>
</dependency>
code:
package test;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import akka.Done;
import akka.actor.ActorSystem;
import akka.actor.CoordinatedShutdown;
public class Test {
public static void main(String... args) throws ExecutionException, InterruptedException {
System.out.println("Starting Akka ..");
final ActorSystem actorSystem = ActorSystem.create("test");
System.out.println("Adding CoordinatedShutdown task ..");
CoordinatedShutdown.get(actorSystem).addTask(
CoordinatedShutdown.PhaseBeforeServiceUnbind(), "log_shutdown_initiation", () -> {
System.out.println("Initiated coordinated shutdown - gracefully shutting down..");
return CompletableFuture.completedFuture(Done.getInstance());
});
System.out.println("Adding Runtime shutdownHook ..");
Runtime.getRuntime().addShutdownHook(new Thread(() -> System.out.println("Runtime shutdownHook invoked")));
System.out.println("Main thread finished");
// when commenting in the following line, the shutdown hooks are called as expected:
// actorSystem.getWhenTerminated().toCompletableFuture().get();
}
}
In our code we don't block the main() Thread until termination of the Akka ActorSystem.
In that case, OpenJ9 does not invoke the shutdown hooks.
Only when we block the main() Thread, the shutdown hooks are invoked as expected.
I must admit that I don't know the "best practice" to handle the main thread here. I even did never think about this as in Hotspot it does work either way.
I would nevertheless say that is a quite unexpected behavior for many Java applications out there relying on the Hotspot behavior.
@thjaeckle I studied the issue in more detail. In the below micro-test, the Java main thread terminates while another Java thread is waiting to receive a SIGINT (Ctrl + C). In such a scenario, the JVM doesn't process shutdown signals (SIGINT/SIGTERM/SIGHUP). JVM processes shutdown signals only when the Java main thread is alive. There is a bug in OpenJ9; JVM should receive and process signals from other Java threads.
Currently, I am investigating to pin-point the source of the issue. I will soon submit a fix for this issue. Then, OpenJ9's behaviour should be similar to Hotspot.
import sun.misc.Signal;
public class QuickTestThread extends Thread {
private static Object monitor = new Object();
public static void main(String args[]) {
Runtime.getRuntime().addShutdownHook(new QuickTestThread());
QuickTestThread qt = new QuickTestThread();
qt.execute();
//synchronized(monitor) {
// try {
// monitor.wait();
// } catch (Exception e) {
// System.out.println("Main thread interrupted");
// }
// System.out.println("Termination main thread correctly");
//}
}
public void run() {
System.out.println("OK... exiting");
synchronized (monitor) {
monitor.notifyAll();
}
}
public void execute() {
MyThread myThread = new MyThread();
myThread.start();
}
class MyThread extends Thread {
public void run() {
synchronized (monitor) {
System.out.println("Press CTRL-C to exit");
try {
monitor.wait();
} catch (Exception e) {
System.out.println("MyThread interrupted");
}
System.out.println("Terminating MyThread correctly");
}
}
}
}
2018-08-17 15:40:48,467 INFO [] a.c.s.DDataShardCoordinator akka.tcp://[email protected]:2552/system/sharding/policyCoordinator/singleton/coordinator - Sharding Coordinator was moved to the active state State(Map(),Map(),Set(),Set(),false)
[Application is waiting]
19:41:18.267*0x1043800 j9vm.592 - signalDispatch - signal value: 15
[Application receives SIGTERM and gracefully terminates]
2018-08-17 15:41:18,273 INFO [] o.e.d.s.p.s.PoliciesService - Initiated coordinated shutdown - gracefully shutting down..
2018-08-17 15:41:18,279 INFO [] o.e.d.s.p.s.PoliciesService - Gracefully shutting down Prometheus HTTP endpoint..
2018-08-17 15:41:18,283 INFO [] o.e.d.s.p.s.PoliciesRootActor akka://ditto-cluster/user/policiesRoot - Gracefully shutting down status/health HTTP endpoint..
...
2018-08-17 15:41:19,400 INFO [] a.r.RemoteActorRefProvider$RemotingTerminator akka.tcp://[email protected]:2552/system/remoting-terminator - Remoting shut down.
2018-08-17 15:41:19,424 INFO [] o.e.d.s.p.s.PoliciesService - Exiting JVM with status code '0'
I had to revert the fix for this due to new test failures - see https://github.com/eclipse/openj9/pull/2658 for details
@babsingh Can you take another look at your patch in light of the threadMXBeanTestSuite2_2 hang? See the #2658 revert PR for details.
@DanHeidinga threadMXBeanTestSuite2_2 hang happens even without my changes (https://github.com/eclipse/openj9/pull/2650). the hang occurs intermittently. the first occurrence of the hang was 3-4 months ago. due to the intermittent nature of the hang, it has been ignored and it has stayed unnoticed. I have added more details about the hang in #2658. It's related to exclusive VM access. threadMXBeanTestSuite2_2 doesn't deal with signals so predefinedHandlerWrapper is never invoked in the context of threadMXBeanTestSuite2_2. thus, https://github.com/eclipse/openj9/pull/2650 can't cause the hang.
@charliegracie brought to my attention that shutdown signal handlers are not invoked for the following Spring application: https://github.com/spring-projects/spring-petclinic. The Spring application suffers from the same issue documented over here. I have verified that https://github.com/eclipse/openj9/pull/2650 fixes the issue for the Spring application; with the fix, INT/TERM/HUP signals are properly processed for the Spring application.
Thanks for digging into this @babsingh. If you open a new PR for the change, I'll review and merge it.
Most helpful comment
I finally have something.
This is a minimal reproducer with only one dependency:
code:
In our code we don't block the
main()Thread until termination of the Akka ActorSystem.In that case, OpenJ9 does not invoke the shutdown hooks.
Only when we block the
main()Thread, the shutdown hooks are invoked as expected.I must admit that I don't know the "best practice" to handle the main thread here. I even did never think about this as in Hotspot it does work either way.
I would nevertheless say that is a quite unexpected behavior for many Java applications out there relying on the Hotspot behavior.