Presto: Slow down of presto-cli and presto-jdbc in JDK 11 environment

Created on 24 Jul 2019  路  25Comments  路  Source: prestosql/presto

Problem

When we run the console program or program using presto-jdbc in JDK 11 environment, it is too slow to complete the execution. In JDK 11 environment, the program tries to construct the connection with HTTP 2 instead of HTTP 1.1.

While the coordinator does not support HTTP 2 yet, it can be a problem when we access to the coordinator through proxy or load balancer such as ALB supporting HTTP 2.

This query took several minutes to complete. api-presto.treasuredata.com is an endpoint to the public ALB supporting presto protocol.

$ ./presto-cli-317-SNAPSHOT-executable.jar \
    --server https://api-presto.treasuredata.com \
    --catalog td-presto \
    --user XXXXXXXXXXXXXXXXX \
    --execute "select 1234"

This seems to be caused by a non-daemon thread running in OkHttpClient with HTTP 2.0.
(ref: https://github.com/square/okhttp/issues/4029).

Workaround

As a workaround, there are two ways to disable HTTP 2 on the client-side.

  1. Use JDK 8 not to use HTTP 2
  2. Specify HTTP 1.1 explicitly in the OkHttpClient as follows.
OkHttpClient.Builder builder = new OkHttpClient.Builder();
builder.protocols(Arrays.asList(Protocol.HTTP_1_1));

Both ways work for me.

Discussion

Currently, it is not a huge problem in the usual cases because Presto does not support HTTP 2 yet. But it can prevent Presto from supporting HTTP 2 migration. The issue in OkHttp says that we should close the connection pool properly by okHttpClient.connectionPool().evictAll() but it's not a solution in the long-running program like presto-cli/presto-jdbc.

How can we support the HTTP 2 environment from presto-cli and presto-jdbc?

bug

All 25 comments

The daemon thread issue would only affect shutdown. Since this is happening during query execution, there has to be something else going on. Can you show a stack trace (via jstack) of the main CLI or JDBC thread at several points during one of these queries that takes several minutes?

I captured stack trace by jstack and compared JDK 8 and JDK 11. The difference between them was the existence of two Http2Connections waiting for something.

JDK 1.8.0_172

"OkHttp ConnectionPool" #12 daemon prio=5 os_prio=31 tid=0x00007f83e4a1f000 nid=0x5503 in Object.wait() [0x000070000bf7b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007ab5642d8> (a okhttp3.ConnectionPool)
        at java.lang.Object.wait(Object.java:460)
        at okhttp3.ConnectionPool$1.run(ConnectionPool.java:67)
        - locked <0x00000007ab5642d8> (a okhttp3.ConnectionPool)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

JDK 11.0.4

"OkHttp ConnectionPool" #15 daemon prio=5 os_prio=31 cpu=0.30ms elapsed=126.13s tid=0x00007fe2f615c800 nid=0x6103 in Object.wait()  [0x000070000b5cf000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x00000007cf600690> (a okhttp3.ConnectionPool)
        at java.lang.Object.wait([email protected]/Object.java:462)
        at okhttp3.ConnectionPool$1.run(ConnectionPool.java:67)
        - waiting to re-lock in wait() <0x00000007cf600690> (a okhttp3.ConnectionPool)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"OkHttp api-presto.treasuredata.com" #20 prio=5 os_prio=31 cpu=2.50ms elapsed=4.92s tid=0x00007fe2f4945800 nid=0x9b07 runnable  [0x000070000b4cb000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.read0([email protected]/Native Method)
        at sun.nio.ch.SocketDispatcher.read([email protected]/SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer([email protected]/IOUtil.java:276)
        at sun.nio.ch.IOUtil.read([email protected]/IOUtil.java:245)
        at sun.nio.ch.IOUtil.read([email protected]/IOUtil.java:223)
        at sun.nio.ch.SocketChannelImpl.read([email protected]/SocketChannelImpl.java:355)
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read([email protected]/SocketAdaptor.java:203)
        - locked <0x00000007cfb109d0> (a java.lang.Object)
        at sun.nio.ch.ChannelInputStream.read([email protected]/ChannelInputStream.java:103)
        - locked <0x00000007cfb10b68> (a sun.nio.ch.SocketAdaptor$SocketInputStream)
        at sun.security.ssl.SSLSocketInputRecord.read([email protected]/SSLSocketInputRecord.java:448)
        at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket([email protected]/SSLSocketInputRecord.java:68)
        at sun.security.ssl.SSLSocketImpl.readApplicationRecord([email protected]/SSLSocketImpl.java:1104)
        at sun.security.ssl.SSLSocketImpl$AppInputStream.read([email protected]/SSLSocketImpl.java:823)
        - locked <0x00000007cfb10d98> (a sun.security.ssl.SSLSocketImpl$AppInputStream)
        at okio.Okio$2.read(Okio.java:139)
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
        at okio.RealBufferedSource.request(RealBufferedSource.java:67)
        at okio.RealBufferedSource.require(RealBufferedSource.java:60)
        at okhttp3.internal.http2.Http2Reader.nextFrame(Http2Reader.java:95)
        at okhttp3.internal.http2.Http2Connection$ReaderRunnable.execute(Http2Connection.java:566)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"OkHttp Http2Connection" #21 daemon prio=5 os_prio=31 cpu=1.00ms elapsed=4.92s tid=0x00007fe2f4a83800 nid=0x9807 waiting on condition  [0x000070000b6d2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00000007cfd66358> (a java.util.concurrent.SynchronousQueue$TransferStack)
        at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill([email protected]/SynchronousQueue.java:462)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer([email protected]/SynchronousQueue.java:361)
        at java.util.concurrent.SynchronousQueue.poll([email protected]/SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1053)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"OkHttp Http2Connection" #22 daemon prio=5 os_prio=31 cpu=0.20ms elapsed=4.92s tid=0x00007fe2f4a44800 nid=0x6607 waiting on condition  [0x000070000b7d5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00000007cfd66358> (a java.util.concurrent.SynchronousQueue$TransferStack)
        at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill([email protected]/SynchronousQueue.java:462)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer([email protected]/SynchronousQueue.java:361)
        at java.util.concurrent.SynchronousQueue.poll([email protected]/SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1053)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run([email protected]/Thread.java:834)

I used OkHttp logging interceptor to catch the timing of each request and found each GET request to the statement endpoint takes several minutes to complete the response. Is it caused by the waiting Http2Connection?

--> POST https://api-presto.treasuredata.com/v1/statement (11-byte body)
<-- 200 https://api-presto.treasuredata.com/v1/statement (122717ms, unknown-length body)
--> GET https://api-presto.treasuredata.com/v1/statement/20190725_XXXXXXXXXXXX/1
// Take up to several minutes
<-- 200 https://api-presto.treasuredata.com/v1/statement/20190725_XXXXXXXXXXXX/1 (60966ms, unknown-length body)
--> GET https://api-presto.treasuredata.com/v1/statement/20190725_XXXXXXXXXXXX/2
// Take up to several minutes
<-- 200 https://api-presto.treasuredata.com/v1/statement/20190725_XXXXXXXXXXXX/2 (61179ms, unknown-length body)

@electrum How about specifying HTTP 1.1 in presto-cli and presto-jdbc explicitly? I think it can be a problem when both client and server run on the latest JDK fully supporting HTTP 2. By specifying the procotol version explicitly, we can avoid this type of problem happening accidentally.

I鈥檇 like to figure out what鈥檚 happening, but that seems like a good intermediate solution, as I don鈥檛 have time now to investigate.

Do you want to send a PR?

Sure, I have the patch for it. Will send a PR. Thanks.

Encountered the same issue. Removing SocketChannelSocketFactory seems to fix this issue (or using presto-jdbc 0.208, before custom socket factory was introduced)

@Aulust That's interesting information. It was added to workaround JDK-8200719, which is now fixed on Java 11, so we could skip installing it on later versions.

@Lewuathe Could you see if removing SocketChannelSocketFactory fixes your issue as well?

@Aulust @electrum Thanks for the information.
Sure I'll try and report the result here.

Hmm, it's interesting. Removing SocketChannelSocketFactory works even in my environment in JDK 11.

SocketChannelSocketFactory seems to be introduced as a workaround for JDK-8131133. We may be able to remove it completely in the newer JDKs?

Yes, we can remove it for JDK 11+ (though I suspect we will keep the JDBC driver compatible with JDK 8 for a while, even after the rest of Presto moves on).

I suspect we will keep the JDBC driver compatible with JDK 8 for a while

We could have separate project for that, like: https://github.com/prestodb/presto-jdbc-java6

If we have a separated project for JDK8 or older, we can disable SocketChannelSocketFactory in the current master branch.

If not, can we switch the SocketChannelSocketFactory by recognizing the JDK version?

We should detect the Java version and disable the socket factory on pre JDK 11.

Okay, I'll prepare the patch to disable SocketChannelSocketFactory by recognizing underlying JDK version.

The issue in OkHttp seems to be fixed in 4.3. I'll try to use the latest version in Presto client.
https://github.com/square/okhttp/issues/4029#issuecomment-562874876

@tooptoop4 Sorry, I was a little confusing. The fix will be released in 4.3. The current latest version is 4.2.2 which does not fix the problem yet, I confirmed. We are waiting for the next release of okhttp, then will check the problem again.

Unfortunately, OkHttp 4.3.0 did not solve the issue in my environment. presto-cli is still slow with Java 11 runtime with HTTP 2.0.

Since we have found upgrading OkHttp is not the solution, disabling the workaround for the latest JDK would be the resolution.

Is there a way to workaround the issue from the JDBC configuration?

@paulolieuthier We do not have a configuration to disable socket channel or ensure to use HTTP 1.1.

OK, looking forward to the next release!

@Lewuathe, since we're seeing more users hit this issue, we'd like to get that workaround in quickly. Would you be able to send a PR in the next few days? Hopefully we can include it in the next release.

@martint Yes, this is the PR for the workaround. Confirmed the issue was fixed for presto-cli and presto-jdbc. Could you take a look at this when you get a chance?
https://github.com/prestosql/presto/pull/2633

@Lewuathe can this be closed?

Was this page helpful?
0 / 5 - 0 ratings