kubernetes-client does not work with Java 8u252

Created on 9 May 2020 · 33 Comments · Source: fabric8io/kubernetes-client

After upgrading the JDK from 8u242 to 8u252, we found that kubernetes-client stopped working. It always throws the following exception.

Even after bumping kubernetes-client to 4.9.1, the problem remains. I suspect an issue in okhttp or another dependency of kubernetes-client, but I could not find a clear clue.

io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Service]  with name: [flink-native-k8s-session-1-rest]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:168)
    at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestService(Fabric8FlinkKubeClient.java:201)
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:104)
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:185)
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:185)
Caused by: java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
    at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
    at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
    at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
    at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
    at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
    at org.apache.flink.kubernetes.shaded.okio.Okio$1.write(Okio.java:79)
    at org.apache.flink.kubernetes.shaded.okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
    at org.apache.flink.kubernetes.shaded.okio.RealBufferedSink.flush(RealBufferedSink.java:224)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:546)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:536)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:299)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:288)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:169)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:114)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:93)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:469)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:395)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:376)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:845)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:214)
    ... 6 more

Duplicate of #2145

@rohanKanojia I really appreciate your quick response. I had noticed that okhttp issue, but I do not think it solves the problem. After bumping fabric8 kubernetes-client to 4.9.1 and okhttp to 3.12.11, we still get the same exception with Kubernetes v1.17+.

When we downgrade Kubernetes to v1.16, it works well without any change to the okhttp version.

https://stackoverflow.com/questions/61565751/why-am-i-not-able-to-run-sparkpi-example-on-a-kubernetes-k8s-cluster
For now, I have to use the same solution as in the question above: downgrading Kubernetes to v1.16.9. It works now.

The OkHttp version was changed in 4.10.0; 4.9.1 still has the problem.

Are you sure your pom excludes the OkHttp dependency pulled in by kubernetes-client and includes your own version?

Yes, I use dependencyManagement to bump the okhttp version to 3.12.11. The stack trace is actually quite different from the one in that issue: we get a broken pipe, whereas that issue only fixes the failure to get the ALPN selected protocol.

It is not only Flink; I think Spark runs into the same issue.

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.squareup.okhttp3</groupId>
                <artifactId>okhttp</artifactId>
                <version>3.12.11</version>
            </dependency>
        </dependencies>
    </dependencyManagement>
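
One way to double-check which OkHttp copy is really loaded at runtime is to print the code source of a class. The following is only a minimal sketch using JDK APIs: the class name ClasspathOrigin is hypothetical, and the two candidate class names are taken from this thread (the plain okhttp3 package, and the relocated org.apache.flink.kubernetes.shaded.okhttp3 package visible in the stack trace above).

```java
import java.security.CodeSource;

/** Hypothetical helper: reports the jar or directory a class was loaded from. */
final class ClasspathOrigin {

    /** Returns the code source location, or a placeholder for JDK (bootstrap) classes. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null || source.getLocation() == null
                ? "(bootstrap or unknown)"
                : source.getLocation().toString();
    }

    /** Looks a class up by name without initializing it; never throws. */
    static String locationOf(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClasspathOrigin.class.getClassLoader());
            return locationOf(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Candidates from this thread: plain OkHttp and the relocated copy
        // that appears in the Flink stack trace.
        String[] candidates = {
            "okhttp3.OkHttpClient",
            "org.apache.flink.kubernetes.shaded.okhttp3.RealCall",
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

If the relocated class resolves to a Flink jar, a dependencyManagement override of com.squareup.okhttp3:okhttp in your own pom would presumably not change that bundled copy.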

We have the same problems as well in Strimzi - using okhttp 3.12.11 or 3.14.8 does not seem to solve the issue. Our exception is basically the same as @wangyang0918 pasted above.

We are not able to reproduce. Can you try to bump com.squareup.okhttp3:logging-interceptor version to 3.12.11 too?

@manusa Which Kubernetes version are you using? It only happens on K8s 1.17+ with JDK 8u252.

In the Strimzi case, it is a bit more complicated:

  • It seems to happen only with some versions of Kubernetes / OpenShift, and probably also depends on their setup

    • The latest OCP 4.2 in AWS is confirmed to have this issue by several people using independent environments (I tried it myself). However, it works fine on OCP 4.3 and 4.4.

    • I have the problem on my local Kubernetes 1.18.2 cluster set up using kubeadm. But Minikube with Kube 1.18 worked for me, so I do not think it is as simple as just the Kubernetes version.

  • In Strimzi, the first call we make is getVersion(), and it fails right there. But I assume any other call would fail as well.
  • I tried to override both okhttp and the logging-interceptor without success. But for the record, I did that with Fabric8 4.6.4. We are still updating to 4.10.1. I will keep you posted on how it works there.
  • Similarly to the original okhttp issue, moving back to Java 1.8.0-242 solves all issues.

Sorry, didn't see your statement regarding k8s version. I can reproduce.

@manusa I can confirm that even with Fabric8 4.10.1 I can still see this problem.

Is there any way I can help with debugging this and finding the issue?

Cause

The issue is related to the protocol that is auto-selected on each platform.

When using JDK 8u252, the selected protocol (in RealConnection) is http/2, which seems to be incompatible with this platform (the exception is thrown in the invoked startHttp2(pingIntervalMillis) method).

When using JDK 8u233, the selected protocol (in RealConnection) is http/1.1, which works fine.

When using JDK 11, the selected protocol is http/2, but startHttp2(pingIntervalMillis) actually works fine.

Description

When using a JDK newer than 8u251, OkHttp "wrongly" detects the platform as Jdk9Platform. This in turn enables Application-Layer Protocol Negotiation (ALPN) support, which is not available out of the box for JDK 8, thus enabling the http/2 protocol, which OkHttp then selects (as it is the most efficient).
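
To see whether a given runtime exposes the ALPN methods that were originally added in Java 9, a reflection probe along these lines may help. This is only a sketch: the class name AlpnProbe is hypothetical, and it is an assumption that OkHttp's platform detection keys off these same two methods, since the exact check is not shown in this thread.

```java
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;

/** Hypothetical probe for the Java 9 style ALPN API on the running JDK. */
final class AlpnProbe {

    /** True when both ALPN methods can be found on the JDK's TLS classes. */
    static boolean hasAlpnApi() {
        try {
            SSLParameters.class.getMethod("setApplicationProtocols", String[].class);
            SSLSocket.class.getMethod("getApplicationProtocol");
            return true;
        } catch (NoSuchMethodException e) {
            // Older Java 8 builds do not declare these methods.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("ALPN API present = " + hasAlpnApi());
    }
}
```

Running it on both 8u242 and 8u252 should show whether the API surface differs between the two builds.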

In order to prevent this, we need to force the http/1.1 protocol whenever JDK 8 is detected at runtime (which would be the default anyway). I will submit a PR with the fix.
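
The idea of falling back to HTTP/1.1 on JDK 8 could be sketched roughly as follows. This is not the code from the actual fix, which is not shown in this thread; ProtocolChooser and its methods are hypothetical names, and the protocols are represented as plain ALPN identifier strings so that the example needs only the JDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: picks the protocols to offer based on the Java version. */
final class ProtocolChooser {

    /** Java 8 reports its specification version as "1.8"; later releases report "9", "11", and so on. */
    static boolean isJdk8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    /** HTTP/1.1 only on JDK 8; otherwise HTTP/2 ("h2") first, then HTTP/1.1. */
    static List<String> protocolsFor(String specVersion) {
        return isJdk8(specVersion)
                ? Collections.singletonList("http/1.1")
                : Arrays.asList("h2", "http/1.1");
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println(spec + " -> " + protocolsFor(spec));
    }
}
```

With OkHttp on the classpath, the equivalent step would be to pass a list containing only Protocol.HTTP_1_1 to OkHttpClient.Builder#protocols when JDK 8 is detected.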

Workaround for <= v4.10.1

Set Config#http2Disable (http2.disable system property or HTTP2_DISABLE environment variable) to true.
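
When changing the launch environment is inconvenient, the system property can presumably also be set programmatically before the client is created. The sketch below uses only the property and variable names given above; the class Http2Workaround, the helper http2Disabled, and the lookup order (property first, then environment variable) are assumptions for illustration, not the library's implementation.

```java
/** Hypothetical illustration of applying the HTTP/2 workaround from code. */
final class Http2Workaround {

    static final String PROPERTY = "http2.disable"; // system property named in this thread
    static final String ENV_VAR = "HTTP2_DISABLE";  // environment variable named in this thread

    /** Resolves the flag; checking the property before the variable is an assumption. */
    static boolean http2Disabled(String propertyValue, String envValue) {
        String raw = propertyValue != null ? propertyValue : envValue;
        return Boolean.parseBoolean(raw); // null parses to false
    }

    public static void main(String[] args) {
        // Must run before the Kubernetes client is constructed.
        System.setProperty(PROPERTY, "true");

        boolean disabled = http2Disabled(System.getProperty(PROPERTY), System.getenv(ENV_VAR));
        System.out.println("HTTP/2 disabled: " + disabled);

        // With fabric8 kubernetes-client on the classpath, the client would be
        // created after this point, for example:
        // KubernetesClient client = new DefaultKubernetesClient();
    }
}
```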

@manusa Thanks a lot for the investigation and fix. Could we also backport this to branch 4.9? We are going to upgrade the fabric8 kubernetes-client version to 4.9.x, as @rohanKanojia suggested.

Ah, we recently split the Kubernetes Model into smaller jars per apiVersion in #2137, but it has introduced some regressions: #2195, #2201 and #2205.

If you're not affected by these bugs, you can upgrade. We have plans on fixing them and then cutting a stable release.

Have you tried the workaround (e.g. export HTTP2_DISABLE=true)?

There's no problem in backporting the fix, but we're currently directing all efforts to cutting a stable 4.10 release (so this will probably come before a fixed 4.9.x).

@manusa I have verified that it works after export HTTP2_DISABLE=true. Great job.

I can confirm as well that the workaround seems to work for me both locally on Kube 1.18.2 and on OCP 4.2. Thanks for looking into this @manusa and finding the solution ... great help for us!

This was fixed in 3.12.11 and 4.6.0 a couple of weeks ago.

https://square.github.io/okhttp/changelog/
https://square.github.io/okhttp/changelog_3x/

Fix: Don't crash on Java 8u252 which introduces an API previously found only on Java 9 and above. See Jetty's overview of the API change and its consequences.

The other report in your project, which is for Azul's Zulu VM: https://github.com/fabric8io/kubernetes-client/issues/2145

When we saw that in OkHttp, a fix was also required in Zulu:

Our wrapper is fixed, we switched to a javaassist generated proxy, so we should be good to go there.

Looks great! JDK 9 platform activates and the fallback for UnsupportedOperationException for our incompatible SslSocket wrapper successfully falls back to HTTP/1.

https://github.com/square/okhttp/issues/5970#issuecomment-619087509

Thx for your feedback.

We are using the latest releases of OkHttp with your fixes :heart:.

The problem is that sun.security.ssl.SSLSocketImpl is never throwing the UnsupportedOperationException in SSLSocketImpl.java, so that is why HTTP/2 becomes available.

You can check the below screenshot from a debugging session in OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
[screenshot]

Anyway, we managed to fix the issue by forcing http/1.1 from our side when JDK 8 is detected (_in a rudimentary way_), but I'm not sure that other users of OkHttp will face a similar issue.

Is there a reason you need to avoid HTTP/2? We have one existing issue with Keeppass client where HTTP/2 is a problem because they want the error from a smart HTTP/2 server that sends the HTTP/2 goaway before a request completes. But generally, I would have thought you would be happy that HTTP/2 is working out of the box in JDK 8.

Or to put it another way, why not look into why HTTP/2 is failing in JDK 8 252?

Is there a reason you need to avoid HTTP/2?...

Not at all.

Or to put it another way, why not look into why HTTP/2 is failing in JDK 8 252?

I completely agree on this.

I'm a little bit lost here, is HTTP/2 supposed to be fully supported in JDK8u252? (I assume it is given your previous comment(s))
Have you successfully made it work in any given implementation?
Is the problem related to our specific environment (k8s REST API, JDK, etc.)?

Yep - it should be working. This is with a test client (okurl).

10:35:17.443 OkHttp Platform: Jdk9Platform
10:35:17.444 Protocol: h2
10:35:17.446 JVM: 25.252-b09

$ ./okurl --debug https://www.twitter.com/robots.txt
10:35:16.857    url https://www.twitter.com/robots.txt
10:35:16.867    Request Request{method=GET, url=https://www.twitter.com/robots.txt, headers=[User-Agent:okurl/dev], tags={class com.baulsupp.okurl.credentials.Token=TokenSet(name=default)}}
10:35:16.947    Dns (www.twitter.com): www.twitter.com/104.244.42.1, www.twitter.com/104.244.42.193
10:35:17.152    Q10002 scheduled after   5 s : OkHttp www.twitter.com ping
10:35:17.158    >> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a
10:35:17.158    >> 0x00000000     6 SETTINGS
10:35:17.159    >> 0x00000000     4 WINDOW_UPDATE
10:35:17.159    Q10005 scheduled after   0 µs: OkHttp www.twitter.com
10:35:17.160    Q10005 starting              : OkHttp www.twitter.com
10:35:17.160    Q10001 scheduled after   0 µs: OkHttp ConnectionPool
10:35:17.160    Q10001 starting              : OkHttp ConnectionPool
10:35:17.161    Q10001 run again after 300 s : OkHttp ConnectionPool
10:35:17.161    Q10001 finished run in 763 µs: OkHttp ConnectionPool
10:35:17.161    << 0x00000000     6 SETTINGS
10:35:17.162    Q10002 scheduled after   0 µs: OkHttp www.twitter.com applyAndAckSettings
10:35:17.162    Q10002 starting              : OkHttp www.twitter.com applyAndAckSettings
10:35:17.163    --> GET https://www.twitter.com/robots.txt h2
10:35:17.163    User-Agent: okurl/dev
10:35:17.163    Accept-Encoding: br,gzip
10:35:17.163    Host: www.twitter.com
10:35:17.163    Q10004 scheduled after   0 µs: OkHttp www.twitter.com onSettings
10:35:17.163    Connection: Keep-Alive
10:35:17.163    >> 0x00000000     0 SETTINGS      ACK
10:35:17.163    --> END GET
10:35:17.163    Q10004 starting              : OkHttp www.twitter.com onSettings
10:35:17.164    Q10002 finished run in   1 ms: OkHttp www.twitter.com applyAndAckSettings
10:35:17.164    Q10004 finished run in 369 µs: OkHttp www.twitter.com onSettings
10:35:17.227    << 0x00000000     0 SETTINGS      ACK
10:35:17.297    Matching interceptor: null
10:35:17.302    >> 0x00000003    42 HEADERS       END_STREAM|END_HEADERS
10:35:17.432    << 0x00000003   354 HEADERS       END_HEADERS
10:35:17.434    << 0x00000003    52 DATA          END_STREAM
10:35:17.438    <-- 200 https://www.twitter.com/robots.txt (274ms)
10:35:17.438    content-encoding: gzip
10:35:17.439    content-length: 52
10:35:17.439    content-type: text/plain;charset=utf-8
10:35:17.439    date: Fri, 15 May 2020 09:35:17 GMT
10:35:17.439    server: tsa_f
10:35:17.439    set-cookie: personalization_id="v1_dRKt013uHMsG0nP5LTvx3Q=="; Max-Age=63072000; Expires=Sun, 15 May 2022 09:35:17 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
10:35:17.439    set-cookie: guest_id=v1%3A158953531735934219; Max-Age=63072000; Expires=Sun, 15 May 2022 09:35:17 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
10:35:17.439    strict-transport-security: max-age=631138519
10:35:17.440    x-connection-hash: 00892da89b17cfdd912eca68ff4da741
10:35:17.440    x-response-time: 113
10:35:17.440    <-- END HTTP
10:35:17.443    OkHttp Platform: Jdk9Platform
10:35:17.444    Protocol: h2
10:35:17.444    TLS Version: TLS_1_2
10:35:17.444    Cipher: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
10:35:17.445    Peer Principal: CN=twitter.com, OU=lon3, O="Twitter, Inc.", L=San Francisco, ST=California, C=US
10:35:17.446    Local Principal: none
10:35:17.446    JVM: 25.252-b09
10:35:17.459    Q10001 scheduled after   0 µs: OkHttp ConnectionPool
User-agent: *
Disallow: /
10:35:17.459    Q10001 starting              : OkHttp ConnectionPool

10:35:17.459    Q10001 run again after 300 s : OkHttp ConnectionPool
10:35:17.459    Q10001 finished run in 376 µs: OkHttp ConnectionPool
10:35:17.460    Q10001 canceled              : OkHttp ConnectionPool
10:35:17.461    >> 0x00000000     8 GOAWAY
10:35:17.462    Q10002 canceled              : OkHttp www.twitter.com ping
10:35:17.462    Q10005 finished run in 302 ms: OkHttp www.twitter.com

This is the error with Keeppass https://github.com/PhilippC/keepass2android/issues/747

It's with a Go based server, and it's clearly an OkHttp bug but without a simple fix. IIRC in that case we are causing the pending 401, to become a "stream was reset: NO_ERROR".

Maybe it's something similar. The fix there was for them to disable HTTP/2.

Thanks for your input!

I'll try to find some time to see if we can at least find the underlying cause of the failure in Kubernetes-Client.

So do we have a concrete plan for v4.9.2?

So do we have a concrete plan for v4.9.2?

We're finishing with the fixes for bugs reported in 4.10.1, we'll probably release 4.10.2 in the next few days. This patched version should take care of any bugs and should be a direct replacement for 4.9.1.

If, after the 4.10.2 release, there are still some issues or bugs, or if you find some strong reason (besides our previous recommendation) not to upgrade, we can then release a back-ported, patched 4.9.2.

Is there a release planned with this https://github.com/fabric8io/kubernetes-client/pull/2227 fix? Upgrading to okhttp 3.12.11 didn't help me.

Is there a release planned with this #2227 fix? Upgrading to okhttp 3.12.11 didn't help me.

4.10.2

So do we have a concrete plan for v4.9.2?

We're finishing with the fixes for bugs reported in 4.10.1, we'll probably release 4.10.2 in the next few days. This patched version should take care of any bugs and should be a direct replacement for 4.9.1.

If, after the 4.10.2 release, there are still some issues or bugs, or if you find some strong reason (besides our previous recommendation) not to upgrade, we can then release a back-ported, patched 4.9.2.

Actually I am not very sure whether 4.10.2 is stable enough for production.

@manusa @rohanKanojia I share the same concern as @zhengcanbin. Bumping the version in a downstream project is not very easy, so we want to know whether 4.9.x is more stable than the latest version, 4.10.x.

I think other projects may have the same situation.

Downstream, Eclipse Che is interested in the 4.9.x version.

Just started release process for v4.9.2
