After upgrading the JDK from 8u242 to 8u252, we found that kubernetes-client no longer works. It always throws the exception shown below.
Bumping kubernetes-client to 4.9.1 does not solve the problem. I know it might be an issue in okhttp or another dependency of kubernetes-client, but I could not find a clear clue.
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Service] with name: [flink-native-k8s-session-1-rest] in namespace: [default] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:168)
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestService(Fabric8FlinkKubeClient.java:201)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:104)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:185)
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:185)
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.flink.kubernetes.shaded.okio.Okio$1.write(Okio.java:79)
at org.apache.flink.kubernetes.shaded.okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
at org.apache.flink.kubernetes.shaded.okio.RealBufferedSink.flush(RealBufferedSink.java:224)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:546)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:536)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:299)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:288)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:169)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:114)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:93)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:469)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:395)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:376)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:845)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:214)
... 6 more
Duplicate of #2145
@rohanKanojia I really appreciate your quick response. I had noticed that okhttp issue, but I don't think it solves the problem. When I bump the fabric8 kubernetes-client to 4.9.1 and okhttp to 3.12.11, we still get the same exception with Kubernetes v1.17+.
When we downgrade Kubernetes to v1.16, it works well without any change to the okhttp version.
https://stackoverflow.com/questions/61565751/why-am-i-not-able-to-run-sparkpi-example-on-a-kubernetes-k8s-cluster
Currently, I have to use the same solution as in the link above: downgrading Kubernetes to v1.16.9. It works now.
The OkHttp version was changed in 4.10.0; 4.9.1 still has the problem.
Are you sure you're excluding in your pom the OkHttp dependency from kubernetes-client's dependency and including your own version?
Yes, I use dependencyManagement to bump the okhttp version to 3.12.11. Actually, the stack trace is quite different from the one in that issue. We get a broken pipe, whereas that issue only fixes the failure to get the ALPN selected protocol.
It is not only Flink; I think Spark runs into the same issue.
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.squareup.okhttp3</groupId>
      <artifactId>okhttp</artifactId>
      <version>3.12.11</version>
    </dependency>
  </dependencies>
</dependencyManagement>
We have the same problems as well in Strimzi - using okhttp 3.12.11 or 3.14.8 does not seem to solve the issue. Our exception is basically the same as @wangyang0918 pasted above.
We are not able to reproduce. Can you try to bump com.squareup.okhttp3:logging-interceptor version to 3.12.11 too?
@manusa Which Kubernetes version are you using? It only happens with K8s 1.17+ and JDK 8u252.
In the Strimzi case, it is quite a bit complicated:
getVersion() and it fails right there. But I assume any other call would fail as well.
Sorry, didn't see your statement regarding k8s version. I can reproduce.
@manusa I can confirm that even with Fabric8 4.10.1 I can still see this problem.
Is there any way I can help with debugging this and finding the issue?
The issue is related to the auto-selected Protocol used for each platform.
When using JDK 8u252, the selected Protocol (in RealConnection) is http/2, which seems to be incompatible with this Platform (the exception is thrown in the invoked startHttp2(pingIntervalMillis) method).

When using JDK 8u233, the selected Protocol is http/1.1, which works fine.

When using JDK 11 the selected Protocol is http/2 but startHttp2(pingIntervalMillis) actually works fine.

When using a JDK 8 build newer than 8u251, OkHttp "wrongly" detects the Platform as Jdk9Platform. This in turn enables Application-Layer Protocol Negotiation (ALPN) support, which is not available out-of-the-box for JDK 8, thus enabling the http/2 protocol, which OkHttp then selects (as it is the most efficient).
In order to prevent this, we need to force the http/1.1 protocol whenever JDK 8 is detected at runtime (which would be the default anyway). I will submit a PR with the fix.
Workaround: set Config#http2Disable to true (via the http2.disable system property or the HTTP2_DISABLE environment variable).
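For readers who want to apply the workaround from code rather than from the shell, here is a minimal sketch. The names `http2.disable` and `HTTP2_DISABLE` come from the comment above; the class `Http2DisableSketch`, its helper methods, and the lookup order (system property first, then environment variable) are hypothetical and only illustrate the idea. This is not fabric8's implementation.

```java
import java.util.Locale;
import java.util.Map;

// Sketch only: illustrates how a property name such as "http2.disable" can be
// paired with an environment variable such as "HTTP2_DISABLE".
// The helper methods are hypothetical, not part of kubernetes-client.
final class Http2DisableSketch {

    static final String PROPERTY = "http2.disable";

    // Hypothetical mapping: upper-case the name and replace dots with underscores.
    static String toEnvName(String property) {
        return property.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    // Assumed lookup order: system property, then environment variable, then false.
    static boolean isHttp2Disabled(Map<String, String> env) {
        String value = System.getProperty(PROPERTY);
        if (value == null) {
            value = env.get(toEnvName(PROPERTY));
        }
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        // Set the property before any Kubernetes client is created.
        System.setProperty(PROPERTY, "true");
        System.out.println("HTTP/2 disabled: " + isHttp2Disabled(System.getenv()));
    }
}
```

In a real application only the `System.setProperty` call (or the equivalent `-Dhttp2.disable=true` JVM argument) should be needed; the rest is there to show how the two names relate.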
@manusa Thanks a lot for the investigation and the fix. Could we also backport this to the 4.9 branch? We are going to upgrade the fabric8 kubernetes-client to 4.9.x, as @rohanKanojia suggested.
Ah, we recently split the Kubernetes Model into smaller jars per apiVersion in #2137, but that introduced some regressions: #2195, #2201 and #2205.
If you're not affected by these bugs, you can upgrade. We have plans on fixing them and then cutting a stable release.
Have you tried the workaround (e.g. export HTTP2_DISABLE=true)?
There's no problem in backporting the fix, but we're currently directing all efforts to cutting a stable 4.10 release (so that will probably come before a fixed 4.9.x).
@manusa I have verified that it works after running export HTTP2_DISABLE=true. Great job.
I can confirm as well that the workaround seems to work for me both locally on Kube 1.18.2 and on OCP 4.2. Thanks for looking into this @manusa and finding the solution ... great help for us!
This was fixed in 3.12.11 and 4.6.0 a couple of weeks ago.
https://square.github.io/okhttp/changelog/
https://square.github.io/okhttp/changelog_3x/
Fix: Don’t crash on Java 8u252 which introduces an API previously found only on Java 9 and above. See Jetty’s overview of the API change and its consequences.
The other report in your project which is for Azul's Zulu VM - https://github.com/fabric8io/kubernetes-client/issues/2145
When we saw that in OkHttp there was also a fix required in Zulu
Our wrapper is fixed, we switched to a javaassist generated proxy, so we should be good to go there.
Looks great! JDK 9 platform activates and the fallback for UnsupportedOperationException for our incompatible SslSocket wrapper successfully falls back to HTTP/1.
https://github.com/square/okhttp/issues/5970#issuecomment-619087509
Thx for your feedback.
We are using the latest releases of OkHttp with your fixes :heart:.
The problem is that sun.security.ssl.SSLSocketImpl is never throwing the UnsupportedOperationException in SSLSocketImpl.java, so that is why HTTP/2 becomes available.
You can check the below screenshot from a debugging session in OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
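A quick way to see, without a debugger, whether a given JVM exposes the ALPN methods discussed here is a reflective lookup. The sketch below is an illustration only: the class `AlpnApiCheck` and its method names are hypothetical, and the assumption that OkHttp's Jdk9Platform detection relies on a similar lookup is mine, not something confirmed in this thread. The two JDK methods it probes, `SSLParameters.setApplicationProtocols` and `SSLSocket.getApplicationProtocol`, are the Java 9 ALPN API that 8u252 backported.

```java
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;

// Sketch only: reports whether the running JVM has the public ALPN methods.
final class AlpnApiCheck {

    // Returns true if the type declares or inherits a public method with this signature.
    static boolean hasPublicMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            type.getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // True when both halves of the ALPN API are visible on this JVM.
    static boolean alpnApiPresent() {
        return hasPublicMethod(SSLParameters.class, "setApplicationProtocols", String[].class)
                && hasPublicMethod(SSLSocket.class, "getApplicationProtocol");
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version")
                + " ALPN API present: " + alpnApiPresent());
    }
}
```

Running it on 8u242 and on 8u252 should show the difference between the two builds. Note that the presence of the methods says nothing about whether an HTTP/2 connection to a particular server will actually succeed.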

Anyway, we managed to fix the issue by forcing http/1.1 from our side when JDK 8 is detected (_in a rudimentary way_), but I'm not sure that other users of OkHttp will face a similar issue.
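As an illustration of what "forcing http/1.1 when JDK 8 is detected" could look like, here is a minimal, self-contained sketch. It is not the code from the actual fix: the class `ProtocolChoiceSketch`, its methods, and the use of `java.specification.version` for detection are assumptions made for this example, and protocols are represented as plain strings so that the sketch runs without OkHttp on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: chooses the protocols a client would offer, based on the JVM version.
final class ProtocolChoiceSketch {

    // Java 8 reports "1.8" as its specification version; Java 9+ report "9", "11", ...
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    // Offer only HTTP/1.1 on Java 8 or when HTTP/2 is explicitly disabled.
    static List<String> protocolsFor(String specVersion, boolean http2Disabled) {
        if (http2Disabled || isJava8(specVersion)) {
            return Collections.singletonList("http/1.1");
        }
        return Arrays.asList("h2", "http/1.1");
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println(spec + " -> " + protocolsFor(spec, false));
    }
}
```

With OkHttp itself, the resulting list would be expressed with `Protocol` constants and passed to `OkHttpClient.Builder.protocols(...)`, for example `Collections.singletonList(Protocol.HTTP_1_1)`.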
Is there a reason you need to avoid HTTP/2? We have one existing issue with a KeePass client where HTTP/2 is a problem because they want the error from a smart HTTP/2 server that sends the HTTP/2 GOAWAY before a request completes. But generally, I would have thought you would be happy that HTTP/2 is working out of the box in JDK 8.
Or to put it another way, why not look into why HTTP/2 is failing in JDK 8 252?
Is there a reason you need to avoid HTTP/2?...
Not at all.
Or to put it another way, why not look into why HTTP/2 is failing in JDK 8 252?
I completely agree on this.
I'm a little bit lost here, is HTTP/2 supposed to be fully supported in JDK8u252? (I assume it is given your previous comment(s))
Have you successfully made it work in any given implementation?
Is the problem related to our specific environment (k8s REST API, JDK, etc.)?
Yep - it should be working. This is with a test client (okurl).
10:35:17.443 OkHttp Platform: Jdk9Platform
10:35:17.444 Protocol: h2
10:35:17.446 JVM: 25.252-b09
$ ./okurl --debug https://www.twitter.com/robots.txt
10:35:16.857 url https://www.twitter.com/robots.txt
10:35:16.867 Request Request{method=GET, url=https://www.twitter.com/robots.txt, headers=[User-Agent:okurl/dev], tags={class com.baulsupp.okurl.credentials.Token=TokenSet(name=default)}}
10:35:16.947 Dns (www.twitter.com): www.twitter.com/104.244.42.1, www.twitter.com/104.244.42.193
10:35:17.152 Q10002 scheduled after 5 s : OkHttp www.twitter.com ping
10:35:17.158 >> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a
10:35:17.158 >> 0x00000000 6 SETTINGS
10:35:17.159 >> 0x00000000 4 WINDOW_UPDATE
10:35:17.159 Q10005 scheduled after 0 µs: OkHttp www.twitter.com
10:35:17.160 Q10005 starting : OkHttp www.twitter.com
10:35:17.160 Q10001 scheduled after 0 µs: OkHttp ConnectionPool
10:35:17.160 Q10001 starting : OkHttp ConnectionPool
10:35:17.161 Q10001 run again after 300 s : OkHttp ConnectionPool
10:35:17.161 Q10001 finished run in 763 µs: OkHttp ConnectionPool
10:35:17.161 << 0x00000000 6 SETTINGS
10:35:17.162 Q10002 scheduled after 0 µs: OkHttp www.twitter.com applyAndAckSettings
10:35:17.162 Q10002 starting : OkHttp www.twitter.com applyAndAckSettings
10:35:17.163 --> GET https://www.twitter.com/robots.txt h2
10:35:17.163 User-Agent: okurl/dev
10:35:17.163 Accept-Encoding: br,gzip
10:35:17.163 Host: www.twitter.com
10:35:17.163 Q10004 scheduled after 0 µs: OkHttp www.twitter.com onSettings
10:35:17.163 Connection: Keep-Alive
10:35:17.163 >> 0x00000000 0 SETTINGS ACK
10:35:17.163 --> END GET
10:35:17.163 Q10004 starting : OkHttp www.twitter.com onSettings
10:35:17.164 Q10002 finished run in 1 ms: OkHttp www.twitter.com applyAndAckSettings
10:35:17.164 Q10004 finished run in 369 µs: OkHttp www.twitter.com onSettings
10:35:17.227 << 0x00000000 0 SETTINGS ACK
10:35:17.297 Matching interceptor: null
10:35:17.302 >> 0x00000003 42 HEADERS END_STREAM|END_HEADERS
10:35:17.432 << 0x00000003 354 HEADERS END_HEADERS
10:35:17.434 << 0x00000003 52 DATA END_STREAM
10:35:17.438 <-- 200 https://www.twitter.com/robots.txt (274ms)
10:35:17.438 content-encoding: gzip
10:35:17.439 content-length: 52
10:35:17.439 content-type: text/plain;charset=utf-8
10:35:17.439 date: Fri, 15 May 2020 09:35:17 GMT
10:35:17.439 server: tsa_f
10:35:17.439 set-cookie: personalization_id="v1_dRKt013uHMsG0nP5LTvx3Q=="; Max-Age=63072000; Expires=Sun, 15 May 2022 09:35:17 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
10:35:17.439 set-cookie: guest_id=v1%3A158953531735934219; Max-Age=63072000; Expires=Sun, 15 May 2022 09:35:17 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
10:35:17.439 strict-transport-security: max-age=631138519
10:35:17.440 x-connection-hash: 00892da89b17cfdd912eca68ff4da741
10:35:17.440 x-response-time: 113
10:35:17.440 <-- END HTTP
10:35:17.443 OkHttp Platform: Jdk9Platform
10:35:17.444 Protocol: h2
10:35:17.444 TLS Version: TLS_1_2
10:35:17.444 Cipher: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
10:35:17.445 Peer Principal: CN=twitter.com, OU=lon3, O="Twitter, Inc.", L=San Francisco, ST=California, C=US
10:35:17.446 Local Principal: none
10:35:17.446 JVM: 25.252-b09
10:35:17.459 Q10001 scheduled after 0 µs: OkHttp ConnectionPool
User-agent: *
Disallow: /
10:35:17.459 Q10001 starting : OkHttp ConnectionPool
10:35:17.459 Q10001 run again after 300 s : OkHttp ConnectionPool
10:35:17.459 Q10001 finished run in 376 µs: OkHttp ConnectionPool
10:35:17.460 Q10001 canceled : OkHttp ConnectionPool
10:35:17.461 >> 0x00000000 8 GOAWAY
10:35:17.462 Q10002 canceled : OkHttp www.twitter.com ping
10:35:17.462 Q10005 finished run in 302 ms: OkHttp www.twitter.com
This is the error with KeePass: https://github.com/PhilippC/keepass2android/issues/747
It's with a Go-based server, and it's clearly an OkHttp bug, but without a simple fix. IIRC in that case we are causing the pending 401 to become a "stream was reset: NO_ERROR".
Maybe it's something similar. The fix there was for them to disable HTTP/2.
Thx for your input!
I'll try to find some time to see if at least we get the underlying cause for the failure in Kubernetes-Client.
So do we have a concrete plan for v4.9.2?
We're finishing with the fixes for bugs reported in 4.10.1, we'll probably release 4.10.2 in the next few days. This patched version should take care of any bugs and should be a direct replacement for 4.9.1.
If after 4.10.2 release there are still some issues/bug, or if you find some strong reason (besides our previous recommendation) to not upgrade version, we can then release a back-ported patched 4.9.2.
Is there a release planned with this https://github.com/fabric8io/kubernetes-client/pull/2227 fix? Upgrading to okhttp 3.12.11 didn't help me.
4.10.2
Actually I am not very sure whether 4.10.2 is stable enough for production.
@manusa @rohanKanojia I share the same concern as @zhengcanbin. Bumping the version in a downstream project is not very easy, so we want to know whether 4.9.x is more stable than the latest 4.10.x version.
I think other projects may have the same situation.
Downstream in Eclipse Che is interested in 4.9.x version.
Just started release process for v4.9.2
Patched release with cherry-picked backport done: https://github.com/fabric8io/kubernetes-client/releases/tag/v4.9.2
Maven Central: https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.9.2/
Thank you @manusa !