Okhttp: Connection Leaks happening during WebSocket Operations

Created on 9 Sep 2018  路  10Comments  路  Source: square/okhttp

Hi All,

Where exactly we are supposed to do response.close() in case of WebSocket operations? I don't see any in the example i.e WebSocketEcho provided in okhttp repo.

We're trying to get rid of these OkHttp connection leaks generated during WebSocket operations in fabric8io/kubernetes-client. For example, There is one websocket operation happening in PodOperationsImpl"

        try {
            URL url = new URL(URLUtils.join(getResourceUrl().toString(), sb.toString()));
            Request.Builder r = new Request.Builder().url(url).header("Sec-WebSocket-Protocol", "v4.channel.k8s.io").get();
            OkHttpClient clone = client.newBuilder().readTimeout(0, TimeUnit.MILLISECONDS).build();
            final ExecWebSocketListener execWebSocketListener = new ExecWebSocketListener(getConfig(), in, out, err, errChannel, inPipe, outPipe, errPipe, errChannelPipe, execListener);
            clone.newWebSocket(r.build(), execWebSocketListener);
            execWebSocketListener.waitUntilReady();
            return execWebSocketListener;
        } catch (Throwable t) {
            throw KubernetesClientException.launderThrowable(forOperationType("exec"), t);
        }

Which gets handled on ExecWebSocketListener , We're closing socket in close(..) callback here:

    private void closeWebSocketOnce(int code, String reason) {
      if (closed.get()) {
        return;
      }

      try {
        WebSocket ws = webSocketRef.get();
        if (ws != null) {
          ws.close(code, reason);
        }
      } catch (Throwable t) {
        LOGGER.debug("Error closing WebSocket.", t);
      }
    }

But somehow we're still getting connection leaks, In logs I can see that stacktrace points to the WebSocket opening operation in PodOperationsImpl:


Sep 09, 2018 12:08:43 PM okhttp3.internal.platform.Platform log
WARNING: A connection to http://localhost:55231/ was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
    at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:144)
    at okhttp3.RealCall.captureCallStackTrace(RealCall.java:89)
    at okhttp3.RealCall.enqueue(RealCall.java:98)
    at okhttp3.internal.ws.RealWebSocket.connect(RealWebSocket.java:183)
    at okhttp3.OkHttpClient.newWebSocket(OkHttpClient.java:436)
    at io.fabric8.kubernetes.client.dsl.internal.PodOperationsImpl.exec(PodOperationsImpl.java:267)
    at io.fabric8.kubernetes.client.dsl.internal.PodOperationsImpl.exec(PodOperationsImpl.java:61)
    at io.fabric8.kubernetes.client.mock.ExecTest.doExec(ExecTest.java:99)
    at io.fabric8.kubernetes.client.mock.ExecTest.doIteration(ExecTest.java:81)
    at io.fabric8.kubernetes.client.mock.ExecTest.testConnectionLeaks(ExecTest.java:73)
    at io.fabric8.kubernetes.client.mock.ExecTest.testMockServer(ExecTest.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)

There is a PR already in progress which tries to close response body and reduce leaks(not working though): https://github.com/fabric8io/kubernetes-client/pull/1198 . Could anyone please check what we're doing wrong here? Any kind of help would be highly appreciated :-).

  • Steps to reproduce:
  1. Clone kubernetes-client repository:
git clone -b reduceOkHttpLeaks https://github.com/rohanKanojia/kubernetes-client.git
  1. Navigate to kubernetes-tests directory and run:
 mvn -Dtest="io.fabric8.kubernetes.client.mock.ExecTest#testMockServer" test
  1. Wait ~10 minutes. For me after 5th iteration(after 5 minutes) logs appears.
  2. OkHttp Version: 3.9.1
bug

Most helpful comment

Sounds like a bug we can fix in OkHttp!

All 10 comments

The scenario:

The unit test in question deliberately tries to set up a websocket to a pod that doesn't exist:

https://github.com/fabric8io/kubernetes-client/blob/f7843cff15b239e4466167f63c599fc9ca4e4146/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/PodOperationsImpl.java#L267

The connection will obviously return a 404, which will result in a WebSocketListener being notified of failure:

https://github.com/fabric8io/kubernetes-client/blob/f7843cff15b239e4466167f63c599fc9ca4e4146/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/ExecWebSocketListener.java#L211-L237

(In the code as it exists the Response supplied to onFailure is never closed; there's a pull request to deal with that, but it's also an open question as to whether the Response should be closed here or if OkHttp will take care of it somewhere else.)

By this point, the ConnectInterceptor has allocated a new connection:

https://github.com/square/okhttp/blob/0b2486f7f32fbe21c8630f330d240546a15beb66/okhttp/src/main/java/okhttp3/internal/connection/ConnectInterceptor.java#L42

However, this allocation is never released. Specifically, even when all response bodies are closed, those operations can't ever "get at" the stream allocation to deallocate it. In other words, ResponseBody#close() ultimately invokes this:

https://github.com/square/okhttp/blob/0b2486f7f32fbe21c8630f330d240546a15beb66/okhttp/src/main/java/okhttp3/internal/http1/Http1Codec.java#L421-L430

鈥ut nothing actually deallocates the stream housed by the Http1Codec object referenced by the FixedLengthSource object. As a result, stream allocation references with null referents eventually get noticed by the cleanup loop:

https://github.com/square/okhttp/blob/5b37cda9e00626f43acf354df145fd452c3031f1/okhttp/src/main/java/okhttp3/ConnectionPool.java#L266-L271

@swankjesse @JakeWharton @jwendell : Please share your thougthts on this :-)

Sounds like a bug we can fix in OkHttp!

Thanks @swankjesse; do bear in mind I'm not deeply familiar with the OkHttp code so my analysis may be lacking.

So I understand correctly, you're saying it's the HttpCodec1 not properly releasing its streamAllocation once the interceptor chain is complete?

I guess so; my comment above is the best I could do! Thanks for looking into this.

No problem. It's a blocking issue for us, so I need to resolve. Getting familiar with the okhttp code base.

I may have found a fix, just running some tests before I submit for review.

Started a pull request for this issue -- hoping to spark a discussion about implementation of this fix.

Thanks a lot @swankjesse , any possible ETA when 3.12 is going to be released

I don鈥檛 provide time estimates for open source things. You should build your own snapshot if you need something urgently.

https://github.com/square/okhttp/milestone/28

Was this page helpful?
0 / 5 - 0 ratings