Kubernetes-client: OkHttpClient reports connection leak in OperationSupport.handleResponse

Created on 11 Oct 2018 · 15 Comments · Source: fabric8io/kubernetes-client

The following happens with version 4.0.5 using KubernetesClient.

OkHttpClient reports the following connection leak on log level FINE.

WARNING: A connection to https://[...] was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
  at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:144)
  at okhttp3.RealCall.captureCallStackTrace(RealCall.java:89)
  at okhttp3.RealCall.execute(RealCall.java:73)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:794)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:210)
  at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.edit(HasMetadataOperation.java:68)
  at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl.edit(DeploymentOperationsImpl.java:78)
  at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl.edit(DeploymentOperationsImpl.java:44)
  at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl$DeploymentReaper.reap(DeploymentOperationsImpl.java:163)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:614)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:63)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:655)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:643)
  at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:63)
  at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:32)
  at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:158)
  at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:57)

Looking at OperationSupport.java:379, the problem seems to be that we directly call client.newCall(request).execute() instead of wrapping it in a try-with-resources as the Javadocs of Call.execute() suggest:

<p>To avoid leaking resources callers should close the {@link Response} which in turn will close the underlying {@link ResponseBody}.

<pre>
   // ensure the response (and underlying response body) is closed
   try (Response response = client.newCall(request).execute()) {
     ...
   }
</pre>
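To make the effect of that pattern concrete, here is a minimal sketch that runs without OkHttp. `StubResponse`, `execute` and `unmarshal` are hypothetical stand-ins for `okhttp3.Response`, `client.newCall(request).execute()` and the client's unmarshalling step; this is not the actual code in OperationSupport. It only shows that try-with-resources closes the response on both the normal and the exceptional path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for okhttp3.Response so the sketch runs without OkHttp.
class StubResponse implements AutoCloseable {
    final String body;
    boolean closed = false;

    StubResponse(String body) {
        this.body = body;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class TryWithResourcesSketch {
    // Records every response handed out so callers can check that it was closed.
    static final List<StubResponse> issued = new ArrayList<>();

    // Stand-in for client.newCall(request).execute().
    static StubResponse execute(String body) {
        StubResponse response = new StubResponse(body);
        issued.add(response);
        return response;
    }

    // Stand-in for the unmarshalling step; fails on an empty body.
    static int unmarshal(StubResponse response) {
        if (response.body.isEmpty()) {
            throw new IllegalStateException("empty body");
        }
        return response.body.length();
    }

    // The response is closed whether unmarshal returns or throws.
    static int handleResponse(String body) {
        try (StubResponse response = execute(body)) {
            return unmarshal(response);
        }
    }

    public static void main(String[] args) {
        System.out.println(handleResponse("{}"));
        System.out.println(issued.get(0).closed);
    }
}
```

Without the try-with-resources block, the exceptional path would leave the response open, which is the situation the OkHttp warning describes.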

The following code results in the above warning for me:

  @Test
  public void reproduceLeakWarning() {
    Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);

    KubernetesClient client = new DefaultKubernetesClient();

    Deployment deployment = new DeploymentBuilder()
        .withMetadata(
            new ObjectMetaBuilder()
                .withName("greeting")
                .build()
        )
        .withSpec(
            new DeploymentSpecBuilder()
                .withReplicas(1)
                .withSelector(
                    new LabelSelectorBuilder()
                        .withMatchLabels(
                            ImmutableMap.of("app", "greeting")
                        )
                        .build()
                )
                .withTemplate(
                    new PodTemplateSpecBuilder()
                        .withMetadata(new ObjectMetaBuilder()
                            .withLabels(ImmutableMap.of("app", "greeting"))
                            .build())
                        .withSpec(new PodSpecBuilder()
                            .withContainers(
                                new ContainerBuilder()
                                    .withName("greeting")
                                    .withImage("arungupta/greeting")
                                    .withImagePullPolicy("IfNotPresent")
                                    .withPorts(
                                        new ContainerPortBuilder()
                                            .withContainerPort(8080)
                                            .withName("http")
                                            .build(),
                                        new ContainerPortBuilder()
                                            .withContainerPort(5005)
                                            .withName("debug")
                                            .build()
                                    )
                                    .build()
                            )
                            .build())
                        .build()
                )
                .build()
        )
        .build();

    client.resource(deployment).delete();
    client.resource(deployment).delete();
    client.resource(deployment).get();
    client.resource(deployment).get();
  }
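To turn the warning into something a test can check rather than something read off the console, one option is a `java.util.logging` handler that records it. The sketch below is an assumption about how one might do that, not part of the client: `LeakWarningCollector` and `attachTo` are hypothetical names, and the match on "was leaked" relies on the message text shown in the log above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Collects log records whose message mentions a leaked connection.
class LeakWarningCollector extends Handler {
    // Thread-safe list, since the warning may be logged from a background thread.
    final List<String> leaks = new CopyOnWriteArrayList<>();

    // Strong reference so the configured logger is not garbage collected.
    private Logger logger;

    @Override
    public void publish(LogRecord record) {
        String message = record.getMessage();
        if (message != null && message.contains("was leaked")) {
            leaks.add(message);
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    // Attaches a collector to the named logger and lowers its level to FINE.
    static LeakWarningCollector attachTo(String loggerName) {
        LeakWarningCollector collector = new LeakWarningCollector();
        collector.logger = Logger.getLogger(loggerName);
        collector.logger.setLevel(Level.FINE);
        collector.logger.addHandler(collector);
        return collector;
    }
}
```

In the test above, `LeakWarningCollector.attachTo(OkHttpClient.class.getName())` could replace the `setLevel` call, and the test could then inspect `leaks` after the four calls. Because the warning is logged asynchronously, an empty list right after the calls would not prove that nothing leaked.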

I think that this ticket is unrelated to #1013 which reports a similar warning when using watches.

uce

Most helpful comment

After testing I get the impression that this is a race condition in okhttp. When we make multiple calls to the client in succession, the cleanup seems to be mid-stream or something causing those exceptions to appear.

When I tested locally, creating a new client for every call instead of reusing the same client resulted in no leak warnings in the logs. When I made multiple calls at once, as in the following code, I saw far fewer warnings:

  while (count < 10) {
    try (KubernetesClient kubeClient = new DefaultKubernetesClient(config)) {
      kubeClient.apps().daemonSets().list().getItems();
      kubeClient.apps().deployments().list().getItems();
      kubeClient.pods().list().getItems();
      kubeClient.nodes().list().getItems();
    } catch (Exception e) {
      e.printStackTrace();
    }
    count++;
  }

@uce is making 4 calls in rapid succession - I'm willing to bet if he did a sleep between each call he would not see this as often:

  client.resource(deployment).delete();
  client.resource(deployment).delete();
  client.resource(deployment).get();
  client.resource(deployment).get();
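The sleep-between-calls experiment suggested here could be sketched as follows. `SpacedCalls` and `runSpaced` are hypothetical helper names, and the pause length is an arbitrary choice; this is only a way to try the hypothesis, not a fix.

```java
class SpacedCalls {
    // Runs each call in order, sleeping pauseMillis between consecutive calls.
    // Stops early and restores the interrupt flag if the thread is interrupted.
    static void runSpaced(long pauseMillis, Runnable... calls) {
        for (int i = 0; i < calls.length; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            calls[i].run();
        }
    }
}
```

With the reproduction above, the four calls could be passed as lambdas, for example `SpacedCalls.runSpaced(500, () -> client.resource(deployment).delete(), () -> client.resource(deployment).get())`, to see whether the warning becomes less frequent.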

The issue with creating a new client each time is obvious: I am seeing a performance hit of about 100-200 ms for every call.

traviswinter on 15 Nov 2018

All 15 comments

@uce : Hi, thanks for reporting this. On which platform are you reproducing it? I tried on OpenShift but could not reproduce it. As far as I know, try-with-resources on the ResponseBody works too. Does it work for you when you wrap the response in a try-with-resources? Note that the response is consumed in the unmarshal method, so maybe it is being consumed on a different thread?