The following happens with version 4.0.5 using KubernetesClient.
OkHttpClient reports the following connection leak on log level FINE.
WARNING: A connection to https://[...] was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:144)
at okhttp3.RealCall.captureCallStackTrace(RealCall.java:89)
at okhttp3.RealCall.execute(RealCall.java:73)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:794)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:210)
at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.edit(HasMetadataOperation.java:68)
at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl.edit(DeploymentOperationsImpl.java:78)
at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl.edit(DeploymentOperationsImpl.java:44)
at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl$DeploymentReaper.reap(DeploymentOperationsImpl.java:163)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:614)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:63)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:655)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:643)
at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:63)
at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:32)
at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:158)
at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:57)
Looking at OperationSupport.java:379, the problem seems to be that we directly call client.newCall(request).execute() instead of wrapping it in a try-with-resources as the Javadocs of Call.execute() suggest:
<p>To avoid leaking resources callers should close the {@link Response} which in turn will close the underlying {@link ResponseBody}.
<pre>
// ensure the response (and underlying response body) is closed
try (Response response = client.newCall(request).execute()) {
...
}
</pre>
The following code results in the above warning for me:
@Test
public void reproduceLeakWarning() {
Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
KubernetesClient client = new DefaultKubernetesClient();
Deployment deployment = new DeploymentBuilder()
.withMetadata(
new ObjectMetaBuilder()
.withName("greeting")
.build()
)
.withSpec(
new DeploymentSpecBuilder()
.withReplicas(1)
.withSelector(
new LabelSelectorBuilder()
.withMatchLabels(
ImmutableMap.of("app", "greeting")
)
.build()
)
.withTemplate(
new PodTemplateSpecBuilder()
.withMetadata(new ObjectMetaBuilder()
.withLabels(ImmutableMap.of("app", "greeting"))
.build())
.withSpec(new PodSpecBuilder()
.withContainers(
new ContainerBuilder()
.withName("greeting")
.withImage("arungupta/greeting")
.withImagePullPolicy("IfNotPresent")
.withPorts(
new ContainerPortBuilder()
.withContainerPort(8080)
.withName("http")
.build(),
new ContainerPortBuilder()
.withContainerPort(5005)
.withName("debug")
.build()
)
.build()
)
.build())
.build()
)
.build()
)
.build();
client.resource(deployment).delete();
client.resource(deployment).delete();
client.resource(deployment).get();
client.resource(deployment).get();
}
I think that this ticket is unrelated to #1013 which reports a similar warning when using watches.
@uce : Hi, Thanks for reporting this. On which platform you're reproducing it. I was trying on Openshift but could not reproduce. As far as I know try with resources for ResponseBody works too. Does it work for you when you wrap response into try with resources. Notice that response is consumed in unmarshal method, so maybe it's getting consumed on a different thread?
@rohanKanojia I've updated the description to include the version information. I'm using 4.0.5 and KubernetesClient with Kubernetes v1.9.9.
I think that the problem in https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/base/OperationSupport.java#L379 is that this throws before we get into the try-catch block. That's why it may never get to unmarshal or the try-with-resources for the body.
Unfortunately, I haven't checked out the client code and can't modify the original source to move the try-catch.
@uce : Are you facing this problem in 4.0.5 only? Could you please try with previous versions i.e 4.0.3/4.0.0 ? Actually I sent a fix for leak in a PR few weeks ago : https://github.com/fabric8io/kubernetes-client/pull/1204/files#diff-bd0b3215ca04a57dd5925ca30802efd4R391 (that was supposed to fix leaks, and it was working also at that time).
This warning relies on the garbage collector to find instances that went out of scope before they were closed, and this process is neither immediate nor particularly reliable.
I think that the problem in https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/base/OperationSupport.java#L379 is that this throws before we get into the try-catch block. That's why it may never get to
unmarshalor the try-with-resources for the body.Unfortunately, I haven't checked out the client code and can't modify the original source to move the try-catch.
Please see https://github.com/square/okhttp/issues/4258 and https://github.com/fabric8io/kubernetes-client/pull/1198 (which is unmerged?).
@ljnelson : Actually it's a different leak, see handleResponse function on OperationSupport
Yes, but note that if line 379 fails, the ResponseBody will not have been opened so there is nothing to close (or leak). So I believe that the message the OP is reporting is still related to https://github.com/square/okhttp/issues/4258.
Hm, I see. I think we should sync with okhttp team again to get this prioritized somehow. Or maybe fix it in okhttp itself.
Agreed; as noted in the OkHttp issue, I'm no expert on their codebase, so my analysis should be taken with a grain of salt.
It looks like this was fixed in their repo (https://github.com/square/okhttp/pull/4374) but I'm still seeing issues with Kubernetes 1.7 and 1.8 clients when I ran using it. Kubernetes 1.9+ seem to have no issues however. Seeing the same log statements as @uce :
Nov 14, 2018 12:33:33 PM okhttp3.internal.platform.Platform log
WARNING: A connection to https://db-aks-187-0eb3ad36.hcp.eastus.azmk8s.io/ was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:148)
at okhttp3.RealCall.captureCallStackTrace(RealCall.java:115)
at okhttp3.RealCall.execute(RealCall.java:87)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:328)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:598)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:63)
at com.embotics.autoscaler.Fabric8ioTester.main(Fabric8ioTester.java:52)
Here is the code I was running:
public static void main(String[] args) {
Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
String contents = StringUtils.EMPTY;
try {
contents = FileUtils.readFileToString(new File("C:\\path\\config"), Charset.defaultCharset());
} catch (IOException e) {
//Should never get here
}
Config config = Config.fromKubeconfig(contents);
final KubernetesClient kubeClient = new DefaultKubernetesClient(config);
int count = 0;
while (count < 1000) {
try {
kubeClient.apps().deployments().list();
Thread.sleep(500);
} catch (Exception e) {
//Do nothing
}
count++;
}
kubeClient.close();
}
Yeah, the issue is fixed but We still have to upgrade to okhttp version using that release. And that hasn't been released yet. I still need to verify this using SNAPSHOT though..
Sorry, I probably wasn't clear - I tested their fix and still see the issue. If you could verify using Snapshot that would be great; like I said, seems fine for Kubernetes 1.9+ clusters, but 1.7 and 1.8 complain.
oh, you mean you tested with snapshot but still see this :-o . I'll find time to verify
After testing I get the impression that this is a race condition in okhttp. When we make multiple calls to the client in succession, the cleanup seems to be mid-stream or something causing those exceptions to appear.
When I tested locally, creating a new client every time I was making a call versus using the same client resulted in no warnings in the logs about leaks. When I made multiple calls at once, like the following code, I saw far less warnings:
while (count < 10) {
try (KubernetesClient kubeClient = new DefaultKubernetesClient(config)) {
kubeClient.apps().daemonSets().list().getItems();
kubeClient.apps().deployments().list().getItems();
kubeClient.pods().list().getItems();
kubeClient.nodes().list().getItems();
} catch (Exception e) {
e.printStackTrace();
}
count++;
}
@uce is making 4 calls in rapid succession - I'm willing to bet if he did a sleep between each call he would not see this as often:
client.resource(deployment).delete();
client.resource(deployment).delete();
client.resource(deployment).get();
client.resource(deployment).get();
The issue with creating a new client each time is obvious; seeing about a 100-200ms performance hit for every call.
This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!
Most helpful comment
After testing I get the impression that this is a race condition in okhttp. When we make multiple calls to the client in succession, the cleanup seems to be mid-stream or something causing those exceptions to appear.
When I tested locally, creating a new client every time I was making a call versus using the same client resulted in no warnings in the logs about leaks. When I made multiple calls at once, like the following code, I saw far less warnings:
while (count < 10) { try (KubernetesClient kubeClient = new DefaultKubernetesClient(config)) { kubeClient.apps().daemonSets().list().getItems(); kubeClient.apps().deployments().list().getItems(); kubeClient.pods().list().getItems(); kubeClient.nodes().list().getItems(); } catch (Exception e) { e.printStackTrace(); } count++; }@uce is making 4 calls in rapid succession - I'm willing to bet if he did a sleep between each call he would not see this as often:
client.resource(deployment).delete(); client.resource(deployment).delete(); client.resource(deployment).get(); client.resource(deployment).get();The issue with creating a new client each time is obvious; seeing about a 100-200ms performance hit for every call.