Describe the bug
I'm not sure if this is a bug of the quarkus extension or the client itself.
When my application starts, it registers a watcher for Kubernetes Jobs like this
kubernetesClient.batch().jobs().withLabel("jobType", "myJobs").watch(new Watcher<>() {
...
});
All runs fine (I receive job updates as expected) on Azure k8s service for a while. But then suddenly, the log is spammed with exceptions like this
2020-03-09 04:08:41,910 WARN [io.fab.kub.cli.dsl.int.WatchConnectionManager] (OkHttp https://172.16.100.1/...) Exec Failure: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
at io.fabric8.kubernetes.client.utils.Serialization.unmarshal(Serialization.java:243)
at io.fabric8.kubernetes.client.utils.Serialization.unmarshal(Serialization.java:162)
at io.fabric8.kubernetes.client.utils.Serialization.unmarshal(Serialization.java:147)
at io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager.readWatchEvent(WatchHTTPManager.java:290)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:229)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:834)
at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:497)
at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:193)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: No resource type found for:v1#Status
at [Source: (BufferedInputStream); line: 1, column: 176] (through reference chain: io.fabric8.kubernetes.api.model.WatchEvent["object"])
at io.fabric8.kubernetes.internal.KubernetesDeserializer.fromObjectNode(KubernetesDeserializer.java:107)
at io.fabric8.kubernetes.internal.KubernetesDeserializer.deserialize(KubernetesDeserializer.java:83)
at io.fabric8.kubernetes.internal.KubernetesDeserializer.deserialize(KubernetesDeserializer.java:41)
at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4202)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3250)
at io.fabric8.kubernetes.client.utils.Serialization.unmarshal(Serialization.java:241)
... 16 more
This message is repeated continuously and my service no longer seems to receive job updates. Unfortunately, the exception does not surface to my code and only seems to happen after a random time on Azure. I was not able to reproduce it locally with Minikube. Also, the log is spammed so quickly that I wasn't able to see where the exceptions begin. I only ever see the exception above repeated indefinitely.
I am running a native build of my application, so I thought it might be caused by this. However, since the error takes so long to appear, I still have to test whether it goes away with a JVM build.
Expected behavior
Should not get such an exception, or it should be possible to react to it.
Actual behavior
The job watch seems to stop working. Furthermore, I have no chance to notice the error in my application and react (e.g. restart).
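As a possible workaround for noticing a silently dead watch, the application could track when the last watch event arrived and treat a long silence as a signal to re-register the watch or restart. The following is only a minimal sketch under that assumption: `WatchLiveness`, `recordEvent` and `isStale` are hypothetical names and not part of the Kubernetes client, and the time source is passed in as a `Supplier<Instant>` so the logic can be exercised without waiting.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Kubernetes client API.
// It only remembers when the last watch event was seen.
final class WatchLiveness {
    private final Supplier<Instant> now;
    private final Duration maxSilence;
    private final AtomicReference<Instant> lastEvent;

    WatchLiveness(Supplier<Instant> now, Duration maxSilence) {
        this.now = now;
        this.maxSilence = maxSilence;
        // Treat construction time as the first "event" so a new watch is not stale.
        this.lastEvent = new AtomicReference<>(now.get());
    }

    // Intended to be called from the watcher's event callback.
    void recordEvent() {
        lastEvent.set(now.get());
    }

    // True if more than maxSilence has passed since the last recorded event.
    boolean isStale() {
        Duration silence = Duration.between(lastEvent.get(), now.get());
        return silence.compareTo(maxSilence) > 0;
    }
}
```

A scheduled task could poll `isStale()` and trigger a re-watch or a failing health check. Since jobs may legitimately produce no events for a while, the threshold would have to be chosen generously, so this is a heuristic rather than a real fix.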
To Reproduce
Cannot reproduce it reliably. For us it happens with a natively built application watching jobs as described above and running on Azure k8s service.
Environment (please complete the following information):
I found this issue https://github.com/fabric8io/kubernetes-client/issues/1429 with a similar error message (it regards `v1#List` instead of `v1#Status`). Could it be that the native build does not include some required resource files?
@gastaldi: You fixed fabric8io/kubernetes-client#1429. Do you have an idea what could be wrong here?
@andreas-eberle it's hard to say without a reproducer project, but your theory of missing resource files during the native build makes sense. Try in JVM mode and see if the problem remains.
OK, thanks. Since the problem only occurs after a random time of sometimes days, it is hard to debug. But I will come back when I have some updates.
I seriously doubt the problem is in Quarkus, but like @gastaldi said, a reproducer would be great :)
cc @iocanel
Asking a stupid question: do we renew the connections from time to time?
We do not do anything explicitly, it's all up to the Kubernetes Client (which is why I am pinging the expert @iocanel :wink:)
Yeah, asking that because, from our experience with Azure Pipelines, I wouldn't say Azure is well known for its network stability. So it might be worth checking we can recover from a network issue.
I'm having the same problem with an application built as a native image: Quarkus version 1.6.0.Final, GraalVM version 20.1.0.
I tried the JVM version of the deployment and found that it no longer reported this exception.
@iocanel any idea?
It seems that the Kubernetes deserializer can't find a resource with apiVersion `v1` and kind `Status`.
Need to check if such a resource exists and which module provides it.
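To make the failure mode concrete, here is a small illustrative sketch of a type lookup keyed by `apiVersion#kind`, the format that appears in the error message (`v1#Status`). It is an assumption about the general shape of such a lookup, not the client's actual code: `KindRegistry` and its methods are hypothetical, and `String.class` below merely stands in for a real model class.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: mimics the "apiVersion#kind" key seen in the error message.
// This is not the Kubernetes client's implementation.
final class KindRegistry {
    private final Map<String, Class<?>> types = new HashMap<>();

    // Builds the lookup key, e.g. "v1#Status".
    static String key(String apiVersion, String kind) {
        return apiVersion + "#" + kind;
    }

    void register(String apiVersion, String kind, Class<?> type) {
        types.put(key(apiVersion, kind), type);
    }

    // Fails the same way the log does when a kind was never registered.
    Class<?> resolve(String apiVersion, String kind) {
        Class<?> type = types.get(key(apiVersion, kind));
        if (type == null) {
            throw new IllegalArgumentException(
                "No resource type found for:" + key(apiVersion, kind));
        }
        return type;
    }
}
```

If the real lookup works along these lines, a `Status` object arriving on the watch stream would fail to deserialize whenever its model class is not registered or not reachable in the native image, which would fit the observation that the JVM build does not show the exception.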