Kubernetes-client: watch should handle etcd old version exception

Created on 14 May 2018 · 28 comments · Source: fabric8io/kubernetes-client

I am running Spark on Kubernetes. The full issue description is here: https://issues.apache.org/jira/browse/SPARK-24266

I think the "too old resource version: 21648111 (21653211)" exception should be handled inside kubernetes-client instead of simply being thrown to the caller, because the resource version is cached by kubernetes-client, not by the caller: https://github.com/fabric8io/kubernetes-client/blob/5b1a57b64c7dcc7ebeba3a7024e8615c91afaedb/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L259-L266
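To make the error concrete: the message carries two resource versions, the one the watch tried to resume from and the one the API server reports alongside it. A minimal sketch, assuming you only have the exception message as a string, of pulling both out for logging. The class name TooOldResourceVersion is hypothetical, and because Kubernetes treats resource versions as opaque strings, the sketch keeps them as strings and never compares them numerically.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the versions from a "too old resource version" message. */
final class TooOldResourceVersion {
    // Matches e.g. "too old resource version: 21648111 (21653211)" anywhere in the message.
    private static final Pattern PATTERN =
            Pattern.compile("too old resource version: (\\S+) \\(([^)]+)\\)");

    final String requested; // version the watch tried to resume from
    final String reported;  // second version reported by the API server

    private TooOldResourceVersion(String requested, String reported) {
        this.requested = requested;
        this.reported = reported;
    }

    /** Returns null when the message is not a "too old resource version" error. */
    static TooOldResourceVersion parse(String message) {
        if (message == null) {
            return null;
        }
        Matcher m = PATTERN.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new TooOldResourceVersion(m.group(1), m.group(2));
    }
}
```

Using find() rather than matches() lets the helper work on a longer exception message that merely contains this text.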

component/kubernetes-client enhancement status/stale

All 28 comments

cc @foxish

We run into the same issue, any progress on this?

@yujiantao For a simple fix, you can try commenting out these lines: https://github.com/fabric8io/kubernetes-client/blob/v4.0.5/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L141-L143
We've been using it for a long time, and everything is fine.

We are also running into the same issue with Spark; it would be great to see a fix eventually.

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

It is still relevant for people using k8s-as-a-service on Azure. We applied the workaround mentioned by @chenchun and it works fine so far...

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

Quoting @chenchun: "@yujiantao For a simple fix, you can try commenting out these lines: https://github.com/fabric8io/kubernetes-client/blob/v4.0.5/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L141-L143
We've been using it for a long time, and everything is fine."

@chenchun Is this something we could put into the client? For some watches you don't care about version problems.

It is still relevant, and it requires rebuilding Spark from source just to use a hacked kubernetes-client :disappointed:

@SergSlipushenko: Actually, we added it in the past but had to remove it; see #1800.

@stijndehaes I took a look at #1800. Would it be better to add a boolean flag that controls whether the watch automatically re-watches when it receives a too-old-resource-version error? That way we won't break the contract of sending HTTP_GONE when the resource version is too old, and it also makes things easier for people who don't care about the problem.

@chenchun: Adding a boolean flag to reconnect sounds nice to me :+1:. This gets requested quite often. I think we need to tweak WatchConnectionManager.
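A minimal sketch of the opt-in flag being discussed, showing only the decision taken when a watch closes. CloseAction, WatchClosePolicy and the reconnectOnGone flag are hypothetical names, not part of kubernetes-client; treating non-410 closes as "resume from the last version" is also an assumption made for illustration.

```java
/** What a closed watch should do next (hypothetical). */
enum CloseAction {
    RECONNECT_FROM_LATEST,       // re-watch without the stale resourceVersion
    RECONNECT_FROM_LAST_VERSION, // resume from the last seen resourceVersion
    PROPAGATE_TO_CALLER          // hand the error to the caller's Watcher.onClose
}

final class WatchClosePolicy {
    static final int HTTP_GONE = 410;

    private final boolean reconnectOnGone; // hypothetical opt-in flag, off by default

    WatchClosePolicy(boolean reconnectOnGone) {
        this.reconnectOnGone = reconnectOnGone;
    }

    CloseAction onClose(int statusCode) {
        if (statusCode == HTTP_GONE) {
            // Resuming from the cached resourceVersion can no longer succeed,
            // so either start over from the latest state or report the error.
            return reconnectOnGone ? CloseAction.RECONNECT_FROM_LATEST : CloseAction.PROPAGATE_TO_CALLER;
        }
        // Assumption: other closes (e.g. a dropped connection) keep resuming
        // from the last seen version.
        return CloseAction.RECONNECT_FROM_LAST_VERSION;
    }
}
```

Keeping the flag off by default means callers that rely on receiving HTTP_GONE, the contract #1800 restored, see no change.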

I also noticed there is a deprecated watch method that allows you to set a resource version. The git history doesn't tell me much. But that method lets you pass a null resource version, and I believe that way you don't get the HTTP_GONE message?
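For reference, this is roughly what that difference looks like at the REST level, as a sketch that builds the watch URL directly rather than going through the deprecated client method (whose exact signature is not reproduced here). The helper podWatchUrl is hypothetical. Omitting resourceVersion asks the API server to start from its current state, typically delivering existing objects as synthetic ADDED events, so there is no old version to reject; the trade-off is that changes made between your last seen version and the new watch are not replayed one by one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class WatchUrls {
    /**
     * Hypothetical helper: builds a pod watch URL for one namespace.
     * A null resourceVersion omits the parameter, so the server starts from its current state.
     */
    static String podWatchUrl(String apiServer, String namespace, String resourceVersion) {
        StringBuilder url = new StringBuilder(apiServer)
                .append("/api/v1/namespaces/").append(namespace)
                .append("/pods?watch=true");
        if (resourceVersion != null) {
            // Resuming from an explicit version is what can fail with 410 Gone later on.
            url.append("&resourceVersion=").append(encode(resourceVersion));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```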

We implemented SharedInformers (#1384) a while back to mimic client-go's behavior and provide an extra level of abstraction for Watch operations (Kubernetes client-go: watch.Interface vs. cache.NewInformer vs. cache.NewSharedIndexInformer? and Writing Controllers/SharedInformers)

Our implementation of SharedInformers already takes care of HTTP_GONE scenario.

If you are looking for this reconnect behavior, I would encourage using SharedInformers instead of Watch, or else use watch with your own reconnect implementation. I think providing this behavior for watch too would be duplicating a feature that's already available in Informers.
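Roughly why informers survive HTTP_GONE: they run a list-then-watch loop, and a LIST always returns a fresh, valid resource version to watch from. Below is a heavily simplified simulation of that loop, assuming a hypothetical FakeApi interface in place of a real API server; it is not the fabric8 Reflector code, only an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the API server, just enough to drive the loop. */
interface FakeApi {
    /** Adds the current objects to sink and returns the list's resourceVersion. */
    String list(List<String> sink);

    /** Watches from resourceVersion until the watch closes; returns the close status code. */
    int watchFrom(String resourceVersion);
}

/** Simplified list-then-watch loop in the spirit of an informer's reflector. */
final class ReflectorSketch {
    static final int HTTP_GONE = 410;

    /** Runs maxRounds watch cycles and returns how many LIST requests were needed. */
    static int run(FakeApi api, int maxRounds) {
        int listCalls = 0;
        String resourceVersion = null; // null means "no usable version, must re-list"
        List<String> cache = new ArrayList<>();
        for (int round = 0; round < maxRounds; round++) {
            if (resourceVersion == null) {
                cache.clear();
                // A LIST returns a consistent snapshot plus a version that is valid right now.
                resourceVersion = api.list(cache);
                listCalls++;
            }
            int status = api.watchFrom(resourceVersion);
            if (status == HTTP_GONE) {
                // The version was compacted away: drop it and re-list instead of failing.
                resourceVersion = null;
            }
            // Any other close: resume from the same version on the next round.
            // (A real reflector also advances resourceVersion with every event it receives.)
        }
        return listCalls;
    }
}
```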

@rohanKanojia maybe we can use this issue to provide some additional examples and documentation on different use-cases for SharedInformers. I think it's unclear that they should be the default approach to watch resources.

@manusa one big difference is that with a watcher we can watch one single pod. This is what spark-submit does when watching the driver; with a SharedInformer I am watching all the pods. Unless there is a way to watch a single pod?
Anyway, I guess this will use more resources than needed, unless I am mistaken and this is negligible?

I'm really unsure how we implemented the SharedInformer and what its current features are, but there should be an option to filter by labels (or even fields, i.e. metadata.name).
Any filtering option applicable to watch should be available to SharedInformers (since the latter is a superset of the former).
If this option isn't there, we should work on providing it instead.
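As an illustration of how a name filter maps onto the API, a sketch that turns a map like the one passed to OperationContext.withFields(...) into a fieldSelector query parameter. FieldSelectors is a hypothetical helper, not fabric8's internal code. Because the request stays a LIST (the server answers with a list holding zero or one items), it still fits the list-and-watch model informers rely on.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class FieldSelectors {
    /** Turns {"metadata.name": "testpod"} into "metadata.name=testpod" (keys sorted for stable output). */
    static String toSelector(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /** URL-encoded query-string form, ready to append to a list or watch request. */
    static String toQueryParam(Map<String, String> fields) {
        try {
            return "fieldSelector=" + URLEncoder.encode(toSelector(fields), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```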

@manusa found it! You can do it like this I think:

val podInformer = informers.sharedIndexInformerFor(
      classOf[Pod],
      classOf[PodList],
      new OperationContext().withNamespace(NAMESPACE).withName(PODNAME),
      60000)

:tada: Good news @stijndehaes, thx for sharing!

@stijndehaes: Hi, I tried documenting SharedInformer support in a blog post[0] this weekend. I tried out the code you listed in your comment, but unfortunately it didn't work as expected. I kept getting this error:

/usr/java/jdk-14.0.1/bin/java -javaagent:<IntelliJ IDEA 2019.3.3 agent> -Dfile.encoding=UTF-8 -classpath <project classes plus kubernetes-client 4.10.1 and its dependencies> io.fabric8.NamespacedInformerDemo
May 11, 2020 12:37:57 PM io.fabric8.NamespacedInformerDemo main
INFO: Informer factory initialized.
May 11, 2020 12:37:58 PM io.fabric8.NamespacedInformerDemo main
INFO: Starting all registered informers
[informer-controller-Pod] INFO io.fabric8.kubernetes.client.informers.cache.Controller - informer#Controller: ready to run resync and reflector runnable
[informer-controller-Pod] INFO io.fabric8.kubernetes.client.informers.cache.Reflector - Started ReflectorRunnable watch for class io.fabric8.kubernetes.api.model.Pod
[informer-controller-Pod] WARN io.fabric8.kubernetes.client.informers.cache.Controller - Reflector list-watching job exiting because the thread-pool is shutting down
java.util.concurrent.RejectedExecutionException: Error while starting ReflectorRunnable watch
    at io.fabric8.kubernetes.client.informers.cache.Reflector.listAndWatch(Reflector.java:85)
    at io.fabric8.kubernetes.client.informers.cache.Controller.run(Controller.java:112)
    at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.util.concurrent.RejectedExecutionException: Error while doing ReflectorRunnable list
    at io.fabric8.kubernetes.client.informers.cache.Reflector.getList(Reflector.java:73)
    at io.fabric8.kubernetes.client.informers.cache.Reflector.reListAndSync(Reflector.java:94)
    at io.fabric8.kubernetes.client.informers.cache.Reflector.listAndWatch(Reflector.java:80)
    ... 2 more
Caused by: java.lang.NullPointerException
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.getRootUrl(OperationSupport.java:129)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.getNamespacedUrl(OperationSupport.java:136)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.getNamespacedUrl(OperationSupport.java:147)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:656)
    at io.fabric8.kubernetes.client.informers.SharedInformerFactory$1.list(SharedInformerFactory.java:134)
    at io.fabric8.kubernetes.client.informers.SharedInformerFactory$1.list(SharedInformerFactory.java:127)
    at io.fabric8.kubernetes.client.informers.cache.Reflector.getList(Reflector.java:67)
    ... 4 more

On debugging, I realized that we were losing OperationContext details when we try to pass our own OperationContext. Let me create a PR to fix this.

[0] https://medium.com/@rohaan/introduction-to-fabric8-kubernetes-java-client-informer-api-b945082d69af

With my PR, you should be able to get namespaced informers. However, when I tried to get an informer for one specific resource, I noticed a strange issue: I was getting some unnecessary delete events:

INFO: Starting all registered informers
[informer-controller-Pod] INFO io.fabric8.kubernetes.client.informers.cache.Controller - informer#Controller: ready to run resync and reflector runnable
[informer-controller-Pod] INFO io.fabric8.kubernetes.client.informers.cache.Reflector - Started ReflectorRunnable watch for class io.fabric8.kubernetes.api.model.Pod
[OkHttp https://172.17.0.2:8443/...] INFO io.fabric8.kubernetes.client.informers.cache.ReflectorWatcher - Event received MODIFIED
May 11, 2020 1:41:38 PM io.fabric8.NamespacedInformerDemo$1 onAdd
INFO: Pod testpod got added
May 11, 2020 1:41:49 PM io.fabric8.NamespacedInformerDemo$1 onUpdate
INFO: Pod testpod got updated
May 11, 2020 1:41:50 PM io.fabric8.NamespacedInformerDemo$1 onDelete
INFO: Pod testpod got deleted
May 11, 2020 1:42:20 PM io.fabric8.NamespacedInformerDemo$1 onDelete
INFO: Pod testpod got deleted
May 11, 2020 1:42:49 PM io.fabric8.NamespacedInformerDemo$1 onUpdate
INFO: Pod testpod got updated
May 11, 2020 1:42:50 PM io.fabric8.NamespacedInformerDemo$1 onDelete
INFO: Pod testpod got deleted
May 11, 2020 1:43:20 PM io.fabric8.NamespacedInformerDemo$1 onDelete

Upon debugging, I found that when we query a single resource, the response is a single resource rather than a list. Hence, deserialization fails during the list step, resulting in the resource's ObjectMeta being added to the list but with zero items:
https://github.com/fabric8io/kubernetes-client/blob/e8255c3ef6a41cbe27477348dfbce981e4872821/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/informers/SharedInformerFactory.java#L127-L150
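One way to avoid the empty list, sketched under heavy assumptions: normalise whatever the server returns before the list step consumes it. ApiObject and ListNormalizer are hypothetical, and recognising lists by a kind ending in "List" is a simplification based on how Kubernetes names list kinds (PodList, JobList). Listing with a metadata.name field selector, as suggested later in this thread, avoids the problem at the request level instead.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical, heavily simplified decoded API response. */
final class ApiObject {
    final String kind;           // e.g. "Pod" or "PodList"
    final String name;           // metadata.name; null for list objects
    final List<ApiObject> items; // empty for single objects

    private ApiObject(String kind, String name, List<ApiObject> items) {
        this.kind = kind;
        this.name = name;
        this.items = items;
    }

    static ApiObject single(String kind, String name) {
        return new ApiObject(kind, name, Collections.<ApiObject>emptyList());
    }

    static ApiObject list(String kind, List<ApiObject> items) {
        return new ApiObject(kind, null, items);
    }
}

final class ListNormalizer {
    /** Returns the items a list step should process, whatever shape the server sent back. */
    static List<ApiObject> itemsOf(ApiObject response) {
        if (response.kind.endsWith("List")) {
            return response.items;
        }
        // A GET on a named resource returns the bare object, not a list:
        // wrap it instead of treating it as an empty list.
        return Collections.singletonList(response);
    }
}
```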

I checked client-go's implementation, but I'm not sure whether it supports listing a specific resource. Maybe informers are not meant to list specific resources? They implement the ListerWatcher interface, so they are supposed to list and watch only, which is only applicable to lists.

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@manusa @rohanKanojia has the Watcher issue been fixed? We are using a Watcher to watch a Kube Job and started noticing this issue. Going through this thread, it looks like SharedInformer is an alternative. Can a SharedInformer be used to watch a Kube Job?

@SunithaR: Are you talking about the batch/v1 Job resource? If yes, here is a simple example of its usage:

try (KubernetesClient client = new DefaultKubernetesClient()) {
    // Get Informer Factory
    SharedInformerFactory sharedInformerFactory = client.informers();

    // Create instance for Job Informer
    SharedIndexInformer<Job> jobSharedIndexInformer = sharedInformerFactory.sharedIndexInformerFor(Job.class, JobList.class,
            5 * 1000L);
    logger.info("Informer factory initialized.");

    // Add Event Handler for actions on all Job events received
    jobSharedIndexInformer.addEventHandler(
            new ResourceEventHandler<Job>() {
                @Override
                public void onAdd(Job job) {
                    logger.info("Job " + job.getMetadata().getName() + " got added");
                }

                @Override
                public void onUpdate(Job oldJob, Job newJob) {
                    logger.info("Job " + oldJob.getMetadata().getName() + " got updated");
                }

                @Override
                public void onDelete(Job job, boolean deletedFinalStateUnknown) {
                    logger.info("Job " + job.getMetadata().getName() + " got deleted");
                }
            }
    );

    logger.info("Starting all registered informers");
    sharedInformerFactory.startAllRegisteredInformers();

    // Wait for 1 minute
    Thread.sleep(60 * 1000L);
    logger.info("Stopping informers now..");
    sharedInformerFactory.stopAllRegisteredInformers();
}

Thanks @rohanKanojia, yes, _batch/v1 Job_. Is there a way to watch a single job? Currently we have this:
Watch watch = client.batch().jobs().inNamespace(NAMESPACE).withName(jobName).watch(kubeJobWatcher);
When onClose is called, we then delete the job.
Currently we are seeing onClose called with the exception _KubernetesClientException: too old resource version: 827312380 (827319450)_.
However, we want to continue watching it. Is this possible with a SharedInformer?

I think you should be able to do it with something like this:

SharedIndexInformer<Job> jobSharedIndexInformer = sharedInformerFactory.sharedIndexInformerFor(Job.class, JobList.class,
                    new OperationContext().withNamespace(NAMESPACE).withFields(Collections.singletonMap("metadata.name", jobName)),
                    RESYNC_PERIOD);

Thanks @rohanKanojia will try it out. What is the RESYNC_PERIOD?

Informers maintain their own internal cache. A resync plays back all the events held in the informer's internal cache. So, after every resync period, the informer re-queries the API server (list + watch) and updates its cache. You can have a look at this blog post[0] to see how an informer is used with a resync period. Recently we also added support for avoiding resync when the resync period is set to 0[1].

[0] https://rohaan.medium.com/introduction-to-fabric8-kubernetes-java-client-informer-api-b945082d69af
[1] https://github.com/fabric8io/kubernetes-client/issues/2651
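A minimal sketch of the "0 disables resync" behaviour, assuming the resync period is a millisecond value as in the 5 * 1000L passed in the Job example above. ResyncScheduler is hypothetical and the task it runs stands in for whatever the informer does on resync; this is not fabric8's implementation.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class ResyncScheduler {
    /**
     * Runs resyncTask every resyncPeriodMillis on the given executor.
     * A period of 0 (or a negative value) disables resync: nothing is scheduled and null is returned.
     */
    static ScheduledFuture<?> schedule(ScheduledExecutorService executor,
                                       Runnable resyncTask,
                                       long resyncPeriodMillis) {
        if (resyncPeriodMillis <= 0) {
            return null;
        }
        // First resync after one full period, then at a fixed rate.
        return executor.scheduleAtFixedRate(
                resyncTask, resyncPeriodMillis, resyncPeriodMillis, TimeUnit.MILLISECONDS);
    }
}
```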
