Is this a BUG REPORT or FEATURE REQUEST? (choose one):
/kind bug
Please provide the following details:
Environment:
minikube version: v0.25.0
OS:
ProductName: Mac OS X
ProductVersion: 10.13.3
BuildVersion: 17D47
VM driver:
"DriverName": "virtualbox",
ISO version
"Boot2DockerURL": "file:///Users/jrobb/.minikube/cache/iso/minikube-v0.23.6.iso",
What happened:
I am writing a Discovery Service for use with EnvoyProxy that uses Kubernetes as a backing store (and minikube for local dev). I call the watch URLs and monitor when new services or endpoints are added to K8S (and likewise for modifications and deletions), subsequently pushing those changes into Envoy.
When Discovery Service first starts, it opens a connection to minikube at /api/v1/namespaces/{namespace}/endpoints?watch=true and begins receiving endpoints (resourceVersion implicitly null, so that I can get a snapshot of the current world). The watch expires after five minutes (I'll dial that value up later on), and, having captured the resourceVersion from the last endpoint received, my code re-issues another five-minute watch to pick up where I left off.
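The watch-and-resume cycle described above can be sketched roughly as follows. This is only an illustration, not the reporter's code: the helper names `build_watch_path` and `next_resource_version` are hypothetical, and using the `timeoutSeconds` query parameter to get the five-minute expiry is an assumption, since the report does not say how the timeout was configured.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def build_watch_path(namespace: str,
                     resource_version: Optional[str] = None,
                     timeout_seconds: int = 300) -> str:
    """Build the endpoints watch path for one namespace.

    Omitting resource_version mirrors the "implicitly null" first call.
    """
    params = {"watch": "true", "timeoutSeconds": str(timeout_seconds)}
    if resource_version is not None:
        params["resourceVersion"] = resource_version
    return f"/api/v1/namespaces/{namespace}/endpoints?{urlencode(params)}"


def next_resource_version(current: Optional[str],
                          events: Iterable[dict]) -> Optional[str]:
    """Return the resourceVersion of the last event seen.

    If the watch delivered no events, the previous value is kept, which is
    exactly the situation in which the stored version keeps getting older.
    """
    for event in events:
        current = event["object"]["metadata"]["resourceVersion"]
    return current
```

Each watch event is assumed to have the usual `{"type": ..., "object": ...}` shape. When a watch window passes with no events, the second helper returns the old value unchanged, so the next watch is issued with the same resourceVersion as before.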
Once the most recent endpoint change is >30 minutes old, my endpoint watcher starts getting an error response back. (I believe that 30 minutes is the amount of time that endpoint events persist before they expire, but this is not my area of expertise. The amount of time may be configurable.) Here's an example of the error:
{
apiVersion: v1
code: 410
details: null
kind: Status
message: too old resource version: 447099 (455412)
metadata: class V1ListMeta {
_continue: null
resourceVersion: null
selfLink: null
}
reason: Gone
status: Failure
}
As a workaround, I've written code to catch the error and restart the watch from the resourceVersion specified in the parens.
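A minimal sketch of that workaround, assuming the error arrives as a dict shaped like the Status object above. The function names are hypothetical, and the message format `too old resource version: X (Y)` is taken from the example error rather than from any documented contract, so treat the parsing as fragile. The Kubernetes API documentation describes a different recovery for 410 Gone: discard local state, list again, and start a new watch from the resourceVersion the list returns.

```python
import re
from typing import Optional, Tuple

# Pattern based on the message seen in the report; not a documented format.
_TOO_OLD = re.compile(r"too old resource version: (\d+) \((\d+)\)")


def parse_too_old(message: str) -> Optional[Tuple[str, str]]:
    """Return (rejected_version, suggested_version), or None if no match."""
    match = _TOO_OLD.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


def version_after_error(status: dict, current: str) -> str:
    """Pick the resourceVersion to use for the next watch attempt.

    On a 410 status whose message matches, jump to the version in
    parentheses; otherwise keep the current version.
    """
    if status.get("code") == 410:
        parsed = parse_too_old(status.get("message") or "")
        if parsed is not None:
            return parsed[1]
    return current
```

For the example error above, `version_after_error` would move the watcher from 447099 to 455412.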
What you expected to happen:
The call should not return an error when attempting to use the most recent resourceVersion given by the response to a previous call.
How to reproduce it (as minimally and precisely as possible):
Output of minikube logs (if applicable):
Nothing relevant in the logs.
Anything else we need to know:
I have not yet verified whether this is a problem on regular Kubernetes, but will report this there as well if it is.
Here are some logs from my app, beginning upon startup. The endpoints are already >30 minutes old. The initial watch runs for five minutes, and then I try to start another with the resource version given by the opening response. That one fails immediately with an error; I capture the new resource version from the error message and start another; it lasts five minutes, and then the problem repeats _even when using the newer resource version from the error message_.
10:47:45.481 EndpointWatcher is active; starting with resource version null.
10:47:45.606 EndpointWatcher ADDED: class V1Endpoints {
apiVersion: v1
kind: Endpoints
metadata: class V1ObjectMeta {
annotations: null
clusterName: null
creationTimestamp: 2018-03-01T17:09:34.000-05:00
deletionGracePeriodSeconds: null
deletionTimestamp: null
finalizers: null
generateName: null
generation: null
initializers: null
labels: {app=reference-code}
name: reference-code
namespace: sandbox
ownerReferences: null
resourceVersion: 457559
selfLink: /api/v1/namespaces/sandbox/endpoints/reference-code
uid: 39a10d6c-1d9d-11e8-a52e-080027b81fb5
}
subsets: [class V1EndpointSubset {
addresses: [class V1EndpointAddress {
hostname: null
ip: 172.17.0.4
nodeName: minikube
targetRef: class V1ObjectReference {
apiVersion: null
fieldPath: null
kind: Pod
name: reference-code-service-77797b9746-fbbkh
namespace: sandbox
resourceVersion: 444729
uid: 829d8dee-1d6c-11e8-a52e-080027b81fb5
}
}]
notReadyAddresses: null
ports: [class V1EndpointPort {
name: http
port: 8291
protocol: TCP
}, class V1EndpointPort {
name: app-https
port: 8290
protocol: TCP
}]
}]
} (resource version = 457559)
10:52:45.459 EndpointWatcher expired.
10:52:46.475 EndpointWatcher is active; starting with resource version 457559.
10:52:46.486 Received error message: `too old resource version: 457559 (494959)`. Jumping forward to new valid resource version as a workaround.
10:52:46.487 EndpointWatcher expired.
10:52:47.492 EndpointWatcher starting.
10:52:47.495 EndpointWatcher is active; starting with resource version 494959.
10:57:47.483 EndpointWatcher expired.
10:57:48.489 EndpointWatcher starting.
10:57:48.494 EndpointWatcher is active; starting with resource version 494959.
10:57:48.495 Received error message: `too old resource version: 494959 (495141)`. Jumping forward to new valid resource version as a workaround.
10:57:48.495 EndpointWatcher expired.
10:57:49.500 EndpointWatcher starting.
10:57:49.503 EndpointWatcher is active; starting with resource version 495141.
10:57:49.504 Received error message: `too old resource version: 495141 (495142)`. Jumping forward to new valid resource version as a workaround.
10:57:49.504 EndpointWatcher expired.
10:57:50.506 EndpointWatcher starting.
10:57:50.510 EndpointWatcher is active; starting with resource version 495142.
11:02:50.503 EndpointWatcher expired.
11:02:51.511 EndpointWatcher starting.
11:02:51.518 EndpointWatcher is active; starting with resource version 495142.
11:02:51.521 Received error message: `too old resource version: 495142 (495325)`. Jumping forward to new valid resource version as a workaround.
11:02:51.522 EndpointWatcher expired.
11:02:52.525 EndpointWatcher starting.
11:02:52.528 EndpointWatcher is active; starting with resource version 495325.
11:07:52.516 EndpointWatcher expired.
11:07:53.521 EndpointWatcher starting.
11:07:53.524 EndpointWatcher is active; starting with resource version 495325.
11:07:53.525 Received error message: `too old resource version: 495325 (495507)`. Jumping forward to new valid resource version as a workaround.
11:07:53.525 EndpointWatcher expired.
11:07:54.529 EndpointWatcher starting.
11:07:54.534 EndpointWatcher is active; starting with resource version 495507.
11:07:54.536 Received error message: `too old resource version: 495507 (495508)`. Jumping forward to new valid resource version as a workaround.
11:07:54.536 EndpointWatcher expired.
11:07:55.539 EndpointWatcher starting.
11:07:55.542 EndpointWatcher is active; starting with resource version 495508.
11:12:55.531 EndpointWatcher expired.
11:12:56.534 EndpointWatcher starting.
11:12:56.538 EndpointWatcher is active; starting with resource version 495508.
11:12:56.541 Received error message: `too old resource version: 495508 (495690)`. Jumping forward to new valid resource version as a workaround.
11:12:56.542 EndpointWatcher expired.
11:12:57.543 EndpointWatcher starting.
11:12:57.546 EndpointWatcher is active; starting with resource version 495690.
11:12:57.546 Received error message: `too old resource version: 495690 (495692)`. Jumping forward to new valid resource version as a workaround.
11:12:57.547 EndpointWatcher expired.
11:12:58.551 EndpointWatcher starting.
11:12:58.554 EndpointWatcher is active; starting with resource version 495692.
11:17:58.543 EndpointWatcher expired.
11:17:59.544 EndpointWatcher starting.
11:17:59.550 EndpointWatcher is active; starting with resource version 495692.
11:17:59.551 Received error message: `too old resource version: 495692 (495874)`. Jumping forward to new valid resource version as a workaround.
11:17:59.554 EndpointWatcher expired.
11:18:00.559 EndpointWatcher starting.
11:18:00.563 EndpointWatcher is active; starting with resource version 495874.
11:18:00.563 Received error message: `too old resource version: 495874 (495875)`. Jumping forward to new valid resource version as a workaround.
11:18:00.564 EndpointWatcher expired.
11:18:01.566 EndpointWatcher starting.
11:18:01.571 EndpointWatcher is active; starting with resource version 495875.
11:23:01.559 EndpointWatcher expired.
11:23:02.564 EndpointWatcher starting.
11:23:02.567 EndpointWatcher is active; starting with resource version 495875.
11:23:02.569 Received error message: `too old resource version: 495875 (496057)`. Jumping forward to new valid resource version as a workaround.
11:23:02.569 EndpointWatcher expired.
11:23:03.570 EndpointWatcher starting.
11:23:03.573 EndpointWatcher is active; starting with resource version 496057.
11:23:03.573 Received error message: `too old resource version: 496057 (496059)`. Jumping forward to new valid resource version as a workaround.
11:23:03.574 EndpointWatcher expired.
11:23:04.578 EndpointWatcher starting.
11:23:04.580 EndpointWatcher is active; starting with resource version 496059.
11:28:04.569 EndpointWatcher expired.
11:28:05.574 EndpointWatcher starting.
11:28:05.578 EndpointWatcher is active; starting with resource version 496059.
11:28:05.579 Received error message: `too old resource version: 496059 (496241)`. Jumping forward to new valid resource version as a workaround.
11:28:05.581 EndpointWatcher expired.
11:28:06.586 EndpointWatcher starting.
11:28:06.588 EndpointWatcher is active; starting with resource version 496241.
11:28:06.618 Received error message: `too old resource version: 496241 (496242)`. Jumping forward to new valid resource version as a workaround.
11:28:06.619 EndpointWatcher expired.
11:28:07.621 EndpointWatcher starting.
11:28:07.627 EndpointWatcher is active; starting with resource version 496242.
11:33:07.612 EndpointWatcher expired.
11:33:08.613 EndpointWatcher starting.
11:33:08.615 EndpointWatcher is active; starting with resource version 496242.
11:33:08.619 Received error message: `too old resource version: 496242 (496426)`. Jumping forward to new valid resource version as a workaround.
11:33:08.620 EndpointWatcher expired.
11:33:09.625 EndpointWatcher starting.
11:33:09.629 EndpointWatcher is active; starting with resource version 496426.
~Note that I have an equivalent watcher for services (/api/v1/namespaces/{namespace}/services) which does not have this problem. It may not be just endpoints, but it's definitely not universal.~
EDIT: I have now seen the same issue appear on my service endpoint watcher. The problem may in fact be universal.
Here's one more log snippet: I deleted and re-added the endpoint via kubectl. My app received it, and then happily did its thing for 30 minutes before starting to error again. This time, the first error provided a resourceVersion that was still too old, and then specified a new version (incremented by one), which worked.
11:43:23.991 EndpointWatcher ADDED: class V1Endpoints {
apiVersion: v1
kind: Endpoints
metadata: class V1ObjectMeta {
annotations: null
clusterName: null
creationTimestamp: 2018-03-02T11:43:24.000-05:00
deletionGracePeriodSeconds: null
deletionTimestamp: null
finalizers: null
generateName: null
generation: null
initializers: null
labels: {app=reference-code}
name: reference-code
namespace: sandbox
ownerReferences: null
resourceVersion: 498020
selfLink: /api/v1/namespaces/sandbox/endpoints/reference-code
uid: d2fa9851-1e38-11e8-a52e-080027b81fb5
}
subsets: [class V1EndpointSubset {
addresses: [class V1EndpointAddress {
hostname: null
ip: 172.17.0.4
nodeName: minikube
targetRef: class V1ObjectReference {
apiVersion: null
fieldPath: null
kind: Pod
name: reference-code-service-77797b9746-fbbkh
namespace: sandbox
resourceVersion: 444729
uid: 829d8dee-1d6c-11e8-a52e-080027b81fb5
}
}]
notReadyAddresses: null
ports: [class V1EndpointPort {
name: http
port: 8291
protocol: TCP
}, class V1EndpointPort {
name: app-https
port: 8290
protocol: TCP
}]
}]
} (resource version = 498020)
11:48:15.681 EndpointWatcher expired.
11:48:16.684 EndpointWatcher starting.
11:48:16.686 EndpointWatcher is active; starting with resource version 498020.
11:53:16.676 EndpointWatcher expired.
11:53:17.681 EndpointWatcher starting.
11:53:17.684 EndpointWatcher is active; starting with resource version 498020.
11:58:17.672 EndpointWatcher expired.
11:58:18.677 EndpointWatcher starting.
11:58:18.682 EndpointWatcher is active; starting with resource version 498020.
12:03:18.725 EndpointWatcher expired.
12:03:19.730 EndpointWatcher starting.
12:03:19.733 EndpointWatcher is active; starting with resource version 498020.
12:08:19.866 EndpointWatcher expired.
12:08:20.873 EndpointWatcher starting.
12:08:20.876 EndpointWatcher is active; starting with resource version 498020.
12:13:20.865 EndpointWatcher expired.
12:13:21.869 EndpointWatcher starting.
12:13:21.872 EndpointWatcher is active; starting with resource version 498020.
12:18:21.861 EndpointWatcher expired.
12:18:22.866 EndpointWatcher starting.
12:18:22.870 EndpointWatcher is active; starting with resource version 498020.
12:18:22.886 Received error message: `too old resource version: 498020 (498079)`. Jumping forward to new valid resource version as a workaround.
12:18:22.887 EndpointWatcher expired.
12:18:23.891 EndpointWatcher starting.
12:18:23.894 EndpointWatcher is active; starting with resource version 498079.
12:18:23.895 Received error message: `too old resource version: 498079 (498080)`. Jumping forward to new valid resource version as a workaround.
12:18:23.895 EndpointWatcher expired.
12:18:24.900 EndpointWatcher starting.
12:18:24.903 EndpointWatcher is active; starting with resource version 498080.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
We've decided to go a different route for our service mesh implementation, and I no longer care about this issue. However, to the best of my knowledge, it is still a problem. As such, I'm not going to close this proactively. If someone else thinks it makes more sense to close it now, feel free, but I'm not going to be the one to make that call.
I'm seeing the same issue. I open a watch with a timeout of N minutes, receive the current view of the cluster, and store the resourceVersion to reuse after the watch closes. Then, after reopening the watch, I receive:
2018-08-16 10:19:14.831 INFO 13005 --- [serverListener6] p.w.s.k.apiserver.ApiServerClient : Called process watch for type V1Deployment - watch started
2018-08-16 10:19:14.832 TRACE 13005 --- [serverListener6] p.w.s.k.apiserver.ApiServerClient : type: ERROR
2018-08-16 10:19:14.832 TRACE 13005 --- [serverListener6] p.w.s.k.apiserver.ApiServerClient : status: class V1Status {
apiVersion: v1
code: 410
details: null
kind: Status
message: too old resource version: 3523108 (10021631)
metadata: class V1ListMeta {
_continue: null
resourceVersion: null
selfLink: null
}
reason: Gone
status: Failure
}
2018-08-16 10:19:14.832 TRACE 13005 --- [serverListener6] p.w.s.k.apiserver.ApiServerClient : object: null
I can reopen the watch without a resourceVersion, but then I receive all elements again, and I don't see a way to fetch only the changes since the previously watched state.
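One possible way to approximate "changes since the previous state" after a full re-list is to diff the fresh list against a locally cached snapshot. The sketch below is only an illustration under that assumption: `diff_snapshots` is a hypothetical helper, and both arguments are assumed to map object names to the resourceVersion last seen for each object. It recovers the net difference between the two snapshots, not the intermediate events that were missed.

```python
from typing import Dict, List, Tuple


def diff_snapshots(cached: Dict[str, str],
                   listed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare a cached snapshot with a fresh list result.

    Returns (change_type, name) pairs: ADDED and MODIFIED entries in the
    order they appear in `listed`, followed by DELETED entries.
    """
    changes: List[Tuple[str, str]] = []
    for name, version in listed.items():
        if name not in cached:
            changes.append(("ADDED", name))
        elif cached[name] != version:
            changes.append(("MODIFIED", name))
    for name in cached:
        if name not in listed:
            changes.append(("DELETED", name))
    return changes
```

After applying the diff, the fresh list would replace the cache, and the next watch would start from the resourceVersion reported by that list.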
I observed this issue while using the vault operator:
ERROR: logging before flag.Parse: W0824 12:41:32.769006 1 reflector.go:334] github.com/coreos-inc/vault-operator/pkg/operator/controller.go:35: watch of *v1alpha1.VaultService ended with: The resourceVersion for the provided watch is too old.
I'm light on details but wanted to chime in to add validation to the issue.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
/reopen
Also seeing this issue, and would either like a fix, or a way to watch for events newer than the last collected record without throwing an error and without retrieving all events over again.
@drewhemm: You can't reopen an issue/PR unless you authored it or you are a collaborator.
I was very new to Kubernetes and minikube when I reported this. I believe this is actually an issue in Kubernetes, and not in minikube, anyway.
I'm no longer pursuing the project that led me to this bug, so I have no stake in this issue. @drewhemm, I'd recommend you open your own issue -- and make sure it's in the right project, because I'm not sure this one belongs here.
Yeah, I realised after posting that this is the minikube repo and not k8s itself - my bad! It's definitely not a minikube issue as I have run into it with non-minikube k8s.