What happened:
Plank failed to list pods and then tried to create a pod for every running ProwJob whose pods it had failed to list. This resulted in many "pod <pod-name> already exists" errors.
What you expected to happen:
If Plank fails to list pods for a build cluster, ProwJobs from that build cluster should be ignored for that sync loop.
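The expected behavior above can be sketched roughly as follows. This is only an illustration, not Plank's actual code: the `ProwJob` struct, `listPodsFunc`, `syncableJobs`, and `errExampleList` are hypothetical names made up for the example, and the real controller works with Kubernetes clients rather than plain functions.

```go
package main

import (
	"errors"
	"fmt"
)

// ProwJob is a simplified, hypothetical stand-in for the real ProwJob type.
type ProwJob struct {
	Name    string
	Cluster string
}

// listPodsFunc lists pod names for one build cluster (hypothetical signature).
type listPodsFunc func(cluster string) ([]string, error)

// errExampleList is a made-up error used only to simulate a failed listing.
var errExampleList = errors.New("list pods: connection refused")

// syncableJobs returns the jobs whose build cluster listed pods successfully,
// together with the clusters that failed and are skipped for this sync loop.
func syncableJobs(jobs []ProwJob, clusters []string, list listPodsFunc) ([]ProwJob, map[string]error) {
	failed := map[string]error{}
	for _, c := range clusters {
		if _, err := list(c); err != nil {
			failed[c] = err
		}
	}

	var out []ProwJob
	for _, j := range jobs {
		if _, bad := failed[j.Cluster]; bad {
			// Pod state for this cluster is unknown, so do not touch its jobs.
			continue
		}
		out = append(out, j)
	}
	return out, failed
}

func main() {
	list := func(cluster string) ([]string, error) {
		if cluster == "build-b" {
			return nil, errExampleList
		}
		return []string{"pod-1"}, nil
	}
	jobs := []ProwJob{
		{Name: "job-1", Cluster: "build-a"},
		{Name: "job-2", Cluster: "build-b"},
	}

	ok, failed := syncableJobs(jobs, []string{"build-a", "build-b"}, list)
	for _, j := range ok {
		fmt.Println("syncing", j.Name)
	}
	for c, err := range failed {
		fmt.Printf("skipping cluster %s this loop: %v\n", c, err)
	}
}
```

The point of the sketch is that a failed listing is treated as "state unknown" rather than "no pods exist", so no pod creation is attempted for the affected cluster until a later loop lists successfully.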
/help
@cjwagner:
This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
This is not trivial to solve, because the whole listing is part of the cache. Could we add a check that only creates a pod when a preceding get for that pod returns a 404? A cache that has not synced returns a distinct error.
That sounds sensible to me.
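A minimal sketch of the check proposed above, under stated assumptions: `errNotFound`, `errCacheNotSynced`, `getPodFunc`, and `shouldCreatePod` are hypothetical names for illustration and are not taken from Plank. In real code the not-found test would probably be `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`; a plain sentinel error is used here so the example has no dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for what a real client returns.
var (
	errNotFound       = errors.New("pod not found")    // stands in for an HTTP 404
	errCacheNotSynced = errors.New("cache not synced") // stands in for the distinct cache error
)

// getPodFunc looks up a single pod by name (hypothetical signature).
type getPodFunc func(name string) error

// shouldCreatePod reports true only when the get confirms the pod is absent.
// Any other error, including an unsynced cache, is returned to the caller
// so that no pod is created while the real state is unknown.
func shouldCreatePod(name string, get getPodFunc) (bool, error) {
	err := get(name)
	switch {
	case err == nil:
		return false, nil // pod already exists
	case errors.Is(err, errNotFound):
		return true, nil // confirmed missing, safe to create
	default:
		return false, fmt.Errorf("get pod %q: %w", name, err)
	}
}

func main() {
	create, err := shouldCreatePod("job-1", func(string) error { return errNotFound })
	fmt.Println(create, err)

	create, err = shouldCreatePod("job-2", func(string) error { return errCacheNotSynced })
	fmt.Println(create, err)
}
```

With this shape, a failed or unsynced listing can no longer be mistaken for "the pod does not exist", which is what led to the "already exists" errors in the report.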
No longer an issue with the prow-controller-manager.
/close
@matthyx: Closing this issue.