Test-infra: Plank errors listing pods cause plank to try to recreate pods for all running ProwJobs.

Created on 12 Oct 2020 · 6 Comments · Source: kubernetes/test-infra

What happened:
Plank failed to list pods and then tried to create a pod for every running ProwJob whose pods it had failed to list. This resulted in many "pod <pod-name> already exists" errors.

What you expected to happen:
If Plank fails to list pods for a build cluster, ProwJobs from that build cluster should be ignored for that sync loop.
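The expected behavior could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not Plank's actual code: `ProwJob`, `listPodsFunc`, `syncableJobs`, `fakeList`, and `errListFailed` are hypothetical names invented for this example. Each build cluster is listed once, and ProwJobs from a cluster whose listing failed are left out of the current sync loop.

```go
package main

import (
	"errors"
	"fmt"
)

// ProwJob is a simplified, hypothetical stand-in for the real ProwJob type.
type ProwJob struct {
	Name    string
	Cluster string
}

// listPodsFunc lists pod names in one build cluster (hypothetical signature).
type listPodsFunc func(cluster string) ([]string, error)

// errListFailed simulates a failed pod listing in the demo below.
var errListFailed = errors.New("list pods failed")

// syncableJobs lists each build cluster once and returns only the jobs whose
// cluster was listed successfully, plus the clusters that were skipped.
func syncableJobs(jobs []ProwJob, list listPodsFunc) ([]ProwJob, map[string]error) {
	failed := map[string]error{}
	listed := map[string]bool{}
	for _, pj := range jobs {
		if listed[pj.Cluster] {
			continue
		}
		if _, seen := failed[pj.Cluster]; seen {
			continue
		}
		if _, err := list(pj.Cluster); err != nil {
			failed[pj.Cluster] = err
			continue
		}
		listed[pj.Cluster] = true
	}
	var ok []ProwJob
	for _, pj := range jobs {
		if listed[pj.Cluster] {
			ok = append(ok, pj)
		}
	}
	return ok, failed
}

// fakeList fails for the cluster named "broken" and succeeds otherwise.
func fakeList(cluster string) ([]string, error) {
	if cluster == "broken" {
		return nil, errListFailed
	}
	return []string{"pod-1"}, nil
}

func main() {
	jobs := []ProwJob{
		{Name: "a", Cluster: "default"},
		{Name: "b", Cluster: "broken"},
		{Name: "c", Cluster: "default"},
	}
	ok, failed := syncableJobs(jobs, fakeList)
	for _, pj := range ok {
		fmt.Println("sync:", pj.Name)
	}
	for cluster, err := range failed {
		fmt.Println("skip cluster:", cluster, "reason:", err)
	}
}
```

Skipped jobs are simply retried on the next sync loop, so a transient listing failure delays them rather than triggering duplicate pod creation.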

/help

Labels: area/prow, help wanted, kind/bug

All 6 comments

@cjwagner:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/assign

This is not super trivial to solve, as the whole listing is part of the cache. We could add a check to only create a pod when a preceding get for it returns a 404? The cache not being synced returns a distinct error.

We could add a check to only create a pod when a preceding get for it returns a 404? The cache not being synced returns a distinct error.

That sounds sensible to me.
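A rough sketch of the check proposed above, using only the standard library: a pod is created only when a preceding get reports "not found", while any other error (such as an unsynced cache) aborts the creation. The `podClient` interface, `ensurePod`, `fakeClient`, and the two sentinel errors are hypothetical names for illustration. In real Kubernetes client code, the 404 test would presumably be `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors standing in for a 404 and for an unsynced cache (both hypothetical).
var (
	errNotFound       = errors.New("pod not found")
	errCacheNotSynced = errors.New("cache not synced")
)

// podClient is a hypothetical, minimal view of a pod client.
type podClient interface {
	Get(name string) error
	Create(name string) error
}

// ensurePod creates the pod only if Get reports that it does not exist.
// Any other Get error is returned without attempting a create.
func ensurePod(c podClient, name string) (bool, error) {
	err := c.Get(name)
	switch {
	case err == nil:
		return false, nil // pod already exists, nothing to do
	case errors.Is(err, errNotFound):
		if err := c.Create(name); err != nil {
			return false, fmt.Errorf("creating pod %q: %w", name, err)
		}
		return true, nil
	default:
		return false, fmt.Errorf("not creating pod %q: %w", name, err)
	}
}

// fakeClient returns a fixed error from Get and records created pod names.
type fakeClient struct {
	getErr  error
	created []string
}

func (f *fakeClient) Get(name string) error { return f.getErr }

func (f *fakeClient) Create(name string) error {
	f.created = append(f.created, name)
	return nil
}

func main() {
	for _, getErr := range []error{nil, errNotFound, errCacheNotSynced} {
		c := &fakeClient{getErr: getErr}
		created, err := ensurePod(c, "example-pod")
		fmt.Println("get error:", getErr, "created:", created, "error:", err)
	}
}
```

The point of the design is that an error other than "not found" says nothing about whether the pod exists, so the safe reaction is to skip creation and retry on a later sync.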

No longer an issue with the prow-controller-manager.
/close

@matthyx: Closing this issue.

