Test-infra: Support run_after_failure jobs

Created on 7 May 2018 · 14 Comments · Source: kubernetes/test-infra

Brought up during kubecon

@krzyzacy @cjwagner

/area prow
/kind feature

area/prow · help wanted · kind/feature · lifecycle/rotten

All 14 comments

/help

@kargakis:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

We should think through a clear boundary for the logic that we want to support here. This sort of triggering logic causes the majority of correctness bugs in prow.

simply run_after, and some condition like succeed?
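To make the suggestion concrete, here is a minimal sketch of what a single run_after field with a condition could look like. Every type and name below (Condition, RunAfter, Job, Dependents) is hypothetical and chosen for illustration; none of it is Prow's actual configuration API.

```go
package main

import "fmt"

// Condition is a hypothetical selector for the parent outcome
// that should start a dependent job.
type Condition string

const (
	OnSuccess Condition = "success"
	OnFailure Condition = "failure"
	Always    Condition = "always"
)

// RunAfter is a hypothetical generalization of run_after_success:
// a dependent job name plus the condition under which it runs.
type RunAfter struct {
	Job       string
	Condition Condition
}

// Job is a simplified, illustrative job definition (not Prow's real type).
type Job struct {
	Name     string
	RunAfter []RunAfter
}

// Dependents returns the names of the jobs to start once the parent
// has finished with the given result.
func Dependents(parent Job, parentSucceeded bool) []string {
	var names []string
	for _, ra := range parent.RunAfter {
		switch ra.Condition {
		case Always:
			names = append(names, ra.Job)
		case OnSuccess:
			if parentSucceeded {
				names = append(names, ra.Job)
			}
		case OnFailure:
			if !parentSucceeded {
				names = append(names, ra.Job)
			}
		}
	}
	return names
}

func main() {
	build := Job{
		Name: "build",
		RunAfter: []RunAfter{
			{Job: "e2e", Condition: OnSuccess},
			{Job: "collect-logs", Condition: OnFailure},
			{Job: "cleanup", Condition: Always},
		},
	}
	fmt.Println(Dependents(build, true))  // [e2e cleanup]
	fmt.Println(Dependents(build, false)) // [collect-logs cleanup]
}
```

A single field with a condition keeps the dependency logic in one place, whereas a separate run_after_failure field would need its own handling everywhere run_after_success is handled today.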

If we're going to do that, I think something a tad more general might be
worth thinking about. The coupling of our triggering to job definitions is
a bit awkward right now. We also have a couple of mutually exclusive
prowjob fields already..

On Thu, May 31, 2018 at 11:17 AM Sen Lu notifications@github.com wrote:

simply run_after, and some condition like succeed?

yeah, also that :-)

While using prow for Prometheus, I really missed this feature.
I also discussed it a bit on Slack with @cjwagner & @fejta.
If there is a consensus, I would like to implement this.
It seems we just need to add handling similar to runAfterSuccess for kube.PodFailed & kube.PodPending.

I don't think that is the consensus. Like Ben is saying, we really need something more general than tacking on another triggering mechanism.
run_after_success is already a problem because it has to be specifically handled in a number of places and introduces dependencies between jobs. Here are some examples:

I'm worried that adding a new triggering mechanism without generalizing and exposing a better interface to Prow components will result in more technical debt and bugs like these.
This could probably use a design discussion breakout session?

Managing run_after_success jobs has indeed been very problematic. Today I was thinking of splitting creation of run_after_success jobs into its own service, which would have some advantages over the current state of things. Namely, the prow controllers (plank, jenkins operator) would be simplified:

  • we can remove the github client entirely once reporting is its own service
  • we trim rbac for agent controllers down since they don't need access to create prowjobs anymore
  • no need to extend agent controllers for handling run_after_whatever anymore and less code to maintain

I think the extra service can also handle creation of run_after_success jobs for tide, which means that tide is also going to be slightly simplified?
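As a rough illustration of the separate-service idea, the sketch below shows a component whose only capability is creating child jobs for parents that have succeeded. All names (ProwJob as pared down here, Creator, ChildSpawner, memCreator) are assumptions made for the example and do not reflect Prow's real types or clients.

```go
package main

import "fmt"

// State is a simplified job state for this sketch.
type State string

const (
	Pending State = "pending"
	Success State = "success"
	Failure State = "failure"
)

// ProwJob is a pared-down stand-in for the real resource.
type ProwJob struct {
	Name     string
	State    State
	Children []string // job names listed under run_after_success
}

// Creator is the only capability the hypothetical service needs,
// so the agent controllers would no longer need create access.
type Creator interface {
	Create(name, parent string) error
}

// ChildSpawner creates children for successful parents at most once.
// The in-memory handled set is an assumption; a real service would
// have to persist this marker somewhere durable.
type ChildSpawner struct {
	creator Creator
	handled map[string]bool
}

func NewChildSpawner(c Creator) *ChildSpawner {
	return &ChildSpawner{creator: c, handled: map[string]bool{}}
}

// Sync inspects a snapshot of jobs and creates the children of every
// successful parent that has not been handled yet.
func (s *ChildSpawner) Sync(jobs []ProwJob) error {
	for _, pj := range jobs {
		if pj.State != Success || s.handled[pj.Name] {
			continue
		}
		for _, child := range pj.Children {
			if err := s.creator.Create(child, pj.Name); err != nil {
				return fmt.Errorf("creating %s after %s: %w", child, pj.Name, err)
			}
		}
		s.handled[pj.Name] = true
	}
	return nil
}

// memCreator records created jobs in memory for demonstration.
type memCreator struct{ created []string }

func (m *memCreator) Create(name, parent string) error {
	m.created = append(m.created, name+"<-"+parent)
	return nil
}

func main() {
	c := &memCreator{}
	s := NewChildSpawner(c)
	jobs := []ProwJob{
		{Name: "build", State: Success, Children: []string{"e2e"}},
		{Name: "lint", State: Failure, Children: []string{"never-runs"}},
	}
	// Syncing twice must not create the same child twice.
	for i := 0; i < 2; i++ {
		if err := s.Sync(jobs); err != nil {
			fmt.Println("sync failed:", err)
			return
		}
	}
	fmt.Println(c.created)
}
```

With this split, plank and the jenkins operator would only update job state, and any future run_after variant would be a change to this one service.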

I like this idea a lot.

I'd love to see a refactor someday (not necessarily worth the effort, but
it could be nice...) where we manage to totally decouple triggering from
job definitions so anyone can more easily integrate triggers for say, "run
if a release is tagged on github", "run a downgrade job if my cluster is
unresponsive", etc...
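One way to picture triggering decoupled from job definitions is a small trigger interface that maps events to job names, sketched below. The Event, Trigger, KindTrigger, and Dispatch names and the event kinds are all hypothetical and exist only for this example.

```go
package main

import "fmt"

// Event is a hypothetical, source-agnostic description of something
// that happened (a GitHub release tag, a cluster health alert, ...).
type Event struct {
	Kind    string
	Subject string
}

// Trigger decides which jobs, if any, an event should start.
// Job definitions themselves carry no triggering information.
type Trigger interface {
	JobsFor(e Event) []string
}

// KindTrigger starts a fixed list of jobs for one event kind.
type KindTrigger struct {
	Kind string
	Jobs []string
}

func (t KindTrigger) JobsFor(e Event) []string {
	if e.Kind != t.Kind {
		return nil
	}
	return t.Jobs
}

// Dispatch asks every registered trigger about the event and
// collects the job names they return, in registration order.
func Dispatch(triggers []Trigger, e Event) []string {
	var jobs []string
	for _, t := range triggers {
		jobs = append(jobs, t.JobsFor(e)...)
	}
	return jobs
}

func main() {
	triggers := []Trigger{
		KindTrigger{Kind: "release-tagged", Jobs: []string{"publish-artifacts"}},
		KindTrigger{Kind: "cluster-unresponsive", Jobs: []string{"downgrade"}},
	}
	fmt.Println(Dispatch(triggers, Event{Kind: "cluster-unresponsive", Subject: "my-cluster"}))
}
```

Under this shape, run_after_success or run_after_failure would be one more Trigger implementation that reacts to a "job finished" event, rather than a field on the job itself.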

On Thu, Jul 12, 2018 at 9:45 AM Michalis Kargakis notifications@github.com
wrote:

Managing run_after_success jobs has indeed been very problematic. Today I
was thinking of splitting creation of run_after_success jobs into its own
service that has some advantages over the current state of things. Namely,
prow controllers (plank, jenkins operator) are going to be simplified:

  • we can remove the github client entirely once reporting is its own
    service
  • we trim rbac for agent controllers down since they don't need access
    to create prowjobs anymore
  • no need to extend agent controllers for handling run_after_whatever
    anymore and less code to maintain

The extra service can also handle creation of run_after_success jobs for
tide I think which means that tide is also going to be slightly simplified?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.
