Pipeline: Design: Failure Strategy for TaskRuns in a PipelineRun

Created on 4 Dec 2019 · 40 Comments · Source: tektoncd/pipeline

The goal is to come up with a design to handle failing task runs in a pipelinerun. Today, we simply fail the entire pipelinerun if a single taskrun fails.

Current Status

Summary in this comment: https://github.com/tektoncd/pipeline/issues/1684#issuecomment-611016087

Ideas

Here are a couple of ideas from @sbwsg and me:

  1. Introduce an errorStrategy field in PipelineTasks similar to the idea in #1573
  2. The errorStrategy could be under the runAfter field.
  3. To start off, we could have two error strategies: FailPipeline, which is today's default, and ContinuePipeline, which will continue running the whole pipeline
  4. Later on, we could add branch-based error strategies, e.g. fail one branch of the graph but continue running the remaining branches
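A minimal Python sketch of how the two starting strategies might differ when a task fails. Everything here is hypothetical: PipelineTask, run_pipeline, and the outcomes map (which stands in for actually running TaskRuns) are illustration names, tasks are assumed to run one at a time in a valid order, and ContinuePipeline is read as "record the failure and keep scheduling later tasks".

```python
# Hypothetical sketch: per-PipelineTask errorStrategy with two values.
# Tasks are assumed to run one at a time in a valid order.
from dataclasses import dataclass

FAIL_PIPELINE = "FailPipeline"          # today's behaviour, proposed default
CONTINUE_PIPELINE = "ContinuePipeline"  # keep running the rest of the pipeline


@dataclass
class PipelineTask:
    name: str
    error_strategy: str = FAIL_PIPELINE


def run_pipeline(tasks, outcomes):
    """Return (names of tasks that ran, names of tasks that failed).

    `outcomes` maps task name -> True (succeeds) / False (fails); it stands
    in for actually executing each TaskRun.
    """
    ran, failed = [], []
    for task in tasks:
        ran.append(task.name)
        if outcomes[task.name]:
            continue
        failed.append(task.name)
        if task.error_strategy == FAIL_PIPELINE:
            break  # stop scheduling: the whole PipelineRun fails here
        # ContinuePipeline: record the failure and move on to later tasks
    return ran, failed


tasks = [PipelineTask("uts", CONTINUE_PIPELINE), PipelineTask("integration")]
print(run_pipeline(tasks, {"uts": False, "integration": True}))
# (['uts', 'integration'], ['uts'])
```

With the default FailPipeline on uts, the same failure would stop the run before integration starts, which is today's behaviour.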

Additional Info

@sbwsg has some strawperson YAMLs:
RunNextTasks for an integration test cleanup scenario
FailPipeline(default) for a unit test failing before a deploy task

Use Cases

  • Unit tests fail but integration tests still run
  • Rollbacks for CD, e.g. canaries: roll back if canary analysis fails
  • Cleanup task if integration tests fail
  • Always run a step/task at the end, e.g. to report results
  • Run on conditional failures #1023

Related Issues

The Epic #1376 has all the related issues


Most helpful comment

Status update:
We are currently punting on the runOn syntax. Instead, we are:

  1. Implementing a pipeline-level finally field that always runs some tasks at the end of a pipeline (doc).

We are also considering adding the following (discussions ongoing in the API working group):

  1. A pipeline level onError/except that runs a Task if any task in a pipeline fails
  2. A finally/onError Step within a Task or within a PipelineTask
  3. Allowing a pipeline to call other pipelines aka nested/sub pipelines for complex branches

Some discussion here

All 40 comments

Thanks for getting this started!!!

    - name: integration
      taskRef:
        name: run-integration-tests
      runAfter: uts
      errorStrategy: RunNextTasks # allow cleanup to occur
    - name: cleanup
      taskRef:
        name: cleanup-integration-test-junk
      runAfter: integration

A concern about this is for these two use cases:

  • Always run a step/task at the end e.g. to Report results
  • The example above: cleanup after a integration test

It seems like this doesn't work if there is one more Task in the Pipeline, e.g. (totally contrived?) something like:

    - name: integration
      runAfter: uts
      errorStrategy: RunNextTasks # allow cleanup to occur
    - name: integration2 # pretend there was another set of tests?
      runAfter: uts
      errorStrategy: RunNextTasks # allow cleanup to occur
    - name: cleanup
      taskRef:
        name: cleanup-integration-test-junk
      runAfter: integration

RunNextTasks for the above set of Tasks means that if integration fails, integration2 will also run, even though what we really want is to jump straight to cleanup.

A couple different ideas:

  • An explicit finally clause you can put on a Task which forces it to run at the end (naive??)
  • An errorStrategy that lets you jump straight to a branch in the pipeline? e.g. something like "resume from cleanup"

(I do think an errorStrategy field makes a lot of sense! I think there are lots of potential kinds of strategies we might want to express - e.g. failing the entire Pipeline immediately vs. allowing any other Tasks in flight to finish)

One workaround might be to make integration2 a conditional task that only runs if the previous step is successful? That being said, if this is a common pattern, I think a separate errorStrategy for jumping to another task might be a simpler way to do this.

Defining errorStrategy has two sides to it: (1) dictating the behavior of the next tasks/steps in the queue, e.g. RunNextTasks (from the example above), and (2) dictating a task's own behavior, e.g. IgnorePriorTaskErrors (based on @sbwsg's step PR).

I am biased towards (2).

Defining these error strategies based on my understanding so far:

  • SkipOnPriorStepErrors: Within a task, halt the execution of a step if a prior step has failed. This strategy is only scoped to a Task.
  • IgnorePriorStepErrors: Within a task, continue execution of a step even if a prior step has failed. Just like SkipOnPriorStepErrors, this strategy is only scoped to a Task.
  • SkipOnPriorTaskErrors: Within a pipeline, halt the execution of a task if a prior task has failed. This gets a little tricky with conditions: here the task is marked as failed even when it is the associated condition that fails, and hence the task is not even executed. The same strategy can be applied to a pipeline with two groups of tasks, (A, B, and C) and (X, Y, and Z). For example, task A has a conditional execution, and tasks B and C should be executed if task A succeeds. In this scenario, Task A would refer to a condition using conditionRef without any errorStrategy, and Tasks B and C would have errorStrategy set to SkipOnPriorTaskErrors. When Task A executes successfully, Task B is next in the queue, and depending on Task B's execution result, Task C will be executed. When Task A fails, Task B will be skipped, since its errorStrategy says to skip its execution if the prior task (A) failed, and Task C will be skipped too, since Task B was never executed (pipeline marked Task B as failure?)
  • IgnorePriorTaskErrors: Within a pipeline, continue the execution of a task even if a prior task has failed.
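A small Python sketch of the SkipOnPriorTaskErrors / IgnorePriorTaskErrors semantics for the A -> B -> C example above. The evaluate function, the tuple layout, and the reading of IgnorePriorTaskErrors as "run whatever the parents' state" are assumptions made for illustration; a condition failure on A is modelled simply as A failing.

```python
# Hypothetical sketch of the proposed strategies (not a real Tekton API).
SKIP_ON_PRIOR = "SkipOnPriorTaskErrors"
IGNORE_PRIOR = "IgnorePriorTaskErrors"


def evaluate(tasks, outcomes):
    """Evaluate a pipeline whose tasks are listed in topological order.

    tasks:    list of (name, parents, strategy); strategy may be None.
    outcomes: name -> True/False, the result a task would have if it ran.
    Returns name -> "success" | "failure" | "skipped".
    """
    state = {}
    for name, parents, strategy in tasks:
        parents_ok = all(state[p] == "success" for p in parents)
        if strategy == SKIP_ON_PRIOR and not parents_ok:
            # A failed *or skipped* parent means this task never executes,
            # which in turn skips everything that depends on it.
            state[name] = "skipped"
            continue
        # Otherwise (IgnorePriorTaskErrors, or no strategy as on root A) run it.
        state[name] = "success" if outcomes[name] else "failure"
    return state


pipeline = [
    ("A", [], None),              # guarded by a condition in the example
    ("B", ["A"], SKIP_ON_PRIOR),
    ("C", ["B"], SKIP_ON_PRIOR),
]
print(evaluate(pipeline, {"A": False, "B": True, "C": True}))
# {'A': 'failure', 'B': 'skipped', 'C': 'skipped'}
```

The skip cascades because C only sees B's state, and B never ran.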

Also, we have to be explicit about what Next and Prior mean to us (for both Pipelines and Tasks): does Next mean all next tasks/steps or just the immediate next one, and does Prior mean all previous tasks/steps or just the immediately preceding one?
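To make the Next/Prior question concrete, here is a small Python sketch of the two readings of "Prior" for a task in the graph: only its direct runAfter parents, or every transitive ancestor. The graph literal and function names are hypothetical.

```python
# Hypothetical example graph: task -> runAfter parents for a chain A -> B -> C.
GRAPH = {"A": [], "B": ["A"], "C": ["B"]}


def direct_priors(graph, task):
    """Reading 1: "Prior" means only the tasks named in runAfter."""
    return set(graph[task])


def all_priors(graph, task):
    """Reading 2: "Prior" means every ancestor reachable through runAfter."""
    seen, stack = set(), list(graph[task])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph[parent])
    return seen


print(sorted(direct_priors(GRAPH, "C")))  # ['B']
print(sorted(all_priors(GRAPH, "C")))     # ['A', 'B']
```

Under the first reading, a SkipOnPriorTaskErrors on C only reacts to B; under the second, it also reacts directly to A.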

_I have collected my thoughts here based on talking to @dibyom on slack, step PR from @sbwsg, comments from issues themselves and working group recordings._

@pritidesai Thanks for writing this up! I think your examples of error strategies are for tasks defining their own behavior, not the behavior of subsequent tasks. With that approach, how do we model the scenario that @bobcatfish mentioned above, i.e. we have 3 tasks running sequentially A -> B -> C in the happy case, but when A fails we want to jump to C?

Also, another idea @sbwsg had was adding the errorStrategies in the runAfter or from fields:

spec:
  tasks:
    # ... other tasks ...
    - name: cleanup
      taskRef:
        name: cleanup-integration-test-junk
      runAfter:
        task: integration-tests
        errorStrategy: Continue # or Skip / Fail

How about modeling the scenario that @bobcatfish mentioned above with:

    - name: integration
      runAfter: uts
    - name: integration2 # pretend there was another set of tests?
      runAfter: uts
      errorStrategy: SkipOnPriorTaskErrors # do not execute if previous integration tests fail
    - name: cleanup
      taskRef:
        name: cleanup-integration-test-junk
      runAfter: integration
      errorStrategy: IgnorePriorTaskErrors

woo, I like errorStrategies in runAfter and from, let me give it a thought...

Another alternative to consider: Go's defer and recover keywords model quite similar behaviour to what we're discussing here. I can imagine DeferredPipelineTask and RecoveryPipelineTask types that perform work regardless of prior outcome (Deferred) and in response to a task's failure (Recovery). Examples:

DeferredPipelineTask

# In this example, a "deferred" task is used to clean up environment after integration tests.
# Deferred tasks run regardless of outcome in prior tasks
spec:
  tasks:
    - name: integration-tests # can fail!
      taskRef:
        name: run-some-tests
    - name: cleanup-integration-environment
      deferred: true # will run regardless of failure in integration-tests. Will not run if integration-tests is never run (i.e. because a task prior to integration-tests failed)
      runAfter: integration-tests
      taskRef:
        name: delete-integration-namespaces

RecoveryPipelineTask

# In this example, a "recovery" task is used to handle errors during deployment to staging.
# Recovery tasks only execute if the task they runAfter fails
spec:
  tasks:
    - name: deploy-to-staging
      taskRef:
        name: deploy-to-k8s
    - name: rollback-staging
      recovery: true # will run only if deploy-to-staging fails
      runAfter: deploy-to-staging
      taskRef:
        name: rollback-deployment

Two further tweaks to this idea: First, a DeferredPipelineTask that doesn't declare a runAfter will always execute at the end of the pipeline. This is the "finally" clause equivalent. Second, a RecoveryPipelineTask with no runAfter will handle any error case in the pipeline. This is the equivalent of a giant catch { } block wrapped around your pipeline. We could even pass the error to the RecoveryPipelineTask as a PipelineResource or something to help it with reporting.

Also worth keeping in mind that while a DeferredPipelineTask or RecoveryPipelineTask needs to be explicitly marked as such, I think they would also be allowed to be "roots" of their own trees. In other words another task could be runAfter a DeferredPipelineTask but does not need to include deferred: true. Similarly for recovery, a task could be runAfter a RecoveryPipelineTask but does not need to include recovery: true. In effect this allows entire branches of the execution DAG to be run only in the event of failure or for the purposes of cleanup etc.

So I think this would cover the following scenarios:

  1. Execute work after a specific task in the pipeline succeeds OR fails

    • DeferredPipelineTask with runAfter

    • Use cases: cleanup integration environment, upload unit test results

  2. Recover from failed tasks by jumping to a different branch

    • RecoveryPipelineTask with runAfter

    • Use case: roll back bad deployment

  3. Perform work at the end of a pipeline regardless of outcome

    • DeferredPipelineTask without runAfter

    • Use case: any naive finally scenario ("naive" here means it doesn't need specific knowledge of what ran or didn't run)

  4. Handle any error in the pipeline that occurs with a fallback task

    • RecoveryPipelineTask without runAfter

    • Use case: any naive catch { } scenario (an example I can think of: send a message to Slack that a pipeline has failed)

The deferred and recovery keys would need to be either-or in the yaml. I don't think you can support both recovery: true and deferred: true on the same task.
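A rough Python sketch of the decision rules in the four scenarios above, including the either-or check. The PipelineTask dataclass, the should_run/run helpers, and the choice that a catch-all recovery task fires when any earlier task failed are assumptions for illustration. Tasks are assumed to be listed in a valid topological order, and tasks with neither flag keep today's "all parents succeeded" rule.

```python
# Hypothetical sketch of the proposed deferred / recovery flags.
from dataclasses import dataclass, field


@dataclass
class PipelineTask:
    name: str
    run_after: list = field(default_factory=list)
    deferred: bool = False
    recovery: bool = False


def validate(task):
    if task.deferred and task.recovery:
        raise ValueError(f"{task.name}: deferred and recovery are mutually exclusive")


def should_run(task, state):
    """state: already-evaluated task name -> "success" | "failure" | "not-run"."""
    if not task.run_after:
        if task.recovery:  # scenario 4: catch {} around the whole pipeline
            return "failure" in state.values()
        return True        # scenario 3 (deferred "finally") or an ordinary root
    parents = [state[p] for p in task.run_after]
    if task.deferred:      # scenario 1: parent ran, whatever its outcome
        return "not-run" not in parents
    if task.recovery:      # scenario 2: jump here only when a parent failed
        return "failure" in parents
    return all(s == "success" for s in parents)  # today's behaviour


def is_catch_all(task):
    return not task.run_after and (task.deferred or task.recovery)


def run(tasks, outcomes):
    """outcomes: name -> False for tasks that fail; missing names succeed."""
    for task in tasks:
        validate(task)
    # Catch-all deferred/recovery tasks are held back until everything else is done.
    ordered = [t for t in tasks if not is_catch_all(t)] + [t for t in tasks if is_catch_all(t)]
    state = {}
    for task in ordered:
        if should_run(task, state):
            state[task.name] = "success" if outcomes.get(task.name, True) else "failure"
        else:
            state[task.name] = "not-run"
    return state


pipeline = [
    PipelineTask("deploy-to-staging"),
    PipelineTask("rollback-staging", ["deploy-to-staging"], recovery=True),
    PipelineTask("notify-slack", recovery=True),  # hypothetical catch-all
    PipelineTask("ping-url", deferred=True),       # hypothetical "finally"
]
print(run(pipeline, {"deploy-to-staging": False}))
```

When deploy-to-staging fails, rollback-staging, notify-slack, and ping-url all run; when it succeeds, only ping-url runs after it.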

What I most like about this approach is that:

  1. it doesn't mess with runAfter, so avoids some possibly tricky schema changes in the yaml (particularly since from behaviour may also need to be modified to keep it in line with runAfter)
  2. it maintains the property that the "edge" in the graph is defined (with runAfter/from) in the same PipelineTask that the error handling or deferral behaviour is described
  3. it provides flexible catch-all handling to satisfy any jump / finally / catch requirements.
  4. It doesn't rely on tricky-to-remember constants like "IgnorePriorErrors".
  5. Finally (pun intended) what I like about this is that it drops the word "errorStrategy" completely. I think there are very legitimate use cases for these kinds of handlers that don't involve errors or failures or anything negative at all. It's just branching the DAG in response to specific outcomes of the graph nodes.

Another phrasing of the above approach that @dibyom and I discussed would be to use keywords for defer / recover / skip (the default):

spec:
  tasks:
    - name: deploy-to-staging
      taskRef:
        name: deploy-to-k8s
    - name: rollback-staging
      runAfter: deploy-to-staging
      strategy: Recover # or Defer or Skip
      taskRef:
        name: rollback-deployment

This ^ says that rollback-staging PipelineTask will only execute if deploy-to-staging fails (it "Recovers" from deploy-to-staging's failure).

Having thought about it for a couple days I'm still pretty sure we could describe all of the use cases we've talked about so far with just these three strategies.

Thanks @sbwsg, defer, recover, and skip sound great, but they will need a little clarification, which can be provided with docs and examples.

Also, DeferredPipelineTask could be interpreted as always executed, i.e. cleanup-integration-environment always runs irrespective of the outcome of integration-tests or any previous tasks. I am trying to reconcile that with "will not run if integration-tests is never run": in Go, my understanding is that a defer statement pushes a function call onto a list, and the calls in that list are executed after the surrounding function returns. How would this affect tasks defined after integration-tests?

Overall I like the idea of defining strategy with one of defer, recover, and skip.

I agree, the keywords don't make much sense in isolation. How about "AlwaysRun" (defer), "RunOnFail" (recover), and "RunOnSuccess" (Tekton's current behaviour)?

I think the analogy here with Go's defer breaks down. I somewhat regret drawing the comparison. In my mind the strategy only describes a single relationship between a task and its "parents" (those it declares with "runAfter" or "from"). In other words, given the following tasks:

- name: Task A
- name: Task B
  runAfter:
    - Task A
  strategy: RunOnFail # Task B only executes if Task A errors out
- name: Task C
  runAfter:
    - Task B
  strategy: AlwaysRun

I expect the following behaviour:

  1. Task A runs
  2. Task B will only run if Task A fails.
  3. Task C will only run if Task B runs.

    • Because Task C declares "AlwaysRun" with "runAfter: Task B".

    • If Task B never ran (Task A succeeded and B is only RunOnFail) then Task C never runs.
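The expected behaviour above can be checked with a tiny Python sketch. The decide/simulate helpers are hypothetical; the only rules taken from the comment are that a strategy looks at a task's own runAfter parents and that a parent that never ran means the child never runs.

```python
# Hypothetical sketch of RunOnSuccess / RunOnFail / AlwaysRun semantics.
def decide(strategy, parent_states):
    """parent_states: states of this task's runAfter parents."""
    if "not-run" in parent_states:
        return False                            # a parent never ran
    if strategy == "AlwaysRun":
        return True                             # parent ran, any outcome
    if strategy == "RunOnFail":
        return "failure" in parent_states
    return all(s == "success" for s in parent_states)  # RunOnSuccess (default)


def simulate(a_ok, b_ok=True, c_ok=True):
    """Task A -> Task B (RunOnFail) -> Task C (AlwaysRun), as in the YAML above."""
    state = {"Task A": "success" if a_ok else "failure"}
    chain = [("Task B", "Task A", "RunOnFail", b_ok),
             ("Task C", "Task B", "AlwaysRun", c_ok)]
    for name, parent, strategy, ok in chain:
        if decide(strategy, [state[parent]]):
            state[name] = "success" if ok else "failure"
        else:
            state[name] = "not-run"
    return state


print(simulate(a_ok=True))   # A succeeds: B and C never run
print(simulate(a_ok=False))  # A fails: B runs, then C runs
```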

So I think that's another reason why using the go keywords probably doesn't make sense after all - they don't map perfectly on to Tekton's meanings. But AlwaysRun / RunOnFail / RunOnSuccess are a bit clearer maybe, especially when we consider them paired with runAfter.

Hrm. AlwaysRun isn't that great for the Finally case - it doesn't make as much sense. Deferred may be better after all. Here's a comparison:

AlwaysRun

# This pipeline pings a URL when the pipeline finishes.
# This ping happens regardless of the pipeline's outcome.
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: test-pipeline
spec:
  tasks:
    - name: ping-url-on-complete
      taskRef:
        name: send-ping
      strategy: AlwaysRun # AlwaysRun without a runAfter. Executes at end of pipeline.
    - name: uts
      taskRef:
        name: run-unit-tests
    - name: integration
      taskRef:
        name: run-integration-tests
      runAfter: uts

Deferred

# This pipeline pings a URL when the pipeline finishes.
# This ping happens regardless of the pipeline's outcome.
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: test-pipeline
spec:
  tasks:
    - name: ping-url-on-complete
      taskRef:
        name: send-ping
      strategy: Deferred # Deferred without a runAfter. Executes at end of pipeline.
    - name: uts
      taskRef:
        name: run-unit-tests
    - name: integration
      taskRef:
        name: run-integration-tests
      runAfter: uts

Yes, I agree, AlwaysRun is misleading for the finally use case. How about introducing one more strategy, called defer or finally, in addition to AlwaysRun, RunOnFail, and RunOnSuccess?

I kind of ran these strategies against the pipeline example you have and it looks something like this:

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: ignore-errors-pipeline
spec:
  tasks:
    - name: uts
      taskRef:
        name: run-unit-tests
    - name: integration
      taskRef:
        name: run-integration-tests
      strategy: AlwaysRun # irrespective of unit test results, run integration tests
    - name: cleanup
      taskRef:
        name: cleanup-integration-test-junk
      runAfter: integration
      strategy: AlwaysRun # since cleanup is grouped with integration test, AlwaysRun strategy would fit here otherwise we would have to go for defer.
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: ignore-errors-pipeline
spec:
  tasks:
    - name: uts
      taskRef:
        name: run-unit-tests
    - name: deploy
      taskRef:
        name: deploy-staging
      runAfter: uts
      strategy: RunOnSuccess
    - name: integration
      taskRef:
        name: run-integration-tests
      runAfter: uts
      strategy: RunOnSuccess # Run if unit tests succeeds 
    - name: cleanup
      taskRef:
        name: cleanup-integration-test-junk
      runAfter: integration
      strategy: AlwaysRun

Adding one more use case: building a JavaScript and/or Java application depending on the application's runtime.

Here, strategy is used to make sure the tasks and the pipelinerun don't report failure if a condition fails and the chain of tasks doesn't get executed.

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pipeline-for-various-runtimes
spec:
  tasks:
# build application if the source code is written in NodeJS
# Run tasks in order (1) install dependencies (2) build zip file and (3) build an image
    - name: install-npm-packages
      taskRef:
        name: task-install-npm-packages
      conditions:
        - conditionRef: is-nodejs-runtime
    - name: build-archive
      taskRef:
        name: task-build-archive
      runAfter: install-npm-packages
      strategy: RunOnSuccess  
    - name: build-nodejs-app-image
      taskRef:
        name: build-image
      runAfter: build-archive  
      strategy: RunOnSuccess
# build application if the source code is written in Java
# Run tasks in order (1) Create Jar with Maven (2) Build runtime with Maven (3) Embed function into runtime (4) Build an image
    - name: create-jar-with-maven
      taskRef:
        name: task-create-jar-with-maven
      conditions:
        - conditionRef: is-java-runtime
    - name: build-runtime-with-gradle
      taskRef:
        name: task-build-runtime-with-gradle
      runAfter: create-jar-with-maven
      strategy: RunOnSuccess
    - name: finalize-runtime-with-function
      taskRef:
        name: task-finalize-runtime-with-function
      runAfter: build-runtime-with-gradle
      strategy: RunOnSuccess
    - name: build-java-app-image
      taskRef:
        name: build-image
      runAfter: finalize-runtime-with-function 
      strategy: RunOnSuccess  

Another slightly different case:

For things like updating GitHub status notifications it would be nice if we could do something like the following...admittedly this is a bit repetitive, but passing the "success" or "failure" of a task might work with the "recover" strategy mentioned earlier, which would mean that after each task, somehow it'd use the success/failure of the previous task to update the GitHub status appropriately.

Updating these kinds of statuses would be really useful if you want your pipeline to determine whether or not a commit can be merged (if you're not familiar with these, you can require specific contexts to be successful before a PR can be merged).

This also adds a pipeline-scoped runAfter taskRef, which could do the cleanup in a "Go defer" way, i.e. always after the pipeline has ended, irrespective of what caused it to end.

The example below would trigger two parallel executions (lint and tests), which would report in their status to GitHub.

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pullrequest-pipeline
spec:
  runAfter:
    taskRef: cleanup-post-pullrequest
  tasks:
    - name: start-github-ci-status
      taskRef:
        name: update-github-status
      params:
        - name: STATUS
          value: pending
        - name: CONTEXT
          value: ci-tests
        - name: COMMIT_SHA
          value: $(inputs.params.commit_sha)
    - name: run-tests
      taskRef:
        name: golang-test
      errorStrategy:
        taskRef: update-commit-status
        params:
        - name: STATUS
          value: failed
        - name: CONTEXT
          value: ci-tests
        - name: COMMIT_SHA
          value: $(inputs.params.commit_sha)
    - name: mark-github-ci-status-success
      runAfter:
        - run-tests
      taskRef:
        name: update-github-status
      params:
        - name: STATUS
          value: success
        - name: CONTEXT
          value: ci-tests
        - name: COMMIT_SHA
          value: $(inputs.params.commit_sha)
    # repeat pending for ci-lint context
    - name: run-lint
      taskRef:
        name: golangci-lint
      errorStrategy:
        taskRef: update-commit-status
        params:
        - name: STATUS
          value: failed
        - name: CONTEXT
          value: ci-lint
        - name: COMMIT_SHA
          value: $(inputs.params.commit_sha)
    # repeat success for ci-lint context

One specific use case that I don't think has been explicitly mentioned above is in a fan-in/out scenario.

For example, if my pipeline is

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: sharded-tests
spec:
  tasks:
    - name: pre-work
      taskRef:
        name: pre-work-step
    - name: run-tests-shard-1
      taskRef:
        name: golang-test
      params:
        - name: SHARD_SPEC
          value: 1
      runAfter: ["pre-work"]
    - name: run-tests-shard-2
      taskRef:
        name: golang-test
      params:
        - name: SHARD_SPEC
          value: 2
      runAfter: ["pre-work"]
    - name: upload-test-results
      taskRef:
        name: upload-test-results-step
      runAfter: ["run-tests-shard-1", "run-tests-shard-2"]

Here, I would want to always run the upload-test-results task regardless of whether 0, 1, or both of the tasks preceding it failed.

To me, this reads a lot like conditional execution, but more like conditional failure. Perhaps this could be served by an extension to the conditions that already exist: if you wanted to "always execute B after A", your condition could simply always return true to override the default behavior of "execute B after A if A is successful".

@pritidesai @bigkevmcd @pierretasci Thanks a lot for adding such detailed use cases! Very very helpful :pray:

  • @pritidesai For your use case: the current behavior for conditionals is that if a task is skipped, its dependents (identified using the runAfter and from fields) are automatically skipped. The overall pipelinerun status will be determined from the status of the non-skipped tasks. The default, and only, strategy today is RunOnSuccess.
    Though I guess there could be strategies such as RunOnSuccessOrSkip or RunAlways which can be combined with conditionals for more complex pipelines.

  • @bigkevmcd Updating status is definitely a very important use case:

    • top level runAfter - this is the pipeline level finally use case. It seems like we'd have to add something like this. The alternative would be to have one task that has runAfters set so that it runs after all other tasks and a strategy set to RunAlways. This can be unwieldy since anytime you add a new Task to the pipeline, you'd have to manually make sure that the task is still that last thing that executes.
    • errorStrategy containing a taskRef - this is interesting! And in some ways more descriptive than adding a generic task with a runAfter and errorStrategy: RunOnFailure. Are there other benefits? One thing I like about keeping the taskRefs separate is that we can then have multiple tasks that run/are chained together (e.g. you can have both a cleanup-test-env task and an update-github-task that run when the test fails).

    • On passing status to tasks -- we had a proposal in https://github.com/tektoncd/pipeline/issues/1020 though the current way of doing so is to pass in the pipelineRun name and then using kubectl within the task to fetch the status. (I think @afrittoli might also be doing something here re: Notifications design work)

  • @pierretasci Sounds like the RunAlways strategy is what you'd need for the upload-test-results-step in your example. I do like the idea of using conditionals as sort of the extension mechanism for more complicated strategies -- the basic strategies such as RunAlways, RunOnSuccess/Failure/Skip etc. are built-in while a user can use those plus a conditional to describe complex strategies (e.g. a strategy of RunAlways plus a conditional for if two of the three tasks failed or whatever)
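For the fan-in case, a short Python sketch enumerating the shard outcomes shows the difference between today's behaviour (upload only runs if every parent succeeded) and a RunAlways strategy. upload_runs and outcome_table are hypothetical names, and pre-work is assumed to have succeeded so both shards actually ran.

```python
# Hypothetical sketch: when does upload-test-results run after two shards?
from itertools import product


def upload_runs(strategy, shard_results):
    """shard_results: one bool per test shard (True = shard passed)."""
    if strategy == "RunAlways":
        return True               # run no matter how many shards failed
    return all(shard_results)     # RunOnSuccess: every parent must succeed


def outcome_table(strategy, shards=2):
    # Every combination of pass/fail for the shards feeding upload-test-results.
    return {combo: upload_runs(strategy, combo)
            for combo in product([False, True], repeat=shards)}


for strategy in ("RunOnSuccess", "RunAlways"):
    print(strategy, outcome_table(strategy))
```

Only the all-pass row runs the upload under RunOnSuccess, while RunAlways covers the "0, 1, or both failed" requirement.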

One idea - instead of failure/error/executionStrategy, we could have a field like runOn (or simply on or when) that takes in a list of states that the parent taskruns have to be in for it to run (default is: success):

- name: task1
  conditions:
    - conditionRef: "condition-that-sometime-fails"
  taskRef: { name: "my-task" }

- name: runIfTask1Fails 
  runAfter: task1
  runOn: ["failure"]

- name: runIfTask1Succeeds
  runAfter: task1
  runOn: ["success"]

- name: runIfTask1IsSkipped
  runAfter: task1
  runOn: ["skip"]

What I like about this is that the field name is more succinct, and instead of having to remember a bunch of magic strings (is it RunOnSuccessOrSkip or RunOnSkipAndSuccess?), users just need to remember the 3 taskrun states: "success", "failure", and "skip".
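A minimal Python sketch of how the list form of runOn could be evaluated: a task runs if every runAfter parent ended in one of the listed states, and the default is ["success"]. Treating "always" and "any" (spellings discussed in the replies) as shorthand for all three states is an assumption, as are the function names.

```python
# Hypothetical sketch of the proposed runOn field (list form).
ALL_STATES = {"success", "failure", "skip"}
ALIASES = {"always", "any"}  # both spellings were floated in the thread


def normalize(run_on):
    """Expand a runOn list into the set of parent states it accepts."""
    states = set(run_on or ["success"])  # default when runOn is omitted
    if states & ALIASES:
        states = (states - ALIASES) | ALL_STATES
    unknown = states - ALL_STATES
    if unknown:
        raise ValueError(f"unknown runOn states: {sorted(unknown)}")
    return states


def should_run(run_on, parent_states):
    """Run only if every runAfter parent ended in an accepted state."""
    allowed = normalize(run_on)
    return all(state in allowed for state in parent_states)


print(should_run(["failure"], ["failure"]))  # True: runIfTask1Fails, task1 failed
print(should_run(None, ["skip"]))            # False: default runOn, task1 skipped
```

Rejecting unknown strings early keeps typos like "sucess" from silently never matching.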

A few more examples here: https://gist.github.com/dibyom/92dfd6ea20f13f5c769a21389df53977

Very nice! I like this a lot. Would we want to include something like "any" or "*" for the "I don't care about the precise state" possibility?

oops, nevermind I see you included "always" in the examples you linked. I like it!

I think runOn: ["any"] sounds better than runOn: ["always"] :smile:

I really like the idea of runOn: ["success", "failure", "skip"] == runOn: ["always"]. It keeps with the descriptive nature of Kubernetes and doesn't require a loaded term (is always really always?).

One thing I want to add (though it is a bit tangential) is the idea of Pipeline Failure conditions. Right now, the pipeline bails early if anything fails. If I have multiple "branches" in my pipeline that are independent, I would expect non-dependent branches to run to completion separately from each other. An example:

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: branched-pipeline
spec:
  tasks:
    - name: pre-work
      taskRef:
        name: install-dependencies
    - name: lint
      taskRef:
        name: run-linter
      runAfter:
        - pre-work
    - name: compile
      taskRef:
        name: run-compiler
      runAfter:
        - pre-work
    - name: deploy
      taskRef:
        name: deploy
      runAfter:
        - compile

If the lint task here fails, it will fail the whole pipeline even if compile succeeds. In this scenario, I would expect/want deploy to still happen. I can also imagine other scenarios where one would want to make a task a show-stopper. I believe this calls for failure strategies on a Pipeline.
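A Python sketch of the behaviour asked for here: a failure only blocks the failed task's own descendants, so independent branches run to completion. GRAPH mirrors branched-pipeline above; the run_branches helper and its scheduling rule are assumptions, not how Tekton schedules today.

```python
# Hypothetical sketch: a failure blocks only the failed task's descendants.
# GRAPH mirrors branched-pipeline (task -> runAfter parents), topologically ordered.
GRAPH = {
    "pre-work": [],
    "lint": ["pre-work"],
    "compile": ["pre-work"],
    "deploy": ["compile"],
}


def run_branches(graph, failing):
    """Return task -> "success" | "failure" | "not-run".

    `failing` is the set of tasks that fail when they run.
    """
    state = {}
    for task, parents in graph.items():
        if all(state[p] == "success" for p in parents):
            state[task] = "failure" if task in failing else "success"
        else:
            state[task] = "not-run"  # blocked only by a failed ancestor
    return state


print(run_branches(GRAPH, {"lint"}))     # deploy still runs
print(run_branches(GRAPH, {"compile"}))  # deploy is blocked, lint unaffected
```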

Comparing some of these with regular programming constructs (for simplicity):

if condition-1-is-true {
        task1
        if task1 succeeds {
                task2  # runOn: success
        } else {
                task3  # runOn: failure
        }
}

if condition-2-is-true {
        task4
}

And:

if task1 succeeds {
        task2  # runOn: success
} else {
        task3  # runOn: failure
}
task4 # runOn: skip/any/always

One more:

task2
if task2 succeeds or fails {
        task3  # runOn: skip/any/always
        if task3 succeeds {
                task4 # runOn: success
        }
}
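These examples rely on "skip" being a state that propagates: a task whose runOn does not match is skipped, and that skip can in turn trigger a child listing "skip" or "always". A minimal Go sketch of that propagation follows, using the second example with task1 failing. The node type, evaluate, and the assumption that task4 runs after task2 are all hypothetical and for illustration only; each task has a single parent to keep it short.

```go
package main

import "fmt"

// node is a hypothetical pipeline task with at most one parent (the task it
// runs after) and a runOn list naming the parent states that let it run.
type node struct {
	name   string
	parent string   // "" means the task has no dependency
	runOn  []string // "success", "failure", "skip", or the shorthand "always"
}

// allows reports whether runOn permits running after a parent in state.
func allows(runOn []string, state string) bool {
	for _, s := range runOn {
		if s == state || s == "always" {
			return true
		}
	}
	return false
}

// evaluate visits nodes in order (parents before children). A node whose
// parent's state is not allowed by its runOn is marked "skip". Nodes that run
// take their outcome from outcomes, defaulting to "success".
func evaluate(nodes []node, outcomes map[string]string) map[string]string {
	state := make(map[string]string)
	for _, n := range nodes {
		if n.parent != "" && !allows(n.runOn, state[n.parent]) {
			state[n.name] = "skip"
			continue
		}
		if o, ok := outcomes[n.name]; ok {
			state[n.name] = o
		} else {
			state[n.name] = "success"
		}
	}
	return state
}

func main() {
	// Second example above, assuming task4 runs after task2, with task1 failing.
	nodes := []node{
		{name: "task1"},
		{name: "task2", parent: "task1", runOn: []string{"success"}},
		{name: "task3", parent: "task1", runOn: []string{"failure"}},
		{name: "task4", parent: "task2", runOn: []string{"always"}},
	}
	state := evaluate(nodes, map[string]string{"task1": "failure"})
	for _, n := range nodes {
		fmt.Printf("%s: %s\n", n.name, state[n.name])
	}
	// task1: failure, task2: skip, task3: success, task4: success
}
```

task4 still runs because its parent ended in "skip" and its runOn is "always"; with runOn: ["success"] it would have been skipped too.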

Hey @dibyom, @sbwsg, I have started drafting a design document here:
https://docs.google.com/document/d/1PcBAVI_ZmMjNQbNl4qCppRrXOUenyGH_AiC_wn7b1fc/edit#

Priti and I have been discussing a failureStrategy and some other alternatives in addition to runOn:
https://gist.github.com/dibyom/5420de48279816ca920d219a1706dc74

Lots of great discussion on the design doc. I'm gonna summarize where we are at now:

runOn

The idea seems popular, but instead of a flat list of states we might make it a per-task mapping:

     - name: task3
       runAfter: ["task1", "task2"]
       runOn:
         - task: task1
           states: ["success", "failure"]
         - task: task2
           states: ["success"]
--- instead of
     - name: task3
       runAfter: ["task1", "task2"]
       runOn: ["success", "failure"]

What's nice about the mapping is that it is more powerful, i.e. users can say "run task3 regardless of task1's state, but only if task2 succeeds". At the same time, it adds some duplication (we need both runAfter and runOn) and some extra validation on our side (e.g. we should reject any task in runOn that is not already listed in runAfter). In the future, we could get rid of runAfter in favor of runOn!
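A minimal Go sketch of how the per-task runOn could be validated and evaluated follows. runOnEntry, pipelineTask, validate and shouldRun are hypothetical names, not Tekton types. It assumes a runAfter task with no runOn entry still has to succeed, which is one possible default and was not settled in this thread.

```go
package main

import "fmt"

// runOnEntry mirrors one entry of the proposed runOn field: a task from
// runAfter plus the states of that task that allow this task to run.
type runOnEntry struct {
	Task   string
	States []string
}

// pipelineTask carries only the fields relevant to this sketch.
type pipelineTask struct {
	Name     string
	RunAfter []string
	RunOn    []runOnEntry
}

// validate rejects runOn entries that name a task missing from runAfter,
// the extra validation mentioned above.
func validate(t pipelineTask) error {
	after := make(map[string]bool)
	for _, name := range t.RunAfter {
		after[name] = true
	}
	for _, e := range t.RunOn {
		if !after[e.Task] {
			return fmt.Errorf("task %q: runOn references %q, which is not in runAfter", t.Name, e.Task)
		}
	}
	return nil
}

// shouldRun checks every runAfter dependency. Dependencies with a runOn entry
// must end in one of the listed states; dependencies without one are assumed
// to need "success" (today's behavior).
func shouldRun(t pipelineTask, states map[string]string) bool {
	permitted := make(map[string][]string)
	for _, e := range t.RunOn {
		permitted[e.Task] = e.States
	}
	for _, dep := range t.RunAfter {
		allowed, hasEntry := permitted[dep]
		if !hasEntry {
			allowed = []string{"success"}
		}
		match := false
		for _, s := range allowed {
			if s == states[dep] {
				match = true
				break
			}
		}
		if !match {
			return false
		}
	}
	return true
}

func main() {
	task3 := pipelineTask{
		Name:     "task3",
		RunAfter: []string{"task1", "task2"},
		RunOn: []runOnEntry{
			{Task: "task1", States: []string{"success", "failure"}},
			{Task: "task2", States: []string{"success"}},
		},
	}
	fmt.Println(validate(task3))                                                             // <nil>
	fmt.Println(shouldRun(task3, map[string]string{"task1": "failure", "task2": "success"})) // true
	fmt.Println(shouldRun(task3, map[string]string{"task1": "success", "task2": "failure"})) // false

	bad := pipelineTask{Name: "task4", RunAfter: []string{"task2"},
		RunOn: []runOnEntry{{Task: "task1", States: []string{"failure"}}}}
	fmt.Println(validate(bad) != nil) // true
}
```

The duplication mentioned above is visible here: every runOn task must also appear in runAfter, which is why runOn could eventually replace runAfter.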

pipeline level failureStrategy

Instead of adding a pipeline-level failureStrategy, we could change the default behavior of pipeline execution from today's fail-on-first-failure to keep running independent branches of the pipeline until there are no more tasks left to run. This would be a backwards-incompatible change, so we should decide on it sooner rather than later given the upcoming beta release!

cc @sbwsg @skaegi (might be related to https://github.com/tektoncd/pipeline/issues/1978#issuecomment-582941534)
Also cc @vdemeester @bobcatfish re: beta release implications

Interesting complex use case here: https://github.com/tektoncd/pipeline/issues/1922#issuecomment-583640512

I think we'd need failureStrategies at the very least, and possibly some sort of simple conditionals to model "should I run this branch or not?" based on output params/results.

Yeah, that's a doozy. Child pipelines are starting to get into fork territory, which is exciting and scary at the same time. I do think failureStrategies + output params are the answer here. Nothing we are proposing here inhibits the ability to make that complex pipeline work.

I've started looking at moving our use case out of this issue: an operator that tracks PipelineRuns (and probably TaskRuns too) and, if they're annotated/tagged appropriately, sends a git commit status notification (using go-scm).

Quick update: @pritidesai has created this proposal for a finally syntax at the Pipeline level: https://docs.google.com/document/d/1lxpYQHppiWOxsn4arqbwAFDo4T0-LCqpNa6p-TJdHrw/edit

Will be discussing in our working group today (I haven't yet added a link to the community repo about this new working group, but it's taking the same slot and links as our previous beta working group: https://github.com/tektoncd/community/blob/master/working-groups.md#beta-release)

Status update:
We are currently punting on the runOn syntax. Instead, we are:

  1. Implementing a pipeline level finally field that always runs some tasks at the end of a pipeline. (doc

We are also considering adding the following (discussions ongoing in the API working group):

  1. A pipeline level onError/except that runs a Task if any task in a pipeline fails
  2. A finally/onError Step within a Task or within a PipelineTask
  3. Allowing a pipeline to call other pipelines aka nested/sub pipelines for complex branches

Some discussion here

Hey folks, this is one of the key requirements for the work we are leading from the Kubeflow side to run on top of Tekton. It would be great to get the current status and see how we can accelerate this.

cc @tomcli @afrittoli @skaegi

@pritidesai is implementing a finally field for always running tasks at the end of a pipeline. Hope that helps with some of your use cases. Beyond that, we are considering:

  1. A "finally" step within a task
  2. An "onError" step/task that runs only if the pipeline/task fails.
  3. Allowing a pipeline to call other pipelines aka nested/sub pipelines for complex branches

Would this be sufficient for the Kubeflow use cases?

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

@tekton-robot: Closing this issue.

I feel it's fair to consider this closed now that we have finally, though there are more features to add; to get the complete set of flexibility someone might want, I think we need to add in #2134 as well.
