Pipeline: if a side container in a taskrun failes, entire taskrun should fail

Created on 5 Apr 2019 · 10Comments · Source: tektoncd/pipeline

Reusing the title from #682 since they are the same but this time we are using side container instead of init ones

Expected Behavior

When you have one step that fails it should be marked as fail and not continue to the next step

Actual Behavior

Continue to the next steps and show as Completed/Succeeded

Steps to Reproduce the Problem

Sample task

apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: echo-hello-world
spec:
  steps:
    - name: echo
      image: ubuntu
      command:
        - echo
      args:
        - "hello world"
    - name: error
      image: ubuntu
      command:
        - exit
    - name: final
      image: ubuntu
      command:
        - echo
      args:
        - "hello moto"

First step should success, second shoudl error and third should work as seen from the _containerStatuses_ :

- containerID: cri-o://e8f02dae0c84276cc1fb15b282c8b5241042525c13576fca4353239484e92436
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab
    lastState: {}
    name: build-step-echo
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://e8f02dae0c84276cc1fb15b282c8b5241042525c13576fca4353239484e92436
        exitCode: 0
        finishedAt: 2019-04-04T22:06:00Z
        reason: Completed
        startedAt: 2019-04-04T22:06:00Z
  - containerID: cri-o://9358cbe2aeeecbe571c102d6489fb3a366c9d6198217d022d5b7a2647ad62998
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab
    lastState: {}
    name: build-step-error
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://9358cbe2aeeecbe571c102d6489fb3a366c9d6198217d022d5b7a2647ad62998
        exitCode: 1
        finishedAt: 2019-04-04T22:06:01Z
        reason: Error
        startedAt: 2019-04-04T22:06:01Z
  - containerID: cri-o://f400bcd955d115462011c9cafe786d5b3431f4b0fc37d440ce04e6edca4f9b52
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab
    lastState: {}
    name: build-step-final
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://f400bcd955d115462011c9cafe786d5b3431f4b0fc37d440ce04e6edca4f9b52
        exitCode: 0
        finishedAt: 2019-04-04T22:06:02Z
        reason: Completed
        startedAt: 2019-04-04T22:06:02Z

but third step should have not been started since the second one has failed and should show as completed/success :

% k get pod
NAME                                           READY   STATUS      RESTARTS   AGE
echo-hello-world-task-run-pod-cfd7fd           0/4     Completed   0          3m30s
% k get taskrun
NAME                        TYPE        STATUS   STARTTIME   COMPLETIONTIME
echo-hello-world-task-run   Succeeded   False    4m38s       4m24s

Additional Info

design kindocumentation kinquestion

Source

chmouel

All 10 comments

@chmouel So… it's a bit trickier. As we don't use initContainers, all containers in a Pod starts in parallel… So the flow is the following:

all the container are starting, initial step command being "transformed" using entrypoint …
if it's the first step, it starts right away, and write a file when finished (if failed, it's gonna prepend .err to that file)
if it's not the first step, it's gonna wait for a file (well actually either the "success file" or the "error file") to be written (by the previous step) before actually running the real step command.
- If the previous step was error, it's doing a no-op (aka exit 0)

This can be a bit confusing as it will exit 0 and thus makes us think it actually executed. It kinda has been discuss here quikcly :sweat_smile:

vdemeester on 5 Apr 2019

👍1

I see thanks for taking the time to describes how it works, I guess then it's a UI problem, to check if the containers have failed or not.

Is it a pattern that we are going to keep ? if it is then perhaps we can document it ? (I would not mind doing it)

chmouel on 5 Apr 2019

I see thanks for taking the time to describes how it works, I guess then it's a UI problem, to check if the containers have failed or not.

Is it a pattern that we are going to keep ? if it is then perhaps we can document it ? (I would not mind doing it)

Right… Good question :sweat_smile:. I'm not a huge fan of " it's a UI problem, to check if the containers have failed or not." as it means each UI implementation will have to have this knowledge (and if it changed or evolve, well. we break them). I think that:

We should document it (for sure !)
We may want to think of a better way to show that those steps are skip due to previous errors. In the PR we discussed maybe using a special exit code for that.

vdemeester on 5 Apr 2019

Perhaps as a first step, we can maybe fix that the status is not set as "Succeeded" when there is a failure ? For example :

status:
  conditions:
  - lastTransitionTime: 2019-04-04T22:06:04Z
    message: 'build step "build-step-error" exited with code 1 (image: "docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab");
      for logs run: kubectl -n tekton-pipelines logs echo-hello-world-task-run-pod-cfd7fd
      -c build-step-error'
    status: "False"
    type: Succeeded

the type here get sets as failed after all the task has finished ?

that would make explicit that there is an issue for UI to then go digg into the issue ?

chmouel on 5 Apr 2019

@chmouel as discussed on Slack, this conditions are "standard", and it is kinda explicit for the consumer that this is a failure : condition of type Succeeded is false — hence it's a failure. Also this is what knative/serving is looking for and it would be a huge API changes just to rename it :sweat:

vdemeester on 5 Apr 2019

fair enough, so let's try to document this, then. (there is a nice github tag for this?)

chmouel on 5 Apr 2019

❤1

I just want to double check @chmouel @vdemeester do we want to keep this open so we update the docs? or did that already happen?

bobcatfish on 17 Apr 2019

@bobcatfish we may want to keep it opened until docs are clearer about it, indeed

vdemeester on 17 Apr 2019

👍1

I think we can "close" this :angel: @ImJasonH proposal on API spec should help with making it more clear but it's "standard" conditions :angel:

/close

vdemeester on 7 Nov 2019

@vdemeester: Closing this issue.

In response to this:

I think we can "close" this :angel: @ImJasonH proposal on API spec should help with making it more clear but it's "standard" conditions :angel:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.