Pipeline: Failed TaskRuns should still report results

Created on 22 Oct 2020 · 15 comments · Source: tektoncd/pipeline

Expected Behavior

Results produced by failed steps should still be included in the task results.

Actual Behavior

Results are present in the "message" field of the step, but not in the task results.
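For illustration, the result value surfaces in the step's `terminated.message` roughly as shown below. The exact shape of the termination message varies across Tekton versions, so treat this as an illustrative sketch rather than a stable contract:

```json
[{"key": "sampleResult", "value": "Result expected\n", "type": "TaskRunResult"}]
```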

Steps to Reproduce the Problem

  1. Apply the following to a cluster with Tekton installed:
apiVersion: tekton.dev/v1beta1
kind: ClusterTask
metadata:
  name: result-task
spec:
  results:
  - name: sampleResult
    description: Result
  steps:
  - name: write-result
    image: gcr.io/google-containers/busybox:1.27
    script: |
      #!/bin/sh
      /bin/echo 'Result expected' > $(results.sampleResult.path)
      exit 1

---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: result-task-run
spec:
  taskRef:
    kind: ClusterTask
    name: result-task

Additional Info

  • Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-gke.401", GitCommit:"eb94c181eea5290e9da1238db02cfef263542f5f", GitTreeState:"clean", BuildDate:"2020-09-09T00:57:35Z", GoVersion:"go1.13.9b4", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:
    v0.17.1 (I first ran into this in v0.16.3)
kind/feature


All 15 comments

cc @pritidesai @bobcatfish @jerop

Results produced by failed steps should still be included in the task results.

Could you go into a bit more detail as to _why_ they should be included? What's your use case? A failed Task may have failed because the data it generated was invalid.

I'm using Tekton for execution, though the user-facing abstraction has no notion of Tekton concepts. When execution fails, I need to provide some details about this failure.

I would like to store details about the failure in a result and then interrogate it when retrieving the PipelineRun.

The only alternative I can think of is to access the logs directly, but this is undesirable. Having the user do it breaks encapsulation, and doing it programmatically requires access to the underlying Pods.

Yup, I agree, and it would be great to provide a way to communicate failure logs, which are otherwise limited to the pod the task is running in.

We have been having a lot of discussion around consuming task results in finally and results from failed tasks; please feel free to leave a comment and voice your opinion:

https://github.com/tektoncd/community/pull/240
https://docs.google.com/document/d/1tV1LgPOINnmlDV-oSNdLB39IlLcQRGaYAxYZjVwVWcs/edit#heading=h.1huoxojutir3

I dunno. I feel like we've jumped the gun a bit. I definitely see the value in returning some information about the failure of a Task. I'm not convinced that a Task Result is the absolutely only way we could tackle this tho. Here's an alternative: we add a new field to a TaskRun's status (e.g. taskRun.status.failureReason) that gets populated from a /tekton/failure-reason file that a Step can write. Then a PipelineRun is updated with that TaskRun Status including the failure reason. The benefit to this would be that the mechanism is consistent across every TaskRun - each Task doesn't have to invent its own "failure result" that also has to be wired in to every Pipeline Result.
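A sketch of that hypothetical mechanism follows. Note that the `/tekton/failure-reason` path and the `taskRun.status.failureReason` field are the proposal in this comment, not an existing Tekton API, and `make build` merely stands in for any failing step logic:

```yaml
# Hypothetical only: sketches the failure-reason proposal above.
steps:
  - name: build
    image: gcr.io/google-containers/busybox:1.27
    script: |
      #!/bin/sh
      if ! make build; then
        # Proposed file, not a real Tekton path today:
        echo 'build failed; see step log' > /tekton/failure-reason
        exit 1
      fi
# The controller would then copy the file's contents into a proposed
# taskRun.status.failureReason field, uniform across all TaskRuns.
```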

Task Results already have a defined behaviour when Tasks fail - they don't populate in TaskRun Status. If we modify that contract I kinda feel like we're making a backwards-incompatible change. So I'm going to re-classify this as a feature request rather than a bug since Task Results aren't behaving incorrectly here - their behaviour matches the expectations we set for them when they were designed. Happy to discuss further but I think this needs to be considered a bit carefully before moving ahead on it.

/kind feature
/remove-kind bug

Having a consistent way to populate a failure reason would be great.

@ekupershlak we have stdout and stderr capture for steps being implemented; see the TEP and PoC. Would this work for your use case?

See issue https://github.com/tektoncd/pipeline/issues/2925

@pritidesai if I understand correctly, that proposal simplifies capturing stdout/stderr of a step. This is useful, but doesn't provide a way to obtain this data.

I tried to use a result because it's accessible from a TaskRun without any additional management (i.e. creating buckets or other similar stores).

@ekupershlak
Your case is exactly the same as mine.
I need to deliver something like an error source, error message, and some customized data to the outside world when a PipelineRun fails. I think TaskRunResult is a good candidate: just watch the PipelineRun and parse the TaskRunResult to learn the root cause and decide what to do next.

@pritidesai
I think consuming results in finally can help in my case: I could launch my custom task in finally (since finally does not support task chains) with the result as a param, with no need to retrieve the PipelineRun object, but it seems the PR is still not merged.

So, is there any reason we cannot let a failed task report its results? I have no idea about the previous discussion on this, any clarification? @sbwsg

We're essentially describing Results that are only optionally populated (i.e. they only populate when there's an error). Elsewhere we are proposing that a Task which doesn't emit a declared result should be an automatic failure: https://github.com/tektoncd/pipeline/issues/3497. We've proposed this because users should be able to depend on Task Results so that other Tasks in a Pipeline can use them reliably. A missing Result should be considered a bug in a successful Task.

So, let's look at an example Task with a failure result:

spec:
  results:
  - name: errorMessage
    description: If this task fails, it will return an error message in this result.

When this Task succeeds it _must_ return a value for errorMessage if we implement #3497. What's the correct value of errorMessage when the Task succeeds? Empty string? "No errors" message? An exit code "0"? And then, thinking about this further, could a UI or other tool make use of that failure information and display it in some way to help users? How does the UI/tool discover the correct Result to read the error information from?

So I guess I'm still not that convinced that we need to muddy the contract around Results to support this. I'd personally prefer to see Results only returned on success and a _different_ field returned on failure. See my earlier comment https://github.com/tektoncd/pipeline/issues/3439#issuecomment-718073395 for a suggestion of a different possible way to tackle the same problem.

Great discussion!!! Thanks @ekupershlak and @vincent-pli for going into detail about your use cases, this really helps.

I'm thinking 2 things:

  1. Maybe we could nail down the problem statement a bit, e.g. in the form of a TEP? (https://github.com/tektoncd/community/tree/master/teps) The problem seems to be about understanding why a Task failed and we've generated a couple of alternatives in this issue already, e.g. leveraging Task results and also letting a Task emit a reason to provide more context. We've been trying to use an approach with TEPs where we nail down the problem statement before getting too deep into the solution
  2. I actually want to double check that using logs isn't the best solution here? (I think a very clear problem statement could help explain why using logs doesn't accomplish what you need as well) @ekupershlak you said:

    Having the user do it breaks encapsulation and doing it programmatically requires access to the underlying Pods.

    Is it possible that the user facing abstraction you are creating could access these logs? If the main problem is actually that it's difficult for you to fetch the logs, maybe we should tackle that directly.

    My thinking is that even if we DO add a failure reason or allow Tasks to emit results even when they fail, you're still going to need to write ALL of your Tasks in such a way that they are aware of their own failure and can introspect. That's going to be difficult to do since you'd need to handle the failure of ANY part of your Task.

Is it possible that the user facing abstraction you are creating could access these logs? If the main problem is actually that it's difficult for you to fetch the logs, maybe we should tackle that directly.

@bobcatfish This sums up the bulk of the issue: I was building an operator that, as part of reconciliation, manages Tekton tasks. I could definitely add logic to read logs, but I was trying to avoid that work (and needing to add the corresponding permissions).

I do think that there is room for error details to be distinct from logs. Tools can have noisy outputs (e.g. prompts for surveys), and STDERR is generally more valuable. Additionally, error detection in a script (e.g. checking if an output file is empty) could use echo, but writing directly to an error log might be preferable to produce actionable output.

@ekupershlak from the pipelineRun perspective, how are you capturing the task result (in your case failures)? in a pipeline result?

The reason I am asking is, we have a feature being designed to access the pipelineRun metadata as a JSON payload from any other task. In your use case, you could access this in a finally task. Does this work for you?

I agree with @sbwsg, results are not the right fit for exposing failures.

@pritidesai I configured the tasks to emit results, but didn't specify any pipeline results. To get the value, I'm parsing pipelineRun.Status.TaskRuns[].Status.Steps[].Terminated.Message.
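That parsing approach can be sketched as follows. This is a minimal illustration, assuming the termination message is the JSON array of `{"key": ..., "value": ..., "type": ...}` entries that Tekton writes; the `type` encoding has varied across Tekton versions, so the sketch keeps any entry with a key and a value rather than filtering on type:

```python
import json


def results_from_termination_message(message):
    """Parse a Tekton step's terminated.message into a {name: value} dict.

    Returns an empty dict when the message is missing, is not JSON
    (e.g. plain log text), or is not the expected JSON array.
    """
    results = {}
    try:
        entries = json.loads(message)
    except (TypeError, ValueError):
        return results  # not JSON: plain text or empty message
    if not isinstance(entries, list):
        return results
    for entry in entries:
        if isinstance(entry, dict) and "key" in entry and "value" in entry:
            results[entry["key"]] = entry["value"]
    return results


# Example message shaped like the output of the reproduction Task above:
msg = '[{"key":"sampleResult","value":"Result expected\\n","type":"TaskRunResult"}]'
print(results_from_termination_message(msg))
# → {'sampleResult': 'Result expected\n'}
```

In an operator, the `message` argument would come from `pipelineRun.Status.TaskRuns[].Status.Steps[].Terminated.Message`, as described above.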

Regarding a finally task, is the idea that it would access error details, and use them to produce a result? If so, I think that would work.

For what it's worth, I don't feel strongly about how error details are surfaced as long as there's a way.

@bobcatfish thanks for your summary. and thanks @sbwsg for clarification

One quick question about #3497: I guess I like the current behavior better. A task may declare a Result but not emit it; the Pipeline fails if another task in that Pipeline depends on the Result, and if there is no dependency, nothing happens.

I mean a task's Result could be used by other tasks in the same pipeline, and could also be used by outside applications, e.g. a custom task controller or a monitoring/audit application.

From the task's perspective, I guess a Result is the most straightforward way to deliver customized information to the outside world (correct me if I am missing something), no matter whether it succeeds or fails.

Actually, I have worked around the issue by parsing pipelineRun.Status.TaskRuns[].Status.Steps[].Terminated.Message; the results are always written there :)
