Argo: failed to save outputs: interface conversion: error is *exec.Error, not *exec.ExitError

Created on 2 Feb 2019 · 7 Comments · Source: argoproj/argo

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

What happened:
I see this error message intermittently and at random. It is not tied to a specific task: a task can fail with it on one run and work fine on the next.

The task itself runs fine and I can see its output, but if the next task depends on it, the workflow does not proceed to that next task.

What you expected to happen:
I did not see this error before. It started at some point after I installed Kubeflow Pipelines and ran a task. I have since removed and redeployed Argo, but the error still appears intermittently.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Argo version:
$ argo version
argo: v2.2.1
  BuildDate: 2018-10-11T16:25:59Z
  GitCommit: 3b52b26190163d1f72f3aef1a39f9f291378dafb
  GitTreeState: clean
  GitTag: v2.2.1
  GoVersion: go1.10.3
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version :
$ kubectl version -o yaml
clientVersion:
  buildDate: 2018-07-10T10:13:58Z
  compiler: gc
  gitCommit: 91e7b4fd31fcd3d5f436da26c980becec37ceefe
  gitTreeState: clean
  gitVersion: v1.11.0
  goVersion: go1.10.3
  major: "1"
  minor: "11"
  platform: darwin/amd64
serverVersion:
  buildDate: 2018-12-06T23:13:14Z
  compiler: gc
  gitCommit: 6bad6d9c768dc0864dab48a11653aa53b5a47043
  gitTreeState: clean
  gitVersion: v1.11.5-eks-6bad6d
  goVersion: go1.10.3
  major: "1"
  minor: 11+
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
$ argo get <workflowname>

argo get argo-gpu-s3-copy-4qzp8
Name: argo-gpu-s3-copy-4qzp8
Namespace: development
ServiceAccount: argo
Status: Error
Created: Sat Feb 02 19:10:05 +0000 (3 minutes ago)
Started: Sat Feb 02 19:10:05 +0000 (3 minutes ago)
Finished: Sat Feb 02 19:10:21 +0000 (3 minutes ago)
Duration: 16 seconds
Parameters:
s3-path: Shared_data/OULU/small_frames_npy
local-path: test2
bucker-name: onfido-mlplatform-in
node-selector: m4.xlarge

STEP PODNAME DURATION MESSAGE
⚠ argo-gpu-s3-copy-4qzp8
└-⚠ list-chunk argo-gpu-s3-copy-4qzp8-3040831338 16s failed to save outputs: interface conversion: error is *exec.Error, not *exec.ExitError

  • executor logs:
$ kubectl logs <failedpodname> -c init
$ kubectl logs <failedpodname> -c wait
  • workflow-controller logs:
$ kubectl logs -n kube-system $(kubectl get pods -l app=workflow-controller -n kube-system -o name)

All 7 comments

workflow-controller log:

time="2019-02-02T19:25:02Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Updated phase -> Running" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Steps node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) initialized Running" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="StepGroup node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) initialized Running" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Created pod: argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299)" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Pod node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) initialized Pending" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:03Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:03Z" level=info msg="Checking for deleted pods" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:04Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) message: PodInitializing"
time="2019-02-02T19:25:04Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:04Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:05Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:05Z" level=info msg="Checking for deleted pods" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:05Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:06Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:06Z" level=info msg="Checking for deleted pods" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:06Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) status Pending -> Running"
time="2019-02-02T19:25:09Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"argo-gpu-s3-copy-58kdd\": the object has been modified; please apply your changes to the latest version and try again" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Re-appying updates on latest version and retrying update" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Update retry attempt 1 successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) status Running -> Error"
time="2019-02-02T19:25:10Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) message: failed to save outputs: interface conversion: error is *exec.Error, not *exec.ExitError"
time="2019-02-02T19:25:10Z" level=info msg="Step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) deemed failed: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) phase Running -> Failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) message: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) finished: 2019-02-02 19:25:10.370256123 +0000 UTC" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="step group argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) was unsuccessful: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Outbound nodes of argo-gpu-s3-copy-58kdd-1253578299 is [argo-gpu-s3-copy-58kdd-1253578299]" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Outbound nodes of argo-gpu-s3-copy-58kdd is [argo-gpu-s3-copy-58kdd-1253578299]" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) phase Running -> Failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) message: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) finished: 2019-02-02 19:25:10.3703671 +0000 UTC" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Checking deamoned children of argo-gpu-s3-copy-58kdd" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Updated phase Running -> Failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Updated message -> child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Marking workflow completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"argo-gpu-s3-copy-58kdd\": the object has been modified; please apply your changes to the latest version and try again" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Re-appying updates on latest version and retrying update" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Update retry attempt 1 successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:11Z" level=info msg="Labeled pod development/argo-gpu-s3-copy-58kdd-1253578299 completed"

I just built from master (plus an unrelated tweak) and can confirm this behavior, except that I get it on every run.

@wadeholler thanks. It fails about 80% of the time for me. I think we started seeing this after we upgraded the k8s cluster version.
Is there any solution for that?

@wadeholler there was a bug on the master branch, which I fixed in PR#1213. Try deleting argoexec:latest from your cluster and building it again with the new Dockerfile (if argoproj/argoexec:latest has not been updated, or if you made any modifications).

That fixed the problem stated above, but now submodule support is broken:

failed to load artifacts: fatal: No url found for submodule path 'obsfuscated' in .gitmodules

My previous reply concerned repos that have a submodule reference but no .gitmodules file. The new argoexec update, which forces a submodule update, caused that issue; it is unrelated to the problem above. All is well now. Cheers.

I'm pretty sure I fixed an interface conversion panic (error is *exec.Error, not *exec.ExitError) as part of the PNS work. I will close this as fixed in v2.3, but please re-open if it is seen again.
