Argo: StepA calling StepB in WorkflowTemplate expects StepB to also be defined in submitted workflow

Created on 19 Dec 2019 · 5 Comments · Source: argoproj/argo

Checklist:

  • [x] I've included the version.
  • [x] I've included reproduction steps.
  • [x] I've included the workflow YAML.
  • [x] I've included the logs.

What happened:
No pods execute. An error is reported and the PODNAME column is empty.

What you expected to happen:
Template remoteC executes normally and outputs Happy.

How to reproduce it (as minimally and precisely as possible):

kubectl apply -f issue-templates.yaml
argo submit issue-workflow.yaml --serviceaccount developer --watch

issue-templates.yaml.txt
issue-workflow.yaml.txt
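
The attached files are not reproduced on this page. As a rough sketch only, the node names in the logs below (localA, deploy-quick, remoteB, remoteC) suggest a layout like the following. The WorkflowTemplate name `issue-templates`, the container image, and the echo command are assumptions for illustration, and the workflow parameters are omitted. The two manifests are separate files, shown together here for brevity.

```yaml
# issue-templates.yaml (hypothetical reconstruction, NOT the attached file)
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: issue-templates          # assumed name
spec:
  templates:
  - name: remoteB
    steps:
    - - name: remoteC
        template: remoteC        # local reference to a peer in the same WorkflowTemplate
  - name: remoteC
    container:
      image: alpine:3.10         # assumed image
      command: [echo, "Happy"]   # assumed command
---
# issue-workflow.yaml (hypothetical reconstruction, NOT the attached file)
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: my-service-deploy-quick-
spec:
  entrypoint: localA
  templates:
  - name: localA
    steps:
    - - name: deploy-quick
        templateRef:
          name: issue-templates  # assumed WorkflowTemplate name
          template: remoteB
```

With a layout like this, the reported "template remoteB not found" error would be consistent with the controller looking up names in the submitted workflow rather than in the WorkflowTemplate.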

Anything else we need to know?:
Workarounds to make this work:

  1. Uncomment the template with the matching name in issue-workflow.yaml. (A big clue to what's going wrong internally?)
  2. Switch to the other template, remoteA, in issue-templates.yaml (comment/uncomment as needed). In a nutshell: call the peer template using templateRef instead of a local template reference.
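
A minimal sketch of what workaround 2 might look like, assuming the WorkflowTemplate is named `issue-templates` (a hypothetical name, as are the image and command; the attached file is not reproduced here). The step inside remoteA refers to its peer remoteC through templateRef, naming its own WorkflowTemplate, rather than through a bare `template:` reference.

```yaml
# Hypothetical illustration of workaround 2, not the attached file
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: issue-templates            # assumed name
spec:
  templates:
  - name: remoteA
    steps:
    - - name: remoteC
        templateRef:               # explicit reference back into the same WorkflowTemplate
          name: issue-templates
          template: remoteC
  - name: remoteC
    container:
      image: alpine:3.10           # assumed image
      command: [echo, "Happy"]     # assumed command
```

The submitted workflow would then point its templateRef at remoteA instead of remoteB.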

Environment:

  • Argo version:
argo: v2.4.3
  BuildDate: 2019-12-06T03:36:38Z
  GitCommit: cfe5f377bc3552fba90afe6db7a76edd92c753cd
  GitTreeState: clean
  GitTag: v2.4.3
  GoVersion: go1.11.5
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version :
clientVersion:
  buildDate: "2019-08-19T11:13:54Z"
  compiler: gc
  gitCommit: 2d3c76f9091b6bec110a5e63777c332469e0cba2
  gitTreeState: clean
  gitVersion: v1.15.3
  goVersion: go1.12.9
  major: "1"
  minor: "15"
  platform: darwin/amd64
serverVersion:
  buildDate: "2019-10-18T17:56:01Z"
  compiler: gc
  gitCommit: b7174db5ee0e30c94a0b9899c20ac980c0850fc8
  gitTreeState: clean
  gitVersion: v1.14.8-eks-b7174d
  goVersion: go1.12.10
  major: "1"
  minor: 14+
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
Name:                my-service-deploy-quick-xtjcf
Namespace:           default
ServiceAccount:      developer
Status:              Failed
Message:             child 'my-service-deploy-quick-xtjcf-2610873042' failed
Created:             Wed Dec 18 16:08:35 -0700 (1 minute ago)
Started:             Wed Dec 18 16:08:35 -0700 (1 minute ago)
Finished:            Wed Dec 18 16:08:35 -0700 (1 minute ago)
Duration:            0 seconds
Parameters:
  service-git-repo:  braze-api-service
  ecr-label:         git-11af99d

STEP                                       PODNAME  DURATION  MESSAGE
 ✖ my-service-deploy-quick-xtjcf (localA)                     child 'my-service-deploy-quick-xtjcf-2610873042' failed
 └---✖ deploy-quick (remoteB)                                 child 'my-service-deploy-quick-xtjcf[0].deploy-quick[0].remoteC' errored
  • executor logs:
# no pods executed.
  • workflow-controller logs:
time="2019-12-18T23:08:35Z" level=info msg="Processing workflow" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Updated phase  -> Running" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Steps node my-service-deploy-quick-xtjcf (my-service-deploy-quick-xtjcf) initialized Pending" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf (my-service-deploy-quick-xtjcf) phase Pending -> Running" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="StepGroup node my-service-deploy-quick-xtjcf[0] (my-service-deploy-quick-xtjcf-207316783) initialized Running" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Steps node my-service-deploy-quick-xtjcf[0].deploy-quick (my-service-deploy-quick-xtjcf-2610873042) initialized Pending" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0].deploy-quick (my-service-deploy-quick-xtjcf-2610873042) phase Pending -> Running" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="StepGroup node my-service-deploy-quick-xtjcf[0].deploy-quick[0] (my-service-deploy-quick-xtjcf-2306322900) initialized Running" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Step group node my-service-deploy-quick-xtjcf[0].deploy-quick[0] (my-service-deploy-quick-xtjcf-2306322900) deemed errored due to child my-service-deploy-quick-xtjcf[0].deploy-quick[0].remoteC error: template remoteB not found" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0].deploy-quick[0] (my-service-deploy-quick-xtjcf-2306322900) phase Running -> Error" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0].deploy-quick[0] (my-service-deploy-quick-xtjcf-2306322900) message: child 'my-service-deploy-quick-xtjcf[0].deploy-quick[0].remoteC' errored" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0].deploy-quick[0] (my-service-deploy-quick-xtjcf-2306322900) finished: 2019-12-18 23:08:35.92482828 +0000 UTC" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="step group my-service-deploy-quick-xtjcf-2306322900 was unsuccessful: child 'my-service-deploy-quick-xtjcf[0].deploy-quick[0].remoteC' errored" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Outbound nodes of my-service-deploy-quick-xtjcf-348834221 is []" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Outbound nodes of my-service-deploy-quick-xtjcf-2610873042 is []" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0].deploy-quick (my-service-deploy-quick-xtjcf-2610873042) phase Running -> Failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0].deploy-quick (my-service-deploy-quick-xtjcf-2610873042) message: child 'my-service-deploy-quick-xtjcf[0].deploy-quick[0].remoteC' errored" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0].deploy-quick (my-service-deploy-quick-xtjcf-2610873042) finished: 2019-12-18 23:08:35.924894759 +0000 UTC" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Checking daemoned children of my-service-deploy-quick-xtjcf-2610873042" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Step group node my-service-deploy-quick-xtjcf[0] (my-service-deploy-quick-xtjcf-207316783) deemed failed: child 'my-service-deploy-quick-xtjcf-2610873042' failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0] (my-service-deploy-quick-xtjcf-207316783) phase Running -> Failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0] (my-service-deploy-quick-xtjcf-207316783) message: child 'my-service-deploy-quick-xtjcf-2610873042' failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf[0] (my-service-deploy-quick-xtjcf-207316783) finished: 2019-12-18 23:08:35.924953366 +0000 UTC" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="step group my-service-deploy-quick-xtjcf-207316783 was unsuccessful: child 'my-service-deploy-quick-xtjcf-2610873042' failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Outbound nodes of my-service-deploy-quick-xtjcf-2610873042 is []" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Outbound nodes of my-service-deploy-quick-xtjcf is []" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf (my-service-deploy-quick-xtjcf) phase Running -> Failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf (my-service-deploy-quick-xtjcf) message: child 'my-service-deploy-quick-xtjcf-2610873042' failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="node my-service-deploy-quick-xtjcf (my-service-deploy-quick-xtjcf) finished: 2019-12-18 23:08:35.925001345 +0000 UTC" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Checking daemoned children of my-service-deploy-quick-xtjcf" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Updated phase Running -> Failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Updated message  -> child 'my-service-deploy-quick-xtjcf-2610873042' failed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Marking workflow completed" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Checking daemoned children of " namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:08:35Z" level=info msg="Workflow update successful" namespace=default workflow=my-service-deploy-quick-xtjcf
time="2019-12-18T23:09:15Z" level=info msg="Alloc=8284 TotalAlloc=26977605 Sys=3662658 NumGC=327 Goroutines=69"

Logs

argo get <workflowname>
# see above: workflow result

kubectl logs <failedpodname> -c init
# see above: executor logs

kubectl logs <failedpodname> -c wait
# see above: executor logs

kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)
# see above: workflow-controller logs

All 5 comments

If it helps, downgrading to v2.4.1 or v2.4.2 solves this issue. It seems to be related to https://github.com/argoproj/argo/pull/1744.

I am also having this issue. Upgraded to 2.4.3 to fix the issues that #1744 addresses.

I can't see this issue in the branch of #1920.

I've been unable to reproduce this in v2.5. Can I ask if anyone else has? See https://github.com/argoproj/argo/pull/2269

Confirmed that this issue appears to be fixed in v2.6.1 at least.
