Argo *namespace-install* raises `failed to save outputs: Failed to establish pod watch: unknown (get pods)`

Created on 26 Mar 2020 · 10 comments · Source: argoproj/argo

Checklist:

  • [x] I've included the version.
  • [x] I've included reproduction steps.
  • [x] I've included the workflow YAML.
  • [x] I've included the logs.

What happened:

On a fresh namespace install of Argo (kubectl apply -n custom_namespace -f https://raw.githubusercontent.com/argoproj/argo/master/manifests/namespace-install.yaml), submitting an example workflow through the UI or via kubectl apply -f fails at the workflow step with the error message:

failed to save outputs: Failed to establish pod watch: unknown (get pods)

What you expected to happen:

The workflow to run successfully.

How to reproduce it (as minimally and precisely as possible):

On a Kubernetes development cluster:

> kubectl config use-context <dev-cluster>
> kubectl create ns test-ns
> kubectl apply -n test-ns -f https://raw.githubusercontent.com/argoproj/argo/master/manifests/namespace-install.yaml
> kubectl port-forward -n test-ns deployment/argo-server 2746:2746

From the argo-server UI at localhost:2746: Timeline -> + Submit new workflow -> Submit

Watch the workflow fail.
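
For reference, a minimal workflow matching the whalesay template visible in the executor logs below (reconstructed from those logs; the generateName is an assumption, since the UI picks a random name like lovely-dragon):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-  # assumed; the UI generated "lovely-dragon" here
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]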

Anything else we need to know?:

After going over the open issues, this does not seem to be a duplicate of the other "failed to save outputs: Failed to establish pod watch: unknown (get pods)" issues, since we are running things through a namespace-install.

Environment:

  • Argo version:

$ argo version
argo: v2.7.0-rc3
  BuildDate: 2020-03-25T17:25:15Z
  GitCommit: 2bb0a7a4fd7bbf3da12ac449c3d20f8d5baf0995
  GitTreeState: clean
  GitTag: v2.7.0-rc3
  GoVersion: go1.13.4
  Compiler: gc
  Platform: linux/amd64
  • Kubernetes version:

The cluster-side Kubernetes version is fairly old: 1.9.

$ kubectl version -o yaml
clientVersion:
  buildDate: "2020-02-29T16:37:45Z"
  compiler: gc
  gitCommit: 06ad960bfd03b39c8310aaf92d1e7c12ce618213
  gitTreeState: archive
  gitVersion: v1.17.3
  goVersion: go1.14
  major: "1"
  minor: "17"
  platform: linux/amd64
serverVersion:
  buildDate: "2018-04-18T23:58:35Z"
  compiler: gc
  gitCommit: dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92
  gitTreeState: clean
  gitVersion: v1.9.7
  goVersion: go1.9.3
  major: "1"
  minor: "9"
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
$ argo get lovely-dragon

Name:                lovely-dragon
Namespace:           moo-de
ServiceAccount:      default
Status:              Error
Message:             failed to save outputs: Failed to establish pod watch: unknown (get pods)
Created:             Thu Mar 26 15:29:40 +0000 (1 minute ago)
Started:             Thu Mar 26 15:29:40 +0000 (1 minute ago)
Finished:            Thu Mar 26 15:29:53 +0000 (1 minute ago)
Duration:            13 seconds

STEP                         PODNAME        DURATION  MESSAGE                                                                    RESOURCESDURATION
 ⚠ lovely-dragon (whalesay)  lovely-dragon  12s       failed to save outputs: Failed to establish pod watch: unknown (get pods)  
  • executor logs:
$ kubectl logs <failedpodname> -c init

error: container init is not valid for pod lovely-dragon

$ kubectl logs <failedpodname> -c wait

time="2020-03-26T15:29:51Z" level=info msg="Creating a docker executor"
time="2020-03-26T15:29:51Z" level=info msg="Executor (version: vv2.6.3+2e8ac60.dirty, build_date: 2020-03-16T18:05:02Z) initialized (pod: moo-de/lovely-dragon) with template:\n{\"name\":\"whalesay\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"main\",\"image\":\"docker/whalesay:latest\",\"command\":[\"cowsay\"],\"args\":[\"hello world\"],\"resources\":{}}}"
time="2020-03-26T15:29:51Z" level=info msg="Waiting on main container"
time="2020-03-26T15:29:51Z" level=error msg="executor error: Failed to establish pod watch: unknown (get pods)\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:78\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).waitMainContainerStart\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:831\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:795\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
time="2020-03-26T15:29:51Z" level=info msg="No output parameters"
time="2020-03-26T15:29:51Z" level=info msg="No output artifacts"
time="2020-03-26T15:29:51Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2020-03-26T15:29:51Z" level=info msg="Killing sidecars"
time="2020-03-26T15:29:51Z" level=info msg="Alloc=3343 TotalAlloc=4579 Sys=70016 NumGC=1 Goroutines=5"
time="2020-03-26T15:29:51Z" level=fatal msg="Failed to establish pod watch: unknown (get pods)\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:78\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).waitMainContainerStart\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:831\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:795\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"

  • workflow-controller logs:

_Just the last few relevant lines_

$ kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)

time="2020-03-26T15:29:40Z" level=info msg="Processing workflow" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:40Z" level=info msg="Updated phase  -> Running" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:40Z" level=info msg="Pod node {lovely-dragon lovely-dragon lovely-dragon Pod whalesay nil    Pending   2020-03-26 15:29:40.377982325 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  <nil> nil nil [] []} initialized Pending" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:40Z" level=info msg="Created pod: lovely-dragon (lovely-dragon)" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:40Z" level=info msg="Workflow update successful" namespace=moo-de phase=Running resourceVersion=223765391 workflow=lovely-dragon
time="2020-03-26T15:29:41Z" level=info msg="Processing workflow" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:41Z" level=info msg="Updating node &NodeStatus{ID:lovely-dragon,Name:lovely-dragon,DisplayName:lovely-dragon,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Pending,BoundaryID:,Message:,StartedAt:2020-03-26 15:29:40 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,} message: ContainerCreating"
time="2020-03-26T15:29:41Z" level=info msg="Workflow update successful" namespace=moo-de phase=Running resourceVersion=223765397 workflow=lovely-dragon
time="2020-03-26T15:29:42Z" level=info msg="Processing workflow" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:53Z" level=info msg="Processing workflow" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:53Z" level=info msg="Updating node &NodeStatus{ID:lovely-dragon,Name:lovely-dragon,DisplayName:lovely-dragon,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Pending,BoundaryID:,Message:ContainerCreating,StartedAt:2020-03-26 15:29:40 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,} status Pending -> Error"
time="2020-03-26T15:29:53Z" level=info msg="Updating node &NodeStatus{ID:lovely-dragon,Name:lovely-dragon,DisplayName:lovely-dragon,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Error,BoundaryID:,Message:,StartedAt:2020-03-26 15:29:40 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,} message: failed to save outputs: Failed to establish pod watch: unknown (get pods)"
time="2020-03-26T15:29:53Z" level=info msg="Updated phase Running -> Error" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:53Z" level=info msg="Updated message  -> failed to save outputs: Failed to establish pod watch: unknown (get pods)" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:53Z" level=info msg="Marking workflow completed" namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:53Z" level=info msg="Checking daemoned children of " namespace=moo-de workflow=lovely-dragon
time="2020-03-26T15:29:53Z" level=info msg="Workflow update successful" namespace=moo-de phase=Error resourceVersion=223765430 workflow=lovely-dragon
time="2020-03-26T15:29:54Z" level=info msg="Labeled pod moo-de/lovely-dragon completed"


Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

Label: bug


All 10 comments

This happens because Argo uses the namespace's "default" ServiceAccount to authenticate the wait container, and the "default" ServiceAccount in the test-ns namespace does not have permission to get pods. You can verify this with:

$ kubectl auth can-i get pod --as=system:serviceaccount:test-ns:default
no
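
For completeness, the executor also needs watch and patch on pods (and get/watch on pods/log; see the Role below), so the same check can be repeated per verb. The "no" answers shown are what a stock default ServiceAccount is expected to return:

$ kubectl auth can-i watch pods --as=system:serviceaccount:test-ns:default
no
$ kubectl auth can-i patch pods --as=system:serviceaccount:test-ns:default
no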

From here you have at least a couple of options, depending on your security needs:

  1. Updated answer: You can bind this minimum-permissions role to the default service account in the namespace in which you are running the workflow; see the sketch after this list.

    Old answer: ~If you don't have any security needs (i.e. you trust everyone who will use Argo on your cluster) and you only want to do "basic" stuff within your containers, you can give the "default" service account in the test-ns namespace the same Role that Argo uses.~

  2. If you have more specific security needs, or you need your containers to have more kubectl access, you'll need to create your own Role, ServiceAccount, and RoleBinding to suit your needs. You can then pass your created ServiceAccount to either the controller or individual Workflows for them to use.
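
For option 1, a minimal sketch, assuming the minimum-permissions Role (linked in the update below) has already been created as workflow-role in the workflow's namespace (both names are assumptions):

# Bind the pre-created minimal Role to the namespace's default ServiceAccount,
# so workflows that don't set spec.serviceAccountName can still save outputs.
$ kubectl create rolebinding default-workflow-binding \
    --role=workflow-role \
    --serviceaccount=test-ns:default \
    -n test-ns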

Let me know if this wasn't clear or you have more questions.

Thank you very much for getting back on this :+1: In that case it sounds like a duplicate of another issue I've seen and misunderstood; sorry about that.

I'm struggling on one point: the namespace-install.yaml defines the argo-server ServiceAccount and binds it to the argo-server-role Role, which, as far as I can tell, has enough privileges to get pods.

I don't see why the argo-server would actually use the default service account here :thinking:

I am also struggling to see why I've been able to trigger Workflows and CronWorkflows through the UI so far :thinking: Trying to wrap my head around what you mentioned, I would have expected to lack the privileges to do so.

No worries at all! There are 3 ServiceAccounts relevant here:

  1. argo-server: which defines what the argo-server can access
  2. argo: which defines what the workflow-controller (the code that schedules and manages workflows) can access
  3. And container-level service accounts: which define what the specific container you have scheduled can access

The first two service accounts are defined so that the server and workflow-controller can work properly: they can create, read, update, and delete Workflow and Pod resources, among others, in order to properly schedule Workflows.

The third service account, which is the one relevant here, defines what the container you have created through a Workflow can do. You may want that container to have fewer or more permissions than 1. and 2., but that is only for you to define. Since Argo can't know for sure what permissions it should give the containers it creates, it uses the "default" service account to be safe. You can specify which service account to use at several levels: through the controller config, through a workflow-wide service account, or through a template-level service account (sketched below).
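
A rough sketch of the latter two levels (the workflow ServiceAccount name is an assumption; a template-level entry overrides the workflow-wide one for that template only):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sa-levels-
spec:
  serviceAccountName: workflow     # workflow-wide: default for every template
  entrypoint: whalesay
  templates:
  - name: whalesay
    serviceAccountName: workflow   # template-level override (optional)
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]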

With respect to being able to trigger Workflows and CronWorkflows: not every Workflow is created equal; some Workflows might not require the same permissions. The Workflow you submitted when creating this issue required the Pod to get its own outputs, which (based on how this is implemented) means it needs permission to get pods. Other Workflows might not need these permissions.

A little update: the minimal permissions that service account 3. needs to run correctly can be found here: https://github.com/argoproj/argo/blob/master/docs/workflow-rbac.md

Updated my original answer to reflect this.

This is a great answer man, very useful :pray: Going over the docs again, I now see this covered at: https://github.com/argoproj/argo/blob/master/docs/getting-started.md#3-configure-the-service-account-to-run-workflows.

Provisioning the necessary RBAC, I was able to get those workflows to run smoothly :+1: Below is a snippet for anyone interested:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: workflow
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-role
rules:
# pod get/watch is used to identify the container IDs of the current pod
# pod patch is used to annotate the step's outputs back to controller (e.g. artifact location)
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - patch
# logs get/watch are used to get the pods logs for script outputs, and for log archival
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workflow-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow-role
subjects:
- kind: ServiceAccount
  name: workflow

Later, when submitting the workflow, add spec.serviceAccountName:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: <wf-name>
spec:
  serviceAccountName: workflow
  entrypoint: <entrypoint-name>
  templates:
    ...
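
Submitting then works as usual; a quick usage example (the filename is an assumption):

# submit the workflow and watch it run to completion
$ argo submit my-workflow.yaml --watch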

Thanks again for the patience and the quick reply, you're the man :tada: Closing the issue now.

Thanks for reaching out @czardien! Feel free to ask if you have any more questions 🎉

Got the same issue and @czardien's manifest solved it. Thank you!

@czardien your example was really helpful. Would you mind adding it to the docs (or letting me add it)?

Hi @secretlifeof :wave: Sure, please go for it!

kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=argo:default --namespace=argo

This works for me.
