Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT
What happened:
Got an error when trying to use a script step in my workflow.
Name: scriptmm2nm
Namespace: default
ServiceAccount: default
Status: Failed
Message: child 'scriptmm2nm-2223370800' failed
Created: Thu Sep 06 10:46:47 +0800 (13 seconds ago)
Started: Thu Sep 06 10:46:47 +0800 (13 seconds ago)
Finished: Thu Sep 06 10:46:51 +0800 (9 seconds ago)
Duration: 4 seconds
STEP PODNAME DURATION MESSAGE
✖ scriptmm2nm child 'scriptmm2nm-2223370800' failed
└---⚠ script scriptmm2nm-2223370800 3s failed to save outputs: verify serviceaccount default:default has necessary privileges
What you expected to happen:
The script should run without error.
How to reproduce it (as minimally and precisely as possible):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: script
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: script
        template: script
  - name: script
    script:
      image: alpine:latest
      command: ["sh"]
      source: |
        echo test
    metadata:
      labels:
        workflowId: test
argo submit
Anything else we need to know?:
I even assigned the cluster-admin role to the default service account.
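For reference, granting cluster-admin to the default service account is usually done with a binding along these lines (the binding name here is illustrative, not necessarily the exact command used):

kubectl create clusterrolebinding default-cluster-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=default:default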
Environment:
$ argo version
argo: v2.1.1
BuildDate: 2018-05-29T20:38:37Z
GitCommit: ac241c95c13f08e868cd6f5ee32c9ce273e239ff
GitTreeState: clean
GitTag: v2.1.1
GoVersion: go1.9.3
Compiler: gc
Platform: darwin/amd64
$ kubectl version -o yaml
clientVersion:
buildDate: 2018-06-27T22:29:25Z
compiler: gc
gitCommit: 91e7b4fd31fcd3d5f436da26c980becec37ceefe
gitTreeState: clean
gitVersion: v1.11.0
goVersion: go1.10.3
major: "1"
minor: "11"
platform: darwin/amd64
serverVersion:
buildDate: 2018-08-02T23:42:40Z
compiler: gc
gitCommit: 9b635efce81582e1da13b35a7aa539c0ccb32987
gitTreeState: clean
gitVersion: v1.9.7-gke.5
goVersion: go1.9.3b4
major: "1"
minor: 9+
platform: linux/amd64
Any updates on this? I'm encountering the same issue on a brand new bare-metal Kubernetes (RKE) cluster. It looks like this issue might be related to #982.
As mentioned in #982, the following workaround works (on RKE):
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
I have the same issue, but it looks like
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
doesn't work
failed to save outputs: verify serviceaccount default:default has necessary privileges
This message is not always accurate. There are some assumptions being made in the controller that turn out not to always be related to service account privileges.
This error happens when the controller expects an output annotation from the workflow pod but does not see the pod's annotations updated with the output result. For example, the way a workflow pod communicates a script result back to the controller is that the wait sidecar annotates its own pod with the output result. When the controller sees the pod has completed but does not see the annotation, it assumes the annotation is missing because the pod lacked privileges (i.e. the serviceAccount the workflow ran as did not have get/update/patch permissions on pods).
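To see whether that annotation ever made it onto the pod, you can inspect the pod's annotations directly; a quick check along these lines (the outputs annotation key mentioned in the comment below the command is my best guess for this Argo version, so verify it against what you actually see):

# dump all annotations on the workflow pod and look for an outputs entry,
# e.g. workflows.argoproj.io/outputs (key name assumed for this version)
kubectl get pod <workflowpodname> -o jsonpath='{.metadata.annotations}'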
As I mentioned, this assumption is not always true, and there are actually other reasons why the annotation might not have been made. One reason that has come up twice so far, is because the wait container could not even communicate to the API server. So despite granting sufficient privileges to the workflow's service account, the wait sidecar still fails to annotate the output.
The way to know for sure is to get the logs of the wait sidecar:
kubectl logs <workflowpodname> -c wait
In a recent instance, this manifested in the following error (in wait sidecar) due to an issue with the user's CNI networking:
https://10.255.0.1:443/api/v1/namespaces/mynamespace/pods/my-workflow-v5qlt-4111318516: net/http: TLS handshake timeout
I think the error message should be improved to also point to API server access problems as a potential cause. For those here who are seeing the error verify serviceaccount default:default has necessary privileges, and are certain they gave their workflow adequate permissions, check the wait container logs to see what the issue really is.
Here is a set of minimal privileges needed by a workflow pod:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-role
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - patch
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
Issue https://github.com/argoproj/argo/issues/1072 has been filed to eliminate get secrets as a required rule.
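To make use of that Role, it also has to be bound to the service account the workflow runs as. Here is a minimal sketch, assuming the workflow runs as default:default in the default namespace (the binding name and namespace are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  # illustrative name; binds the workflow-role above to the workflow's service account
  name: workflow-role-binding
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow-role
subjects:
- kind: ServiceAccount
  name: default
  namespace: default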
check the wait container logs to see what the issue really is.
Thanks for this hint.
My issue was that a hostPath volume couldn't be mounted.
Will use this bug to improve the error message.
I saw a similar message arising from a pod running in a non-default namespace (call it mynamespace). In that case the error message was accurate, and the default service account in that namespace needed to be given the appropriate role. The Role and RoleBinding are given below (since Google brought me here, they might be useful to somebody else).
# Argo artifacts require the mynamespace default user to have appropriate privileges
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: artifact-role
  namespace: mynamespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: artifact-role-binding
  namespace: mynamespace
roleRef:
  kind: Role
  name: artifact-role
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: default
  namespace: mynamespace