Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT
What happened:
Got an error when trying to use a script step in my workflow.
Name: scriptmm2nm
Namespace: default
ServiceAccount: default
Status: Failed
Message: child 'scriptmm2nm-2223370800' failed
Created: Thu Sep 06 10:46:47 +0800 (13 seconds ago)
Started: Thu Sep 06 10:46:47 +0800 (13 seconds ago)
Finished: Thu Sep 06 10:46:51 +0800 (9 seconds ago)
Duration: 4 seconds
STEP PODNAME DURATION MESSAGE
✖ scriptmm2nm child 'scriptmm2nm-2223370800' failed
└---⚠ script scriptmm2nm-2223370800 3s failed to save outputs: verify serviceaccount default:default has necessary privileges
What you expected to happen:
The script should run without error.
How to reproduce it (as minimally and precisely as possible):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: script
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: script
        template: script
  - name: script
    script:
      image: alpine:latest
      command: ["sh"]
      source: |
        echo test
    metadata:
      labels:
        workflowId: test
argo submit
Anything else we need to know?:
I even assigned the cluster-admin role to the default service account.
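For reference, granting cluster-admin to the default service account is usually done with a binding along these lines (the binding name here is illustrative, not necessarily the exact command used):

kubectl create clusterrolebinding default-cluster-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=default:default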
Environment:
$ argo version
argo: v2.1.1
BuildDate: 2018-05-29T20:38:37Z
GitCommit: ac241c95c13f08e868cd6f5ee32c9ce273e239ff
GitTreeState: clean
GitTag: v2.1.1
GoVersion: go1.9.3
Compiler: gc
Platform: darwin/amd64
$ kubectl version -o yaml
clientVersion:
buildDate: 2018-06-27T22:29:25Z
compiler: gc
gitCommit: 91e7b4fd31fcd3d5f436da26c980becec37ceefe
gitTreeState: clean
gitVersion: v1.11.0
goVersion: go1.10.3
major: "1"
minor: "11"
platform: darwin/amd64
serverVersion:
buildDate: 2018-08-02T23:42:40Z
compiler: gc
gitCommit: 9b635efce81582e1da13b35a7aa539c0ccb32987
gitTreeState: clean
gitVersion: v1.9.7-gke.5
goVersion: go1.9.3b4
major: "1"
minor: 9+
platform: linux/amd64
Any updates on this? I'm encountering the same issue on a brand new bare-metal Kubernetes (RKE) cluster. It looks like this issue might be related to #982.
As mentioned in #982, the following workaround works (on RKE):
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
I have the same issue, but it looks like
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
doesn't work
failed to save outputs: verify serviceaccount default:default has necessary privileges
This message is not always accurate. There are some assumptions being made in the controller that turn out not to always be related to service account privileges.
This error happens when the controller expects an output annotation from the workflow pod but does not see the pod's annotations updated with the output result. For example, the way a workflow pod communicates a script result back to the controller is that the wait sidecar annotates its own pod with the output result. When the controller sees the pod has completed but does not see the annotation, it assumes the annotation is missing because the pod lacked privileges (i.e. the serviceAccount the workflow ran as did not have get/update/patch permissions on pods).
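To see whether that annotation ever made it onto the pod, you can inspect the pod's annotations directly; a quick check along these lines (the outputs annotation key mentioned in the comment below the command is my best guess for this Argo version, so verify it against what you actually see):

# dump all annotations on the workflow pod and look for an outputs entry,
# e.g. workflows.argoproj.io/outputs (key name assumed for this version)
kubectl get pod <workflowpodname> -o jsonpath='{.metadata.annotations}'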
As I mentioned, this assumption is not always true, and there are actually other reasons why the annotation might not have been made. One reason that has come up twice so far, is because the wait container could not even communicate to the API server. So despite granting sufficient privileges to the workflow's service account, the wait sidecar still fails to annotate the output.
The way to know for sure is to get the logs of the wait sidecar:
kubectl logs <workflowpodname> -c wait
In a recent instance, this manifested in the following error (in wait sidecar) due to an issue with the user's CNI networking:
https://10.255.0.1:443/api/v1/namespaces/mynamespace/pods/my-workflow-v5qlt-4111318516: net/http: TLS handshake timeout
I think the error message should be improved to also point to API server access problems as a potential cause. For those here who are seeing the error verify serviceaccount default:default has necessary privileges, and are certain they gave their workflow adequate permissions, check the wait container logs to see what the issue really is.
Here is a set of minimal privileges needed by a workflow pod:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-role
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - patch
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
Issue https://github.com/argoproj/argo/issues/1072 has been filed to eliminate get secrets as a required rule.
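To make use of that Role, it also has to be bound to the service account the workflow runs as. Here is a minimal sketch, assuming the workflow runs as default:default in the default namespace (the binding name and namespace are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  # illustrative name; binds the workflow-role above to the workflow's service account
  name: workflow-role-binding
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow-role
subjects:
- kind: ServiceAccount
  name: default
  namespace: default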
check the wait container logs to see what the issue really is.
Thanks for this hint.
My issue was that a hostPath volume couldn't be mounted.
Will use this bug to improve the error message.
I saw a similar message arising from a pod running in a non-default namespace (call it mynamespace). In that case the error message was accurate, and the default service account in that namespace needed to be given the appropriate role. The Role and RoleBinding are given below (since Google brought me here, they might be useful to somebody else).
# Argo artifacts require the mynamespace default user to have appropriate privileges
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: artifact-role
  namespace: mynamespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: artifact-role-binding
  namespace: mynamespace
roleRef:
  kind: Role
  name: artifact-role
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: default
  namespace: mynamespace