A user reported this problem in this thread:
https://groups.google.com/forum/#!topic/kubeflow-discuss/5Y_7lhoQLIo
The example is failing because it tries to mount the Docker socket via hostPath.
They are running this example:
https://github.com/kubeflow/pipelines/blob/master/samples/notebooks/Lightweight%20Python%20components%20-%20basics.ipynb
The pod spec is below. It shows that the pod is trying to mount the Docker socket. I'm guessing this is for Docker-in-Docker to build containers.
I'm not sure where this is coming from. The notebook example doesn't explicitly build containers, so I'm not sure why it would need Docker-in-Docker.
Do Kubeflow Pipelines always use Docker-in-Docker?
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: privileged
    workflows.argoproj.io/node-name: pipeline-flip-coin-xlkfl.flip
    workflows.argoproj.io/outputs: >-
      {"parameters":[{"name":"flip-output","value":"tails","valueFrom":{"path":"/tmp/output"}}],"artifacts":[{"name":"mlpipeline-ui-metadata","path":"/mlpipeline-ui-metadata.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-ui-metadata.tgz"}},{"name":"mlpipeline-metrics","path":"/mlpipeline-metrics.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-metrics.tgz"}}]}
    workflows.argoproj.io/template: >-
      {"name":"flip","inputs":{},"outputs":{"parameters":[{"name":"flip-output","valueFrom":{"path":"/tmp/output"}}],"artifacts":[{"name":"mlpipeline-ui-metadata","path":"/mlpipeline-ui-metadata.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-ui-metadata.tgz"}},{"name":"mlpipeline-metrics","path":"/mlpipeline-metrics.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-metrics.tgz"}}]},"metadata":{},"container":{"name":"","image":"python:alpine3.6","command":["sh","-c"],"args":["python
      -c \"import random; result = 'heads' if random.randint(0,1) == 0 else
      'tails'; print(result)\" | tee
      /tmp/output"],"resources":{}},"archiveLocation":{}}
  creationTimestamp: '2018-12-16T22:16:09Z'
  labels:
    workflows.argoproj.io/completed: 'true'
    workflows.argoproj.io/workflow: pipeline-flip-coin-xlkfl
  name: pipeline-flip-coin-xlkfl-3596557372
  namespace: kubeflow
  ownerReferences:
    - apiVersion: argoproj.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: Workflow
      name: pipeline-flip-coin-xlkfl
      uid: 30850dfb-0180-11e9-bd47-063a66a580a8
  resourceVersion: '14833825'
  selfLink: /api/v1/namespaces/kubeflow/pods/pipeline-flip-coin-xlkfl-3596557372
  uid: 309010c0-0180-11e9-ac4e-0abcca1e707a
spec:
  containers:
    - args:
        - >-
          python -c "import random; result = 'heads' if random.randint(0,1) == 0
          else 'tails'; print(result)" | tee /tmp/output
      command:
        - sh
        - '-c'
      image: 'python:alpine3.6'
      imagePullPolicy: IfNotPresent
      name: main
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: pipeline-runner-token-wffsv
          readOnly: true
    - args:
        - wait
      command:
        - argoexec
      env:
        - name: ARGO_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
      image: 'argoproj/argoexec:v2.2.1'
      imagePullPolicy: IfNotPresent
      name: wait
      resources: {}
      securityContext:
        privileged: false
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /argo/podmetadata
          name: podmetadata
        - mountPath: /var/lib/docker
          name: docker-lib
          readOnly: true
        - mountPath: /var/run/docker.sock
          name: docker-sock
          readOnly: true
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: pipeline-runner-token-wffsv
          readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
    - name: pipeline-runner-dockercfg-xpbn2
  nodeName: ip-10-0-48-147.us-east-2.compute.internal
  nodeSelector:
    node-role.kubernetes.io/compute: 'true'
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: pipeline-runner
  serviceAccountName: pipeline-runner
  terminationGracePeriodSeconds: 30
  volumes:
    - downwardAPI:
        defaultMode: 420
        items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.annotations
            path: annotations
      name: podmetadata
    - hostPath:
        path: /var/lib/docker
        type: Directory
      name: docker-lib
    - hostPath:
        path: /var/run/docker.sock
        type: Socket
      name: docker-sock
    - name: pipeline-runner-token-wffsv
      secret:
        defaultMode: 420
        secretName: pipeline-runner-token-wffsv
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2018-12-16T22:16:09Z'
      reason: PodCompleted
      status: 'True'
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: '2018-12-16T22:16:09Z'
      reason: PodCompleted
      status: 'False'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2018-12-16T22:16:09Z'
      status: 'True'
      type: PodScheduled
  containerStatuses:
    - containerID: >-
        docker://bc66e85bce78f14247b325b421ae321b1e5bc27c14fcab4b8c27d749f7690810
      image: 'docker.io/python:alpine3.6'
      imageID: >-
        docker-pullable://docker.io/python@sha256:766a961bf699491995cc29e20958ef11fd63741ff41dcc70ec34355b39d52971
      lastState: {}
      name: main
      ready: false
      restartCount: 0
      state:
        terminated:
          containerID: >-
            docker://bc66e85bce78f14247b325b421ae321b1e5bc27c14fcab4b8c27d749f7690810
          exitCode: 0
          finishedAt: '2018-12-16T22:16:15Z'
          reason: Completed
          startedAt: '2018-12-16T22:16:15Z'
    - containerID: >-
        docker://4dcbf5229f61a04b842281b01bc102789228c7519583c33c1c62ef2324a2830e
      image: 'docker.io/argoproj/argoexec:v2.2.1'
      imageID: >-
        docker-pullable://docker.io/argoproj/argoexec@sha256:9b12553aa7dccddc88c766d3dd59f4e8758acbd82ceef9e7aedc75f09934480a
      lastState: {}
      name: wait
      ready: false
      restartCount: 0
      state:
        terminated:
          containerID: >-
            docker://4dcbf5229f61a04b842281b01bc102789228c7519583c33c1c62ef2324a2830e
          exitCode: 0
          finishedAt: '2018-12-16T22:16:16Z'
          reason: Completed
          startedAt: '2018-12-16T22:16:16Z'
  hostIP: 10.0.48.147
  phase: Succeeded
  podIP: 10.129.2.12
  qosClass: BestEffort
  startTime: '2018-12-16T22:16:09Z'
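To see which containers in a spec like this one actually depend on the host, you can cross-reference `spec.volumes` with each container's `volumeMounts`. This is a minimal sketch that assumes the pod has already been loaded into a Python dict (for example from `kubectl get pod <name> -o json`). The helper name `host_path_mounts` is hypothetical, and the `pod` dict is a trimmed copy of the spec above.

```python
from typing import Dict, List, Tuple


def host_path_mounts(pod: dict) -> List[Tuple[str, str, str]]:
    """Return (container, mountPath, hostPath) for every hostPath mount in a pod dict."""
    spec = pod.get("spec", {})
    # Map volume name -> host path, keeping only hostPath volumes.
    host_paths: Dict[str, str] = {
        v["name"]: v["hostPath"]["path"]
        for v in spec.get("volumes", [])
        if "hostPath" in v
    }
    found = []
    for container in spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] in host_paths:
                found.append((container["name"], mount["mountPath"], host_paths[mount["name"]]))
    return found


# Trimmed copy of the spec above: only the fields the helper reads.
pod = {
    "spec": {
        "containers": [
            {"name": "main", "volumeMounts": [
                {"name": "pipeline-runner-token-wffsv",
                 "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"}]},
            {"name": "wait", "volumeMounts": [
                {"name": "podmetadata", "mountPath": "/argo/podmetadata"},
                {"name": "docker-lib", "mountPath": "/var/lib/docker"},
                {"name": "docker-sock", "mountPath": "/var/run/docker.sock"}]},
        ],
        "volumes": [
            {"name": "podmetadata", "downwardAPI": {}},
            {"name": "docker-lib", "hostPath": {"path": "/var/lib/docker", "type": "Directory"}},
            {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock", "type": "Socket"}},
            {"name": "pipeline-runner-token-wffsv", "secret": {}},
        ],
    }
}

for container, mount_path, host_path in host_path_mounts(pod):
    print(f"{container}: {mount_path} <- {host_path}")
```

In this spec, only the `wait` (argoexec) sidecar touches the host. The `main` container that runs the user's code mounts nothing but its service account token.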
Argo mounts the Docker socket so that it can use "docker cp" to copy artifacts out of a container.
https://github.com/argoproj/argo/blob/master/workflow/controller/workflowpod.go#L48
I think rejecting hostPath volumes is the default behavior on OpenShift. The user needs to relax the security context constraints explicitly: https://docs.okd.io/latest/admin_guide/manage_scc.html#use-the-hostpath-volume-plugin
Thanks @hongye-sun. Do Pipelines depend on this behavior of copying artifacts out with docker cp? Could Pipelines instead just use a volume (e.g. emptyDir) to share data between containers?
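The emptyDir idea would look roughly like this: both containers mount the same pod-scoped scratch volume, so a sidecar could read the outputs without any access to the node. This sketch only illustrates the idea as a Python dict. It is not how Argo collects outputs, and the volume name `outputs`, the helper `pod_with_shared_outputs`, and the placeholder `wait` command are all hypothetical.

```python
import json


def pod_with_shared_outputs(image: str, command: list, outputs_dir: str = "/tmp/outputs") -> dict:
    """Build a hypothetical Pod manifest in which `main` and `wait` share an emptyDir."""
    shared = {"name": "outputs", "mountPath": outputs_dir}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "shared-outputs-example"},
        "spec": {
            "restartPolicy": "Never",
            # Pod-scoped scratch space; nothing on the node is exposed.
            "volumes": [{"name": "outputs", "emptyDir": {}}],
            "containers": [
                {"name": "main", "image": image, "command": command,
                 "volumeMounts": [shared]},
                # Placeholder for an uploader; it only needs read access.
                {"name": "wait", "image": "busybox",
                 "command": ["sh", "-c", f"ls -l {outputs_dir}"],
                 "volumeMounts": [dict(shared, readOnly=True)]},
            ],
        },
    }


pod = pod_with_shared_outputs("python:alpine3.6", ["sh", "-c", "echo tails > /tmp/outputs/flip"])
print(json.dumps(pod, indent=2))  # could be piped into `kubectl apply -f -`
```

The catch is that a component can write outputs anywhere in the main container's filesystem. A shared volume presumably only helps if every output path is known in advance and sits under the mount.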
Making the docker socket available to the pod seems like an undesirable escalation of privileges.
/cc @ioandr @vkoukis @pdmack @jessesuen
Yes, we rely heavily on this behavior to get component outputs and upload pipeline artifacts. Currently, Argo doesn't support any other way to copy file content out of the main container. We might consider using the k8s API to copy the file content by implementing the copy methods in Argo's k8s API executor, but that would require non-trivial work.
Does this only affect OpenShift? From a web search, I don't see other providers (AWS and Azure) having similar issues.
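For reference, `kubectl cp` already copies files out of a container using only the Kubernetes API. It runs `tar cf - <path>` inside the container through the exec endpoint and unpacks the stream on the client. An API-based executor could presumably work the same way. The sketch below shows only the receiving half and assumes the tar bytes have already arrived from such an exec call. `read_file_from_tar` and `make_tar` are hypothetical helpers, and the in-memory archive stands in for the real stream.

```python
import io
import tarfile


def read_file_from_tar(tar_bytes: bytes, member_path: str) -> bytes:
    """Extract one file's content from an uncompressed tar stream."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
        # tar typically strips the leading "/" when archiving absolute paths.
        member = archive.getmember(member_path.lstrip("/"))  # KeyError if absent
        extracted = archive.extractfile(member)
        if extracted is None:  # not a regular file (e.g. a directory)
            raise ValueError(f"{member_path} is not a regular file")
        return extracted.read()


def make_tar(files: dict) -> bytes:
    """Build an in-memory tar archive standing in for `tar cf - ...` output."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


stream = make_tar({"tmp/output": b"tails\n"})
print(read_file_from_tar(stream, "/tmp/output"))
```

This approach needs a `tar` binary in the user's image and an exec permission for the executor's service account, but no host access.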
/cc @Ark-kun
This is a more relevant bug in Argo: https://github.com/argoproj/argo/issues/970
It looks like the Argo team is planning to take care of this.
This also breaks all workflows that are meant to run on a k8s cluster that doesn't use Docker. My current use case is running Argo inside k3s, which uses containerd as its container runtime.
We've now upgraded to Argo 2.3. AFAIK there are many improvements to different executors. Let's check whether switching the executor fixes the problem.
I'm running Kubeflow v0.6.2. Pipelines is still trying to mount hostPath:
Invalid value: "hostPath": hostPath volumes are not allowed to be used
What Kubernetes environment do you use? Does this Argo sample work for you? https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml
If you're using a Docker-less environment the first step would be to change Argo workflow controller configuration to non-Docker executor. See this thread: https://github.com/kubeflow/pipelines/issues/1654
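In Argo 2.x, the executor is selected by the top-level `containerRuntimeExecutor` field. The workflow controller reads it from the `config` key of the `workflow-controller-configmap` ConfigMap. `docker` is the default, and `pns`, `k8sapi` and `kubelet` are the alternatives, each with its own trade-offs. Overwriting the whole `config` key would also wipe settings such as the artifact repository, so this sketch edits only that one line. It assumes the ConfigMap name and the `kubeflow` namespace used by a standard KFP install. The helper `set_runtime_executor` and the trimmed `existing` config text are hypothetical.

```python
import json
import re


def set_runtime_executor(config_text: str, executor: str = "pns") -> str:
    """Set the top-level `containerRuntimeExecutor` key in an Argo config string.

    Only an unindented `containerRuntimeExecutor:` line is replaced, so the
    rest of the config is preserved; the key is appended if it is missing.
    """
    line = f"containerRuntimeExecutor: {executor}"
    pattern = re.compile(r"^containerRuntimeExecutor:.*$", re.MULTILINE)
    if pattern.search(config_text):
        return pattern.sub(lambda _m: line, config_text, count=1)
    prefix = config_text.rstrip("\n") + "\n" if config_text.strip() else ""
    return prefix + line + "\n"


# Hypothetical, trimmed copy of the current `config` value.
existing = """artifactRepository:
  s3:
    bucket: mlpipeline
containerRuntimeExecutor: docker
"""
patch = {"data": {"config": set_runtime_executor(existing)}}
print(json.dumps(patch))
```

You could then apply the output with something like `kubectl -n kubeflow patch configmap workflow-controller-configmap --type merge -p "$(python set_executor.py)"`, where `set_executor.py` is a hypothetical file name. If the controller does not pick up the change, restart the workflow-controller pod.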
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
Hi @Ark-kun, I just had a look at this and the referenced Argo issue. Is my assumption correct that this ticket is not solved yet?
We are currently deploying KFP 1.0 and it seems that hostPath volumes are still required:
This step is in Error state with this message: pods "conditional-execution-pipeline-with-exit-handler-tnpv5-1956183255" is forbidden: unable to validate against any pod security policy: [spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
We are using k8s 1.14 with docker.
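The PodSecurityPolicy error above comes from the policy's `volumes` allowlist. Any pod volume whose type is not listed is rejected and reported by its index in `spec.volumes`. The rough emulation below shows why an Argo-generated pod with a `docker-sock` hostPath volume cannot pass such a policy. The `allowed` set is a hypothetical example, not KFP's or any particular cluster's policy, and `psp_volume_errors` is a hypothetical helper whose message format mirrors the error quoted above.

```python
def psp_volume_errors(pod: dict, allowed: set) -> list:
    """List PSP-style errors for volumes whose type is not in `allowed` ("*" allows all)."""
    errors = []
    for i, volume in enumerate(pod.get("spec", {}).get("volumes", [])):
        # A volume has a "name" plus one source key such as "hostPath" or "secret".
        for kind in (k for k in volume if k != "name"):
            if "*" not in allowed and kind not in allowed:
                errors.append(
                    f'spec.volumes[{i}]: Invalid value: "{kind}": '
                    f"{kind} volumes are not allowed to be used"
                )
    return errors


# Hypothetical restrictive allowlist (no hostPath).
allowed = {"configMap", "emptyDir", "secret", "downwardAPI", "persistentVolumeClaim", "projected"}

# Volumes shaped like the Argo-generated pods in this thread.
argo_pod = {"spec": {"volumes": [
    {"name": "podmetadata", "downwardAPI": {}},
    {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock", "type": "Socket"}},
    {"name": "mlpipeline-minio-artifact", "secret": {"secretName": "mlpipeline-minio-artifact"}},
]}}

for error in psp_volume_errors(argo_pod, allowed):
    print(error)
```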
On the other hand, we were able to deploy Argo directly, and AFAIK only emptyDir was required. Argo even seems to offer an option for putting the logs on a specific persistent volume, but this is not fully verified. Please ignore this, I mixed it up with Airflow...
Thanks in advance!
/reopen
@Jeffwan: Reopened this issue.
The Argo version in v1.1 still has the issue. This blocks a use case on EKS: we cannot deploy Kubeflow Pipelines on EKS Fargate, because Fargate doesn't support hostPath yet.
I am running a local cluster using kind and getting the same error. Here is what I get when I describe my pod using kubectl describe pod file-passing-pipelines-cclzh-2358551148 -n kubeflow:
Name:           file-passing-pipelines-cclzh-2358551148
Namespace:      kubeflow
Priority:       0
Node:           kind-worker/172.19.0.2
Start Time:     Mon, 24 Aug 2020 17:44:08 +0900
Labels:         pipelines.kubeflow.org/cache_enabled=true
                pipelines.kubeflow.org/cache_id=
                pipelines.kubeflow.org/metadata_context_id=1
                pipelines.kubeflow.org/metadata_execution_id=3
                workflows.argoproj.io/completed=false
                workflows.argoproj.io/workflow=file-passing-pipelines-cclzh
Annotations:    pipelines.kubeflow.org/component_ref: {}
                pipelines.kubeflow.org/component_spec:
                  {"implementation": {"container": {"args": [{"if": {"cond": {"isPresent": "start"}, "then": ["--start", {"inputValue": "start"}]}}, {"if": ...
                pipelines.kubeflow.org/execution_cache_key: f6594b8f0728df187ec4f26083654d7b147e9e512c2a0bbeb11138846e028a60
                pipelines.kubeflow.org/metadata_input_artifact_ids: []
                sidecar.istio.io/inject: false
                workflows.argoproj.io/node-name: file-passing-pipelines-cclzh.write-numbers
                workflows.argoproj.io/template:
                  {"name":"write-numbers","arguments":{},"inputs":{},"outputs":{"artifacts":[{"name":"write-numbers-numbers","path":"/tmp/outputs/numbers/da...
Status:         Pending
IP:
IPs:            <none>
Controlled By:  Workflow/file-passing-pipelines-cclzh
Containers:
  wait:
    Container ID:
    Image:          gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      argoexec
      wait
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:  file-passing-pipelines-cclzh-2358551148 (v1:metadata.name)
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /argo/secret/mlpipeline-minio-artifact from mlpipeline-minio-artifact (ro)
      /var/run/docker.sock from docker-sock (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-vvz7g (ro)
  main:
    Container ID:
    Image:          python:3.7
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      python3
      -u
      -c
      def _make_parent_dirs_and_return_path(file_path: str):
          import os
          os.makedirs(os.path.dirname(file_path), exist_ok=True)
          return file_path
      def write_numbers(numbers_path, start = 0, count = 10):
          with open(numbers_path, 'w') as writer:
              for i in range(start, count):
                  writer.write(str(i) + '\n')
      import argparse
      _parser = argparse.ArgumentParser(prog='Write numbers', description='')
      _parser.add_argument("--start", dest="start", type=int, required=False, default=argparse.SUPPRESS)
      _parser.add_argument("--count", dest="count", type=int, required=False, default=argparse.SUPPRESS)
      _parser.add_argument("--numbers", dest="numbers_path", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)
      _parsed_args = vars(_parser.parse_args())
      _outputs = write_numbers(**_parsed_args)
    Args:
      --count
      100000
      --numbers
      /tmp/outputs/numbers/data
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-vvz7g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  docker-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  Socket
  mlpipeline-minio-artifact:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  mlpipeline-minio-artifact
    Optional:    false
  pipeline-runner-token-vvz7g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pipeline-runner-token-vvz7g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                  From                  Message
  ----     ------       ----                 ----                  -------
  Normal   Scheduled    53m                  default-scheduler     Successfully assigned kubeflow/file-passing-pipelines-cclzh-2358551148 to kind-worker
  Warning  FailedMount  47m                  kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[mlpipeline-minio-artifact pipeline-runner-token-vvz7g podmetadata docker-sock]: timed out waiting for the condition
  Warning  FailedMount  36m (x2 over 49m)    kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[pipeline-runner-token-vvz7g podmetadata docker-sock mlpipeline-minio-artifact]: timed out waiting for the condition
  Warning  FailedMount  32m (x2 over 45m)    kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[docker-sock mlpipeline-minio-artifact pipeline-runner-token-vvz7g podmetadata]: timed out waiting for the condition
  Warning  FailedMount  8m7s (x11 over 51m)  kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[podmetadata docker-sock mlpipeline-minio-artifact pipeline-runner-token-vvz7g]: timed out waiting for the condition
  Warning  FailedMount  2m24s (x33 over 53m) kubelet, kind-worker  MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
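The last event is the kubelet's `HostPathType: Socket` check failing. kind nodes run containerd rather than Docker, so there is no Docker socket at `/var/run/docker.sock`. You can reproduce the same condition on a node with a few lines of Python. This only illustrates what the kubelet tests; it is not the kubelet's code. `is_socket` is a hypothetical helper, and `/run/containerd/containerd.sock` is assumed to be containerd's usual default socket path.

```python
import os
import stat


def is_socket(path: str) -> bool:
    """True if `path` exists and is a UNIX domain socket (symlinks are followed)."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing or unreadable paths count as "not a socket".
        return False


if __name__ == "__main__":
    print("docker.sock:", is_socket("/var/run/docker.sock"))
    print("containerd.sock:", is_socket("/run/containerd/containerd.sock"))
```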
I was able to get KFP working on kind, thanks to the comments here: https://github.com/kubeflow/pipelines/issues/4256
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think I've run into this issue as well with Kubeflow 1.2 on Kubernetes 1.20 using containerd. Given the announced deprecation of dockershim, it might be a good idea to switch the on-prem kfdef to use pns for the containerRuntimeExecutor.
https://github.com/kubeflow/pipelines/issues/1654#issuecomment-747183561