Pipelines: Document how to run KFP on a local Kubernetes cluster such as Kind, k3s, or minikube

Created on 22 Jul 2020  ·  14 Comments  ·  Source: kubeflow/pipelines

What steps did you take:

  1. Installed k3s (https://k3s.io/) and Kind (https://kind.sigs.k8s.io/)
  2. Installed KFP 1.0.0 (platform agnostic)
  3. Ran the data passing pipeline

What happened:

KFP servers run properly on both k3s and Kind, but pipelines fail to run.

For k3s, the error message was "This step is in Error state with this message: failed to save outputs: Error response from daemon: No such container: 40a0d047807b37e1cc4433fc957c4f89e8c2c93b151399d33f2b77e761dc3419"

For Kind, the error message was "MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file".
https://github.com/kubernetes-sigs/kind/issues/1002#issuecomment-545498176 is probably related.

What did you expect to happen:

Pipelines should run.

Environment:

Both k3s and Kind are Kubernetes 1.18.
How did you deploy Kubeflow Pipelines (KFP)?
standalone
KFP version: 1.0.0

/kind bug

area/docs area/deployment kind/bug status/triaged

All 14 comments

@Bobgy: The label(s) area/deployment cannot be applied, because the repository doesn't have them

/assign @Bobgy
/cc @rmgogogo
Did you manage to run KFP on Kind? I remember you mentioned something; was it related?

Complete error message for wait container in k3s:

time="2020-07-22T08:33:04Z" level=info msg="Starting Workflow Executor" version=v2.7.5+ede163e.dirty
time="2020-07-22T08:33:04Z" level=info msg="Creating a docker executor"
time="2020-07-22T08:33:04Z" level=info msg="Executor (version: v2.7.5+ede163e.dirty, build_date: 2020-04-21T01:12:08Z) initialized (pod: kubeflow/file-passing-pipelines-8g8k2-2724592582) with template:\n{\"name\":\"write-numbers\",\"arguments\":{},\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"write-numbers-numbers\",\"path\":\"/tmp/outputs/numbers/data\"}]},\"metadata\":{\"annotations\":{\"pipelines.kubeflow.org/component_ref\":\"{}\",\"pipelines.kubeflow.org/component_spec\":\"{\\\"implementation\\\": {\\\"container\\\": {\\\"args\\\": [{\\\"if\\\": {\\\"cond\\\": {\\\"isPresent\\\": \\\"start\\\"}, \\\"then\\\": [\\\"--start\\\", {\\\"inputValue\\\": \\\"start\\\"}]}}, {\\\"if\\\": {\\\"cond\\\": {\\\"isPresent\\\": \\\"count\\\"}, \\\"then\\\": [\\\"--count\\\", {\\\"inputValue\\\": \\\"count\\\"}]}}, \\\"--numbers\\\", {\\\"outputPath\\\": \\\"numbers\\\"}], \\\"command\\\": [\\\"python3\\\", \\\"-u\\\", \\\"-c\\\", \\\"def _make_parent_dirs_and_return_path(file_path: str):\\\\n    import os\\\\n    os.makedirs(os.path.dirname(file_path), exist_ok=True)\\\\n    return file_path\\\\n\\\\ndef write_numbers(numbers_path, start = 0, count = 10):\\\\n    with open(numbers_path, 'w') as writer:\\\\n        for i in range(start, count):\\\\n            writer.write(str(i) + '\\\\\\\\n')\\\\n\\\\nimport argparse\\\\n_parser = argparse.ArgumentParser(prog='Write numbers', description='')\\\\n_parser.add_argument(\\\\\\\"--start\\\\\\\", dest=\\\\\\\"start\\\\\\\", type=int, required=False, default=argparse.SUPPRESS)\\\\n_parser.add_argument(\\\\\\\"--count\\\\\\\", dest=\\\\\\\"count\\\\\\\", type=int, required=False, default=argparse.SUPPRESS)\\\\n_parser.add_argument(\\\\\\\"--numbers\\\\\\\", dest=\\\\\\\"numbers_path\\\\\\\", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)\\\\n_parsed_args = vars(_parser.parse_args())\\\\n\\\\n_outputs = write_numbers(**_parsed_args)\\\\n\\\"], \\\"image\\\": \\\"python:3.7\\\"}}, 
\\\"inputs\\\": [{\\\"default\\\": \\\"0\\\", \\\"name\\\": \\\"start\\\", \\\"optional\\\": true, \\\"type\\\": \\\"Integer\\\"}, {\\\"default\\\": \\\"10\\\", \\\"name\\\": \\\"count\\\", \\\"optional\\\": true, \\\"type\\\": \\\"Integer\\\"}], \\\"name\\\": \\\"Write numbers\\\", \\\"outputs\\\": [{\\\"name\\\": \\\"numbers\\\", \\\"type\\\": \\\"String\\\"}]}\",\"sidecar.istio.io/inject\":\"false\"},\"labels\":{\"pipelines.kubeflow.org/cache_enabled\":\"true\"}},\"container\":{\"name\":\"\",\"image\":\"python:3.7\",\"command\":[\"python3\",\"-u\",\"-c\",\"def _make_parent_dirs_and_return_path(file_path: str):\\n    import os\\n    os.makedirs(os.path.dirname(file_path), exist_ok=True)\\n    return file_path\\n\\ndef write_numbers(numbers_path, start = 0, count = 10):\\n    with open(numbers_path, 'w') as writer:\\n        for i in range(start, count):\\n            writer.write(str(i) + '\\\\n')\\n\\nimport argparse\\n_parser = argparse.ArgumentParser(prog='Write numbers', description='')\\n_parser.add_argument(\\\"--start\\\", dest=\\\"start\\\", type=int, required=False, default=argparse.SUPPRESS)\\n_parser.add_argument(\\\"--count\\\", dest=\\\"count\\\", type=int, required=False, default=argparse.SUPPRESS)\\n_parser.add_argument(\\\"--numbers\\\", dest=\\\"numbers_path\\\", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)\\n_parsed_args = vars(_parser.parse_args())\\n\\n_outputs = write_numbers(**_parsed_args)\\n\"],\"args\":[\"--count\",\"100000\",\"--numbers\",\"/tmp/outputs/numbers/data\"],\"resources\":{}},\"archiveLocation\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/file-passing-pipelines-8g8k2/file-passing-pipelines-8g8k2-2724592582\"}}}"
time="2020-07-22T08:33:04Z" level=info msg="Waiting on main container"
time="2020-07-22T08:33:16Z" level=info msg="main container started with container ID: a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8"
time="2020-07-22T08:33:16Z" level=info msg="Starting annotations monitor"
time="2020-07-22T08:33:17Z" level=info msg="docker wait a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8"
time="2020-07-22T08:33:17Z" level=info msg="Starting deadline monitor"
time="2020-07-22T08:33:20Z" level=error msg="`docker wait a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8` failed: Error response from daemon: No such container: a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8\n"
time="2020-07-22T08:33:20Z" level=warning msg="Failed to wait for container id 'a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8': Error response from daemon: No such container: a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8"
time="2020-07-22T08:33:20Z" level=error msg="executor error: Error response from daemon: No such container: a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.InternalError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:60\ngithub.com/argoproj/argo/workflow/common.RunCommand\n\t/go/src/github.com/argoproj/argo/workflow/common/util.go:406\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:139\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:829\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:292\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:828\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
time="2020-07-22T08:33:20Z" level=info msg="Saving logs"
time="2020-07-22T08:33:20Z" level=info msg="[docker logs a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8]"
time="2020-07-22T08:33:20Z" level=info msg="Annotations monitor stopped"
time="2020-07-22T08:33:21Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: artifacts/file-passing-pipelines-8g8k2/file-passing-pipelines-8g8k2-2724592582/main.log"
time="2020-07-22T08:33:21Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2020-07-22T08:33:21Z" level=info msg="Saving from /tmp/argo/outputs/logs/main.log to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/file-passing-pipelines-8g8k2/file-passing-pipelines-8g8k2-2724592582/main.log)"
time="2020-07-22T08:33:21Z" level=info msg="Deadline monitor stopped"
time="2020-07-22T08:33:21Z" level=info msg="No output parameters"
time="2020-07-22T08:33:21Z" level=info msg="Saving output artifacts"
time="2020-07-22T08:33:21Z" level=info msg="Staging artifact: write-numbers-numbers"
time="2020-07-22T08:33:21Z" level=info msg="Copying /tmp/outputs/numbers/data from container base image layer to /tmp/argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-07-22T08:33:21Z" level=info msg="Archiving a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8:/tmp/outputs/numbers/data to /tmp/argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-07-22T08:33:21Z" level=info msg="sh -c docker cp -a a3ccfc6f422eb82d86fe6aa2b982258368ac71b441ee45ab987f691574b0bea8:/tmp/outputs/numbers/data - | gzip > /tmp/argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-07-22T08:33:21Z" level=warning msg="path /tmp/outputs/numbers/data does not exist in archive /tmp/argo/outputs/artifacts/write-numbers-numbers.tgz"
time="2020-07-22T08:33:21Z" level=error msg="executor error: path /tmp/outputs/numbers/data does not exist in archive /tmp/argo/outputs/artifacts/write-numbers-numbers.tgz\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:67\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:347\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:240\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:226\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:59\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
time="2020-07-22T08:33:21Z" level=info msg="Killing sidecars"
time="2020-07-22T08:33:21Z" level=info msg="Alloc=4657 TotalAlloc=12098 Sys=70080 NumGC=4 Goroutines=13"
time="2020-07-22T08:33:21Z" level=fatal msg="path /tmp/outputs/numbers/data does not exist in archive /tmp/argo/outputs/artifacts/write-numbers-numbers.tgz\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:67\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:347\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:240\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:226\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:59\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"

I couldn't find much information on this; it only seems that Argo is not working properly.

Anyway, in both cases, I think the problem is that Argo cannot run in those environments.

I think the problem is the executor runtime. Both Kind and k3s use containerd, so you have to set the container runtime executor to pns in manifests/kustomize/base/argo/workflow-controller-configmap.yaml, as described in https://github.com/argoproj/argo/issues/2685: add containerRuntimeExecutor: pns after executorImage: gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance,
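
A minimal sketch of that change, assuming the ConfigMap is named workflow-controller-configmap and showing only the two relevant keys. The real file carries more settings and its layout may differ; if it uses a brace-and-comma style, keep the trailing commas.

```yaml
# Sketch only: other keys in the real config are omitted here.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap   # assumed name, taken from the file name
data:
  config: |
    executorImage: gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance
    containerRuntimeExecutor: pns
```

With this setting the wait container should no longer log "Creating a docker executor", which is the code path that fails when the node runs containerd instead of Docker.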

Right, thanks @alfsuse, I also just found this.
I verified KFP can run on k3s after the change.

Also verified on Kind

I've tested this on minikube as well and everything works fine.

Closing now, since the problem is solved.

Hold on, we should get this documented.

@Bobgy we have already done this many times. If you are okay with it, we can take this task and write the documentation for Kind, minikube, etc.
Let us know how to proceed.

@alfsuse That would be awesome!
Can you add related documentation to https://www.kubeflow.org/docs/pipelines/installation/standalone-deployment/?
(It currently only targets GCP. You can think about the best way to include the new info, maybe a separate doc that only covers how to install KFP on a local cluster.)

Also, I think we should provide the pns-executor manifest in manifests/kustomize/env/platform-agnostic-pns-executor to make it easy to install without local editing.
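
One way such an overlay could look, as a rough sketch rather than the actual manifest: a kustomization.yaml that reuses the existing platform-agnostic environment and applies a single patch. The patch file name is hypothetical, and the sketch assumes manifests/kustomize/env/platform-agnostic sits next to the new directory.

```yaml
# Hypothetical manifests/kustomize/env/platform-agnostic-pns-executor/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  # Assumed sibling directory containing the platform-agnostic install
  - ../platform-agnostic
patchesStrategicMerge:
  # Hypothetical patch that overrides data.config of the
  # workflow-controller-configmap to set containerRuntimeExecutor: pns
  - workflow-controller-configmap-patch.yaml
```

Because data.config is a single string value, a strategic merge patch replaces it as a whole, so the patch file would need to carry the complete config and not only the added key.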
