Pipelines: failed to save outputs: Error response from daemon: No such container

Created on 7 Jun 2019 · 12 Comments · Source: kubeflow/pipelines

While trying to set up my own Kubeflow pipeline, I ran into a problem that occurs when a step finishes and its outputs should be saved. After the step finishes, Kubeflow always throws an error with the message: "This step is in Error state with this message: failed to save outputs: Error response from daemon: No such container: <container-id>"

At first I thought I had made a mistake in my pipeline, but the same happens with the preexisting example pipelines. For example, with "[Sample] Basic - Conditional execution" I get this message after the first step (flip-coin) finishes.

The main container shows the following output:


So it seems to have run successfully.

The wait container shows the following output:

time="2019-06-07T11:41:35Z" level=info msg="Creating a docker executor"
time="2019-06-07T11:41:35Z" level=info msg="Executor (version: v2.2.0, build_date: 2018-08-30T08:52:54Z) initialized with template:\narchiveLocation:\n  s3:\n    accessKeySecret:\n      key: accesskey\n      name: mlpipeline-minio-artifact\n    bucket: mlpipeline\n    endpoint: minio-service.kubeflow:9000\n    insecure: true\n    key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666\n    secretKeySecret:\n      key: secretkey\n      name: mlpipeline-minio-artifact\ncontainer:\n  args:\n  - python -c \"import random; result = 'heads' if random.randint(0,1) == 0 else 'tails';\n    print(result)\" | tee /tmp/output\n  command:\n  - sh\n  - -c\n  image: python:alpine3.6\n  name: \"\"\n  resources: {}\ninputs: {}\nmetadata: {}\nname: flip-coin\noutputs:\n  artifacts:\n  - name: mlpipeline-ui-metadata\n    path: /mlpipeline-ui-metadata.json\n  - name: mlpipeline-metrics\n    path: /mlpipeline-metrics.json\n  parameters:\n  - name: flip-coin-output\n    valueFrom:\n      path: /tmp/output\n"
time="2019-06-07T11:41:35Z" level=info msg="Waiting on main container"
time="2019-06-07T11:41:36Z" level=info msg="main container started with container ID: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c"
time="2019-06-07T11:41:36Z" level=info msg="Starting annotations monitor"
time="2019-06-07T11:41:36Z" level=info msg="docker wait 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c"
time="2019-06-07T11:41:36Z" level=info msg="Starting deadline monitor"
time="2019-06-07T11:41:37Z" level=error msg="`docker wait 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c` failed: Error response from daemon: No such container: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c\n"
time="2019-06-07T11:41:37Z" level=info msg="Main container completed"
time="2019-06-07T11:41:37Z" level=info msg="No sidecars"
time="2019-06-07T11:41:37Z" level=info msg="Saving output artifacts"
time="2019-06-07T11:41:37Z" level=info msg="Annotations monitor stopped"
time="2019-06-07T11:41:37Z" level=info msg="Saving artifact: mlpipeline-ui-metadata"
time="2019-06-07T11:41:37Z" level=info msg="Archiving 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-ui-metadata.json to /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-ui-metadata.json - | gzip > /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Archiving completed"
time="2019-06-07T11:41:37Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2019-06-07T11:41:37Z" level=info msg="Saving from /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666/mlpipeline-ui-metadata.tgz)"
time="2019-06-07T11:41:37Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Saving artifact: mlpipeline-metrics"
time="2019-06-07T11:41:37Z" level=info msg="Archiving 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-metrics.json to /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-metrics.json - | gzip > /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Archiving completed"
time="2019-06-07T11:41:37Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2019-06-07T11:41:37Z" level=info msg="Saving from /argo/outputs/artifacts/mlpipeline-metrics.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666/mlpipeline-metrics.tgz)"
time="2019-06-07T11:41:37Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Saving output parameters"
time="2019-06-07T11:41:37Z" level=info msg="Saving path output parameter: flip-coin-output"
time="2019-06-07T11:41:37Z" level=info msg="[sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output - | tar -ax -O]"
time="2019-06-07T11:41:37Z" level=error msg="`[sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output - | tar -ax -O]` stderr:\nError: No such container:path: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output\ntar: This does not look like a tar archive\ntar: Exiting with failure status due to previous errors\n"
time="2019-06-07T11:41:37Z" level=info msg="Alloc=4338 TotalAlloc=11911 Sys=10598 NumGC=4 Goroutines=11"
time="2019-06-07T11:41:37Z" level=fatal msg="exit status 2\ngithub.com/argoproj/argo/errors.Wrap\n\t/root/go/src/github.com/argoproj/argo/errors/errors.go:87\ngithub.com/argoproj/argo/errors.InternalWrapError\n\t/root/go/src/github.com/argoproj/argo/errors/errors.go:70\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).GetFileContents\n\t/root/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:40\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveParameters\n\t/root/go/src/github.com/argoproj/argo/workflow/executor/executor.go:343\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:49\ngithub.com/argoproj/argo/cmd/argoexec/commands.glob..func4\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:19\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:766\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:15\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:198\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361"
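
In a log this long, the failing steps are easy to miss among the info lines. The following is a minimal sketch, assuming the `time=... level=... msg="..."` line layout seen above, that keeps only the error and fatal entries. The helper name `failing_entries` and the shortened container ID `7e30` in the sample are illustrative, not part of Argo.

```python
import re

# Matches the level and the quoted message, allowing escaped characters
# such as \" and \n inside the message.
LINE_RE = re.compile(r'level=(\w+) msg="((?:[^"\\]|\\.)*)"')


def failing_entries(log_text):
    """Return (level, message) pairs for error and fatal log lines."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(1) in ("error", "fatal"):
            entries.append((match.group(1), match.group(2)))
    return entries


# Two lines modelled on the wait container log; the ID is abbreviated.
sample = "\n".join([
    'time="2019-06-07T11:41:36Z" level=info msg="Starting deadline monitor"',
    'time="2019-06-07T11:41:37Z" level=error msg="`docker wait 7e30` failed: '
    'Error response from daemon: No such container: 7e30\\n"',
])

for level, message in failing_entries(sample):
    print(level, message)
```

Applied to the full log, this would surface the failed `docker wait`, the failed `docker cp` of /tmp/output, and the final fatal stack trace.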

So it seems that there is a problem with either Kubeflow or my Docker daemon. The output of kubectl describe pods for the created pod is as follows:

Name:               conditional-execution-pipeline-vmdhx-2104306666
Namespace:          kubeflow
Priority:           0
PriorityClassName:  <none>
Node:               root-nuc8i5beh/9.233.5.90
Start Time:         Fri, 07 Jun 2019 13:41:29 +0200
Labels:             workflows.argoproj.io/completed=true
                    workflows.argoproj.io/workflow=conditional-execution-pipeline-vmdhx
Annotations:        workflows.argoproj.io/node-message:
                      Error response from daemon: No such container: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c
                    workflows.argoproj.io/node-name: conditional-execution-pipeline-vmdhx.flip-coin
                    workflows.argoproj.io/template:
                      {"name":"flip-coin","inputs":{},"outputs":{"parameters":[{"name":"flip-coin-output","valueFrom":{"path":"/tmp/output"}}],"artifacts":[{"na...
Status:             Failed
IP:                 10.1.1.30
Controlled By:      Workflow/conditional-execution-pipeline-vmdhx
Containers:
  main:
    Container ID:  containerd://7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c
    Image:         python:alpine3.6
    Image ID:      docker.io/library/python@sha256:766a961bf699491995cc29e20958ef11fd63741ff41dcc70ec34355b39d52971
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
    Args:
      python -c "import random; result = 'heads' if random.randint(0,1) == 0 else 'tails'; print(result)" | tee /tmp/output
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 07 Jun 2019 13:41:35 +0200
      Finished:     Fri, 07 Jun 2019 13:41:35 +0200
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-xh2p7 (ro)
  wait:
    Container ID:  containerd://f0449dc70c0a651c09aeb883edda9ce0ec5e415fa15a5468fe5b360fb06637c2
    Image:         argoproj/argoexec:v2.2.0
    Image ID:      docker.io/argoproj/argoexec@sha256:eea81e0b0d8899a0b7f9815c9c7bd89afa73ab32e5238430de82342b3bb7674a
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
    Args:
      wait
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 07 Jun 2019 13:41:35 +0200
      Finished:     Fri, 07 Jun 2019 13:41:37 +0200
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:  conditional-execution-pipeline-vmdhx-2104306666 (v1:metadata.name)
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /var/lib/docker from docker-lib (ro)
      /var/run/docker.sock from docker-sock (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-xh2p7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  docker-lib:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker
    HostPathType:  Directory
  docker-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  Socket
  pipeline-runner-token-xh2p7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pipeline-runner-token-xh2p7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age    From                     Message
  ----    ------     ----   ----                     -------
  Normal  Scheduled  8m1s   default-scheduler        Successfully assigned kubeflow/conditional-execution-pipeline-vmdhx-2104306666 to root-nuc8i5beh
  Normal  Pulling    8m1s   kubelet, root-nuc8i5beh  Pulling image "python:alpine3.6"
  Normal  Pulled     7m56s  kubelet, root-nuc8i5beh  Successfully pulled image "python:alpine3.6"
  Normal  Created    7m56s  kubelet, root-nuc8i5beh  Created container main
  Normal  Started    7m55s  kubelet, root-nuc8i5beh  Started container main
  Normal  Pulled     7m55s  kubelet, root-nuc8i5beh  Container image "argoproj/argoexec:v2.2.0" already present on machine
  Normal  Created    7m55s  kubelet, root-nuc8i5beh  Created container wait
  Normal  Started    7m55s  kubelet, root-nuc8i5beh  Started container wait

So is there perhaps a problem with the argoexec container image? I see that it tries to mount /var/run/docker.sock. When I try to read this file with cat, I get "No such device or address", even though I can see the file with ls /var/run. When I try to open it with vi, it reports that permission was denied, so I cannot see inside the file. Is this the usual behavior for this file, or does it point to a problem?
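
On the question about reading /var/run/docker.sock: a Unix domain socket is not a regular file, so a plain open (which is roughly what cat does) is expected to fail even when the socket is healthy. A minimal sketch, assuming a Linux host, that checks the file type instead of reading the file; the helper names `is_unix_socket` and `probe_demo_socket` are illustrative, and the demonstration uses a throwaway socket rather than the real Docker socket.

```python
import os
import socket
import stat
import tempfile


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def probe_demo_socket():
    """Create a throwaway socket and report (is_socket, plain_open_failed)."""
    workdir = tempfile.mkdtemp()
    sock_path = os.path.join(workdir, "demo.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(sock_path)
        looks_like_socket = is_unix_socket(sock_path)
        try:
            with open(sock_path, "rb") as handle:  # roughly what `cat` attempts
                handle.read()
            plain_open_failed = False
        except OSError:
            plain_open_failed = True
        return looks_like_socket, plain_open_failed
    finally:
        server.close()
        if os.path.exists(sock_path):
            os.unlink(sock_path)
        os.rmdir(workdir)


print(probe_demo_socket())
print(is_unix_socket("/var/run/docker.sock"))
```

If the last line prints True on the node, the file itself is probably fine, and the failure is more likely about which daemon, if any, is listening behind it.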

I would really appreciate any help I can get! Thank you guys!

All 12 comments

What is your environment? Are you using GKE?
Is it reproducible on your side?
Can you try Argo's coin flip sample?

I'm having the exact same problem with all of the Basic Samples. I'm running Kubeflow on top of microk8s on a local machine.

Every time I try to run one of the samples I get: This step is in Error state with this message: failed to save outputs: Error response from daemon: No such container

And my output of kubectl describe pods is the same as the one above.

Yes, I am running Kubeflow on top of microk8s as well. It doesn't work with the Flip Coin example either; I get the same error. So it's probably related to issue 2347, as you mentioned. However, the suggested "dirty fix" is not working for me, because there is no /var/snap/microk8s/current/docker.sock that I could link /var/run/docker.sock to (probably because they replaced the Docker daemon with containerd?). Any other ideas on how to get it working? Or do I have to downgrade my microk8s?

I'm finding that I don't have /var/snap/microk8s/current/docker.sock or /var/snap/microk8s/common/var/lib/docker.

I have noticed that when I begin a new run, a new snapshot is created under containerd with a docker.sock and a lib/docker.

Finding docker.sock
sudo find /var/snap/microk8s -name "docker.sock"
returns...
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2733/fs/run/docker.sock
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2730/fs/run/docker.sock
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2727/fs/run/docker.sock

Finding lib/docker
sudo find /var/snap/microk8s -name "docker" -type d | grep "lib/docker"
returns...
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2733/fs/var/lib/docker
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2730/fs/var/lib/docker
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2727/fs/var/lib/docker

@magreenberg1 Could you solve the issue?

@PascalSchroederDE I have not. I suspect the short-term fix for me will either involve downgrading microk8s (and seeing if that works) or trying out MiniKF.

Switching to Minikube and setting up kubeflow pipelines on that Minikube cluster worked for me.

Would downgrading microk8s solve this issue? I can see that the code in the container is executed, so the problem seems to be related to how containerd handles the containers.
I tried a single-container pipeline, i.e. only one job. It ran, but ended with a message similar to the one in this issue. Maybe the containerd daemon is pointing somewhere else when it searches for containers?

Can anyone fill me in on this?

I know this is several months old but FWIW, with microk8s v1.15.3 and kubeflow v0.6, I solved this issue by changing the kubelet container-runtime from remote to docker by editing /var/snap/microk8s/current/args/kubelet:

#--container-runtime=remote
#--container-runtime-endpoint=${SNAP_COMMON}/run/containerd.sock
--container-runtime=docker
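
The edit above can also be expressed as a small text transformation, which may help when the change has to be repeated on several nodes. This is a minimal sketch that works on the file contents as a string; the function name `switch_kubelet_to_docker` and the placeholder flag `--some-other-flag=value` are hypothetical. Writing the result back to /var/snap/microk8s/current/args/kubelet and restarting the kubelet are left out, and the restart step is an assumption, since the comment does not spell it out.

```python
def switch_kubelet_to_docker(args_text):
    """Comment out the runtime flags and add --container-runtime=docker."""
    result = []
    for line in args_text.splitlines():
        stripped = line.strip()
        if stripped == "--container-runtime=docker":
            continue  # dropped here, re-added once at the end
        if stripped.startswith(("--container-runtime=",
                                "--container-runtime-endpoint=")):
            result.append("#" + stripped)  # keep the old flag as a comment
        else:
            result.append(line)
    result.append("--container-runtime=docker")
    return "\n".join(result) + "\n"


before = "\n".join([
    "--container-runtime=remote",
    "--container-runtime-endpoint=${SNAP_COMMON}/run/containerd.sock",
    "--some-other-flag=value",  # hypothetical placeholder for unrelated flags
])

print(switch_kubelet_to_docker(before))
```

Running the function on its own output leaves the text unchanged, so applying it twice should be harmless.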

I solved this issue by changing the kubelet container-runtime from remote to docker by editing /var/snap/microk8s/current/args/kubelet :

Switching Argo to non-Docker executor is probably needed for non-Docker environments. There are several issues discussing it.
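
The mismatch is visible in the pod description from the original post: the main container's ID starts with containerd://, while the wait container logs "Creating a docker executor" and then asks the Docker daemon about a container that Docker never created. A minimal sketch of that check, assuming the usual `<runtime>://<id>` form of Kubernetes container IDs; the helper names are illustrative.

```python
def runtime_of(container_id):
    """Return the runtime prefix of a container ID, or None if it has none."""
    scheme, separator, _ = container_id.partition("://")
    return scheme if separator else None


def docker_executor_can_work(container_id):
    """The Docker executor can only find containers that Docker created."""
    return runtime_of(container_id) == "docker"


# Container ID of the main container, copied from the kubectl describe output.
main_id = ("containerd://"
           "7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c")

print(runtime_of(main_id))
print(docker_executor_can_work(main_id))
```

Under this reading, either the node runs containers through Docker, as in the kubelet change above, or Argo is configured with an executor that does not depend on the Docker daemon.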

I know this is several months old but FWIW, with microk8s v1.15.3 and kubeflow v0.6, I solved this issue by changing the kubelet container-runtime from remote to docker by editing /var/snap/microk8s/current/args/kubelet:

```
#--container-runtime=remote
#--container-runtime-endpoint=${SNAP_COMMON}/run/containerd.sock
--container-runtime=docker
```

Yes. Absolutely when I changed --container-runtime=docker (from remote) everything started working. Thanks for the suggestion.
