Pipelines: Setting ipc=host on docker run command

Created on 7 Jul 2020 · 9 Comments · Source: kubeflow/pipelines

Hi, sorry if this is not the correct place to ask but I have been unable to find a solution anywhere.
When using pytorch with docker it's useful to add --ipc=host in order to have multithreaded loaders. More info here. Is there any way to add such a flag with Kubeflow Pipelines?

kind/feature status/triaged upstream_issue

patrickpoirson

👍1

All 9 comments

@patrickpoirson Which KFP component are you using to launch PyTorch? You should probably ask there.

Or, if you are writing your own component, you are free to add any argument to it.

Bobgy on 7 Jul 2020

Thanks for your reply. I am writing my own component. I store my PyTorch training code in a Docker image and use kfp.dsl.ContainerOp to wrap it. Is there a way to pass --ipc=host through kfp.dsl.ContainerOp?

patrickpoirson on 7 Jul 2020

@patrickpoirson Yes, you can pass command and arguments there just like in docker: https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.ContainerOp

Bobgy on 7 Jul 2020

Sorry, I may be confused but the command and arguments can be passed to docker but what I need is to pass a flag that affects the actual docker run command e.g. docker run image [command] [arguments] changed to docker run --ipc=host image [command] [arguments]

I believe in standard kubernetes this is done by setting hostIPC to true, which looks to be not possible in kfp?

patrickpoirson on 7 Jul 2020
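
For readers unfamiliar with the hostIPC field mentioned above, here is a minimal sketch of where it sits in a plain Kubernetes Pod manifest, built as a Python dictionary and printed as JSON. The pod name and image are hypothetical placeholders, and nothing here goes through KFP; it only illustrates the pod-level setting that roughly corresponds to docker run --ipc=host. Many clusters restrict hostIPC through their pod security settings, so it may not be usable even where it can be expressed.

```python
import json


def pod_with_host_ipc(name: str, image: str) -> dict:
    """Build a minimal Pod manifest that shares the host's IPC namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # hostIPC is a pod-level field (not per container); it is the
            # rough Kubernetes counterpart of `docker run --ipc=host`.
            "hostIPC": True,
            "restartPolicy": "Never",
            "containers": [{"name": "main", "image": image}],
        },
    }


# Hypothetical name and image, used only for illustration.
manifest = pod_with_host_ipc("pytorch-train", "example.com/pytorch-train:latest")
print(json.dumps(manifest, indent=2))
```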

> Sorry, I may be confused but the command and arguments can be passed to docker but what I need is to pass a flag that affects the actual docker run command e.g. docker run image [command] [arguments] changed to docker run --ipc=host image [command] [arguments]
>
> I believe in standard kubernetes this is done by setting hostIPC to true, which looks to be not possible in kfp?

KFP runs on top of a Kubernetes cluster. KFP does not create a cluster or run docker.

> which looks to be not possible in kfp?

This option does not seem to be supported by Argo, which we use for orchestration. You can open an issue in their repo asking them to support this feature.

> When using pytorch with docker it's useful to add --ipc=host in order to have multithreaded loaders.

Interesting. Have you verified that you're going to benefit from that option when running the loaders?

> I am writing my own component. I store my pytorch training code in a docker container and use kfp.dsl.ContainerOp to wrap the docker container.

Please write a reusable component instead of creating ContainerOp objects yourself. Please read the documentation and check our library of components to get inspiration.
https://www.kubeflow.org/docs/pipelines/sdk/component-development/
https://github.com/kubeflow/pipelines/tree/master/components/XGBoost/Train/from_ApacheParquet
https://github.com/kubeflow/pipelines/blob/master/components/datasets/Chicago_Taxi_Trips/component.yaml

Ark-kun on 7 Jul 2020

> > When using pytorch with docker it's useful to add --ipc=host in order to have multithreaded loaders.
>
> Interesting. Have you verified that you're going to benefit from that option when running the loaders?

@Ark-kun From PyTorch: "Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run."

Training without multithreaded data loading will be significantly slower.

patrickpoirson on 8 Jul 2020
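
One way to check whether a container is affected by the shared-memory limit described above is to look at the size of /dev/shm from inside it. The sketch below uses only the Python standard library; the helper name shm_report and the 1 GiB threshold are illustrative assumptions, not values recommended by PyTorch or KFP. Docker's default --shm-size is 64 MiB, which gives a sense of what an unmodified container typically reports.

```python
import os
import shutil

# Docker's default --shm-size, for comparison with what the container reports.
DOCKER_DEFAULT_SHM_BYTES = 64 * 1024 * 1024


def shm_report(path: str = "/dev/shm", min_bytes: int = 1024 ** 3) -> dict:
    """Report the size of the filesystem at `path` and whether it meets `min_bytes`.

    The 1 GiB default threshold is an arbitrary illustration; the amount a
    training job needs depends on batch size and number of loader workers.
    """
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "large_enough": usage.total >= min_bytes,
    }


if os.path.isdir("/dev/shm"):
    report = shm_report()
    print(report)
    if report["total_bytes"] <= DOCKER_DEFAULT_SHM_BYTES:
        print("Shared memory is at or below Docker's 64 MiB default.")
```

Running this as the first step of a training container makes it easy to tell whether a data loader crash is likely to be a shared-memory problem before digging further.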

As a workaround, you might be able to use https://kubernetes.io/docs/concepts/workloads/pods/podpreset/

Ark-kun on 8 Jul 2020

@patrickpoirson Did you find a fix for this? I just switched over to PyTorch's DataLoader and am running into the same issue.

mmwebster on 26 Aug 2020

@mmwebster Yes, we found the following workaround:
    volume = kfp.dsl.PipelineVolume(name='shm-vol', empty_dir={'medium': 'Memory'})
    kfp.dsl.ContainerOp(...., pvolumes={'/dev/shm': volume})

patrickpoirson on 27 Aug 2020

👍3 🎉1
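
To make the workaround easier to picture at the Kubernetes level, here is a minimal sketch of the pod spec fragment it is expected to correspond to: a memory-backed emptyDir volume mounted at /dev/shm. It is written with plain Python dictionaries rather than the KFP SDK, so the helper name with_memory_shm, the image, and the 2Gi size limit are all hypothetical. The exact manifest KFP and Argo generate may differ in detail.

```python
import copy
import json
from typing import Optional


def with_memory_shm(pod_spec: dict, volume_name: str = "shm-vol",
                    size_limit: Optional[str] = None) -> dict:
    """Return a copy of `pod_spec` with a memory-backed emptyDir at /dev/shm."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched

    # emptyDir with medium "Memory" is backed by tmpfs (RAM) instead of disk.
    empty_dir = {"medium": "Memory"}
    if size_limit is not None:
        empty_dir["sizeLimit"] = size_limit  # optional cap on the volume size
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "emptyDir": empty_dir}
    )

    # Mount the volume over /dev/shm in every container of the pod.
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": "/dev/shm"}
        )
    return spec


# Hypothetical image and size limit, used only for illustration.
base = {"containers": [{"name": "main", "image": "example.com/pytorch-train:latest"}]}
patched = with_memory_shm(base, size_limit="2Gi")
print(json.dumps(patched, indent=2))
```

Unlike --ipc=host, this keeps the pod's IPC namespace isolated and only enlarges its shared-memory filesystem. Data written to a memory-backed emptyDir counts against the container's memory limit, so the limit may need to be raised accordingly.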
