Pipelines: An error occurs when running the TFX example in a local Kubeflow cluster

Created on 18 Jan 2019 · 10 Comments · Source: kubeflow/pipelines

Because I am unable to access Google, I want to run the pipeline examples in my local Kubeflow cluster.

I followed the guide to run pipeline examples.
https://www.kubeflow.org/docs/guides/pipelines/pipelines-quickstart/

and I succeeded in running the basic pipeline.

[image]

But when I ran the ML pipeline example (KubeFlow pipeline using TFX OSS components), there was an error.

Because I want to run this example in my local environment, I replaced the gs:// paths with my local directories like this (I downloaded these files in advance):
[image]

Here is the error; it seems that it cannot find these local files.

[image]

So I ran this code:
[image]

It works.

So I think the files are in the right place, and I need to do something else to run this example.
My tentative guess is that I should replace the "GcsUri", but I don't know what to replace it with.
[image]
I hope someone can help me; I would really appreciate it!

In short, I have two questions:

(1) Can the ML pipeline examples run in a local Kubeflow environment (without GCP)?
(2) If so, how should I modify the code to use local files?

All 10 comments

ML pipeline is a cross-platform product and can certainly be deployed and run in local environments. However, the files are not detected because each component is a containerized operator. In other words, the code can only access files inside the container.
There are two options:
1) Copy your files to some place (your own cloud, a file system) that the Kubernetes container has access to,
or
2) build a new image containing your local files.

It's a known issue that most of the current ML components assume the input paths are GCS paths. We are working on a solution to pass artifacts to containers in a cloud-provider-agnostic way.

Though it's not tested yet, the container code should be able to work with a local path if the file is accessible in the container. Besides Ning's suggestion, if you just want to make it work locally, you might try using a hostPath volume to make your local files visible to the container. You can use add_volume and add_volume_mount to add a hostPath volume to the ContainerOp and change the path to the mounted path.
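As a rough illustration of the hostPath idea above, the sketch below builds the two Kubernetes fragments such a mount consists of: a volume entry pointing at a directory on the node, and a volume mount placing it inside the container. It uses plain dicts rather than kfp or Kubernetes client objects, so it only shows the shape of the data. The volume name `local-data`, the host directory `/data/tfx`, and the mount path `/mnt/data` are hypothetical. With the Kubernetes Python client, the corresponding objects would be `V1Volume` and `V1VolumeMount`, which are what add_volume and add_volume_mount are given.

```python
import json


def host_path_volume(name, host_dir):
    """Volume entry that exposes a directory on the node to the pod."""
    return {"name": name, "hostPath": {"path": host_dir, "type": "Directory"}}


def volume_mount(name, mount_path):
    """Mount entry that places the named volume inside the container."""
    return {"name": name, "mountPath": mount_path}


def rewrite_path(local_path, host_dir, mount_path):
    """Translate a path on the node into the path the container will see."""
    root = host_dir.rstrip("/")
    if not local_path.startswith(root + "/"):
        raise ValueError(f"{local_path} is not under {host_dir}")
    return mount_path.rstrip("/") + local_path[len(root):]


# Hypothetical names and directories, for illustration only.
volume = host_path_volume("local-data", "/data/tfx")
mount = volume_mount("local-data", "/mnt/data")
train_path = rewrite_path("/data/tfx/train.csv", "/data/tfx", "/mnt/data")

print(json.dumps({"volumes": [volume], "volumeMounts": [mount]}, indent=2))
print(train_path)
```

The last step matters as much as the mount itself: the pipeline arguments have to name the path inside the container (here `/mnt/data/train.csv`), not the path on the node. A hostPath volume also only works if the pod is scheduled on the node that actually holds the files.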

@gaoning777 I followed your advice and built a new image. It works, but it doesn't solve the problem completely, because the second container can't find the first container's output.
So I want to mount a volume in the ContainerOp.

In this issue:

https://github.com/kubeflow/pipelines/issues/477

I found that you gave an example of how to mount a volume with both add_volume and add_volume_mount, but the page no longer exists. Could you post the example again if you can find it?

@hongye-sun
Hi, I really appreciate your advice. I followed it and used add_volume and add_volume_mount, but it didn't work.
So I ran an example that mounts a volume in a ContainerOp to test how to use add_volume and add_volume_mount. I hope you can give me some more advice if you know what is wrong.

Here are my steps:

First, I created a PersistentVolume using:
[image]

and it succeeded:
[image]

Then I ran an example; the code is here:
[image]

The result is here:
[image]
[image]

It seems that I didn't mount the volume in the ContainerOp successfully, and the status of the PV (tfx-pv) is always Available.
[image]
(PS: I created '/nfs-data/tfx-pv/train.csv' in advance.)

I have no idea how to solve this now. I hope you can give me some advice. Thanks!

I have done that successfully before. I created the PV/PVC first, and then edited the sample taxi-cab-classification-pipeline.py to attach the PVC. I copied the related data, such as train.csv, to the local storage before running the TFX sample.
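One way to picture "attaching the PVC" is that every step's pod gets the same `persistentVolumeClaim` volume mounted at the same path, so files written by one step are visible to the next. The sketch below applies that to simplified step specs written as plain dicts; it is not the edited taxi-cab sample itself. The claim name `pipeline-pvc` is taken from the `kubectl describe` output later in this thread, while the step names, images, volume name, and the mount path `/mnt` are hypothetical.

```python
import copy


def attach_pvc(step, claim_name, mount_path, volume_name="shared-data"):
    """Return a copy of a simplified pod spec with the PVC mounted."""
    patched = copy.deepcopy(step)
    patched.setdefault("volumes", []).append(
        {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}
    )
    patched["container"].setdefault("volumeMounts", []).append(
        {"name": volume_name, "mountPath": mount_path}
    )
    return patched


# Hypothetical two-step pipeline; images and arguments are placeholders.
steps = [
    {"name": "validate",
     "container": {"image": "example/validate", "args": ["--output", "/mnt/out"]}},
    {"name": "transform",
     "container": {"image": "example/transform", "args": ["--input", "/mnt/out"]}},
]

# Every step mounts the same claim at the same path.
shared = [attach_pvc(step, "pipeline-pvc", "/mnt") for step in steps]

for step in shared:
    claim = step["volumes"][0]["persistentVolumeClaim"]["claimName"]
    path = step["container"]["volumeMounts"][0]["mountPath"]
    print(step["name"], claim, path)
```

Because both steps see `/mnt`, the second step can read `/mnt/out` after the first step writes it, which is the piece that baking files into an image cannot provide.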

@jinchihe Thanks for your comment.
I followed your steps, but I didn't succeed. By the way, I only created the PV. Should I create both a PV and a PVC?
I'm new to k8s, so I would really appreciate it if you could describe in detail how to create the PVC and attach it. Thanks!

@zoux86 Yes, I think you should create both the PV and the PVC manually. There is no special configuration in the PVC definition file.

# kubectl describe pvc pipeline-pvc -n kubeflow 
Name:          pipeline-pvc
Namespace:     kubeflow
StorageClass:  
Status:        Bound
Volume:        pipeline-pv
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWX
Events:        <none>
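For readers who have not written these objects before, here is a minimal sketch of a PV/PVC pair that would be consistent with the output above: same names, namespace, 10Gi capacity, RWX (ReadWriteMany) access, and an empty storage class. The actual definition files were not posted, so the backing NFS server address and export path are hypothetical, as is the choice to pin the claim to the volume with `volumeName`.

```python
import json


def persistent_volume(name, capacity, source):
    """PersistentVolume manifest; `source` is the backing volume definition."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": capacity},
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",
            **source,
        },
    }


def persistent_volume_claim(name, namespace, capacity, volume_name):
    """PersistentVolumeClaim manifest pinned to one PersistentVolume by name."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",
            "volumeName": volume_name,
            "resources": {"requests": {"storage": capacity}},
        },
    }


# Hypothetical NFS export; replace with the real server and path.
nfs_source = {"nfs": {"server": "nfs.example.internal", "path": "/nfs-data"}}

pv = persistent_volume("pipeline-pv", "10Gi", nfs_source)
pvc = persistent_volume_claim("pipeline-pvc", "kubeflow", "10Gi", "pipeline-pv")

# kubectl accepts JSON as well as YAML, so this List can be saved and applied.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}, indent=2))
```

A PV on its own stays in the Available state until a claim binds to it; once the PVC exists and matches, both should report Bound, as in the output above.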

@jinchihe Thank you very much. I finally ran the example successfully.
I created the PVC, and I found that in my environment I should use an NFS volume rather than hostPath.
I will close the issue.

@jinchihe Thanks for explaining the volume mount steps. We will add instructions on how to mount volumes and share them among components.

@gaoning777 I would like to discuss this in #721, thanks.
