Pipelines: kfp not compatible with Kubeflow 1.2

Created on 8 Jan 2021 · 14 Comments · Source: kubeflow/pipelines

What steps did you take:

Here's a sample MNIST training example:
mnist_complete_train.py.zip

Here is a Dockerfile that reproduces the issue:

FROM ubuntu:18.04
RUN apt-get update

RUN mkdir -p /testdocker
RUN apt-get install -y software-properties-common

RUN apt-get update \
  && apt-get install -y python3-pip python3-dev \
  && cd /usr/local/bin \
  && ln -s /usr/bin/python3 python \
  && pip3 install --upgrade pip

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y git


WORKDIR /app

RUN git clone https://github.com/kubeflow/pipelines.git

WORKDIR /app/pipelines/sdk/python

RUN pip3 install -r requirements.txt

RUN python3 setup.py install

ADD mnist_complete_train.py /app/pipelines/sdk/python

This Docker image builds kfp from the main GitHub repo (https://github.com/kubeflow/pipelines.git). After building kfp, I run python3 mnist_complete_train.py (the Python file attached above as a zip), which generates a YAML file. When I upload that YAML file to the Kubeflow Dashboard, the pipeline gets stuck for no apparent reason:

Screenshot 2021-01-08 at 2 48 29 PM

What happened:

The pipeline is stuck in Run. It's hard to debug because the pipeline does not fail with any error message.

What did you expect to happen:

When I install kfp with pip3 install kfp, this problem does not occur and the Run executes properly. I encounter this issue only when I install kfp from the pipelines repo.

Environment:

Attaching a Dockerfile to reproduce the bug:
Dockerfile.zip

How did you deploy Kubeflow Pipelines (KFP)?

KFP version:

KFP SDK version:

kfp                      1.3.0
kfp-pipeline-spec        0.1.3.1
kfp-server-api           1.2.0
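
Since the behaviour differs between the pip release and the source build, it may help to record the installed versions in each environment and compare them. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_versions is hypothetical. On older interpreters, pip3 list gives the same information.

```python
from importlib import metadata  # available in Python 3.8+
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The three packages listed in the report above.
    report = installed_versions(["kfp", "kfp-pipeline-spec", "kfp-server-api"])
    for name, version in report.items():
        print(f"{name:<24} {version or 'not installed'}")
```

Running this once in the pip-installed environment and once in the source-built container would show whether the two setups really use different SDK versions.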

Anything else you would like to add:

/kind bug
/area sdk
/area backend
/area engprod


kind/question

All 14 comments

Hi @ajinkya933, can you open the Pod and Events tabs and paste their content here for debugging purposes?

@Bobgy
Screenshot 2021-01-08 at 6 28 58 PM
Screenshot 2021-01-08 at 6 29 13 PM

Screenshot 2021-01-08 at 6 29 30 PM

Screenshot 2021-01-08 at 6 29 49 PM

What does the Pod say? If you have access to the Pod status via console and/or kubectl can you see what its status is?

Here are the pod.yaml and events.yaml:

pod+events.zip

Here are the screenshots (attached above is the zip file which contains yaml):

kfp
kfp2

@parthmishra @Bobgy ^

Hi @ajinkya933,
your Pod events include a warning message:

    reason: FailedScheduling
    message: >-
      Failed to bind volumes: provisioning failed for PVC
      "mnist-pipeline-for-train-and-prediction-mmkb7-data-volume"

so it seems something went wrong with the PVC provisioning.
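
Warnings like this are easy to miss among normal events. As a rough sketch, the filter below picks only the Warning entries out of an event list shaped like the output of kubectl get events -o json. The helper name warning_events and the pod name in the sample are hypothetical; only the reason and message are taken from this thread.

```python
from typing import Dict, List


def warning_events(event_list: Dict) -> List[Dict[str, str]]:
    """Return reason, object name, and message for every Warning event."""
    found = []
    for event in event_list.get("items", []):
        if event.get("type") == "Warning":
            found.append({
                "reason": event.get("reason", ""),
                "object": event.get("involvedObject", {}).get("name", ""),
                "message": event.get("message", ""),
            })
    return found


# Hypothetical sample data; the pod names are made up for illustration.
sample = {
    "items": [
        {
            "type": "Normal",
            "reason": "Scheduled",
            "involvedObject": {"name": "example-pod-a"},
            "message": "Successfully assigned pod",
        },
        {
            "type": "Warning",
            "reason": "FailedScheduling",
            "involvedObject": {"name": "example-pod-b"},
            "message": (
                "Failed to bind volumes: provisioning failed for PVC "
                '"mnist-pipeline-for-train-and-prediction-mmkb7-data-volume"'
            ),
        },
    ]
}

for warning in warning_events(sample):
    print(f"{warning['reason']} ({warning['object']}): {warning['message']}")
```

With a real cluster, the dict could come from json.load(sys.stdin) fed by kubectl get events -n <namespace> -o json.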

I think we can probably refactor the Pod Events tab UI so that it puts more emphasis on warning/error messages. That will make these problems easier to find.

Hi @Bobgy, when I run

pip install kfp

and then compile the Python code to YAML and run that YAML, I don't see this problem.

Here's my Python file, attached. It is reproducible and relies on an open source Docker image. You can also reproduce the results by running:

It's only when I install kfp from GitHub that I see the above issue.

@ajinkya933 can you first take a look at why that PVC failed provisioning? I think that would help us a lot in understanding the root cause.

I'm not fully sure, but could it be related to a change like https://github.com/kubeflow/pipelines/pull/4993?

The issue can be solved by replacing modes=dsl.VOLUME_MODE_RWM with modes=dsl.VOLUME_MODE_RWO in the input Python script.
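
To confirm that the change took effect before uploading, one option is to scan the compiled YAML for the PVC access modes it requests. This is a minimal sketch that treats the compiled file as plain text; the helper name requested_access_modes and the sample fragment are hypothetical, not taken from the attached pipeline.

```python
import re

# Kubernetes PVC access mode names; the longest name comes first so it wins.
_ACCESS_MODE = re.compile(
    r"\b(ReadWriteOncePod|ReadWriteOnce|ReadWriteMany|ReadOnlyMany)\b"
)


def requested_access_modes(compiled_yaml: str) -> set:
    """Return the set of access mode names mentioned in the compiled text."""
    return set(_ACCESS_MODE.findall(compiled_yaml))


# Hypothetical fragment resembling a PVC manifest in compiled output.
sample = """
kind: PersistentVolumeClaim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
"""

modes = requested_access_modes(sample)
if "ReadWriteMany" in modes:
    print("Pipeline requests ReadWriteMany; the storage class must support it.")
```

For a real file, the text could be read with open("pipeline.yaml").read(), where pipeline.yaml stands in for whatever name the compile step produced.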

5046
