Pipelines: Run KFP compiled yamls on argo natively

Created on 10 Mar 2020 · 15 Comments · Source: kubeflow/pipelines

What steps did you take:

I would like to use the KFP compiler to create Argo pipelines, without the other Kubeflow services.

What happened:

kfp.compiler.Compiler().compile(pipeline, __file__.replace(".py","") + '.yaml', pipeline_conf=conf)
The command above produces a valid Argo YAML, but two problems appear when deploying it to standalone Argo:

1. "an empty namespace may not be set during creation": a namespace (for example, default) should be added.
2. serviceAccountName pipeline-runner doesn't exist: this value should be editable.

What did you expect to happen:

Environment:

How did you deploy Kubeflow Pipelines (KFP)?
Only installed via sdk

KFP version:

KFP SDK version: 0.2.5

Anything else you would like to add:

It's possible to solve it by extending the following:

workflow = {
    'apiVersion': 'argoproj.io/v1alpha1',
    'kind': 'Workflow',
    'metadata': {'generateName': pipeline_template_name + '-'},
    'spec': {
        'entrypoint': pipeline_template_name,
        'templates': templates,
        'arguments': {'parameters': input_params},
        'serviceAccountName': 'pipeline-runner'
    }
}

The fix would be to remove the hard-coded 'serviceAccountName': 'pipeline-runner' and to allow extra 'metadata' fields (such as a namespace) to be added.
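
The proposal above could look roughly like the following. This is a minimal sketch and not the KFP compiler's actual code: `build_workflow` and its `service_account_name` and `namespace` parameters are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional


def build_workflow(
    pipeline_template_name: str,
    templates: List[Dict[str, Any]],
    input_params: List[Dict[str, Any]],
    service_account_name: Optional[str] = "pipeline-runner",
    namespace: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an Argo Workflow dict with an optional
    namespace and a configurable (or omitted) service account."""
    metadata: Dict[str, Any] = {"generateName": pipeline_template_name + "-"}
    if namespace is not None:
        metadata["namespace"] = namespace

    spec: Dict[str, Any] = {
        "entrypoint": pipeline_template_name,
        "templates": templates,
        "arguments": {"parameters": input_params},
    }
    # Only emit serviceAccountName when one was requested; passing None
    # leaves the field out of the generated spec entirely.
    if service_account_name is not None:
        spec["serviceAccountName"] = service_account_name

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": metadata,
        "spec": spec,
    }
```

Keeping `pipeline-runner` as the default would preserve the current output for existing users, while standalone Argo users could pass their own values.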

/kind bug
/area sdk

area/sdk kind/feature lifecycle/stale status/triaged

liorshk

👍1

Most helpful comment

While it's true that KFP uses Argo under the hood, this is an implementation detail that can change in future. We prefer to not expose Argo-specific features.
We also try to keep the compiled pipelines minimal and portable. Namespaces and service accounts are highly user-specific and can impede sharing.

You can achieve what you want by just doing something like this:

import yaml

workflow_dict = kfp.compiler.Compiler().compile(pipeline, None, pipeline_conf=conf)
workflow_dict['metadata']['namespace'] = ...
workflow_dict['spec']['serviceAccountName'] = ...
# PyYAML has no yaml.save; yaml.dump writes the dict to an open file.
with open(__file__.replace(".py", "") + '.yaml', 'w') as f:
    yaml.dump(workflow_dict, f)

Ark-kun on 11 Mar 2020

👍3

All 15 comments

Do you have any plans of replacing argo with any other workflow engine?
If not, I think that KFP Pipelines SDK could be very useful for many other data processing tasks (not only related to training ML) and I think the current UI and management of those pipelines is too coupled with an experimentation platform.

liorshk on 11 Mar 2020

We don't have plans to replace Argo right now, but ML-related usage has always been the only focus for Kubeflow Pipelines.

We may also target general data processing tasks in the future, but that could be a long time later.

Bobgy on 18 Mar 2020

workflow_dict = kfp.compiler.Compiler().compile(pipeline, None, pipeline_conf=conf)

@Ark-kun will the above code snippet work? I am seeing that workflow_dict is None.

Here is my pipeline code:

from kubernetes import client as k8s_client
import kfp.dsl as dsl
from kubernetes.client.models import V1EnvVar
import json


@dsl.pipeline(
    name="VolumeOp Basic",
    description="A Basic Example on VolumeOp Usage"
)
def VolumeOp_basic(volumename="test-pipeline"):

    # creating volume using dsl.VolumeOp
    vop=dsl.VolumeOp(name="creating pvc",resource_name=volumename,size="1Gi",modes=dsl.VOLUME_MODE_RWM,storage_class="glusterfs-storageclass" )

    step1 = dsl.ContainerOp(
       name="Create-file-inside-pv",
       image="docker.io/library/bash:4.4.23",
       command=["sh", "-c"],
       arguments=["echo pipelinepvc > /data/file1"],
       pvolumes={"/data": vop.volume}
    ).after(vop)

    step2 = dsl.ContainerOp(
        name="read-file-from-pv",
        image="docker.io/library/bash:4.4.23",
        command=["cat", "/common/file1"],
        pvolumes={"/common": step1.pvolume}
    ).after(step1)

    delete_vop = dsl.ResourceOp(
        name="deleting pvc",
        k8s_resource=vop.k8s_resource,
        action='delete'
    ).after(step2)


if __name__ == '__main__':
    import kfp.compiler as compiler
    pipeline_filename= "VolumeOp_basic.tar.gz"
    workflow_dict = compiler.Compiler().compile(VolumeOp_basic,pipeline_filename)
    print(workflow_dict)
    #workflow_dict['metadata']['namespace'] = 'test'
    #workflow_dict['spec']['serviceAccountName'] = 'default-editor'
    #yaml.save(workflow_dict, __file__.replace(".py","") + '.yaml')

hemantha-kumara on 18 Mar 2020

@Ark-kun will the above code snippet work? I am seeing that workflow_dict is None.
Here is my pipeline code:

It should. Your code is slightly different.

You can also just use

import yaml
with open(pipeline_filename, 'r') as workflow_file:
    workflow_dict = yaml.safe_load(workflow_file)

Ark-kun on 19 Mar 2020
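
Combining the two suggestions above (load the compiled file, then set the namespace and service account), one way to post-process a compiled workflow is sketched below. It assumes PyYAML is installed and that the compiler wrote a plain `.yaml` file rather than a `.tar.gz` archive. `patch_workflow` and `patch_workflow_file` are illustrative names, not part of the KFP SDK.

```python
import copy
from typing import Any, Dict


def patch_workflow(workflow: Dict[str, Any], namespace: str,
                   service_account: str) -> Dict[str, Any]:
    """Return a copy of the workflow dict with namespace and service account set."""
    patched = copy.deepcopy(workflow)
    patched.setdefault("metadata", {})["namespace"] = namespace
    patched.setdefault("spec", {})["serviceAccountName"] = service_account
    return patched


def patch_workflow_file(src_path: str, dst_path: str, namespace: str,
                        service_account: str) -> None:
    """Load a compiled workflow YAML, patch it, and write the result."""
    # PyYAML is imported here so that patch_workflow itself has no
    # third-party dependency.
    import yaml

    with open(src_path, "r") as src:
        workflow = yaml.safe_load(src)
    patched = patch_workflow(workflow, namespace, service_account)
    with open(dst_path, "w") as dst:
        yaml.safe_dump(patched, dst, default_flow_style=False)
```

A call such as `patch_workflow_file("pipeline.yaml", "pipeline.argo.yaml", "default", "default")` would then produce a file intended for standalone Argo; the file names and the `default` values here are placeholders.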

Hi @Ark-kun,
I was trying to run the pipeline in a different namespace.
I added a namespace and service account to the DSL-compiled workflow YAML file, uploaded it to the pipeline UI, and tried to trigger a run, but it failed with the error below. Will this approach let me run the pipeline in a different namespace?
API server version used: 0.2.0

Run creation failed
{"error":"Failed to create a new run.: InternalServerError: Failed to create a workflow for (): the namespace of the provided object does not match the namespace sent on the request","message":"Failed to create a new run.: InternalServerError: Failed to create a workflow for (): the namespace of the provided object does not match the namespace sent on the request","code":13,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Internal Server Error","error_details":"Failed to create a new run.: InternalServerError: Failed to create a workflow for (): the namespace of the provided object does not match the namespace sent on the request"}]}

workflow yaml:
"apiVersion": |-
  argoproj.io/v1alpha1
"kind": |-
  Workflow
"metadata":
  "namespace": "james"
  "annotations":
    "pipelines.kubeflow.org/pipeline_spec": |-
      {"description": "A Basic Example on VolumeOp Usage", "inputs": [{"default": "test-pipeline", "name": "volumename", "optional": true}], "name": "VolumeOp Basic"}
  "generateName": |-
    volumeop-basic-
"spec":
  "arguments":
    "parameters":
    - "name": |-
        volumename
      "value": |-
        test-pipeline
  "entrypoint": |-
    volumeop-basic
  "serviceAccountName": |-
    default-editor
  "templates":
  - "container":
      "args":
      - |-
        echo pipelinepvc > /data/file1
      "command":
      - |-
        sh
      - |-
        -c
      "image": |-
        docker.io/library/bash:4.4.23
      "volumeMounts":
      - "mountPath": |-
          /data
        "name": |-
          creating-pvc
    "inputs":
      "parameters":
      - "name": |-
          creating-pvc-name
    "name": |-
      create-file-inside-pv
    "volumes":
    - "name": |-
        creating-pvc
      "persistentVolumeClaim":
        "claimName": |-
          {{inputs.parameters.creating-pvc-name}}
  - "inputs":
      "parameters":
      - "name": |-
          volumename
    "name": |-
      creating-pvc
    "outputs":
      "parameters":
      - "name": |-
          creating-pvc-manifest
        "valueFrom":
          "jsonPath": |-
            {}
      - "name": |-
          creating-pvc-name
        "valueFrom":
          "jsonPath": |-
            {.metadata.name}
      - "name": |-
          creating-pvc-size
        "valueFrom":
          "jsonPath": |-
            {.status.capacity.storage}
    "resource":
      "action": |-
        create
      "manifest": |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: '{{workflow.name}}-{{inputs.parameters.volumename}}'
        spec:
          accessModes:
          - ReadWriteMany
          resources:
            requests:
              storage: 1Gi
          storageClassName: glusterfs-storageclass
  - "inputs":
      "parameters":
      - "name": |-
          volumename
    "name": |-
      deleting-pvc
    "resource":
      "action": |-
        delete
      "manifest": |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: '{{workflow.name}}-{{inputs.parameters.volumename}}'
        spec:
          accessModes:
          - ReadWriteMany
          resources:
            requests:
              storage: 1Gi
          storageClassName: glusterfs-storageclass
  - "container":
      "command":
      - |-
        cat
      - |-
        /common/file1
      "image": |-
        docker.io/library/bash:4.4.23
      "volumeMounts":
      - "mountPath": |-
          /common
        "name": |-
          creating-pvc
    "inputs":
      "parameters":
      - "name": |-
          creating-pvc-name
    "name": |-
      read-file-from-pv
    "volumes":
    - "name": |-
        creating-pvc
      "persistentVolumeClaim":
        "claimName": |-
          {{inputs.parameters.creating-pvc-name}}
  - "dag":
      "tasks":
      - "arguments":
          "parameters":
          - "name": |-
              creating-pvc-name
            "value": |-
              {{tasks.creating-pvc.outputs.parameters.creating-pvc-name}}
        "dependencies":
        - |-
          creating-pvc
        "name": |-
          create-file-inside-pv
        "template": |-
          create-file-inside-pv
      - "arguments":
          "parameters":
          - "name": |-
              volumename
            "value": |-
              {{inputs.parameters.volumename}}
        "name": |-
          creating-pvc
        "template": |-
          creating-pvc
      - "arguments":
          "parameters":
          - "name": |-
              volumename
            "value": |-
              {{inputs.parameters.volumename}}
        "dependencies":
        - |-
          read-file-from-pv
        "name": |-
          deleting-pvc
        "template": |-
          deleting-pvc
      - "arguments":
          "parameters":
          - "name": |-
              creating-pvc-name
            "value": |-
              {{tasks.creating-pvc.outputs.parameters.creating-pvc-name}}
        "dependencies":
        - |-
          create-file-inside-pv
        - |-
          creating-pvc
        "name": |-
          read-file-from-pv
        "template": |-
          read-file-from-pv
    "inputs":
      "parameters":
      - "name": |-
          volumename
    "name": |-
      volumeop-basic

chetanoruganti on 2 Apr 2020
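
The error in the comment above is what the Kubernetes API returns when an object's `metadata.namespace` differs from the namespace of the request, which suggests the workflow was submitted to a namespace other than `james`. A minimal sketch of a pre-upload step that removes the field is shown below. `strip_namespace` is a hypothetical helper, and whether dropping the namespace is appropriate depends on the deployment.

```python
import copy
from typing import Any, Dict, Optional, Tuple


def strip_namespace(workflow: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[str]]:
    """Return a copy of the workflow without metadata.namespace,
    plus the namespace that was removed (None if there was none)."""
    patched = copy.deepcopy(workflow)
    removed = patched.get("metadata", {}).pop("namespace", None)
    return patched, removed
```

Returning the removed value lets the caller log or compare it against the namespace the run is expected to land in.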

Gentle ping @Ark-kun @Bobgy
How can we run pipelines with different serviceAccount and in different namespaces? Are there any specific PRs that we should take and test this? Please help.

nrchakradhar on 14 Apr 2020

Gentle ping @Ark-kun @Bobgy
How can we run pipelines with different serviceAccount and in different namespaces? Are there any specific PRs that we should take and test this? Please help.

Do you want to use KFP in multiple namespaces and select between them?

AFAIK, Client().run_pipeline has a namespace parameter.
As for ServiceAccount, the supported way is to set it in the API-server configMap.
//Setting it inside Workflow should also work, but is not supported.

Ark-kun on 14 Apr 2020

Thanks @Ark-kun
@hemantha-kumara @chetanoruganti can we give this a try?

nrchakradhar on 15 Apr 2020

@nrchakradhar @Ark-kun
for clarification:

AFAIK, Client().run_pipeline has a namespace parameter.

This should only be used with KFP multi user mode.

/cc @chensun
maybe we should improve documentation

Bobgy on 15 Apr 2020

This should only be used with KFP multi user mode.

You're right.

@IronPan @rmgogogo Am I correct that with Argo the user can submit a Workflow to any namespace (for example using kubectl create --namespace ns1 -f workflow.yaml), but the KFP's API Server submits workflows to its own namespace only? Maybe there is a quick fix possible for the API server then (to submit the workflow to the specified namespace).

On the other hand, I think the multi-user mode has been created for this explicit purpose so it should be used for this scenario.

Ark-kun on 15 Apr 2020

@nrchakradhar @Ark-kun
for clarification:

AFAIK, Client().run_pipeline has a namespace parameter.

This should only be used with KFP multi user mode.

/cc @chensun
maybe we should improve documentation

run_pipeline doesn't have a namespace param, because it already asks for experiment_id, and we follow the namespace of its owning experiment.

For other methods that do have a namespace param, e.g. create_run_from_pipeline_func, we covered the single-user vs. multi-user usage in the docstring.

chensun on 15 Apr 2020

This should only be used with KFP multi user mode.

You're right.

@IronPan @rmgogogo Am I correct that with Argo the user can submit a Workflow to any namespace (for example using kubectl create --namespace ns1 -f workflow.yaml), but the KFP's API Server submits workflows to its own namespace only? Maybe there is a quick fix possible for the API server then (to submit the workflow to the specified namespace).

I haven't tried kubectl create --namespace ns1 -f workflow.yaml personally, but I guess it probably works. kubectl is available to the cluster admin only, and the admin should probably have permission to do anything, but that's not true for an end user.
It is by design that the API server only allows submitting workflows to a namespace that the user either owns or is listed in as a contributor. The same applies to read access.

chensun on 15 Apr 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.