I would like to use the KFP compiler to create Argo pipelines without the other Kubeflow services.
kfp.compiler.Compiler().compile(pipeline, __file__.replace(".py","") + '.yaml',
pipeline_conf=conf)
The command above does create a valid Argo YAML file, but there are two problems when trying to deploy it to a standalone Argo installation.
How did you deploy Kubeflow Pipelines (KFP)?
Only installed the SDK.
KFP version:
KFP SDK version: 0.2.5
It's possible to solve this by changing the following code in the compiler:
workflow = {
'apiVersion': 'argoproj.io/v1alpha1',
'kind': 'Workflow',
'metadata': {'generateName': pipeline_template_name + '-'},
'spec': {
'entrypoint': pipeline_template_name,
'templates': templates,
'arguments': {'parameters': input_params},
'serviceAccountName': 'pipeline-runner'
}
}
That is: remove the hard-coded 'serviceAccountName': 'pipeline-runner', and allow users to add their own 'metadata' (for example, a namespace).
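The proposed change could look roughly like this. It is a sketch, not the actual KFP compiler code: build_workflow, service_account, and extra_metadata are hypothetical names, and templates and input_params are assumed to be the lists the compiler already produces.

```python
from typing import Any, Dict, List, Optional


def build_workflow(
    pipeline_template_name: str,
    templates: List[Dict[str, Any]],
    input_params: List[Dict[str, Any]],
    service_account: Optional[str] = None,
    extra_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an Argo Workflow dict without a hard-coded service account.

    Hypothetical variant of the compiler snippet: the service account is set
    only when the caller asks for one, and extra metadata (e.g. a namespace)
    can be merged into the generated metadata.
    """
    metadata: Dict[str, Any] = {'generateName': pipeline_template_name + '-'}
    if extra_metadata:
        metadata.update(extra_metadata)

    spec: Dict[str, Any] = {
        'entrypoint': pipeline_template_name,
        'templates': templates,
        'arguments': {'parameters': input_params},
    }
    # Leave serviceAccountName out entirely unless requested, so Argo
    # applies its own default instead of a KFP-specific account.
    if service_account is not None:
        spec['serviceAccountName'] = service_account

    return {
        'apiVersion': 'argoproj.io/v1alpha1',
        'kind': 'Workflow',
        'metadata': metadata,
        'spec': spec,
    }
```

With no extra arguments the output matches the current compiler minus the hard-coded account, so existing KFP users could keep passing 'pipeline-runner' explicitly.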
/kind bug
/area sdk
While it's true that KFP uses Argo under the hood, this is an implementation detail that may change in the future. We prefer not to expose Argo-specific features.
We also try to keep the compiled pipelines minimal and portable. Namespaces and service accounts are highly user-specific and can impede sharing.
You can achieve what you want with something like this:
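import yaml, kfp.compiler
from pathlib import Path
package_path = Path(__file__).with_suffix('.yaml')
kfp.compiler.Compiler().compile(pipeline, str(package_path), pipeline_conf=conf)  # writes the file; returns None
workflow_dict = yaml.safe_load(package_path.read_text())
workflow_dict['metadata']['namespace'] = ...
workflow_dict['spec']['serviceAccountName'] = ...
package_path.write_text(yaml.safe_dump(workflow_dict))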
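If this post-processing is needed in several places, the namespace and service-account assignments can be wrapped in a helper that patches a copy. The compiled, portable workflow then stays shareable, and the user-specific settings exist only in what gets deployed. A minimal sketch; with_deployment_settings is a hypothetical name, and it assumes the workflow has already been loaded into a dict (for example with yaml.safe_load):

```python
import copy
from typing import Any, Dict, Optional


def with_deployment_settings(
    workflow: Dict[str, Any],
    namespace: Optional[str] = None,
    service_account: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of a compiled workflow with user-specific settings applied.

    The input dict is not modified, so the original compiled workflow can
    still be shared as-is.
    """
    patched = copy.deepcopy(workflow)
    if namespace is not None:
        patched.setdefault('metadata', {})['namespace'] = namespace
    if service_account is not None:
        # Overrides any value the compiler put there (e.g. 'pipeline-runner').
        patched.setdefault('spec', {})['serviceAccountName'] = service_account
    return patched
```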
Do you have any plans to replace Argo with another workflow engine?
If not, I think the KFP SDK could be very useful for many other data-processing tasks (not only ML training), but the current UI and pipeline management are too tightly coupled to an experimentation platform.
We have no plans to replace Argo right now, but ML-related use cases have always been the sole focus of Kubeflow Pipelines.
We may also target general data-processing tasks in the future, but that is likely a long way off.
workflow_dict = kfp.compiler.Compiler().compile(pipeline, None, pipeline_conf=conf)
@Ark-kun, will the above code snippet work? I am seeing that workflow_dict is None.
Here is my pipeline code:
from kubernetes import client as k8s_client
import kfp.dsl as dsl
from kubernetes.client.models import V1EnvVar
import json


@dsl.pipeline(
    name="VolumeOp Basic",
    description="A Basic Example on VolumeOp Usage"
)
def VolumeOp_basic(volumename="test-pipeline"):
    # creating volume using dsl.VolumeOp
    vop = dsl.VolumeOp(name="creating pvc", resource_name=volumename, size="1Gi",
                       modes=dsl.VOLUME_MODE_RWM, storage_class="glusterfs-storageclass")

    step1 = dsl.ContainerOp(
        name="Create-file-inside-pv",
        image="docker.io/library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["echo pipelinepvc > /data/file1"],
        pvolumes={"/data": vop.volume}
    ).after(vop)

    step2 = dsl.ContainerOp(
        name="read-file-from-pv",
        image="docker.io/library/bash:4.4.23",
        command=["cat", "/common/file1"],
        pvolumes={"/common": step1.pvolume}
    ).after(step1)

    delete_vop = dsl.ResourceOp(
        name="deleting pvc",
        k8s_resource=vop.k8s_resource,
        action='delete'
    ).after(step2)


if __name__ == '__main__':
    import kfp.compiler as compiler
    pipeline_filename = "VolumeOp_basic.tar.gz"
    workflow_dict = compiler.Compiler().compile(VolumeOp_basic, pipeline_filename)
    print(workflow_dict)
    #workflow_dict['metadata']['namespace'] = 'test'
    #workflow_dict['spec']['serviceAccountName'] = 'default-editor'
    #yaml.save(workflow_dict, __file__.replace(".py","") + '.yaml')
It should. Your code is slightly different.
You can also just use:
import yaml
with open(pipeline_filename, 'r') as workflow_file:
    workflow_dict = yaml.safe_load(workflow_file)
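One caveat with that snippet: the pipeline above is compiled to VolumeOp_basic.tar.gz, and compiling to a .tar.gz path writes a gzipped tar archive rather than plain YAML, so open() plus yaml.safe_load on that path will not work. A minimal standard-library sketch that handles both cases; read_workflow_text is a hypothetical helper name, and it assumes the archive holds exactly one workflow file (the KFP v1 SDK names it pipeline.yaml, I believe). .zip packages are not handled.

```python
import tarfile


def read_workflow_text(package_path: str) -> str:
    """Return the workflow YAML text from a compiled KFP package.

    Plain .yaml files are read directly; .tar.gz/.tgz archives are expected
    to contain a single regular file holding the workflow.
    """
    if package_path.endswith(('.tar.gz', '.tgz')):
        with tarfile.open(package_path, 'r:gz') as tar:
            members = [m for m in tar.getmembers() if m.isfile()]
            if len(members) != 1:
                raise ValueError(
                    f'expected exactly one file in {package_path}, found {len(members)}')
            extracted = tar.extractfile(members[0])
            return extracted.read().decode('utf-8')
    with open(package_path, 'r', encoding='utf-8') as workflow_file:
        return workflow_file.read()
```

Usage, assuming PyYAML is installed: workflow_dict = yaml.safe_load(read_workflow_text("VolumeOp_basic.tar.gz")).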
Hi @Ark-kun,
I was trying to run the pipeline in a different namespace.
I added the namespace and service account to the DSL-compiled workflow YAML file, uploaded it to the Pipelines UI, and tried to trigger a run, but it failed with the error below. Is this a valid way to run a pipeline in a different namespace?
API server version: 0.2.0
Run creation failed
{"error":"Failed to create a new run.: InternalServerError: Failed to create a workflow for (): the namespace of the provided object does not match the namespace sent on the request","message":"Failed to create a new run.: InternalServerError: Failed to create a workflow for (): the namespace of the provided object does not match the namespace sent on the request","code":13,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Internal Server Error","error_details":"Failed to create a new run.: InternalServerError: Failed to create a workflow for (): the namespace of the provided object does not match the namespace sent on the request"}]}
Workflow YAML:
"apiVersion": |-
  argoproj.io/v1alpha1
"kind": |-
  Workflow
"metadata":
  "namespace": "james"
  "annotations":
    "pipelines.kubeflow.org/pipeline_spec": |-
      {"description": "A Basic Example on VolumeOp Usage", "inputs": [{"default": "test-pipeline", "name": "volumename", "optional": true}], "name": "VolumeOp Basic"}
  "generateName": |-
    volumeop-basic-
"spec":
  "arguments":
    "parameters":
    - "name": |-
        volumename
      "value": |-
        test-pipeline
  "entrypoint": |-
    volumeop-basic
  "serviceAccountName": |-
    default-editor
  "templates":
  - "container":
      "args":
      - |-
        echo pipelinepvc > /data/file1
      "command":
      - |-
        sh
      - |-
        -c
      "image": |-
        docker.io/library/bash:4.4.23
      "volumeMounts":
      - "mountPath": |-
          /data
        "name": |-
          creating-pvc
    "inputs":
      "parameters":
      - "name": |-
          creating-pvc-name
    "name": |-
      create-file-inside-pv
    "volumes":
    - "name": |-
        creating-pvc
      "persistentVolumeClaim":
        "claimName": |-
          {{inputs.parameters.creating-pvc-name}}
  - "inputs":
      "parameters":
      - "name": |-
          volumename
    "name": |-
      creating-pvc
    "outputs":
      "parameters":
      - "name": |-
          creating-pvc-manifest
        "valueFrom":
          "jsonPath": |-
            {}
      - "name": |-
          creating-pvc-name
        "valueFrom":
          "jsonPath": |-
            {.metadata.name}
      - "name": |-
          creating-pvc-size
        "valueFrom":
          "jsonPath": |-
            {.status.capacity.storage}
    "resource":
      "action": |-
        create
      "manifest": |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: '{{workflow.name}}-{{inputs.parameters.volumename}}'
        spec:
          accessModes:
          - ReadWriteMany
          resources:
            requests:
              storage: 1Gi
          storageClassName: glusterfs-storageclass
  - "inputs":
      "parameters":
      - "name": |-
          volumename
    "name": |-
      deleting-pvc
    "resource":
      "action": |-
        delete
      "manifest": |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: '{{workflow.name}}-{{inputs.parameters.volumename}}'
        spec:
          accessModes:
          - ReadWriteMany
          resources:
            requests:
              storage: 1Gi
          storageClassName: glusterfs-storageclass
  - "container":
      "command":
      - |-
        cat
      - |-
        /common/file1
      "image": |-
        docker.io/library/bash:4.4.23
      "volumeMounts":
      - "mountPath": |-
          /common
        "name": |-
          creating-pvc
    "inputs":
      "parameters":
      - "name": |-
          creating-pvc-name
    "name": |-
      read-file-from-pv
    "volumes":
    - "name": |-
        creating-pvc
      "persistentVolumeClaim":
        "claimName": |-
          {{inputs.parameters.creating-pvc-name}}
  - "dag":
      "tasks":
      - "arguments":
          "parameters":
          - "name": |-
              creating-pvc-name
            "value": |-
              {{tasks.creating-pvc.outputs.parameters.creating-pvc-name}}
        "dependencies":
        - |-
          creating-pvc
        "name": |-
          create-file-inside-pv
        "template": |-
          create-file-inside-pv
      - "arguments":
          "parameters":
          - "name": |-
              volumename
            "value": |-
              {{inputs.parameters.volumename}}
        "name": |-
          creating-pvc
        "template": |-
          creating-pvc
      - "arguments":
          "parameters":
          - "name": |-
              volumename
            "value": |-
              {{inputs.parameters.volumename}}
        "dependencies":
        - |-
          read-file-from-pv
        "name": |-
          deleting-pvc
        "template": |-
          deleting-pvc
      - "arguments":
          "parameters":
          - "name": |-
              creating-pvc-name
            "value": |-
              {{tasks.creating-pvc.outputs.parameters.creating-pvc-name}}
        "dependencies":
        - |-
          create-file-inside-pv
        - |-
          creating-pvc
        "name": |-
          read-file-from-pv
        "template": |-
          read-file-from-pv
    "inputs":
      "parameters":
      - "name": |-
          volumename
    "name": |-
      volumeop-basic
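The error comes from Kubernetes itself: a create request is rejected when the object's metadata.namespace differs from the namespace in the request, and here the API server submits the workflow to a namespace it chooses rather than to "james". One way to at least avoid the error is to drop the hard-coded namespace before uploading, so the server fills in its own. This does not make the run land in "james"; running in another namespace needs multi-user mode or a direct submission, as discussed later in this thread. A sketch; strip_namespace is a hypothetical helper operating on the loaded workflow dict:

```python
import copy
from typing import Any, Dict


def strip_namespace(workflow: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the workflow without metadata.namespace.

    With the field absent, the namespace the object is submitted to is taken
    from the request, so the mismatch check cannot fail.
    """
    cleaned = copy.deepcopy(workflow)
    cleaned.get('metadata', {}).pop('namespace', None)
    return cleaned
```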
Gentle ping @Ark-kun @Bobgy
How can we run pipelines with a different service account and in different namespaces? Are there specific PRs we should pick up to test this? Please help.
Do you want to use KFP in multiple namespaces and select between them?
AFAIK, Client().run_pipeline has a namespace parameter.
As for ServiceAccount, the supported way is to set it in the API-server configMap.
(Setting it inside the Workflow should also work, but it is not supported.)
Thanks @Ark-kun
@hemantha-kumara @chetanoruganti, can we give this a try?
@nrchakradhar @Ark-kun
For clarification:
AFAIK, Client().run_pipeline has a namespace parameter.
This should only be used with KFP multi-user mode.
/cc @chensun
Maybe we should improve the documentation.
This should only be used with KFP multi-user mode.
You're right.
@IronPan @rmgogogo Am I correct that with Argo the user can submit a Workflow to any namespace (for example, using kubectl create --namespace ns1 -f workflow.yaml), but KFP's API server only submits workflows to its own namespace? If so, there may be a quick fix for the API server (submitting the workflow to the specified namespace).
On the other hand, I think multi-user mode was created for exactly this purpose, so it should be used for this scenario.
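For the standalone-Argo case from the original question, the kubectl submission mentioned above can be scripted. A sketch, assuming kubectl is on PATH and the caller may create Workflows in the target namespace; kubectl_create_command and submit_workflow are hypothetical helper names. kubectl create is used rather than kubectl apply because the compiled workflow relies on metadata.generateName, which apply does not support. If the YAML also hard-codes a different metadata.namespace, kubectl will refuse the request.

```python
import subprocess
from typing import List


def kubectl_create_command(workflow_path: str, namespace: str) -> List[str]:
    """Build the kubectl command that creates the Workflow in `namespace`."""
    return ['kubectl', 'create', '--namespace', namespace, '-f', workflow_path]


def submit_workflow(workflow_path: str, namespace: str) -> str:
    """Submit a compiled workflow directly to Argo, bypassing the KFP API server.

    Raises subprocess.CalledProcessError if kubectl exits non-zero;
    otherwise returns kubectl's stdout (normally the created object's name).
    """
    result = subprocess.run(
        kubectl_create_command(workflow_path, namespace),
        check=True, capture_output=True, text=True)
    return result.stdout
```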
run_pipeline doesn't have a namespace parameter because it already asks for an experiment_id, and the run follows the namespace of its owning experiment.
For other methods that do have a namespace parameter, e.g. create_run_from_pipeline_func, we cover single-user vs. multi-user usage in the docstring.
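Because the run follows its experiment's namespace, a run can be placed in a namespace by creating its experiment there. A sketch, assuming multi-user mode and a KFP SDK version whose Client.create_experiment accepts namespace= (the 0.2.x SDK discussed earlier may not); run_in_namespace is a hypothetical helper. Any object with the same two methods can be passed as client, which also makes it easy to test without a cluster.

```python
from typing import Any, Dict, Optional


def run_in_namespace(client: Any, experiment_name: str, namespace: str,
                     package_path: str, run_name: str,
                     params: Optional[Dict[str, Any]] = None) -> Any:
    """Start a run whose namespace follows its experiment (multi-user mode).

    `client` is expected to behave like kfp.Client from the v1 SDK.
    """
    # The experiment determines the namespace; run_pipeline has no namespace argument.
    experiment = client.create_experiment(experiment_name, namespace=namespace)
    return client.run_pipeline(
        experiment.id, run_name,
        pipeline_package_path=package_path,
        params=params or {})
```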
I haven't tried kubectl create --namespace ns1 -f workflow.yaml personally, but I expect it works. kubectl is available to cluster admins only, and an admin probably has permission to do anything, but that's not true for an end user.
It is by design that the API server only allows submitting workflows to a namespace that the user either owns or is listed in as a contributor. The same applies to read access.
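The access rule described above can be stated compactly in code. This is a hypothetical model for illustration only, not KFP's implementation; NamespaceAccess and allows are made-up names:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class NamespaceAccess:
    """Hypothetical model of who may use a namespace in multi-user KFP."""
    owner: str
    contributors: Set[str] = field(default_factory=set)

    def allows(self, user: str) -> bool:
        # The same rule covers both submitting workflows and read access.
        return user == self.owner or user in self.contributors
```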
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.