Pipelines: What does Kubeflow Pipelines offer over Argo for MLOps or general pipelining?

Created on 20 Aug 2020  ·  1 Comment  ·  Source: kubeflow/pipelines

I am currently experimenting with various libraries and platforms related to machine learning.

Kubeflow Pipelines is one of them, and I am testing it by reading the documentation and setting up an environment. However, I don't think the documentation conveys the philosophy behind Kubeflow Pipelines. (Currently, I think the Kubeflow documentation lacks this kind of material.)

I want to know the design intent and philosophy of Kubeflow Pipelines. It would be great if you could explain it in comparison with Argo.

kind/question

Most helpful comment

KFP uses Argo under the hood, but it provides valuable features on top.

  • Reusable components. KFP has the concept of components. They are somewhat similar to Argo templates, but are more portable and reusable. People have created hundreds of components and share them on GitHub. Components can be loaded from anywhere.
  • Python-based pipeline building. You can write your pipeline as predict_op(model=train_op(data=get_dataset_op(...).output).output)
  • Simplified component authoring. You can easily create a component from a Python function using create_component_from_func
  • Execution caching. When you run a modified pipeline, the steps that already completed are skipped, so execution is faster.
  • UX: visualization, artifact previews, etc.
  • Ease of pipeline authoring
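
To make the "Python-based pipeline building" and "execution caching" points more concrete, here is a toy model in plain Python. It is not the KFP SDK and does not reflect how KFP is implemented: `Task`, `make_op`, and `run` are hypothetical names invented for this sketch, and real KFP runs each step in a container via Argo rather than in-process. It only illustrates the idea that nested op calls describe a graph instead of running code, and that steps with unchanged inputs can be skipped on a later run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Task:
    """One step in the toy graph (hypothetical; not a KFP class)."""
    name: str
    func: Callable[..., Any]
    inputs: Dict[str, Any] = field(default_factory=dict)

    @property
    def output(self) -> "Task":
        # In this toy, referring to a step's output means referring to the step.
        return self


def make_op(func: Callable[..., Any]) -> Callable[..., Task]:
    """Wrap a function so that calling it records a Task instead of running it."""
    def op(**kwargs: Any) -> Task:
        return Task(name=func.__name__, func=func, inputs=kwargs)
    return op


def run(task: Task, cache: Dict[Any, Any], log: List[str]) -> Any:
    """Resolve upstream steps first, then run this step unless it is cached."""
    args = {
        key: run(value, cache, log) if isinstance(value, Task) else value
        for key, value in task.inputs.items()
    }
    # Cache key: step name plus its resolved inputs (an assumption of this toy).
    cache_key = (task.name, repr(sorted(args.items())))
    if cache_key in cache:
        log.append(f"skip {task.name}")
    else:
        log.append(f"run {task.name}")
        cache[cache_key] = task.func(**args)
    return cache[cache_key]


# Three plain functions standing in for components.
def get_dataset(size):
    return list(range(size))


def train(data):
    return sum(data) / len(data)


def predict(model, x):
    return model * x


get_dataset_op = make_op(get_dataset)
train_op = make_op(train)
predict_op = make_op(predict)

cache: Dict[Any, Any] = {}

# Building the pipeline only records the graph; nothing runs yet.
pipeline = predict_op(model=train_op(data=get_dataset_op(size=4).output).output, x=2)
log1: List[str] = []
first = run(pipeline, cache, log1)

# A modified pipeline: only the last step's input changed.
modified = predict_op(model=train_op(data=get_dataset_op(size=4).output).output, x=10)
log2: List[str] = []
second = run(modified, cache, log2)

print(first, log1)
print(second, log2)
```

On the second run only the predict step executes, because the dataset and training steps see the same inputs as before and are served from the cache.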

Check the following pipeline: https://github.com/kubeflow/pipelines/blob/master/components/XGBoost/_samples/training_with_cross_validation.py

Then get the Argo Workflow generated from it and compare the complexity.
