Pipelines: Argo events for triggering pipelines

Created on 8 Jan 2019 · 21 Comments · Source: kubeflow/pipelines

It'd be great if we could trigger pipelines automatically in response to events.
Use Case 1:
When a model is uploaded to an object store -> trigger a step (pipeline) to deploy it.
Use Case 2:
When data arrives at a local volume or external storage -> trigger a pipeline to train.
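As a rough illustration of the two use cases, here is a minimal Python sketch of the routing decision an event trigger has to make: given a storage notification, pick the pipeline to run. The `StorageEvent` shape, bucket names, key prefixes and pipeline names are all hypothetical; real object-store notifications carry richer payloads.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class StorageEvent:
    location: str  # bucket name or volume path where the object landed (hypothetical field)
    key: str       # object key / file name that was created

# (location, key prefix, pipeline to trigger). All names are hypothetical.
RULES: List[Tuple[str, str, str]] = [
    ("model-store", "models/", "deploy-pipeline"),  # Use Case 1: new model -> deploy
    ("training-data", "raw/", "train-pipeline"),    # Use Case 2: new data -> train
]

def route(event: StorageEvent) -> Optional[str]:
    """Return the pipeline the event should trigger, or None if no rule matches."""
    for location, prefix, pipeline in RULES:
        if event.location == location and event.key.startswith(prefix):
            return pipeline
    return None

if __name__ == "__main__":
    print(route(StorageEvent("model-store", "models/v3/saved_model.pb")))  # deploy-pipeline
    print(route(StorageEvent("training-data", "raw/2019-01-08.csv")))      # train-pipeline
    print(route(StorageEvent("training-data", "tmp/scratch.csv")))         # None
```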

This is related to https://github.com/kubeflow/pipelines/issues/604.

I'd love to see this feature and help out with the implementation through some PRs as well (if it's on the roadmap).

Labels: area/backend, help wanted, kind/feature, priority/p1


@swiftdiaries - yes, this feature is on the roadmap. Let's collaborate on the design.

/assign @vicaire

Awesome! Looking forward to this :)

I will follow up on this thread as soon as we start tackling this. Thanks.

@swiftdiaries

It's a bit short but I provided an outline of how we plan to support event-driven pipelines here: https://docs.google.com/document/d/1O5n02SzMYmLH0cMkykxHWWWe7eMzaP1vk7Y3fBbLoD8/edit#heading=h.mhe3tnle0c9o

(See event-driven pipelines and data-driven pipelines)

In a nutshell:

  • We will have a metadata store that stores information about the data generated by a workflow (metadata).
  • Events from various sources (webhook, Pub/Sub, etc.) can also be stored in that metadata store, using a piece of infrastructure decoupled from the rest of the system.
  • An event-driven CRD will let users specify a workflow to execute each time new data of a particular type is added to the metadata store.

WDYT?
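The outline above can be modeled in a few lines. This is a toy, in-memory Python sketch under stated assumptions: `MetadataStore`, `Artifact` and `register_trigger` are hypothetical names, and `register_trigger` stands in for the proposed event-driven CRD by launching a workflow each time an artifact of a given type is recorded, whether it came from a pipeline or from an external event source.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, Dict, List

@dataclass
class Artifact:
    type: str  # e.g. "Dataset" or "Model" (hypothetical type names)
    uri: str   # where the data itself lives
    properties: Dict[str, str] = field(default_factory=dict)

# A launcher starts a workflow for a newly recorded artifact (hypothetical hook).
WorkflowLauncher = Callable[[Artifact], None]

class MetadataStore:
    """Toy in-memory metadata store with type-based triggers."""

    def __init__(self) -> None:
        self._artifacts: List[Artifact] = []
        self._triggers: DefaultDict[str, List[WorkflowLauncher]] = defaultdict(list)

    def register_trigger(self, artifact_type: str, launch: WorkflowLauncher) -> None:
        """Plays the role of the event-driven CRD: run `launch` for every new artifact of this type."""
        self._triggers[artifact_type].append(launch)

    def put(self, artifact: Artifact) -> None:
        """Record an artifact (from a workflow or an external event) and fire matching triggers."""
        self._artifacts.append(artifact)
        for launch in self._triggers.get(artifact.type, []):
            launch(artifact)

    def by_type(self, artifact_type: str) -> List[Artifact]:
        """Return all recorded artifacts of one type."""
        return [a for a in self._artifacts if a.type == artifact_type]

if __name__ == "__main__":
    launched: List[str] = []
    store = MetadataStore()
    store.register_trigger("Dataset", lambda a: launched.append("train-pipeline <- " + a.uri))
    store.put(Artifact("Dataset", "gs://bucket/data/day1.csv"))
    store.put(Artifact("Model", "gs://bucket/models/v1"))  # no trigger registered for Model
    print(launched)  # ['train-pipeline <- gs://bucket/data/day1.csv']
```

Registering a launcher for `"Dataset"` and then recording one `Dataset` and one `Model` artifact launches exactly one workflow, for the dataset; both artifacts are still recorded.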

Sorry for the late reply.

The overall idea is sound. I found this kubeflow-discuss thread on how Argo Events is integrated with Argo Workflows at GitHub quite interesting.

Also, what is the status of this? If there are tasks to be done, I'm happy to work together on this one.

@swiftdiaries,

The metadata store is currently being designed in collaboration with the KF community.

We could start by looking at the best way to integrate Argo Events with KFP for common use cases. Adding the "help wanted" flag. Contributions/proposals are welcome.

Note that resolving this issue should enable support for continuous online learning, as requested in https://github.com/kubeflow/pipelines/issues/1053

Do we need to make it specific to Argo Events? Could it be designed in a generic way to support something like Knative Eventing? @vicaire please include us in any offline design discussions happening on this.

@jingzhang36 Is this feature being actively worked on?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

Any updates on this feature?

/reopen
Looks like someone cares.

No one is working on this.

I am curious what makes it different from using the KFP SDK triggered by the event.
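One way to read the question above is that the glue can already be written with the SDK directly. A minimal sketch, assuming the KFP SDK's `kfp.Client.create_run_from_pipeline_package(pipeline_file, arguments, run_name)`; the event fields (`id`, `data_uri`), the `train_pipeline.yaml` package and its `data_uri` parameter are hypothetical, and `FakeClient` lets the sketch run without a cluster.

```python
import re
from typing import Any, Dict, List, Protocol, Tuple

class RunLauncher(Protocol):
    """The one kfp.Client method this sketch relies on."""
    def create_run_from_pipeline_package(
        self, pipeline_file: str, arguments: Dict[str, Any], run_name: str
    ) -> Any: ...

def run_for_event(client: RunLauncher, event: Dict[str, str],
                  pipeline_file: str = "train_pipeline.yaml") -> str:
    """Start one KFP run for one event. The event fields ('id', 'data_uri'),
    the pipeline file name and its 'data_uri' parameter are hypothetical."""
    # Derive a readable run name from the event id (lowercase, safe characters only).
    run_name = "event-" + re.sub(r"[^a-z0-9-]+", "-", event["id"].lower())
    client.create_run_from_pipeline_package(
        pipeline_file=pipeline_file,
        arguments={"data_uri": event["data_uri"]},
        run_name=run_name,
    )
    return run_name

class FakeClient:
    """Stand-in for kfp.Client so the sketch runs without a KFP cluster."""
    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any], str]] = []

    def create_run_from_pipeline_package(self, pipeline_file, arguments, run_name):
        self.calls.append((pipeline_file, arguments, run_name))
```

Against a real cluster, `client` would be a `kfp.Client(host=...)` instance. What this sketch does not provide is the long-running listener that receives events and calls `run_for_event`, which is the part a built-in trigger would presumably take over.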

@Bobgy: Reopened this issue.

+1 on this as an issue. When data lands on a specific volume, an event should be triggered. Should this logic live in KFP?

Secondly, when an event is created, we would need a listener service to trigger the corresponding KFP pipeline. Is this sufficient?
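A listener service of the kind described above can be quite small. Below is a minimal sketch using only Python's standard `http.server`; the JSON payload with a `data_uri` field and the `trigger` callback (which would wrap a `kfp.Client` call in practice) are assumptions, and there is no authentication, retry or deduplication.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Callable, Dict, Optional, Type

# Hypothetical callback that starts a pipeline run and returns its name.
Trigger = Callable[[Dict[str, Any]], str]

def handle_payload(body: bytes, trigger: Trigger) -> Optional[str]:
    """Validate one webhook body and pass it to `trigger`; None means 'reject'."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(event, dict) or not isinstance(event.get("data_uri"), str):
        return None
    return trigger(event)

def make_handler(trigger: Trigger) -> Type[BaseHTTPRequestHandler]:
    """Build a request handler class bound to `trigger`."""

    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            run_name = handle_payload(self.rfile.read(length), trigger)
            self.send_response(202 if run_name is not None else 400)
            self.end_headers()
            if run_name is not None:
                self.wfile.write(run_name.encode("utf-8"))

    return WebhookHandler

def serve(port: int, trigger: Trigger) -> None:
    """Run the listener until interrupted (blocking)."""
    HTTPServer(("", port), make_handler(trigger)).serve_forever()
```

Calling `serve(8080, trigger)` would block and handle POSTs until interrupted: a valid body gets `202` with the name returned by `trigger`, and a malformed one gets `400`.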

+1 on this; I would like to see both event-trigger and data-trigger configuration make it into KFP. Is Argo Events the only solution here, or should we use something more generic to Kubeflow?

+1 for this issue.

We would like to be able to trigger pipeline runs from GCP Pub/Sub events.

@imagr-pat For GCP Pub/Sub events, it's possible to add a Cloud Function that listens to the topic and runs a KFP client. Does that work for you?
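A minimal sketch of that Cloud Function approach, assuming a Pub/Sub-triggered background function (1st gen), which receives the message body base64-encoded in `event["data"]`, and assuming the message body is JSON. `submit_run` and `SUBMITTED` are hypothetical placeholders for the `kfp.Client` call; authenticating to the KFP endpoint from the function is not shown.

```python
import base64
import json
from typing import Any, Dict, List

# Records what would have been submitted; stands in for real KFP runs in this sketch.
SUBMITTED: List[Dict[str, Any]] = []

def submit_run(params: Dict[str, Any]) -> None:
    """Hypothetical placeholder: a real function would create a kfp.Client(host=...)
    and call create_run_from_pipeline_package(...) with these parameters."""
    SUBMITTED.append(params)

def on_pubsub_message(event: Dict[str, Any], context: Any) -> None:
    """Entry point for a Pub/Sub-triggered background Cloud Function."""
    # The Pub/Sub message body arrives base64-encoded; we assume it is JSON.
    params = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    submit_run(params)
```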

+1 from me on this issue as well.

Ideally, I would like to see native Kafka support for event-based triggering of Kubeflow pipelines. That way we wouldn't need an external tool such as NiFi or Airflow to trigger pipelines based on an event. The goal is better native support for online learning, which is driven by the mini-batches of training data that constantly flow into the pipelines to retrain and redeploy a model.
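The mini-batch behaviour described above boils down to counting records and firing a retrain once a batch is full. A minimal, broker-agnostic sketch: `records` can be any iterable, for example a kafka-python `KafkaConsumer`, which yields messages when iterated; `trigger_on_minibatches` and the `retrain` callback (which would start the retrain-and-deploy pipeline) are hypothetical names.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

def trigger_on_minibatches(records: Iterable[T], batch_size: int,
                           retrain: Callable[[List[T]], None]) -> List[T]:
    """Call `retrain` once for every `batch_size` records; return the partial batch left over."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            retrain(batch)  # e.g. start the retrain-and-deploy pipeline (hypothetical)
            batch = []      # start a fresh batch; the old list now belongs to `retrain`
    return batch
```

For example, seven records with `batch_size=3` trigger two retrains (records 0 to 2 and 3 to 5) and leave one record waiting. A real consumer never ends, so the returned leftover only matters for finite inputs such as tests.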

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

hold

Looking forward to seeing this feature, so that we no longer need AWS Lambda or Cloud Functions to chain related pipelines. A big thank you!
