Pipelines: Implement a persistence agent for logs, and Garbage Collection for Kubernetes Resources.

Created on 21 Feb 2019 · 11 Comments · Source: kubeflow/pipelines

area/backend area/persist-agent help wanted kind/feature lifecycle/stale priority/p1

All 11 comments

FYI, this bug has shown up in two cases so far:

  • an auto-scaling cluster
  • redeploying a new cluster with an old PD

On our deployment of Pipelines as well, logs get wiped every few weeks. What is needed is a configuration option to persist the logs to persistent storage, as well as support for archival.

Sounds good. This is open for contributions if you are interested. An object store would be great for persisting the logs. If the implementation uses the Minio client, we would be able to persist the logs on GCS/S3/GCP.

Description of the solution:

  • We need to persist logs.
  • We can then automatically delete Argo workflow resources and scheduled workflow resources from the Kubernetes etcd store once the workflows have terminated.
  • We need to verify that the owner references are properly set on all the resources created by the Argo workflow (for instance, if an Argo workflow creates a TF-job, the TF-job should have the Argo workflow as its owner).
  • Deleting the workflow will then automatically delete all dependent pods and Kubernetes resources.
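To make the owner check in the list above concrete, here is a minimal sketch in Python. It works on plain manifest dictionaries (for example, parsed output of `kubectl get ... -o json`) rather than a live cluster. The helper names `owned_by` and `find_orphans`, the sample resource names, and the UID are hypothetical; only the `metadata.ownerReferences` layout comes from Kubernetes itself.

```python
from typing import Any, Dict, List


def owned_by(resource: Dict[str, Any], owner_kind: str, owner_uid: str) -> bool:
    """Return True if the manifest lists the given owner in metadata.ownerReferences."""
    refs = resource.get("metadata", {}).get("ownerReferences") or []
    return any(
        ref.get("kind") == owner_kind and ref.get("uid") == owner_uid
        for ref in refs
    )


def find_orphans(
    resources: List[Dict[str, Any]], owner_kind: str, owner_uid: str
) -> List[str]:
    """Return the names of resources that do NOT reference the given owner.

    These are the resources that would be left behind when the owner is deleted.
    """
    return [
        res["metadata"]["name"]
        for res in resources
        if not owned_by(res, owner_kind, owner_uid)
    ]


# Hypothetical sample data: a TF-job that points back at its workflow,
# and a pod that was created without an owner reference.
WORKFLOW_UID = "wf-uid-123"

tfjob = {
    "kind": "TFJob",
    "metadata": {
        "name": "train-step",
        "ownerReferences": [
            {"kind": "Workflow", "name": "my-pipeline-run", "uid": WORKFLOW_UID}
        ],
    },
}

stray_pod = {"kind": "Pod", "metadata": {"name": "stray-pod"}}

if __name__ == "__main__":
    print(find_orphans([tfjob, stray_pod], "Workflow", WORKFLOW_UID))
```

Any name reported by `find_orphans` would survive deletion of the workflow, so it would need its owner reference fixed before automatic cleanup can be relied on.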

Short term workaround from @amygdala

To clean up pods:

  • Install the Argo CLI
  • Use the command: argo delete -n kubeflow --all
  • A downside, of course, is that you can no longer look at those pod logs in the UI.

As @animeshsingh suggested in https://github.com/kubeflow/pipelines/issues/940, we could use the latest Argo executor to save logs to an S3/GCS persistent volume when the archiving flag is enabled.

What is the status of this issue? Have there been any updates or contributions? Or is there any documentation on how to enable Argo archiving as suggested above?

GC is implemented now. You can set the TTL through a persistence agent environment variable; see https://github.com/kubeflow/pipelines/pull/1802/files#diff-f4326ec4a2f4f6b219c2aab8887f6c85R21
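For readers wondering what a TTL-based cleanup decision looks like, here is a rough sketch in Python. It is not the persistence agent's actual code. The environment variable name `WORKFLOW_TTL_SECONDS`, its default, and the `should_delete` helper are placeholders made up for illustration; check the linked PR for the real setting.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder variable name and default (one day); not the real setting.
TTL_SECONDS = int(os.environ.get("WORKFLOW_TTL_SECONDS", "86400"))


def should_delete(
    finished_at: Optional[datetime],
    persisted: bool,
    now: datetime,
    ttl_seconds: int = TTL_SECONDS,
) -> bool:
    """Decide whether a workflow resource can be garbage collected.

    A workflow is only eligible once it has finished, its state has been
    persisted elsewhere, and at least ttl_seconds have passed since it finished.
    """
    if finished_at is None or not persisted:
        return False
    return now - finished_at >= timedelta(seconds=ttl_seconds)


if __name__ == "__main__":
    now = datetime(2019, 9, 1, 12, 0, tzinfo=timezone.utc)
    finished = now - timedelta(hours=2)
    # Finished two hours ago with a one-hour TTL: eligible for deletion.
    print(should_delete(finished, persisted=True, now=now, ttl_seconds=3600))
```

Requiring `persisted` before deletion reflects the order described in this thread: state is saved first, and only then are the Kubernetes resources removed.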

Logs can be persisted with ARGO_ARCHIVE_LOGS; see
https://github.com/kubeflow/pipelines/pull/2081

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Both are supported
