Argo: Artifact Garbage Collection

Created on 25 May 2020 · 4 Comments · Source: argoproj/argo

Summary

When a Workflow is deleted, we should consider garbage collecting that workflow's artifacts where possible. This should probably be configurable.

Motivation

Right now, cleaning up artifacts that were created as part of Workflows requires manual intervention. For high-frequency workflows that produce many artifacts (think a CI system), cleaning up these artifacts can be pretty important for cost savings.

Proposal

Each type of artifact storage will need to support a deleteArtifact function. When a Workflow is deleted (not simply Archived, but actually deleted), all artifacts associated with that workflow would be cleaned up automatically.
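As a rough sketch of what that could look like, the Go snippet below defines a small deletion interface and a helper that cleans up every artifact of a deleted workflow. All names here (`Artifact`, `ArtifactDeleter`, `memoryStore`, `CollectGarbage`) are hypothetical and are not taken from the Argo codebase; the in-memory store only stands in for a real backend such as S3 or GCS.

```go
package main

import (
	"errors"
	"fmt"
)

// Artifact is a hypothetical, simplified reference to a stored artifact.
type Artifact struct {
	Name string
	Key  string // location of the object inside the artifact repository
}

// ArtifactDeleter is a hypothetical interface that each storage type
// would implement to support garbage collection.
type ArtifactDeleter interface {
	DeleteArtifact(a Artifact) error
}

// memoryStore is an in-memory stand-in for a real artifact backend.
type memoryStore struct {
	objects map[string][]byte
}

// DeleteArtifact removes the object stored under the artifact's key.
func (m *memoryStore) DeleteArtifact(a Artifact) error {
	if _, ok := m.objects[a.Key]; !ok {
		return fmt.Errorf("artifact %q not found at key %q", a.Name, a.Key)
	}
	delete(m.objects, a.Key)
	return nil
}

// CollectGarbage tries to delete every artifact of a workflow. It keeps
// going after a failure and returns all errors joined together.
func CollectGarbage(d ArtifactDeleter, artifacts []Artifact) error {
	var errs []error
	for _, a := range artifacts {
		if err := d.DeleteArtifact(a); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	store := &memoryStore{objects: map[string][]byte{"wf-1/out.tgz": []byte("data")}}
	err := CollectGarbage(store, []Artifact{{Name: "my-output-artifact", Key: "wf-1/out.tgz"}})
	fmt.Println("remaining objects:", len(store.objects), "error:", err)
}
```

Continuing past individual failures is a design choice made for this sketch: one missing or undeletable object should probably not leave the rest of a workflow's artifacts behind.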

We could make this configurable with something like:

  # The rest of the output config was omitted
  outputs:
    artifacts:
    - name: my-output-artifact
      artifactGC:
        # One of: Never | OnWorkflowDeletion | OnWorkflowArchival
        strategy: Never

Initially, I'd suggest that the default be Never to match existing behavior, unless you're OK with making a breaking change.
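To make the strategy semantics concrete, here is a minimal Go sketch of how a controller might decide whether to collect an artifact. The type and function names (`GCStrategy`, `WorkflowEvent`, `ShouldCollect`) are hypothetical, and the sketch assumes that each strategy fires only on its own event and that an unset strategy behaves like Never; the proposal itself does not settle either point.

```go
package main

import "fmt"

// GCStrategy mirrors the strategy values from the proposed config.
type GCStrategy string

const (
	GCNever              GCStrategy = "Never"
	GCOnWorkflowDeletion GCStrategy = "OnWorkflowDeletion"
	GCOnWorkflowArchival GCStrategy = "OnWorkflowArchival"
)

// WorkflowEvent is a hypothetical lifecycle event seen by the controller.
type WorkflowEvent string

const (
	EventDeleted  WorkflowEvent = "Deleted"
	EventArchived WorkflowEvent = "Archived"
)

// ShouldCollect reports whether an artifact with the given strategy
// should be garbage collected in response to the given event.
func ShouldCollect(s GCStrategy, e WorkflowEvent) bool {
	switch s {
	case GCOnWorkflowDeletion:
		return e == EventDeleted
	case GCOnWorkflowArchival:
		return e == EventArchived
	default:
		// Never, unset, or unknown values: keep the artifact,
		// which matches the existing behavior.
		return false
	}
}

func main() {
	fmt.Println(ShouldCollect("", EventDeleted))
	fmt.Println(ShouldCollect(GCOnWorkflowDeletion, EventDeleted))
}
```

Treating unknown values like Never errs on the side of keeping data, since a deleted artifact cannot be recovered.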



Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

enhancement epiartifacts help wanted

All 4 comments

this sounds like a great idea - would you be interested in submitting a PR?

Yep!

How are you getting on with this enhancement?

@pbebbo At the moment, it's on hold while I reconsider how to implement this. Consider the huge variety of ways that people can configure their artifact repo on AWS alone: Access Key/Secret, IRSA, KIAM/kube2iam, temporary secrets that are cleaned up after the workflow runs, etc. The scope of the problem is pretty big if you try to infer which credentials you should use to delete any given artifact. All of that also assumes that those credentials have delete permissions on your artifact repo, and that's not necessarily true.

The current idea I've been kicking around is to add some config that lets you specify a separate set of credentials for garbage collection. Those credentials would be expected to be allowed to delete artifacts from any workflow, which basically punts that responsibility to the user. With that in place, I think all of those concerns go away.
