Flux: Garbage collection of CronJob resources

Created on 7 Apr 2020 · 10 Comments · Source: fluxcd/flux

Describe the bug

It appears that garbage collection of CronJob resources is not working.

We have a Flux setup with --sync-garbage-collection=true. Garbage collection works on our cluster in general: we have seen ConfigMaps, StatefulSets, and RoleBindings being garbage-collected.

However, CronJob resources are not being cleaned up.

Any ideas?

To Reproduce

  1. Commit a new CronJob to git
  2. Allow Flux to sync and create the CronJob on the cluster
  3. Check that the CronJob was created on the cluster
  4. Remove the newly created CronJob from git
  5. Allow Flux to sync
  6. Check that the CronJob still exists on the cluster

Expected behavior

The CronJob should be removed from the cluster.

Additional context

  • Flux version: 1.18.0
  • Kubernetes version: 1.15.0
bug

All 10 comments

Here is a reference CronJob, in case it's helpful:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  annotations: {}
  labels:
    name: delete-me
  name: delete-me
  namespace: default
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 20
  jobTemplate:
    spec:
      completions: 1
      parallelism: 1
      template:
        metadata:
          labels:
            name: delete-me
        spec:
          containers:
          - args: []
            command:
            - date
            env: []
            image: alpine:3.11
            imagePullPolicy: IfNotPresent
            name: date
            ports: []
            stdin: false
            tty: false
            volumeMounts: []
          imagePullSecrets: []
          initContainers: []
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
          volumes: []
  schedule: 0 * * * *
  successfulJobsHistoryLimit: 10

It seems that the issue is in getAllowedResourcesBySelector() where only PreferredVersions are respected: https://github.com/fluxcd/flux/blob/414bbd0fb4c5e90ed0c6bf532e60f39a9b274941/pkg/cluster/kubernetes/sync.go#L252

This is what we get from a call to /apis:

    ...
    {
      "name": "batch",
      "versions": [
        {
          "groupVersion": "batch/v1",
          "version": "v1"
        },
        {
          "groupVersion": "batch/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "batch/v1",
        "version": "v1"
      }
    },
    ...

Is there any reason for only preferred versions to be checked at sync?
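
To make the suspected cause concrete, here is a minimal Go sketch of the effect described above. It is not Flux's code: the helpers `groupVersions`, `visible`, and `parseGroup` are hypothetical, and the `servedKinds` map is an assumption that encodes where Kubernetes 1.15 serves Job (batch/v1) and CronJob (batch/v1beta1). It parses the "batch" entry from the /apis response and checks whether a CronJob would be found when only the preferred version is scanned, versus every served version.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiGroup mirrors the fields of one entry in the /apis response quoted above.
type apiGroup struct {
	Name     string `json:"name"`
	Versions []struct {
		GroupVersion string `json:"groupVersion"`
	} `json:"versions"`
	PreferredVersion struct {
		GroupVersion string `json:"groupVersion"`
	} `json:"preferredVersion"`
}

// batchGroup is the "batch" entry from the /apis response quoted above.
const batchGroup = `{
  "name": "batch",
  "versions": [
    {"groupVersion": "batch/v1", "version": "v1"},
    {"groupVersion": "batch/v1beta1", "version": "v1beta1"}
  ],
  "preferredVersion": {"groupVersion": "batch/v1", "version": "v1"}
}`

// servedKinds is an assumption for illustration: on Kubernetes 1.15,
// Job is served from batch/v1 and CronJob from batch/v1beta1.
var servedKinds = map[string][]string{
	"batch/v1":      {"Job"},
	"batch/v1beta1": {"CronJob"},
}

// parseGroup (hypothetical helper) decodes one API group entry.
func parseGroup(raw string) (apiGroup, error) {
	var g apiGroup
	err := json.Unmarshal([]byte(raw), &g)
	return g, err
}

// groupVersions (hypothetical helper) returns the group versions to scan:
// only the preferred one, or every served version.
func groupVersions(g apiGroup, preferredOnly bool) []string {
	if preferredOnly {
		return []string{g.PreferredVersion.GroupVersion}
	}
	out := make([]string, 0, len(g.Versions))
	for _, v := range g.Versions {
		out = append(out, v.GroupVersion)
	}
	return out
}

// visible (hypothetical helper) reports whether kind is served by any of gvs.
func visible(kind string, gvs []string) bool {
	for _, gv := range gvs {
		for _, k := range servedKinds[gv] {
			if k == kind {
				return true
			}
		}
	}
	return false
}

func main() {
	g, err := parseGroup(batchGroup)
	if err != nil {
		panic(err)
	}
	fmt.Println("CronJob visible, preferred only:", visible("CronJob", groupVersions(g, true)))
	fmt.Println("CronJob visible, all versions:", visible("CronJob", groupVersions(g, false)))
}
```

Under these assumptions, the preferred-only scan misses the CronJob while the all-versions scan finds it, which would be consistent with CronJobs never being listed as candidates for garbage collection.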

We can reproduce this exact issue on 1.16.0 as well.

@ridhoq @vikatskhay @groodt can you all take a look and see if the image referenced in #3008 solves the issue for you?

@hiddeco I'm giving it a try now and will get back to you.

@hiddeco Yes, it appears to work now! Thank you!

ts=2020-04-15T23:09:12.204301076Z caller=sync.go:159 info="cluster resource not in resources to be synced; deleting" dry-run=false resource=default:cronjob/delete-me
ts=2020-04-15T23:09:12.204612196Z caller=sync.go:159 info="cluster resource not in resources to be synced; deleting" dry-run=false resource=default:configmap/delete-me
ts=2020-04-15T23:09:12.204687659Z caller=sync.go:540 method=Sync cmd=delete args= count=2
ts=2020-04-15T23:09:12.4239456Z caller=sync.go:606 method=Sync cmd="kubectl delete -f -" took=219.231801ms err=null output="cronjob.batch \"delete-me\" deleted\nconfigmap \"delete-me\" deleted"

Not sure if there are some tests we can add to prevent regressions in future.

Is there anything more to be done to get the fix merged to master, @hiddeco?

@hiddeco Will there be (or can there be?) a point release that includes this? Thanks!

@mhenniges The fix was merged here: https://github.com/fluxcd/flux/pull/3008

It looks like a 1.20.0 release will be happening at some stage: https://github.com/fluxcd/flux/milestone/27

Fixed in Flux v1.20.0
