Describe the bug
It appears that garbage collection of CronJob resources is not working.
We run Flux with --sync-garbage-collection=true. Garbage collection works on our cluster in general: we have seen ConfigMaps, StatefulSets, and RoleBindings being garbage-collected.
However, CronJob resources are not being cleaned up.
Any ideas?
To Reproduce
Expected behavior
CronJob should be removed
Additional context
A reference CronJob, in case it's helpful:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  annotations: {}
  labels:
    name: delete-me
  name: delete-me
  namespace: default
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 20
  jobTemplate:
    spec:
      completions: 1
      parallelism: 1
      template:
        metadata:
          labels:
            name: delete-me
        spec:
          containers:
          - args: []
            command:
            - date
            env: []
            image: alpine:3.11
            imagePullPolicy: IfNotPresent
            name: date
            ports: []
            stdin: false
            tty: false
            volumeMounts: []
          imagePullSecrets: []
          initContainers: []
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
          volumes: []
  schedule: 0 * * * *
  successfulJobsHistoryLimit: 10
It seems that the issue is in getAllowedResourcesBySelector() where only PreferredVersions are respected: https://github.com/fluxcd/flux/blob/414bbd0fb4c5e90ed0c6bf532e60f39a9b274941/pkg/cluster/kubernetes/sync.go#L252
This is what we get from a call to /apis:
...
{
  "name": "batch",
  "versions": [
    {
      "groupVersion": "batch/v1",
      "version": "v1"
    },
    {
      "groupVersion": "batch/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "batch/v1",
    "version": "v1"
  }
},
...
Is there any reason for only preferred versions to be checked at sync?
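To make the suspected cause concrete, here is a minimal Go sketch that parses the "batch" entry quoted above and compares two ways of choosing which group/versions to look at. It is not Flux's actual code: the types and the helpers preferredOnly, allVersions, contains, and parseGroup are hypothetical names made up for illustration, and it assumes (as seems to be the case on the clusters in this thread) that CronJob is served only under batch/v1beta1.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupVersion and apiGroup mirror only the fields of an /apis entry
// that matter for this illustration.
type groupVersion struct {
	GroupVersion string `json:"groupVersion"`
	Version      string `json:"version"`
}

type apiGroup struct {
	Name             string         `json:"name"`
	Versions         []groupVersion `json:"versions"`
	PreferredVersion groupVersion   `json:"preferredVersion"`
}

// batchGroup is the "batch" entry from the /apis output quoted above.
const batchGroup = `{
  "name": "batch",
  "versions": [
    {"groupVersion": "batch/v1", "version": "v1"},
    {"groupVersion": "batch/v1beta1", "version": "v1beta1"}
  ],
  "preferredVersion": {"groupVersion": "batch/v1", "version": "v1"}
}`

// parseGroup decodes one API group entry (hypothetical helper).
func parseGroup(raw string) (apiGroup, error) {
	var g apiGroup
	err := json.Unmarshal([]byte(raw), &g)
	return g, err
}

// preferredOnly returns just the preferred group/version, which is the
// behaviour suspected in this thread (hypothetical helper).
func preferredOnly(g apiGroup) []string {
	return []string{g.PreferredVersion.GroupVersion}
}

// allVersions returns every group/version the server advertises
// (hypothetical helper).
func allVersions(g apiGroup) []string {
	out := make([]string, 0, len(g.Versions))
	for _, v := range g.Versions {
		out = append(out, v.GroupVersion)
	}
	return out
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

func main() {
	g, err := parseGroup(batchGroup)
	if err != nil {
		panic(err)
	}
	// Assumption: CronJob is only available under this group/version.
	const cronJobGV = "batch/v1beta1"
	fmt.Println("covered by preferred version only:", contains(preferredOnly(g), cronJobGV))
	fmt.Println("covered by all served versions:", contains(allVersions(g), cronJobGV))
}
```

With the response above, the preferred-only selection never includes batch/v1beta1, so a resource that exists only there would not be considered for garbage collection, while iterating over all served versions would include it.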
We are seeing this exact issue as well, on 1.16.0.
@ridhoq @vikatskhay @groodt can you all take a look and see if the image referenced in #3008 solves the issue for you?
@hiddeco I'm giving it a try now and will get back to you.
@hiddeco Yes, it appears to work now! Thank you!
ts=2020-04-15T23:09:12.204301076Z caller=sync.go:159 info="cluster resource not in resources to be synced; deleting" dry-run=false resource=default:cronjob/delete-me
ts=2020-04-15T23:09:12.204612196Z caller=sync.go:159 info="cluster resource not in resources to be synced; deleting" dry-run=false resource=default:configmap/delete-me
ts=2020-04-15T23:09:12.204687659Z caller=sync.go:540 method=Sync cmd=delete args= count=2
ts=2020-04-15T23:09:12.4239456Z caller=sync.go:606 method=Sync cmd="kubectl delete -f -" took=219.231801ms err=null output="cronjob.batch \"delete-me\" deleted\nconfigmap \"delete-me\" deleted"
Not sure if there are some tests we can add to prevent regressions in future.
Is there anything more to be done to get the fix merged to master, @hiddeco?
@hiddeco Will there be (or can there be?) a point release that includes this? Thanks!
@mhenniges The fix was merged here: https://github.com/fluxcd/flux/pull/3008
It looks like a 1.20.0 release will be happening at some stage: https://github.com/fluxcd/flux/milestone/27
Fixed in Flux v1.20.0