Describe the feature
Flux should support deploying Jobs that only need to run once. For example, we deploy RabbitMQ with Flux and would like to run a Job that configures Rabbit's HA policy. However, if an admin runs kubectl delete jobs --all for any reason, the Job will be re-deployed by Flux and may run to failure or just unnecessarily use compute and memory for a period of time.
Flux should add an annotation of flux.weave.works/ignore: true to a Job once it has completed, but only if another annotation of flux.weave.works/run-once: true is set; otherwise the default behaviour should be observed.
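To make the proposed rule concrete, here is a minimal sketch of the decision in Python. It is not Flux code: `annotations_after_completion` is a hypothetical helper, and flux.weave.works/run-once is the annotation proposed in this issue, not one Flux implements.

```python
# Existing Flux annotation that tells Flux to skip a resource.
IGNORE = "flux.weave.works/ignore"
# Annotation proposed in this issue (hypothetical, not implemented by Flux).
RUN_ONCE = "flux.weave.works/run-once"


def annotations_after_completion(annotations, completed):
    """Return the annotations a Job would carry under the proposed rule.

    Kubernetes annotation values are strings, so "true" is compared as text.
    """
    updated = dict(annotations)
    if completed and annotations.get(RUN_ONCE) == "true":
        # Opted-in Job has finished: mark it so it is not applied again.
        updated[IGNORE] = "true"
    # Any other case keeps the default behaviour (annotations unchanged).
    return updated
```

A Job without the run-once annotation, or one that has not completed yet, comes back unchanged.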
What would the new user story look like?
A Job is deployed with the run-once annotation.
Once the Job completes, Flux adds the flux.weave.works/ignore: true annotation.
An admin runs kubectl delete jobs --all.
If the admin wants to re-run the job, they can simply change flux.weave.works/ignore from true to false.
Expected behavior
Jobs are not run again if they have previously run, as long as the flux.weave.works/run-once: true annotation is set.
Jobs don't play well with Flux since it's stateless by design. Unless you delete the Job from git, Flux will keep running kubectl apply on it.
The flux.weave.works/run-once: true annotation wouldn't really help, because Flux doesn't keep any state and could be restarted before the job ends.
We could add something like flux.weave.works/only-apply-if-missing: true flux.weave.works/ignore-if-state: !Completed but it would be hairy to restart Jobs since you would need to remove the annotation, wait for Flux to apply the job again and add the annotation again.
In a nutshell: I am not convinced this would be a big improvement. I am happy to hear other suggestions though.
Is the story different if we're using the helm-operator and our jobs are then essentially managed by helm?
Jobs and CronJobs are pretty essential things.
> Is the story different if we're using the helm-operator and our jobs are then essentially managed by helm?
Yes, as the only things Flux manages in that situation are HelmRelease resources. Everything they produce is managed by Helm's own logic.
> Jobs and CronJobs are pretty essential things.
Flux _does support_ CronJobs.
> Flux does support CronJobs.
Right, makes sense. And even Jobs, for that matter (any k8s manifest it finds, really), just not any tracking of "statefulness" of specific Jobs that shouldn't be run more than once.
@cpressland have you considered moving to Helm + Flux's helm-operator? If that's not really feasible for you at this point, rolling your own state management for this shouldn't be too hard (you can just use a Redis instance, possibly configured to write to disk using a PV/PVC). Check out the example here: https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/. It's not a one-to-one match for your use case, but generically speaking, all you need is an image for your job that first checks your state somehow: was this job already run? If so, just exit with code 0. If not, run the job, store the state, then exit with code 0.
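The check-then-run flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps the suggested Redis instance for a marker file (which in a cluster would need to sit on a persistent volume), and `run_once`, `DEFAULT_MARKER`, and the path are hypothetical names, not part of Flux or Kubernetes.

```python
import subprocess
from pathlib import Path

# Hypothetical marker location; assumed to be on a persistent volume so that
# the state survives the Job's pod being deleted and re-created.
DEFAULT_MARKER = Path("/var/run/job-state/rabbitmq-ha-policy.done")


def run_once(command, marker=DEFAULT_MARKER):
    """Run `command` (an argv list) unless `marker` already exists.

    Returns the exit code the Job's container should exit with.
    """
    if marker.exists():
        # State says the job already ran: succeed without doing anything.
        print(f"{marker} exists, skipping")
        return 0
    result = subprocess.run(command)
    if result.returncode == 0:
        # Store state only after a successful run, so failures are retried.
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.write_text("done\n")
    return result.returncode
```

The container entrypoint would call `run_once` with the real job command and exit with its return value. Re-running the job then means deleting the marker rather than touching any annotation.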
What is the status of this today? I'm trying to define an initial DB migration Job that I would like to run only once.
How is this still open? Is this tool dead or dying?
Is there a way to run DB migrations on release? Seems like a basic requirement.
> How is this still open? Is this tool dead or dying?
Quite the opposite, we are working on a next-gen Flux as highlighted in the README.md, see: https://toolkit.fluxcd.io. Besides this, we released a new version of Flux recently (12 days ago).
> Is there a way to run DB migrations on release? Seems like a basic requirement.
There are sufficient alternatives available in the ecosystem to do database migrations without running Jobs: init containers, to name one, or the approach outlined in https://github.com/fluxcd/flux/issues/2440#issuecomment-590393360.
Great! Looking forward to a stable release with at least feature parity with current fluxcd.
I have tens of Django/Celery containers, so using init containers would mean all of them would try to run the migration at the same time, which is not good. Then I have to fall back to some orchestration tool (I was using Salt before switching to Flux), which means that I have to write more version management code outside of Flux. I still haven't explored Helm, but that seems like a lot of work to get going for us. Flux is quite easy to deploy and self-contained, which is why I started with it. If I go the Helm route, do I have to make a Helm chart out of the whole infrastructure?
You can create a suspended CronJob and then create a Job from that CronJob whenever you need to run it. Creating the Job from the CronJob is a manual step, though.
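As a rough sketch of that manual step, the helper below builds and runs the kubectl command that creates a Job from a CronJob. It assumes the CronJob is kept in git with spec.suspend: true, so Flux applies it but it is never scheduled. The function names and the example resource names are hypothetical.

```python
import subprocess


def build_create_job_command(cronjob, job_name, namespace=None):
    """Build the argv for `kubectl create job <job_name> --from=cronjob/<cronjob>`."""
    cmd = ["kubectl", "create", "job", job_name, f"--from=cronjob/{cronjob}"]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd


def trigger(cronjob, job_name, namespace=None):
    """Create the Job; raises CalledProcessError if kubectl fails."""
    subprocess.run(build_create_job_command(cronjob, job_name, namespace), check=True)
```

For example, `trigger("db-migrate", "db-migrate-manual-1")` would create a one-off Job named db-migrate-manual-1 from a suspended CronJob named db-migrate (both names are made up). Because the Job itself is not in git, Flux does not re-apply it after it is deleted.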