Describe the bug
After a few days, flux got stuck and stopped syncing. I killed the pod and everything was up and running again.
To Reproduce
I have no idea. Maybe the network connection was broken for a few minutes, or the file system got corrupted.
Expected behavior
Flux should terminate if it cannot reach the git repo for an extended period, for example 1 hour. This would allow k8s to restart the container, which makes the failure visible and might also resolve the issue.
Logs
ts=2020-05-05T09:17:13.783823769Z caller=checkpoint.go:24 component=checkpoint msg="up to date" latest=1.19.0
ts=2020-05-05T09:25:36.751714104Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone406024532
ts=2020-05-05T10:25:36.752085036Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone406024532
ts=2020-05-05T11:25:36.752559107Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone406024532
ts=2020-05-05T12:25:36.753326379Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone406024532
ts=2020-05-05T13:25:36.754100308Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone406024532
ts=2020-05-05T14:25:36.754661578Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone406024532
Additional context
Just encountered this in our setup as well. DNS had been temporarily unavailable, therefore it failed to clone the repo. It seems that the readiness/liveness probe did not detect this as a failure, so the pod remained up with issues. Once the pod had been re-created, it started working again.
Possibly a duplicate of, or related to, https://github.com/fluxcd/flux/issues/3014?
We have been having this kind of issue since upgrading to 1.19.
We've been having this issue as well, on two clusters. One has flux 1.19.0, the other 1.20.0.
Ran into this same issue, had to cycle the pods to get them to start syncing again.