Describe the bug
We're running a sync that takes longer than a minute and hits a timeout. We tried raising --sync-timeout to account for this, but we still get a "context deadline exceeded" error. If we lower the timeout below 60s, we see timeouts sooner. I suspect another timeout is being hit that overrides the effective sync timeout.
To Reproduce
Expected behavior
The sync timeout should be respected
Logs
Our args:
- --log-format=fmt
- --ssh-keygen-dir=/var/fluxd/keygen
- --ssh-keygen-format=RFC4716
- --k8s-secret-name=flux-git-deploy
- --memcached-hostname=fluxcd-memcached
- --sync-state=git
- --sync-timeout=2m
- --memcached-service=
- --git-url=ssh://{{our repo}}
- --git-branch=master
- --git-path={{ our path }}
- --git-readonly=true
- --git-user={{our user }}
- --git-email={{ our email }}
- --git-verify-signatures=false
- --git-set-author=false
- --git-poll-interval=5m
- --git-timeout=20s
- --sync-interval=5m
- --git-ci-skip=false
- --manifest-generation=true
- --automation-interval=5m
- --registry-rps=200
- --registry-burst=125
- --registry-trace=false
- --registry-disable-scanning
ts=2020-05-08T15:37:36.922163159Z caller=sync.go:73 component=daemon info="trying to sync git changes to the cluster" old=99ccd2c99ce8a771caf3139db6807d8af2b5b78e new=d4fc25c5a4a2d545007cff0e3433b59a452308d2
ts=2020-05-08T15:38:40.902810713Z caller=loop.go:107 component=sync-loop err="loading resources from repo: error executing generator command \"../../bin/flux-generator.sh\" from file \"../../.flux.yaml\": context deadline exceeded\nerror output:\n\ngenerated output:\n"
Additional context
The command execution times out at 60s. The timeout is set here - https://github.com/fluxcd/flux/blob/v1.19.0/pkg/manifests/configfile.go#L24 and used to execute the command here - https://github.com/fluxcd/flux/blob/v1.19.0/pkg/manifests/configfile.go#L490
It would be nice to make it configurable, or to use --sync-timeout here.
I agree that the command timeout could use --sync-timeout, since you can put any script in .flux.yaml, including a kubectl apply. @squaremo what are your thoughts on this?
Kustomize tends to take quite a while when remote resources in git are referenced, it would be nice to see the command timeout respect the --sync-timeout flag.
Does anyone have time to look at the PR from @marshallford, possibly @stefanprodan? We are running into this issue because we use kustomize to install everything in our cluster. Our manifest generation now times out 90% of the time, and changing sync-timeout has no effect: you can set it to 20 min and it still times out. After 2 or 3 days, though, one of the syncs seems to sneak through successfully.
This is also impacting us in certain regions. We recently spun up a cluster in Australia, and it's taking 1m 30s to perform a kustomize build due to a large number of remote git repos, and we have no way to adjust the timeout.
Hi, could someone look at potentially merging the PR? This issue is also impacting us.
And Flux fails to sync with:
{"caller":"loop.go:108","component":"sync-loop","err":"collating resources in cluster for sync: the server was unable to return a response in the time allotted, but may still be processing the request","ts":"2020-10-16T16:54:53.135656371Z"}
I'll be working on the PR review items this weekend.
EDIT: That error is different than the one I'm trying to fix.