I've come across an annoying behaviour: when there is a problem in the manifests, fluxctl times out (usually after a prolonged wait). It should catch whatever error occurred and return it to the user instead.
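To make the request concrete, here is a rough sketch of the behaviour I would expect from a client waiting on a sync job. It is not fluxctl's actual code: `wait_for_job`, `JobFailed` and the `get_status` callable are hypothetical names, and the `state` / `success` / `err` fields are only assumed to mirror what the fluxd sync-loop log lines show.

```python
import time


class JobFailed(Exception):
    """Raised when the job finished but reported an error."""


def wait_for_job(get_status, timeout=60.0, interval=1.0):
    """Poll a job until it is done; surface its error instead of a bare timeout.

    get_status is a hypothetical callable returning a dict such as
    {"state": "done", "success": False, "err": "..."}.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("state") == "done":
            if status.get("success"):
                return status
            # The job ended with an error: report it rather than waiting longer.
            raise JobFailed(status.get("err", "job failed without an error message"))
        if time.monotonic() >= deadline:
            # Only reached if the job never reported completion in time.
            raise TimeoutError("timed out waiting for job")
        time.sleep(interval)
```

The point is just that a finished-but-failed job and a job that never finishes are different cases, and only the second should be reported as a timeout.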
Usually fluxd will end up skipping over problematic manifests and applying the rest, rather than stalling. Can you give an example which causes a timeout? Is it a specific kind of problem with manifests that makes it happen?
Here's an example:
ts=2019-01-02T21:34:23.725646505Z caller=loop.go:118 component=sync-loop jobID=9e304438-e613-cca2-cf8f-180559731a53 state=done success=false err="applying changes: Traceback (most recent call last):\n File \"kubeyaml.py\", line 226, in
It _may_ be a more general problem. I have an EKS cluster in us-west-2 and the latency between there and Dublin may be causing a problem.
Ah OK, that's the update code complaining (at length) that it can't apply a change you made because the YAML is malformed (or there's a bug in that bit of the update code; but it looks more like the former).
You are absolutely right that the error message could be more accessible! Here it's relying on the error returned from a Python library -- which seems to be returning the whole stack trace. Perhaps a first step would be to fish out the substance of the problem (and the location).
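As a rough illustration of that first step, something like the following would reduce a logged traceback to its final line, which is normally where Python puts the exception type and message. This is only a sketch: `summarize_traceback` is a hypothetical helper, not anything in Flux, and the sample string in the comment is made up.

```python
def summarize_traceback(err: str) -> str:
    """Return the last non-empty line of a traceback held in an err string."""
    # In the log output, newlines appear as the two characters "\n";
    # turn them back into real newlines before splitting.
    text = err.replace("\\n", "\n")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Fall back to the original string if there is nothing to summarize.
    return lines[-1] if lines else err


# Made-up sample, not the real kubeyaml output:
# summarize_traceback('Traceback (most recent call last):\\n File "x.py", line 1\\nValueError: bad manifest')
```

Finding the location in the YAML would need more than this, since it depends on what the library puts in the exception message.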
Oh well my problem isn't with the stack trace in the logs, it's the timeout error that I find most annoying :)
You didn't give an example of the timeout. Is it what you get when using fluxctl, with the example you posted being the log message from the time you ran fluxctl?
I got the same error today:
fluxctl sync
Synchronizing with [email protected]:some/secret.git
Failed to complete sync job (ID "bdeb4312-9560-3b8b-8324-616a3cf5ff99")
Error: timeout
Run 'fluxctl sync --help' for usage.
It timed out after a minute. Should I set the git-timeout flag for fluxd to fix this?
I get the same error with fluxctl sync, it seems to time out often.
I got this using Helm 3 and a cluster in Azure.
I got this by installing Flux from Helm templates on OpenShift in GCP.
I am having the same problem. I manually deleted the namespace where my workload was, and now Flux does not pick it up and re-apply it. I have committed lots of changes since, but nothing happens, and fluxctl sync now times out.
EDIT
Actually, now I'm realizing that if I run fluxctl list-workloads, it does tell me what the error is. That's in my scenario, though.
fluxctl sync times out more and more often as the size of my cluster grows larger.
It's completely non-deterministic, however. I have no idea how to remedy this, because it succeeds sometimes.
I have the same issue, checking the logs I got this:
ts=2020-08-06T14:32:46.220230199Z caller=loop.go:108 component=sync-loop err="loading resources from repo: duplicate definition of 'app:ingress/app' (in deployments/app/ingress.yml and deployments/test/ingress.yml)"
The error itself is fair, because there was a duplicated definition, but why does Flux get stuck and return a timeout?
Flux version 1.20.0.
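In case it helps anyone else digging through the fluxd logs for the real cause behind a timeout, here is a small sketch for pulling the `err` field out of a log line like the one above. It assumes the lines are simple `key=value` pairs with optional double-quoted values; `parse_logfmt` is my own helper name, not part of Flux.

```python
import re

# A key, "=", then either a double-quoted value (backslash escapes allowed)
# or a bare token with no whitespace.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line: str) -> dict:
    """Split a key=value log line into a dict of strings."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Drop the surrounding quotes and unescape embedded quotes.
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields
```

With that, `parse_logfmt(line).get("err")` gives just the error text, which is easier to read than the whole line.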