Flux: Fluxctl sync should return a meaningful error instead of timing out

Created on 31 Dec 2018  ·  13 Comments  ·  Source: fluxcd/flux

I've come across an annoying behaviour: when there is a problem in the manifests, fluxctl will time out (usually after a prolonged amount of time). It should catch whatever the error is and return it to the user instead.

UX bug

All 13 comments

Usually fluxd will end up skipping over problematic manifests and applying the rest, rather than stalling. Can you give an example which causes a timeout? Is it a specific kind of problem with manifests that makes it happen?

Here's an example:

ts=2019-01-02T21:34:23.725646505Z caller=loop.go:118 component=sync-loop jobID=9e304438-e613-cca2-cf8f-180559731a53 state=done success=false err="applying changes: Traceback (most recent call last):\n File \"kubeyaml.py\", line 226, in \n File \"kubeyaml.py\", line 221, in main\n File \"kubeyaml.py\", line 53, in apply_to_yaml\n File \"kubeyaml.py\", line 59, in update_image\n File \"site-packages/ruamel/yaml/main.py\", line 363, in load_all\n File \"site-packages/ruamel/yaml/constructor.py\", line 101, in get_data\n File \"site-packages/ruamel/yaml/constructor.py\", line 118, in construct_document\n File \"site-packages/ruamel/yaml/constructor.py\", line 1508, in construct_yaml_map\n File \"site-packages/ruamel/yaml/constructor.py\", line 1414, in construct_mapping\n File \"site-packages/ruamel/yaml/constructor.py\", line 279, in check_mapping_key\nruamel.yaml.constructor.DuplicateKeyError: while constructing a mapping\n in \"\", line 9, column 5\nfound duplicate key \"flux.weave.works/automated\" with value \"true\" (original value: \"true\")\n in \"\", line 12, column 5\n\nTo suppress this check see:\n http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys\n\nDuplicate keys will become an error in future releases, and are errors\nby default when using the new API.\n\nFailed to execute script kubeyaml"

It _may_ be a more general problem. I have an EKS cluster in us-west-2 and the latency between there and Dublin may be causing a problem.

Ah OK, that's the update code complaining (at length) that it can't apply a change you made because the YAML is malformed (or there's a bug in that bit of the update code; but it looks more like the former).

You are absolutely right that the error message could be more accessible! Here it's relying on the error returned from a Python library -- which seems to be returning the whole stack trace. Perhaps a first step would be to fish out the substance of the problem (and the location).
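A minimal sketch of that first step, assuming the error arrives as one string containing a Python traceback, as in the log line above. The helper name `summarize_traceback` is hypothetical and is not part of Flux or kubeyaml; it keeps only the exception line, its detail lines, and any "line N, column N" locations.

```python
import re

# Final line of a Python traceback, e.g.
# "ruamel.yaml.constructor.DuplicateKeyError: while constructing a mapping"
_EXC_LINE = re.compile(r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$")
# Location lines such as: in "<file>", line 9, column 5
_LOCATION = re.compile(r"line (?P<line>\d+), column (?P<col>\d+)")


def summarize_traceback(err: str) -> str:
    """Hypothetical helper: reduce a traceback string to a short summary.

    Returns the input unchanged (stripped) if no exception line is found.
    """
    lines = err.splitlines()
    for i, line in enumerate(lines):
        match = _EXC_LINE.match(line.strip())
        if match is None:
            continue
        details = [match["msg"]]
        locations = []
        for follow in lines[i + 1:]:
            text = follow.strip()
            if not text:
                break  # a blank line ends the exception message
            loc = _LOCATION.search(text)
            if loc and text.startswith("in "):
                locations.append(f"line {loc['line']}, column {loc['col']}")
            else:
                details.append(text)
        # Drop the module path, keeping only the exception class name.
        name = match["type"].rsplit(".", 1)[-1]
        summary = f"{name}: " + "; ".join(details)
        if locations:
            summary += " (at " + "; ".join(locations) + ")"
        return summary
    return err.strip()
```

Applied to the error above, this would drop the stack frames and the trailing "To suppress this check" advice, leaving the duplicate-key message and its two locations.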

Oh well my problem isn't with the stack trace in the logs, it's the timeout error that I find most annoying :)

You didn't give an example of the timeout -- is it what you get when using fluxctl, with the example posted being the log message at the time of the fluxctling?

Got same error today:

fluxctl sync
Synchronizing with [email protected]:some/secret.git
Failed to complete sync job (ID "bdeb4312-9560-3b8b-8324-616a3cf5ff99")
Error: timeout
Run 'fluxctl sync --help' for usage.

It timed out after a minute. Should I set the git-timeout flag for fluxd to fix this?

I get the same error with fluxctl sync, it seems to time out often.

I got this using Helm 3 and a cluster in Azure.

I got this by installing Flux from Helm templates on OpenShift in GCP.

I am having the same problem. I manually deleted the namespace where my workload was, and now there is no way to get Flux to pick that up and apply it again. I have made and committed lots of changes, but nothing happens. And now when I run fluxctl sync, it times out.

EDIT
Actually, now I'm realizing that if I run fluxctl list-workloads, it does tell me what the error is. That's in my scenario though.

fluxctl sync times out more and more often as the size of my cluster grows larger.

It's completely non-deterministic, however. I have no idea how to remedy this, because it succeeds sometimes.

I have the same issue, checking the logs I got this:

ts=2020-08-06T14:32:46.220230199Z caller=loop.go:108 component=sync-loop err="loading resources from repo: duplicate definition of 'app:ingress/app' (in deployments/app/ingress.yml and deployments/test/ingress.yml)"

The error itself is fine, because there was a duplicated definition, but why does Flux get stuck and return a timeout?

Flux version 1.20.0.
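The behaviour requested in this issue could be sketched roughly as follows. This is an illustration only and not how fluxctl or fluxd are implemented: `wait_for_sync`, `JobStatus`, and `SyncFailed` are hypothetical names, and `get_status` stands in for whatever call a client would use to query the sync job. The idea is that a failed job is reported immediately, and a timeout carries the last error seen instead of a bare "timeout".

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobStatus:
    """Hypothetical snapshot of a sync job's state."""
    done: bool
    error: Optional[str] = None


class SyncFailed(Exception):
    """Raised when the job reported an error (hypothetical)."""


def wait_for_sync(
    get_status: Callable[[], JobStatus],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status until the job finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        status = get_status()
        if status.error:
            last_error = status.error
        if status.done:
            if status.error:
                # Surface the real failure right away.
                raise SyncFailed(status.error)
            return
        if time.monotonic() >= deadline:
            if last_error:
                # Timed out, but at least say what went wrong.
                raise SyncFailed(
                    f"timed out after {timeout:g}s; last error: {last_error}"
                )
            raise TimeoutError(f"timed out after {timeout:g}s with no error reported")
        sleep(interval)
```

With something like this, the duplicate-definition error from the log would reach the user directly, whether the job finishes with a failure or never finishes at all.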
