Flux: Fluxctl sync should return a meaningful error instead of timing out

Created on 31 Dec 2018 · 13 Comments · Source: fluxcd/flux

I've come across an annoying behaviour: when there is a problem in the manifests, fluxctl times out (usually after a prolonged amount of time). It should catch the underlying error and return it to the user instead.

UX bug

dmarkey

👍8

All 13 comments

Usually fluxd will end up skipping over problematic manifests and applying the rest, rather than stalling. Can you give an example which causes a timeout? Is it a specific kind of problem with manifests that makes it happen?

squaremo on 2 Jan 2019

Here's an example:

ts=2019-01-02T21:34:23.725646505Z caller=loop.go:118 component=sync-loop jobID=9e304438-e613-cca2-cf8f-180559731a53 state=done success=false err="applying changes: Traceback (most recent call last):\n File \"kubeyaml.py\", line 226, in \n File \"kubeyaml.py\", line 221, in main\n File \"kubeyaml.py\", line 53, in apply_to_yaml\n File \"kubeyaml.py\", line 59, in update_image\n File \"site-packages/ruamel/yaml/main.py\", line 363, in load_all\n File \"site-packages/ruamel/yaml/constructor.py\", line 101, in get_data\n File \"site-packages/ruamel/yaml/constructor.py\", line 118, in construct_document\n File \"site-packages/ruamel/yaml/constructor.py\", line 1508, in construct_yaml_map\n File \"site-packages/ruamel/yaml/constructor.py\", line 1414, in construct_mapping\n File \"site-packages/ruamel/yaml/constructor.py\", line 279, in check_mapping_key\nruamel.yaml.constructor.DuplicateKeyError: while constructing a mapping\n in \"\", line 9, column 5\nfound duplicate key \"flux.weave.works/automated\" with value \"true\" (original value: \"true\")\n in \"\", line 12, column 5\n\nTo suppress this check see:\n http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys\n\nDuplicate keys will become an error in future releases, and are errors\nby default when using the new API.\n\nFailed to execute script kubeyaml"

dmarkey on 2 Jan 2019

It _may_ be a more general problem. I have an EKS cluster in us-west-2 and the latency between there and Dublin may be causing a problem.

dmarkey on 2 Jan 2019

Ah OK, that's the update code complaining (at length) that it can't apply a change you made because the YAML is malformed (or there's a bug in that bit of the update code; but it looks more like the former).

You are absolutely right that the error message could be more accessible! Here it's relying on the error returned from a Python library -- which seems to be returning the whole stack trace. Perhaps a first step would be to fish out the substance of the problem (and the location).

squaremo on 3 Jan 2019

Oh well my problem isn't with the stack trace in the logs, it's the timeout error that I find most annoying :)

dmarkey on 3 Jan 2019

You didn't give an example of the timeout -- is it what you get when using fluxctl, with the example posted being the log message at the time of the fluxctling?

squaremo on 3 Jan 2019

Got the same error today:

fluxctl sync
Synchronizing with [email protected]:some/secret.git
Failed to complete sync job (ID "bdeb4312-9560-3b8b-8324-616a3cf5ff99")
Error: timeout
Run 'fluxctl sync --help' for usage.

It timed out after a minute. Should I set the git-timeout flag for fluxd to fix this?

yellowmegaman on 7 Aug 2019

I get the same error with fluxctl sync, it seems to time out often.

GODBS on 15 Sep 2019

👍8

I got this using Helm 3 and a cluster in Azure.

rbitia on 24 Sep 2019

I got this by installing Flux from Helm templates on OpenShift in GCP.

Dimss on 8 Oct 2019

I am having the same problem. I manually deleted the namespace where my workload was, and now there is no way for Flux to catch it and re-apply. I have made and committed lots of changes, but nothing happens. And now when I run fluxctl sync, it times out.

EDIT
Actually, I'm now realizing that if I run fluxctl list-workloads, it does tell me what the error is. That's in my scenario, though.

nerusnayleinad on 26 Apr 2020

fluxctl sync times out more and more often as the size of my cluster grows larger.

It's completely non-deterministic, however. I have no idea how to remedy this, because it succeeds sometimes.

JVMartin on 29 Apr 2020

I have the same issue. Checking the logs, I got this:

ts=2020-08-06T14:32:46.220230199Z caller=loop.go:108 component=sync-loop err="loading resources from repo: duplicate definition of 'app:ingress/app' (in deployments/app/ingress.yml and deployments/test/ingress.yml)"

The error itself is fair, because there was a duplicated definition, but why does Flux get stuck and return a timeout?

Flux version 1.20.0.