Flux: Multiple Flux instances restarted at about the same time

Created on 3 Apr 2020  ·  5 Comments  ·  Source: fluxcd/flux

Describe the bug

Multiple Flux instances of mine restarted last night, between 10:35 PM and 11:05 PM CET. They are in the same cluster, but nothing else experienced problems.

I suspect this is related to new version releases, but the logs don't say anything.

To Reproduce
I can't reproduce it, but it happened a few weeks back in a similar fashion.

Expected behavior
Restarts are not suspiciously close to each other.

Logs

April 2nd 2020, 20:59:25.000 ts=2020-04-02T20:59:25.458473647Z caller=main.go:796 exiting=terminated
April 2nd 2020, 20:59:25.000 ts=2020-04-02T20:59:25.458619446Z caller=loop.go:76 component=sync-loop stopping=true
April 2nd 2020, 20:59:25.000 ts=2020-04-02T20:59:25.458844844Z caller=upstream.go:145 component=upstream connectionclosing=true err="websocket: close sent"
April 2nd 2020, 20:59:25.000 ts=2020-04-02T20:59:25.458776345Z caller=upstream.go:174 component=upstream disconnected=true
April 2nd 2020, 20:58:48.000 ts=2020-04-02T20:58:48.103819904Z caller=loop.go:133 component=sync-loop event=refreshed url=ssh://[email protected]/xxx branch=master HEAD=yyy
April 2nd 2020, 20:58:39.000 ts=2020-04-02T20:58:39.576352086Z caller=loop.go:133 component=sync-loop event=refreshed url=ssh://[email protected]/xxx branch=master HEAD=yyy
April 2nd 2020, 20:58:31.000 ts=2020-04-02T20:58:31.005438422Z caller=loop.go:133 component=sync-loop event=refreshed url=ssh://[email protected]/xxx branch=master HEAD=yyy
April 2nd 2020, 20:58:31.000 ts=2020-04-02T20:58:31.010313083Z caller=loop.go:133 component=sync-loop event=refreshed url=ssh://[email protected]/xxx branch=master HEAD=yyy
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 02 Apr 2020 22:58:16 +0200
      Finished:     Thu, 02 Apr 2020 22:59:35 +0200
      Finished:     Thu, 02 Apr 2020 22:59:35 +0200
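
Note that the log lines carry UTC timestamps while the pod status is reported with a +0200 offset, so the two look two hours apart at first glance. As a minimal sketch (Python standard library only, with the values copied from the output above and the log timestamp truncated to microseconds), the times can be put on one timeline like this:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Last Flux log line ("exiting=terminated"), in UTC.
# Nanoseconds from the log are truncated to microseconds here.
flux_exit = datetime(2020, 4, 2, 20, 59, 25, 458473, tzinfo=timezone.utc)

# Pod status fields, copied from the "Last State" block (+0200 offset).
pod_started = parsedate_to_datetime("Thu, 02 Apr 2020 22:58:16 +0200")
pod_finished = parsedate_to_datetime("Thu, 02 Apr 2020 22:59:35 +0200")

# Timezone-aware datetimes can be subtracted directly.
container_lifetime = (pod_finished - pod_started).total_seconds()
exit_to_finished = (pod_finished - flux_exit).total_seconds()

print(f"container ran for {container_lifetime:.0f} s")
print(f"pod finished {exit_to_finished:.1f} s after the last log line")
```

With these values the container ran for a bit over a minute, and the pod's "Finished" time falls within about ten seconds of the last log line, so the log excerpt and the pod status describe the same termination.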

Additional context

  • Flux version: 1.18
  • Kubernetes version: 1.14.6
  • Git provider: GitHub
question

All 5 comments

Could this be related to the GitHub outage https://www.githubstatus.com/incidents/80d0cs6kpsps ?

Can you check if the probes failed and Kubernetes restarted the pods?

Thanks for the quick response! The timelines add up nicely. I think it was definitely caused by the GitHub outage.

The pods were restarted, most likely due to the probes, but I can't confirm that because I don't see the Kubernetes events anymore. The events have been cleared by now (I think they are cleared after an hour).

I think we can close the issue, although a Git-related error message from the sync-loop would be useful.
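
For anyone who catches this while the events still exist (the API server keeps events for one hour by default, via its `--event-ttl` setting), here is a rough sketch of how probe-related events could be picked out of the JSON printed by `kubectl get events -o json`. The function name `probe_related_events`, the pod name, and the sample event messages are made up for illustration; only the `Unhealthy` and `Killing` reasons and the event field names come from Kubernetes itself.

```python
import json

# Event reasons the kubelet records when a probe fails ("Unhealthy")
# and when it stops a container as a result ("Killing").
PROBE_REASONS = {"Unhealthy", "Killing"}


def probe_related_events(event_list_json, pod_prefix):
    """Return (timestamp, pod, reason, message) tuples for probe-related
    events on pods whose name starts with pod_prefix, oldest first."""
    items = json.loads(event_list_json).get("items", [])
    hits = []
    for ev in items:
        obj = ev.get("involvedObject", {})
        if obj.get("kind") != "Pod":
            continue
        if not obj.get("name", "").startswith(pod_prefix):
            continue
        if ev.get("reason") in PROBE_REASONS:
            hits.append((ev.get("lastTimestamp") or "", obj["name"],
                         ev["reason"], ev.get("message", "")))
    return sorted(hits)


# Hypothetical sample shaped like `kubectl get events -o json` output.
sample = json.dumps({"items": [
    {"involvedObject": {"kind": "Pod", "name": "flux-abc"},
     "reason": "Unhealthy", "message": "Liveness probe failed (sample)",
     "lastTimestamp": "2020-04-02T20:59:20Z"},
    {"involvedObject": {"kind": "Pod", "name": "flux-abc"},
     "reason": "Killing", "message": "Container will be restarted (sample)",
     "lastTimestamp": "2020-04-02T20:59:25Z"},
    {"involvedObject": {"kind": "Pod", "name": "other-app"},
     "reason": "Pulled", "message": "Image pulled (sample)",
     "lastTimestamp": "2020-04-02T20:58:00Z"},
]})

for ts, pod, reason, message in probe_related_events(sample, "flux"):
    print(ts, pod, reason, message)
```

If a list like this shows `Unhealthy` followed by `Killing` for the Flux pods, the restarts were triggered by failing probes rather than by Flux exiting on its own.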

Hmm, you should see something like #2968 in the logs.

Maybe I misconfigured something. Thanks. I'll follow up if I find it.

This happened a few more times, including during yesterday's GitHub outage. I moved to 1.19 and the error is handled properly now: an error message is logged, and Flux retries instead of crashing.
