Dashboard: Handle Kubernetes API server failover

Created on 8 May 2018 · 11 comments · Source: kubernetes/dashboard

Environment
Dashboard version: v1.8.3
Kubernetes version: v1.10.2
Operating system: CentOS Linux release 7.4.1708 (Core)
Steps to reproduce
  1. Have a multi-master Kubernetes cluster
  2. Run dashboard with in-cluster config
  3. Stop one of the API servers
Observed result
  • Dashboard hangs while trying to load cluster resources until Linux eventually times out the TCP connection.
  • The dashboard pod is not killed and restarted automatically either, because the liveness probe does not exercise the Kubernetes API connection.
  • Even after the TCP timeout and eventual reconnect, the log still repeatedly shows Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout (see https://github.com/kubernetes/dashboard/issues/2723#issuecomment-382378523)
Expected result
  • Dashboard uses a shorter timeout on the API server connection and reconnects (might depend on https://github.com/kubernetes/client-go/issues/374)
  • The dashboard gets a liveness probe that actually tests its core functionality (/api/v1/settings/global might do, but ideally a designated health URI); see the sketch after this list
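
A minimal sketch of what both points could look like, assuming in-cluster config and client-go; the /healthz path, the port, and the timeout values are illustrative choices, not the dashboard's actual API:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// Cap every request so a dead API server fails fast instead of
	// waiting for the kernel's TCP timeout.
	config.Timeout = 5 * time.Second

	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Hypothetical designated health endpoint: it only returns 200 if
	// the API server answers within the deadline.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
		defer cancel()
		_, err := client.Discovery().RESTClient().Get().AbsPath("/version").DoRaw(ctx)
		if err != nil {
			http.Error(w, "API server unreachable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe(":9090", nil)
}
```

A livenessProbe with an httpGet check pointed at that path would then restart the pod as soon as the API server becomes unreachable, instead of only verifying that the dashboard's own HTTP server is up.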
Labels: good first issue, kind/bug, lifecycle/frozen

All 11 comments

I just experienced the same issue. The dashboard was trying to synchronize in a tight loop (thousands of log entries per second), consuming a lot of CPU.
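
A hot restart loop like that is usually tamed with exponential backoff between retries. A hypothetical sketch of the idea, not the dashboard's actual synchronizer code (resync and the delay bounds are stand-ins):

```go
package main

import (
	"log"
	"time"
)

// resync stands in for one synchronizer pass (hypothetical; the real
// dashboard watches Secrets via client-go). It blocks until the watch
// ends and returns an error when the connection drops.
func resync() error {
	// ... open watch, consume events, return error on failure ...
	return nil
}

func runWithBackoff() {
	delay := time.Second
	const maxDelay = 2 * time.Minute
	for {
		if err := resync(); err != nil {
			log.Printf("Synchronizer exited with error: %v; retrying in %s", err, delay)
			time.Sleep(delay)
			// Double the delay up to a ceiling, so a persistent outage
			// yields a few log lines per minute, not thousands per second.
			if delay *= 2; delay > maxDelay {
				delay = maxDelay
			}
			continue
		}
		delay = time.Second // reset after a clean run
	}
}

func main() { runWithBackoff() }
```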

This happens even with a single-master cluster.

Steps to reproduce:

  • Run the dashboard (e.g. with default configuration from the helm chart)
  • Restart the API server (find it with kubectl get pods -n kube-system -l component=kube-apiserver)

The logging flood stops when the dashboard pod is deleted/recreated.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

/lifecycle frozen

Same problem.

Running Azure AKS
Kubernetes versions 1.11.4 and 1.11.5
Dashboard image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0

After "Microsoft" sometimes restarts the managed API-server the dashboard starts to log around 450 lines per 5 minutes.

Any update on actually getting the reconnect solved?

@Zenlil it will be fixed in v2. For now, a workaround is to delete the Dashboard pod after an API server restart.

Still happening in 1.10.1? I was flooded with gigabytes of logs in no time. Deleting the dashboard pod did not solve it.

Can you upload the beginning of the log? The first 30 minutes, say.

These are the initial log entries that we saw when we encountered the issue:

2019/07/26 02:28:58 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: unexpected object: &Secret{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Data:map[string][]byte{},Type:,StringData:map[string]string{},}

2019/07/26 02:29:00 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.

2019/07/26 02:29:00 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system

2019/07/26 02:29:00 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout

2019/07/26 02:29:02 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.

2019/07/26 02:29:02 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system

2019/07/26 02:29:02 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout

The last three lines repeat every 2 seconds, causing a flood of log entries.

This is no longer the case with v2, as it forces a restart of the pod after a few retries.
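
Conceptually, that fix amounts to counting consecutive watch failures and exiting so the kubelet recreates the pod. A hypothetical sketch of the idea (the threshold and function names are stand-ins, not the dashboard's actual code):

```go
package main

import (
	"log"
	"os"
)

const maxConsecutiveFailures = 5 // hypothetical threshold

// watchSecrets stands in for the dashboard's secret synchronizer; it
// blocks until the watch against the API server ends.
func watchSecrets() error {
	// ... establish watch, stream events, return error when it breaks ...
	return nil
}

func main() {
	failures := 0
	for {
		if err := watchSecrets(); err != nil {
			failures++
			log.Printf("watch ended with error (%d/%d): %v", failures, maxConsecutiveFailures, err)
			if failures >= maxConsecutiveFailures {
				// Exit non-zero: with restartPolicy Always the kubelet
				// recreates the pod, which re-establishes a fresh
				// connection to the restarted API server.
				log.Print("too many consecutive failures, exiting so the pod is restarted")
				os.Exit(1)
			}
			continue
		}
		failures = 0 // a successful run resets the counter
	}
}
```

Exiting non-zero delegates the recovery to the kubelet (via the pod's restartPolicy, Always by default in a Deployment) rather than handling it in-process.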
