Dashboard: Handle Kubernetes API server failover

Created on 8 May 2018 · 11 comments · Source: kubernetes/dashboard

Environment
Dashboard version: v1.8.3
Kubernetes version: v1.10.2
Operating system: CentOS Linux release 7.4.1708 (Core)
Steps to reproduce
  1. Have a multi-master Kubernetes cluster
  2. Run dashboard with in-cluster config
  3. Stop one of the API servers
Observed result
  • Dashboard hangs while trying to load cluster resources until Linux eventually times out the TCP connection.
  • The dashboard pod is not killed and restarted automatically either, because the liveness probe does not exercise the Kubernetes API connection.
  • Even after the TCP timeout and eventual reconnect, the log still repeatedly shows Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout (see https://github.com/kubernetes/dashboard/issues/2723#issuecomment-382378523)
Expected result
  • Dashboard uses a shorter timeout on the API server connection and reconnects (might depend on https://github.com/kubernetes/client-go/issues/374)
  • The dashboard gets a liveness probe that actually tests its core functionality (/api/v1/settings/global might do, but ideally a designated health URI); see the sketch after this list
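
A minimal sketch of what both points could look like, assuming in-cluster config and client-go; the /healthz path, the port, and the timeout values are illustrative choices, not the dashboard's actual API:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// Cap every request so a dead API server fails fast instead of
	// waiting for the kernel's TCP timeout.
	config.Timeout = 5 * time.Second

	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Hypothetical designated health endpoint: it only returns 200 if
	// the API server answers within the deadline.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
		defer cancel()
		_, err := client.Discovery().RESTClient().Get().AbsPath("/version").DoRaw(ctx)
		if err != nil {
			http.Error(w, "API server unreachable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe(":9090", nil)
}
```

A livenessProbe with an httpGet check pointed at that path would then restart the pod as soon as the API server becomes unreachable, instead of only verifying that the dashboard's own HTTP server is up.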
Labels: good first issue, kind/bug, lifecycle/frozen

All 11 comments

I just experienced the same issue. The dashboard was trying to synchronize in a tight loop (thousands of log entries per second), consuming a lot of CPU.
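
A hot restart loop like that is usually tamed with exponential backoff between retries. A hypothetical sketch of the idea, not the dashboard's actual synchronizer code (resync and the delay bounds are stand-ins):

```go
package main

import (
	"log"
	"time"
)

// resync stands in for one synchronizer pass (hypothetical; the real
// dashboard watches Secrets via client-go). It blocks until the watch
// ends and returns an error when the connection drops.
func resync() error {
	// ... open watch, consume events, return error on failure ...
	return nil
}

func runWithBackoff() {
	delay := time.Second
	const maxDelay = 2 * time.Minute
	for {
		if err := resync(); err != nil {
			log.Printf("Synchronizer exited with error: %v; retrying in %s", err, delay)
			time.Sleep(delay)
			// Double the delay up to a ceiling, so a persistent outage
			// yields a few log lines per minute, not thousands per second.
			if delay *= 2; delay > maxDelay {
				delay = maxDelay
			}
			continue
		}
		delay = time.Second // reset after a clean run
	}
}

func main() { runWithBackoff() }
```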

This happens even with a single-master cluster.

Steps to reproduce:

  • Run the dashboard (e.g. with default configuration from the helm chart)
  • Restart the API server (find it with kubectl get pods -n kube-system -l component=kube-apiserver)

The logging flood stops when the dashboard pod is deleted/recreated.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

/lifecycle frozen

Same problem.

Running Azure AKS
Kubernetes versions 1.11.4 and 1.11.5
Dashboard image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0

After "Microsoft" sometimes restarts the managed API-server the dashboard starts to log around 450 lines per 5 minutes.

Any update on actually getting the reconnect solved?

@Zenlil it will be fixed in v2. For now, a workaround is to delete the Dashboard pod after an API server restart.

Still happening in 1.10.1? I was flooded with gigabytes of logs in no time. Deleting the dashboard pod did not solve it.

Can you upload the beginning of the log? The first 30 minutes, say.

These are the initial log entries that we saw when we encountered the issue:

2019/07/26 02:28:58 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: unexpected object: &Secret{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Data:map[string][]byte{},Type:,StringData:map[string]string{},}

2019/07/26 02:29:00 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.

2019/07/26 02:29:00 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system

2019/07/26 02:29:00 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout

2019/07/26 02:29:02 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.

2019/07/26 02:29:02 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system

2019/07/26 02:29:02 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout

The last three lines repeat every 2 seconds, causing a flood of log entries.

This is no longer the case with v2, as it forces a restart of the pod after a few retries.
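
Conceptually, that fix amounts to counting consecutive watch failures and exiting so the kubelet recreates the pod. A hypothetical sketch of the idea (the threshold and function names are stand-ins, not the dashboard's actual code):

```go
package main

import (
	"log"
	"os"
)

const maxConsecutiveFailures = 5 // hypothetical threshold

// watchSecrets stands in for the dashboard's secret synchronizer; it
// blocks until the watch against the API server ends.
func watchSecrets() error {
	// ... establish watch, stream events, return error when it breaks ...
	return nil
}

func main() {
	failures := 0
	for {
		if err := watchSecrets(); err != nil {
			failures++
			log.Printf("watch ended with error (%d/%d): %v", failures, maxConsecutiveFailures, err)
			if failures >= maxConsecutiveFailures {
				// Exit non-zero: with restartPolicy Always the kubelet
				// recreates the pod, which re-establishes a fresh
				// connection to the restarted API server.
				log.Print("too many consecutive failures, exiting so the pod is restarted")
				os.Exit(1)
			}
			continue
		}
		failures = 0 // a successful run resets the counter
	}
}
```

Exiting non-zero delegates the recovery to the kubelet (via the pod's restartPolicy, Always by default in a Deployment) rather than handling it in-process.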
