Hi there,
I've just had a problem with my k8s dashboard.
I'm using k8s on AWS, installed by kops. I keep 2 master nodes alive for an HA cluster. Today I made a mistake with the ASG and it deleted a master node (the node I used to connect with kubectl and the API URL). After that, the API URL's domain changed to the second master node, and I have a problem with the k8s dashboard. It shows: "client: etcd cluster is unavailable or misconfigured".
Any solutions for this problem?
Thanks so much!!
Are you able to update the cluster and maybe recreate the instance group? You should also have 3 masters for HA.
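Something along these lines should do it (cluster and instance-group names below are placeholders, and it assumes KOPS_STATE_STORE points at your state bucket):

```bash
# Re-apply the cluster spec so kops recreates any missing cloud resources
kops update cluster --name my.cluster.example.com --yes

# Roll the affected master instance group so the instance is recreated
kops rolling-update cluster --name my.cluster.example.com \
  --instance-group master-us-east-1a --yes
```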
Yes, I created 3 masters as well. And I found the mistake: in the Route53 config, the "etcd-events" record wasn't updated. It still pointed to the old master, which is why my cluster got this error.
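In case it helps anyone else, this is roughly how I checked where the record pointed (the hosted zone ID is a placeholder, and the master tag follows the usual kops convention):

```bash
# Look up the etcd-events record in the cluster's hosted zone
aws route53 list-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --query "ResourceRecordSets[?contains(Name, 'etcd-events')]"

# Private IP of the current master, to compare against the record
aws ec2 describe-instances \
  --filters "Name=tag:k8s.io/role/master,Values=1" \
  --query "Reservations[].Instances[].PrivateIpAddress"
```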
Thanks so much @chrislovecnm. I will keep an eye on all 3 masters.
closing, let me know if I need to re-open
@wayarmy Thanks, this saved me some debugging. How did you solve the problem with the stale DNS records? I think this should be handled automatically. Does anyone have an idea why this didn't work?
@chrislovecnm @wayarmy I tried to update the DNS record manually, but I still get "Error from server: client: etcd cluster is unavailable or misconfigured".
Is there any tutorial or documentation with a bit more detail about this error?
@foxylion I updated the DNS record manually and waited 15-20 minutes for the DNS change to propagate. After that I could access the dashboard and the error didn't show again. Can you post the kubelet's logs, or some other logs from your system? Maybe I or someone else can help you!
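For example, something like this on the master should give useful output (the etcd pod name below is a placeholder; use whatever the get pods command shows):

```bash
# Kubelet logs on the master (systemd-based OS images)
journalctl -u kubelet --since "1 hour ago" --no-pager | tail -n 100

# etcd-events pod logs, if the API server still answers
kubectl -n kube-system get pods | grep etcd
kubectl -n kube-system logs etcd-server-events-ip-172-20-0-10 --tail=100
```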
@wayarmy I've managed to get it working.
The DNS record was not updated because the etcd events volume was not mounted correctly. So there was no problem on the Kubernetes/kops side, only some misconfigured AWS volume tags.
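For reference, this is roughly how the tags can be checked and fixed. The tag keys and values below follow the kops conventions but are placeholders; compare against a volume from an intact cluster for the exact values to use.

```bash
# Find the etcd events volume and inspect its tags
aws ec2 describe-volumes \
  --filters "Name=tag-key,Values=k8s.io/etcd/events" \
  --query "Volumes[].{Id:VolumeId,Tags:Tags}"

# Re-apply a missing or wrong tag so the volume gets mounted again
aws ec2 create-tags \
  --resources vol-0123456789abcdef0 \
  --tags Key=k8s.io/etcd/events,Value=a/a \
         Key=KubernetesCluster,Value=my.cluster.example.com
```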
But thanks for your help!
Yay, great to hear that. I think many people will run into this error, and it needs to be fixed.
I think we are also experiencing this error with kops in one of our regions. What is the etcd configuration supposed to be on Route53?
@foxylion did you simply re-tag the volume or did you need to do more?
@giphyvera Yes, simply re-tagging solved the issue. It took some time until everything was mounted correctly, but it worked.
To speed it up you can simply terminate the master node; it will be re-created by the auto scaling group.
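i.e. something like this (the instance ID is a placeholder):

```bash
# Terminate the master; the ASG will launch a replacement
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0

# Watch the ASG bring up the replacement master
aws ec2 describe-instances \
  --filters "Name=tag:k8s.io/role/master,Values=1" \
  --query "Reservations[].Instances[].[InstanceId,State.Name]"
```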
@foxylion thank you!