Rancher Version:
1.0.1
Docker Version:
1.10.3
OS:
Ubuntu
Steps to Reproduce:
Have three nodes in a Kubernetes environment.
Terminate the instance on which the etcd container is running.
Results:
Kubernetes is down, and all Secrets/Pods/RC configs are lost after the cluster
repairs itself. The repair itself works like a charm. Thank you :)
Expected:
etcd is HA, has its data on all nodes, and the cluster repairs itself without losing all the data.
Temporary Workaround Idea:
If the etcd address in the Kubernetes stack were configurable, I could install my own etcd cluster. This would allow me to have HA etcd on different nodes, and every Kubernetes node could go offline without impacting the cluster.
I wonder whether I can click on the Kubernetes stack, configure an external etcd cluster through the upgrade button, and whether the settings I configure will be overwritten after the next Rancher upgrade.
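To illustrate, a minimal sketch of the only change I would need on the apiserver (hostnames, port, and CIDR below are placeholders I made up, not the actual stack configuration):

```sh
# Hypothetical apiserver invocation: point --etcd-servers at an
# external cluster. Hostnames, port, and CIDR are placeholders.
kube-apiserver \
  --etcd-servers=http://etcd-1:2379,http://etcd-2:2379,http://etcd-3:2379 \
  --service-cluster-ip-range=10.43.0.0/16
```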
Implementation Idea:
Start an etcd container on every server and spin up a cluster. The Kubernetes container then connects to an array of etcd endpoints, as in the CoreOS implementation (https://coreos.com/kubernetes/docs/latest/deploy-master.html, see the configuration parameter --etcd-servers=${ETCD_ENDPOINTS}). I would try to implement it through manual changes to the stack myself, but how can I "save" those changes so that they won't be reverted after the next upgrade? A sketch of what I mean follows below.
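Roughly like this, assuming a static three-node bootstrap (image tag, host IPs, and node names are my assumptions; the flags are the standard etcd static-discovery options):

```sh
# A static three-node etcd bootstrap, one container per host.
# Image tag, host IPs, and names are placeholders.
NAME=etcd-1          # etcd-2 / etcd-3 on the other two hosts
HOST_IP=10.0.0.11    # this host's IP
CLUSTER="etcd-1=http://10.0.0.11:2380,etcd-2=http://10.0.0.12:2380,etcd-3=http://10.0.0.13:2380"

docker run -d --name etcd --net host quay.io/coreos/etcd:v2.3.7 \
  --name "$NAME" \
  --listen-peer-urls http://0.0.0.0:2380 \
  --listen-client-urls http://0.0.0.0:2379 \
  --initial-advertise-peer-urls "http://$HOST_IP:2380" \
  --advertise-client-urls "http://$HOST_IP:2379" \
  --initial-cluster "$CLUSTER" \
  --initial-cluster-state new
```

The apiserver would then get all three endpoints via --etcd-servers=http://10.0.0.11:2379,http://10.0.0.12:2379,http://10.0.0.13:2379, so any single node could go down without data loss.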
Just found the HA etcd catalog entry from Rancher. Will try to integrate it into the Kubernetes stack.
Greetings, Thomas 👍
After a lot of research, here are some ideas for implementing it.
First one:
Second one:
Third one:
Third one - From my point of view the "best" one until etcd is easy to deploy on dynamic environments:
Trying the third one (modifying the etcd address in the kube-apiserver Docker container):
24.4.2016 19:26:57 F0424 17:26:57.688936 1 server.go:211] Cloud provider could not be initialized: could not init cloud provider "rancher": Could not create rancher client: &url.Error{Op:"Get", URL:"", Err:(*http.badStringError)(0xc2081c5740)}
It seems this was caused by a loss of labels (vimdiff with the old stack on the left and, on the right, the stack after upgrading twice: once with etcd on a separate host, then a second upgrade to roll it back).
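For anyone reproducing the comparison, roughly how I diffed the container labels before and after the upgrade (container names are placeholders for whatever docker ps shows in your environment):

```sh
# Dump the labels of the kube-apiserver container before and after the
# upgrade, pretty-print them, then diff. Container names are placeholders.
docker inspect --format '{{json .Config.Labels}}' kubernetes_old | python -m json.tool > labels-old.txt
docker inspect --format '{{json .Config.Labels}}' kubernetes_new | python -m json.tool > labels-new.txt
vimdiff labels-old.txt labels-new.txt
```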
Referenced Bug: https://github.com/rancher/rancher/issues/4476
After setting the labels manually on upgrade (besides changing the etcd addresses to a self-managed etcd cluster), the cluster is working again :) Tomorrow I will test what happens when the node running the Kubernetes API container goes down. From my point of view, it should be able to reconnect to the etcd cluster, and no services should go down afterwards.
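A quick way to sanity-check the self-managed etcd cluster before and after that test (endpoints are the placeholder IPs from the sketch above):

```sh
# etcd v2 health check against the self-managed cluster.
etcdctl --endpoints http://10.0.0.11:2379,http://10.0.0.12:2379,http://10.0.0.13:2379 cluster-health

# After powering off the apiserver node and letting the stack recover,
# the previously created state should still be visible:
kubectl get pods,rc,services
```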
Verified on master with three hosts. Powering off one of the hosts brings up etcd on another host successfully. Kubernetes pods, RCs, and services can be created successfully too.