RKE version:
v0.3.2
cluster.yml file:
nodes:
  - address: shortname
    user: ubuntu
    role: [ "controlplane", "etcd", "worker" ]
cloud_provider:
  name: aws
Steps to Reproduce:
Run rke up with the cluster.yml shown above (address or internal_address set to a hostname rather than an IP address, and cloud_provider set).
Results:
kube-proxy fails to start:
FATA[0354] [workerPlane] Failed to bring up Worker Plane: [Failed to verify healthcheck: Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [shortname]: Get http://localhost:10256/healthz: Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), log: invalid argument "shortname" for "--bind-address" flag: "shortname" is not a valid IP address]
This was implemented because kube-proxy fails to start on k8s 1.16 if the IP is not provided to identify the node when cloud-provider is configured (AWS and possibly OpenStack).
The main problem is that an IP address still needs to be provided, so we should probably error out in validation or find another way to supply it. We already decided not to do DNS resolution in RKE and use the resolved address.
The workaround is to use IP addresses, not a hostname/FQDN, as address.
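The "error out in validation" idea above could look roughly like the following. This is only a minimal sketch in Python, not RKE's actual code: the function names are hypothetical, the config is assumed to be already parsed into a dict, and it assumes that internal_address falls back to address when unset and that the check is only needed when cloud_provider is configured.

```python
import ipaddress


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def validate_nodes(config):
    """Return error strings for nodes whose effective internal address is
    not an IP address while a cloud provider is configured.

    `config` is a dict shaped like the cluster.yml in this issue
    (hypothetical helper, not part of RKE).
    """
    errors = []
    provider = (config.get("cloud_provider") or {}).get("name")
    if not provider:
        # Assumption: the IP requirement only applies with a cloud provider.
        return errors
    for node in config.get("nodes", []):
        # Assumption: internal_address defaults to address when it is unset.
        effective = node.get("internal_address") or node.get("address", "")
        if not is_ip(effective):
            errors.append(
                f"node {node.get('address')!r}: {effective!r} is not a valid IP address"
            )
    return errors
```

With the cluster.yml from the report, this returns one error for "shortname"; with an IP in internal_address, as in the workaround, it returns none.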
I ran into this issue going from 0.3.1 to 0.3.2 on an RKE cluster in AWS. I reverted to 0.3.1 and everything seems OK.
Can I replace the DNS hostname with an IP address and retry the upgrade, or will that mess things up even more?
Putting an IP value in internal_address allowed me to upgrade to 0.3.2 in AWS, i.e.:
nodes:
  - address: shortname
    internal_address: 1.2.3.4
    user: ubuntu
    role: [ "controlplane", "etcd", "worker" ]
cloud_provider:
  name: aws
I faced the same problem. After I added internal_address, rke added the nodes again under different names:
k8s-etcd1                  Ready     etcd          15m  v1.15.5
k8s-etcd1.lab.vi.local     NotReady  etcd          21h  v1.15.5
k8s-ingress1               Ready     worker        15m  v1.15.5
k8s-ingress1.lab.vi.local  NotReady  worker        21h  v1.15.5
k8s-master1                Ready     controlplane  15m  v1.15.5
k8s-master1.lab.vi.local   NotReady  controlplane  21h  v1.15.5
k8s-node1                  Ready     worker        15m  v1.15.5
k8s-node1.lab.vi.local     NotReady  worker        21h  v1.15.5
k8s-node2                  Ready     worker        15m  v1.15.5
k8s-node2.lab.vi.local     NotReady  worker        21h  v1.15.5
k8s-node3                  Ready     worker        15m  v1.15.5
k8s-node3.lab.vi.local     NotReady  worker        21h  v1.15.5
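To spot the leftover entries in a listing like the one above, something along these lines might help. It is a rough Python sketch with a hypothetical helper name. It assumes the text comes from kubectl get nodes with the name in the first column and the status in the second, and that a stale entry is a NotReady node whose short hostname (the part before the first dot) matches a Ready node.

```python
def find_stale_duplicates(kubectl_output):
    """Return names of NotReady nodes whose short hostname matches a Ready node.

    Hypothetical helper: expects whitespace-separated `kubectl get nodes`
    rows (name, status, ...) and skips the header row if present.
    """
    ready = set()
    not_ready = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "NAME":
            continue  # blank line or header row
        name, status = fields[0], fields[1]
        if status == "Ready":
            ready.add(name)
        elif status == "NotReady":
            not_ready.append(name)
    # Keep only NotReady names whose short form is registered as Ready.
    return [n for n in not_ready if n not in ready and n.split(".")[0] in ready]
```

For the listing above this would return the six *.lab.vi.local names. Whether those entries are safe to remove (for example with kubectl delete node) is something to confirm for your own cluster first.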
This is a concern, because when an unresolvable hostname is used for internal_address, kube-proxy starts with nodeIP 127.0.0.1, which causes LoadBalancer services with externalTrafficPolicy set to Local to fail (at least with MetalLB): https://github.com/danderson/metallb/issues/287
I saw this issue in Rancher 2.3.3.
This doesn't happen in 2.3.2.