RKE: kube-proxy fails with "--bind-address" flag: "x is not a valid IP address" when the node address is not an IP address and a cloud provider is configured

Created on 29 Oct 2019 · 5 Comments · Source: rancher/rke

RKE version:
v0.3.2

cluster.yml file:

nodes:
- address: shortname
  user: ubuntu
  role: [ "controlplane", "etcd", "worker" ]

cloud_provider:
  name: aws

Steps to Reproduce:
Run rke up with the cluster.yml shown above (address or internal_address set to a hostname rather than an IP address, and cloud_provider set).

Results:
kube-proxy fails to start:

FATA[0354] [workerPlane] Failed to bring up Worker Plane: [Failed to verify healthcheck: Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [shortname]: Get http://localhost:10256/healthz: Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), log: invalid argument "shortname" for "--bind-address" flag: "shortname" is not a valid IP address] 

Passing the node address as --bind-address was implemented because kube-proxy fails to start on k8s 1.16 if the IP is not provided to identify the node when a cloud provider is configured (AWS and possibly OpenStack).

The main problem is that the address still needs to be provided, so we should probably error out in validation or find another way to make this happen. We already decided against doing DNS resolution in RKE and using the resolved address.
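A minimal sketch of what "error out in validation" could look like, in Go. This is not RKE's actual validation code: the `node` type and the `validateNodeIP` function are hypothetical names made up for illustration. It assumes, based on the workaround reported in this thread, that internal_address is used in preference to address when it is set. It does no DNS resolution and only checks whether the value parses as a literal IP.

```go
package main

import (
	"fmt"
	"net"
)

// node mirrors only the cluster.yml fields discussed in this issue.
// Hypothetical type, not RKE's own node struct.
type node struct {
	Address         string
	InternalAddress string
}

// validateNodeIP is a hypothetical check. When a cloud provider is
// configured, it requires the address that would identify the node to be
// a literal IP address. Assumption: internal_address takes precedence
// over address when it is set.
func validateNodeIP(n node, cloudProvider string) error {
	if cloudProvider == "" {
		// No cloud provider configured: nothing to enforce in this sketch.
		return nil
	}
	candidate := n.Address
	if n.InternalAddress != "" {
		candidate = n.InternalAddress
	}
	// net.ParseIP returns nil for anything that is not a literal IP,
	// including hostnames and FQDNs. No DNS lookup is performed.
	if net.ParseIP(candidate) == nil {
		return fmt.Errorf("node %q: %q is not a valid IP address; set internal_address to an IP when cloud_provider %q is configured",
			n.Address, candidate, cloudProvider)
	}
	return nil
}

func main() {
	nodes := []node{
		{Address: "shortname"},
		{Address: "shortname", InternalAddress: "1.2.3.4"},
	}
	for _, n := range nodes {
		if err := validateNodeIP(n, "aws"); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("ok:", n.Address)
	}
}
```

With a check along these lines, the first node would be rejected before any containers are started, instead of failing later in the kube-proxy healthcheck.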

The workaround is to use IP addresses, not a hostname or FQDN, as the address.

kind/bug

All 5 comments

I ran into this issue trying to go from 0.3.1 to 0.3.2 on a rke cluster in AWS. I reverted back to 0.3.1 and everything seems ok.

Can I replace DNS hostname with IP value and retry the upgrade or will that mess things up even more?

Putting an IP value in internal_address allowed me to upgrade to 0.3.2 in AWS, e.g.:

nodes:
- address: shortname
  internal_address: 1.2.3.4
  user: ubuntu
  role: [ "controlplane", "etcd", "worker" ]

cloud_provider:
  name: aws

I faced the same problem. After adding internal_address, rke added the nodes again under different names:

k8s-etcd1                   Ready      etcd           15m   v1.15.5
k8s-etcd1.lab.vi.local      NotReady   etcd           21h   v1.15.5
k8s-ingress1                Ready      worker         15m   v1.15.5
k8s-ingress1.lab.vi.local   NotReady   worker         21h   v1.15.5
k8s-master1                 Ready      controlplane   15m   v1.15.5
k8s-master1.lab.vi.local    NotReady   controlplane   21h   v1.15.5
k8s-node1                   Ready      worker         15m   v1.15.5
k8s-node1.lab.vi.local      NotReady   worker         21h   v1.15.5
k8s-node2                   Ready      worker         15m   v1.15.5
k8s-node2.lab.vi.local      NotReady   worker         21h   v1.15.5
k8s-node3                   Ready      worker         15m   v1.15.5
k8s-node3.lab.vi.local      NotReady   worker         21h   v1.15.5

This is a concern: when a non-resolvable hostname is used for internal_address, kube-proxy starts with nodeIP 127.0.0.1, which causes LoadBalancer services with the Local externalTrafficPolicy to fail (at least with MetalLB): https://github.com/danderson/metallb/issues/287

I saw this issue in Rancher 2.3.3.
It doesn't happen in 2.3.2.
