Kubeadm: Using ip:port in kubeadm join command to render kubelet config and kube-proxy on node please

Created on 19 Jan 2018 · 48Comments · Source: kubernetes/kubeadm

FEATURE REQUEST

What happened?

kubeadm join node not using the ip:port in join command. I want to using the LB ip and port to join node.

master0 1.1.1.1:6443
master1 2.2.2.2:6443
LB 3.3.3.3:6443

using kubeadm join 3.3.3.3:6443 ... but kubelet config and kube-proxy config may also be master0 ip or master1 ip, this behaviour is not expected in HA.

What you expected to happen?

I want kubeadm render configs using ip port in kubeadm join command.

Anything else we need to know?

Now I need change the node kubelet config and kubeproxy config manually

areUX help wanted kinbug lifecyclstale prioritimportant-longterm

Source

fanux

👍1

Most helpful comment

Folks, the interpretation of the API server endpoint command line argument during join is misleading. In truth, this is used only during bootstrap and for nothing else. And even so, it's used only during token based discovery. It will be ignored (without even a single warning) with kubeconfig based discovery.

So there are really a couple of problems here:

Bootstrap token based discovery API server endpoint is having a misleading UX. My best bet is to deprecate the standalone argument way of supplying this and introduce a descriptive command line switch for this (something like --discovery-token-apiserver). The supplied value then goes to joinCfg.Discovery.BootstrapToken.APIServerEndpoint.

If someone wishes to overwrite the actual API Server on a per node basis, we may have to modify the config (probably add a field in NodeRegistrationOptions and/or possibly command line switch?).
Not persisting it has the potential to break something on subsequent kubeadm runs (such as on upgrade) so we may need to store it as an annotation too.

rosti on 15 Feb 2019

👍2 ❤1

All 48 comments

/assign @liztio

timothysc on 7 Apr 2018

See #598. Clearly a bug, not a feature request, based on logging output by kubeadm.

jethrogb on 10 Apr 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 1 Oct 2018

/remove-lifecycle stale

jethrogb on 1 Oct 2018

/assign @rdodev

could you verify that this still exists for 1.12 and assign 1.13 milestone if you can repro given all our shuffling

/kind bug

timothysc on 11 Oct 2018

598 has easy repro steps

jethrogb on 11 Oct 2018

Finally getting around this.

/lifecycle active

rdodev on 16 Oct 2018

@timothysc was unable to replicate in 1.12

root@ip-10-0-0-43:~#  kubeadm join 10.0.0.106:6000 --token nwoa2x.cqar2ndxrtnw9arc --discovery-token-ca-cert-hash sha256:d993ceed705830e8a10fcf2cb29d7c2030217039c6ebafcfb2766dceb45ed885
[preflight] running pre-flight checks
    [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "10.0.0.106:6000"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.106:6000"
[discovery] Requesting info from "https://10.0.0.106:6000" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.106:6000"
[discovery] Successfully established connection with API Server "10.0.0.106:6000"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ip-10-0-0-43" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

root@ip-10-0-0-106:~# kubectl get nodes
NAME            STATUS     ROLES    AGE 
ip-10-0-0-106   NotReady   master   3m37s 
ip-10-0-0-43    NotReady   <none>   86s

rdodev on 17 Oct 2018

@rdodev I reproduced it last week on 1.12. Why do you think kubeadm is actually connecting to 10.0.0.106:6000?

jethrogb on 17 Oct 2018

@jethrogb firewall rules. In the repro steps you linked they're forcing via iptables.

rdodev on 17 Oct 2018

@jethrogb
```
root@ip-10-0-0-43:~# kubeadm join 10.0.0.106:6443 --token nwoa2x.cqar2ndxrtnw9arc --discovery-token-ca-cert-hash sha256:d993ceed705830e8a10fcf2cb29d7c2030217039c6ebafcfb2766dceb45ed885
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "10.0.0.106:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.106:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.0.0.106:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.0.106:6443: connect: connection refused]
[discovery] Failed to request cluster info, will try again: [Get https://10.0.0.106:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.0.106:6443: connect: connection refused]
^C
root@ip-10-0-0-43:~# kubeadm join 10.0.0.106:6000 --token nwoa2x.cqar2ndxrtnw9arc --discovery-token-ca-cert-hash sha256:d993ceed705830e8a10fcf2cb29d7c2030217039c6ebafcfb2766dceb45ed885
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "10.0.0.106:6000"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.106:6000"
[discovery] Requesting info from "https://10.0.0.106:6000" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.106:6000"
[discovery] Successfully established connection with API Server "10.0.0.106:6000"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ip-10-0-0-43" as an annotation

This node has joined the cluster:```

rdodev on 17 Oct 2018

testuser@ali0:~$ sudo kubeadm join 10.198.0.221:6443 --token cykhjx.3kabrvhgdkwohqz5 --discovery-token-ca-cert-hash sha256:c2a5e209423b6dd23fe865d0de7a62e42a3638ae40b243885545e4b5152564db --ignore-preflight-errors=SystemVerification
[preflight] running pre-flight checks
    [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_sh ip_vs_rr ip_vs_wrr] or no builtin kernel ipvs support: map[ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{} ip_vs_rr:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "10.198.0.221:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.198.0.221:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.198.0.221:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.198.0.221:6443: connect: connection refused]
^C
testuser@ali0:~$ sudo kubeadm join 10.198.0.221:6000 --token cykhjx.3kabrvhgdkwohqz5 --discovery-token-ca-cert-hash sha256:c2a5e209423b6dd23fe865d0de7a62e42a3638ae40b243885545e4b5152564db --ignore-preflight-errors=SystemVerification
[preflight] running pre-flight checks
    [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_sh ip_vs_rr ip_vs_wrr] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "10.198.0.221:6000"
[discovery] Created cluster-info discovery client, requesting info from "https://10.198.0.221:6000"
[discovery] Requesting info from "https://10.198.0.221:6000" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.198.0.221:6000"
[discovery] Successfully established connection with API Server "10.198.0.221:6000"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
Get https://10.198.0.221:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.12: dial tcp 10.198.0.221:6443: connect: connection refused

jethrogb on 17 Oct 2018

On the master:

$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get cm -n kube-public cluster-info -oyaml
apiVersion: v1
data:
  jws-kubeconfig-cykhjx: eyJhbGciOiJIUzI1NiIsImtpZCI6ImN5a2hqeCJ9..BiYLnM2uq2lehUOez8n0tBuMqkErikP0ULsGzyAf_go
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNE1UQXhOakl6TVRNME5sb1hEVEk0TVRBeE16SXpNVE0wTmxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTXNPCkN3OVpVazZJSTVBTjJSUVVyNmRlM1dpMmhOM2hUVSt5aEhCZVMrU0ttQUFPVkp0SmxSTHMwa0c0eXBSb3pENXIKQUphOVRaSi9XOFhLTWdIOUR3ckdHWC9OUzRVRzNoNXdyME5xMlBxeVVqMGZETUNBR2d2MGc3NlNGaTlCWGcrcwoyaEFmOEl5UFlOZ2F1WXFvMUttdjdleXVHUmp2Z2JnU1J2WVIwZWVWYkhxWTIvdlA3T2RBeXRBcytKcGFTS28zCmpVZTR3dGtEcTYralo4ZnlnUS9EbkkwY0pRK1pMaUVIS0d0T2JscnRNWlRxS0RsTXVQd0Y4TE4yQ1kyZUh1WUgKaTM3cUgxMHp1SmlQZXBmOXdVdzc1QkR3eUNlVTVTbUJWUFo0b2xJT3c3ZW5JdDhoNGVpWTlOSklDbHdPNUhDWApaWG0xYmp6L0FKdEhoejg5QXFVQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFBSzlGRkg5eDB0RXhaTGREWjkzQm4walYzTnEKRWl5VzBmUTVpOHBBdlBVV3dMUVd1dkpuM1BLSUtTcjlpZGdwSUthUk1FKzMyRGRXQzVvZDIyakdBQ1REdjBjdAoxbFBSM3RBSTAwQnY2bS9BM09NQVRqY1JKd1hhL0ZHMDdRMU1sbkxibGhXMTlqaVMxQU9ycjRGZ2l1Z3VJQy9uCmd0UWZ3ZHJqTEhZSDY1KzJPRGtnWldNVjBqbjdpZlNMdnlpamJjRUttVXpSZm44T0hmYldWNXRMd2dRN295dHYKRE5tWHdkRkc3WFh3MVZVZjJKQkhDVGJHNndVU1diVFRPbzB1NnJLazJQakZoKzU5QVl4R2I1Ynp4N2thTW8xZwpYZktrUVVWSVcxaGZhelpSUHYzbWEzTmNsSis0R3VIMGc2OThvaEpHZGFkVHpXNmx2WnhoUW9NKzgycz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
        server: https://10.198.0.221:6443
      name: ""
    contexts: []
    current-context: ""
    kind: Config
    preferences: {}
    users: []
kind: ConfigMap
metadata:
  creationTimestamp: 2018-10-16T23:14:15Z
  name: cluster-info
  namespace: kube-public
  resourceVersion: "288"
  selfLink: /api/v1/namespaces/kube-public/configmaps/cluster-info
  uid: 3318106a-d199-11e8-b21c-54e1ad024614

jethrogb on 17 Oct 2018

@jethrogb don't know what to tell you. This is a mint fresh install on fresh infra.

root@ip-10-0-0-106:~# KUBECONFIG=/etc/kubernetes/admin.conf kubectl get cm -n kube-public cluster-info -oyaml apiVersion: v1 data: jws-kubeconfig-nwoa2x: eyJhbGciOiJIUzI1NiIsImtpZCI6Im53b2EyeCJ9..Be2U7ch__XzQ7em8vLEw8WAX6dQZeeLXaKVjh_a7YYA kubeconfig: | apiVersion: v1 clusters: - cluster: certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNE1UQXhOakl5TXpNME4xb1hEVEk0TVRBeE16SXlNek0wTjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBS2cxCjREbzhNbGtBSVZJM29xem9XK2trbUtmYjIyOGFLd1FzaXJsSTNMN2F1QnlrWC9JaEk0Tm9UYkZmMFpXbEdkRTYKUlVJNFdUZml1L2RqWXJqZG9YM2pZcGtxRERmTm5KNWxteGkzUStwbmVmM3hTWGtEbTNEOXFadWV0R0JXRTZzRwppNHIycUZxSmRnS21MMCswdnlXNmhkRUNUY1VwdFFTSzkzQmUxTzBMQnFRa1BLd0I0QjQ3Z3d6bGtSOFpaeTAyCm1zN1IvaE9lK0h5NEl2c0FQTmFQbHBpVFhQRyt5d2lLMkoxcXJBb0hzUDhNelNhdzN3OHB4bkJmc2V2NmErYWsKZm42b1p3QVJibi9yTDRNbHJaSlNpWC8vVEdvWTN5YlZYZ2lDWWVzMHNZQWR6T1Q3Sjl2VDBzYkRHK0Z2STFTYQpha05WUDJwdVNkdlhvcmtoc1JFQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFEbHJ5eklBRExwbUVEaURxdXhIdURPb2hmS0sKbVhMeVc4SkllYXFzT0k0cGd0aDJITDRJcG4vbm14VWF3bVh4SVB4YWc3N2I1cXZHcm5YTXN6SHd4WUp2SnJ0cgpJU2VyOVdvSmpuY0xVUnhkeTVBb3ZMWFZYZ3Y2S1dHVlFnMkt2dXpmNGMyL1ZVN09jNnpQMlRhNVJJaHgrcVU2CnBSeWN5Q2RJOUdaMUFpN0JSSTd1M3VtUjRiT3BhckpMaVRvZ2hsMGNDTlBDRDBhZ2dlNHBGemxSd0VLbEpINmMKMmgzcGFxZ0dQUU5YY1ZzcGdtbTgvQ2JvbFVta1d1RjZRTm1KemxuK2tUdlhkRTJiY3NkSUxyeU5Nb0J0L2paUQpoaVZxTnhBVWVuV1hEVk8wVnd5ZXRxY3crL2ZGb05jZndUL1FERXduQXpJd29SM3FHdUZXVk1aQllVZz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo= server: https://10.0.0.106:6000 name: "" contexts: [] current-context: "" kind: Config preferences: {} users: [] kind: ConfigMap metadata: creationTimestamp: 2018-10-16T22:34:14Z name: cluster-info namespace: kube-public resourceVersion: "314" selfLink: /api/v1/namespaces/kube-public/configmaps/cluster-info uid: 9c0579c2-d193-11e8-b95c-026da1fc2270

rdodev on 17 Oct 2018

@rdodev That cluster-info looks like it was modified from the default. Using it will not reproduce the issue. What was your kubeadm init command?

jethrogb on 17 Oct 2018

kubeadm init --config kudeadm.yaml

root@ip-10-0-0-106:~# cat kudeadm.yaml
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
apiEndpoint:
  advertiseAddress: "10.0.0.106"
  bindPort: 6000

rdodev on 17 Oct 2018

So you're initializing it with port 6000. That's not going to work for reproducing this issue. This issue is about not being able to join with a master at an address/port that's different than what the master was originally configured with.

jethrogb on 17 Oct 2018

@jethrogb what's the use case, please. Why would you want to join a control plane at a different address/port than what the control plane was brought online with?

rdodev on 17 Oct 2018

Here's a couple...

Master changed IP address.
Used to have single master, but now changed master to be HA. Future joins will want to use LB IP address.
Master is behind a NAT, not all nodes use the same IP to connect to the master.

The main problem is that you tell kubeadm join what IP/port to use, it says it's going to use that address, but it doesn't!

jethrogb on 17 Oct 2018

@jethrogb

In case 1) you need to re-issue the certs no other way around it. Certs are generated for the host name and ip address of the control plane upon init. Can be mitigated by issuing certs to a DNS name on init.

In case 2) the process of going from single-control plane to HA requires you to re-issue all relevant certs (specially to add the LB dns name)

I'm not sure 3) is a supported configuration at the moment.

rdodev on 17 Oct 2018

I'm not sure what any of this has to do with certificates.

jethrogb on 17 Oct 2018

@jethrogb first, even if you could connect (somehow), it would fail on certificate mismatch as explained above.

Secondly, it's not really a bug as this isn't a supported use case. AFICT it's not that it was once working and now isn't. It just not a feature that exists and is broken.

rdodev on 17 Oct 2018

@timothysc I don't think this item and #598 are the same/dupes. This one (#664) we know it works because that's how we tested HA in 1.11 and 1.12

598 deserves a look as a feature request.

rdodev on 17 Oct 2018

👎1

@timothysc ^^

tallclair on 17 Oct 2018

@tallclair haha I knew this was bound to happen E_TOO_MANY_TIMS_IN_K8S

rdodev on 17 Oct 2018

@jethrogb first, even if you could connect (somehow), it would fail on certificate mismatch as explained above.

This has nothing to do with certificates. Even if the certificate presented by the API server at the address that the user specified in kubeadm join checks out, kubeadm join will then (contrary to its claims on the terminal) connect to a totally different address for the last step of the join procedure.

Secondly, it's not really a bug as this isn't a supported use case. AFICT it's not that it was once working and now isn't. It just not a feature that exists and is broken.

I'm not sure I understand what you're saying. Are you saying that a tool messaging to the user that it's going to do one thing while it's doing another thing is not a bug?

jethrogb on 17 Oct 2018

This has nothing to do with certificates. Even if the certificate presented by the API server at the address that the user specified in kubeadm join checks out, kubeadm join will then (contrary to its claims on the terminal) connect to a totally different address for the last step of the join procedure.

To explain why certificates matter, let's see in the case that the ip of the control plane changes, this happens:

root@ip-10-0-0-43:~# kubeadm join 34.220.204.xxx:6000 --token nwoa2x.cqar2ndxrtnw9arc 
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "34.220.204.xxx:6000"
[discovery] Created cluster-info discovery client, requesting info from "https://34.220.204.xxx:6000"
[discovery] Requesting info from "https://34.220.204.xxx:6000" again to validate TLS against the pinned public key
[discovery] Failed to request cluster info, will try again: [Get https://34.220.204.xxx:6000/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate is valid for 10.96.0.1, 10.0.0.106, not 34.220.204.xxx]

Same would apply to case 2) if you're going from single control plane to HA. Simply trying to point out those use cases you mentioned above aren't supported by kubeadm w/o re-generating/swizzling certs in the control plane nodes. So the scope here is only to port forwarding.

I'm not sure I understand what you're saying. Are you saying that a tool messaging to the user that it's going to do one thing while it's doing another thing is not a bug?

I'm thinking we should re-open discussion your original issue because it's not pertinent to this particular issue. Let's wait for @timothysc 's input how to proceed.

rdodev on 17 Oct 2018

@jethrogb So if I'm reading this correctly you changed the ip:port after the initial init, but the join is not using the command line override, it's using what was stored in the kubeadm conf which is stored on cluster.

It's a weird use case, but the cmd line args appear to not be overriding @rdodev in this use case. It appears to check 80-90% of the way, then use the config for connection.

timothysc on 30 Oct 2018

Yeah that's accurate

jethrogb on 30 Oct 2018

So there are really a couple of problems here:

Bootstrap token based discovery API server endpoint is having a misleading UX. My best bet is to deprecate the standalone argument way of supplying this and introduce a descriptive command line switch for this (something like --discovery-token-apiserver). The supplied value then goes to joinCfg.Discovery.BootstrapToken.APIServerEndpoint.

If someone wishes to overwrite the actual API Server on a per node basis, we may have to modify the config (probably add a field in NodeRegistrationOptions and/or possibly command line switch?).
Not persisting it has the potential to break something on subsequent kubeadm runs (such as on upgrade) so we may need to store it as an annotation too.

rosti on 15 Feb 2019

👍2 ❤1

Bootstrap token based discovery API server endpoint is having a misleading UX. My best bet is to deprecate the standalone argument way of supplying this and introduce a descriptive command line switch for this (something like --discovery-token-apiserver). The supplied value then goes to joinCfg.Discovery.BootstrapToken.APIServerEndpoint.

If someone wishes to overwrite the actual API Server on a per node basis, we may have to modify the config (probably add a field in NodeRegistrationOptions and/or possibly command line switch?).
Not persisting it has the potential to break something on subsequent kubeadm runs (such as on upgrade) so we may need to store it as an annotation too.

IMO we shouldn't overwrite this on join time, maybe we can have a command to update the Api Server Endpoint on cluster-info configmap

yagonobre on 15 Feb 2019

I wasn't clear actually. By "modify the config" I meant to actually add a new field somewhere in the next config format (possibly v1beta2) and probably persist it somewhere in the cluster (node annotation?).
This needs some discussing though and probably won't happen in the current cycle (especially if we go along the "adding a config option" way).

What we can certainly do in this cycle is to add a command line switch for the bootstrap token discovery API server and deprecate supplying it as an anonymous argument.

@neolit123 @fabriziopandini WDYT?

rosti on 15 Feb 2019

What we can certainly do in this cycle is to add a command line switch for the bootstrap token discovery API server and deprecate supplying it as an anonymous argument.

Tim and Fabrizio sort of disagreed.
but i'm all +1 for running the GA deprecation policy on that arg.

it's nothing but trouble.

neolit123 on 15 Feb 2019

@neolit123 even if we don't go on the command line switch track, we can actually do a better job in documenting the arg, both in the inline tool help (kubeadm join --help) and in the website docs.
I assume, that better docs can (sort of) "fix" the problem too and that this can be done as part of the join phases docs write up.

rosti on 18 Feb 2019

👍1

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 29 Jul 2019

reading trough the whole ticket there are multiple issues that are either misunderstandings, covered elsewhere or not documented.

the original issue:

the original issue talks about:

but kubelet config and kube-proxy config may also be master0 ip or master1 ip, this behaviour is not expected in HA.

i do not understand this problem, so if you feel this is still viable to 1.13 (1.12 is not supported anymore) please log a new ticket with full details and examples.
this ticket deviated a lot.

the misleading anonymous join argument:

this was tracked in here:
https://github.com/kubernetes/kubeadm/issues/1375

we were planning to switch to a named argument for discovery e.g. --discovery-endpoint but opted out of this idea and you are going to have to continue using kubeadm join addr:ip

transitioning from single control-plane to HA:

was tracked here:
https://github.com/kubernetes/kubeadm/issues/1664

a PR for this topic just merged in our docs:
https://github.com/kubernetes/website/pull/15524

the TL;DR is that if you modify the address of the api-server on a control-plane node after the cluster is created you need to regenerate certificates and patch cluster-objects. to better transition to HA use DNS names.

if there is something else at play, please create clean tickets with full details so that the maintainers can grasp the problem.

neolit123 on 3 Aug 2019

@neolit123 #598 got merged into this but it's not clear to me if it is resolved.

jethrogb on 26 Aug 2019

@jethrogb

598 got merged into this

if your problem from #598 is still viable, please reopen the issue but please mind that it has to be reproducible with 1.13+, because older versions are outside of the support skew and are not supported by the kubeadm team.

neolit123 on 26 Aug 2019

@neolit123 I have a Kubernetes control plane node running inside a Docker container via https://github.com/kubernetes-sigs/kind. The api server is exposed on the Docker host via port forwarding. I need to add a worker node in the Docker host's network to the cluster.

Obviously the IP address and hostname of the container running the kubelet and Docker host where the API is exposed via port forwarding differ, so we run into the problems being described in this issue. For one thing, when someone reaches the master node's API via the forwarded port on the Docker host, the IP address does not match the certificate. This is easy to fix: we can just add the Docker host's IP to certificateSANs when using kubeadm to deploy the cluster.

The other problem (which is harder to solve) is that when we try to join the worker node to the cluster we need to consistently override the API endpoint being used to reach the master node (i.e. it should use the Docker host's IP everywhere, and not the internal IP address of the Docker container, to which it has no access).

As far as I understand there's still no way to do this, or at least I can't see one from looking at the flags for kubeadm join, and making the position argument serve this purpose is what this issue was asking for (admittedly I didn't fully understand the counterargument against this). Am I missing something?

masaeedu on 29 Jul 2020

The other problem (which is harder to solve) is that when we try to join the worker node to the cluster we need to consistently override the API endpoint being used to reach the master node (i.e. it should use the Docker host's IP everywhere, and not the internal IP address of the Docker container, to which it has no access).

this seems like an unsupported scenario by kind.
did you get feedback from the kind maintainers (or #kind on k8s slack)?

As far as I understand there's still no way to do this, or at least I can't see one from looking at the flags for kubeadm join, and making the host:port position argument serve this purpose is what this issue was asking for (admittedly I didn't fully understand the counterargument against this). Am I missing something?

the OP of this issue was confusing and i don't think it's related to your problem.

neolit123 on 29 Jul 2020

@neolit123 The important question is whether it's a supported scenario by kubeadm; kind just wraps kubeadm and a container runtime. I can think of a number of other scenarios that don't involve kind where a kubernetes control plane node's API port is forwarded somewhere else, and the worker node must be registered to it in a network where the original control plane address is not accessible. E.g. using an SSH tunnel, or a TCP reverse proxy.

masaeedu on 29 Jul 2020

kubeadm join needs an kube-apiserver endpoint to perform discovery and Node bootstrap.
that kube-apiserver could be anywhere - same network or another network and kubeadm does support those cases.
the endpoint can be load balancer endpoint too.

that endpoint is then written on a worker node's kubelet.conf file that is used to communicate to the API server

you can omit the positional argument completely from kubeadm join and use JoinConfiguration's Discovery field.

The other problem (which is harder to solve) is that when we try to join the worker node to the cluster we need to consistently override the API endpoint being used to reach the master node (i.e. it should use the Docker host's IP everywhere, and not the internal IP address of the Docker container, to which it has no access).

this is seems like a problem of the high-level software that uses kubeadm (e.g. kind).
the high level software is not executing kubeadm join with the endpoint you desire.

The important question is whether it's a supported scenario by kubeadm; kind just wraps kubeadm and a container runtime. I can think of a number of other scenarios that don't involve kind where a kubernetes control plane node's API port is forwarded somewhere else, and the worker node must be registered to it in a network where the original control plane address is not accessible. E.g. using an SSH tunnel, or a TCP reverse proxy.

if a kube-apiserver is not accessible, kubeadm cannot join this new Node to the cluster. period.
kubeadm join needs a valid endpoint to which a k8s client can connect to to perform discovery and validation, which then would lead to TLS bootstrap and the creation of a new Node object.

so yes, kubeadm join does need a valid / reachable API server endpoint.

neolit123 on 29 Jul 2020

the high level software is not executing kubeadm join with the endpoint you desire.

It's not kind that's executing kubeadm join, it's me. I'm executing kubeadm join manually, providing the address of the Docker host where the API is exposed via port forwarding (note that this does not match the --control-plane-endpoint that was used to start the control plane node itself; that address is not accessible to the worker node).

The problem is that the address I provide to kubeadm join is not used consistently throughout the join process: it is only used in the initial stages, after which the process fails because at some point the worker node downloads configuration from the the control plane API, and then starts using the original, inaccessible address corresponding to the --control-plane-endpoint argument that was used to start the control plane node.

masaeedu on 29 Jul 2020

if a kube-apiserver is not accessible, kubeadm cannot join this new Node to the cluster. period.

The kube-apiserver is accessible via port forwarding. It is not accessible at the original address that was specified using --advertise-addr or --control-plane-endpoint when kubeadm init was used, because that address is a function of the network in which the control plane node itself is running, and not necessarily of the network in which the joining worker is running.

masaeedu on 29 Jul 2020

please log a separate issue and provide IP addresses and concrete examples of your setup.

neolit123 on 29 Jul 2020

👍1

@neolit123 it's not clear to me why yet another issue is needed. This issue has already been reported several times over the past several years and it's the same problem every time: you run kubeadm join ADDRESS and at some point ADDRESS (which works) is swapped out for something else (which doesn't).

jethrogb on 29 Jul 2020

let's start in a fresh issue to see: