Cluster-api: kubeadm bootstrap provider does not take into account Cluster APIServerPort

Created on 29 Dec 2019  ·  22 comments  ·  Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:

  • Create a cluster with a non-default value for cluster.Spec.ClusterNetwork.APIServerPort (e.g. 6444 instead of 6443)
  • Create a control plane Node using the kubeadm bootstrap provider
  • The generated cloud-init ignores the APIServerPort setting and uses the default 6443 (or whatever is specified in that Node's KubeadmConfig).
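For reference, a Cluster manifest with a non-default API server port looks roughly like this (sketch only; names and the CIDR block are hypothetical, assuming the v1alpha2 API in use at the time):

```yaml
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example            # hypothetical name
spec:
  clusterNetwork:
    apiServerPort: 6444    # non-default port; currently ignored by the kubeadm bootstrap provider
    pods:
      cidrBlocks: ["192.168.0.0/16"]
```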

What did you expect to happen:

For the specified apiServer port to be used.

Anything else you would like to add:

I am not sure this is a bug, but it seems like strange behaviour: as an infrastructure provider I would need to parse the resulting cloud-init (or similar) to work out which port to forward connections to or health check. I would therefore expect the kubeadm bootstrap provider to override the default (or whatever is configured in the KubeadmConfig) with the value given by the cluster.

Environment:

  • Cluster-api version: 0.2.8
  • kubeadm bootstrap provider version: 0.1.5
  • Minikube/KIND version: None (using kubernetes infrastructure provider on GKE)
  • Kubernetes version: (use kubectl version): 1.16.3
  • OS (e.g. from /etc/os-release): Container-Optimized OS for the GKE cluster (kind images for cluster-api Nodes)

/kind bug

area/bootstrap kind/bug lifecycle/frozen

Most helpful comment

This should also take into account the spec.livenessProbe.port field. It defaults to 6443, and if you change APIServerPort you need to ensure spec.livenessProbe.port is updated as well or your container will just get endlessly restarted.

In fact, I just hit this bug and have had to resort to the hackiest of hacks to fix this:

spec:
  kubeadmConfigSpec:
    postKubeadmCommands:
    - chmod +x /tmp/kube-apiserver-quickfix.sh && /tmp/kube-apiserver-quickfix.sh
    files:
    - path: /tmp/kube-apiserver-overlay.yaml
      content: |
        apiVersion: v1
        kind: Pod
        spec:
          livenessProbe:
            httpGet:
              port: 443
    - path: /tmp/kube-apiserver-quickfix.sh
      content: |
        # Stop kube-apiserver and edit YAML
        mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/kube-apiserver.yaml
        # Patch the YAML, super hacky :).
        echo "Started patching kube-apiserver"
        cd /tmp && docker run --rm -v "${PWD}":/workdir mikefarah/yq yq m -i --overwrite --autocreate=false /tmp/kube-apiserver.yaml /tmp/kube-apiserver-overlay.yaml
        # Sleep to let container download and run
        sleep 30 && mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/kube-apiserver.yaml
        echo "Finished patching kube-apiserver"

All 22 comments

@dippynark It is generally expected that the infrastructure provider provides some type of persistent API endpoint for the cluster (for example, the AWS provider uses an ELB). This is the reason why the bootstrap provider doesn't currently take into account the value of cluster.Spec.ClusterNetwork.APIServerPort.

@detiber so APIServerPort should be the port exposed by the ELB in that scenario? cluster_types.go is saying it's the bind port.

Or you mean it doesn't matter that the bootstrap provider ignores it because the persistent API endpoint can use a different port if necessary?

We should probably take a closer look at this after the holidays.
/cc @ncdc @vincepri

/cc

Going forward, Cluster.Spec.ControlPlaneEndpoint.Port is what users should use to contact their cluster. This essentially means it is the port for the cluster's load balancer, assuming you have one.

I wonder if we still need Cluster.Spec.ClusterNetwork.APIServerPort?
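In manifest form, the endpoint described above (populated by a controller, not by the user) would look something like this sketch, with hypothetical values:

```yaml
spec:
  controlPlaneEndpoint:
    host: 203.0.113.10   # load balancer address, copied in from the infrastructure provider
    port: 6443           # port users should dial to reach the cluster
```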

Cluster.Spec.ControlPlaneEndpoint.Port is meant to be set only by a controller, given that we copy it from the infrastructure provider on each reconciliation loop.

The other variable was meant to give users a way to specify which port they wanted [the load balancer] to expose. That conversation happened a long time ago, but we can probably pin down the meaning in this release.

Cluster.Spec.ControlPlaneEndpoint.Port is meant to be set only by a controller, given that we copy it from the infrastructure provider on each reconciliation loop.

Long-term, my vision is that this doesn't come from a cluster infra provider, but instead comes from the MachineLoadBalancer. And there will be a place in the MLB spec for the user to specify the desired port. Which means we wouldn't need Cluster.Spec.ClusterNetwork.APIServerPort.

That's fair; until then, though, I'd not remove functionality that's being used / that we added after a bug report, unless there is an alternative ready. wdyt?

I'd generally agree, although I can't find any code that actually uses Cluster.Spec.ClusterNetwork.APIServerPort?

Nevermind, bad searching

So infrastructure providers _may_ use Cluster.Spec.ClusterNetwork.APIServerPort to inform which port to use for the provider's load balancer. Otherwise, this field is 100% unused by core cluster-api.

Correct, that's the current intent for this field

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale
@dippynark any updates or action items on this issue?

/area bootstrap

@vincepri IMO it makes sense to either remove the cluster.Spec.ClusterNetwork.APIServerPort field and always use the default 6443 serving port, or change the kubeadm bootstrap provider to respect this field (i.e. configuring the apiserver to serve on whatever is specified) as well as requiring all infrastructure providers to respect it (i.e. configuring the target port of the apiserver loadbalancer to be what is specified).

I'd prefer to add support for respecting the field, but that requires a contract amendment. We should have clearer documentation on what we do today.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/lifecycle frozen

/assign

To re-triage

/milestone v0.4.0
