Cluster-api: Using external etcd, the KCP status cannot be updated

Created on 24 Jun 2020 · 16 comments · Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:

  • I use external etcd for KCP, following the document https://github.com/kubernetes-sigs/cluster-api/blob/6dc38f9b64691822ccbefc8c08ae4a49b474f602/bootstrap/kubeadm/docs/external-etcd.md.
  • When the KCP controller updates the status, it fails with an error that the etcd tls.key could not be obtained.
  • I then added a tls.key to the CLUSTER-NAME-etcd secret, but now I get an error that etcd is unhealthy.
  • I see that the KCP controller calls updateStatus, which calls GetWorkloadCluster. GetWorkloadCluster uses the etcd tls.key and tls.crt to build an etcd client, and that client is then used to get the etcd pods in the cluster.
  • So should I conclude that cluster-api does not support external etcd yet?
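The failing call chain described above can be sketched as follows. This is a simplified, hypothetical reconstruction (the `Secret` type and `newEtcdClientCerts` function are stand-ins, not the actual cluster-api source); it only illustrates why a generated `CLUSTER_NAME-etcd` secret that carries `tls.crt` but no `tls.key` fails when KCP tries to build an etcd client:

```go
package main

import (
	"errors"
	"fmt"
)

// Secret is a hypothetical stand-in for a Kubernetes secret's data map.
type Secret map[string][]byte

// newEtcdClientCerts models how GetWorkloadCluster pulls the etcd client
// certificate pair out of the CLUSTER_NAME-etcd secret before building
// an etcd client. Both keys must be present.
func newEtcdClientCerts(s Secret) (crt, key []byte, err error) {
	crt, ok := s["tls.crt"]
	if !ok {
		return nil, nil, errors.New("etcd tls crt does not exist")
	}
	key, ok = s["tls.key"]
	if !ok {
		// This is the failure the reporter hits: with external etcd,
		// the secret only contains tls.crt.
		return nil, nil, errors.New("etcd tls key does not exist")
	}
	return crt, key, nil
}

func main() {
	s := Secret{"tls.crt": []byte("cert")} // no tls.key, as reported
	_, _, err := newEtcdClientCerts(s)
	fmt.Println(err) // etcd tls key does not exist
}
```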

What did you expect to happen:
I expected the KCP status to be updated, but I cannot find a method that handles external etcd, unless my understanding is wrong.

Anything else you would like to add:

Environment:

  • Cluster-api version: v0.3.5
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version): v1.17.4
  • OS (e.g. from /etc/os-release):

/kind bug

help wanted kind/bug lifecycle/active

All 16 comments

Hi @zanghao2, and thanks for filing this issue. The original implementation and proposal for KubeadmControlPlane were meant to be used only with stacked etcd.

From the proposal:

To manage etcd clusters in any topology other than stacked etcd (externally managed etcd clusters can still be leveraged).

This requirement was meant to reduce the support scope for KCP and to allow the controller to fully manage a control plane's whole lifecycle. For example, today during an upgrade KCP makes sure to properly remove, add, and upgrade etcd members as new control plane machines join the cluster.

I'm not sure if we have any plans to support external etcd deployments in the future. We've talked about using etcdadm in the past to avoid managing etcd directly, but we'd need an extensive design to start the discussion.

Hope this helps, cc @detiber and @randomvariable as well

There may be a bug in KCP attempting to check the etcd status even when it shouldn't.
@zanghao2, could you provide a sample of your KubeadmControlPlane configuration? Ideally, if you can get the logs of the cluster-api controllers, that would also help.

@vincepri we never intended to remove support for external etcd when it is specified in the kubeadm configuration passed to KCP; we likely just need to short-circuit some of the health checking we do when external etcd is configured
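The short-circuit being suggested could look roughly like the following sketch. The struct definitions are minimal mirrors of the kubeadm v1beta1 API (only the relevant fields are shown), and `usesExternalEtcd` is a hypothetical helper name, not a function from the real code base:

```go
package main

import "fmt"

// Minimal mirrors of the kubeadm v1beta1 ClusterConfiguration types
// (assumption: only the fields relevant to this check are included).
type ExternalEtcd struct {
	Endpoints []string
	CAFile    string
	CertFile  string
	KeyFile   string
}

type LocalEtcd struct{}

type Etcd struct {
	Local    *LocalEtcd
	External *ExternalEtcd
}

type ClusterConfiguration struct {
	Etcd Etcd
}

// usesExternalEtcd detects the topology: when the kubeadm configuration
// declares an external etcd, KCP could skip the stacked-etcd health checks
// and the tls.crt/tls.key secret lookup entirely.
func usesExternalEtcd(cfg *ClusterConfiguration) bool {
	return cfg != nil && cfg.Etcd.External != nil
}

func main() {
	external := &ClusterConfiguration{Etcd: Etcd{External: &ExternalEtcd{
		Endpoints: []string{"https://etcd01.demo-cluster.test:2379"},
	}}}
	stacked := &ClusterConfiguration{Etcd: Etcd{Local: &LocalEtcd{}}}
	fmt.Println(usesExternalEtcd(external), usesExternalEtcd(stacked)) // true false
}
```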

Might be worthwhile to amend the proposal with some clarification; it wasn't immediately clear that this is (or should be) supported

@vincepri It's called out specifically in the goals section: "To support pre-existing, user-managed, external etcd clusters", and the scenario is also called out in the behavioral sections as it applies to the various operations

Got it, the non-goal threw me off, but I guess it does say managed :D

/help

@CecileRobertMichon:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/milestone v0.3.x

My kcp configuration is as follows: @randomvariable

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: account-test-control-plane
  namespace: cluster-test
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: account-test-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      etcd:
        external:
          endpoints:
          - https://etcd01.demo-cluster.test:2379
          - https://etcd02.demo-cluster.test:2379
          - https://etcd03.demo-cluster.test:2379
          caFile: /etc/kubernetes/pki/ca.pem
          certFile: /etc/kubernetes/pki/etcd.pem
          keyFile: /etc/kubernetes/pki/etcd-key.pem
      certificatesDir: /etc/kubernetes/pki
      imageRepository: google_containers
      dns:
        type: CoreDNS
      apiServer:
        extraArgs:
          service-account-signing-key-file: /etc/kubernetes/pki/sa.key
          service-account-issuer: kubernetes.default.svc
          service-account-key-file: /etc/kubernetes/pki/sa.pub
          cloud-provider: aws
          feature-gates: "APIResponseCompression=true,DynamicAuditing=true,LocalStorageCapacityIsolationFSQuotaMonitoring=true,QOSReserved=true,SCTPSupport=true,ServiceNodeExclusion=true,BoundServiceAccountTokenVolume=true,NonPreemptingPriority=true,BalanceAttachedNodeVolumes=true,APIPriorityAndFairness=true"
          authorization-mode: "Node,RBAC"
          runtime-config: authentication.k8s.io/v1beta1=true
      controllerManager:
        extraArgs:
          feature-gates: "APIResponseCompression=true,DynamicAuditing=true,LocalStorageCapacityIsolationFSQuotaMonitoring=true,QOSReserved=true,SCTPSupport=true,ServiceNodeExclusion=true,BoundServiceAccountTokenVolume=true,NonPreemptingPriority=true,BalanceAttachedNodeVolumes=true,APIPriorityAndFairness=true"
          cloud-provider: aws
          service-account-private-key-file: /etc/kubernetes/pki/sa.key
          flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
      scheduler:
        extraArgs:
          feature-gates: "APIResponseCompression=true,DynamicAuditing=true,LocalStorageCapacityIsolationFSQuotaMonitoring=true,QOSReserved=true,SCTPSupport=true,ServiceNodeExclusion=true,BoundServiceAccountTokenVolume=true,NonPreemptingPriority=true,BalanceAttachedNodeVolumes=true,APIPriorityAndFairness=true"
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          feature-gates: "APIResponseCompression=true,DynamicAuditing=true,LocalStorageCapacityIsolationFSQuotaMonitoring=true,QOSReserved=true,SCTPSupport=true,ServiceNodeExclusion=true,BoundServiceAccountTokenVolume=true,NonPreemptingPriority=true,BalanceAttachedNodeVolumes=true,APIPriorityAndFairness=true"
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          feature-gates: "APIResponseCompression=true,DynamicAuditing=true,LocalStorageCapacityIsolationFSQuotaMonitoring=true,QOSReserved=true,SCTPSupport=true,ServiceNodeExclusion=true,BoundServiceAccountTokenVolume=true,NonPreemptingPriority=true,BalanceAttachedNodeVolumes=true,APIPriorityAndFairness=true"
  replicas: 1
  version: v1.15.3

The CLUSTER_NAME-etcd secret only contains tls.crt. When KCP checks the status, I get the log "etcd tls key does not exist for cluster cluster-demo cluster-test". I checked the code in GetWorkloadCluster; it tries to check external etcd the same way as stacked etcd. Even if I add the corresponding tls.key, when I change the replicas of the KCP I still get an error, because the method scaleUpControlPlane uses the stacked-etcd logic to check the status of etcd and tries to get the etcd pods. I think it may be necessary to add new methods to handle the logic of external etcd.
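The scale-up problem described above suggests a topology branch in the pre-scale health checks. The sketch below is purely illustrative: the `cluster` struct and `preScaleUpChecks` function are hypothetical names, not the real scaleUpControlPlane implementation:

```go
package main

import "fmt"

// cluster is a hypothetical, reduced view of the workload cluster state
// relevant to a pre-scale-up health check.
type cluster struct {
	externalEtcd  bool
	etcdPodsReady bool // only meaningful for stacked etcd
}

// preScaleUpChecks branches on the etcd topology: with external etcd there
// are no etcd pods in the workload cluster to inspect, so the etcd member
// and pod checks are skipped rather than reported as "unhealthy".
func preScaleUpChecks(c cluster) error {
	if c.externalEtcd {
		return nil // nothing to verify; etcd is managed out of band
	}
	if !c.etcdPodsReady {
		return fmt.Errorf("etcd cluster unhealthy")
	}
	return nil
}

func main() {
	fmt.Println(preScaleUpChecks(cluster{externalEtcd: true}))
	fmt.Println(preScaleUpChecks(cluster{externalEtcd: false, etcdPodsReady: false}))
}
```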

@zanghao2 I'm still making up my mind about this issue, but I was expecting the etcd keys to be passed as files into the kubeadmConfigSpec; otherwise, I don't see how those files are going to be placed onto the machine.
Could you kindly help me clarify this point?

Ignore my previous comment; the bootstrap provider takes charge of injecting the files onto the machines, starting from secrets.

/assign
/lifecycle active

@fabriziopandini Thanks for your reply. I have fixed the code for external etcd, but I do not know the plan for it. In my opinion, we should not check the status of external etcd at all; I just check for ca.crt, client.crt, and client.key. But this idea may not be in line with the ideas of cluster-api.
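The certificate-only check described in this comment could be sketched as follows. It is an assumption about the secret layout, based on the key names mentioned above (`ca.crt`, `client.crt`, `client.key`); `missingEtcdCertKeys` is a hypothetical helper, not code from the PR:

```go
package main

import (
	"fmt"
	"sort"
)

// missingEtcdCertKeys validates that the user-provided external-etcd
// certificate secret carries the expected keys, without ever contacting
// etcd itself. It returns the sorted list of missing keys.
func missingEtcdCertKeys(secretData map[string][]byte) []string {
	required := []string{"ca.crt", "client.crt", "client.key"}
	var missing []string
	for _, k := range required {
		if _, ok := secretData[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	data := map[string][]byte{
		"ca.crt":     []byte("ca"),
		"client.crt": []byte("crt"),
	}
	fmt.Println(missingEtcdCertKeys(data)) // [client.key]
}
```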

@zanghao2 PTAL at the related PR; it would be great to have your opinion on the proposed changes

@fabriziopandini Thanks, your code logic is consistent with mine. I hope this PR can be merged.
