Cluster-api: Kubeadm Bootstrap Controller crashes when passed key material via File field

Created on 2 Apr 2020 · 15 Comments · Source: kubernetes-sigs/cluster-api

/kind bug

What steps did you take and what happened:

  1. I created an external etcd cluster and made a copy of the apiserver-etcd-client.{crt,key} files.
  2. I added the contents of these files to the kubeadmConfigSpec field of the KubeadmControlPlane object in a manifest for a workload cluster.
  3. I applied the workload cluster manifest to the management cluster. After seeing no activity for a few minutes, I checked the logs of the various controllers, and saw a crash in the logs for the Kubeadm bootstrap controller.

What did you expect to happen:
I expected the Kubeadm bootstrap controller to create the files with the specified content at the specified location.

Anything else you would like to add:
Here is the kubeadmConfigSpec field I used (note the private key material has been redacted):

spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: capi-etcd-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
          configure-cloud-routes: "false"
      etcd:
        external:
          endpoints:
            - https://ip-10-45-16-154.us-west-2.compute.internal:2379
            - https://ip-10-45-53-148.us-west-2.compute.internal:2379
            - https://ip-10-45-84-141.us-west-2.compute.internal:2379
          caFile: /etc/kubernetes/pki/etcd/ca.crt
          certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
          keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
    files:
      - path: /etc/kubernetes/pki/apiserver-etcd-client.crt
        content: |
          -----BEGIN CERTIFICATE-----
          MIIC+TCCAeGgAwIBAgIIR8jHKnG7IGcwDQYJKoZIhvcNAQELBQAwEjEQMA4GA1UE
          AxMHZXRjZC1jYTAeFw0yMDA0MDEyMzI0MjdaFw0yMTA0MDEyMzI1MDdaMD4xFzAV
          BgNVBAoTDnN5c3RlbTptYXN0ZXJzMSMwIQYDVQQDExprdWJlLWFwaXNlcnZlci1l
          dGNkLWNsaWVudDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANIe194P
          oqtyvXnM7a3cpTzzWVWL+EEBCvznBRNVgjQ4jMWYKR3Coq71x11zk/dox/6FGS0S
          wwaCBMIvnc7Mh5l9Tr9w4ZQdn5WXed0eqhNq/Eo7L0KZx0W7EtYH5t+ogQ8tVsIl
          4kU3zVAmeBFuSpXz3/JOq9Tbx9qW2NewbYBosiWrm+r+BEyAD9iwY0Melm7wJzMe
          T+vYKwRXhUbPxdFod6x9dHp/bHoxXaQLeVlwjHauvFnc1Q7B4AmoGjpGJ4KR85pM
          z+EAilatQ8yAt57e05yME7hOgO6MFA0CLkOQjmqhFzlLPknBz32oLc/cCftFKlnN
          5YfAutmbtBLYNAsCAwEAAaMnMCUwDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG
          CCsGAQUFBwMCMA0GCSqGSIb3DQEBCwUAA4IBAQCQ20D/Z0+1yMyoqweetcY9j7gw
          CVj5NI477I/g+TBqIUb47+0VxPKlimKGH9yzcNNU41EAOVX+tbAhORot4YOe5zp0
          VSkJyHt7npI+K+sAtkdUQuC9K1730jCM1XjReuu65vc6dKxdagFAFi0m3EzrjHwb
          /aI4rhL8upszNh6UtQlP9EAoJMSwC8VSNdc0nE3Ta/otQNd+8TJui3MsSa2gIRaA
          zL9Ztl/yH/Gj/2u4nQuXE1iCE/aWZEfguJwb7756GMaVuDSywdB0oY+HTZhbuJyE
          wWKFQ2NF+P3UXMaLlxjMrkttDxENrx47Fh9R1q/hXjr2KA9lE+2N2+HbqEYP
          -----END CERTIFICATE-----
      - path: /etc/kubernetes/pki/apiserver-etcd-client.key
        content: |
          <redacted>
  replicas: 3
  version: v1.17.3

Here is the output of kubectl logs against the bootstrap controller:

I0402 18:47:06.669455       1 listener.go:44] controller-runtime/metrics "msg"="metrics server is starting to listen"  "addr"="127.0.0.1:8080"
I0402 18:47:06.669770       1 main.go:132] setup "msg"="starting manager"  
I0402 18:47:06.670001       1 leaderelection.go:242] attempting to acquire leader lease  capi-kubeadm-bootstrap-system/kubeadm-bootstrap-manager-leader-election-capi...
I0402 18:47:06.670140       1 internal.go:356] controller-runtime/manager "msg"="starting metrics server"  "path"="/metrics"
I0402 18:47:24.066941       1 leaderelection.go:252] successfully acquired lease capi-kubeadm-bootstrap-system/kubeadm-bootstrap-manager-leader-election-capi
I0402 18:47:24.068022       1 controller.go:164] controller-runtime/controller "msg"="Starting EventSource"  "controller"="kubeadmconfig" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{}}}
I0402 18:47:24.168701       1 controller.go:164] controller-runtime/controller "msg"="Starting EventSource"  "controller"="kubeadmconfig" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"clusterName":"","bootstrap":{},"infrastructureRef":{}},"status":{"bootstrapReady":false,"infrastructureReady":false}}}
I0402 18:47:24.269613       1 controller.go:164] controller-runtime/controller "msg"="Starting EventSource"  "controller"="kubeadmconfig" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"controlPlaneEndpoint":{"host":"","port":0}},"status":{"infrastructureReady":false,"controlPlaneInitialized":false}}}
I0402 18:47:24.370272       1 controller.go:171] controller-runtime/controller "msg"="Starting Controller"  "controller"="kubeadmconfig"
I0402 18:47:24.370663       1 controller.go:190] controller-runtime/controller "msg"="Starting workers"  "controller"="kubeadmconfig" "worker count"=10
I0402 18:47:24.472482       1 kubeadmconfig_controller.go:267] controllers/KubeadmConfig "msg"="ConfigOwner is not a control plane Machine. If it should be a control plane, add the label `cluster.x-k8s.io/control-plane: \"\"` to the Machine" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"capi-etcd-md-0-dtmjt"} "name"="capi-etcd-md-0-7d49f97f5c-ph98v" "version"="6204997" 
I0402 18:47:24.472816       1 kubeadmconfig_controller.go:267] controllers/KubeadmConfig "msg"="ConfigOwner is not a control plane Machine. If it should be a control plane, add the label `cluster.x-k8s.io/control-plane: \"\"` to the Machine" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"capi-etcd-md-0-c9w2d"} "name"="capi-etcd-md-0-7d49f97f5c-bnqvb" "version"="6205014" 
I0402 18:47:24.474533       1 kubeadmconfig_controller.go:267] controllers/KubeadmConfig "msg"="ConfigOwner is not a control plane Machine. If it should be a control plane, add the label `cluster.x-k8s.io/control-plane: \"\"` to the Machine" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"capi-etcd-md-0-fcwxx"} "name"="capi-etcd-md-0-7d49f97f5c-9fpvk" "version"="6205016" 
I0402 18:47:24.579069       1 kubeadmconfig_controller.go:298] controllers/KubeadmConfig "msg"="Creating BootstrapData for the init control plane" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"capi-etcd-control-plane-2sqs7"} "name"="capi-etcd-control-plane-7jgjz" "version"="6204332" 
E0402 18:47:24.681536       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 267 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x14f8c80, 0x245b2f0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x82
panic(0x14f8c80, 0x245b2f0)
    /usr/local/go/src/runtime/panic.go:679 +0x1b2
sigs.k8s.io/cluster-api/util/secret.(*Certificate).AsFiles(...)
    /workspace/util/secret/certificates.go:318
sigs.k8s.io/cluster-api/util/secret.Certificates.AsFiles(0xc000467c50, 0x5, 0x6, 0x7, 0xc0005aaf30, 0xc000170e00)
    /workspace/util/secret/certificates.go:361 +0x52d
sigs.k8s.io/cluster-api/bootstrap/kubeadm/internal/cloudinit.NewInitControlPlane(0xc0001b4460, 0xc0001b4460, 0x6, 0x18d9280, 0xc0000ac060, 0x18f3480)
    /workspace/bootstrap/kubeadm/internal/cloudinit/controlplane_init.go:55 +0x7e
sigs.k8s.io/cluster-api/bootstrap/kubeadm/controllers.(*KubeadmConfigReconciler).handleClusterNotInitialized(0xc0000fd700, 0x18d9280, 0xc0000ac060, 0xc0006d9c38, 0xc000114200, 0x0, 0x0, 0x0)
    /workspace/bootstrap/kubeadm/controllers/kubeadmconfig_controller.go:355 +0x79f
sigs.k8s.io/cluster-api/bootstrap/kubeadm/controllers.(*KubeadmConfigReconciler).Reconcile(0xc0000fd700, 0xc00049d039, 0x7, 0xc00049f000, 0x1d, 0xc000406c00, 0x0, 0x0, 0x0)
    /workspace/bootstrap/kubeadm/controllers/kubeadmconfig_controller.go:240 +0x126a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001bc540, 0x1548d80, 0xc000529b00, 0xc000325d00)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001bc540, 0x0)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001bc540)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0003dfb50)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0003dfb50, 0x3b9aca00, 0x0, 0x45ed01, 0xc0004c23c0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0003dfb50, 0x3b9aca00, 0xc0004c23c0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x139466d]

goroutine 267 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0x105
panic(0x14f8c80, 0x245b2f0)
    /usr/local/go/src/runtime/panic.go:679 +0x1b2
sigs.k8s.io/cluster-api/util/secret.(*Certificate).AsFiles(...)
    /workspace/util/secret/certificates.go:318
sigs.k8s.io/cluster-api/util/secret.Certificates.AsFiles(0xc000467c50, 0x5, 0x6, 0x7, 0xc0005aaf30, 0xc000170e00)
    /workspace/util/secret/certificates.go:361 +0x52d
sigs.k8s.io/cluster-api/bootstrap/kubeadm/internal/cloudinit.NewInitControlPlane(0xc0001b4460, 0xc0001b4460, 0x6, 0x18d9280, 0xc0000ac060, 0x18f3480)
    /workspace/bootstrap/kubeadm/internal/cloudinit/controlplane_init.go:55 +0x7e
sigs.k8s.io/cluster-api/bootstrap/kubeadm/controllers.(*KubeadmConfigReconciler).handleClusterNotInitialized(0xc0000fd700, 0x18d9280, 0xc0000ac060, 0xc0006d9c38, 0xc000114200, 0x0, 0x0, 0x0)
    /workspace/bootstrap/kubeadm/controllers/kubeadmconfig_controller.go:355 +0x79f
sigs.k8s.io/cluster-api/bootstrap/kubeadm/controllers.(*KubeadmConfigReconciler).Reconcile(0xc0000fd700, 0xc00049d039, 0x7, 0xc00049f000, 0x1d, 0xc000406c00, 0x0, 0x0, 0x0)
    /workspace/bootstrap/kubeadm/controllers/kubeadmconfig_controller.go:240 +0x126a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001bc540, 0x1548d80, 0xc000529b00, 0xc000325d00)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001bc540, 0x0)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001bc540)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0003dfb50)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0003dfb50, 0x3b9aca00, 0x0, 0x45ed01, 0xc0004c23c0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0003dfb50, 0x3b9aca00, 0xc0004c23c0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328

Environment:

  • Cluster-api-provider-aws version: 0.5.0
  • Kubernetes version (kubectl version): 1.17.4 (client), 1.16.4 (server)
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.4 (using CAPA AMI)

All 15 comments

/kind documentation
/priority important-soon
/milestone v0.5.x

I just tested this again using CAPA 0.5.2 and CAPI 0.3.3 (the latest releases of both). I also modified the files section to write out the API server etcd client certificate, the API server etcd client key, and the etcd CA certificate (in addition to having the <clustername>-etcd Secret defined in the management cluster). The error still persists (I didn't compare the logs from this latest run against the logs above, but they looked similar).

Let me know if there are additional tests I should/could run, or if there is additional information it would be helpful for me to gather.

Moving to cluster-api repo, since this is an error in the KubeadmConfig controller.

/milestone v0.3.x

What's the expected behavior here? This feature doesn't seem very well defined. According to https://github.com/kubernetes-sigs/cluster-api/blob/master/bootstrap/kubeadm/docs/external-etcd.md we only support creating secrets with specific names. Do we just want to fix the NPE and return an error?

In the general case where the keypair is pre-defined in the config, we don't actually need to look up or create the secret at all; we should skip that step, but only for the certs/keys that are injected via config.

In the more specific case of an external etcd cluster, we should likely also skip creating the etcd CA keypair and its associated secrets, and should return an error if the etcd client cert/key are not defined in the provided config, since we have no way to generate keypairs for external etcd clusters.
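A hypothetical sketch (not the actual cluster-api code) of the kind of nil guard that would turn the panic in `Certificate.AsFiles` into a returned error, as discussed above. The types and field names here are simplified stand-ins, not the real API:

```go
package main

import (
	"errors"
	"fmt"
)

// KeyPair is an illustrative stand-in for a cert/key pair loaded
// from a Secret; it is nil when the Secret was never found or created.
type KeyPair struct {
	Cert []byte
	Key  []byte
}

// Certificate is an illustrative stand-in for the cluster-api type.
type Certificate struct {
	Purpose string
	KeyPair *KeyPair
}

// AsFiles returns an error instead of dereferencing a nil KeyPair,
// which is the class of fix proposed in this thread.
func (c *Certificate) AsFiles() ([]string, error) {
	if c.KeyPair == nil {
		return nil, errors.New("certificate " + c.Purpose +
			": key pair not found; expected a pre-created Secret")
	}
	return []string{string(c.KeyPair.Cert), string(c.KeyPair.Key)}, nil
}

func main() {
	// Mimics the reported scenario: the etcd client secret is absent.
	c := &Certificate{Purpose: "apiserver-etcd-client"}
	if _, err := c.AsFiles(); err != nil {
		fmt.Println("error:", err) // surfaced to the user instead of a panic
	}
}
```

The caller would then propagate this error up through the reconciler, so the controller records a failure condition rather than crashing.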

Not sure I understand, maybe I can explain my understanding better. The workflow as I understand it from the external etcd doc is

  • create secrets named my-cluster-apiserver-etcd-client, my-cluster-etcd
  • specify the filepaths under clusterConfiguration.etcd.external.caFile, certFile, and keyFile.
  • KubeadmConfig controller will read the secrets and inject them into bootstrap data to be written to the specified filepaths
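Concretely, that documented flow expects Secrets shaped roughly like the sketch below. The names assume a cluster called my-cluster in the default namespace; the kubernetes.io/tls type and tls.crt/tls.key data keys are my assumption of the convention, so check the external etcd doc for the exact keys:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-apiserver-etcd-client
  namespace: default
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded apiserver-etcd-client.crt>
  tls.key: <base64-encoded apiserver-etcd-client.key>
---
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-etcd
  namespace: default
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded etcd ca.crt>
  tls.key: <base64-encoded etcd ca.key>
```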

When you say the "keypair is pre-defined/injected in the config", do you mean that it is provided in the files section?

Let me test this again; I may have missed some things (thanks for that link, @benmoss!). I'll work on this again today or tomorrow and report back. It looks as if all the certificates should be provided via ~ConfigMaps~ Secrets on the management cluster, not via the files directive. Does that sound correct?

Yes, the way it was built, it seems the intention was that users would put them in Secrets (not ConfigMaps) and the controller would handle mounting those as files. It's probably possible for us to support users supplying them as files as well, but that's not supported now.

@benmoss Yes, sorry, I should have said Secrets (my mistake, thank you for correcting me).

OK, I was able to make this work using Secrets instead of injecting the certificate data into the configuration. I did find some errors in the external etcd doc, for which I'll open an issue (and I'm happy to work on a PR). That said, I think this issue is still valid, but feel free to correct me.

Opened #2941 for the errors/omissions in the external etcd document.

Yup, we certainly shouldn't crash. I was hoping to clarify whether we want to support this use case or surface an error to the user that the required secrets couldn't be found. I'm still not sure; @vincepri, did you have any opinion on this?

We definitely shouldn't crash or panic. I haven't had time to dig into the code details yet, though; can you do an investigation and come up with a plan of action / PR?

/assign
