What steps did you take and what happened:
[A clear and concise description of how to REPRODUCE the bug.]
The target cluster is ready:
kubectl --kubeconfig=./capi-quickstart.kubeconfig get nodes
NAME                                    STATUS   ROLES    AGE     VERSION
capi-quickstart-2-control-plane-mnldn   Ready    master   12m     v1.17.3
capi-quickstart-2-control-plane-vjnnr   Ready    master   8m50s   v1.17.3
capi-quickstart-2-control-plane-vqc22   Ready    master   10m     v1.17.3
capi-quickstart-2-md-0-99ms5            Ready    <none>   10m     v1.17.3
capi-quickstart-2-md-0-99nzj            Ready    <none>   9m55s   v1.17.3
capi-quickstart-2-md-0-rt4wr            Ready    <none>   9m54s   v1.17.3
Follow the instructions at https://cluster-api.sigs.k8s.io/clusterctl/commands/move.html#pivot
clusterctl --kubeconfig=./capi-quickstart.kubeconfig init
Fetching providers
Installing Provider="cluster-api" Version="v0.3.1" TargetNamespace="capi-system"
Error: action failed after 3 attempts: failed to create provider object cert-manager.io/v1alpha2, Kind=Certificate, capi-webhook-system/capi-serving-cert: Internal error occurred: failed calling webhook "webhook.cert-manager.io": the server is currently unable to handle the request
What did you expect to happen:
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):

/kind bug
What CAPI version are you using?
I see it in the logs now, could you try with master?
clusterctl master should have a fix that retries cert-manager
/cc @fabriziopandini
Same issue with master / v0.3.2:
$ clusterctl --kubeconfig=./capi-quickstart.kubeconfig init --v 5
Fetching File="control-plane-components.yaml" Provider="control-plane-kubeadm" Version="v0.3.2"
Fetching File="metadata.yaml" Provider="cluster-api" Version="v0.3.2"
Fetching File="metadata.yaml" Provider="bootstrap-kubeadm" Version="v0.3.2"
Fetching File="metadata.yaml" Provider="control-plane-kubeadm" Version="v0.3.2"
Installing Provider="cluster-api" Version="v0.3.2" TargetNamespace="capi-system"
Creating shared objects Provider="cluster-api" Version="v0.3.2"
Creating Namespace="capi-webhook-system"
Creating CustomResourceDefinition="clusters.cluster.x-k8s.io"
Creating CustomResourceDefinition="machinedeployments.cluster.x-k8s.io"
Creating CustomResourceDefinition="machinehealthchecks.cluster.x-k8s.io"
Creating CustomResourceDefinition="machinepools.exp.cluster.x-k8s.io"
Creating CustomResourceDefinition="machines.cluster.x-k8s.io"
Creating CustomResourceDefinition="machinesets.cluster.x-k8s.io"
Creating MutatingWebhookConfiguration="capi-mutating-webhook-configuration"
Creating Service="capi-webhook-service" Namespace="capi-webhook-system"
Creating Deployment="capi-controller-manager" Namespace="capi-webhook-system"
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Operation failed, retry Error={}
Creating Certificate="capi-serving-cert" Namespace="capi-webhook-system"
Error: action failed after 10 attempts: failed to create provider object cert-manager.io/v1alpha2, Kind=Certificate, capi-webhook-system/capi-serving-cert: Internal error occurred: failed calling webhook "webhook.cert-manager.io": the server is currently unable to handle the request
@CecileRobertMichon
webhook "webhook.cert-manager.io": the server is currently unable to handle the request
I saw this error in two cases:
Is it possible we are in one of those conditions?
Should we add a check to make sure a CNI has been installed? Maybe we could check whether the nodes are in a Ready state.
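A pre-flight check along those lines could be sketched as below. This is hypothetical (clusterctl does not currently ship such a check); it simply waits for every node to report Ready, which in practice implies a CNI is installed and working:

```shell
# Hypothetical pre-flight check before `clusterctl init` on a target cluster:
# block until all nodes report the Ready condition, or fail after 5 minutes.
kubectl --kubeconfig=./capi-quickstart.kubeconfig \
  wait --for=condition=Ready nodes --all --timeout=300s
```

`kubectl wait` exits non-zero on timeout, so this could gate the rest of an init script.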
I installed the CNI (Calico) before attempting the move and all nodes were ready (see output of get nodes in the issue description).
I doubt that the machines being under high pressure would be an issue, I'm using Azure VMs size Standard_D2s_v3 for both control planes and machines (with 3 replicas for both).
Here's the output of get pods that shows Calico running:
kubectl --kubeconfig=./capi-quickstart.kubeconfig get pods --all-namespaces
NAMESPACE             NAME                                                          READY   STATUS              RESTARTS   AGE
capi-webhook-system   capi-controller-manager-68b5666f57-lsjdg                      0/2     ContainerCreating   0          3m41s
cert-manager          cert-manager-69b4f77ffc-b62st                                 1/1     Running             0          5m53s
cert-manager          cert-manager-cainjector-576978ffc8-hjm5m                      1/1     Running             0          5m53s
cert-manager          cert-manager-webhook-c67fbc858-nrlq5                          1/1     Running             0          5m53s
kube-system           calico-kube-controllers-77c4b7448-77sww                       1/1     Running             0          8m24s
kube-system           calico-node-2rsgg                                             1/1     Running             0          8m24s
kube-system           calico-node-6l6bj                                             1/1     Running             0          8m24s
kube-system           calico-node-llf65                                             1/1     Running             0          8m24s
kube-system           calico-node-mxc8q                                             1/1     Running             0          8m24s
kube-system           calico-node-w8vsq                                             1/1     Running             0          8m24s
kube-system           calico-node-wvtl8                                             1/1     Running             0          8m24s
kube-system           coredns-6955765f44-5w9ck                                      1/1     Running             0          12m
kube-system           coredns-6955765f44-bhmxw                                      1/1     Running             0          12m
kube-system           etcd-capi-quickstart-control-plane-7kjgb                      1/1     Running             0          10m
kube-system           etcd-capi-quickstart-control-plane-d5gtw                      1/1     Running             0          12m
kube-system           etcd-capi-quickstart-control-plane-jthdv                      1/1     Running             0          11m
kube-system           kube-apiserver-capi-quickstart-control-plane-7kjgb            1/1     Running             0          10m
kube-system           kube-apiserver-capi-quickstart-control-plane-d5gtw            1/1     Running             0          12m
kube-system           kube-apiserver-capi-quickstart-control-plane-jthdv            1/1     Running             0          11m
kube-system           kube-controller-manager-capi-quickstart-control-plane-7kjgb   1/1     Running             0          10m
kube-system           kube-controller-manager-capi-quickstart-control-plane-d5gtw   1/1     Running             1          12m
kube-system           kube-controller-manager-capi-quickstart-control-plane-jthdv   1/1     Running             1          11m
kube-system           kube-proxy-4qn8d                                              1/1     Running             0          11m
kube-system           kube-proxy-88vcv                                              1/1     Running             0          11m
kube-system           kube-proxy-cfsg7                                              1/1     Running             0          12m
kube-system           kube-proxy-gkhnl                                              1/1     Running             0          10m
kube-system           kube-proxy-nfnjn                                              1/1     Running             0          10m
kube-system           kube-proxy-x69sf                                              1/1     Running             0          10m
kube-system           kube-scheduler-capi-quickstart-control-plane-7kjgb            1/1     Running             1          10m
kube-system           kube-scheduler-capi-quickstart-control-plane-d5gtw            1/1     Running             1          12m
kube-system           kube-scheduler-capi-quickstart-control-plane-jthdv            1/1     Running             1          11m
So I tried to repro with capz v0.4.0 and capi v0.3.3, and I'm not getting the same error anymore; this time I get past it. However, init now consistently gets stuck at "Waiting for cert-manager to be available...", even though cert-manager seems to be ready:
kubectl --kubeconfig=./capi-quickstart.kubeconfig get nodes
NAME                         STATUS   ROLES    AGE   VERSION
capi-2-control-plane-5nws7   Ready    master   16m   v1.17.3
capi-2-control-plane-g7k5k   Ready    master   15m   v1.17.3
capi-2-control-plane-ps8nw   Ready    master   18m   v1.17.3
capi-2-md-0-ngw2x            Ready    <none>   17m   v1.17.3
capi-2-md-0-nwzh4            Ready    <none>   16m   v1.17.3
capi-2-md-0-qsb42            Ready    <none>   17m   v1.17.3
clusterctl --kubeconfig=./capi-quickstart.kubeconfig init --v 5
Installing the clusterctl inventory CRD
Creating CustomResourceDefinition="providers.clusterctl.cluster.x-k8s.io"
Fetching providers
Fetching File="core-components.yaml" Provider="cluster-api" Version="v0.3.3"
Fetching File="bootstrap-components.yaml" Provider="bootstrap-kubeadm" Version="v0.3.3"
Fetching File="control-plane-components.yaml" Provider="control-plane-kubeadm" Version="v0.3.3"
Fetching File="metadata.yaml" Provider="cluster-api" Version="v0.3.3"
Fetching File="metadata.yaml" Provider="bootstrap-kubeadm" Version="v0.3.3"
Fetching File="metadata.yaml" Provider="control-plane-kubeadm" Version="v0.3.3"
Installing cert-manager
Creating Namespace="cert-manager"
Creating CustomResourceDefinition="challenges.acme.cert-manager.io"
Creating CustomResourceDefinition="orders.acme.cert-manager.io"
Creating CustomResourceDefinition="certificaterequests.cert-manager.io"
Creating CustomResourceDefinition="certificates.cert-manager.io"
Creating CustomResourceDefinition="clusterissuers.cert-manager.io"
Creating CustomResourceDefinition="issuers.cert-manager.io"
Creating ServiceAccount="cert-manager-cainjector" Namespace="cert-manager"
Creating ServiceAccount="cert-manager" Namespace="cert-manager"
Creating ServiceAccount="cert-manager-webhook" Namespace="cert-manager"
Creating ClusterRole="cert-manager-cainjector"
Creating ClusterRoleBinding="cert-manager-cainjector"
Creating Role="cert-manager-cainjector:leaderelection" Namespace="kube-system"
Creating RoleBinding="cert-manager-cainjector:leaderelection" Namespace="kube-system"
Creating ClusterRoleBinding="cert-manager-webhook:auth-delegator"
Creating RoleBinding="cert-manager-webhook:webhook-authentication-reader" Namespace="kube-system"
Creating ClusterRole="cert-manager-webhook:webhook-requester"
Creating Role="cert-manager:leaderelection" Namespace="kube-system"
Creating RoleBinding="cert-manager:leaderelection" Namespace="kube-system"
Creating ClusterRole="cert-manager-controller-issuers"
Creating ClusterRole="cert-manager-controller-clusterissuers"
Creating ClusterRole="cert-manager-controller-certificates"
Creating ClusterRole="cert-manager-controller-orders"
Creating ClusterRole="cert-manager-controller-challenges"
Creating ClusterRole="cert-manager-controller-ingress-shim"
Creating ClusterRoleBinding="cert-manager-leaderelection"
Creating ClusterRoleBinding="cert-manager-controller-issuers"
Creating ClusterRoleBinding="cert-manager-controller-clusterissuers"
Creating ClusterRoleBinding="cert-manager-controller-certificates"
Creating ClusterRoleBinding="cert-manager-controller-orders"
Creating ClusterRoleBinding="cert-manager-controller-challenges"
Creating ClusterRoleBinding="cert-manager-controller-ingress-shim"
Creating ClusterRole="cert-manager-view"
Creating ClusterRole="cert-manager-edit"
Creating Service="cert-manager" Namespace="cert-manager"
Creating Service="cert-manager-webhook" Namespace="cert-manager"
Creating Deployment="cert-manager-cainjector" Namespace="cert-manager"
Creating Deployment="cert-manager" Namespace="cert-manager"
Creating Deployment="cert-manager-webhook" Namespace="cert-manager"
Creating APIService="v1beta1.webhook.cert-manager.io"
Creating MutatingWebhookConfiguration="cert-manager-webhook"
Creating ValidatingWebhookConfiguration="cert-manager-webhook"
Waiting for cert-manager to be available...
kubectl --kubeconfig=./capi-quickstart.kubeconfig get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS    RESTARTS   AGE
cert-manager   cert-manager-69b4f77ffc-psxzj                        1/1     Running   0          3m27s
cert-manager   cert-manager-cainjector-576978ffc8-w2nr5             1/1     Running   0          3m27s
cert-manager   cert-manager-webhook-c67fbc858-c4v5b                 1/1     Running   1          3m26s
kube-system    calico-kube-controllers-576dfc659c-gftt4             1/1     Running   1          5m20s
kube-system    calico-node-4zbv6                                    1/1     Running   1          5m22s
kube-system    calico-node-bhpnf                                    1/1     Running   0          5m22s
kube-system    calico-node-c49b5                                    1/1     Running   1          5m22s
kube-system    calico-node-d8wwd                                    1/1     Running   0          5m22s
kube-system    calico-node-knvmh                                    1/1     Running   0          5m22s
kube-system    calico-node-mcz8b                                    1/1     Running   1          5m22s
kube-system    coredns-6955765f44-w2zhr                             1/1     Running   0          17m
kube-system    coredns-6955765f44-xgznx                             1/1     Running   0          17m
kube-system    etcd-capi-2-control-plane-5nws7                      1/1     Running   0          16m
kube-system    etcd-capi-2-control-plane-g7k5k                      1/1     Running   0          14m
kube-system    etcd-capi-2-control-plane-ps8nw                      1/1     Running   0          17m
kube-system    kube-apiserver-capi-2-control-plane-5nws7            1/1     Running   0          16m
kube-system    kube-apiserver-capi-2-control-plane-g7k5k            1/1     Running   0          14m
kube-system    kube-apiserver-capi-2-control-plane-ps8nw            1/1     Running   0          17m
kube-system    kube-controller-manager-capi-2-control-plane-5nws7   1/1     Running   1          16m
kube-system    kube-controller-manager-capi-2-control-plane-g7k5k   1/1     Running   0          14m
kube-system    kube-controller-manager-capi-2-control-plane-ps8nw   1/1     Running   1          17m
kube-system    kube-proxy-4ptdx                                     1/1     Running   0          16m
kube-system    kube-proxy-ctbh8                                     1/1     Running   0          17m
kube-system    kube-proxy-jd7f5                                     1/1     Running   0          16m
kube-system    kube-proxy-jv8mg                                     1/1     Running   0          16m
kube-system    kube-proxy-p72tw                                     1/1     Running   0          14m
kube-system    kube-proxy-v45gf                                     1/1     Running   0          16m
kube-system    kube-scheduler-capi-2-control-plane-5nws7            1/1     Running   1          16m
kube-system    kube-scheduler-capi-2-control-plane-g7k5k            1/1     Running   1          14m
kube-system    kube-scheduler-capi-2-control-plane-ps8nw            1/1     Running   1          17m
kubectl --kubeconfig=./capi-quickstart.kubeconfig get deploy -n cert-manager
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
cert-manager              1/1     1            1           7m24s
cert-manager-cainjector   1/1     1            1           7m24s
cert-manager-webhook      1/1     1            1           7m23s
After 10 minutes, init fails with Error: timed out waiting for the condition.
EDIT: I see this when I describe the apiservice:
Message: failing or missing response from https://10.100.0.172:443/apis/webhook.cert-manager.io/v1beta1: Get https://10.100.0.172:443/apis/webhook.cert-manager.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
@CecileRobertMichon "Waiting for cert-manager to be available..." waits for the APIService v1beta1.webhook.cert-manager.io to report a status condition of type Available with status True.
this can be checked with
kubectl get apiservice v1beta1.webhook.cert-manager.io -o json | jq '.status.conditions[] | select(.type == "Available") | .status'
or with
kubectl wait --for=condition=Available apiservice/v1beta1.webhook.cert-manager.io
Looking at the apiservice/v1beta1.webhook.cert-manager.io spec, this API service depends on the following service:
service:
name: cert-manager-webhook
namespace: cert-manager
port: 443
From my observations, the APIService condition is set as soon as this service is backed by one pod (the cert-manager-webhook pod).
Could you kindly check the v1beta1.webhook.cert-manager.io API service and the cert-manager-webhook service on your cluster?
Also, is it possible that this sequence does not complete within 10 minutes (the current timeout) on your cluster?
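Concretely, both checks can be done with standard kubectl commands (the resource names come from the spec above); the first surfaces the reason the APIService is not Available, the second confirms the webhook Service actually has a ready pod behind it:

```shell
# Why is the APIService not Available? Print the Available condition's message.
kubectl get apiservice v1beta1.webhook.cert-manager.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'

# Does the webhook Service have at least one ready endpoint (pod) behind it?
kubectl get endpoints cert-manager-webhook -n cert-manager
```

An empty ENDPOINTS column in the second command would mean the webhook pod is not passing its readiness checks.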
@fabriziopandini see the very last line in my previous comment. I did check the API service and it was indeed not available. I suspect it has something to do with the default capz security group not allowing port 443 traffic. I'll give changing the NSG a try today.
Changing the NSG rules did not help, unfortunately; it's still failing with the same error. I took a look with @vincepri this morning and we couldn't figure out what was going on. Cluster networking seems generally healthy, and I didn't have any issues creating a simple nginx service.
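One way to narrow a failure like this down is to probe the webhook service directly from a throwaway pod inside the cluster, bypassing the API server's aggregation layer. This is a debugging sketch, not a step from the thread; the URL path mirrors the one in the APIService error message above:

```shell
# Probe the cert-manager webhook Service from inside the cluster.
# -k skips TLS verification (the webhook uses a self-signed cert);
# -m 5 bounds the wait so a network black hole fails fast instead of hanging.
kubectl run curl-probe --rm -it --restart=Never --image=curlimages/curl -- \
  curl -k -m 5 https://cert-manager-webhook.cert-manager.svc:443/apis/webhook.cert-manager.io/v1beta1
```

If this probe succeeds while the APIService stays unavailable, the problem is specifically on the API server to pod network path rather than in the webhook itself.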
/area clusterctl
@CecileRobertMichon ok to close this issue now that we've identified that the problem is with the Calico configuration?
Yes
/close
@CecileRobertMichon: Closing this issue.
In response to this:
Yes
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.