I followed the instructions in https://cluster-api.sigs.k8s.io/user/quick-start.html with the Docker infrastructure provider, from behind an HTTP proxy.
As expected, the workload cluster's control plane is Initialized but not yet Ready:
kubectl get kubeadmcontrolplane --all-namespaces
NAMESPACE NAME INITIALIZED API SERVER AVAILABLE VERSION REPLICAS READY UPDATED UNAVAILABLE
default capi-quickstart-control-plane true v1.19.1 1 1 1
But when I try to install the Calico CNI, the calico-node pods do not start; they are stuck pulling their images:
kubectl --kubeconfig=./capi-quickstart.kubeconfig -n kube-system get pods -l k8s-app=calico-node
NAME READY STATUS RESTARTS AGE
calico-node-4sbzg 0/1 Init:ImagePullBackOff 0 2m7s
calico-node-64qvp 0/1 Init:ImagePullBackOff 0 2m7s
calico-node-t24q4 0/1 Init:ImagePullBackOff 0 2m7s
calico-node-t84cx 0/1 Init:ImagePullBackOff 0 2m7s
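For reference, I installed Calico with the command from the quick start; the manifest URL below is indicative and may not be the exact one in the guide:
kubectl --kubeconfig=./capi-quickstart.kubeconfig apply -f https://docs.projectcalico.org/v3.15/manifests/calico.yaml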
For each of these pods, I see errors when trying to pull the calico/cni:v3.15.3 Docker image:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m26s default-scheduler Successfully assigned kube-system/calico-node-64qvp to capi-quickstart-md-0-55fc4f8ccf-ncmkv
Normal Pulling 115s (x4 over 3m26s) kubelet, capi-quickstart-md-0-55fc4f8ccf-ncmkv Pulling image "calico/cni:v3.15.3"
Warning Failed 115s (x4 over 3m26s) kubelet, capi-quickstart-md-0-55fc4f8ccf-ncmkv Failed to pull image "calico/cni:v3.15.3": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/calico/cni:v3.15.3": failed to resolve reference "docker.io/calico/cni:v3.15.3": failed to do request: Head https://registry-1.docker.io/v2/calico/cni/manifests/v3.15.3: dial tcp: lookup registry-1.docker.io on 10.171.108.2:53: no such host
Warning Failed 115s (x4 over 3m26s) kubelet, capi-quickstart-md-0-55fc4f8ccf-ncmkv Error: ErrImagePull
Normal BackOff 103s (x6 over 3m26s) kubelet, capi-quickstart-md-0-55fc4f8ccf-ncmkv Back-off pulling image "calico/cni:v3.15.3"
Warning Failed 92s (x7 over 3m26s) kubelet, capi-quickstart-md-0-55fc4f8ccf-ncmkv Error: ImagePullBackOff
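The failing step is a plain DNS lookup of registry-1.docker.io, which in this environment only works through the proxy. A quick way to confirm that a node has no proxy configuration is to check its environment from the host (the container name is taken from the events above); if no proxy variables are set, this prints nothing, which is consistent with the docker inspect output further down:
docker exec capi-quickstart-md-0-55fc4f8ccf-ncmkv env | grep -i proxy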
What did you expect to happen:
I expected to be able to use the workload cluster.
Environment:
Cluster-api version:
clusterctl version: &version.Info{Major:"0", Minor:"3", GitVersion:"v0.3.12", GitCommit:"9e1dd7e8e428e05bee406602952ae269d55bdbba", GitTreeState:"clean", BuildDate:"2020-12-15T16:42:14Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Minikube/KIND version:
kind v0.9.0 go1.15.2 linux/amd64
Kubernetes version: (use kubectl version):
1.19.1
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
Workaround
From the host, inspecting the Docker containers, I saw that the HTTP proxy environment variables are not set for the workload cluster nodes:
docker inspect capi-quickstart-control-plane-d2mvr --format '{{json .Config.Env }}' | jq .
[
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"container=docker"
]
As a workaround, I rebuilt the kindest/node:v1.19.1 image, simply adding the proxy environment variables to it:
cat > Dockerfile << EOF
FROM kindest/node:v1.19.1
ENV HTTP_PROXY ${http_proxy}
ENV HTTPS_PROXY ${https_proxy}
ENV NO_PROXY ${NO_PROXY}
EOF
docker build -t kindest/node:v1.19.1-custom .
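To double-check that the variables were actually baked into the image, the same kind of inspection used above can be pointed at the custom image:
docker image inspect kindest/node:v1.19.1-custom --format '{{json .Config.Env }}' | jq .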
I then used this custom image for the workload cluster nodes:
sed -i "s| spec: {}| spec:\n customImage: kindest/node:v1.19.1-custom|" capi-quickstart.yaml
sed -i "s| extraMounts:| customImage: kindest/node:v1.19.1-custom\n extraMounts:|" capi-quickstart.yaml
With this workaround, the CNI is correctly deployed in the workload cluster:
kubectl --kubeconfig=./capi-quickstart.kubeconfig -n kube-system get pods -l k8s-app=calico-node
NAME READY STATUS RESTARTS AGE
calico-node-2d6k9 1/1 Running 0 102s
calico-node-bpvdc 1/1 Running 0 102s
calico-node-w25mc 1/1 Running 0 102s
calico-node-zc468 1/1 Running 0 102s
And the workload cluster becomes Ready:
kubectl get kubeadmcontrolplane --all-namespaces
NAMESPACE NAME INITIALIZED API SERVER AVAILABLE VERSION REPLICAS READY UPDATED UNAVAILABLE
default capi-quickstart-control-plane true true v1.19.1 1 1 1
Preferred solution
It would be nice if there was a way to automatically propagate the proxy environment variables to the workload cluster nodes.
/kind bug
/milestone v0.4.0
/area provider/docker
/priority backlog
/help
This could be addressed via documentation, or with an implementation in the Docker provider that makes it pick up the proxy env variables from the host, similarly to what kind does: https://kind.sigs.k8s.io/docs/user/quick-start/#configure-kind-to-use-a-proxy
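For context, kind reads the standard proxy variables from the environment in which the cluster is created and injects them into the node containers, roughly like this (the proxy address is a placeholder):
export HTTP_PROXY=http://proxy.example.com:3128
export HTTPS_PROXY=http://proxy.example.com:3128
export NO_PROXY=localhost,127.0.0.1
kind create cluster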
@fabriziopandini:
This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/milestone v0.4.0
/area provider/docker
/priority backlog
/help
This could be addressed via documentation, or with an implementation in the Docker provider that makes it pick up the proxy env variables from the host, similarly to what kind does: https://kind.sigs.k8s.io/docs/user/quick-start/#configure-kind-to-use-a-proxy
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@tcordeu you are right, I was not aware that this is already implemented in CAPD.
So, @bgoareguer, we should go back and investigate why the env variables were not passed to the capi-quickstart-control-plane-d2mvr container.
In fact, it seems that the management cluster node does have the proxy env variables:
docker inspect kind-control-plane --format '{{json .Config.Env }}' | jq . | cut -d "=" -f 1
[
"https_proxy
"NO_PROXY
"no_proxy
"HTTP_PROXY
"http_proxy
"HTTPS_PROXY
"PATH
"container
]
but the capd-controller-manager deployment does not:
kubectl -n capd-system describe deployment.apps/capd-controller-manager
Name: capd-controller-manager
Namespace: capd-system
CreationTimestamp: Mon, 04 Jan 2021 14:25:57 +0100
Labels: cluster.x-k8s.io/provider=infrastructure-docker
clusterctl.cluster.x-k8s.io=
control-plane=controller-manager
Annotations: deployment.kubernetes.io/revision: 1
Selector: cluster.x-k8s.io/provider=infrastructure-docker,control-plane=controller-manager
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: cluster.x-k8s.io/provider=infrastructure-docker
control-plane=controller-manager
Containers:
kube-rbac-proxy:
Image: gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0
Port: 8443/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:8443
--upstream=http://127.0.0.1:8080/
--logtostderr=true
--v=10
Environment: <none>
Mounts: <none>
manager:
Image: gcr.io/k8s-staging-cluster-api/capd-manager:v0.3.12
Ports: 9443/TCP, 9440/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--feature-gates=MachinePool=false
--metrics-addr=0
-v=4
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/docker.sock from dockersock (rw)
Volumes:
cert:
Type: Secret (a volume populated by a Secret)
SecretName: capd-webhook-service-cert
Optional: false
dockersock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: capd-controller-manager-557796f4dd (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 78s deployment-controller Scaled up replica set capd-controller-manager-557796f4dd to 1
So, to me, the problem is that the CAPD manifest infrastructure-components-development.yaml does not let us specify environment variables for the capd-manager container.
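If CAPD, like kind, propagates these variables from its own process environment to the node containers it creates, then a manual workaround (untested here) could be to inject them into the deployment directly, e.g.:
kubectl -n capd-system set env deployment/capd-controller-manager -c manager HTTP_PROXY=$http_proxy HTTPS_PROXY=$https_proxy NO_PROXY=$no_proxy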