I am following the documentation on https://docs.projectcalico.org/v3.8/getting-started/kubernetes/ and pods are pending forever.
I expected all pods to reach Running in watch kubectl get pods --all-namespaces.
Instead, after more than 15 minutes of watch kubectl get pods --all-namespaces, several pods are still pending:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-59f54d6bbc-k8dhf 0/1 Pending 0 17m
kube-system calico-node-r7v2n 0/1 Init:0/3 0 17m
kube-system coredns-5c98db65d4-5jrl2 0/1 Pending 0 17m
kube-system coredns-5c98db65d4-d2rc4 0/1 Pending 0 17m
kube-system etcd-cherokee 1/1 Running 0 16m
kube-system kube-apiserver-cherokee 1/1 Running 0 16m
kube-system kube-controller-manager-cherokee 1/1 Running 0 16m
kube-system kube-proxy-hfp6c 1/1 Running 0 17m
kube-system kube-scheduler-cherokee 1/1 Running 0 16m
I don't know why, but version 3.5 just works:
curl \
https://docs.projectcalico.org/v3.5/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml \
-O
kubectl apply -f calico.yaml
I've tried 3.6, 3.7 and 3.8, with the same results.
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
watch kubectl get pods --all-namespaces
I can't create a cluster with newer versions of Calico.
I have the same problem.
Could it be a resource request? Maybe your node isn't big enough?
What does kubectl describe pod say for each of the non-running pods?
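To collect that output in one go, something like the following might help. It is only a sketch: the function name describe_pending_pods is made up for illustration, and it assumes the stuck pods report the Pending phase (pods stuck in Init also count as Pending).

```shell
# Describe every pod in a namespace whose phase is Pending.
# describe_pending_pods is a hypothetical helper, not a kubectl feature.
describe_pending_pods() {
  namespace="${1:-kube-system}"
  # -o name prints one "pod/<name>" per line, which describe accepts as-is.
  kubectl get pods --namespace="$namespace" \
    --field-selector=status.phase=Pending -o name |
  while read -r pod; do
    kubectl describe --namespace="$namespace" "$pod"
  done
}
```

Calling describe_pending_pods with no argument looks at kube-system.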
@fasaxc I don't think it is a resource problem, since it is a dedicated machine with 4 cores and 16 GB of RAM and nothing else is running on it.
@fasaxc Here is the kubectl describe output for each non-running pod:
kubectl describe pod calico-kube-controllers-59f54d6bbc-gbj95 --namespace=kube-system
Name: calico-kube-controllers-59f54d6bbc-gbj95
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=calico-kube-controllers
pod-template-hash=59f54d6bbc
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP:
Controlled By: ReplicaSet/calico-kube-controllers-59f54d6bbc
Containers:
calico-kube-controllers:
Image: calico/kube-controllers:v3.8.0
Port: <none>
Host Port: <none>
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ENABLED_CONTROLLERS: node
DATASTORE_TYPE: kubernetes
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-778vt (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
calico-kube-controllers-token-778vt:
Type: Secret (a volume populated by a Secret)
SecretName: calico-kube-controllers-token-778vt
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 41s (x7 over 6m35s) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
---
kubectl describe pod calico-node-flgbf --namespace=kube-system
Name: calico-node-flgbf
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: cherokee/150.164.7.70
Start Time: Wed, 03 Jul 2019 06:57:19 -0300
Labels: controller-revision-hash=844ddd97c6
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP: 150.164.7.70
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://8b0acccf0d1f633b1af29d8cfe2f5b45a53b074e16da4d74b0eca79f4df2ecc6
Image: calico/cni:v3.8.0
Image ID: docker-pullable://calico/cni@sha256:decba0501ab0658e6e7da2f5625f1eabb8aba5690f9206caba3bf98caca5094c
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Running
Started: Wed, 03 Jul 2019 06:57:23 -0300
Ready: False
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-n5wk2 (ro)
install-cni:
Container ID:
Image: calico/cni:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-n5wk2 (ro)
flexvol-driver:
Container ID:
Image: calico/pod2daemon-flexvol:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-n5wk2 (ro)
Containers:
calico-node:
Container ID:
Image: calico/node:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
Liveness: http-get http://localhost:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -bird-ready -felix-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-n5wk2 (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-n5wk2:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-n5wk2
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m11s default-scheduler Successfully assigned kube-system/calico-node-flgbf to cherokee
Normal Pulled 7m8s kubelet, cherokee Container image "calico/cni:v3.8.0" already present on machine
Normal Created 7m7s kubelet, cherokee Created container upgrade-ipam
Normal Started 7m7s kubelet, cherokee Started container upgrade-ipam
---
kubectl describe pod coredns-5c98db65d4-wpc7p --namespace=kube-system
Name: coredns-5c98db65d4-wpc7p
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=kube-dns
pod-template-hash=5c98db65d4
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-5c98db65d4
Containers:
coredns:
Image: k8s.gcr.io/coredns:1.3.1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-nt88z (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-nt88z:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-nt88z
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 43s (x7 over 8m18s) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
---
kubectl describe pod coredns-5c98db65d4-z7sgv --namespace=kube-system
Name: coredns-5c98db65d4-z7sgv
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=kube-dns
pod-template-hash=5c98db65d4
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-5c98db65d4
Containers:
coredns:
Image: k8s.gcr.io/coredns:1.3.1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-nt88z (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-nt88z:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-nt88z
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 76s (x8 over 8m51s) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
OK, so most of the pods are failing to schedule because calico/node hasn't started yet, I think. What about logs for calico/node? That might tell us why the init container isn't finishing.
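One way to check that theory is to list the taints on the node. This is a sketch under the assumption that the node carries a not-ready taint until the CNI plugin is up; node_taints is a hypothetical helper name.

```shell
# Print each taint on a node as key:effect, one per line.
# node_taints is a hypothetical helper, not part of kubectl.
node_taints() {
  node="$1"
  kubectl get node "$node" \
    -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}'
}
```

For example, node_taints cherokee. If it lists a NoSchedule taint that the pending pods do not tolerate, that would match the FailedScheduling events above.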
@fasaxc I've already posted the kubectl describe pod calico-node-flgbf --namespace=kube-system output for calico-node. How can I get logs from a pod that hasn't started?
I think kubectl logs can show the init container's log.
@fasaxc The only output I get (the other pending pods have no output):
kubectl logs calico-node-f8hb5 --namespace=kube-system
Error from server (BadRequest): container "calico-node" in pod "calico-node-f8hb5" is waiting to start: PodInitializing
This explains how to get the log: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-init-containers/
The upgrade-ipam InitContainer has trillions of errors:
2019-07-03 20:57:26.014 [INFO][1] migrate.go 65: checking host-local IPAM data dir dir existence...
2019-07-03 20:57:26.014 [INFO][1] migrate.go 72: retrieving node for IPIP tunnel address
2019-07-03 20:57:26.019 [INFO][1] migrate.go 80: IPIP tunnel address not found, assigning...
2019-07-03 20:57:26.024 [INFO][1] ipam.go 583: Assigning IP 192.168.0.1 to host: kadet
2019-07-03 20:57:26.026 [ERROR][1] ipam_plugin.go 95: failed to migrate ipam, retrying... error=failed to get add IPIP tunnel addr 192.168.0.1: The provided IP address is not in a configured pool
node="kadet"
2019-07-03 20:57:27.027 [INFO][1] migrate.go 65: checking host-local IPAM data dir dir existence...
2019-07-03 20:57:27.027 [INFO][1] migrate.go 72: retrieving node for IPIP tunnel address
2019-07-03 20:57:27.028 [INFO][1] migrate.go 80: IPIP tunnel address not found, assigning...
2019-07-03 20:57:27.030 [INFO][1] ipam.go 583: Assigning IP 192.168.0.1 to host: kadet
2019-07-03 20:57:27.031 [ERROR][1] ipam_plugin.go 95: failed to migrate ipam, retrying... error=failed to get add IPIP tunnel addr 192.168.0.1: The provided IP address is not in a configured pool
node="kadet"
2019-07-03 20:57:28.031 [INFO][1] migrate.go 65: checking host-local IPAM data dir dir existence...
2019-07-03 20:57:28.031 [INFO][1] migrate.go 72: retrieving node for IPIP tunnel address
2019-07-03 20:57:28.033 [INFO][1] migrate.go 80: IPIP tunnel address not found, assigning...
2019-07-03 20:57:28.036 [INFO][1] ipam.go 583: Assigning IP 192.168.0.1 to host: kadet
2019-07-03 20:57:28.038 [ERROR][1] ipam_plugin.go 95: failed to migrate ipam, retrying... error=failed to get add IPIP tunnel addr 192.168.0.1: The provided IP address is not in a configured pool
node="kadet"
2019-07-03 20:57:29.038 [INFO][1] migrate.go 65: checking host-local IPAM data dir dir existence...
2019-07-03 20:57:29.038 [INFO][1] migrate.go 72: retrieving node for IPIP tunnel address
2019-07-03 20:57:29.041 [INFO][1] migrate.go 80: IPIP tunnel address not found, assigning...
2019-07-03 20:57:29.044 [INFO][1] ipam.go 583: Assigning IP 192.168.0.1 to host: kadet
2019-07-03 20:57:29.045 [ERROR][1] ipam_plugin.go 95: failed to migrate ipam, retrying... error=failed to get add IPIP tunnel addr 192.168.0.1: The provided IP address is not in a configured pool
node="kadet"
install-cni and flexvol-driver are waiting to start.
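For anyone else hitting this, the log above comes from passing the init container name to kubectl logs with -c. A small sketch (init_container_logs is a hypothetical wrapper name; the pod and container names are the ones from this thread):

```shell
# Fetch logs from one init container of a pod.
# Without -c, kubectl logs targets the main container, which is still
# waiting in PodInitializing and therefore returns the BadRequest error.
init_container_logs() {
  pod="$1"
  container="$2"
  namespace="${3:-kube-system}"
  kubectl logs "$pod" -c "$container" --namespace="$namespace"
}
```

Usage: init_container_logs calico-node-f8hb5 upgrade-ipam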
Same behavior on my newborn cluster. My cluster is initialized using kubeadm, and applying calico manifests from either 3.6 or 3.8 leads to these errors in the upgrade-ipam init container.
As I'm on a fresh installation that doesn't need this "upgrade IPAM" stage (AFAIK), I tried deleting this init container from the manifest (`kubectl edit daemonset -n kube-system calico-node`) and everything went fine, issue resolved.
I've found why the upgrade-ipam init container loops endlessly through errors and thus prevents the next init container, install-cni, from running: this folder exists on my server:
/var/lib/cni/networks/k8s-pod-network
I think it comes from a previous installation. Having done a kubeadm reset was not enough.
Source : https://github.com/projectcalico/cni-plugin/blob/v3.8.0/pkg/upgrade/migrate.go#L66
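A quick way to check a node for that leftover directory before installing Calico could look like this. It is a sketch: has_stale_ipam_dir is a hypothetical helper, and the optional root argument exists only so the check can be tried against a scratch directory.

```shell
# Return success if the host-local IPAM data directory is present.
# has_stale_ipam_dir is a hypothetical helper; the path is the one
# reported above.
has_stale_ipam_dir() {
  root="${1:-/}"
  # Strip a trailing slash so "/" and "/tmp/x/" both join cleanly.
  [ -d "${root%/}/var/lib/cni/networks/k8s-pod-network" ]
}
```

Usage: has_stale_ipam_dir && echo "leftover IPAM data found"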
@demikl can it be considered a Calico bug or kubeadm bug? I will try deleting it tomorrow. Thanks
I have already fixed it:
kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/
and then execute kubeadm init ...
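The same sequence can be wrapped in a script with a dry-run default, so it is possible to see what would be removed first. This is only a sketch: DRY_RUN, run and reset_node are hypothetical names, and it swaps ifconfig for the equivalent ip link set ... down.

```shell
# Dry-run by default; set DRY_RUN=0 (as root) to actually execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

reset_node() {
  run kubeadm reset
  for dev in cni0 flannel.1; do
    # A real run only touches interfaces that exist; a dry run prints both.
    if [ "$DRY_RUN" = "1" ] || ip link show "$dev" >/dev/null 2>&1; then
      run ip link set "$dev" down
      run ip link delete "$dev"
    fi
  done
  # Removes the leftover CNI state, including host-local IPAM data.
  run rm -rf /var/lib/cni/
}
```

Call reset_node to see the plan, then repeat with DRY_RUN=0 before running kubeadm init again.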
Deleting /var/lib/cni/ solves the problem. Are you doing a patch for migrate.go @withlin?
No. Just re-install and it is OK.
Maybe it's worth at least adding that to calico getting started documentation as a note.
@staticdev Have you solved the problem?
@withlin as I said yesterday: "Deleting /var/lib/cni/ solves the problem". =)
OK. I think you can close the issue. Thanks.
@withlin Shouldn't this information be added in the documentation to prevent future issues like this one?
@staticdev yes, that'd make a nice PR.
A docs change is probably good enough, though I feel we should be able to remove the need for it with some code adjustments. Ideally a kubeadm reset would be enough, but it seems to leave behind some cruft on the node that tricks Calico into thinking it is doing an upgrade rather than a fresh installation.
I think the following sounds like a reasonable solution:
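Check the ClusterInformation CRD that Calico writes to see whether this is a new cluster. If it is a new cluster, we can skip the upgrade altogether. The failure here comes from the stale data under /var/lib/cni/networks left by the previous installation, not from missing tolerations: the pending pods already tolerate node-role.kubernetes.io/master:NoSchedule.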