When installing a fresh master node, I expect when I apply the calico yaml that the pod network starts, as did with another setup I installed.
coredns does not start, calico-node starts without a problem, but calico-kube-controller outputs an error:
2019-08-27 09:52:10.607 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2019-08-27 09:52:20.608 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-08-27 09:52:20.608 [FATAL][1] main.go 118: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
I verified 10.96.0.1:443 is running and reachable from containers in the kube-system namespace.
All interfaces get created on the host, and all containers that ask for an IP get one.
None as of yet.
kubeadm init --kubernetes-version=v1.15.2 --pod-network-cidr=10.246.0.0/16 --apiserver-advertise-address=10.5.0.5curl -s https://docs.projectcalico.org/v3.8/manifests/calico.yaml > calico-3.8.2.yamlPOD_CIDR="10.246.0.0/16" sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico-3.8.2.yamlkubectl apply -f calico-3.8.2.yamlNAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-65b8787765-rdkrm 0/1 CrashLoopBackOff 7 13m
kube-system calico-node-x5qcv 1/1 Running 0 13m
kube-system coredns-5c98db65d4-fhb4c 0/1 CrashLoopBackOff 6 17m
kube-system coredns-5c98db65d4-rzd5c 0/1 CrashLoopBackOff 6 17m
kube-system etcd-dcc-5-host-3 1/1 Running 0 16m
kube-system kube-apiserver-dcc-5-host-3 1/1 Running 0 16m
kube-system kube-controller-manager-dcc-5-host-3 1/1 Running 0 16m
kube-system kube-proxy-k72mq 1/1 Running 0 17m
kube-system kube-scheduler-dcc-5-host-3 1/1 Running 0 16m
kubectl describe pod -n kube-system calico-kube-controllers-65b8787765-rdkrm
Name: calico-kube-controllers-65b8787765-rdkrm
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: debian/37.97.227.88
Start Time: Tue, 27 Aug 2019 11:51:54 +0200
Labels: k8s-app=calico-kube-controllers
pod-template-hash=65b8787765
Annotations: cni.projectcalico.org/podIP: 10.246.201.193/32
scheduler.alpha.kubernetes.io/critical-pod:
Status: Running
IP: 10.246.201.193
Controlled By: ReplicaSet/calico-kube-controllers-65b8787765
Containers:
calico-kube-controllers:
Container ID: docker://6c54ccc03728289128fa7bce0bca06fc704c9241896698616c72389f64735ee9
Image: calico/kube-controllers:v3.8.2
Image ID: docker-pullable://calico/kube-controllers@sha256:afc0e28b569059abc6f5e199048c2b4f1d520dece9b16e4ddc3e4edb477c72ed
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 27 Aug 2019 12:25:05 +0200
Finished: Tue, 27 Aug 2019 12:25:15 +0200
Ready: False
Restart Count: 11
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ENABLED_CONTROLLERS: node
DATASTORE_TYPE: kubernetes
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-jvbr7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
calico-kube-controllers-token-jvbr7:
Type: Secret (a volume populated by a Secret)
SecretName: calico-kube-controllers-token-jvbr7
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 34m (x3 over 34m) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Normal Scheduled 33m default-scheduler Successfully assigned kube-system/calico-kube-controllers-65b8787765-rdkrm to debian
Warning Unhealthy 32m (x4 over 33m) kubelet, debian Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Normal Pulled 31m (x5 over 33m) kubelet, debian Container image "calico/kube-controllers:v3.8.2" already present on machine
Normal Created 31m (x5 over 33m) kubelet, debian Created container calico-kube-controllers
Normal Started 31m (x5 over 33m) kubelet, debian Started container calico-kube-controllers
Warning BackOff 3m45s (x129 over 33m) kubelet, debian Back-off restarting failed container
image: calico/cni:v3.8.2
image: calico/cni:v3.8.2
image: calico/pod2daemon-flexvol:v3.8.2
image: calico/node:v3.8.2
image: calico/kube-controllers:v3.8.2
Using kubernetes as datastore: CALICO_DATASTORE_TYPE=kubernetes
#: kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
#: uname -a
Linux debian 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
#: hostname
debian
md5-565237e8cdd842d5ee946b6e0e286c60
#: apt search docker-ce
docker-ce/stretch,now 5:18.09.8~3-0~debian-stretch amd64 [installed]
md5-31f5afd138e60d6bd0fd63fd627180dc
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether <mac>
inet <public ip> brd <brd> scope global dynamic ens3
valid_lft 85455sec preferred_lft 85455sec
3: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether <mac>
inet 10.5.0.5/24 brd 10.5.0.255 scope global ens7
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether <mac>
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.246.201.192/32 brd 10.246.201.192 scope global tunl0
valid_lft forever preferred_lft forever
33: caliafb5ba50c83@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
34: cali4e91c4a8f9e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
35: cali794af303831@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
Sorry for the delay on responding.
What are the log messages from coredns? It should still function even if calico-kube-controllers is failing. I would guess that it is having the same problem of being unable to reach 10.96.0.1.
Which pod(s) did you check that 10.96.0.1:443 (kubernetes apiserver service address) was reachable from in the kube-system namespace? I'm asking because I'm curious if it was a host networked pod or not. Also what command did you use to test it?
Could you also test if that address is reachable from the host itself (it should be)?
Please take a look at the kube-proxy logs because it is responsible for configuring the iptables rules for the kubernetes apiserver service DNAT, though I don't expect to see any errors there since it is working from at least some pods.
Sorry to bumb this issue, I got similiar problem once a week.
kubectl -n kube-system logs calico-kube-controllers-99bd8767b-gcgnr
2019-12-29 11:39:22.077 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:39:22.086 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:39:54.106 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:40:14.107 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:40:14.107 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:40:46.108 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:06.109 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:06.109 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:21.276 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:41.278 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:41.278 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:43.432 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:42:03.433 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:42:03.433 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
Coredns logs
.:53 │
2019-12-29T11:42:40.755Z [INFO] CoreDNS-1.3.1 │
2019-12-29T11:42:40.755Z [INFO] linux/amd64, go1.11.4, 6b56a9c │
CoreDNS-1.3.1 │
linux/amd64, go1.11.4, 6b56a9c │
2019-12-29T11:42:40.755Z [INFO] plugin/reload: Running configuration MD5 = 5d5369fbc12f985709b924e721217843
Kube proxy logs:
W1128 00:01:52.715334 1 server_others.go:249] Flag proxy-mode="" unknown, assuming iptables proxy
I1128 00:01:52.816149 1 server_others.go:143] Using iptables Proxier.
I1128 00:01:52.831088 1 server.go:534] Version: v1.15.2
I1128 00:01:52.869732 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1128 00:01:52.870154 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1128 00:01:52.870521 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1128 00:01:52.870788 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1128 00:01:52.871383 1 config.go:187] Starting service config controller
I1128 00:01:52.871452 1 controller_utils.go:1029] Waiting for caches to sync for service config controller
I1128 00:01:52.871543 1 config.go:96] Starting endpoints config controller
I1128 00:01:52.871656 1 controller_utils.go:1029] Waiting for caches to sync for endpoints config controller
I1128 00:01:52.971728 1 controller_utils.go:1036] Caches are synced for service config controller
I1128 00:01:52.971861 1 controller_utils.go:1036] Caches are synced for endpoints config controller
W1201 11:39:28.369718 1 reflector.go:302] k8s.io/client-go/informers/factory.go:133: watch of *v1.Service ended with: too old resource version: 28910872 (29429110)
W1229 11:42:08.738268 1 reflector.go:302] k8s.io/client-go/informers/factory.go:133: watch of *v1.Service ended with: too old resource version: 34494839 (34890445)
Any help? Thank you
@nalakawula Is this something that just happens once a week and resolves itself?
What is your setup? On a cloud/bare metal? I'm wondering if your apiserver is becoming unavailable for small bits of time weekly.
Hi @tmjd . That's just happen once a week and resolve itself. I setup on vm (1 master + 2 worker).
Yes, you are correct, this is my hoster issue.
@dododedodonl I'm going to close this issue for now, if you can provide additional info we can re-open this issue and continue investigation.
I have similar issues, which can be reproduced with the following setup:
https://github.com/boeboe/k8s-calico-vagrant
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5b644bc49c-rzb4t 0/1 CrashLoopBackOff 6 8m30s
calico-node-j7vl6 1/1 Running 0 11m
calico-node-p47ps 1/1 Running 0 11m
calico-node-r48ff 1/1 Running 0 11m
coredns-6955765f44-7jm8b 1/1 Running 0 150m
coredns-6955765f44-v5pcb 1/1 Running 0 150m
etcd-k8s-master 1/1 Running 0 150m
kube-apiserver-k8s-master 1/1 Running 0 150m
kube-controller-manager-k8s-master 1/1 Running 0 150m
kube-proxy-9xt25 1/1 Running 0 145m
kube-proxy-fkcn4 1/1 Running 0 148m
kube-proxy-ncdft 1/1 Running 0 150m
kube-scheduler-k8s-master 1/1 Running 0 150m
$ kubectl logs calico-kube-controllers-5b644bc49c-g6zzw -n kube-system
2020-01-17 02:54:48.666 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
2020-01-17 02:54:48.667 [INFO][1] k8s.go 228: Using Calico IPAM
W0117 02:54:48.667708 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-01-17 02:54:48.668 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-01-17 02:54:58.668 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-01-17 02:54:58.668 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
@boeboe Looking at the repo I see that the vagrant machines have IPs in the default Pod cidr 192.168.0.0/16. I think you should change one or the other.
I am facing same issue with Kubernetes Calico :
I am using Calico 3.11 After restart it was giving me below error
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
2020-02-08 16:41:52.914 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-02-08 16:41:52.914 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
can you try restart docker ? systemctl restart docker
I had the same issue in a kops-AWS build cluster with 3 master nodes.
The issue raised upgrading the kubernetes version of the cluster (a minor upgrade in the same major release).
The upgrade process got stuck upgrading the second master raising the same above-mentioned error in the same pod, calico-kube-controller.
I tried restarting the docker daemon in the master that got stuck in the upgrade process and didn't work.
In the end, I restarted the master box completely and after all the pods get restarted the upgrade process resumed.
I am facing same issue with Kubernetes Calico v3.11.2 :
kubectl logs -f -n kube-system calico-kube-controllers-59b987858b-rh4kg
2020-04-16 02:10:56.169 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
2020-04-16 02:10:56.171 [INFO][1] k8s.go 228: Using Calico IPAM
W0416 02:10:56.171389 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-04-16 02:10:56.172 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-04-16 02:11:06.172 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.68.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-04-16 02:11:06.172 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.68.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
I have the same issue after upgrading to calico in a kops managed kubernetes cluster.
Even with a previously working controller if I have to restart the master node the controller then start throwing the errors mentioned above and doesn't get back to a working state. Restarting the controller seems to fix the problem.
However I'm afraid that's just a temporary fix
I also had to reboot the nodes in order to fix this issue, restarting services and deleting the Pods to have them recreated didn't solve the problem for me.
To be completely honest, I'm running with a self-installed docker on RHEL8 with --iptables=false. So the environment is a bit different from what I used to deploy on RHEL7 (and where I didn't run into this issue)
I am getting same error
2020-07-14T09:32:54.336189924Z 2020-07-14 09:32:54.334 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2020-07-14T09:32:54.337157642Z W0714 09:32:54.337061 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-07-14T09:32:54.338248339Z 2020-07-14 09:32:54.338 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-07-14T09:33:04.339300508Z 2020-07-14 09:33:04.338 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:33:04.339366264Z 2020-07-14 09:33:04.338 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:32:54.336189924Z 2020-07-14 09:32:54.334 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2020-07-14T09:32:54.337157642Z W0714 09:32:54.337061 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-07-14T09:32:54.338248339Z 2020-07-14 09:32:54.338 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-07-14T09:33:04.339300508Z 2020-07-14 09:33:04.338 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:33:04.339366264Z 2020-07-14 09:33:04.338 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:38:18.357972316Z 2020-07-14 09:38:18.355 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2020-07-14T09:38:18.360279112Z W0714 09:38:18.358068 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-07-14T09:38:18.360295731Z 2020-07-14 09:38:18.359 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-07-14T09:38:28.360679698Z 2020-07-14 09:38:28.360 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:38:28.360732833Z 2020-07-14 09:38:28.360 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
Hi,
I see the same problem during a kops upgrade:
2020-08-17 06:58:42.376 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:14.376 [ERROR][1] main.go 238: Failed to reach apiserver error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:34.376 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:34.376 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:06.376 [ERROR][1] main.go 238: Failed to reach apiserver error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:26.377 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:26.377 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
Killing the calico controller POD solves the problem.
The setup is a single-master multi-node cluster. kops is version 1.18.0 and the Upgrade for k8s was from 1.17.9 to 1.18.8.
kops changed calico version from v3.13.4 to v3.15.1 also replacing a lot of CRDs.
This sounds potentially to be a side-effect of this issue: https://github.com/projectcalico/libcalico-go/issues/1267
Does anyone have any calico/node logs from a cluster that has been impacted by this?
Specifically we'd be looking for logs that look something like the following from calico/node (NOT kube-controllers):
2020-07-09 03:58:50.212 [INFO][51] watchercache.go 206: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/ippools" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/ippools: dial tcp 10.96.0.1:443: connect: connection refused
2020-07-09 03:58:50.220 [INFO][51] watchercache.go 206: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: no route to host
I can reproduce the problem on a single-master single-node playground cluster.
But I don't see any of the above mentioned log entries.
Instead, it seems calico-node startup is successful: https://gist.github.com/czunker/e1f8d7b3533625914fa4e9ec0fdc2a5e
This calico-node was running on the k8s master.
Facing same issue
2020-08-28 01:44:23.778 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0828 01:44:23.779818 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-08-28 01:44:23.780 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-08-28 01:44:33.781 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
[root@www ~]# kubectl get pod -A |grep calico
kube-system calico-kube-controllers-7d9fc8f4-s8sfr 0/1 CrashLoopBackOff 12 15m
kube-system calico-node-92kxw 0/1 Running 2 25m
kube-system calico-node-mbsns 1/1 Running 0 25m
kube-system calico-node-rnlb2 1/1 Running 0 25m
kube-system calico-node-xbdwk 1/1 Running 0 25m
[root@www ~]# kubectl logs calico-kube-controllers-7d9fc8f4-s8sfr -n kube-system
2020-09-19 12:49:24.700 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0919 12:49:24.702778 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-09-19 12:49:24.704 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-09-19 12:49:34.705 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2020-09-19 12:49:34.705 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
Facing same issue
2020-08-28 01:44:23.778 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"} W0828 01:44:23.779818 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work. 2020-08-28 01:44:23.780 [INFO][1] main.go 109: Ensuring Calico datastore is initialized 2020-08-28 01:44:33.781 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded 2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
for me , my nodes's network is 192.168.44.0/24, i fixed this exception after changed pod network from 192.168.0.0/16 to 192.169.0.0/16
i fixed this exception after changed pod network from 192.16
8.0.0/16 to 192.169.0.0/16
Don't do this. 192.169.0.0/16 is not an unallocated CIDR. Someone owns that network, and it's publicly routable.
I see the similar issue, any idea or update for a solution?
Same problem. Calico 3.16.4 on K8S 1.18.9
calico-node is up and running, as are all the control plane pods (coredns, kube-proxy and others)
calico-kube-controllers is failing for the above mentioned issue (same logs)
@dododedodonl Can you consider reopening this issue since it's seemingly unresolved and affecting other users?
@dododedodonl
Same problem. Calico v3.14.1 on K8S 1.18.8,Do you have any good Suggestions? Thanks.

2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
So, this error message can be caused by a number of different root causes. It is effectively saying that it took more than 10 seconds for kube-controllers to perform its initial connection with the kubernetes API server.
This can, for example, be caused by things like:
I suspect, from reading the comments above, that different users in this thread are hitting different root causes of the error message, since some cases are subtly different than others.
I think there are two main classes of issue here:
The first scenario sounds likely to be a misconfiguration of the cluster or a broken dependency (think kube-proxy not working, or security groups not configured to allow the right traffic).
The second sounds like it might be this issue - https://github.com/projectcalico/libcalico-go/issues/1267. In that issue, the connection to the API server is broken, but due to limitations in the Kubernetes client library and the golang HTTP library, Calico is not made aware of this, and so isn't able to take action.
Personally I haven't been able to reproduce yet, but I am looking into the latter scenario and hope to have a fix soon. It's somewhat dependent on upstream fixes in golang / the kubernetes client.
Can see my comment on the linked issue: https://github.com/projectcalico/libcalico-go/issues/1267#issuecomment-729279578
Looks like a fix for that particular issue has been merged to Kubernetes and is targeting v1.20. Once v1.20 it released we should be able to bump our pins and pick up the fix.
Most helpful comment
Facing same issue