Calico: kube-controllers failing to query API: Context deadline exceeded

Created on 27 Aug 2019  Â·  27Comments  Â·  Source: projectcalico/calico

Expected Behavior

When installing a fresh master node, I expect when I apply the calico yaml that the pod network starts, as did with another setup I installed.

Current Behavior

coredns does not start, calico-node starts without a problem, but calico-kube-controller outputs an error:

2019-08-27 09:52:10.607 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2019-08-27 09:52:20.608 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-08-27 09:52:20.608 [FATAL][1] main.go 118: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

I verified 10.96.0.1:443 is running and reachable from containers in the kube-system namespace.

All interfaces get created on the host, and all containers that ask for an IP get one.

Possible Solution

None as of yet.

Steps to Reproduce (for bugs)

  1. kubeadm init --kubernetes-version=v1.15.2 --pod-network-cidr=10.246.0.0/16 --apiserver-advertise-address=10.5.0.5
  2. curl -s https://docs.projectcalico.org/v3.8/manifests/calico.yaml > calico-3.8.2.yaml
  3. POD_CIDR="10.246.0.0/16" sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico-3.8.2.yaml
  4. kubectl apply -f calico-3.8.2.yaml

Context

NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
kube-system   calico-kube-controllers-65b8787765-rdkrm   0/1     CrashLoopBackOff   7          13m
kube-system   calico-node-x5qcv                          1/1     Running            0          13m
kube-system   coredns-5c98db65d4-fhb4c                   0/1     CrashLoopBackOff   6          17m
kube-system   coredns-5c98db65d4-rzd5c                   0/1     CrashLoopBackOff   6          17m
kube-system   etcd-dcc-5-host-3                          1/1     Running            0          16m
kube-system   kube-apiserver-dcc-5-host-3                1/1     Running            0          16m
kube-system   kube-controller-manager-dcc-5-host-3       1/1     Running            0          16m
kube-system   kube-proxy-k72mq                           1/1     Running            0          17m
kube-system   kube-scheduler-dcc-5-host-3                1/1     Running            0          16m

kubectl describe pod -n kube-system calico-kube-controllers-65b8787765-rdkrm

Name:                 calico-kube-controllers-65b8787765-rdkrm
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 debian/37.97.227.88
Start Time:           Tue, 27 Aug 2019 11:51:54 +0200
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=65b8787765
Annotations:          cni.projectcalico.org/podIP: 10.246.201.193/32
                      scheduler.alpha.kubernetes.io/critical-pod: 
Status:               Running
IP:                   10.246.201.193
Controlled By:        ReplicaSet/calico-kube-controllers-65b8787765
Containers:
  calico-kube-controllers:
    Container ID:   docker://6c54ccc03728289128fa7bce0bca06fc704c9241896698616c72389f64735ee9
    Image:          calico/kube-controllers:v3.8.2
    Image ID:       docker-pullable://calico/kube-controllers@sha256:afc0e28b569059abc6f5e199048c2b4f1d520dece9b16e4ddc3e4edb477c72ed
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 27 Aug 2019 12:25:05 +0200
      Finished:     Tue, 27 Aug 2019 12:25:15 +0200
    Ready:          False
    Restart Count:  11
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-jvbr7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  calico-kube-controllers-token-jvbr7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-kube-controllers-token-jvbr7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From                   Message
  ----     ------            ----                   ----                   -------
  Warning  FailedScheduling  34m (x3 over 34m)      default-scheduler      0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         33m                    default-scheduler      Successfully assigned kube-system/calico-kube-controllers-65b8787765-rdkrm to debian
  Warning  Unhealthy         32m (x4 over 33m)      kubelet, debian        Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
  Normal   Pulled            31m (x5 over 33m)      kubelet, debian        Container image "calico/kube-controllers:v3.8.2" already present on machine
  Normal   Created           31m (x5 over 33m)      kubelet, debian        Created container calico-kube-controllers
  Normal   Started           31m (x5 over 33m)      kubelet, debian        Started container calico-kube-controllers
  Warning  BackOff           3m45s (x129 over 33m)  kubelet, debian        Back-off restarting failed container

Your Environment

  • Calico version
image: calico/cni:v3.8.2
image: calico/cni:v3.8.2
image: calico/pod2daemon-flexvol:v3.8.2
image: calico/node:v3.8.2
image: calico/kube-controllers:v3.8.2

Using kubernetes as datastore: CALICO_DATASTORE_TYPE=kubernetes

  • Orchestrator version (e.g. kubernetes, mesos, rkt):
#: kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Operating System and version:
#: uname -a
Linux debian 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
#: hostname
debian



md5-565237e8cdd842d5ee946b6e0e286c60



#: apt search docker-ce
docker-ce/stretch,now 5:18.09.8~3-0~debian-stretch amd64 [installed]



md5-31f5afd138e60d6bd0fd63fd627180dc



1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether <mac>
    inet <public ip> brd <brd> scope global dynamic ens3
       valid_lft 85455sec preferred_lft 85455sec
3: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether <mac>
    inet 10.5.0.5/24 brd 10.5.0.255 scope global ens7
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether <mac>
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.246.201.192/32 brd 10.246.201.192 scope global tunl0
       valid_lft forever preferred_lft forever
33: caliafb5ba50c83@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
34: cali4e91c4a8f9e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
35: cali794af303831@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
impachigh kinbug likelihoohigh

Most helpful comment

Facing same issue

2020-08-28 01:44:23.778 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0828 01:44:23.779818       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-08-28 01:44:23.780 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-08-28 01:44:33.781 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

All 27 comments

Sorry for the delay on responding.
What are the log messages from coredns? It should still function even if calico-kube-controllers is failing. I would guess that it is having the same problem of being unable to reach 10.96.0.1.

Which pod(s) did you check that 10.96.0.1:443 (kubernetes apiserver service address) was reachable from in the kube-system namespace? I'm asking because I'm curious if it was a host networked pod or not. Also what command did you use to test it?
Could you also test if that address is reachable from the host itself (it should be)?

Please take a look at the kube-proxy logs because it is responsible for configuring the iptables rules for the kubernetes apiserver service DNAT, though I don't expect to see any errors there since it is working from at least some pods.

Sorry to bumb this issue, I got similiar problem once a week.
kubectl -n kube-system logs calico-kube-controllers-99bd8767b-gcgnr


2019-12-29 11:39:22.077 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:39:22.086 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:39:54.106 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:40:14.107 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:40:14.107 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:40:46.108 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:06.109 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:06.109 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:21.276 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:41.278 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:41.278 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:41:43.432 [ERROR][1] main.go 234: Failed to reach apiserver error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:42:03.433 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-12-29 11:42:03.433 [ERROR][1] main.go 203: Failed to verify datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Coredns logs

 .:53                                                                                                                                                      │
2019-12-29T11:42:40.755Z [INFO] CoreDNS-1.3.1                                                                                                             │
2019-12-29T11:42:40.755Z [INFO] linux/amd64, go1.11.4, 6b56a9c                                                                                            │
CoreDNS-1.3.1                                                                                                                                             │
linux/amd64, go1.11.4, 6b56a9c                                                                                                                            │
2019-12-29T11:42:40.755Z [INFO] plugin/reload: Running configuration MD5 = 5d5369fbc12f985709b924e721217843

Kube proxy logs:

W1128 00:01:52.715334       1 server_others.go:249] Flag proxy-mode="" unknown, assuming iptables proxy
I1128 00:01:52.816149       1 server_others.go:143] Using iptables Proxier.
I1128 00:01:52.831088       1 server.go:534] Version: v1.15.2
I1128 00:01:52.869732       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1128 00:01:52.870154       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1128 00:01:52.870521       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1128 00:01:52.870788       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1128 00:01:52.871383       1 config.go:187] Starting service config controller
I1128 00:01:52.871452       1 controller_utils.go:1029] Waiting for caches to sync for service config controller
I1128 00:01:52.871543       1 config.go:96] Starting endpoints config controller
I1128 00:01:52.871656       1 controller_utils.go:1029] Waiting for caches to sync for endpoints config controller
I1128 00:01:52.971728       1 controller_utils.go:1036] Caches are synced for service config controller
I1128 00:01:52.971861       1 controller_utils.go:1036] Caches are synced for endpoints config controller
W1201 11:39:28.369718       1 reflector.go:302] k8s.io/client-go/informers/factory.go:133: watch of *v1.Service ended with: too old resource version: 28910872 (29429110)
W1229 11:42:08.738268       1 reflector.go:302] k8s.io/client-go/informers/factory.go:133: watch of *v1.Service ended with: too old resource version: 34494839 (34890445)

Any help? Thank you

@nalakawula Is this something that just happens once a week and resolves itself?
What is your setup? On a cloud/bare metal? I'm wondering if your apiserver is becoming unavailable for small bits of time weekly.

Hi @tmjd . That's just happen once a week and resolve itself. I setup on vm (1 master + 2 worker).
Yes, you are correct, this is my hoster issue.

@dododedodonl I'm going to close this issue for now, if you can provide additional info we can re-open this issue and continue investigation.

I have similar issues, which can be reproduced with the following setup:

https://github.com/boeboe/k8s-calico-vagrant

$ kubectl get pods -n kube-system

    NAME                                       READY   STATUS             RESTARTS   AGE
    calico-kube-controllers-5b644bc49c-rzb4t   0/1     CrashLoopBackOff   6          8m30s
    calico-node-j7vl6                          1/1     Running            0          11m
    calico-node-p47ps                          1/1     Running            0          11m
    calico-node-r48ff                          1/1     Running            0          11m
    coredns-6955765f44-7jm8b                   1/1     Running            0          150m
    coredns-6955765f44-v5pcb                   1/1     Running            0          150m
    etcd-k8s-master                            1/1     Running            0          150m
    kube-apiserver-k8s-master                  1/1     Running            0          150m
    kube-controller-manager-k8s-master         1/1     Running            0          150m
    kube-proxy-9xt25                           1/1     Running            0          145m
    kube-proxy-fkcn4                           1/1     Running            0          148m
    kube-proxy-ncdft                           1/1     Running            0          150m
    kube-scheduler-k8s-master                  1/1     Running            0          150m

$ kubectl logs calico-kube-controllers-5b644bc49c-g6zzw  -n kube-system

    2020-01-17 02:54:48.666 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
    2020-01-17 02:54:48.667 [INFO][1] k8s.go 228: Using Calico IPAM
    W0117 02:54:48.667708       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
    2020-01-17 02:54:48.668 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
    2020-01-17 02:54:58.668 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
    2020-01-17 02:54:58.668 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

@boeboe Looking at the repo I see that the vagrant machines have IPs in the default Pod cidr 192.168.0.0/16. I think you should change one or the other.

I am facing same issue with Kubernetes Calico :

I am using Calico 3.11 After restart it was giving me below error

kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
2020-02-08 16:41:52.914 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-02-08 16:41:52.914 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

can you try restart docker ? systemctl restart docker

I had the same issue in a kops-AWS build cluster with 3 master nodes.

The issue raised upgrading the kubernetes version of the cluster (a minor upgrade in the same major release).
The upgrade process got stuck upgrading the second master raising the same above-mentioned error in the same pod, calico-kube-controller.

I tried restarting the docker daemon in the master that got stuck in the upgrade process and didn't work.
In the end, I restarted the master box completely and after all the pods get restarted the upgrade process resumed.

I am facing same issue with Kubernetes Calico v3.11.2 :
kubectl logs -f -n kube-system calico-kube-controllers-59b987858b-rh4kg

2020-04-16 02:10:56.169 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
2020-04-16 02:10:56.171 [INFO][1] k8s.go 228: Using Calico IPAM
W0416 02:10:56.171389 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-04-16 02:10:56.172 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-04-16 02:11:06.172 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.68.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-04-16 02:11:06.172 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.68.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

I have the same issue after upgrading to calico in a kops managed kubernetes cluster.
Even with a previously working controller if I have to restart the master node the controller then start throwing the errors mentioned above and doesn't get back to a working state. Restarting the controller seems to fix the problem.
However I'm afraid that's just a temporary fix

I also had to reboot the nodes in order to fix this issue, restarting services and deleting the Pods to have them recreated didn't solve the problem for me.
To be completely honest, I'm running with a self-installed docker on RHEL8 with --iptables=false. So the environment is a bit different from what I used to deploy on RHEL7 (and where I didn't run into this issue)

I am getting same error

2020-07-14T09:32:54.336189924Z 2020-07-14 09:32:54.334 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2020-07-14T09:32:54.337157642Z W0714 09:32:54.337061       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-07-14T09:32:54.338248339Z 2020-07-14 09:32:54.338 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-07-14T09:33:04.339300508Z 2020-07-14 09:33:04.338 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:33:04.339366264Z 2020-07-14 09:33:04.338 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:32:54.336189924Z 2020-07-14 09:32:54.334 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2020-07-14T09:32:54.337157642Z W0714 09:32:54.337061       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-07-14T09:32:54.338248339Z 2020-07-14 09:32:54.338 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-07-14T09:33:04.339300508Z 2020-07-14 09:33:04.338 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:33:04.339366264Z 2020-07-14 09:33:04.338 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:38:18.357972316Z 2020-07-14 09:38:18.355 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2020-07-14T09:38:18.360279112Z W0714 09:38:18.358068       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-07-14T09:38:18.360295731Z 2020-07-14 09:38:18.359 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-07-14T09:38:28.360679698Z 2020-07-14 09:38:28.360 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-07-14T09:38:28.360732833Z 2020-07-14 09:38:28.360 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Hi,

I see the same problem during a kops upgrade:

2020-08-17 06:58:42.376 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:14.376 [ERROR][1] main.go 238: Failed to reach apiserver error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:34.376 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:34.376 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:06.376 [ERROR][1] main.go 238: Failed to reach apiserver error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:26.377 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:26.377 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Killing the calico controller POD solves the problem.

The setup is a single-master multi-node cluster. kops is version 1.18.0 and the Upgrade for k8s was from 1.17.9 to 1.18.8.
kops changed calico version from v3.13.4 to v3.15.1 also replacing a lot of CRDs.

This sounds potentially to be a side-effect of this issue: https://github.com/projectcalico/libcalico-go/issues/1267

Does anyone have any calico/node logs from a cluster that has been impacted by this?

Specifically we'd be looking for logs that look something like the following from calico/node (NOT kube-controllers):

2020-07-09 03:58:50.212 [INFO][51] watchercache.go 206: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/ippools" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/ippools: dial tcp 10.96.0.1:443: connect: connection refused
2020-07-09 03:58:50.220 [INFO][51] watchercache.go 206: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: no route to host

I can reproduce the problem on a single-master single-node playground cluster.
But I don't see any of the above mentioned log entries.
Instead, it seems calico-node startup is successful: https://gist.github.com/czunker/e1f8d7b3533625914fa4e9ec0fdc2a5e
This calico-node was running on the k8s master.

Facing same issue

2020-08-28 01:44:23.778 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0828 01:44:23.779818       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-08-28 01:44:23.780 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-08-28 01:44:33.781 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
[root@www ~]# kubectl get pod -A |grep calico
kube-system   calico-kube-controllers-7d9fc8f4-s8sfr   0/1     CrashLoopBackOff   12         15m
kube-system   calico-node-92kxw                        0/1     Running            2          25m
kube-system   calico-node-mbsns                        1/1     Running            0          25m
kube-system   calico-node-rnlb2                        1/1     Running            0          25m
kube-system   calico-node-xbdwk                        1/1     Running            0          25m
[root@www ~]# kubectl logs calico-kube-controllers-7d9fc8f4-s8sfr  -n kube-system
2020-09-19 12:49:24.700 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0919 12:49:24.702778       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-09-19 12:49:24.704 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-09-19 12:49:34.705 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2020-09-19 12:49:34.705 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

Facing same issue

2020-08-28 01:44:23.778 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0828 01:44:23.779818       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-08-28 01:44:23.780 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-08-28 01:44:33.781 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

for me , my nodes's network is 192.168.44.0/24, i fixed this exception after changed pod network from 192.168.0.0/16 to 192.169.0.0/16

i fixed this exception after changed pod network from 192.168.0.0/16 to 192.169.0.0/16

Don't do this. 192.169.0.0/16 is not an unallocated CIDR. Someone owns that network, and it's publicly routable.

I see the similar issue, any idea or update for a solution?

Same problem. Calico 3.16.4 on K8S 1.18.9

calico-node is up and running, as are all the control plane pods (coredns, kube-proxy and others)

calico-kube-controllers is failing for the above mentioned issue (same logs)

@dododedodonl Can you consider reopening this issue since it's seemingly unresolved and affecting other users?

@dododedodonl
Same problem. Calico v3.14.1 on K8S 1.18.8,Do you have any good Suggestions? Thanks.
image

2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

So, this error message can be caused by a number of different root causes. It is effectively saying that it took more than 10 seconds for kube-controllers to perform its initial connection with the kubernetes API server.

This can, for example, be caused by things like:

  • kube-proxy not functioning properly on the node (since access is via a k8s service)
  • API server not running / not healthy (for obvious reasons)
  • A problem with the underlying network or security groups.
  • A problem with pod networking (since kube-controllers is run as a pod)
  • Probably several other things.

I suspect, from reading the comments above, that different users in this thread are hitting different root causes of the error message, since some cases are subtly different than others.

I think there are two main classes of issue here:

  • Those which occur reliably and consistently, despite restarting the pod.
  • Those which occur as a result of a change in the cluster (e.g., kubernetes upgrade) and are fixed by restarting the kube-controllerds pod.

The first scenario sounds likely to be a misconfiguration of the cluster or a broken dependency (think kube-proxy not working, or security groups not configured to allow the right traffic).

The second sounds like it might be this issue - https://github.com/projectcalico/libcalico-go/issues/1267. In that issue, the connection to the API server is broken, but due to limitations in the Kubernetes client library and the golang HTTP library, Calico is not made aware of this, and so isn't able to take action.

Personally I haven't been able to reproduce yet, but I am looking into the latter scenario and hope to have a fix soon. It's somewhat dependent on upstream fixes in golang / the kubernetes client.

Can see my comment on the linked issue: https://github.com/projectcalico/libcalico-go/issues/1267#issuecomment-729279578

Looks like a fix for that particular issue has been merged to Kubernetes and is targeting v1.20. Once v1.20 it released we should be able to bump our pins and pick up the fix.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jpiper picture jpiper  Â·  4Comments

caseydavenport picture caseydavenport  Â·  3Comments

stoyanr picture stoyanr  Â·  3Comments

winromulus picture winromulus  Â·  3Comments

redbaron picture redbaron  Â·  3Comments