calico-node(v3.6) panic during calico-node init

Created on 21 Mar 2019 · 7 comments · Source: projectcalico/calico

Expected Behavior

I'd expect the calico-node pod to reach the Running state.

Current Behavior

# kubectl get pods -n kube-system
NAME                                         READY   STATUS       RESTARTS   AGE
calico-kube-controllers-644fcf8fbf-h77v7     0/1     Pending      0          6m11s
calico-node-w2fhp                            0/1     Init:Error   1          4s
coredns-86c58d9df4-6f2pp                     0/1     Pending      0          7m59s
coredns-86c58d9df4-t4w7d                     0/1     Pending      0          7m59s
# kubectl logs calico-node-w2fhp -c upgrade-ipam -n kube-system
2019-03-21 06:52:29.632 [INFO][1] ipam_plugin.go 68: migrating from host-local to calico-ipam...
2019-03-21 06:52:29.636 [INFO][1] migrate.go 63: checking host-local IPAM data dir dir existence...
2019-03-21 06:52:29.636 [INFO][1] migrate.go 70: retrieving node for IPIP tunnel address
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x11963d8]

goroutine 1 [running]:
github.com/projectcalico/cni-plugin/pkg/upgrade.Migrate(0x1654b20, 0xc000040018, 0x1667ea0, 0xc0004c66c0, 0xc00003e015, 0x12, 0x0, 0x0)
    /go/src/github.com/projectcalico/cni-plugin/pkg/upgrade/migrate.go:77 +0x2d8
github.com/projectcalico/cni-plugin/pkg/ipamplugin.Main(0x1629e74, 0x6)
    /go/src/github.com/projectcalico/cni-plugin/pkg/ipamplugin/ipam_plugin.go:91 +0x458
main.main()
    /go/src/github.com/projectcalico/cni-plugin/cmd/calico-ipam/calico-ipam.go:25 +0x39

Possible Solution

Using Calico v3.5.3 and the same steps listed below, I can't reproduce this issue. The only configuration difference is step 2, which instead runs: wget https://docs.projectcalico.org/v3.5/getting-started/kubernetes/installation/hosted/calicoctl.yaml

Steps to Reproduce (for bugs)

  1. kubeadm init --pod-network-cidr=192.169.0.0/16
  2. wget https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
  3. sed -i -e "s?192.168.0.0/16?192.169.0.0/16?g" calico.yaml
  4. kubectl apply -f calico.yaml
  5. kubectl logs calico-node-w2fhp -c upgrade-ipam -n kube-system

Context

Trying to set up a new Kubernetes cluster with a single master node and apply Calico to it.

Your Environment

Kubernetes 1.13
CentOS Linux 7 (Core)
Calico v3.6

Label: kind/bug

All 7 comments

@dnmgns thanks for raising - I think I've got a fix for this and I'll put a PR up soon.

I have a question, though: was this on nodes that previously had another networking installation (either Calico or something else)?

The code path that is being hit seems to indicate that Calico is attempting to migrate IPAM allocations from a previous installation, but I wouldn't expect that given the steps you outlined above.

@caseydavenport just to confirm, is the fix on this line: https://github.com/projectcalico/libcalico-go/blob/master/lib/backend/k8s/resources/node.go#L232 to ensure that the BGP object is always set?

This PR should do it, I think: https://github.com/projectcalico/cni-plugin/pull/710

I'm still a bit confused as to why this code path is actually getting hit, but it seems right to protect against it!
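The panic at migrate.go:77 follows the "retrieving node for IPIP tunnel address" log line, and the linked libcalico-go question suggests the node's BGP object was not set. The Go sketch below illustrates that failure mode and one plausible shape for a guard. It is not the code from the linked PR. The `Node`, `NodeSpec` and `NodeBGPSpec` types are hypothetical, simplified stand-ins, not the libcalico-go definitions. The helpers `unsafeTunnelAddr`, `tunnelAddr` and `didPanic` are also hypothetical.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for Calico's node types. They only
// mirror the relevant shape: an optional (pointer) BGP spec on the node.
type NodeBGPSpec struct {
	IPv4IPIPTunnelAddr string
}

type NodeSpec struct {
	BGP *NodeBGPSpec // nil when the node has no BGP configuration
}

type Node struct {
	Name string
	Spec NodeSpec
}

// unsafeTunnelAddr dereferences Spec.BGP without a nil check, so it panics
// with "invalid memory address or nil pointer dereference" when BGP is unset.
func unsafeTunnelAddr(n *Node) string {
	return n.Spec.BGP.IPv4IPIPTunnelAddr
}

// tunnelAddr guards against a missing node or BGP spec and reports whether
// a tunnel address was found.
func tunnelAddr(n *Node) (string, bool) {
	if n == nil || n.Spec.BGP == nil || n.Spec.BGP.IPv4IPIPTunnelAddr == "" {
		return "", false
	}
	return n.Spec.BGP.IPv4IPIPTunnelAddr, true
}

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	bare := &Node{Name: "node-without-bgp"}
	withBGP := &Node{
		Name: "node-with-bgp",
		Spec: NodeSpec{BGP: &NodeBGPSpec{IPv4IPIPTunnelAddr: "192.169.0.1"}},
	}

	fmt.Println(didPanic(func() { unsafeTunnelAddr(bare) })) // true

	addr, ok := tunnelAddr(bare)
	fmt.Printf("%q %v\n", addr, ok) // "" false

	addr, ok = tunnelAddr(withBGP)
	fmt.Printf("%q %v\n", addr, ok) // "192.169.0.1" true
}
```

With the guard, a migration step can treat "no BGP spec" as "no tunnel address to migrate" instead of crashing the init container.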

I'm almost certain that it didn't have any other networking installation. If it did, it could only have been Calico, since I haven't installed any other network plugin in this cluster. I ran kubeadm reset between tests, but it may have left behind some state that triggered the upgrade path.

I was also able to reproduce the issue just now while performing an actual upgrade from v3.5.3:

2019-03-21 20:16:36.025 [INFO][1] migrate.go 70: retrieving node for IPIP tunnel address
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x11963d8]

goroutine 1 [running]:
github.com/projectcalico/cni-plugin/pkg/upgrade.Migrate(0x1654b20, 0xc0000b6000, 0x1667ea0, 0xc0003a2b40, 0xc00003e015, 0x12, 0x0, 0x0)
    /go/src/github.com/projectcalico/cni-plugin/pkg/upgrade/migrate.go:77 +0x2d8
github.com/projectcalico/cni-plugin/pkg/ipamplugin.Main(0x1629e74, 0x6)
    /go/src/github.com/projectcalico/cni-plugin/pkg/ipamplugin/ipam_plugin.go:91 +0x458
main.main()
    /go/src/github.com/projectcalico/cni-plugin/cmd/calico-ipam/calico-ipam.go:25 +0x39

@dnmgns Could you try again using this image: calico/cni:v3.6.0-2-gede7889 ?

That should include my fix - it would be useful to see if it fixes your issue before cutting a release, since I haven't been able to reproduce myself.

@caseydavenport - I'm happy to say that using calico/cni:v3.6.0-2-gede7889 fixes my issue!

Checking the logs for the calico-node-* pods, I can no longer find this error, and both calico-node-* and calico-kube-controllers are running fine.

# calicoctl version
Client Version:    v3.6.0
Git commit:        eab86dc2
Cluster Version:   v3.6.0
Cluster Type:      k8s,bgp,kdd

Just let me know in case you'd like some more information.

Thanks!

Fix for this should be in v3.6.1, released today.
