Autoscaler: Nil pointer panic in CA 1.15.1

Created on 25 Jul 2019  路  6Comments  路  Source: kubernetes/autoscaler

Running some tests to see if I can trigger #2154 eventually, and I got a nil pointer panic:

I0724 23:38:52.424553       1 scale_up.go:420] No expansion options
I0724 23:38:52.425447       1 scale_down.go:428] Scale-down calculation: ignoring 10 nodes unremovable in the last 5m0s
I0724 23:39:02.590388       1 static_autoscaler.go:255] 1 unregistered nodes present
I0724 23:39:02.598031       1 utils.go:619] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0724 23:39:02.599269       1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-qtvzm is unschedulable
I0724 23:39:02.599297       1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-hz4vh is unschedulable
I0724 23:39:02.599305       1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-bh55h is unschedulable
I0724 23:39:02.599311       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9kvvr is unschedulable
I0724 23:39:02.599346       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-78rlf is unschedulable
I0724 23:39:02.599386       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-d9c7p is unschedulable
I0724 23:39:02.599395       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-2g9jm is unschedulable
I0724 23:39:02.599434       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-jwhvw is unschedulable
I0724 23:39:02.599458       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9r9d6 is unschedulable
I0724 23:39:02.599472       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-w84zk is unschedulable
I0724 23:39:02.599483       1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-jdrqq is unschedulable
I0724 23:39:02.599525       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-t68dc is unschedulable
I0724 23:39:02.599548       1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-5p7rp is unschedulable
I0724 23:39:02.599556       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-vnx44 is unschedulable
I0724 23:39:02.599566       1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-2rwxf is unschedulable
I0724 23:39:02.599573       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9mjn5 is unschedulable
I0724 23:39:02.599622       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-l28rl is unschedulable
I0724 23:39:02.599644       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-4648h is unschedulable
I0724 23:39:02.599651       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-8d4bg is unschedulable
I0724 23:39:02.599659       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-bvjbr is unschedulable
I0724 23:39:02.599713       1 scale_up.go:268] 7 other pods are also unschedulable
I0724 23:39:02.606144       1 scale_up.go:420] No expansion options
I0724 23:39:02.702743       1 scale_down.go:428] Scale-down calculation: ignoring 10 nodes unremovable in the last 5m0s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]

goroutine 50 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc000197a20, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0)
    /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc000197a20, 0xc000863280, 0xb, 0x10, 0xc001b1fda0, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0x0, 0x0)
    /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc000ea1360, 0xc000863280, 0xb, 0x10, 0xc001b1fda0, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0xc001fc28c0, 0x5)
    /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000ea1360, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0x0, 0x0)
    /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc0002c80a0)
    /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc000458f00)
    /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
    /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xemor

This was cluster-autoscaler 1.15.1 on a Kubernetes 1.16.0-alpha.1 cluster on Azure, provisioned with aks-engine.

This may not be reproducible easily, but I'm posting the panic here in case there's an obvious if foo == nil check that we can add.

cluster-autoscaler kinbug

Most helpful comment

@pdbogen While you're probably right, #2096 was merged to master back in mid June and it hasn't been merged into a cluster-autoscaler release branch yet.
Is it possible to get it merged?
Or perhaps i should be tagging @bskiba apologies

All 6 comments

/kind bug

I also encountered this error:

I0820 20:07:42.849315       1 static_autoscaler.go:398] ip-10-3-198-184.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849333       1 static_autoscaler.go:398] ip-10-3-220-20.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849347       1 static_autoscaler.go:398] ip-10-3-75-152.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849360       1 static_autoscaler.go:398] ip-10-3-190-161.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849374       1 static_autoscaler.go:398] ip-10-3-161-213.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849388       1 static_autoscaler.go:398] ip-10-3-153-189.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849426       1 static_autoscaler.go:398] ip-10-3-71-239.ec2.internal is unneeded since 2019-08-20 20:04:31.122629411 +0000 UTC m=+105.408520828 duration 3m11.71988856s
I0820 20:07:42.849443       1 static_autoscaler.go:398] ip-10-3-178-147.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849458       1 static_autoscaler.go:409] Scale down status: unneededOnly=true lastScaleUpTime=2019-08-20 20:02:50.03392415 +0000 UTC m=+4.319815527 lastScaleDownDeleteTime=2019-08-20 20:02:50.03392427 +0000 UTC m=+4.319815647 lastScaleDownFailTime=2019-08-20 20:02:50.0339244 +0000 UTC m=+4.319815777 scaleDownForbidden=false isDeleteInProgress=false
I0820 20:07:52.892635       1 static_autoscaler.go:187] Starting main loop
I0820 20:07:52.893537       1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: [kube-test-dev-iad-nodes-mixed kube-test-dev-iad-nodes-mixed-1804]
I0820 20:07:53.064726       1 auto_scaling_groups.go:102] Updating ASG kube-test-dev-iad-nodes-mixed
I0820 20:07:53.064892       1 aws_manager.go:255] Refreshed ASG list, next refresh after 2019-08-20 20:08:53.064886765 +0000 UTC m=+367.350778202
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]

goroutine 42 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc00100af20, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc00100af20, 0xc00167d400, 0x11, 0x20, 0xc000572d50, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc0007d8c80, 0xc00167d400, 0x11, 0x20, 0xc000572d50, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0xc000f5a2a0, 0x3)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc0007d8c80, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc00012e140)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc0001ab100)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec

I see the exact same sigsegv on a 1.15.3 cluster running 1.15.1 ca

I0821 18:11:35.888036       1 node_instances_cache.go:155] Start refreshing cloud provider node instances cache
I0821 18:11:35.888097       1 node_instances_cache.go:167] Refresh cloud provider node instances cache finished, refresh took 40.059碌s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]

goroutine 2518 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc0002ee580, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc0002ee580, 0xc00bd9ee00, 0x2a, 0x40, 0xc00c76a330, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc0045901e0, 0xc00bd9ee00, 0x2a, 0x40, 0xc00c76a330, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0xc00d14a900, 0x12)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc0045901e0, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc000176140)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc0003e0240)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec

I believe this is fixed by #2096.

@pdbogen While you're probably right, #2096 was merged to master back in mid June and it hasn't been merged into a cluster-autoscaler release branch yet.
Is it possible to get it merged?
Or perhaps i should be tagging @bskiba apologies

Fixed with #2096

Was this page helpful?
0 / 5 - 0 ratings

Related issues

davidquarles picture davidquarles  路  7Comments

clamoriniere picture clamoriniere  路  5Comments

hjkatz picture hjkatz  路  4Comments

hadifarnoud picture hadifarnoud  路  7Comments

chapati23 picture chapati23  路  4Comments