Running some tests to see if I can trigger #2154 eventually, and I got a nil pointer panic:
I0724 23:38:52.424553 1 scale_up.go:420] No expansion options
I0724 23:38:52.425447 1 scale_down.go:428] Scale-down calculation: ignoring 10 nodes unremovable in the last 5m0s
I0724 23:39:02.590388 1 static_autoscaler.go:255] 1 unregistered nodes present
I0724 23:39:02.598031 1 utils.go:619] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0724 23:39:02.599269 1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-qtvzm is unschedulable
I0724 23:39:02.599297 1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-hz4vh is unschedulable
I0724 23:39:02.599305 1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-bh55h is unschedulable
I0724 23:39:02.599311 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9kvvr is unschedulable
I0724 23:39:02.599346 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-78rlf is unschedulable
I0724 23:39:02.599386 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-d9c7p is unschedulable
I0724 23:39:02.599395 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-2g9jm is unschedulable
I0724 23:39:02.599434 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-jwhvw is unschedulable
I0724 23:39:02.599458 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9r9d6 is unschedulable
I0724 23:39:02.599472 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-w84zk is unschedulable
I0724 23:39:02.599483 1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-jdrqq is unschedulable
I0724 23:39:02.599525 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-t68dc is unschedulable
I0724 23:39:02.599548 1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-5p7rp is unschedulable
I0724 23:39:02.599556 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-vnx44 is unschedulable
I0724 23:39:02.599566 1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-2rwxf is unschedulable
I0724 23:39:02.599573 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9mjn5 is unschedulable
I0724 23:39:02.599622 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-l28rl is unschedulable
I0724 23:39:02.599644 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-4648h is unschedulable
I0724 23:39:02.599651 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-8d4bg is unschedulable
I0724 23:39:02.599659 1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-bvjbr is unschedulable
I0724 23:39:02.599713 1 scale_up.go:268] 7 other pods are also unschedulable
I0724 23:39:02.606144 1 scale_up.go:420] No expansion options
I0724 23:39:02.702743 1 scale_down.go:428] Scale-down calculation: ignoring 10 nodes unremovable in the last 5m0s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]
goroutine 50 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc000197a20, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc000197a20, 0xc000863280, 0xb, 0x10, 0xc001b1fda0, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc000ea1360, 0xc000863280, 0xb, 0x10, 0xc001b1fda0, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0xc001fc28c0, 0x5)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000ea1360, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc0002c80a0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc000458f00)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xemor
This was cluster-autoscaler 1.15.1 on a Kubernetes 1.16.0-alpha.1 cluster on Azure, provisioned with aks-engine.
This may not be reproducible easily, but I'm posting the panic here in case there's an obvious if foo == nil check that we can add.
/kind bug
I also encountered this error:
I0820 20:07:42.849315 1 static_autoscaler.go:398] ip-10-3-198-184.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849333 1 static_autoscaler.go:398] ip-10-3-220-20.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849347 1 static_autoscaler.go:398] ip-10-3-75-152.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849360 1 static_autoscaler.go:398] ip-10-3-190-161.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849374 1 static_autoscaler.go:398] ip-10-3-161-213.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849388 1 static_autoscaler.go:398] ip-10-3-153-189.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849426 1 static_autoscaler.go:398] ip-10-3-71-239.ec2.internal is unneeded since 2019-08-20 20:04:31.122629411 +0000 UTC m=+105.408520828 duration 3m11.71988856s
I0820 20:07:42.849443 1 static_autoscaler.go:398] ip-10-3-178-147.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849458 1 static_autoscaler.go:409] Scale down status: unneededOnly=true lastScaleUpTime=2019-08-20 20:02:50.03392415 +0000 UTC m=+4.319815527 lastScaleDownDeleteTime=2019-08-20 20:02:50.03392427 +0000 UTC m=+4.319815647 lastScaleDownFailTime=2019-08-20 20:02:50.0339244 +0000 UTC m=+4.319815777 scaleDownForbidden=false isDeleteInProgress=false
I0820 20:07:52.892635 1 static_autoscaler.go:187] Starting main loop
I0820 20:07:52.893537 1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: [kube-test-dev-iad-nodes-mixed kube-test-dev-iad-nodes-mixed-1804]
I0820 20:07:53.064726 1 auto_scaling_groups.go:102] Updating ASG kube-test-dev-iad-nodes-mixed
I0820 20:07:53.064892 1 aws_manager.go:255] Refreshed ASG list, next refresh after 2019-08-20 20:08:53.064886765 +0000 UTC m=+367.350778202
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]
goroutine 42 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc00100af20, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc00100af20, 0xc00167d400, 0x11, 0x20, 0xc000572d50, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc0007d8c80, 0xc00167d400, 0x11, 0x20, 0xc000572d50, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0xc000f5a2a0, 0x3)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc0007d8c80, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc00012e140)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc0001ab100)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec
I see the exact same sigsegv on a 1.15.3 cluster running 1.15.1 ca
I0821 18:11:35.888036 1 node_instances_cache.go:155] Start refreshing cloud provider node instances cache
I0821 18:11:35.888097 1 node_instances_cache.go:167] Refresh cloud provider node instances cache finished, refresh took 40.059碌s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]
goroutine 2518 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc0002ee580, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc0002ee580, 0xc00bd9ee00, 0x2a, 0x40, 0xc00c76a330, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc0045901e0, 0xc00bd9ee00, 0x2a, 0x40, 0xc00c76a330, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0xc00d14a900, 0x12)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc0045901e0, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc000176140)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc0003e0240)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec
I believe this is fixed by #2096.
@pdbogen While you're probably right, #2096 was merged to master back in mid June and it hasn't been merged into a cluster-autoscaler release branch yet.
Is it possible to get it merged?
Or perhaps i should be tagging @bskiba apologies
Fixed with #2096
Most helpful comment
@pdbogen While you're probably right, #2096 was merged to master back in mid June and it hasn't been merged into a cluster-autoscaler release branch yet.
Is it possible to get it merged?
Or perhaps i should be tagging @bskiba apologies