Autoscaler: Autoscaler fails to scale up nodes with pending pods

Created on 5 Jul 2018 · 32 Comments · Source: kubernetes/autoscaler

We are currently running a cluster-autoscaler in AWS, on a kops cluster. We had autoscaling enabled for an ASG where each node would have a taint and a label applied. We would then only schedule pods onto it that had a toleration for that taint.

Our setup was working correctly, until a few days ago, when scale up stopped happening (scale downs were working correctly). Most of the logs we saw were like this:

16:25:53.494 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.493718       1 utils.go:130] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-26tbkn marked as unschedulable can be scheduled on ip-10-79-150-148.ec2.internal. Ignoring in scale up.
16:25:53.494 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.494043       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-24mkp5 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.494 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.494225       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2xqhmb marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.494 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.494391       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2lb9tj marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.494 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.494558       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2grt2g marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.494 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.494747       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2pkg4w marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.494 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.494908       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2l5dk8 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.495 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.495072       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-22wjf9 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.495 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.495244       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2kv4xp marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.495 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.495404       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-25q2bq marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.495 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.495565       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2nqk9j marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.495 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.495733       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-26v2sv marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.496 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.495904       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2dd65s marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.496 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.496062       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2252x4 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.496 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.496226       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2znsb4 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.496 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.496390       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2d6xbs marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.496 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.496553       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-24b289 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.496 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.496723       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-29b6sc marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.497 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.496923       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2j5bkz marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.497 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.497095       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-28rfln marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.497 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.497295       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2pbfxx marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.497 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.497470       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2g46tk marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.497 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.497668       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-22fg6q marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.590 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.590031       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2xcdsf marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.690 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.689812       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2zqwmh marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.690 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.690170       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2dbksl marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.690 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.690435       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2z9lv9 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.690 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.690673       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-265h6w marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.691 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.690949       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2l8cvk marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
16:25:53.691 kops-k8s-cluster.services.opendoor.com I0703 23:25:53.691222       1 utils.go:125] Pod spark-worker-test-pyspark-notebook-jupyter-signals-spark-2l2klb marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.

It is unclear why the autoscaler thinks these pods are schedulable: when we disabled autoscaling and manually increased the node count, those pods were scheduled successfully.

We are running kubernetes 1.9.9 and the autoscaler with the following flags:

./cluster-autoscaler --cloud-provider=aws --namespace=kube-system --nodes=1:50:honeycomb-worker.kops-k8s-cluster.services.opendoor.com --v=4 --alsologtostderr=true

It is running using this docker image k8s.gcr.io/cluster-autoscaler:v1.1.2. Any help in how to fix this would be greatly appreciated since we would love to re-enable cluster autoscaler.

Context

We use Kubernetes and the cluster autoscaler to periodically schedule our cronjobs through the k8s API. This means that we create and delete hundreds of pods over a ~30 min period. Some of these are Spark pods, which may require large amounts of memory (~30 GB plus a couple of cores).

cluster-autoscaler


gustavoatt

👍3

All 32 comments

Sounds like a possible case of scheduler's cache corruption. If it is the case, restarting kube-scheduler should resolve the issue. Before restarting, you can also trigger scheduler to compare its cache with fresh information as described here.

aleksandra-malinowska on 10 Jul 2018

@aleksandra-malinowska I think we're suffering from the same issue. Our Kubernetes cluster is built with kops. I'm a bit unsure how to restart the kube-scheduler; what is the command to do it? I tried deleting the kube-scheduler pods, but I don't think that really restarted it.

donalddewulf on 19 Jul 2018

I'm not sure about kops setup and how scheduler in particular is started there. If it's a manifest pod, the easiest way is to edit the manifest.

aleksandra-malinowska on 19 Jul 2018

Thanks @aleksandra-malinowska
I will look into it, if I find something, I'll put it here.

donalddewulf on 23 Jul 2018

@donalddewulf I think I'm facing the same issue. Did you manage to find anything?

messiahUA on 18 Oct 2018

@messiahUA eventually we built a new cluster next to our current one, and we switched traffic from one cluster to the other. We tried restarting kube-scheduler I think by ssh'ing into the master node, but there was nothing to be done for us.

donalddewulf on 18 Oct 2018

We had the same issue: a pod could not be placed on a specific node, and the autoscaler failed to scale up.

As far as we understood, the issue was with the node on which the pod was supposed to run. The log message "Pod ABC marked as unschedulable can be scheduled on ip-1-2-3-4.internal. Ignoring in scale up." basically says that pod ABC is supposed to fit on ip-1-2-3-4.internal, but for some reason it is not placed there and is left in the Pending state. Meanwhile, the autoscaler will not scale up to accommodate this pod on another node, because as far as it is concerned the pod already has a node to go to, namely ip-1-2-3-4.internal.

So we manually terminated the node ip-1-2-3-4.internal (of course, the other pods running on this node were also rescheduled, and yes, there was downtime) and everything started to work fine: the autoscaler tried to place pod ABC on another node, and scaled up when no node was available.
Another workaround is to cordon the node ip-1-2-3-4.internal so that no new pods are scheduled onto it. The autoscaler is then forced to look for a new node for pod ABC. This also worked.

We came to the above conclusion after debugging as follows:
-> We deployed more pods (with different labels and resource requests than the original pending pod) into the same cluster. These pods ran successfully after the autoscaler scaled up the nodes; the issue did not occur with the new pods.
-> We cordoned the node that was supposed to host the pending pod, and the pending pod was assigned to another node, where it ran successfully.

el-sai on 14 Nov 2018

The message "Pod ABC marked as unschedulable can be scheduled on ip-1-2-3-4.internal. Ignoring in scale up." means that CA and the scheduler disagree about the state of the cluster.
Basically, CA believes that pod ABC should have been scheduled on ip-1-2-3-4.internal, but for some reason the scheduler does not schedule it there. This is an unexpected situation, yet it is possible if, for example, the cluster uses a non-standard scheduler while CA simulates the behavior of the standard one. The safe approach is not to add extra nodes in such a situation, as it is a hint that CA cannot simulate scheduler operation correctly in the given cluster.

With all that said, this message typically means that we have actually triggered a bug in the scheduler that manifests as an inconsistency in its internal cache. With an inconsistent cache, the scheduler believes that pod ABC is not schedulable anywhere, despite the fact that there is room for it on ip-1-2-3-4.internal.
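To make the explanation above concrete, here is a minimal sketch of the "filter out schedulable pods" idea. It is not the autoscaler's actual code: the `Node` and `Pod` classes, the `fits` check, and `filter_out_schedulable` are hypothetical names invented for illustration, and the real simulation evaluates many more scheduler predicates than the three modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int   # allocatable CPU minus existing requests, in millicores
    free_mem_mi: int  # allocatable memory minus existing requests, in MiB
    taints: set = field(default_factory=set)
    cordoned: bool = False


@dataclass
class Pod:
    name: str
    cpu_m: int
    mem_mi: int
    tolerations: set = field(default_factory=set)


def fits(pod, node):
    """Rough stand-in for scheduler predicates: cordon, taints, resources."""
    if node.cordoned:
        return False
    if not node.taints <= pod.tolerations:  # every taint must be tolerated
        return False
    return pod.cpu_m <= node.free_cpu_m and pod.mem_mi <= node.free_mem_mi


def filter_out_schedulable(pending, nodes):
    """Split pending pods into (ignored_in_scale_up, needs_scale_up)."""
    ignored, needs_scale_up = [], []
    for pod in pending:
        target = next((n for n in nodes if fits(pod, n)), None)
        if target is not None:
            # Corresponds to "... can be scheduled on <node>. Ignoring in scale up."
            ignored.append((pod.name, target.name))
        else:
            needs_scale_up.append(pod.name)
    return ignored, needs_scale_up


nodes = [Node("ip-1-2-3-4.internal", free_cpu_m=2000, free_mem_mi=4096)]
pending = [Pod("ABC", cpu_m=500, mem_mi=512)]
print(filter_out_schedulable(pending, nodes))
```

In this toy model, a pod that the real scheduler refuses to place still lands in the "ignored" list as long as the simulation finds room for it, so no scale-up is requested. Setting `nodes[0].cordoned = True` moves pod ABC into the second list, which matches the cordon workaround described earlier in the thread.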

losipiuk on 15 Nov 2018

👍1

Our CA version is 1.0.6 and our kube-scheduler version is 1.8.3, which is the combination recommended in the CA README. Still, it looks like CA and the scheduler disagree about the state of the cluster.
We thought it was an issue with the node, because the problem appears only with specific nodes and not with others, and it seems to happen randomly. For example, when we scaled other pods, they were placed on nodes added by CA and the behaviour was normal (CA could simulate scheduler operation correctly in the given cluster).
It is probably the internal cache inconsistency, but we didn't know how to force a cache refresh manually. The quick fix we came up with was terminating the node, which forced CA to look for another node for pod ABC, and everything seems to work normally again. The underlying issue persists, though, and we are not sure when it will show up again.

el-sai on 15 Nov 2018

Hi, we are also facing the same issue, and it is very frequent when we scale from 100 to 200 pods instantly.

I0117 09:44:09.261508       1 static_autoscaler.go:236] Filtering out schedulables
I0117 09:44:09.268581       1 utils.go:142] Pod pubsub-subscriber-69955d646b-h57qx marked as unschedulable can be scheduled on xxxxxxxxxxxs . Ignoring in scale up.
I0117 09:44:09.269800       1 utils.go:128] Pod pubsub-subscriber-69955d646b-88prm marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.270531       1 utils.go:128] Pod pubsub-subscriber-69955d646b-t9g8q marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.271612       1 utils.go:128] Pod pubsub-subscriber-69955d646b-lvtxf marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.272019       1 utils.go:128] Pod pubsub-subscriber-69955d646b-lb2vd marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.272363       1 utils.go:128] Pod pubsub-subscriber-69955d646b-trbnm marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.272982       1 utils.go:128] Pod pubsub-subscriber-69955d646b-jkf9s marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.275576       1 utils.go:128] Pod pubsub-subscriber-69955d646b-x9flm marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.277442       1 utils.go:128] Pod pubsub-subscriber-69955d646b-mbzv2 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.283851       1 utils.go:128] Pod pubsub-subscriber-69955d646b-74qs5 marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.284081       1 utils.go:128] Pod pubsub-subscriber-69955d646b-t2mvn marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.284635       1 utils.go:128] Pod pubsub-subscriber-69955d646b-97fzp marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.284777       1 utils.go:128] Pod pubsub-subscriber-69955d646b-bjlgp marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.
I0117 09:44:09.284923       1 utils.go:128] Pod pubsub-subscriber-69955d646b-8hhcf marked as unschedulable can be scheduled (based on simulation run for other pod owned by the same controller). Ignoring in scale up.

Is it fixed, or is there some workaround apart from deleting the node?

Deepak1100 on 17 Jan 2019

If restarting scheduler helps, it's probably a cache issue. If it doesn't, one additional thing to check may be to look for pods with nominatedNodeName field set. On newer Kubernetes versions (1.11 and later), sometimes scheduler's preemption logic incorrectly marks node as nominated for a pod, but that pod is starved and never scheduled (blocking the space from being used ~forever).
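A minimal sketch of that check, assuming the input is the JSON document produced by `kubectl get pods --all-namespaces -o json`. The function name and the sample data are hypothetical; only the `status.phase` and `status.nominatedNodeName` fields come from the Kubernetes pod object.

```python
import json


def pods_with_nominated_node(pod_list):
    """Return (namespace, name, nominatedNodeName) for Pending pods with a nomination."""
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        nominated = status.get("nominatedNodeName")
        if status.get("phase") == "Pending" and nominated:
            meta = pod.get("metadata", {})
            found.append((meta.get("namespace"), meta.get("name"), nominated))
    return found


# Illustrative data only; in practice, load the kubectl output with json.load().
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"namespace": "default", "name": "pod-a"},
   "status": {"phase": "Pending", "nominatedNodeName": "node-1"}},
  {"metadata": {"namespace": "default", "name": "pod-b"},
   "status": {"phase": "Pending"}},
  {"metadata": {"namespace": "default", "name": "pod-c"},
   "status": {"phase": "Running"}}
]}
""")

print(pods_with_nominated_node(SAMPLE))
```

Any pod this returns is a candidate for the starvation scenario described above: it is Pending, yet a node is being held for it.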

aleksandra-malinowska on 17 Jan 2019

@aleksandra-malinowska I tried restarting all the kube-scheduler pods, but CA didn't start scaling up. I saw a few pods with nominated nodes, but the pods were not scheduled on them. The node had space for at least one pending pod. Those pending pods have different CPU limits and requests; can that cause a problem for the scheduler, like overcommitting CPU?

I had two controllers with pods in the Pending state: one was my application and the other was the overprovisioning pods. After I scaled down the overprovisioning deployment, CA started scaling up nodes.

do u have any suggestion what can we do in case of pods having nominated node?

Deepak1100 on 18 Jan 2019

@Deepak1100 as a workaround I can suggest to also try killing scheduler leader in such a way that leadership would transfer to another pod.

messiahUA on 18 Jan 2019

do u have any suggestion what can we do in case of pods having nominated node?

Workaround: clear up this field in the pod or delete this pod.

Long-term solution: wait for it to be fixed in patch release (1.11.7 looks promising, but we haven't confirmed yet) and upgrade once it's available.

aleksandra-malinowska on 18 Jan 2019

@messiahUA I have tried deleting the leader scheduler pod, and all scheduler pods as well, but neither CA nor kube-scheduler did anything.
@aleksandra-malinowska we are on 1.12.1, so I assume the patch you are talking about will come for the 1.12 CA version as well?

Deepak1100 on 18 Jan 2019

@Deepak1100 make sure leadership has been taken by another pod. If you are simply deleting pods same leader can continue its work. If you are sure that another scheduler became leader and you still have the problem, then I have no more suggestions unfortunately.
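One way to check this could be to compare the recorded leader before and after the restart. The sketch below assumes endpoints-based leader election, where the current holder is stored as JSON in the `control-plane.alpha.kubernetes.io/leader` annotation of the `kube-scheduler` Endpoints object in `kube-system` (newer clusters may use Lease objects instead). The function names and identity strings are hypothetical, and the input is assumed to be the object as returned by `kubectl -n kube-system get endpoints kube-scheduler -o json`.

```python
import json

LEADER_ANNOTATION = "control-plane.alpha.kubernetes.io/leader"


def leader_identity(endpoints_obj):
    """Extract holderIdentity from the leader-election annotation, if present."""
    annotations = endpoints_obj.get("metadata", {}).get("annotations", {})
    raw = annotations.get(LEADER_ANNOTATION)
    if raw is None:
        return None
    return json.loads(raw).get("holderIdentity")


def leadership_moved(before, after):
    """True only if both snapshots name a leader and the two leaders differ."""
    old, new = leader_identity(before), leader_identity(after)
    return old is not None and new is not None and old != new


def snapshot(identity):
    # Helper to build an illustrative Endpoints object; identities are made up.
    record = json.dumps({"holderIdentity": identity})
    return {"metadata": {"annotations": {LEADER_ANNOTATION: record}}}


print(leadership_moved(snapshot("master-a"), snapshot("master-b")))
```

The comparison is on the whole identity string; the sketch makes no assumption about how that string is composed.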

messiahUA on 18 Jan 2019

Yes, fixes related to it are being backported to 1.11, 1.12 and 1.13.

If these pods have nominatedNodeName set, restarting scheduler simply won't be enough. The issue is rather complex - a combination of race condition causing the system to get into this state and starvation preventing it from getting out of it. Newer patch releases should behave better (especially with regards to recovering), but it's not completely resolved yet.

aleksandra-malinowska on 18 Jan 2019

Hi,
I have recently started encountering this issue as well (possibly as a side effect of kube upgrade to 1.11.6 from 1.11.5 )

What we have seen is as follows:

Pod is stuck in pending state with a message like this:
~
Events:
Type     Reason            Age                     From               Message
----     ------            ----                    ----               -------
Warning  FailedScheduling  2m27s (x4703 over 12m)  default-scheduler  0/78 nodes are available: 70 Insufficient cpu, 8 node(s) had taints that the pod didn't tolerate.
~
If we look at one of our nodes (and this is the case with all of our nodes), the node is in no position to host the pod. Our pod has a CPU request of 1600m. Putting this pod on the node would result in a total CPU reservation of 3885m, which is around 97% of the node's CPU capacity.

~
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource           Requests      Limits
--------           --------      ------
cpu                2285m (57%)   0 (0%)
memory             2430Mi (15%)  0 (0%)
ephemeral-storage  0 (0%)        0 (0%)
~
Our node has 4 CPU and 16GB RAM
~
Capacity:
  cpu: 4
  ephemeral-storage: 130046416Ki
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 16431360Ki
  pods: 58
Allocatable:
  cpu: 4
  ephemeral-storage: 119850776788
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 16328960Ki
  pods: 58
~

The kube-scheduler seems to have made the right decision by not scheduling the pod on this node, since the pod has a CPU request of 1600m.
However, cluster-autoscaler disagrees and refuses to scale up, with the following messages:
~
I0122 13:23:11.506208 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-73-48.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:23:21.942260 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-69-250.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:23:32.036320 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-63-164.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:23:42.173093 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-78-7.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:23:52.292982 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-63-164.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:24:02.415322 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-107-161.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:24:12.560691 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-107-161.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:24:22.731733 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-55-90.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:24:32.858460 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-107-161.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:24:42.949864 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-69-250.eu-west-1.compute.internal. Ignoring in scale up.
I0122 13:24:53.069565 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-107-161.eu-west-1.compute.internal. Ignoring in scale up.
~

We checked the nodes in the list above (just as mentioned in step 2) and none of them was in a position to schedule the pod.
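When there are many such lines, it can help to tally which nodes the autoscaler names, so that each one can be inspected. The sketch below is an assumption-laden helper, not part of any tool: the regular expression is written against the log lines quoted in this thread and may need adjusting for other autoscaler versions.

```python
import re
from collections import Counter

# Matches only the lines that name a concrete node, not the
# "based on simulation run for other pod" variant.
DIRECT = re.compile(
    r"Pod (?P<pod>\S+) marked as unschedulable can be scheduled on "
    r"(?P<node>\S+?)\s*\. Ignoring in scale up\."
)


def nodes_named_in_log(lines):
    """Count how often each node is named as a fit for a pending pod."""
    counts = Counter()
    for line in lines:
        match = DIRECT.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts


LOG = [
    "I0122 13:23:11.506208 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-73-48.eu-west-1.compute.internal. Ignoring in scale up.",
    "I0122 13:23:32.036320 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-63-164.eu-west-1.compute.internal. Ignoring in scale up.",
    "I0122 13:23:52.292982 1 utils.go:133] Pod api-v2-app-7bb4cddff4-pz8ml marked as unschedulable can be scheduled on ip-172-24-63-164.eu-west-1.compute.internal. Ignoring in scale up.",
]

for node, count in nodes_named_in_log(LOG).most_common():
    print(count, node)
```

Each node in the result can then be compared against the output of `kubectl describe node` to see whether the pod's requests really fit.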

So to me it seems CA is making some mistake in its calculations, which causes it to assume that a pod can be scheduled on a node where kube-scheduler does not allow it to be scheduled, resulting in a deadlock.

Following are the versions of tools we are using :

kops : 1.11.0
Kubernetes : 1.11.6
CA : 1.3.4

harshal-shah on 22 Jan 2019

Putting this pod on the node would result in a total CPU reservation of 3885m, which is around 97% of the node's CPU capacity.

Allocatable CPU of node quoted is the same as capacity, i.e. 4. So yes, from autoscaler's perspective putting the pod there makes sense. Whether allocatable = capacity makes sense is another question :)
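The arithmetic behind this can be sketched as follows, using the figures quoted above (allocatable CPU `4`, existing requests `2285m`, pod request `1600m`). The helper names are hypothetical, and the check covers CPU requests only, whereas a real fit decision involves other resources and predicates.

```python
def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as '4', '0.5' or '2285m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def fits_by_cpu_requests(allocatable, already_requested, pod_request):
    """Return (total millicores after placement, whether it stays within allocatable)."""
    total = cpu_to_millicores(already_requested) + cpu_to_millicores(pod_request)
    return total, total <= cpu_to_millicores(allocatable)


total, ok = fits_by_cpu_requests("4", "2285m", "1600m")
share = 100 * total / cpu_to_millicores("4")
print(total, ok, f"{share:.1f}%")  # prints: 3885 True 97.1%
```

Because allocatable equals capacity here, 3885m is below the 4000m limit and the pod fits on paper, which is consistent with the autoscaler's view.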

What's not clear is why scheduler disagrees. If there are pods with nominatedNodeName, it would explain it. Can you check for these?

aleksandra-malinowska on 22 Jan 2019

@aleksandra-malinowska There is nothing in the nominated node column. And yes, allocatable = capacity does not make sense. There are kubelet flags to govern this, but we are using the defaults. Looking at the memory numbers, we can see that some memory is being held back; this might be clearer if the CPU units were shown in millicores.

harshal-shah on 22 Jan 2019

@aleksandra-malinowska do you know what the exact problem is? If so, can you please let me know?
I may be able to raise a PR, as this issue is giving us lots of problems when we scale all services at the same time.
I had pods in the Pending state with the nominated node column set, but there were a lot of those pods, and I didn't try deleting them because I had to scale up manually as it was urgent.

I'll try to reproduce this nominated node issue tomorrow and see if CA works after deleting them.

Deepak1100 on 1 Feb 2019

I strongly suspect this fix solves the issue w/ nominatedNodeName: https://github.com/kubernetes/kubernetes/pull/72895

1.13.3 should include it.

aleksandra-malinowska on 1 Feb 2019

I'm also on 1.11.6 and seeing this issue with GKE, without nominatedNodeName. 1.11.7 seems to have a lot of related fixes, just waiting for it to be released on GKE.

mlarrousse on 1 Feb 2019

Has anyone tested 1.11.7 yet? We are seeing this since our 1.11 upgrade (we think).

tcolgate on 15 Feb 2019

I'll need a few days to get confident, but 1.11.7 seems to resolve this for us.

tcolgate on 19 Feb 2019

@tcolgate can you confirm if it's fixed for you in 1.11.7?

terowz on 27 Feb 2019

We have confirmed that 1.11.7 works for us.


mlarrousse on 28 Feb 2019

👍1

Definitely haven't seen any problems on 1.11.7.

tcolgate on 28 Mar 2019

👍1

yea same for us. Been running for about 2-3 weeks now. Thanks

terowz on 28 Mar 2019

Yes, it's fixed for us as well.

Deepak1100 on 2 Apr 2019

Marking as fixed then. Thanks!
/close

bskiba on 2 Apr 2019

@bskiba: Closing this issue.
