Website: Unclear definition of the --horizontal-pod-autoscaler-initial-readiness-delay flag

Created on 15 Feb 2019 · 21 Comments · Source: kubernetes/website

Hello,
In the Horizontal Pod Autoscaler documentation, the --horizontal-pod-autoscaler-initial-readiness-delay flag has an unclear definition, which makes it very difficult to understand:

Due to technical constraints, the HorizontalPodAutoscaler controller cannot exactly determine the first time a pod becomes ready when determining whether to set aside certain CPU metrics. Instead, it considers a Pod "not yet ready" if it's unready and transitioned to unready within a short, configurable window of time since it started. This value is configured with the --horizontal-pod-autoscaler-initial-readiness-delay flag, and its default is 30 seconds. Once a pod has become ready, it considers any transition to ready to be the first if it occurred within a longer, configurable time since it started. This value is configured with the --horizontal-pod-autoscaler-cpu-initialization-period flag, and its default is 5 minutes.

https://github.com/kubernetes/website/blob/master/content/en/docs/tasks/run-application/horizontal-pod-autoscale.md

It doesn't specify how it retrieves the pod's status. Is it with the readiness probe?
What happens if the pod is ready before the end of the delay?
Why not configure the delay to 0 or 1 second?
Does it affect the routing delay in any way?

Thank you for clarifying

language/en lifecycle/frozen priority/backlog sig/autoscaling triage/accepted

Kanshiroron

👍20

Most helpful comment

Yeah, would just like to add that this is really confusing :| I'm specifically interested in this:

What happens if the pod is ready before the end of the delay?

Considering both delays here: --horizontal-pod-autoscaler-cpu-initialization-period and --horizontal-pod-autoscaler-initial-readiness-delay.

Might be because I'm not a native English speaker but the paragraph seems contradictory as well?

For example:

Due to technical constraints, the HorizontalPodAutoscaler controller cannot exactly determine the first time a pod becomes ready when determining whether to set aside certain CPU metrics. Instead, it considers a Pod "not yet ready" if it's unready and transitioned to unready within a short, configurable window of time since it started. This value is configured with the --horizontal-pod-autoscaler-initial-readiness-delay flag, and its default is 30 seconds

Ok, I can kind of get that, although it's not very clear what happens in this scenario:

What if my pod is ready in the first second, does HPA see it as ready?
And what if it becomes unready at two seconds, will HPA see it as unready?
And what if it becomes ready again at three seconds...?

Technically, that's what it says in the documentation! It says it's _not yet ready_ only if it's _unready_... so I should assume that if the pod is ready, even if briefly, it will be ready, which would cause all kinds of absurd scenarios, like above.

This does not make a lot of sense to me so I _assume_ it waits until --horizontal-pod-autoscaler-initial-readiness-delay has elapsed _before_ HPA considers a pod _ready_, even if Kubernetes considers it ready before that. But that should've been explicit in the documentation.

Ok, moving on.

Once a pod has become ready, it considers any transition to ready to be the first if it occurred within a longer, configurable time since it started. This value is configured with the --horizontal-pod-autoscaler-cpu-initialization-period flag, and its default is 5 minutes.

So, this says that _any transition to ready will be the first_ if it occurs before --horizontal-pod-autoscaler-cpu-initialization-period has elapsed. First question: what does it mean to be the _first_? I couldn't find out why being the _first_ transition to ready matters.

Second, what happens if the pod never transitions to ready _before_ --horizontal-pod-autoscaler-cpu-initialization-period? Say it takes 5 minutes and 1 second to become ready? _To me_ this clearly states that to the HPA the pod _never_ becomes ready. :thinking:

I've tried searching on Google and in the Kubernetes Slack group and found no definitive answer to how these parameters work, although it _seems_ many people _believe_ --horizontal-pod-autoscaler-cpu-initialization-period sets a wait time for new pods, and prevents them from being scaled until this time passes (although I myself am not convinced). I'll see if I can run some tests in my clusters to at least get some ideas.

Ok, summing up my comments in the form of questions:

What happens if the pods are ready before the end of --horizontal-pod-autoscaler-initial-readiness-delay?
What happens if the pods are ready before the end of --horizontal-pod-autoscaler-cpu-initialization-period?
What is really the importance of being the "first" transition to ready, from HPA's perspective?
If my Java pod takes 3 minutes to start up and uses a lot of CPU and I want to make sure this CPU burst is not taken into account by HPA for scale up, which value should I set to 3 minutes? --horizontal-pod-autoscaler-initial-readiness-delay? --horizontal-pod-autoscaler-cpu-initialization-period? Both? Does it also matter when the readinessProbe returns successfully?

fernandrone on 25 Oct 2019

👍4

All 21 comments

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 16 May 2019

/remove-lifecycle stale

Kanshiroron on 16 May 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 14 Aug 2019

/remove-lifecycle stale

Kanshiroron on 3 Sep 2019

/priority backlog

sftim on 10 Sep 2019

/sig autoscaling

sftim on 25 Oct 2019

I was trying to find the same information.
The relevant code for this is here if it helps anyone:
https://github.com/kubernetes/kubernetes/blob/30c9f097ca4a26dab9085832e006f09cb2993dda/pkg/controller/podautoscaler/replica_calculator.go#L392
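
For anyone who doesn't read Go, here is a rough Python paraphrase of the readiness check in that file, as best it can be read. It is a sketch under assumptions, not the controller's actual code: `PodView`, `CpuSample` and `cpu_sample_is_set_aside` are names made up for illustration, the Ready condition is simplified to a boolean, the defaults are the documented 5 minutes and 30 seconds, and the logic may not match the linked revision line for line.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PodView:
    """Hypothetical stand-in for the pod fields the check looks at."""
    start_time: Optional[datetime]       # pod start time
    ready: Optional[bool]                # Ready condition (None if missing)
    last_transition: Optional[datetime]  # last change of the Ready condition


@dataclass
class CpuSample:
    """Hypothetical stand-in for one CPU metric sample."""
    timestamp: datetime  # when the sample was taken
    window: timedelta    # how much time the sample covers


def cpu_sample_is_set_aside(
    pod: PodView,
    sample: CpuSample,
    now: datetime,
    cpu_initialization_period: timedelta = timedelta(minutes=5),
    initial_readiness_delay: timedelta = timedelta(seconds=30),
) -> bool:
    """Return True if the pod is treated as 'not yet ready' for CPU scaling."""
    # No start time or no Ready condition: treat the pod as unready.
    if pod.start_time is None or pod.ready is None or pod.last_transition is None:
        return True
    if now < pod.start_time + cpu_initialization_period:
        # Still inside the initialization period: set the sample aside if the
        # pod is unready, or if a full metric window has not yet passed since
        # the Ready condition last changed.
        return (not pod.ready) or (
            sample.timestamp < pod.last_transition + sample.window
        )
    # Past the initialization period: set the sample aside only if the pod is
    # unready and its last transition fell within initial_readiness_delay of
    # its start (it is then treated as never having been ready).
    return (not pod.ready) and (
        pod.last_transition < pod.start_time + initial_readiness_delay
    )


t0 = datetime(2019, 10, 25, 12, 0, 0)


def at(seconds: int) -> datetime:
    return t0 + timedelta(seconds=seconds)


window = timedelta(seconds=30)  # assumed sample window, for illustration only

# Pod that first reports Ready 3 minutes after start.
java_pod = PodView(start_time=t0, ready=True, last_transition=at(180))
print(cpu_sample_is_set_aside(java_pod, CpuSample(at(200), window), now=at(200)))  # True
print(cpu_sample_is_set_aside(java_pod, CpuSample(at(240), window), now=at(240)))  # False

# Unready pod whose last transition was 10 s after start, checked after 5 min.
never_ready = PodView(start_time=t0, ready=False, last_transition=at(10))
print(cpu_sample_is_set_aside(never_ready, CpuSample(at(400), window), now=at(400)))  # True

# Pod that went unready 2 minutes after start, checked after 5 min.
was_ready = PodView(start_time=t0, ready=False, last_transition=at(120))
print(cpu_sample_is_set_aside(was_ready, CpuSample(at(400), window), now=at(400)))  # False
```

If this reading is right, a pod that flaps between ready and unready in its first seconds has its CPU samples set aside until one full metric window has passed since the last flip. A slow-starting pod's CPU burst is ignored for as long as the pod reports unready during the initialization period.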

krunalnsoni on 16 Dec 2019

We are also trying to figure out this issue. Would be happy to know if there are answers to the above questions.

Dafnafrank on 29 Jan 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 28 Apr 2020

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot on 28 May 2020

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot on 27 Jun 2020

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot on 27 Jun 2020

/reopen
/lifecycle frozen
/language en
The doc needs some improvement, with help from SIG Autoscaling. We have got quite a few votes for improving the HPA docs.

tengqm on 28 Jun 2020

@tengqm: Reopened this issue.

In response to this:

/reopen
/lifecycle frozen
/language en
The doc needs some improvement, with help from SIG Autoscaling. We have got quite a few votes for improving the HPA docs.