cc @kubernetes/autoscaling @jszczepkowski @derekwaynecarr @smarterclayton
@DirectXMan12 any updates on this issue? Can you provide its current status and update the checkboxes above?
The proposal is posted, but has not been approved yet. We only recently reached general consensus on the design and are still finalizing the exact semantics. It should be removed from the 1.5 milestone, since no code will land in 1.5.
@DirectXMan12 thank you for clarifying.
@DirectXMan12 please provide us with the release notes and documentation PR (or links) at https://docs.google.com/spreadsheets/d/1nspIeRVNjAQHRslHQD1-6gPv99OcYZLMezrBe3Pfhhg/edit#gid=0
@DirectXMan12 please provide us with the release notes and documentation PR (or links)
done :-)
@DirectXMan12 @mwielgus What is planned for HPA in 1.8?
@MaciekPytel @kubernetes/sig-autoscaling-misc
Looking forward to the beta launch!
@bgrant0607 we're hoping to move v2 to beta in 1.8 (so just stabilization :-) ).
Regarding the functionality - am I correct in understanding that if you wanted to scale on a load indicator that is fundamentally external to Kubernetes, you would need to
1) create some kind of proxy API object inside the cluster (maybe a CRD?) that reflects the load indicator in a manner that the Kubernetes metrics pipeline can consume
2) create an HPA with MetricSourceType == "Object" and point to the proxy object
@davidopp That is incorrect. In order to scale on a load indicator that's not one of the metrics provided by the resource metrics API (CPU, memory), you need to have some implementation of the custom metrics API (see k8s.io/metrics and kubernetes-incubator/custom-metrics-apiserver).
Then, you can either use the "pods" source type, if the metric describes the pods controlled by the target scalable of the HPA (e.g. network throughput), or the "object" source type, if the metric describes an unrelated object (for instance, you might scale on a queue length metric attached to the namespace).
In either case, the HPA controller will then query the custom metrics API accordingly. It is up to cluster admins, etc. to actually provide a method to collect the given metrics and expose an implementation of the custom metrics API.
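A minimal sketch of the two source types in the v2alpha1 API (metric names and values here are placeholders, not from a real cluster):
metrics:
- type: Pods                        # metric describes the pods behind the scale target
  pods:
    metricName: network_throughput  # hypothetical metric name
    targetAverageValue: 100
- type: Object                      # metric describes some other object, e.g. the namespace
  object:
    target:
      kind: Namespace
      name: my-namespace            # hypothetical
    metricName: queue_length        # hypothetical
    targetValue: 50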
Thanks for your response. So IIUC it is (and let's assume your queue length example)
Is that right?
Still not quite. The API is discoverable, just like any other API. The "object" metric source is only necessary if the metric in question describes an object rather than the collection of pods pointed at by the target scalable. Without any funky footwork, there's only ever one instance of the API, just like any other Kubernetes API group. It's up to the cluster admins to set up monitoring in such a way that the API exposes the desired metrics for any given application.
For instance, let's say our custom metrics API is backed by an adapter in front of Prometheus. Prometheus is configured to collect metrics from applications through service discovery, as well as to collect metrics from our ingress controller.
For the purposes of our exercise, let's consider an application deployed in some namespace that consumes tasks from a queue. We've got one Deployment, queue-manager, and another Deployment, workers (which consumes from the queue, and can be connected to directly); plus, we're running an ingress controller. We care about two metrics:
- ingress_connections_total{ingress,pod,namespace} (collected from the ingress controller)
- worker_queue_length{deployment,namespace} (collected from the queue-manager application)
Let's say we want to scale on both metrics (by the current HPA logic, that means whichever produces the larger replica count), so we make an HPA like this:
apiVersion: autoscaling/v2alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: worker-scaler
  namespace: queuing-example
spec:
  scaleTargetRef:          # the target scalable implied by the example: the workers Deployment
    kind: Deployment
    name: workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      target:
        kind: Deployment
        name: queue-manager
      metricName: worker_queue_length
      targetValue: 1
  - type: Pods
    pods:
      metricName: ingress_connections
      targetAverageValue: 10
When processing this HPA, the HPA controller will first fetch metrics for the Object source type. It will GET /apis/custom-metrics.metrics.k8s.io/v1alpha1/namespaces/queuing-example/deployments.extensions/queue-manager/worker-queue-length against its configured main API server (the Kubernetes master API server). The Kubernetes master API server will forward the request to our adapter (via the aggregator), which will turn the request into a Prometheus query looking something like worker_queue_length{namespace="queuing-example",deployment="queue-manager"}, query Prometheus, and then return the appropriate results. The returned number is compared to the target value (1), a ratio is formed, and a desired replica count is calculated.
Then, the HPA controller will see the Pods source type. It will GET /apis/custom-metrics.metrics.k8s.io/v1alpha1/namespaces/queuing-example/pods/*/ingress_connections?selector=SOMETHING against the master API server, which will again forward the request to the adapter, which will in turn produce a query like sum(rate(ingress_connections_total{namespace="queuing-example",pod=~"pod1|pod2|pod3"}[2m])) by (pod). The adapter will return a metric for each pod; the HPA controller will average those together, compare to the target average value (10), form a ratio, and produce a desired replica count.
The HPA controller, now having both replica counts, will use the higher as the new scale value and update the target scalable's scale subresource.
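To make the numbers concrete (purely illustrative values, using the usual HPA ratio rule desiredReplicas = ceil(currentReplicas × currentValue / targetValue)):
- Object metric: say workers currently has 5 replicas and worker_queue_length comes back as 3. The ratio is 3 / 1 = 3, so the desired count is ceil(5 × 3) = 15, which gets capped at maxReplicas (10).
- Pods metric: say the 5 worker pods report 12, 18, 30, 20 and 20 connections. The average is 20, the ratio is 20 / 10 = 2, and the desired count is ceil(5 × 2) = 10.
- The controller takes the larger of the two desired counts (10 here) and writes it to the workers Deployment's scale subresource.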
Thanks for the explanation! That makes sense. And that will be very useful documentation. :)
@DirectXMan12 are you going to make progress on this feature for 1.8? I don't see any label or milestone updates.
@idvoretskyi Is there a prow command to update milestone and label? I don't have permissions to do it otherwise...
@DirectXMan12 Not AFAIK. However, now I did it for you ;)
I was looking at this and the metrics-server repo to try to plan for how this may look, but wasn't able to quite figure out what is required, or where, as a user, to add custom metrics to the metrics-server.
I have an endpoint from _one_ of my pods (the master) at /metrics that shows the metric (along with others) that I want to scale the (worker) pods on:
pending 5
running 10
The HPA should "read" the metric from the "master" deployment's pod and set the scaleTargetRef (in today's terms) to the worker deployment.
Will I have to write a metrics-server? If so, it would be really great if I could just submit a ConfigMap to the metrics-server to scrape the master.default service at /metrics for sum(pending,running) (I'm not super familiar with writing Prometheus queries) to scale workers to 15.
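Roughly what I'm imagining, assuming the metric first gets exposed through some custom metrics adapter (all names here are hypothetical, just borrowing the v2alpha1 shape from the example earlier in this thread):
apiVersion: autoscaling/v2alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: worker-scaler       # hypothetical
  namespace: default
spec:
  scaleTargetRef:
    kind: Deployment
    name: workers           # the deployment I want scaled
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: master        # the "master" service whose /metrics exposes pending/running
      metricName: queue_total   # hypothetical metric combining pending + running
      targetValue: 1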
@DirectXMan12 @kubernetes/sig-autoscaling-feature-requests can you confirm that this feature targets 1.8?
If yes, please, update the features tracking spreadsheet with the feature data, otherwise, let's remove this item from 1.8 milestone.
Thanks
All set. Looks like something got messed up in the spreadsheet
@DirectXMan12 is there any documentation for the changes in this release? For example the docs still refer to K8S 1.6 and alpha.
yep, I missed that when I went through last time. Terribly sorry about that. I've posted https://github.com/kubernetes/kubernetes.github.io/pull/5671 to fix that.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Good day,
would be very much interested in getting this progressed. Would be invaluable in Google Cloud Platform.
/remove-lifecycle stale
Progressed to stable? We're planning on it soon. It depends on whether we want to do another beta in 1.10.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Any update here?
@DirectXMan12
Any plans for this in 1.11?
If so, can you please ensure the feature is up-to-date with the appropriate:
- stage/{alpha,beta,stable}
- sig/*
- kind/feature
cc @idvoretskyi
Yes, sorry, I'll make sure that is up to date. We've got some small improvements coming in 1.11
/remove-lifecycle stale
/stage beta
/kind feature
/sig autoscaling
Thanks for the update, @DirectXMan12!
Awesome, looking forward to this! However, any plans for stable?
Awesome, looking forward to this! However, any plans for stable?
eventually. Realistically, I want to give a release or two for feedback on the new stuff, so that we can be sure we're happy with the shape of the API, then we'll look towards stable.
@DirectXMan12 please fill out the appropriate line item of the 1.11 feature tracking spreadsheet and open a placeholder docs PR against the release-1.11 branch by 5/25/2018 (tomorrow as I write this) if new docs or docs changes are needed and a relevant PR has not yet been opened.
done.
Thank you 🙌
@kubernetes/sig-release-misc due to some last-minute hiccups, the corresponding PR for this feature's updates did not get merged for 1.11, so there are no new developments for this feature in 1.11.
CC @kubernetes/sig-autoscaling-misc
FYI @justaugustus ^
Thanks for the update on this, @DirectXMan12!
@DirectXMan12 @kubernetes/sig-autoscaling-feature-requests --
This feature was removed from the previous milestone, so we'd like to check in and see if there are any plans for this in Kubernetes 1.12.
If so, please ensure that this issue is up-to-date with ALL of the following information:
Happy shipping!
/cc @justaugustus @kacole2 @robertsandoval @rajendar38
This feature currently has no milestone, so we'd like to check in and see if there are any plans for this in Kubernetes 1.12.
If so, please ensure that this issue is up-to-date with ALL of the following information:
Set the following:
Once this feature is appropriately updated, please explicitly ping @justaugustus, @kacole2, @robertsandoval, @rajendar38 to note that it is ready to be included in the Features Tracking Spreadsheet for Kubernetes 1.12.
Please make sure all PRs for features have relevant release notes included as well.
Happy shipping!
P.S. This was sent via automation
@DirectXMan12 Are the changes to the API (adding a metric selector to pods/object metrics, etc.) still in scope for 1.12? If so, this should probably be added to the milestone.
@mwielgus Anything else that should be added here for 1.12?
It should be in the milestone. I believe the description is up to date. If it's not, please let me know.
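For reference, a rough sketch of what a metric selector on a Pods metric could look like (this assumes the reworked metric/target field layout; treat the exact field names as my assumption rather than something confirmed in this thread):
metrics:
- type: Pods
  pods:
    metric:
      name: ingress_connections
      selector:               # the new metric selector
        matchLabels:
          backend: workers    # hypothetical label
    target:
      type: AverageValue
      averageValue: 10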
/milestone 1.12
Thanks for the updates! I've added this to the tracking sheet for 1.12.
Hey there! @DirectXMan12 I'm the wrangler for the Docs this release. Is there any chance I could have you open up a docs PR against the release-1.12 branch as a placeholder? That gives us more confidence in the feature shipping in this release and gives me something to work with when we start doing reviews/edits. Thanks! If this feature does not require docs, could you please update the features tracking spreadsheet to reflect it?
Would be great if we can include this one as well - https://github.com/kubernetes/kubernetes/pull/66988
@zparnold already did that quite some time ago: https://github.com/kubernetes/website/pull/8757
just need to rebase it ;-)
Thank you Solly!
Kubernetes 1.13 is going to be a 'stable' release since the cycle is only 10 weeks. We encourage no big alpha features and only consider adding this feature if you have a high level of confidence it will make code slush by 11/09. Are there plans for this enhancement to graduate to alpha/beta/stable within the 1.13 release cycle? If not, can you please remove it from the 1.12 milestone or add it to 1.13?
We are also now encouraging that every new enhancement aligns with a KEP. If a KEP has been created, please link to it in the original post. Please take the opportunity to develop a KEP
@DirectXMan12 I'm following up on @claurence's post to see if there are any plans for this to graduate to 1.13 or if we can close out the 1.12 milestone?
This release is targeted to be more ‘stable’ and will have an aggressive timeline. Please only include this enhancement if there is a high level of confidence it will meet the following deadlines:
Thanks!
hey sorry, I've been out the past couple of weeks, and will likely have limited availability the next couple (GitHub needs out-of-office :-P). I'll try and sync with @MaciekPytel and @mwielgus to make sure we're all set on the sig-autoscaling side.
@DirectXMan12 Hello - I’m the Enhancements Lead for 1.14 and I’m checking in on this issue to see what work (if any) is being planned for the 1.14 release. Enhancements freeze is Jan 29th, and I want to remind you that all enhancements must have a KEP
@mwielgus can you take over the communication on this?
Hello @DirectXMan12 @mwielgus , I'm the Enhancement Lead for 1.15. Is this feature going to be graduating alpha/beta/stable stages in 1.15? Please let me know so it can be tracked properly and added to the spreadsheet. This will also require a KEP to be included in 1.15.
Once coding begins, please list all relevant k/k PRs in this issue so they can be tracked properly.
Hi @mwielgus, I'm the 1.16 Enhancement Lead. Is this feature going to be graduating alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.
Once coding begins or if it already has, please list all relevant k/k PRs in this issue so they can be tracked properly.
As a reminder, every enhancement requires a KEP in an implementable state with Graduation Criteria explaining each alpha/beta/stable stages requirements.
Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.
Thank you.
@mwielgus is on vacation, but as far as I know there is no plan to move to stable in 1.16.
cc: @josephburnett unless you know something about it?
Hey there @mwielgus and @josephburnett -- 1.17 Enhancements Shadow here. I wanted to check in and see if you think this Enhancement will be graduating to alpha/beta/stable in 1.17?
The current release schedule is:
If you do, I'll add it to the 1.17 tracking sheet, https://bit.ly/k8s117-enhancement-tracking. Once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍
Thanks!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Hey there @mwielgus and @josephburnett -- 1.18 Enhancements lead here. I wanted to check in and see if you think this Enhancement will be graduating to stable in 1.18 or having a major change in its current level?
The current release schedule is:
Monday, January 6th - Release Cycle Begins
Tuesday, January 28th EOD PST - Enhancements Freeze
Thursday, March 5th, EOD PST - Code Freeze
Monday, March 16th - Docs must be completed and reviewed
Tuesday, March 24th - Kubernetes 1.18.0 Released
To be included in the release, this enhancement must have a merged KEP in the implementable status. The KEP must also have graduation criteria and a Test Plan defined.
If you would like to include this enhancement, once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍
We'll be tracking enhancements here: http://bit.ly/k8s-1-18-enhancements
Thanks!
Hey there @mwielgus and @josephburnett, Enhancements Team reaching out again. We're about a week out from Enhancement Freeze on the 28th. Let us know if you think there will be any activity on this.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Hey there @mwielgus @josephburnett -- 1.19 Enhancements Lead here. I wanted to check in and see if you think this Enhancement will be graduating in 1.19?
In order to have this be part of the release:
The current release schedule is:
If you do, I'll add it to the 1.19 tracking sheet (http://bit.ly/k8s-1-19-enhancements). Once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍
Thanks!
Hi @mwielgus @josephburnett, pinging back again as a reminder. :slightly_smiling_face:
Hi @mwielgus @josephburnett
Tomorrow, Tuesday May 19 EOD Pacific Time is Enhancements Freeze
Will this enhancement be part of the 1.19 release cycle?
@mwielgus @josephburnett -- Unfortunately, the deadline for the 1.19 Enhancement freeze has passed. For now, this is being removed from the milestone and 1.19 tracking sheet. If there is a need to get this in, please file an enhancement exception.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Hi @mwielgus @josephburnett
Enhancements Lead here. Any plans for this to graduate in 1.20?
Thanks!
Kirsten
Hi @mwielgus @josephburnett
Any updates on whether this will be included in 1.20?
Enhancements Freeze is October 6th and by that time we require:
The KEP must be merged in an implementable state
The KEP must have test plans
The KEP must have graduation criteria
The KEP must have an issue in the milestone
I note that your design proposals are quite old; please consider updating to the new KEP format. See: https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template
Thanks
Kirsten