Kops: autoscaling with k8s 1.9.6 (created with kops 1.9.0) seems to need the metrics-server (docs say it is only needed for multiple metrics)

Created on 18 Apr 2018 · 17 Comments · Source: kubernetes/kops

  1. What kops version are you running? The command kops version will display
    this information.
Version 1.9.0 (git-cccd71e67)
  2. What Kubernetes version are you running? kubectl version will print the
    version if a cluster is running or provide the Kubernetes version specified as
    a kops flag.
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:21:50Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:13:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  3. What cloud provider are you using?

  4. What commands did you run? What is the simplest way to reproduce this issue?
Created a simple deployment (with Deis) running tutum/hello-world:latest and set the CPU limit to 100m, then ran:
kubectl -n test autoscale deployment test-cmd --cpu-percent=10 --min=1 --max=4
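For reference, that autoscale command should create roughly the following HorizontalPodAutoscaler (a sketch built from the names above; the scaleTargetRef apiVersion is omitted because it depends on how the deployment was created):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: test-cmd
  namespace: test
spec:
  scaleTargetRef:
    kind: Deployment
    name: test-cmd
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 10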
  5. What happened after the commands executed?
kubectl -n test describe hpa
Name:                                                  test-cmd
Namespace:                                             test
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 18 Apr 2018 14:57:56 -0600
Reference:                                             Deployment/test-cmd
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 10%
Min replicas:                                          1
Max replicas:                                          4
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
  Type     Reason                        Age   From                       Message
  ----     ------                        ----  ----                       -------
  Warning  FailedGetResourceMetric       1s    horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
  Warning  FailedComputeMetricsReplicas  1s    horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
  6. What did you expect to happen?
kubectl -n test describe hpa
Name:                                                  test-cmd
Namespace:                                             test
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 18 Apr 2018 14:57:56 -0600
Reference:                                             Deployment/test-cmd
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (0) / 10%
Min replicas:                                          1
Max replicas:                                          4
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound  the HPA was able to succesfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is more than the maximum replica count

To get it to work, I had to add the metrics-server with:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/metrics-server/v1.8.x.yaml
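Assuming the add-on applied cleanly, a few standard checks should confirm the metrics pipeline (the deployment name metrics-server is assumed from that add-on manifest):
kubectl -n kube-system get deployment metrics-server
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top pods -n test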

  7. Please provide your cluster manifest. Execute
kops get $NAME -oyaml
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-04-18T19:34:54Z
  name: <redacted>
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://<redacted>
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1c
      name: c
    - instanceGroup: master-us-east-1d
      name: d
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1c
      name: c
    - instanceGroup: master-us-east-1d
      name: d
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 38.140.185.50/32
  kubernetesVersion: 1.9.6
  masterPublicName: <redacted>
  networkCIDR: 10.52.0.0/16
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 38.140.185.50/32
  subnets:
  - cidr: 10.52.32.0/19
    name: us-east-1a
    type: Private
    zone: us-east-1a
  - cidr: 10.52.64.0/19
    name: us-east-1c
    type: Private
    zone: us-east-1c
  - cidr: 10.52.96.0/19
    name: us-east-1d
    type: Private
    zone: us-east-1d
  - cidr: 10.52.128.0/19
    name: us-east-1e
    type: Private
    zone: us-east-1e
  - cidr: 10.52.0.0/22
    name: utility-us-east-1a
    type: Utility
    zone: us-east-1a
  - cidr: 10.52.4.0/22
    name: utility-us-east-1c
    type: Utility
    zone: us-east-1c
  - cidr: 10.52.8.0/22
    name: utility-us-east-1d
    type: Utility
    zone: us-east-1d
  - cidr: 10.52.12.0/22
    name: utility-us-east-1e
    type: Utility
    zone: us-east-1e
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-18T19:34:54Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-east-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: m3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-18T19:34:55Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-east-1c
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: m3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1c
  role: Master
  subnets:
  - us-east-1c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-18T19:34:55Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-east-1d
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: m3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1d
  role: Master
  subnets:
  - us-east-1d

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-18T19:34:55Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: m3.large
  maxSize: 3
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-1a
  - us-east-1c
  - us-east-1d
  - us-east-1e
  8. Please run the commands with the most verbose logging by adding the -v 10 flag.
    Paste the logs into this report, or put them in a gist and provide the gist link here.

  9. Anything else we need to know?

------------- FEATURE REQUEST TEMPLATE --------------------

  1. Describe IN DETAIL the feature/behavior/change you would like to see.

  2. Feel free to provide a design supporting your feature request.

lifecycle/rotten

Most helpful comment

To get it to work, I had to add the metrics-server with:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/metrics-server/v1.8.x.yaml

Worked for me on kops 1.10.0-alpha.1!

All 17 comments

I can confirm that I did this with my cluster to get HPA to work.

@jcscottiii @dmcnaught Are you running Heapster? HPAs require either Heapster or custom metrics:

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

@mikesplain Yes, I was running Heapster in both cases (pass and fail) with:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/monitoring-standalone/v1.7.0.yaml

Sorry if there was any confusion; I wrote the initial docs. My understanding is that if you have this ...

spec:
  kubeControllerManager:
    horizontalPodAutoscalerUseRestClients: true

... then it means HPA needs the metrics-server. If that is false, it falls back to legacy mode (using Heapster), which is being deprecated. Maybe someone can confirm whether this is the case.
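(Presumably kops passes this field through to kube-controller-manager as the --horizontal-pod-autoscaler-use-rest-clients flag discussed further down in the thread.)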


@dmcnaught did you have horizontalPodAutoscalerUseRestClients: true?

And this error ...

the HPA was unable to compute the replica count: unable to get metrics for resource cpu:
unable to fetch metrics from API: the server could not find the requested resource
(get pods.metrics.k8s.io)

... means that it can't access the APIService (could be anything really). So the question is why?

To give some context around this. Basically, when you install the metrics-server this should be applied too:

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

This registers metrics-server under the metrics.k8s.io group, version v1beta1, on the API server. Thereafter you should have something like this:

$ kubectl get apiservice/v1beta1.metrics.k8s.io -n kube-system
NAME                     AGE
v1beta1.metrics.k8s.io   57d

Therefore, you need two things: metrics-server running, and metrics-server registered on the API server, because HPA talks to metrics-server via the API server.
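A quick way to check that registration independently is to query the aggregated API straight through the API server; once it works, this returns JSON instead of a NotFound error:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"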

@itskingori I didn't make any changes, or add horizontalPodAutoscalerUseRestClients: true to my cluster spec.

Should it be looking for pods.metrics.k8s.io if it should be using Heapster and isn't configured to need metrics-server?

Along with deploying the metrics-server, I did add horizontalPodAutoscalerUseRestClients: true to my cluster spec.

I didn't make any changes, or add horizontalPodAutoscalerUseRestClients: true to my cluster spec.

@dmcnaught Hmmmmm 🤔 ... ok 😅

Seems like this is what will happen in your case ...

The HorizontalPodAutoscaler controller can fetch metrics in two different ways:
direct Heapster access, and REST client access.

When using direct Heapster access, the HorizontalPodAutoscaler queries Heapster
directly through the API server's service proxy subresource. Heapster needs to be
deployed on the cluster and running in the kube-system namespace.

I haven't taken this approach, so my thoughts on it are largely theoretical and a personal interpretation of the documentation.

That said, I'd recommend configuring your cluster with horizontalPodAutoscalerUseRestClients: true because the old way is being deprecated and is not recommended (I can't find/recall where I learnt this). The code even refers to it as legacy 👇

https://github.com/kubernetes/kubernetes/blob/bfae47ad87807a6361e655f3620cbf2d9f2d6226/cmd/kube-controller-manager/app/autoscaling.go#L37-L48

func startHPAController(ctx ControllerContext) (bool, error) {
    if !ctx.AvailableResources[schema.GroupVersionResource{Group: "autoscaling", Version: "v1", Resource: "horizontalpodautoscalers"}] {
        return false, nil
    }

    if ctx.ComponentConfig.HPAController.HorizontalPodAutoscalerUseRESTClients {
        // use the new-style clients if support for custom metrics is enabled
        return startHPAControllerWithRESTClient(ctx)
    }

    return startHPAControllerWithLegacyClient(ctx)
}

Not a must though, it seems ... but it means there's less work if you decide to use custom metrics in the future. Anyway, glad everyone was able to fix it. I'll create a PR when I can to clarify the docs.

Thanks!
Does horizontalPodAutoscalerUseRestClients: true break any integrations that you know of?

@dmcnaught Not to the best of my knowledge. I'm using it just fine (and in 1.8).

I added the following snippet via kops edit cluster ${NAME},

spec:
  kubeControllerManager:
    horizontalPodAutoscalerUseRestClients: true

and I am able to see the CPU usage percentage:

Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (0) / 10%

No matter what workload I add to the cluster, the result is always 0% and never changes.
Core developers, please help.
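One thing worth checking when utilization sticks at 0%: the resource cpu metric is a percentage of each container's CPU request (if only a limit is set, the request defaults to the limit), and an almost-idle image like tutum/hello-world can legitimately report 0%. A sketch of the relevant part of the target deployment's container spec, with the container name and values assumed for illustration:

containers:
- name: test-cmd
  image: tutum/hello-world:latest
  resources:
    requests:
      cpu: 100m
    limits:
      cpu: 100m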

So I recently noticed this as well. It looks like kube-controller-manager 1.8.* did not set --horizontal-pod-autoscaler-use-rest-clients by default, whereas 1.9.* sets it to true by default. Those of us with the old Heapster setup will likely see issues and need to change it to false until migrating to metrics-server.

See https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis

The --horizontal-pod-autoscaler-use-rest-clients flag is true or unset. Setting this to false switches to Heapster-based autoscaling, which is deprecated.
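For clusters that want to stay on Heapster for now, the kops-level equivalent would presumably be the snippet shown earlier in the thread with the value flipped:

spec:
  kubeControllerManager:
    horizontalPodAutoscalerUseRestClients: false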

To get it to work, I had to add the metrics-server with:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/metrics-server/v1.8.x.yaml

Worked for me on kops 1.10.0-alpha.1!

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
