Origin: Autoscaler can't get Current CPU utilization

Created on 9 Dec 2015 · 7 comments · Source: openshift/origin

I have configured my cluster metrics. I'm able to see them in the webconsole.
I do not use persistent storage for them and I used auto-generated certificates:

$ oc secrets new metrics-deployer nothing=/dev/null

It looks fine; for every pod I have, I get metrics (using Heapster).

These are the logs of my heapster pod:

Starting Heapster with the following arguments: --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=uRXj1CFvyuQH_H8&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy
I1209 07:16:00.166350       1 heapster.go:60] heapster --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=uRXj1CFvyuQH_H8&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy
I1209 07:16:00.171115       1 heapster.go:61] Heapster version 0.18.0
I1209 07:16:00.171713       1 kube_factory.go:168] Using Kubernetes client with master "https://kubernetes.default.svc:443" and version "v1"
I1209 07:16:00.171726       1 kube_factory.go:169] Using kubelet port 10250
I1209 07:16:00.172023       1 driver.go:491] Initialised Hawkular Sink with parameters {_system https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=uRXj1CFvyuQH_H8&filter=label(container_name:^/system.slice.*|^/user.slice) 0xc20817afc0 }
I1209 07:16:00.359077       1 heapster.go:71] Starting heapster on port 8082

I scaled my test project and it is currently running 40 pods.
Now I want to create an autoscaler for my project, test.
The .yaml looks like this:

apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: test-scaler
spec:
  scaleRef:
    kind: DeploymentConfig
    name: test #name of my dc
    apiVersion: v1
    subresource: scale
  minReplicas: 2
  maxReplicas: 30
  cpuUtilization:
    targetPercentage: 60
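For context on the targetPercentage field above, the HPA's v1 scaling decision can be sketched roughly as follows. This is an illustrative Python model, not the actual controller code: utilization is the pods' total CPU usage divided by their total CPU request, and the replica count scales proportionally toward the target.

```python
import math

def desired_replicas(usage_m, request_m, current_replicas,
                     target_percentage, min_replicas, max_replicas):
    """Sketch of the HPA v1 decision: scale so that average CPU
    utilization (usage / request) approaches the target percentage."""
    utilization = 100.0 * sum(usage_m) / sum(request_m)
    desired = math.ceil(current_replicas * utilization / target_percentage)
    # Clamp to the configured bounds (minReplicas / maxReplicas).
    return max(min_replicas, min(max_replicas, desired))

# Example: 4 pods each requesting 100m and using 90m -> 90% utilization,
# so a 60% target scales out to 6 replicas.
print(desired_replicas([90] * 4, [100] * 4, 4, 60, 2, 30))  # -> 6
```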

I know the autoscaler needs the cluster metrics, but those are working fine, so I would expect this to work. It doesn't:

[centos@autoscaler]$ oc get hpa
NAME          REFERENCE                     TARGET    CURRENT     MINPODS   MAXPODS   AGE
test-scaler   DeploymentConfig/test/scale   60%       <waiting>   2         30        13m
[centos@autoscaler]$ oc describe hpa test-scaler
Name:                       test-scaler
Namespace:                  test
Labels:                     <none>
CreationTimestamp:          Tue, 08 Dec 2015 11:21:49 +0000
Reference:                  DeploymentConfig/test/scale
Target CPU utilization:     60%
Current CPU utilization:    <not available>
Min replicas:               2
Max replicas:               30
component/apps kind/bug priority/P2

All 7 comments

Can you post your DeploymentConfig JSON/YAML? Additionally, if you have access to the logs, do you see anything there? There are a couple of issues that could be occurring (e.g. Origin might not be able to connect to Heapster, or there is some incorrect configuration on your DC -- you need to specify CPU requests on your pods for the HPA to work).

@DirectXMan12
A description of the pod is all I can give:

[centos@]$ oc describe pod heapster-bub0e  
Name:               heapster-bub0e
Namespace:          openshift-infra
Image(s):           openshift/origin-metrics-heapster:latest
Node:               ip-10-0-0-xx.eu-west-1.compute.internal/10.0.0.xx
Start Time:         Thu, 10 Dec 2015 07:24:39 +0000
Labels:             metrics-infra=heapster,name=heapster
Status:             Running
Reason:             
Message:            
IP:             10.1.1.6
Replication Controllers:    heapster (1/1 replicas created)
Containers:
  heapster:
    Container ID:   docker://7c9a01a0b4d1c502a770901e181c68f0c8cbee4a927cd453235ff28cbc920b01
    Image:      openshift/origin-metrics-heapster:latest
    Image ID:       docker://ef2c651384befe07342290c8f3a7b01c2fa0d7b4310500aa96dffd177c7e26b1
    QoS Tier:
      cpu:          BestEffort
      memory:           BestEffort
    State:          Running
      Started:          Thu, 10 Dec 2015 07:25:52 +0000
    Last Termination State: Terminated
      Reason:           Error
      Exit Code:        255
      Started:          Thu, 10 Dec 2015 07:25:30 +0000
      Finished:         Thu, 10 Dec 2015 07:25:33 +0000
    Ready:          True
    Restart Count:      2
    Environment Variables:
Conditions:
  Type      Status
  Ready     True 
Volumes:
  heapster-secrets:
    Type:   Secret (a secret that should populate this volume)
    SecretName: heapster-secrets
  hawkular-metrics-certificate:
    Type:   Secret (a secret that should populate this volume)
    SecretName: hawkular-metrics-certificate
  hawkular-metrics-account:
    Type:   Secret (a secret that should populate this volume)
    SecretName: hawkular-metrics-account
  heapster-token-pnlme:
    Type:   Secret (a secret that should populate this volume)
    SecretName: heapster-token-pnlme

There is an 'error' in the last termination state, but I think that was because I had just started my Origin server at that moment, so everything was being recreated.

I used this template to create it all; I hope this helps:

apiVersion: "v1"
kind: "Template"
metadata:
  name: metrics-deployer-template
  annotations:
    description: "Template for deploying the required Metrics integration. Requires cluster-admin 'metrics-deployer' service account and 'metrics-deployer' secret."
    tags: "infrastructure"
labels:
  metrics-infra: deployer
  provider: openshift
  component: deployer
objects:
-
  apiVersion: v1
  kind: Pod
  metadata:
    generateName: metrics-deployer-
  spec:
    containers:
    - image: ${IMAGE_PREFIX}metrics-deployer:${IMAGE_VERSION}
      name: deployer
      volumeMounts:
      - name: secret
        mountPath: /secret
        readOnly: true
      - name: empty
        mountPath: /etc/deploy
      env:
        - name: PROJECT
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: IMAGE_PREFIX
        - name: IMAGE_VERSION
          value: ${IMAGE_VERSION}
        - name: PUBLIC_MASTER_URL
          value: ${PUBLIC_MASTER_URL}
        - name: MASTER_URL
          value: ${MASTER_URL}
        - name: REDEPLOY
          value: ${REDEPLOY}
        - name: USE_PERSISTENT_STORAGE
          value: ${USE_PERSISTENT_STORAGE}
        - name: HAWKULAR_METRICS_HOSTNAME
          value: ${HAWKULAR_METRICS_HOSTNAME}
        - name: CASSANDRA_NODES
          value: ${CASSANDRA_NODES}
        - name: CASSANDRA_PV_SIZE
          value: ${CASSANDRA_PV_SIZE}
        - name: METRIC_DURATION
          value: ${METRIC_DURATION}
    dnsPolicy: ClusterFirst
    restartPolicy: Never
    serviceAccount: metrics-deployer
    volumes:
    - name: empty
      emptyDir: {}
    - name: secret
      secret:
        secretName: metrics-deployer
parameters:
-
  description: 'Specify prefix for metrics components; e.g. for "openshift/origin-metrics-deployer:v1.1", set prefix "openshift/origin-"'
  name: IMAGE_PREFIX
  value: "openshift/origin-"
-
  description: 'Specify version for metrics components; e.g. for "openshift/origin-metrics-deployer:v1.1", set version "v1.1"'
  name: IMAGE_VERSION
  value: "latest"
-
  description: "Internal URL for the master, for authentication retrieval"
  name: MASTER_URL
  value: "https://kubernetes.default.svc:443"
-
  description: "External hostname where clients will reach Hawkular Metrics"
  name: HAWKULAR_METRICS_HOSTNAME
  required: true
-
  description: "If set to true the deployer will try and delete all the existing components before trying to redeploy."
  name: REDEPLOY
  value: "false"
-
  description: "Set to true for persistent storage, set to false to use non persistent storage"
  name: USE_PERSISTENT_STORAGE
  value: "true"
-
  description: "The number of Cassandra Nodes to deploy for the initial cluster"
  name: CASSANDRA_NODES
  value: "1"
-
  description: "The persistent volume size for each of the Cassandra nodes"
  name: CASSANDRA_PV_SIZE
  value: "1Gi"
-
  description: "How many days metrics should be stored for."
  name: METRIC_DURATION
  value: "7"

and to execute:

oc process -f metrics.yaml -v \
HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.apps.example.com,USE_PERSISTENT_STORAGE=false \
| oc create -f -

It's working and showing up in the Metrics tab of my web console, but it's inaccessible to my autoscaler.

The logs of my heapster pod now look different from a few hours ago:

Starting Heapster with the following arguments: --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=uRXj1CFvyuQH_H8&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy
I1210 07:25:52.969486       1 heapster.go:60] heapster --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=uRXj1CFvyuQH_H8&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy
I1210 07:25:52.985274       1 heapster.go:61] Heapster version 0.18.0
I1210 07:25:52.985813       1 kube_factory.go:168] Using Kubernetes client with master "https://kubernetes.default.svc:443" and version "v1"
I1210 07:25:52.985835       1 kube_factory.go:169] Using kubelet port 10250
I1210 07:25:52.986153       1 driver.go:491] Initialised Hawkular Sink with parameters {_system https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=uRXj1CFvyuQH_H8&filter=label(container_name:^/system.slice.*|^/user.slice) 0xc2081946c0 }
I1210 07:25:53.165210       1 heapster.go:71] Starting heapster on port 8082
W1210 09:25:53.101503       1 reflector.go:224] /tmp/gopath/src/k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [47713/47026]) [48712]
2015/12/10 11:20:23 http: TLS handshake error from 10.1.1.1:54211: tls: first record does not look like a TLS handshake
I don't understand this: "you need to specify CPU requests on your pods for the HPA to work".
Thanks.

Events in my web console:


11:20:02 AM     test-scaler     HorizontalPodAutoscaler     FailedGetMetrics    failed to get CPU consumption and request: some pods do not have request for cpu (352 times in the last 2 hours, 55 minutes) 
11:20:32 AM     test-scaler     HorizontalPodAutoscaler     FailedComputeReplicas   failed to get cpu utilization: failed to get CPU consumption and request: some pods do not have request for cpu (353 times in the last 2 hours, 56 minutes) 
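The events spell out the root cause: the HPA defines utilization as usage divided by the pod's CPU request, so when any targeted pod has no request set, the ratio is undefined and the controller gives up. A minimal Python sketch of that check (illustrative only, not the controller's actual code):

```python
def cpu_utilization_percent(pods):
    """pods: list of dicts with 'usage_m' (millicores in use) and
    'request_m' (millicores requested, or None when unset)."""
    if any(not p.get("request_m") for p in pods):
        # Surfaces as "some pods do not have request for cpu" in the events.
        raise ValueError("some pods do not have request for cpu")
    return 100.0 * sum(p["usage_m"] for p in pods) / sum(p["request_m"] for p in pods)
```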

The steps I perform to set up my heapster are just these steps: https://docs.openshift.org/latest/install_config/cluster_metrics.html#metrics-deployer

Literally following them, with oc secrets new metrics-deployer nothing=/dev/null and persistent storage set to false. Everything works except my HPA.

Yeah, it looks like you're missing a CPU request on your pods (to confirm, I'd need to see the output of kubectl get dc $YOUR_DC -o yaml). In order to use the CPU autoscaling, you'll need to specify a CPU request under the resources section for your pod spec (CPU autoscaling is based on a percentage of the requested CPU: https://docs.openshift.org/latest/dev_guide/pod_autoscaling.html#hpa-supported-metrics). For example:

...
spec:
  containers:
  - image: nginx
    name: nginx
    resources:
      requests:
        cpu: 400m
...
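A quick note on units in the example above: 400m is 400 millicores, i.e. 0.4 of a CPU core. A hypothetical helper for the two common CPU quantity forms (this parser is my own illustration, not part of any Kubernetes client):

```python
def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as '400m' or '0.5' to millicores."""
    q = str(quantity)
    if q.endswith("m"):
        return int(q[:-1])         # already millicores
    return int(float(q) * 1000)    # whole cores -> millicores

print(parse_cpu_millicores("400m"))  # -> 400
```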

Thanks, you were right. This fixed it.

Hi, I've been following this issue because I have the same problem with the same configuration, but the suggestion above doesn't fix it.
I don't use persistent storage and I use auto-generated certificates.

Heapster is running in openshift-infra project while the pods and hpa are running in a different project.

This is the hpa:

oc describe hpa frontend-scaler
Name:                           frontend-scaler
Namespace:                      
Labels:                         
CreationTimestamp:              Fri, 11 Dec 2015 08:41:14 +0000
Reference:                      DeploymentConfig/jupyter-requests/scale
Target CPU utilization:         70%
Current CPU utilization:        
Min replicas:                   1
Max replicas:                   3

Events in the web console:

9:40:04 AM      frontend-scaler     HorizontalPodAutoscaler     FailedComputeReplicas   failed to get cpu utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
9:40:04 AM      frontend-scaler     HorizontalPodAutoscaler     FailedGetMetrics    failed to get CPU consumption and request: metrics obtained for 0/1 of pods
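Here the ratio in the message means Heapster returned CPU samples for none of the one targeted pods, even though (as the DC below shows) a request is set. A small Python sketch of how that "obtained/total" event text comes about (illustrative only; the pod name is made up):

```python
def metrics_coverage(pod_names, samples):
    """samples: dict mapping pod name -> CPU usage sample.
    Reproduces the ratio reported in the HPA event message."""
    obtained = sum(1 for name in pod_names if name in samples)
    return f"metrics obtained for {obtained}/{len(pod_names)} of pods"

print(metrics_coverage(["jupyter-requests-1-a1b2c"], {}))
# -> metrics obtained for 0/1 of pods
```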

This is the output of kubectl get dc:

...
    spec:
      containers:
      - image: .../openshift/jupyter-python
        imagePullPolicy: IfNotPresent
        name: jupyter-requests
        ports:
        - containerPort: 8000
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 400Mi
          requests:
            cpu: 100m
            memory: 200Mi
....

Thanks.
