prometheus-operator high CPU usage

Created on 3 Apr 2019 · 3 comments · Source: prometheus-operator/prometheus-operator

What did you do?

Recently installed Prometheus Operator on an EKS cluster

What did you expect to see?

The operator should simply be reconciling the Prometheus, ServiceMonitor, and related resources (which are not changing), so I would not expect to see such high CPU usage.

What did you see instead? Under which circumstances?

I am seeing average CPU usage from the operator of 25%, with jumps as high as 75%.
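For context, a sketch of how this usage can be observed on the cluster, assuming metrics-server is installed (the label selector matches the deployment labels shown below):

# Reports the operator pod's CPU (in millicores) and memory usage;
# requires metrics-server to be running in the cluster.
kubectl top pod -n default -l app.kubernetes.io/name=prometheus-operator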

  • Prometheus Operator version: v0.29.0

  • Kubernetes version information:

› kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.8-eks-7c34c0", GitCommit:"7c34c0d2f2d0f11f397d55a46945193a0e22d8f3", GitTreeState:"clean", BuildDate:"2019-03-01T22:49:39Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Amazon EKS

  • Manifests:

Operator deployment manifest

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"operator","app.kubernetes.io/managed-by":"flux","app.kubernetes.io/name":"prometheus-operator","app.kubernetes.io/version":"0.29.0"},"name":"prometheus-operator","namespace":"default"},"spec":{"replicas":1,"selector":{"matchLabels":{"app.kubernetes.io/component":"operator","app.kubernetes.io/name":"prometheus-operator","app.kubernetes.io/version":"0.29.0"}},"template":{"metadata":{"labels":{"app.kubernetes.io/component":"operator","app.kubernetes.io/name":"prometheus-operator","app.kubernetes.io/version":"0.29.0"}},"spec":{"containers":[{"args":["--kubelet-service=default/prometheus-operator-kubelet","--logtostderr=true","--crd-apigroup=monitoring.coreos.com","--localhost=127.0.0.1","--prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0","--config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1"],"image":"quay.io/coreos/prometheus-operator:v0.29.0","imagePullPolicy":"IfNotPresent","name":"prometheus-operator","ports":[{"containerPort":8080,"name":"http"}],"resources":{"limits":{"memory":"32Mi"},"requests":{"cpu":"50m","memory":"32Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"readOnlyRootFilesystem":true}}],"securityContext":{"runAsNonRoot":true,"runAsUser":65534},"serviceAccountName":"prometheus-operator"}}}}
  creationTimestamp: "2019-03-28T22:27:01Z"
  generation: 3
  labels:
    app.kubernetes.io/component: operator
    app.kubernetes.io/managed-by: flux
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: 0.29.0
  name: prometheus-operator
  namespace: default
  resourceVersion: "8913852"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/prometheus-operator
  uid: 9b724124-51a8-11e9-930d-026ba97b5c52
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: operator
      app.kubernetes.io/name: prometheus-operator
      app.kubernetes.io/version: 0.29.0
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/name: prometheus-operator
        app.kubernetes.io/version: 0.29.0
    spec:
      containers:
      - args:
        - --kubelet-service=default/prometheus-operator-kubelet
        - --logtostderr=true
        - --crd-apigroup=monitoring.coreos.com
        - --localhost=127.0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        image: quay.io/coreos/prometheus-operator:v0.29.0
        imagePullPolicy: IfNotPresent
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            memory: 32Mi
          requests:
            cpu: 50m
            memory: 32Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccount: prometheus-operator
      serviceAccountName: prometheus-operator
      terminationGracePeriodSeconds: 30
  • Prometheus Operator Logs:
level=info ts=2019-04-02T23:34:48.362376933Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:34:55.936976307Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:03.44680552Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:10.963419578Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:18.478134067Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:25.992500622Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:33.530763593Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:41.048306833Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:48.562223068Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:56.125112643Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:03.642759855Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:11.1533138Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:18.726791085Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:26.236170281Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus

It doesn't appear to be syncing constantly, as was the case in issue #1659.
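As a rough check, the reconcile frequency can be gauged directly from the logs; a sketch, assuming the deployment name and namespace from the manifest above:

# Count "sync prometheus" lines in the retained log window to estimate
# how often the operator is reconciling.
kubectl logs -n default deploy/prometheus-operator | grep -c 'msg="sync prometheus"'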

All 3 comments

Can you be more specific about what the % means in terms of millicores used by the process?

I am seeing an average of 25% CPU usage, taking the percentage as a fraction of one core (so 250 millicores), with jumps as high as 75% (750 millicores).

Closing this off. This turned out to be a memory limit issue. I had the limit set to 32Mi and the operator was right up against that. Once I bumped the request and limit up, the usage went down to 10 millicores.

My bad. Sorry.
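For anyone hitting the same symptom: a Go process running right up against its memory limit tends to spend most of its time in garbage collection, which shows up as CPU load rather than an OOM kill. A minimal sketch of a more generous resources block (the 100Mi figure is an illustrative assumption, not the exact value used here):

resources:
  limits:
    memory: 100Mi   # illustrative; the original 32Mi limit left no GC headroom
  requests:
    cpu: 50m        # as in the original manifest
    memory: 100Mi   # matching request and limit keeps scheduling predictable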
