What did you do?
Recently installed Prometheus Operator on an EKS cluster
What did you expect to see?
The operator should simply be reconciling the Prometheus, ServiceMonitor, and related resources (which are not changing), so I would not expect such high CPU usage.
What did you see instead? Under which circumstances?
I am seeing average CPU usage from the operator of 25% of a core, with spikes as high as 75%.
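For reference, a minimal sketch of how to read the same numbers directly in millicores, assuming metrics-server (or heapster, for clusters of this vintage) is backing kubectl top; the namespace and label come from the manifest below:

$ kubectl top pod -n default -l app.kubernetes.io/name=prometheus-operator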
Prometheus Operator version: v0.29.0
Kubernetes version information:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.8-eks-7c34c0", GitCommit:"7c34c0d2f2d0f11f397d55a46945193a0e22d8f3", GitTreeState:"clean", BuildDate:"2019-03-01T22:49:39Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Amazon EKS
Operator deployment manifest
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"operator","app.kubernetes.io/managed-by":"flux","app.kubernetes.io/name":"prometheus-operator","app.kubernetes.io/version":"0.29.0"},"name":"prometheus-operator","namespace":"default"},"spec":{"replicas":1,"selector":{"matchLabels":{"app.kubernetes.io/component":"operator","app.kubernetes.io/name":"prometheus-operator","app.kubernetes.io/version":"0.29.0"}},"template":{"metadata":{"labels":{"app.kubernetes.io/component":"operator","app.kubernetes.io/name":"prometheus-operator","app.kubernetes.io/version":"0.29.0"}},"spec":{"containers":[{"args":["--kubelet-service=default/prometheus-operator-kubelet","--logtostderr=true","--crd-apigroup=monitoring.coreos.com","--localhost=127.0.0.1","--prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0","--config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1"],"image":"quay.io/coreos/prometheus-operator:v0.29.0","imagePullPolicy":"IfNotPresent","name":"prometheus-operator","ports":[{"containerPort":8080,"name":"http"}],"resources":{"limits":{"memory":"32Mi"},"requests":{"cpu":"50m","memory":"32Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"readOnlyRootFilesystem":true}}],"securityContext":{"runAsNonRoot":true,"runAsUser":65534},"serviceAccountName":"prometheus-operator"}}}}
  creationTimestamp: "2019-03-28T22:27:01Z"
  generation: 3
  labels:
    app.kubernetes.io/component: operator
    app.kubernetes.io/managed-by: flux
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: 0.29.0
  name: prometheus-operator
  namespace: default
  resourceVersion: "8913852"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/prometheus-operator
  uid: 9b724124-51a8-11e9-930d-026ba97b5c52
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: operator
      app.kubernetes.io/name: prometheus-operator
      app.kubernetes.io/version: 0.29.0
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/name: prometheus-operator
        app.kubernetes.io/version: 0.29.0
    spec:
      containers:
      - args:
        - --kubelet-service=default/prometheus-operator-kubelet
        - --logtostderr=true
        - --crd-apigroup=monitoring.coreos.com
        - --localhost=127.0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        image: quay.io/coreos/prometheus-operator:v0.29.0
        imagePullPolicy: IfNotPresent
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            memory: 32Mi
          requests:
            cpu: 50m
            memory: 32Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccount: prometheus-operator
      serviceAccountName: prometheus-operator
      terminationGracePeriodSeconds: 30
level=info ts=2019-04-02T23:34:48.362376933Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:34:55.936976307Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:03.44680552Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:10.963419578Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:18.478134067Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:25.992500622Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:33.530763593Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:41.048306833Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:48.562223068Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:35:56.125112643Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:03.642759855Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:11.1533138Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:18.726791085Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
level=info ts=2019-04-02T23:36:26.236170281Z caller=operator.go:970 component=prometheusoperator msg="sync prometheus" key=default/prometheus
It doesn't appear to be syncing constantly, as was the case in issue #1659.
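The excerpt above shows one sync roughly every 7.5 seconds. A quick sketch for counting sync events over a window, using the deployment name and namespace from the manifest above:

$ kubectl logs -n default deploy/prometheus-operator --since=10m \
    | grep -c 'msg="sync prometheus"'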
Can you be more specific about what the percentages mean in terms of millicores used by the process?
I am seeing an average of 25% CPU usage (1 core = 1000 millicores, so 250 millicores), with spikes as high as 75% (750 millicores).
Closing this off. This turned out to be a memory-limit issue: I had the limit set to 32Mi and the operator was right up against it. Once I bumped the memory request and limit up, usage dropped to 10 millicores.
My bad. Sorry.
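For anyone who hits the same symptom: a Go process running hard against its memory limit likely spends much of its time in garbage collection, which shows up as CPU. The fix was simply raising the memory request and limit; the exact values used aren't stated above, so the figures in this sketch are illustrative assumptions, not the values actually applied:

resources:
  limits:
    memory: 100Mi   # illustrative value; the original 32Mi limit was too tight
  requests:
    cpu: 50m        # unchanged from the original manifest
    memory: 100Mi   # illustrative value, kept equal to the limit

Keeping the memory request equal to the limit also ensures the scheduler only places the pod on a node that can actually back the full limit.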