Node_exporter: [FR] Add metric that would display RAM usage in %

Created on 30 Mar 2018 · 3 Comments · Source: prometheus/node_exporter

It would be great to have this kind of metric for those who use the Kubernetes HPA on GKE 1.9.x. The HPA has Resource-type metrics, provided by metrics-server (included by default), but its memory metric counts all memory that is in use. It can also show 130% against targetAverageUtilization, which is strange.

A better metric would be (in PromQL):

((node_memory_MemTotal{instance="10.240.0.9:9100"} - node_memory_MemFree{instance="10.240.0.9:9100"} - node_memory_Buffers{instance="10.240.0.9:9100"} - node_memory_Cached{instance="10.240.0.9:9100"}) / node_memory_MemTotal{instance="10.240.0.9:9100"}) * 100

which gives the percentage of memory used, excluding buffers and cache.

In my case, I scale Redis (in-memory only) pods by used RAM and keystore size:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: redis-hpa
  namespace: backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: redis
  minReplicas: 1
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 
  - type: Pods
    pods:
      metricName: redis_db_keys
      targetAverageValue: 500

So it's better to use a custom metric from node_exporter in that case. That would be more precise, I suppose.

All 3 comments

Thanks to @brancz I've ended up with a simple recording rule for Prometheus:

kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-rulefiles-custom
  namespace: monitoring
  labels:
    role: prometheus-rulefiles
    prometheus: k8s
data:
  custom.rules: |
    groups:
    - name: node.rules
      rules:
      - record: instance:node_memory_FreeNoBuff:ratio
        expr: ((node_memory_MemTotal - node_memory_MemFree - node_memory_Buffers - node_memory_Cached) / node_memory_MemTotal) * 100

and used it within a Prometheus manifest:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-server
  namespace: monitoring
  labels:
    app: prometheus
spec:
  nodeSelector:
    role: system
  replicas: 1
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchExpressions:
    - {key: k8s-app, operator: Exists}
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: k8s
  resources:
    requests:
      memory: 400Mi
  retention: 7d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: slow
        resources:
          requests:
            storage: 10Gi

For anyone else reading: the formula for used memory may differ from the one in the OP.

This Red Hat post suggests Used = MemTotal - MemFree - Buffers - Cached - Slab

This SO answer suggests Used = MemTotal - MemFree - Cached - SReclaimable - Buffers

For me the SO formula most closely matches what I see in top, free and Monit.
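
Both variants can be written as recording rules in the same style as the rule earlier in this thread, which makes it easy to graph them side by side. This is only a sketch, not something posted by the commenters: the group and record names are made up for illustration, and the `_bytes` metric names assume node_exporter 0.16 or later (drop the suffix on older versions, as in the OP's query). The results are ratios between 0 and 1; multiply by 100 for a percentage.

```yaml
groups:
- name: node-memory-used-variants   # hypothetical group name
  rules:
  # Red Hat variant: subtracts all of Slab.
  - record: instance:node_memory_used_slab:ratio   # hypothetical record name
    expr: >
      (
        node_memory_MemTotal_bytes -
        node_memory_MemFree_bytes -
        node_memory_Buffers_bytes -
        node_memory_Cached_bytes -
        node_memory_Slab_bytes
      ) /
      node_memory_MemTotal_bytes
  # SO variant: subtracts only the reclaimable part of Slab.
  - record: instance:node_memory_used_sreclaimable:ratio   # hypothetical record name
    expr: >
      (
        node_memory_MemTotal_bytes -
        node_memory_MemFree_bytes -
        node_memory_Cached_bytes -
        node_memory_SReclaimable_bytes -
        node_memory_Buffers_bytes
      ) /
      node_memory_MemTotal_bytes
```

Since Slab is the sum of SReclaimable and SUnreclaim in /proc/meminfo, the SO variant reports more used memory than the Red Hat one by exactly the unreclaimable slab.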

These are the recording rules we use:

- name: Node memory
  rules:
  - record: instance:node_memory_available:ratio
    expr: >
      (
        node_memory_MemAvailable_bytes or
        (
          node_memory_Buffers_bytes +
          node_memory_Cached_bytes +
          node_memory_MemFree_bytes +
          node_memory_Slab_bytes
        )
      ) /
      node_memory_MemTotal_bytes
  - record: instance:node_memory_utilization:ratio
    expr: 1 - instance:node_memory_available:ratio
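
The OP asked for a percentage rather than a ratio. Assuming the two rules above are loaded, one more rule appended to the same group could expose it. The record name is invented for illustration and is not part of the rules quoted above.

```yaml
  # Hypothetical extra rule: utilization as a percentage (0-100).
  - record: instance:node_memory_utilization:percent
    expr: 100 * instance:node_memory_utilization:ratio
```

In the available-memory rule, the `or` only falls back to the Buffers + Cached + MemFree + Slab sum for instances that do not expose `node_memory_MemAvailable_bytes`, which is the case on kernels older than 3.14, where /proc/meminfo has no MemAvailable field.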