It would be great to have that kind of metric for those who are using the Kubernetes HPA with GKE 1.9.x. The HPA has Resource-type metrics, provided by metrics-server (included by default), and its memory metric counts all the memory that is in use. Also, it can show 130% against targetAverageUtilization, which is strange.
A better one would be (in PromQL):
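A note on the 130%: as far as I understand, the HPA computes Resource utilization relative to the container's memory request, not its limit or the node's total, so a pod using more than it requested reads above 100%. A minimal sketch of that arithmetic, with made-up numbers (the 1000 MiB request and 1300 MiB usage are assumptions, not values from my cluster):

```python
MIB = 1024 ** 2

def utilization_percent(usage_bytes, request_bytes):
    """Usage as a percentage of the *requested* amount."""
    return usage_bytes / request_bytes * 100

# Hypothetical pod: requests 1000 MiB, currently uses 1300 MiB.
request = 1000 * MIB
usage = 1300 * MIB

print(round(utilization_percent(usage, request)))
```

With these numbers the pod is at 130% of its request even though nothing is actually wrong with it.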
((node_memory_MemTotal{instance="10.240.0.9:9100"} - node_memory_MemFree{instance="10.240.0.9:9100"} - node_memory_Buffers{instance="10.240.0.9:9100"} - node_memory_Cached{instance="10.240.0.9:9100"}) / node_memory_MemTotal{instance="10.240.0.9:9100"}) * 100
which gives the percentage of used memory, excluding buffers and cache.
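To make the arithmetic concrete, here is a rough sketch of the same calculation done by hand on /proc/meminfo-style text. The sample values are invented, and parse_meminfo and used_percent are just illustrative helper names, not part of node_exporter:

```python
# Hypothetical /proc/meminfo excerpt (values in kB); the numbers are made up.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          500000 kB
Cached:          2500000 kB
"""

def parse_meminfo(text):
    """Return {field: value_in_kB} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields

def used_percent(m):
    """Same formula as the PromQL query: total minus free, buffers and cache."""
    used = m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]
    return used / m["MemTotal"] * 100

mem = parse_meminfo(SAMPLE_MEMINFO)
print(used_percent(mem))
```

Here 4,000,000 kB out of 8,000,000 kB count as used, so the result is 50%.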
In my case, I scale Redis (in-memory only) pods by used RAM and keystore size:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: redis-hpa
  namespace: backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: redis
  minReplicas: 1
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization:
  - type: Pods
    pods:
      metricName: redis_db_keys
      targetAverageValue: 500
So it's better to use custom metrics from node_exporter in that case. That would be more precise, I suppose.
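For context on how the two metrics in that HPA interact: per the Kubernetes docs, each metric proposes ceil(currentReplicas * currentValue / targetValue) replicas and the HPA takes the largest proposal, bounded by minReplicas and maxReplicas. A simplified sketch follows; it ignores the tolerance band and pod readiness handling, and the 80% memory target below is a hypothetical value, since the manifest above leaves it blank:

```python
import math

def desired_replicas(current_replicas, current_value, target_value):
    """Replica count proposed by a single metric."""
    return math.ceil(current_replicas * current_value / target_value)

def hpa_decision(current_replicas, metrics, min_replicas=1, max_replicas=30):
    """Take the largest per-metric proposal, clamped to the HPA bounds.

    metrics is a list of (current_average, target_average) pairs.
    """
    proposals = [desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))

# 3 pods; memory at 90% against a hypothetical 80% target,
# and 700 keys per pod against targetAverageValue: 500.
print(hpa_decision(3, [(90, 80), (700, 500)]))
```

Memory alone would ask for 4 replicas and the key count for 5, so the HPA would go to 5.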
Thanks to @brancz I've ended up with a simple recording rule for Prometheus:
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-rulefiles-custom
  namespace: monitoring
  labels:
    role: prometheus-rulefiles
    prometheus: k8s
data:
  custom.rules: |
    groups:
    - name: node.rules
      rules:
      - record: instance:node_memory_FreeNoBuff:ratio
        expr: ((node_memory_MemTotal - node_memory_MemFree - node_memory_Buffers - node_memory_Cached) / node_memory_MemTotal) * 100
and used it within a Prometheus manifest:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-server
  namespace: monitoring
  labels:
    app: prometheus
spec:
  nodeSelector:
    role: system
  replicas: 1
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchExpressions:
    - {key: k8s-app, operator: Exists}
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: k8s
  resources:
    requests:
      memory: 400Mi
  retention: 7d
  storage:
    volumeClaimTemplate:
      spec:
        class: slow
        resources:
          requests:
            storage: 10Gi
Just a note for anyone else: the formula for calculating used memory may differ from the one in the OP.
This Red Hat post suggests Used = MemTotal - MemFree - Buffers - Cached - Slab
This SO answer suggests Used = MemTotal - MemFree - Cached - SReclaimable - Buffers
For me the SO formula most closely matches what I see in top, free and Monit.
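To show how much the three variants can disagree, here is a small sketch that evaluates them on one set of invented /proc/meminfo values (in MB). Only the formulas come from this thread; the numbers and function names are assumptions for illustration:

```python
# Hypothetical meminfo values in MB; SReclaimable is the reclaimable part of Slab.
mem = {
    "MemTotal": 8000,
    "MemFree": 1000,
    "Buffers": 500,
    "Cached": 2500,
    "Slab": 400,
    "SReclaimable": 300,
}

def used_op(m):
    """Formula from the OP: buffers and cache are not counted as used."""
    return m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]

def used_redhat(m):
    """Red Hat variant: additionally subtracts all of Slab."""
    return used_op(m) - m["Slab"]

def used_so(m):
    """SO variant: additionally subtracts only the reclaimable slab."""
    return used_op(m) - m["SReclaimable"]

for name, fn in [("OP", used_op), ("Red Hat", used_redhat), ("SO", used_so)]:
    print(f"{name}: {fn(mem)} MB used")
```

With these numbers the OP formula reports 4000 MB, the Red Hat one 3600 MB and the SO one 3700 MB, so the gap depends on how much slab memory the node holds.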
These are the recording rules we use:
- name: Node memory
  rules:
  - record: instance:node_memory_available:ratio
    expr: >
      (
        node_memory_MemAvailable_bytes or
        (
          node_memory_Buffers_bytes +
          node_memory_Cached_bytes +
          node_memory_MemFree_bytes +
          node_memory_Slab_bytes
        )
      ) /
      node_memory_MemTotal_bytes
  - record: instance:node_memory_utilization:ratio
    expr: 1 - instance:node_memory_available:ratio
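The `or` in the first rule acts as a per-instance fallback: where the node reports MemAvailable, that value is used, otherwise the sum of buffers, cache, free and slab stands in for it. A rough Python equivalent for a single node, with hypothetical values in MB and illustrative function names:

```python
def available_ratio(m):
    """Prefer MemAvailable; fall back to Buffers + Cached + MemFree + Slab."""
    available = m.get("MemAvailable")
    if available is None:
        available = m["Buffers"] + m["Cached"] + m["MemFree"] + m["Slab"]
    return available / m["MemTotal"]

def utilization_ratio(m):
    """Mirror of the second rule: 1 minus the available ratio."""
    return 1 - available_ratio(m)

# Hypothetical node whose kernel exposes MemAvailable.
newer = {"MemTotal": 8000, "MemAvailable": 6000}
# Hypothetical node without MemAvailable, so the fallback sum is used.
older = {"MemTotal": 8000, "Buffers": 400, "Cached": 2500, "MemFree": 1000, "Slab": 100}

print(utilization_ratio(newer), utilization_ratio(older))
```

Note that these rules produce ratios between 0 and 1 rather than percentages, unlike the earlier rule that multiplies by 100.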