What did you do?
git clone -b master https://github.com/coreos/prometheus-operator.git kube-prometheus-temp;
cd kube-prometheus-temp/contrib/kube-prometheus
git checkout tags/v0.11.0
./hack/cluster-monitoring/deploy
kubectl -n kube-system create -f manifests/k8s/self-hosted/
cd -
What did you expect to see?
No alerts
What did you see instead? Under which circumstances?
2 alerts, persisting:
DeadMansSwitch (1 active)
K8SApiServerLatency (1 active)
Environment
kops 1.7.0 cluster with kubernetes 1.7.3
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T07:00:21Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Manifests:
Prometheus Operator Logs:
I noticed the same thing when I updated our setup this week, so I investigated.
DeadMansSwitch is supposed to fire all the time. It verifies that your alerting pipeline works and has severity none. If anything, you should be alerted when it stops firing.
https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/assets/prometheus/rules/general.rules#L16-L24
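To make the "always firing" behaviour concrete, here is a minimal Python sketch of a check that treats a missing DeadMansSwitch as the actual problem and ignores alerts with severity none when deciding whether something needs attention. It assumes the alert list has already been fetched and parsed into the JSON shape that newer Prometheus releases return from `/api/v1/alerts`; that endpoint may not exist in the version discussed here. The function names and the sample payload are hypothetical.

```python
def dead_mans_switch_firing(payload, alert_name="DeadMansSwitch"):
    """Return True if the always-on alert is present and in state "firing"."""
    alerts = payload.get("data", {}).get("alerts", [])
    return any(
        alert.get("state") == "firing"
        and alert.get("labels", {}).get("alertname") == alert_name
        for alert in alerts
    )


def actionable_alerts(payload):
    """Return names of firing alerts whose severity is not "none"."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        alert.get("labels", {}).get("alertname")
        for alert in alerts
        if alert.get("state") == "firing"
        and alert.get("labels", {}).get("severity") != "none"
    ]


# Hypothetical payload mirroring the two alerts reported in this issue.
sample = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "DeadMansSwitch", "severity": "none"},
             "state": "firing"},
            {"labels": {"alertname": "K8SApiServerLatency", "severity": "warning"},
             "state": "firing"},
        ]
    },
}

if not dead_mans_switch_firing(sample):
    print("alerting pipeline may be broken: DeadMansSwitch is not firing")
print("needs attention:", actionable_alerts(sample))
```

With this reading, only K8SApiServerLatency in the report above is something to act on.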
K8SApiServerLatency required some tweaking for us. The rule already excludes some long-running request types. We found that we also needed to exclude log requests by adding subresource!="log".
https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/assets/prometheus/rules/kube-apiserver.rules#L12-L28
ALERT K8SApiServerLatency
  IF histogram_quantile(
      0.99,
      sum without (instance,resource) (apiserver_request_latencies_bucket{subresource!="log",verb!~"CONNECT|WATCHLIST|WATCH|PROXY"})
    ) / 1e6 > 1.0
  FOR 10m
  LABELS {
    severity = "warning"
  }
  ANNOTATIONS {
    summary = "Kubernetes apiserver latency is high",
    description = "99th percentile Latency for {{ $labels.verb }} requests to the kube-apiserver is higher than 1s.",
  }
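To illustrate why a handful of long-running log requests can push this alert over its threshold, here is a rough Python approximation of how histogram_quantile interpolates over cumulative buckets. The bucket bounds and counts are invented for illustration and are not measurements from this cluster. The helper names are hypothetical, and the sketch assumes 0 < q < 1 and at least two buckets.

```python
import math


def histogram_quantile(q, buckets):
    """Approximate PromQL histogram_quantile for cumulative (le, count) pairs."""
    buckets = sorted(buckets)  # order by upper bound; +Inf sorts last
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count


def sum_buckets(*series):
    """Add bucket counts across series, like sum without (...) over the buckets."""
    totals = {}
    for buckets in series:
        for le, count in buckets:
            totals[le] = totals.get(le, 0) + count
    return list(totals.items())


# Illustrative upper bounds in microseconds (the rule divides by 1e6 to get seconds).
BOUNDS = [125000, 250000, 500000, 1e6, 2e6, 4e6, 8e6, math.inf]
# Made-up cumulative counts: 1000 ordinary requests, all faster than 1s.
regular = list(zip(BOUNDS, [900, 980, 995, 1000, 1000, 1000, 1000, 1000]))
# Made-up cumulative counts: 30 log requests that each ran longer than 8s.
logs = list(zip(BOUNDS, [0, 0, 0, 0, 0, 0, 0, 30]))

p99_regular = histogram_quantile(0.99, regular) / 1e6
p99_with_logs = histogram_quantile(0.99, sum_buckets(regular, logs)) / 1e6
print(p99_regular, p99_with_logs)
```

In this made-up data the ordinary requests alone stay well under the 1s threshold, while mixing in the log requests moves the 99th percentile into the top bucket. That matches the effect the subresource filter is meant to remove.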
Is this something we should generally fix? I can create a PR for this.
Nice find, @unguiculus! Yes, contributions like this are of course highly welcome! :slightly_smiling_face: