Prometheus Operator: Added kube-prometheus to a kops 1.7.0 cluster with Kubernetes 1.7.3; two alerts keep firing

Created on 11 Aug 2017 · 2 Comments · Source: prometheus-operator/prometheus-operator

What did you do?

git clone -b master https://github.com/coreos/prometheus-operator.git kube-prometheus-temp;
cd kube-prometheus-temp/contrib/kube-prometheus
git checkout tags/v0.11.0
./hack/cluster-monitoring/deploy
kubectl -n kube-system create -f manifests/k8s/self-hosted/
cd -

What did you expect to see?
No alerts
What did you see instead? Under which circumstances?
Two alerts that keep firing:

DeadMansSwitch (1 active)
K8SApiServerLatency (1 active)

Environment
kops 1.7.0 cluster with Kubernetes 1.7.3

Kubernetes version information:

» ku version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T07:00:21Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes cluster kind:

`kops 1.7.0`

Manifests:
Prometheus Operator Logs:


dmcnaught

Most helpful comment

I noticed the same thing when I updated our setup this week and investigated.

DeadMansSwitch is supposed to fire at all times. It verifies that your alerting pipeline works and has severity `none`. I guess the idea is that you should be alerted if it stops firing.
https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/assets/prometheus/rules/general.rules#L16-L24
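An always-firing alert is only useful if something outside the Prometheus/Alertmanager pipeline notices when it goes quiet. The sketch below is a hypothetical receiver-side check, not part of kube-prometheus: it assumes some external service records when it last received a DeadMansSwitch notification, and treats a missing or stale heartbeat as a broken alerting pipeline. The function name and the 10-minute default timeout are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def pipeline_healthy(last_heartbeat: Optional[datetime], now: datetime,
                     timeout: timedelta = timedelta(minutes=10)) -> bool:
    """Hypothetical watchdog check for an always-firing heartbeat alert.

    `last_heartbeat` is when the external receiver last got a DeadMansSwitch
    notification (assumption: Alertmanager re-sends it periodically).
    """
    # Never received a heartbeat: the pipeline has not proven it works.
    if last_heartbeat is None:
        return False
    # A heartbeat older than the timeout means notifications stopped arriving.
    return now - last_heartbeat <= timeout


if __name__ == "__main__":
    t0 = datetime(2017, 8, 12, 12, 0)
    print(pipeline_healthy(t0, t0 + timedelta(minutes=5)))   # True
    print(pipeline_healthy(t0, t0 + timedelta(minutes=15)))  # False
```

The point of the design is inversion: the alert firing is the healthy state, so the page goes out when the heartbeat is absent.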

K8SApiServerLatency required some tweaking for us. It already excludes some long-running request types (the `CONNECT`, `WATCHLIST`, `WATCH` and `PROXY` verbs). I found that we also needed to add `subresource!="log"`.
https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/assets/prometheus/rules/kube-apiserver.rules#L12-L28
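As a rough illustration of which `apiserver_request_latencies_bucket` series the modified selector keeps, here is a small Python model. It is not Prometheus code, and `selected` is a hypothetical helper. It mirrors two PromQL matcher behaviours: a label that is absent on a series compares as the empty string, and regex matchers such as `verb!~"..."` must match the whole label value.

```python
import re
from typing import Dict

# Verbs excluded by `verb!~"CONNECT|WATCHLIST|WATCH|PROXY"`. PromQL anchors
# regex matchers, so the whole label value must match (hence fullmatch).
EXCLUDED_VERBS = re.compile(r"CONNECT|WATCHLIST|WATCH|PROXY")


def selected(labels: Dict[str, str]) -> bool:
    """Hypothetical model of the rule's label selector for one series."""
    # A label missing from a series compares as the empty string in PromQL,
    # so series without a `subresource` label pass `subresource!="log"`.
    subresource = labels.get("subresource", "")
    verb = labels.get("verb", "")
    return subresource != "log" and EXCLUDED_VERBS.fullmatch(verb) is None


if __name__ == "__main__":
    print(selected({"verb": "GET", "resource": "pods"}))                        # True
    print(selected({"verb": "GET", "resource": "pods", "subresource": "log"}))  # False
    print(selected({"verb": "WATCHLIST", "resource": "pods"}))                  # False
```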

ALERT K8SApiServerLatency
  IF histogram_quantile(
      0.99,
      sum without (instance,resource) (apiserver_request_latencies_bucket{subresource!="log",verb!~"CONNECT|WATCHLIST|WATCH|PROXY"})
    ) / 1e6 > 1.0
  FOR 10m
  LABELS {
    severity = "warning"
  }
  ANNOTATIONS {
    summary = "Kubernetes apiserver latency is high",
    description = "99th percentile Latency for {{ $labels.verb }} requests to the kube-apiserver is higher than 1s.",
  }
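To make the expression above more concrete, here is a simplified Python sketch of what it evaluates for one group of series. Assumptions: bucket data is given as hypothetical `(upper_bound, cumulative_count)` pairs that already passed the label selector; `sum_buckets` mimics `sum without (instance,resource)` by adding counts per `le` bound; `histogram_quantile` follows the linear interpolation PromQL uses inside the matching bucket but omits its edge-case handling. Bucket bounds are taken to be in microseconds, which is why the rule divides by `1e6` before comparing against 1 second. The `FOR 10m` clause is not modelled.

```python
import math
from typing import Dict, List, Tuple

Bucket = Tuple[float, float]  # (upper bound `le`, cumulative count)


def sum_buckets(series: List[List[Bucket]]) -> List[Bucket]:
    """Mimic `sum without (instance,resource)`: add counts per `le` bound."""
    totals: Dict[float, float] = {}
    for buckets in series:
        for le, count in buckets:
            totals[le] = totals.get(le, 0.0) + count
    # Sort by upper bound; math.inf (the +Inf bucket) sorts last.
    return sorted(totals.items())


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Simplified PromQL-style quantile over cumulative buckets.

    Assumes buckets are sorted, contain at least one finite bound plus the
    trailing +Inf bucket, and hold at least one observation.
    """
    rank = q * buckets[-1][1]
    # First finite bucket whose cumulative count reaches the rank;
    # falls back to the +Inf bucket if none does.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        # Rank lies in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    start, prev_count = (0.0, 0.0) if b == 0 else buckets[b - 1]
    end, count = buckets[b]
    # Linear interpolation inside the selected bucket.
    return start + (end - start) * (rank - prev_count) / (count - prev_count)


def latency_alert(buckets_us: List[Bucket], threshold_s: float = 1.0) -> bool:
    """Condition part of K8SApiServerLatency (FOR 10m not modelled)."""
    return histogram_quantile(0.99, buckets_us) / 1e6 > threshold_s


if __name__ == "__main__":
    # Hypothetical buckets (microseconds) for one verb on two apiservers.
    inst_a = [(250000.0, 80.0), (500000.0, 95.0), (1000000.0, 98.0),
              (2000000.0, 100.0), (math.inf, 100.0)]
    inst_b = [(250000.0, 90.0), (500000.0, 97.0), (1000000.0, 99.0),
              (2000000.0, 100.0), (math.inf, 100.0)]
    summed = sum_buckets([inst_a, inst_b])
    print(round(histogram_quantile(0.99, summed) / 1e6, 3))  # 1.333
    print(latency_alert(summed))  # True
```

In this made-up data, 197 of 200 requests finish within 1 s, yet the interpolated 99th percentile is about 1.33 s, so the alert condition holds. A few slow requests in the tail are enough to trip a p99 threshold.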

Is this something we should generally fix? I can create a PR for this.

unguiculus on 12 Aug 2017


All 2 comments

Nice finding, @unguiculus! Yes, these kinds of contributions are of course highly welcome! 🙂