K3s: Best practice prometheus monitoring

Created on 1 May 2019 · 23 comments · Source: k3s-io/k3s

Describe the bug
I would like to monitor a k3s system, so I installed the prometheus-operator Helm chart. Out of the box, a lot of alerts are in the FIRING state.
Many rules covering the apiserver and kubelet are not working. Should users just disable these rules, or are you going to provide your own default rules for a k3s setup?

To Reproduce
Install the prometheus-operator Helm chart with default values.

Expected behavior
Everything should look green if the k3s-specific instructions were followed.

Screenshots
KubeAPIDown (1 active)
KubeControllerManagerDown (1 active)
KubeDaemonSetRolloutStuck (1 active) kube-state-metrics
KubeSchedulerDown (1 active)
KubeletDown (1 active)
TargetDown (2 active) apiserver, kubelet

kind/question

Most helpful comment

kubeControllerManager:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38
kubeScheduler:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38

This fixed my problems.

All 23 comments

In order to remove target scrape errors I use this configuration:

    kubeApiServer:
      enabled: false
    kubeEtcd:
      enabled: false
    kubeControllerManager:
      enabled: false
    kubeScheduler:
      enabled: false

Unfortunately, core parts of k3s are not monitored with this config.

It should be possible to monitor the API server, or at least give an option to change the advertise address.

You can try my HelmChart CRD.

It's not perfect (the kubelet for some reason does not report certain labels), but it solves most of your issues.

I am also trying to get kube-prometheus (currently version 0.8.0) to work on k3s. I am running my cluster on ARM, which complicates things a bit: kube-state-metrics and kube-rbac-proxy, for example, are not readily available for ARM. I made some images myself, but luckily carlosedp has made the necessary ARM images available. You can have a look at his cluster_monitoring repository on GitHub.

The problem, though, is authorization for node-exporter and kube-state-metrics (and possibly more): it seems k3s uses a different authentication API version, as user phillebaba has found. See issue https://github.com/carlosedp/cluster-monitoring/issues/13#issuecomment-519678266 .

Can the k3s developers, or anyone else, maybe shed some light or advise on this?

I've added a workaround in my cluster-monitoring stack to remove kube-rbac-proxy from node_exporter and kube-state-metrics.

Can you test out the k3s branch from https://github.com/carlosedp/cluster-monitoring/tree/k3s and report back if it worked? It's a matter of applying the manifests from the "manifests" dir. They are already generated from jsonnet.

I have the same problem with k3s (k3d) and kube-state-metrics (kube-rbac-proxy). If the intention of k3s is to remove alpha and non-default features, I think kube-rbac-proxy should change to use authentication/v1, or we should remove kube-rbac-proxy from kube-state-metrics and node-exporter in our monitoring stack.
But I wish that k3s handled authentication/v1beta1 too. :->
kind works fine with kube-rbac-proxy.

The problem, though, is authorization for node-exporter and kube-state-metrics (and possibly more): it seems k3s uses a different authentication API version, as user phillebaba has found. See issue carlosedp/cluster-monitoring#13 (comment) .

The problem with changing to auth/v1 is that it would not be compatible with previous versions of k8s where the API was still beta.

You can try my HelmChart CRD.

It's not perfect (the kubelet for some reason does not report certain labels), but it solves most of your issues.

It fails at first and eventually succeeds with the following additional change after the failure:

  valuesContent: |-
    prometheusOperator:
      createCustomResource: false

It just disables the creation of CRDs after the first failed attempt.
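
For context, here is a minimal sketch of how that fragment sits inside a full HelmChart manifest (the chart name and repo are assumptions based on the then-current stable prometheus-operator chart; adjust to your setup):

    apiVersion: helm.cattle.io/v1
    kind: HelmChart
    metadata:
      name: prometheus-operator
      namespace: kube-system
    spec:
      chart: prometheus-operator
      repo: https://charts.helm.sh/stable  # assumption: the archived stable chart repo
      targetNamespace: monitoring
      valuesContent: |-
        prometheusOperator:
          createCustomResource: false  # skip CRD creation on the retry, as described above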

I've added a workaround in my cluster-monitoring stack to remove kube-rbac-proxy from node_exporter and kube-state-metrics.

Can you test out the k3s branch from https://github.com/carlosedp/cluster-monitoring/tree/k3s and report back if it worked? It's a matter of applying the manifests from the "manifests" dir. They are already generated from jsonnet.

Hi, I now do have node-exporter metrics, thx, but cAdvisor and the k3s kubelet still give authentication errors.

Edit: I have changed prometheus-serviceMonitorKubelet.yaml to use https and include TLS settings, and now I can collect metrics with the carlosedp set of manifests (i.e. without kube-rbac-proxy).
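
For reference, a sketch of what such a kubelet ServiceMonitor might look like (the resource names and label selectors follow kube-prometheus conventions and are assumptions; the key parts are scheme: https, the service-account bearer token, and insecureSkipVerify for the kubelet's self-signed certificate):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kubelet
      namespace: monitoring
      labels:
        k8s-app: kubelet
    spec:
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
          - kube-system
      selector:
        matchLabels:
          k8s-app: kubelet
      endpoints:
        # the kubelet's own metrics, scraped over https
        - port: https-metrics
          scheme: https
          bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          tlsConfig:
            insecureSkipVerify: true  # the kubelet serves a self-signed certificate
        # container metrics from the embedded cAdvisor
        - port: https-metrics
          scheme: https
          path: /metrics/cadvisor
          bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          tlsConfig:
            insecureSkipVerify: true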

As I added to the readme in the repo, with more details in https://github.com/carlosedp/cluster-monitoring/issues/17: under k3s you need to use Docker as the runtime to get all cAdvisor metrics.

Any update on this? It would be great to monitor with the Prometheus Operator Helm chart. kubeApiServer is working just fine; it's only the following three that cannot be monitored:

    kubeControllerManager:
      enabled: false
    kubeScheduler:
      enabled: false
    kubeProxy:
      enabled: false

Yeah, what is the latest on this?

Is there an issue? I have a bog-standard Prometheus install pointed at metrics-server and node-exporter. I literally copied the manifests over from an EKS cluster and didn't have to change anything.

Hi @brandond,

This issue just gave me the impression that Prometheus could be challenging to get up and running, so I was hoping for an update on best practices. But if it's simply a matter of throwing a Prometheus Helm chart at k3s, I'll just jump into it.

You have to make sure you have things like metrics-server, kube-state-metrics, node-exporter, etc. deployed, but that's not unique to k3s. Nor is the Prometheus scraper configuration. None of these should require any configuration that wouldn't be necessary on any other k8s cluster.
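
To illustrate how little of this is k3s-specific, a minimal Prometheus scrape job for node-exporter could look like the following sketch (the app=node-exporter pod label is an assumption; match whatever labels your deployment actually uses):

    scrape_configs:
      - job_name: node-exporter
        kubernetes_sd_configs:
          - role: endpoints  # discover pod endpoints through the Kubernetes API
        relabel_configs:
          # keep only endpoints backed by node-exporter pods
          - source_labels: [__meta_kubernetes_pod_label_app]
            regex: node-exporter
            action: keep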

Great stuff. Thank you Mr. @brandond

Hi,

I am new to k3s. I have a k3s installation set up and am trying to pull metrics from the cluster. My Prometheus is hosted outside the cluster.

It would be a great help if someone could shed some light on how to set this up. I have literally spent hours trying to find a solution.

Does the installation need metrics-server or kube-state-metrics running?

kubeControllerManager:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38
kubeScheduler:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38

This fixed my problems.
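
For anyone combining the fixes from this thread, a fuller values sketch might look like this (the IP is a placeholder for your k3s server node; disabling etcd reflects that a default k3s install uses SQLite rather than a separate etcd):

    kubeControllerManager:
      endpoints:
        - 192.168.1.38  # placeholder: IP of your k3s server node
    kubeScheduler:
      endpoints:
        - 192.168.1.38  # placeholder: IP of your k3s server node
    kubeEtcd:
      enabled: false  # default k3s uses SQLite, so there is no etcd to scrape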

@ioagel @onedr0p did you find a way to get kubeProxy working or is it the only component without metrics access?

+1 for KubeProxy

Following @ioagel's advice I got the controller manager and scheduler to work on my k3s cluster. I ended up having to disable (enabled: false) etcd and kube-proxy monitoring for my single-node test cluster. Thanks @ioagel.

I tried getting the https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack chart to run on my k3s cluster of 3 RPi4s, but sadly some of the images aren't proper multi-arch images (e.g. they fail with standard_init_linux.go:211: exec user process caused "exec format error"). I used the following HelmChart spec:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kube-prometheus-stack
  namespace: kube-system
spec:
  chart: kube-prometheus-stack
  repo: https://prometheus-community.github.io/helm-charts
  targetNamespace: monitoring

So, what would be the simplest (best-practice) way to deploy a minimal installation of Prometheus and Grafana, and perhaps point them at metrics-server afterwards?

Some of the guides on the internet immediately start using all sorts of templated helper repositories, but that doesn't serve as an easy-to-understand minimal baseline installation. A tutorial installation, IMHO, shouldn't rely on any custom repos, but rather use the conventional ones where possible.

It appears, for me at least, that after upgrading from k3s 1.18 to 1.19 the explicit endpoint approach stopped working.

I suspect that something now prevents connections to the endpoints on ports 10251 and 10252 from anywhere other than 127.0.0.1.

Edit: This commit seems to be the culprit: https://github.com/rancher/k3s/commit/4808c4e7d53db310fb324b2157386e50ebef5167#diff-c68274534954d72488196ca23f12cfb3ebe65998d9e7c4a43d7ba9acc9532574
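
If that is the cause, one possible workaround (a sketch, assuming a k3s version recent enough to read /etc/rancher/k3s/config.yaml) would be to pass bind-address arguments through to the embedded components so their metrics ports listen on all interfaces again:

    # /etc/rancher/k3s/config.yaml on the server node (sketch, not verified
    # against every k3s release; these entries map to the --kube-*-arg CLI flags)
    kube-controller-manager-arg:
      - bind-address=0.0.0.0
    kube-scheduler-arg:
      - bind-address=0.0.0.0
    kube-proxy-arg:
      - metrics-bind-address=0.0.0.0  # also exposes kube-proxy metrics on port 10249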
