K3s: Best practice prometheus monitoring

Created on 1 May 2019 · 23 comments · Source: k3s-io/k3s

Describe the bug
I would like to monitor a k3s system, so I installed the prometheus-operator Helm chart. Out of the box, a lot of alerts are in the FIRING state.
Many rules covering the apiserver and kubelet are not working. Should users just disable these rules, or are you going to provide your own default rules for a k3s setup?

To Reproduce
Install the prometheus-operator Helm chart with default values.

Expected behavior
Everything should look green if the k3s-specific instructions were followed.

Screenshots
KubeAPIDown (1 active)
KubeControllerManagerDown (1 active)
KubeDaemonSetRolloutStuck (1 active) kube-state-metrics
KubeSchedulerDown (1 active)
KubeletDown (1 active)
TargetDown (2 active) apiserver, kubelet

kind/question

Most helpful comment

kubeControllerManager:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38
kubeScheduler:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38

This fixed my problems.

All 23 comments

In order to remove target scrape errors I use this configuration:

    kubeApiServer:
      enabled: false
    kubeEtcd:
      enabled: false
    kubeControllerManager:
      enabled: false
    kubeScheduler:
      enabled: false

Unfortunately, core parts of k3s are not monitored with this config.

It should be possible to monitor the API server, or at least give an option to change the advertise address.

You can try my HelmChart CRD.

It's not perfect (the kubelet for some reason does not report certain labels), but it solves most of your issues.

I am also trying to get kube-prometheus (currently version 0.8.0) to work on k3s. I am running my cluster on ARM, which complicates things a bit: kube-state-metrics and kube-rbac-proxy, for example, are not readily available for ARM. I made some images myself, but luckily carlosedp has made the necessary ARM images available. You can have a look at his cluster_monitoring repository on GitHub.

The problem, though, is authorization for node-exporter and kube-state-metrics (and possibly more): it seems k3s uses a different authentication API version, as user phillebaba has found. See issue https://github.com/carlosedp/cluster-monitoring/issues/13#issuecomment-519678266 .

Can the k3s developers, or anyone else, maybe shed some light or advise on this?

I've added a workaround in my cluster-monitoring stack to remove kube-rbac-proxy from node_exporter and kube-state-metrics.

Can you test out the k3s branch from https://github.com/carlosedp/cluster-monitoring/tree/k3s and report back if it worked? It's a matter of applying the manifests from the "manifests" dir. They are already generated from jsonnet.

I have the same problem with k3s (k3d) and kube-state-metrics (kube-rbac-proxy). If the intention of k3s is to remove alpha and non-default features, I think kube-rbac-proxy should change to use authentication/v1, or we should remove kube-rbac-proxy from kube-state-metrics and node-exporter in our monitoring stack.
But I wish that k3s handled authentication/v1beta1 too. :->
kind works fine with kube-rbac-proxy.

The problem, though, is authorization for node-exporter and kube-state-metrics (and possibly more): it seems k3s uses a different authentication API version, as user phillebaba has found. See issue carlosedp/cluster-monitoring#13 (comment) .

The problem with changing to auth/v1 is that it would not be compatible with previous versions of k8s where the API was still beta.

You can try my HelmChart CRD.

It's not perfect (the kubelet for some reason does not report certain labels), but it solves most of your issues.

It fails at first and eventually succeeds with the following additional change after the failure:

  valuesContent: |-
    prometheusOperator:
      createCustomResource: false

It just disables the creation of CRDs after the first failed attempt.
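
For context, here is a minimal sketch of how that fragment sits inside a full HelmChart manifest (the chart name and repo are assumptions based on the then-current stable prometheus-operator chart; adjust to your setup):

    apiVersion: helm.cattle.io/v1
    kind: HelmChart
    metadata:
      name: prometheus-operator
      namespace: kube-system
    spec:
      chart: prometheus-operator
      repo: https://charts.helm.sh/stable  # assumption: the archived stable chart repo
      targetNamespace: monitoring
      valuesContent: |-
        prometheusOperator:
          createCustomResource: false  # skip CRD creation on the retry, as described above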

I've added a workaround in my cluster-monitoring stack to remove kube-rbac-proxy from node_exporter and kube-state-metrics.

Can you test out the k3s branch from https://github.com/carlosedp/cluster-monitoring/tree/k3s and report back if it worked? It's a matter of applying the manifests from the "manifests" dir. They are already generated from jsonnet.

Hi, I now do have node-exporter metrics, thx, but cAdvisor and the k3s kubelet still give authentication errors.

Edit: I have changed prometheus-serviceMonitorKubelet.yaml to use https and include TLS settings, and now I can collect metrics with the carlosedp set of manifests (i.e. without kube-rbac-proxy).
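
For reference, a sketch of what such a kubelet ServiceMonitor might look like (the resource names and label selectors follow kube-prometheus conventions and are assumptions; the key parts are scheme: https, the service-account bearer token, and insecureSkipVerify for the kubelet's self-signed certificate):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kubelet
      namespace: monitoring
      labels:
        k8s-app: kubelet
    spec:
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
          - kube-system
      selector:
        matchLabels:
          k8s-app: kubelet
      endpoints:
        # the kubelet's own metrics, scraped over https
        - port: https-metrics
          scheme: https
          bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          tlsConfig:
            insecureSkipVerify: true  # the kubelet serves a self-signed certificate
        # container metrics from the embedded cAdvisor
        - port: https-metrics
          scheme: https
          path: /metrics/cadvisor
          bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          tlsConfig:
            insecureSkipVerify: true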

As I added to the readme in the repo, with more details in https://github.com/carlosedp/cluster-monitoring/issues/17: under k3s you need to use Docker as the runtime to get all cAdvisor metrics.

Any update on this? It would be great to monitor with the Prometheus Operator Helm chart. kubeApiServer is working just fine; it's only the following three that cannot be monitored:

    kubeControllerManager:
      enabled: false
    kubeScheduler:
      enabled: false
    kubeProxy:
      enabled: false

Yeah, what is the latest on this?

Is there an issue? I have a bog-standard Prometheus install pointed at metrics-server and node-exporter. I literally copied the manifests over from an EKS cluster and didn't have to change anything.

Hi @brandond,

This issue just gave me the impression that Prometheus could be challenging to get up and running, so I was hoping for an update on best practices. But if it's simply a matter of throwing a Prometheus Helm chart at k3s, I'll just jump into it.

You have to make sure you have things like metrics-server, kube-state-metrics, node-exporter, etc. deployed, but that's not unique to k3s. Nor is the Prometheus scraper configuration. None of these should require any configuration that wouldn't be necessary on any other k8s cluster.
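
To illustrate how little of this is k3s-specific, a minimal Prometheus scrape job for node-exporter could look like the following sketch (the app=node-exporter pod label is an assumption; match whatever labels your deployment actually uses):

    scrape_configs:
      - job_name: node-exporter
        kubernetes_sd_configs:
          - role: endpoints  # discover pod endpoints through the Kubernetes API
        relabel_configs:
          # keep only endpoints backed by node-exporter pods
          - source_labels: [__meta_kubernetes_pod_label_app]
            regex: node-exporter
            action: keep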

Great stuff. Thank you Mr. @brandond

Hi,

I am new to k3s. I have a k3s installation set up and am trying to pull metrics from the cluster. My Prometheus is hosted outside the cluster.

It would be a great help if someone could shed some light on how to set this up. I have literally spent hours trying to find a solution.

Does the installation need metrics-server or kube-state-metrics running?

kubeControllerManager:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38
kubeScheduler:
  endpoints:
    - ip_of_your_master_node  # e.g. 192.168.1.38

This fixed my problems.
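
For anyone combining the fixes from this thread, a fuller values sketch might look like this (the IP is a placeholder for your k3s server node; disabling etcd reflects that a default k3s install uses SQLite rather than a separate etcd):

    kubeControllerManager:
      endpoints:
        - 192.168.1.38  # placeholder: IP of your k3s server node
    kubeScheduler:
      endpoints:
        - 192.168.1.38  # placeholder: IP of your k3s server node
    kubeEtcd:
      enabled: false  # default k3s uses SQLite, so there is no etcd to scrape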

@ioagel @onedr0p did you find a way to get kubeProxy working or is it the only component without metrics access?

+1 for KubeProxy

Following @ioagel's advice I got the controller manager and scheduler to work on my k3s cluster. I ended up having to disable (enabled: false) etcd and kube-proxy monitoring for my single-node test cluster. Thanks @ioagel.

I tried getting the https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack chart to run on my k3s cluster of 3 RPi4s, but sadly some of the images aren't proper multi-arch images (e.g. they fail with standard_init_linux.go:211: exec user process caused "exec format error"). I used the following HelmChart spec:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kube-prometheus-stack
  namespace: kube-system
spec:
  chart: kube-prometheus-stack
  repo: https://prometheus-community.github.io/helm-charts
  targetNamespace: monitoring

So, what would be the simplest (best-practice) way to deploy a minimal installation of Prometheus and Grafana, and perhaps point them at metrics-server afterwards?

Some of the guides on the internet immediately start using all sorts of templated helper repositories, but that doesn't serve as an easy-to-understand minimal baseline installation. A tutorial installation, IMHO, shouldn't rely on any custom repos, but rather use the conventional ones where possible.

It appears, for me at least, that after upgrading from k3s 1.18 to 1.19 the explicit endpoint approach stopped working.

I suspect that something now prevents connections to the endpoints on ports 10251 and 10252 from anywhere other than 127.0.0.1.

Edit: This commit seems to be the culprit: https://github.com/rancher/k3s/commit/4808c4e7d53db310fb324b2157386e50ebef5167#diff-c68274534954d72488196ca23f12cfb3ebe65998d9e7c4a43d7ba9acc9532574
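
If that is the cause, one possible workaround (a sketch, assuming a k3s version recent enough to read /etc/rancher/k3s/config.yaml) would be to pass bind-address arguments through to the embedded components so their metrics ports listen on all interfaces again:

    # /etc/rancher/k3s/config.yaml on the server node (sketch, not verified
    # against every k3s release; these entries map to the --kube-*-arg CLI flags)
    kube-controller-manager-arg:
      - bind-address=0.0.0.0
    kube-scheduler-arg:
      - bind-address=0.0.0.0
    kube-proxy-arg:
      - metrics-bind-address=0.0.0.0  # also exposes kube-proxy metrics on port 10249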
