Linkerd2: Grafana dashboard isn't using the correct label for querying Prometheus

Created on 12 May 2020  ·  7 Comments  ·  Source: linkerd/linkerd2

Bug Report

What is the issue?

The dashboard "Kubernetes cluster monitoring (via Prometheus)" isn't using the correct label when querying Prometheus.

https://github.com/linkerd/linkerd2/blob/master/grafana/dashboards/kubernetes.json

It uses pod_name, but I do not have this label in my Prometheus.

How can it be reproduced?

I've just installed a new Kubernetes 1.16.8 cluster with 3 nodes and 1 master. I deployed ArgoCD, Linkerd and a microservice app (bookinfo).

Logs, error output, etc

img

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust roots are using supported crypto algorithm
√ trust roots are within their validity period
√ trust roots are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust root

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: 1.16.8
  • Cluster Environment: self-hosted deployed with kubespray
  • Host OS: ubuntu 18.04
  • Linkerd version: Client version: stable-2.7.1
    Server version: stable-2.7.1

Possible solution

Use pod or name instead of pod_name.
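As a rough illustration of that suggestion, here is a minimal Python sketch that swaps pod_name for pod inside a PromQL expression. The function name, the mapping, and the example query are all hypothetical; the query is not copied from kubernetes.json, it only imitates the general shape of a dashboard panel expression.

```python
import re

# Hypothetical mapping for illustration: old label name -> new label name.
LABEL_RENAMES = {"pod_name": "pod"}


def rename_labels(expr, renames=LABEL_RENAMES):
    """Replace whole-word occurrences of old label names in a PromQL expression."""
    for old, new in renames.items():
        # \b keeps longer identifiers that merely contain the old name untouched.
        expr = re.sub(rf"\b{re.escape(old)}\b", new, expr)
    return expr


# Made-up expression in the style of a Grafana panel target.
example = 'sum(rate(container_cpu_usage_seconds_total{pod_name=~"^$Pod$"}[1m])) by (pod_name)'
print(rename_labels(example))
# prints: sum(rate(container_cpu_usage_seconds_total{pod=~"^$Pod$"}[1m])) by (pod)
```

The same substitution would also have to cover legend templates such as {{pod_name}} in the dashboard JSON; the word-boundary match above handles that form too, but the result should be checked against a real Prometheus before committing it.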

Additional context

This is what I get when I query container_cpu_usage_seconds_total in my Prometheus:

container_cpu_usage_seconds_total{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="POD",cpu="total",id="/kubepods/besteffort/pod0efc5b94-5155-43ec-a76d-7fef302d941a/8df0af43b56d3edc499f39a92fd4b6334e4aa754f0987c3f3049f18e5c8a4b90",image="gcr.io/google_containers/pause-amd64:3.1",instance="node1",job="kubernetes-nodes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node1",kubernetes_io_os="linux",name="k8s_POD_argocd-server-7696cd5f89-g8sj5_argocd_0efc5b94-5155-43ec-a76d-7fef302d941a_0",namespace="argocd",pod="argocd-server-7696cd5f89-g8sj5"}

I think the problem is the same for container_name, but I haven't had time to check that one.
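The series pasted above already gives a hint about container_name. A small Python sketch, assuming a shortened copy of that series (several labels are trimmed for readability, and the helper names are made up), can list which label names are present:

```python
import re


def label_names(series):
    """Return the set of label names in a Prometheus series string."""
    body = series[series.index("{") + 1 : series.rindex("}")]
    # Match name="value" pairs, allowing escaped characters inside the value.
    return set(re.findall(r'(\w+)="(?:[^"\\]|\\.)*"', body))


def check_labels(series, candidates):
    """Map each candidate label name to whether it appears in the series."""
    present = label_names(series)
    return {name: name in present for name in candidates}


# Shortened copy of the series from the report (some labels omitted).
sample = (
    'container_cpu_usage_seconds_total{container="POD",cpu="total",'
    'instance="node1",job="kubernetes-nodes-cadvisor",'
    'name="k8s_POD_argocd-server-7696cd5f89-g8sj5_argocd_0efc5b94-5155-43ec-a76d-7fef302d941a_0",'
    'namespace="argocd",pod="argocd-server-7696cd5f89-g8sj5"}'
)

for name, found in check_labels(sample, ["pod_name", "pod", "container_name", "container"]).items():
    print(f"{name}: {found}")
```

For this one series the check reports pod and container as present and pod_name and container_name as absent, which suggests container_name needs the same treatment as pod_name. It is only a single sample, so other metrics should be checked the same way.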

area/web, good first issue, help wanted

All 7 comments

I guess the labels changed, that's a bummer! We'll have to figure that dashboard out again.

I would love to help on this one. I'm going to open a PR to try to fix this.

That'd be awesome @aimbot31 ! I believe we got that dashboard off grafana hub fwiw. They might have a fix up there already.

No update in 2 years :/
https://grafana.com/grafana/dashboards/315

The associated PR has been updated as suggested. Can someone re-open it, please?
@grampelberg or @ihcsim maybe ?

@aimbot31 GH won't let me re-open the old PR; it's complaining about your branch. I think the easiest thing to do is just to submit a new PR. Thanks!

