[[inputs.kubernetes]]
url = "https://kubernetes.default.svc"
bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
insecure_skip_verify = true
Ubuntu 18.04
k3s v1.17.2+k3s1
Telegraf image: telegraf:1.12.2
Configure the Kubernetes input plugin in a Telegraf container.
The plugin should collect the Kubernetes metrics.
The Telegraf plugin log shows that the Kubernetes API server returned a 403 Forbidden error. After adding the following rules to the pod's RBAC ServiceAccount:
rules:
- nonResourceURLs: ["/stats", "/stats/*"]
verbs: ["get", "list"]
the error becomes a 404. No metrics are being collected.
The kube_inventory input plugin seems to be working just fine, but the kubernetes plugin is not able to obtain any metrics, as described. Looking at the code, the kubernetes input plugin calls the /stats/summary Kubernetes API server endpoint.
The /stats/summary endpoint was planned to be deprecated (https://github.com/kubernetes/kubernetes/issues/68522), but it seems it has already been removed.
We should put together some documentation about what needs to be done to switch to the replacement, and any way we can smooth the transition. I could definitely use some help from the community on this.
I am assuming similar metrics can be captured with the prometheus input plugin. It would be good to gather a listing of the new metrics, because switching over will likely change all metrics and break dashboards/alerts.
It also looks like it should be possible to use the --enable-cadvisor-endpoints flag to re-enable the endpoint; it would be good to describe how this can be set as well.
Hello @danielnelson, thank you for your reply. The cadvisor endpoint support will be removed in Kubernetes 1.19 (https://github.com/kubernetes/kubernetes/issues/76660), so I would recommend using the --enable-cadvisor-endpoints flag only as a temporary fix. I think the way to go is to query the metrics-server API (https://github.com/kubernetes-sigs/metrics-server) through the standard Kubernetes API to obtain pod metrics.
@danielnelson for managed Kubernetes, I'm not sure you can ask for this flag to be added, so even as a temporary fix it won't work for many (most?) people.
@masual: so would that mean we need to deploy the metrics server first in order to use this plugin? Or should we use only the kube_inventory plugin?
I could make it work with the help of @rawkode:
As the endpoint, you need:
[[inputs.kubernetes]]
url = "https://kubernetes.default.svc.cluster.local/api/v1/nodes/$NODE_NAME/proxy/"
bearer_token = "/run/secrets/kubernetes.io/serviceaccount/token"
insecure_skip_verify = true
be sure to have:
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
and as the ClusterRole (I use ClusterRole aggregation):
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: influx:stats:viewer
labels:
rbac.authorization.k8s.io/aggregate-view-telegraf-stats: "true"
rules:
- apiGroups: [""]
resources: ["nodes/proxy"]
verbs: ["get", "watch", "list"]
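Note that a ClusterRole by itself grants nothing until it is bound (or aggregated into a role that is bound). If you are not using aggregation, a direct binding would look roughly like this sketch (the ServiceAccount name and namespace here are hypothetical):

```yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:stats:viewer        # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx:stats:viewer        # the role defined above
subjects:
  - kind: ServiceAccount
    name: telegraf                 # hypothetical ServiceAccount name
    namespace: monitoring          # hypothetical namespace
```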
Tested on k8s 1.17.0 on OVH K8S Managed Service
... and available soon as a Helm chart for deploying Telegraf as a DaemonSet => https://github.com/influxdata/helm-charts/pull/16
I have the same problem. I followed these recommendations, but get the same error:
Error:
2020-04-03T08:38:00Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
Is there any solution or another documentation to fix the problem?
I checked that I have the RBAC permissions configured; this is the output:
Name: telegraf-cluster-reader
Labels: rbac.authorization.k8s.io/aggregate-view-telegraf=true
rbac.authorization.k8s.io/aggregate-view-telegraf-stats=true
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"rbac.authorization.k8s.io/aggreg...
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
deployments [] [] [get watch list]
nodes/proxy [] [] [get watch list]
nodes [] [] [get watch list]
persistentvolumes [] [] [get watch list]
pods [] [] [get watch list]
statefulsets [] [] [get watch list]
[/stats/*] [] [get]
[/stats] [] [get]
[/stats/*] [] [list]
[/stats] [] [list]
[/stats/*] [] [watch]
[/stats] [] [watch]
I have this config applied in yamls:
apiVersion: v1
kind: ServiceAccount
metadata:
name: telegraf-reader
namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: telegraf-cluster-reader
labels:
rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
rbac.authorization.k8s.io/aggregate-view-telegraf-stats: "true"
rules:
- nonResourceURLs: ["/stats", "/stats/*"]
verbs: ["get", "watch", "list"]
- apiGroups: [""]
resources: ["persistentvolumes", "nodes", "pods", "deployments", "statefulsets", "nodes/proxy"]
verbs: ["get", "watch", "list"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: telegraf-reader-role
aggregationRule:
clusterRoleSelectors:
- matchLabels:
rbac.authorization.k8s.io/aggregate-view-telegraf-stats: "true"
- matchLabels:
rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
- matchLabels:
rbac.authorization.k8s.io/aggregate-to-view: "true"
rules: [] # Rules are automatically filled in by the controller manager.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: telegraf-reader-rolebinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: telegraf-reader-role
subjects:
- kind: ServiceAccount
name: telegraf-reader
namespace: default
My Pod uses this, plus the token via secrets applied in a ConfigMap; other plugins like kube_inventory work fine with this:
spec:
serviceAccountName: telegraf-reader
containers:
- env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
@jmorcar have a look at what we did for telegraf-ds chart as we get it working => https://github.com/influxdata/helm-charts/tree/master/charts/telegraf-ds
[[inputs.kubernetes]]
url = "https://kubernetes.default.svc"
I think the plugin is expecting a URL to the node's kubelet API, not the API server's API. So the Telegraf container runs on every node, in a DaemonSet, configured with something like url = "https://$NODEIP:10250", with the environment variable coming from the downward API.
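A minimal sketch of that downward-API wiring (NODE_IP is a hypothetical variable name; 10250 is the kubelet's default secure port):

```yaml
# In the DaemonSet pod spec: expose the node's IP to the container
env:
  - name: NODE_IP                  # hypothetical variable name
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
```

The plugin's url would then be set to "https://$NODE_IP:10250".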
I have checked just now with the node IP variable (here HOSTIP, captured via fieldPath: status.hostIP), but the answer is Forbidden:
# curl https://$HOSTIP:10250/stats/summary --header "Authorization: Bearer $TOKEN" --insecure
Forbidden (user=system:serviceaccount:default:telegraf-reader, verb=get, resource=nodes, subresource=stats)
Whereas if I use the previous command I posted, the query is permitted and returns data:
# curl https://kubernetes/stats/summary --header "Authorization: Bearer $TOKEN" --insecure
{
"paths": [
"/apis",
"/apis/",
"/apis/apiextensions.k8s.io",
"/apis/apiextensions.k8s.io/v1",
"/apis/apiextensions.k8s.io/v1beta1",
"/healthz",
"/healthz/etcd",
"/healthz/log",
"/healthz/ping",
"/healthz/poststarthook/crd-informer-synced",
"/healthz/poststarthook/generic-apiserver-start-informers",
"/healthz/poststarthook/start-apiextensions-controllers",
"/healthz/poststarthook/start-apiextensions-informers",
"/livez",
"/livez/etcd",
"/livez/log",
"/livez/ping",
"/livez/poststarthook/crd-informer-synced",
"/livez/poststarthook/generic-apiserver-start-informers",
"/livez/poststarthook/start-apiextensions-controllers",
"/livez/poststarthook/start-apiextensions-informers",
"/metrics",
"/openapi/v2",
"/readyz",
"/readyz/etcd",
"/readyz/log",
"/readyz/ping",
"/readyz/poststarthook/crd-informer-synced",
"/readyz/poststarthook/generic-apiserver-start-informers",
"/readyz/poststarthook/start-apiextensions-controllers",
"/readyz/poststarthook/start-apiextensions-informers",
"/readyz/shutdown",
"/version"
]
}
(Both queries were executed inside the Telegraf container and use the service account created in the YAML definition.)
To create the service account, telegraf-reader, I followed the guide posted for the kube_inventory plugin on GitHub. I checked that telegraf-reader has privileges to query resources like /api/v1/namespaces/default/pods... for that I created the ClusterRole and role bindings.
Before that, every resource query answered Forbidden, but not anymore, so the URL should be the problem.
I checked that "kubernetes.default.svc" is the same as the "kubernetes" short name; both resolve to the default ClusterIP of the Kubernetes API service.
I will have to check the source code of the kubernetes input plugin for Telegraf to find the exact query that returns the "404 Not Found".
@jmorcar have a look at what we did for telegraf-ds chart as we get it working => https://github.com/influxdata/helm-charts/tree/master/charts/telegraf-ds
I didn't find the ClusterRole or role binding definitions in the chart templates, so I think the deployment will hit the Forbidden error. I posted a suggestion to include this documentation in the charts, because a YAML definition referencing the service account is not sufficient if you haven't created the RBAC permissions first.
@jmorcar,
here is the role and rolebinding
The telegraf-ds chart works fine for me - did you try it on your cluster?
Thanks! I have applied it now... and same problem:
2020-04-03T17:21:20Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:21:30Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:21:40Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:21:50Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
2020-04-03T17:22:00Z E! [inputs.kubernetes] Error in plugin: https://kubernetes/stats/summary returned HTTP status 404 Not Found
@jmorcar if you are going through the Kubernetes API, you need the proxy endpoint.
It's usually best to go through the NODEIP from the downwardAPI.
I see mentions of that above, but I couldn't work out what problem you had with that approach.
By any chance are you on GKE? They do block access to the Kubelet this way (last time I checked)
Thanks all, I found the problem: I was using a Deployment definition instead of a DaemonSet. A related issue when you change to a DaemonSet is that, as @alanjcastonguay and @rawkode commented, you have to use NODEIP:10250, like this:
[[inputs.kubernetes]]
url = "https://$HOSTIP:10250"
bearer_token = "/run/secrets/kubernetes.io/serviceaccount/token"
insecure_skip_verify = true
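For completeness, the $HOSTIP variable referenced in the url above is injected via the downward API, as mentioned earlier in the thread:

```yaml
env:
  - name: HOSTIP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
```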
So I switched from my YAML to the official Helm chart as @nsteinmetz recommended, because I would have had to change/add too many parameters in my YAML. The official chart is fine: deploy it in the namespace you need and it collects all metrics correctly.
Conclusion:
If you need to monitor a Kubernetes cluster, the best option is to deploy the official telegraf-ds Helm chart. It monitors node by node inside the cluster (deploying a Telegraf agent on each one via a DaemonSet) with only one deployment definition.
https://github.com/influxdata/helm-charts/tree/master/charts/telegraf-ds
Try creating a ServiceAccount and ClusterRoleBinding for telegraf using the YAML configuration below. Mind the namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
name: telegraf
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: metric-scanner-kubelet-api-admin
subjects:
- kind: ServiceAccount
name: telegraf
namespace: influxdb
roleRef:
kind: ClusterRole
name: system:kubelet-api-admin
apiGroup: rbac.authorization.k8s.io
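With this ServiceAccount bound to system:kubelet-api-admin, a DaemonSet-deployed Telegraf can point the plugin at the local kubelet, reusing the node-IP pattern from earlier in this thread (a sketch, assuming a HOSTIP variable injected from the downward API, not the chart's exact config):

```toml
[[inputs.kubernetes]]
  url = "https://$HOSTIP:10250"
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  insecure_skip_verify = true
```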
I faced a similar issue; after applying this YAML, Telegraf was able to authenticate in the cluster and scrape the metrics.