The goal of this ticket is to understand whether, and how, it is possible to correctly run Prometheus outside of the k8s cluster being monitored, or what kind of additional development this would require.
It is a common practice to not run the monitoring software on the stack that is being monitored.
This is important because during outages/problems with the cluster, Prometheus might not be working or accessible, leaving the administrator flying blind while solving issues.
There is also the case of having multiple clusters to monitor while wanting a centralized Prometheus setup.
1) Prometheus configured against the Kubernetes API, in a similar manner to how kubectl works (provide host, client-certificate-data and client-key-data); see the sketch after this list.
2) Run some sort of proxy inside the Kubernetes cluster that takes care of tokens, discovery and access to the cluster-internal network. The central Prometheus is then configured against this proxy instead of the Kubernetes API, and the proxy provides the metrics from the cluster.
3) Provide instructions/documentation on how to use the current Prometheus kubernetes_sd_configs option to achieve a similar result.
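For illustration, option 1 might look roughly like the following scrape config sketch (the API URL and file paths are placeholders; the PEM files hold the same material a kubeconfig stores in its *-data fields):

- job_name: kubernetes-nodes
  scheme: https
  kubernetes_sd_configs:
    - api_server: https://<k8s-api-host>:6443   # placeholder
      role: node
      tls_config:
        ca_file: /etc/prometheus/ca.pem         # from certificate-authority-data
        cert_file: /etc/prometheus/cert.pem     # from client-certificate-data
        key_file: /etc/prometheus/key.pem       # from client-key-data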
https://github.com/prometheus/prometheus/issues/2430
At the end of it there are several users with this issue.
Having the Kubernetes internal network available on the monitoring server is not a desired solution because:
1) Multiple clusters might use the same IP ranges - so routing becomes complicated.
2) The monitoring server can be in another location or "zone" - so it might create latency issues for the entire network (depending on the solution used).
There is some discussion here: https://stackoverflow.com/questions/41845307/prometheus-cannot-export-metrics-from-connected-kubernetes-cluster/47643005
I am already using:
https://github.com/kubernetes/kube-state-metrics, but it does not provide CPU/memory usage of pods and so on.
In this issue there is a comment on how to get cAdvisor stats into Prometheus - it works from an external Prometheus.
https://github.com/giantswarm/kubernetes-prometheus/issues/89#issuecomment-406613268
If I combine this with kube-state-metrics, then I have what is needed.
Now the only thing that would be nice to add is how to get the node hostnames using kubernetes_sd_configs.
1) Could it support client-key-data instead of token?
2) Should the token be generated by a separate kubectl command?
Current configuration is:
Installed https://github.com/kubernetes/kube-state-metrics on the cluster and exposed it on a NodePort service (our clusters are not reachable from outside; one could also use kubectl proxy from the Prometheus machine).
- job_name: kubernetes-metrics
  static_configs:
    - targets: ['kube-master-1.example:8080']
    - targets: ['kube-master-1.internal:8080']
- job_name: kubernetes-cadvisor
  metrics_path: "/metrics/cadvisor"
  static_configs:
    - targets: ['kube-master-1.example:10255']
    - targets: ['kube-master-1.internal:10255']
    - targets: ['kube-worker-1.example:10255']
    - targets: ['kube-worker-2.example:10255']
    - targets: ['kube-worker-1.internal:10255']
    - targets: ['kube-worker-2.internal:10255']
It would be nice to use kubernetes_sd_configs to get the cadvisor nodes.
But this solution can't use kubernetes_sd_configs, so there is no service discovery: if you want to add a new node to the cluster, you must configure a new target (using static_configs) again.
You can set up k8s SD for nodes and, using relabeling, access the cAdvisor data via the Kubernetes API proxy.
- job_name: kubernetes-cadvisor
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
    - api_server: <URL to your k8s API>
      role: node
      tls_config:
        ca_file: ca.pem
        cert_file: cert.pem
        key_file: key.pem
        insecure_skip_verify: false
  tls_config:
    ca_file: ca.pem
    cert_file: cert.pem
    key_file: key.pem
    insecure_skip_verify: false
  relabel_configs:
    - separator: ;
      regex: __meta_kubernetes_node_label_(.+)
      replacement: $1
      action: labelmap
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: <URL to your k8s API>
      action: replace
    - source_labels: [__meta_kubernetes_node_name]
      separator: ;
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      action: replace
@FUSAKLA how do you create these ca.pem, cert.pem and key.pem files? Are the first and second tls_configs the same?
@AttwellBrian Yes, the first and second tls_configs are the same.
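For anyone wondering where those files come from: the PEM material can typically be extracted from a kubeconfig, where it is stored base64-encoded. A sketch of the relevant fields (cluster and user names are placeholders):

# ~/.kube/config (abridged); base64-decode each *-data value into the
# corresponding file referenced by tls_config above.
clusters:
  - name: my-cluster                         # placeholder
    cluster:
      server: https://<k8s-api-host>:6443
      certificate-authority-data: <base64>   # decode into ca.pem
users:
  - name: my-user                            # placeholder
    user:
      client-certificate-data: <base64>      # decode into cert.pem
      client-key-data: <base64>              # decode into key.pem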
You can set up k8s SD for nodes and, using relabeling, access the cAdvisor data via the Kubernetes API proxy.
I used that approach up until Kubernetes version 1.10, but on version 1.13 the kubelet no longer permits the authorization correctly. My kubelet log shows:
Forbidden (user=kubernetes, verb=get, resource=nodes, subresource=metrics)
I think it is a credential-forwarding problem. I probably need to use --requestheader-username-headers=X-Remote-User in the request to the Kubernetes API (using the proxy approach). I am not clear on this problem; if anybody could help me, I would appreciate it.
Thanks!
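A guess at a fix, assuming the Forbidden message is an RBAC denial for the certificate's user: grant that user get on the node metrics/proxy subresources. A sketch (the role name is illustrative, not from this thread):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-prometheus-node-metrics           # illustrative name
rules:
  - apiGroups: [""]
    resources: ["nodes/metrics", "nodes/proxy"]    # kubelet metrics via the API proxy
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-prometheus-node-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-prometheus-node-metrics
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes    # the user named in the Forbidden error above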
You can set up k8s SD for nodes and, using relabeling, access the cAdvisor data via the Kubernetes API proxy.
If anyone here is doing this: how is the load on the API server as a result? I'd rather not have my control plane go down because some metrics were scraped too aggressively.
I tried the solution given in the above comment but failed to access it.
I am using the following config:
- job_name: kubernetes-cadvisor
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
    - api_server: https://64.102.188.250:6443
      role: node
      tls_config:
        ca_file: /etc/prometheus/secrets/cluster-external/ca.pem
        cert_file: /etc/prometheus/secrets/cluster-external/cert.pem
        key_file: /etc/prometheus/secrets/cluster-external/key.pem
        insecure_skip_verify: false
  tls_config:
    ca_file: /etc/prometheus/secrets/cluster-external/ca.pem
    cert_file: /etc/prometheus/secrets/cluster-external/cert.pem
    key_file: /etc/prometheus/secrets/cluster-external/key.pem
    insecure_skip_verify: false
  relabel_configs:
    - separator: ;
      regex: __meta_kubernetes_node_label_(.+)
      replacement: $1
      action: labelmap
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: https://64.102.188.250:6443
      action: replace
    - source_labels: [__meta_kubernetes_node_name]
      separator: ;
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      action: replace
Help is appreciated.
I used the keys given in the kubeconfig file to generate ca.pem, cert.pem and key.pem,
and since I am using a Helm chart, I used the secrets section, which by default creates the keys in the /etc/prometheus/secrets directory.
Can anyone please clarify the below for me? I am assuming Kubernetes monitoring by an external Prometheus.
Kubernetes won't push the data to Prometheus, because Prometheus uses a pull-based model.
Hi @pitthecat, thanks for the reply. I have a running setup of kube-state-metrics, Prometheus and Grafana, but kube-state-metrics is not providing the CPU and memory utilization per pod. Can anyone please let me know how I can get pod CPU and memory utilization?
Hey,
You can use the metrics server to get pod CPU and memory utilisation.
Hi @RahulArora31, does the metrics server give an endpoint URL the same way kube-state-metrics does?
kube-state-metrics gives me the endpoint URL http://kubernetes-serverIP:8080/metrics,
which I configured in the Prometheus server to pull stats from the Kubernetes cluster.
Hey,
The metrics server does not have a /metrics endpoint for Prometheus to scrape.
You can get those metrics from cAdvisor instead.
You can get metrics from the K8s API.
https://api.your_dns_zone/metrics
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/
@pitthecat I don't fully understand the solution here. Does that mean we can query {api_endpoint}/metrics and get the full set of metrics, such as pod/node metrics and the /metrics endpoints exposed by services?
Some people use federation to monitor multiple clusters this way:
https://prometheus.io/docs/prometheus/latest/federation/
Hi, we went through this issue during our bug scrub and decided to close it as not a bug. This is already supported by Prometheus, and we would welcome a guide for it in our docs.
One simply can't use an external Prometheus to scrape pod metrics. Period.
The only way is to run a federation of Prometheus servers, where the external one scrapes metrics about k8s objects from a Prometheus running inside the cluster, which in turn discovers them via the k8s API.
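For illustration, a minimal sketch of the external side of such a federation (the in-cluster Prometheus address and the match expression are placeholders, not from this thread):

- job_name: federate
  honor_labels: true    # keep the labels set by the in-cluster Prometheus
  metrics_path: /federate
  params:
    'match[]':
      - '{job=~".+"}'   # placeholder: pulls everything; narrow this in practice
  static_configs:
    - targets: ['in-cluster-prometheus.example:9090']   # placeholder address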
With this configuration, we can do service discovery from outside of k8s.
One can also use ca.crt instead of a bearer token for authentication.
Yes, I tried that.
My environment is OKD 3.11; I tried to use /etc/origin/master/ca.crt, but it did not work.
I also noticed the ca.pem and other .pem files mentioned above, but I don't know how to find or generate those!
The way I did it was to create a secret out of ca.crt and then mount it in the Prometheus CRD. It worked, but then we moved to a federation model and are now moving towards integrating Cortex.
I believe the ca.crt can be found inside the kubeconfig as well.
If so, do you still have some specific steps at hand that you could share with me? :-)
Assuming you are inside the cluster
Oh, it's really cool!
For step 6, how do I generate the secret? Using an openssl command?
Refer to this on how to generate a secret from a file: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/
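For illustration, a minimal sketch of such a secret (the name is hypothetical; the data value is the base64-encoded file, which is what kubectl create secret generic prometheus-k8s-ca --from-file=ca.crt would produce):

apiVersion: v1
kind: Secret
metadata:
  name: prometheus-k8s-ca    # hypothetical name
type: Opaque
data:
  ca.crt: <base64-encoded contents of ca.crt>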
@majorinche @RahulArora31
What's the point?
You will get a list of targets (pods/services/whatever), but you won't be able to access them to scrape their metrics,
because pods inside the cluster have their own private network and an outside Prometheus won't be able to reach it.
It's not that you can't access them. You can access them provided you give Prometheus the right certs.
They are scraped via the API proxy. We did this earlier, but it was too complex a model to manage.
We used this config to scrape node exporters in an external cluster:
kubernetes_sd_configs:
  - api_server: https://x.x.x.x:6443
    role: node
    tls_config:
      ca_file: /etc/prometheus/secrets/cluster-external/ca.pem
      cert_file: /etc/prometheus/secrets/cluster-external/cert.pem
      key_file: /etc/prometheus/secrets/cluster-external/key.pem
      insecure_skip_verify: false
tls_config:
  ca_file: /etc/prometheus/secrets/cluster-external/ca.pem
  cert_file: /etc/prometheus/secrets/cluster-external/cert.pem
  key_file: /etc/prometheus/secrets/cluster-external/key.pem
  insecure_skip_verify: false
@RahulArora31 how did you configure the 'API proxy'?
FYI this is how I got it working:
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/kubernetes-ca.crt
  bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
  kubernetes_sd_configs:
    - api_server: 'https://<KUBERNETES URL>'
      role: node
      tls_config:
        ca_file: /etc/prometheus/kubernetes-ca.crt
      bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: <KUBERNETES URL>:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
You need to specify tls_config and bearer_token twice: once to access the Kubernetes API and once to access the endpoints to scrape.
@Drugoy the method that @ilpianista and I posted scrapes targets via the API proxy itself: if you look closely at the scrape endpoint, it is API-URL-proxy/metrics instead of PrivateIP:port/metrics, if I am not mistaken.
Correct me if I'm wrong @ilpianista
@RahulArora31, @ilpianista this won't work for, say, Pods.
If you want to get metrics for k8s resources such as Pods, Deployments, etc., you should use https://github.com/kubernetes/kube-state-metrics
Or... what do you mean?
@pashtet04 Each pod runs an application that is basically a webserver with an endpoint /metrics/prometheus that I'd like to scrape metrics from.
It turns out there is no way to scrape those metrics using a Prometheus from OUTSIDE the cluster (over the k8s API).
The k8s API lets me get a list of pods with my applications, but doesn't let me scrape the metrics provided by those applications.
@Drugoy could you post a screenshot over here?
Because for me it used to work. We never had a problem scraping node-exporter, kube-state-metrics and other custom exporters.
@RahulArora31 this can't be true, because Pods have internal IPs; there's no way to access them from outside.
What you did was access Services that forwarded to Pods; that's not the same.
There can be 20 replicas, so say 20 Pods, behind a Service.
One of the 20 gets broken and you'd like to get its metrics to investigate.
You query the Service and it forwards the request to one random Pod out of the 20 you have.
@Drugoy
Isn't it: [screenshot]?
It's working for services though
Need a prometheus.io/scrape: "true" annotation
Need a prometheus.io/port: "9090" annotation
- job_name: 'kubernetes-pods'
  bearer_token: '{{ k8s_token }}'
  scheme: https
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: pod
      api_server: https://{{ groups["k8s"] | map("extract", hostvars, "vrrp_vip") | first }}:6443
      bearer_token: '{{ k8s_token }}'
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - target_label: __address__
      replacement: '{{ groups["k8s"] | map("extract", hostvars, "vrrp_vip") | first }}:6443'
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
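For illustration, a sketch of pod metadata carrying the annotations this job keys on (pod name, image and port are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: my-app                        # placeholder
  annotations:
    prometheus.io/scrape: "true"      # required by the keep rule above
    prometheus.io/port: "9090"        # joined into __address__ by the port rule
    prometheus.io/path: "/metrics"    # optional; rewritten into __metrics_path__
spec:
  containers:
    - name: my-app
      image: my-app:latest            # placeholder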
@RahulArora31 this can't be true, because Pods have internal IPs; there's no way to access them from outside. [...]
Metrics from internal IPs are collected via the API, I guess.
@RahulArora31 this can't be true, because Pods have internal IPs; there's no way to access them from outside. [...]
Could you post some example metrics that you require? Because I used the method above and added scrapes from the kubelets on /metrics and /metrics/cadvisor. This is sufficient for most alerts/dashboards. I adopted them from the kubernetes-mixin project.
@Drugoy Isn't it: [screenshot]? It's working for services though
@pashtet04 that's the thing: Services are not Pods.
Services can be exposed to external networks.
But they aren't Pods and don't represent them: a Service forwards your requests to one chosen Pod behind it. Since there can be multiple replicas behind that Service, there's no way to gather stats from all of them reliably: each of your requests to the Service will land on SOME Pod.
Scraping one Pod's metrics is not the same as scraping ALL Pods' metrics.
Metrics from internal IPs are collected via the API, I guess.
@RahulArora31, the API has no way of knowing what's inside the Pod. It's just a coincidence that there's a webserver; there could just be some shell script or binary running that has nothing to do with HTTP at all.
Could you post some example metrics that you require? [...]
@kazysgurskas Our apps are written in Java (on the Spring Boot framework) and we scrape metrics provided by Spring Boot Actuator; it also allows adding custom metrics that may represent business-related things.
I guess you should use in-app metrics, or sidecar containers with custom business metrics, and scrape them like this: https://github.com/prometheus/prometheus/issues/4633#issuecomment-624735156
@Drugoy my previous config can be adapted to scrape pods as well:
- job_name: 'kubernetes-pods'
  scheme: http
  tls_config:
    ca_file: /etc/prometheus/kubernetes-ca.crt
  bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
  kubernetes_sd_configs:
    - api_server: '<KUBERNETES URL>'
      role: pod
      tls_config:
        ca_file: /etc/prometheus/kubernetes-ca.crt
      bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_pod_container_name]
      action: replace
      target_label: kubernetes_pod_container_name
@ilpianista have you actually tried that?
Care to provide a screenshot of the scraped pods listed as targets in Prometheus in the UP state?
Because for me it scrapes them via their internal IPs, and thus they are unreachable.
At the very least you need some relabeling to rewrite the internal IPs into k8s_api_url + some path.
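For what it's worth, a sketch (an untested assumption on my part, not something confirmed in this thread) of relabeling that would rewrite pod targets onto the API server's pod proxy instead of their internal IPs, reusing the <KUBERNETES URL> placeholder from the config above:

relabel_configs:
  # build the API-proxy path for each discovered pod
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name]
    regex: (.+);(.+)
    target_label: __metrics_path__
    replacement: /api/v1/namespaces/${1}/pods/${2}/proxy/metrics
  # point the scrape at the API server instead of the pod's internal IP
  - target_label: __address__
    replacement: <KUBERNETES URL>:443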
Yes, that one requires connectivity between Prometheus and the k8s nodes, but it's still outside the cluster. If this isn't what you need, then I misunderstood.
@ilpianista well, it looks like we mean different things by 'outside'.
I meant that Prometheus is in another network and can only reach the kube API.
Hi,
would you be able to let me know how you got the ca.crt key? I am trying the same config but am getting the error message below.
err="unable to use specified CA cert /etc/prometheus/ca.crt" type=*kubernetes.SDConfig
Thanks
Eswar
Hi,
would be able to let me know, how you got the CA.crt key. I am trying the same config but getting the below error message.
err="unable to use specified CA cert /etc/prometheus/ca.crt" type=*kubernetes.SDConfig
Thanks
Eswar
You can skip it with insecure_skip_verify: true inside the tls_config block.
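For reference, that looks like the snippet below; note that it disables certificate verification entirely, so it is best kept to testing:

tls_config:
  insecure_skip_verify: true    # skips CA validation; use only for testing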