1. Describe IN DETAIL the feature/behavior/change you would like to see.
Before upgrading, I used the following setup to get metrics from etcd:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: prometheus-service-proxier
rules:
- apiGroups: [""]
  resources: ["services/proxy"]
  resourceNames: ["http:etcd-server-prometheus-discovery:etcd"]
  verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: prometheus-service-proxier
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: prometheus-service-proxier
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-server-prometheus-discovery
  namespace: kube-system
  labels:
    k8s-app: etcd-server
spec:
  selector:
    k8s-app: etcd-server
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
  - name: etcd
    port: 4001
    targetPort: 4001
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd-server
  name: etcd-server
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https
    scheme: https
    path: /api/v1/namespaces/kube-system/services/http:etcd-server-prometheus-discovery:etcd/proxy/metrics
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: kubernetes
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd-server
It's a little hacky, but much more secure than opening firewall ports for etcd, and simpler than deploying another Prometheus on the masters just to monitor etcd (as suggested in https://github.com/kubernetes/kops/issues/4975 and the links in that thread).
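The proxy path can also be verified by hand with kubectl before wiring it into Prometheus, for example:
kubectl get --raw /api/v1/namespaces/kube-system/services/http:etcd-server-prometheus-discovery:etcd/proxy/metrics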
After upgrading to 1.12, however, I tried to adapt the above solution to work with etcd-manager:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: prometheus-service-proxier
rules:
- apiGroups: [""]
  resources: ["services/proxy"]
  resourceNames:
  - "https:etcd-manager-main-prometheus-discovery:etcd"
  - "https:etcd-manager-events-prometheus-discovery:etcd"
  verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: prometheus-service-proxier
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: prometheus-service-proxier
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-manager-events-prometheus-discovery
  namespace: kube-system
  labels:
    k8s-app: etcd-manager-events
spec:
  selector:
    k8s-app: etcd-manager-events
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
  - name: etcd
    port: 4001
    targetPort: 4001
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd-manager-events
  name: etcd-manager-events
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https
    scheme: https
    path: /api/v1/namespaces/kube-system/services/https:etcd-manager-events-prometheus-discovery:etcd/proxy/metrics
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: kubernetes
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd-manager-events
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-manager-main-prometheus-discovery
  namespace: kube-system
  labels:
    k8s-app: etcd-manager-main
spec:
  selector:
    k8s-app: etcd-manager-main
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
  - name: etcd
    port: 4001
    targetPort: 4001
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd-manager-main
  name: etcd-manager-main
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https
    scheme: https
    path: /api/v1/namespaces/kube-system/services/https:etcd-manager-main-prometheus-discovery:etcd/proxy/metrics
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: kubernetes
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd-manager-main
Unfortunately this doesn't work:
$ kubectl get --raw /api/v1/namespaces/kube-system/services/https:etcd-manager-main-prometheus-discovery:etcd/proxy/metrics
Error from server (ServiceUnavailable): the server is currently unable to handle the request
However, the cluster has valid endpoints for that service:
$ kubectl describe service -n kube-system etcd-manager-main-prometheus-discovery
Name: etcd-manager-main-prometheus-discovery
Namespace: kube-system
Labels: k8s-app=etcd-manager-main
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"etcd-manager-main"},"name":"etcd-manager-main-promet...
Selector: k8s-app=etcd-manager-main
Type: ClusterIP
IP: None
Port: https 443/TCP
TargetPort: 443/TCP
Endpoints: 10.120.128.108:443,10.120.129.224:443,10.120.130.81:443
Port: etcd 4001/TCP
TargetPort: 4001/TCP
Endpoints: 10.120.128.108:4001,10.120.129.224:4001,10.120.130.81:4001
Session Affinity: None
Events: <none>
And SSHing into the host and running curl:
root@ip-10-120-129-224 ~# curl https://localhost:4001/metrics -k --cert /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.crt --key /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.key
Provides valid metrics for etcd.
Do you have any suggestions on what I'm doing wrong? Is it about the apiserver proxy not presenting a client certificate to the metrics service?
Do you know a better way to gather etcd-manager metrics?
Best regards
Łukasz Tomaszkiewicz
It looks like your ServiceMonitor is configured to use etcd's https port (443) rather than the etcd port (4001) that you used in your curl command. Additionally, I'm not super familiar with ServiceMonitor, but your curl command specifies a client certificate and key, and I don't see those defined in the ServiceMonitor, so you may need to specify them somehow.
It's configured to use https on purpose, as I use the apiserver proxy feature to bypass the firewall. The cert and key for the ServiceMonitor are specified (in the tls section and the token file), but I'm afraid the apiserver proxy doesn't pass that information through to the target service and uses it only for proxy authentication and authorization.
So we probably need to figure out another way to get to the metrics. I've read some docs, and exposing metrics on a separate, HTTP-only port would probably solve the case; however, I haven't found how to do that in kops :(
Hi,
With the upgrade to etcd-manager, etcd is only reachable from the masters, which means that unless Prometheus is running on the masters you cannot scrape it for metrics. There is a feature in etcd 3.3 that allows metrics to be exposed on a different port, and I have an issue open on etcd-manager to expose that: https://github.com/kopeio/etcd-manager/issues/139. That will not be available until at least 1.14, though, as that is when Kubernetes upgrades the recommended version of etcd.
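For reference, the etcd 3.3 feature in question is the --listen-metrics-urls flag, which serves /metrics (and /health) on an additional plain-HTTP listener. A sketch (the port is only an example; the rest of the etcd flags stay as they are):
etcd --listen-metrics-urls=http://0.0.0.0:8081 ...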
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I have the same problem.
How do I get etcd metrics into Prometheus?
Bump, also looking to get the metrics from the kops etcd-manager deployment into prometheus-operator.
It looks like https://github.com/kopeio/etcd-manager/issues/139 is the relevant issue, but I don't see any way of setting this variable in kops.
Another issue helped me to solve it: https://github.com/coreos/prometheus-operator/issues/2207#issuecomment-505122891
Thanks to @tkozma and @irizzant.
But I couldn't get the needed certs from the pods mentioned there, so I fetched them from the kops S3 state store instead:
aws s3 cp ${KOPS_STATE_STORE}/${KOPS_CLUSTER_NAME}/pki/issued/etcd-clients-ca/$ca_file_name /tmp/etcd_ca.pem
aws s3 cp ${KOPS_STATE_STORE}/${KOPS_CLUSTER_NAME}/pki/issued/etcd-clients-ca/$client_cert_file_name /tmp/client.crt
aws s3 cp ${KOPS_STATE_STORE}/${KOPS_CLUSTER_NAME}/pki/private/etcd-clients-ca/$client_key_file_name /tmp/client.key
The filenames contain some generated number. You can list the directory to get the names:
aws s3 ls ${KOPS_STATE_STORE}/${KOPS_CLUSTER_NAME}/pki/issued/etcd-clients-ca/
But that's just a workaround.
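To actually use the files from Prometheus, one option (a sketch following the approach from the linked prometheus-operator issue; the secret name here is just an example) is to put them into a secret in the monitoring namespace, list that secret under spec.secrets of the Prometheus resource so that prometheus-operator mounts it at /etc/prometheus/secrets/<secret-name>/, and reference the files from the ServiceMonitor's tlsConfig:
kubectl -n monitoring create secret generic etcd-client-certs \
  --from-file=etcd-ca.pem=/tmp/etcd_ca.pem \
  --from-file=client.crt=/tmp/client.crt \
  --from-file=client.key=/tmp/client.key

# in the ServiceMonitor endpoint that scrapes etcd on port 4001
# (insecureSkipVerify because the etcd server cert won't match the service DNS name):
tlsConfig:
  caFile: /etc/prometheus/secrets/etcd-client-certs/etcd-ca.pem
  certFile: /etc/prometheus/secrets/etcd-client-certs/client.crt
  keyFile: /etc/prometheus/secrets/etcd-client-certs/client.key
  insecureSkipVerify: true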
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
This has been fixed now. See https://kops.sigs.k8s.io/cluster_spec/#etcd-metrics
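For anyone landing here later, the gist of that page is setting environment variables on the etcd-manager via the cluster spec, roughly like the following (a sketch from the docs; double-check the exact field names there and substitute your own instance group names):
etcdClusters:
- name: main
  etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
  manager:
    env:
    - name: ETCD_LISTEN_METRICS_URLS
      value: http://0.0.0.0:8081
    - name: ETCD_METRICS
      value: basic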
/close
@olemarkus: Closing this issue.
In response to this:
This has been fixed now. See https://kops.sigs.k8s.io/cluster_spec/#etcd-metrics
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.