kubeadm version (use kubeadm version): 1.18.6
Environment:
- Kubernetes version (use kubectl version): 1.18.6
- Kernel (e.g. uname -a): 4.19.0-9

Kubeadm disables the "insecure" ports of kube-scheduler and kube-controller-manager by setting the --port=0 flag. Therefore metrics have to be scraped over TLS. This is fine, but Kubeadm doesn't seem to manage the certificates of kube-scheduler and kube-controller-manager. These components - if no certificate is provided - will create a self-signed certificate to serve requests. One could just disable certificate verification, but that would somewhat defeat the purpose of TLS.
Kubeadm should create and manage certificates for the "secure" port of kube-scheduler and kube-controller-manager. These certificates should be signed by the CA that is created by Kubeadm.
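For reference, the relevant part of the kube-scheduler manifest that kubeadm generates (/etc/kubernetes/manifests/kube-scheduler.yaml) looks roughly like this (exact flags may vary between versions):

spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --port=0

Note that no --tls-cert-file / --tls-private-key-file is passed, so the component falls back to its generated self-signed certificate.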
@FrediWeber thank you for logging the ticket.
you have a valid observation that we do not sign the serving certificate and key for the components in question.
we had a long discussion with a user on why we are not signing these for kubeadm and you can read more about this here:
https://github.com/kubernetes/kubernetes/issues/80063
IIUC, one undesired side effect, is that if we start doing that our HTTPS probes will fail, as Pod Probe API does not support signed certificates for HTTPS (only self-signed):
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#probe-v1-core
https://github.com/kubernetes/kubernetes/issues/18226#issuecomment-430510491
we could work around that using a "command" probe that is cert/key aware, but this is difficult as the component images are "distroless" (no shell, no tools). so maybe one day we can support that if core k8s supports it properly.
@neolit123 Thank you very much for your fast response and the clarifications.
If i understand it correctly, the issue https://github.com/kubernetes/kubernetes/issues/80063 is more about mapping the whole PKI dir into the container, external PKIs and shorter renewal intervals.
I read a little bit about health checks with HTTPS in https://github.com/kubernetes/kubernetes/issues/18226 and https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/.
If I understand everything correctly, the Kubernetes docs state that the certificate is not checked at all for the health check, so it shouldn't matter if the certificate is self-signed or if it's properly signed with the already present CA.
If scheme field is set to HTTPS, the kubelet sends an HTTPS request skipping the certificate verification.
I don't see any security implications in just mapping the signed certificates and corresponding keys for kube-scheduler and kube-controller-manager.
The only downside would be that the certificate rotation would have to be managed. On the other hand, this is already the case for other certificates AFAIK.
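For reference, the HTTPS probe that kubeadm currently generates for kube-scheduler looks roughly like this in the static Pod manifest (the kube-controller-manager probe is the same except that it uses port 10257); per the docs quoted above, the certificate presented on this port is never verified by the kubelet:

livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10259
    scheme: HTTPS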
If I understand everything correctly, the Kubernetes docs state that the certificate is not checked at all for the health check, so it shouldn't matter if the certificate is self-signed or if it's properly signed with the already present CA.
i'm not so sure about this and i haven't tried it. my understanding is that if the server no longer has self-signed certificates this means that it would reject any client connections on HTTPS that do not pass authentication.
e.g. curl -k... would no longer work?
If i understand it correctly, the issue kubernetes/kubernetes#80063 is more about mapping the whole PKI dir into the container, external PKIs and shorter renewal intervals.
that is true. however, the discussion there was also about the fact that today users can customize their kube-scheduler and KCM deployments via kubeadm to enable the usage of the custom signed serving certificates for these components if they want their metrics and health checks to be accessible over "true" TLS (i.e. pass the flags and mount the certificates using extraArgs, extraVolumes under ClusterConfiguration).
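as a rough sketch, such a customization could look something like the config below. the /etc/kubernetes/tls host path and the file names are only examples; the serving certificates and keys would have to be created and signed out of band (e.g. against the kubeadm CA):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
scheduler:
  extraArgs:
    # example paths; cert/key created and signed out of band
    tls-cert-file: /etc/kubernetes/tls/kube-scheduler.crt
    tls-private-key-file: /etc/kubernetes/tls/kube-scheduler.key
  extraVolumes:
  - name: serving-certs
    hostPath: /etc/kubernetes/tls
    mountPath: /etc/kubernetes/tls
    readOnly: true
    pathType: Directory
controllerManager:
  extraArgs:
    tls-cert-file: /etc/kubernetes/tls/kube-controller-manager.crt
    tls-private-key-file: /etc/kubernetes/tls/kube-controller-manager.key
  extraVolumes:
  - name: serving-certs
    hostPath: /etc/kubernetes/tls
    mountPath: /etc/kubernetes/tls
    readOnly: true
    pathType: Directory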
with the requirement of kubeadm managing these extra certificates for renewal, i'm leaning towards -1 initially, but i would like to get feedback from others too.
cc @randomvariable @fabriziopandini @detiber
i'm not so sure about this and i haven't tried it. my understanding is that if the server no longer has self-signed certificates this means that it would reject any client connections on HTTPS that do not pass authentication.
e.g. curl -k ... would no longer work?
So do you mean the kube-controller-manager and kube-scheduler would no longer accept the health check requests because they do not pass a client certificate or any other authentication?
Or do you mean the "client side" of the Kubernetes health check would not connect because the certificate is not self-signed? I'm not sure about the first case, but if the docs are correct, the second case should not happen.
Please also keep in mind that the current certificate is also not really self-signed. Kube-controller-manager and kube-scheduler seem to create an internal, temporary CA on startup and sign the certificate with their own CA.

You are absolutely right about the existing possibility to mount certificates and set the options with the extraArgs.
The thing that has changed is that Kubeadm by default deactivates the insecure port with --port=0. I'm aware that this is deprecated in the upstream components (kube-scheduler and kube-controller-manager) anyway, but I think that Kubeadm should configure these components properly, especially when Kubeadm already "manages" a CA from which these certificates could relatively easily be signed.
Another approach would be to let these two components handle their front-facing certificates on their own like kubelet does.
Long term, I would love to see the ability to leverage the certificates API to do automated request and renewal of serving certificates for kube-scheduler and kube-controller-manager similar to the work that is being done to enable this support for the kubelet (https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md)
So do you mean the kube-controller-manager and kube-scheduler would no longer accept the health check requests because they do not pass a client certificate or any other authentication?
that was my understanding. then again, we do serve the kube-apiserver on HTTPS and its probe does not have/pass certificates, so perhaps it would just work.
I'm aware that this is deprecated in the upstream components (kube-scheduler and kube-controller-manager) anyway, but I think that Kubeadm should configure these components properly, especially when Kubeadm already "manages" a CA from which these certificates could relatively easily be signed.
but again the problem is that this is yet another set of certificates that kubeadm has to manage during renewal, and must consider during our "copy certificates" functionality for HA support. it is not a strict requirement and kubeadm already supports it for users that want to do that using extraArgs. we have a similar case for the kubelet serving certificate which is "self-signed".
i'd say, at minimum it would be worthy of an enhancement proposal (KEP):
https://github.com/kubernetes/enhancements/tree/master/keps
Long term, I would love to see the ability to leverage the certificates API to do automated request and renewal of serving certificates for kube-scheduler and kube-controller-manager similar to the work that is being done to enable this support for the kubelet (https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md)
i tried to follow the latest iterations of the CSR API closely, but i have not seen discussions around CSRs for the serving certificates of these components via the KCM CA. my guess would be that there might be some sort of a blocker for doing that, given a lot of planning went into the v1 of the API.
i'd say, at minimum it would be worthy of an enhancement proposal (KEP):
https://github.com/kubernetes/enhancements/tree/master/keps
Would it be okay for you if I'd start the process?
for a feature that is already possible via the kubeadm config/API, the benefits need to justify the maintenance complexity.
after all, kubeadm's goal is to create a "minimal viable cluster" by default.
to me it always seems better to first collect some support (+1s) on the idea before proceeding with the KEP...
the KEP process can be quite involved and my estimate is that at this stage the KEP will not pass. so, it is probably better to flesh out the idea more in a discussion here.
What if kube-scheduler and kube-controller-manager managed their front-facing server certificates with the certificates.k8s.io API? There would need to be a new controller to automatically sign the CSRs.
As a fallback these components could still use their self-created CA.
Kubeadm wouldn't have to do anything and there would still be the possibility to provide your own certificates if needed.
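Just to sketch the idea: each component could submit a CSR roughly like the one below. The signerName here is made up and would be served by that new controller; the request field is only a placeholder:

apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: kube-scheduler-serving-node-1   # example name
spec:
  request: <base64-encoded PKCS#10 CSR with the node's names/IPs as SANs>
  signerName: example.com/kube-component-serving   # made-up signer, handled by the new controller
  usages:
  - digital signature
  - key encipherment
  - server auth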
this could work, but i guess we will have to own the source code and container image for this new controller.
BTW, does the kube-scheduler even support /metrics?
for 1.18.0 it just reports:
no kind is registered for the type v1.Status in scheme
KCM on the other hand reports what i expected to see:
{
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
  "reason": "Forbidden",
  "details": {
I just double checked it.
The problem seems to be that the scheduler does not provide a clean error message if not properly authenticated.
If you authenticate against the /metrics endpoint with a token of an authorized service account, the metrics are provided.
The problem seems to be that the scheduler does not provide a clean error message if not properly authenticated.
would you care to log a ticket for that in kubernetes/kubernetes and tag it with /sig scheduling instrumentation?
i could not find an existing one by searching in the repository.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I was trying to set up metrics scraping with Prometheus and am pretty confused about the current state of things. Is there a recommended way to monitor kube-scheduler (S) and kube-controller-manager (CM) metrics?
https://github.com/prometheus-operator/kube-prometheus/issues/718
Prometheus is running on a node different from the master nodes, and is expected to scrape S/CM metrics from multiple master nodes. Two issues:
1. S/CM only listen on 127.0.0.1, so the metrics ports are not reachable from other nodes. Something like etcd's --advertise-client-url behavior, binding to the node IP and probably 127.0.0.1, seems like a better idea.
2. The serving certificate is not valid for the node IP. One can set insecureSkipVerify: true in the Prometheus scrape config, but obviously it is better not to cut corners. The X509v3 Subject Alternative Name should be similar to the etcd one: DNS:localhost, DNS:node-1, IP Address:<node IP, e.g. 10.1.1.1>, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1. Authentication already happens via service bearer token.
I can create another issue, but first I prefer to get some feedback, as the current issue is related.
I was trying to set up metrics scraping with Prometheus and am pretty confused about the current state of things. Is there a recommended way to monitor kube-scheduler (S) and kube-controller-manager (CM) metrics?
you can sign certificates for a user that is authorized to access /metrics endpoints.
e.g.
rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get", "post"]
https://kubernetes.io/docs/reference/access-authn-authz/rbac/
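expanded into full objects it could look something like this (the ClusterRole and user names are just examples; the user name has to match the CN of the client certificate):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader              # example name
rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: metrics-scraper             # example; must match the certificate CN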
creating certificates:
https://kubernetes.io/docs/concepts/cluster-administration/certificates/
you can then feed such to a TLS client that tries to access the endpoint.
can be verified locally with curl too.
alternatively for the legacy behavior of insecure metrics you can grant the user system:anonymous /metrics access.
not really recommended.
EDIT:
Authentication already happens via service bearer token.
sorry, missed that part. in that case there is likely a lack of authz.
I think you didn't understand me. To access the metrics endpoint, 3 things must happen:
1. The scraper must be able to reach the metrics port over the network (bind address).
2. The scraper must be able to verify the server TLS certificate.
3. The request must be authenticated and authorized.
You are talking about (3), but this is the only part that works. Parts (1) and (2) are broken.
For (1), I can set the bind address via the kubeadm config:
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
And then propagate it to the manifests via:
kubeadm init phase control-plane scheduler --config kubeadm.yml
kubeadm init phase control-plane controller-manager --config kubeadm.yml
# then restart kubelet
That does not work well in my case. I have an internet-facing interface (e.g. 80.1.1.1, 80.1.1.2, ...) on the nodes. Binding to 0.0.0.0 also binds to 80.1.1.x, and the metrics become available over the internet. I could stop using kubeadm and manually fix the manifests to bind to 10.1.1.x (a different value on each node), but that most likely is going to break components talking to S/CM, because AFAIU it is not possible to bind to both 127.0.0.1 and 10.1.1.x, and because of (2).
Regarding (2): for a client to trust a server, the server must supply a server TLS certificate signed by a CA trusted by the client, with no discrepancies. Not sure if Prometheus checks the CA, but it certainly checks the IP Address entries in the X509v3 Subject Alternative Name certificate part. With --bind-address=127.0.0.1 Prometheus gives dial tcp 10.1.1.1:10257: connect: connection refused. With --bind-address=0.0.0.0 Prometheus reports x509: certificate is valid for 127.0.0.1, not 10.1.1.1.
Regarding (3): this part works. When I add insecureSkipVerify: true to the Prometheus scrape configs, it successfully scrapes the metrics. The scrape config:
- job_name: monitoring/main-kube-prometheus-stack-kube-scheduler/0
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  ...
The service account token already has the get metrics permission in its role:
rules:
- nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
  verbs: ["get"]
My current workaround is to apply the firewall rules on every node BEFORE applying the above steps. Replace the 10.1.1.1 with the node IP:
cat <<EOF >/etc/local.d/k8s-firewall.start
iptables -A INPUT -p tcp -d 127.0.0.1,10.1.1.1 -m multiport --dports 10257,10259 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 10257,10259 -j DROP
EOF
chmod +x /etc/local.d/k8s-firewall.start
/etc/local.d/k8s-firewall.start
we don't really provide support on prometheus in the kubeadm issue tracker. what i'm seeing here can fail for other deployers and not only for kubeadm.
you can ask in #kubeadm on k8s slack, support channels or the prometheus issue tracker because likely there are other users already doing what you are trying to do.
The issue is NOT specific to Prometheus. It is specific to kubeadm. The way kubeadm sets up the scheduler and controller-manager, any pull-based central metrics scraper cannot reach them unless the metrics collector is deployed on every master node.
What is the point in providing a metrics endpoint if it cannot be accessed? I guess kubeadm doesn't do it on purpose, right?
I guess kubeadm doesn't do it on purpose, right?
yes, because it feels like an extension and not something that all users would need.
the discussion above had the following:
... today users can customize their kube-scheduler and KCM deployments via kubeadm to enable the usage of the custom signed serving certificates for these components if they want their metrics and health checks to be accessible over "true" TLS (i.e. pass the flags and mount the certificates using extraArgs, extraVolumes under ClusterConfiguration).
https://github.com/kubernetes/kubeadm/issues/2244#issuecomment-668646073
so technically you should be able to pass extra flags to the components and set them up.