Kubeadm: Provide proper certificates for kube-scheduler and kube-controller-manager

Created on 4 Aug 2020 · 22 comments · Source: kubernetes/kubeadm

FEATURE REQUEST

Versions

kubeadm version (use kubeadm version): 1.18.6

Environment:

  • Kubernetes version (use kubectl version): 1.18.6
  • Cloud provider or hardware configuration: Bare-Metal
  • OS (e.g. from /etc/os-release): Debian 10
  • Kernel (e.g. uname -a): 4.19.0-9
  • Others:

What happened?

Kubeadm disables the "insecure" ports of kube-scheduler and kube-controller-manager by setting the --port=0 flag, so metrics have to be scraped over TLS. This is fine, but Kubeadm doesn't seem to manage the certificates of kube-scheduler and kube-controller-manager. If no certificate is provided, these components create a self-signed certificate to serve requests. One could simply disable certificate verification, but that would largely defeat the purpose of using TLS.
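
For illustration, this is roughly the relevant part of the kube-scheduler static Pod manifest that Kubeadm generates (flags abbreviated; the exact set depends on the version). Kube-controller-manager is set up the same way on port 10257:

# sketch of the generated /etc/kubernetes/manifests/kube-scheduler.yaml (abbreviated)
spec:
  containers:
  - name: kube-scheduler
    command:
    - kube-scheduler
    - --bind-address=127.0.0.1   # secure port 10259 only listens on localhost
    - --port=0                   # insecure HTTP port disabled
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf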

What you expected to happen?

Kubeadm should create and manage certificates for the "secure" port of kube-scheduler and kube-controller-manager. These certificates should be signed by the CA that is created by Kubeadm.

How to reproduce it (as minimally and precisely as possible)?

  1. Create a cluster with Kubeadm
  2. Access the "secure" port (10257 or 10259)
Labels: area/security, kind/design, kind/feature, priority/awaiting-more-evidence

All 22 comments

@FrediWeber thank you for logging the ticket.
you have a valid observation that we do not sign the serving certificate and key for the components in question.

we had a long discussion with a user on why we are not signing these for kubeadm and you can read more about this here:
https://github.com/kubernetes/kubernetes/issues/80063

IIUC, one undesired side effect is that if we start doing that our HTTPS probes will fail, as the Pod Probe API does not support signed certificates for HTTPS (only self-signed):
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#probe-v1-core
https://github.com/kubernetes/kubernetes/issues/18226#issuecomment-430510491

we could work around that using a "command" probe that is cert/key aware, but this is difficult as the component images are "distroless" (no shell, no tools). so maybe one day we can support that if core k8s supports it properly.
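
for reference, the HTTPS probe kubeadm puts into these static pods looks roughly like this (the numbers may differ between versions); the kubelet skips certificate verification for it but also does not send any client credentials:

livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10259        # 10257 for kube-controller-manager
    scheme: HTTPS
  initialDelaySeconds: 10
  timeoutSeconds: 15
  failureThreshold: 8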

@neolit123 Thank you very much for your fast response and the clarifications.

If i understand it correctly, the issue https://github.com/kubernetes/kubernetes/issues/80063 is more about mapping the whole PKI dir into the container, external PKIs and shorter renewal intervals.

I read a little bit about health checks with HTTPS in https://github.com/kubernetes/kubernetes/issues/18226 and https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/.
If I understand everything correctly, the Kubernetes docs state that the certificate is not checked at all for the health check, so it shouldn't matter whether the certificate is self-signed or properly signed with the already present CA.

If scheme field is set to HTTPS, the kubelet sends an HTTPS request skipping the certificate verification.

I don't see any security implications in just mapping the signed certificates and corresponding keys for kube-scheduler and kube-controller-manager.

The only downside would be that the certificate rotation would have to be managed. On the other hand, this is already the case for other certificates AFAIK.

If I understand everything correctly, the Kubernetes docs state that the certificate is not checked at all for the health check, so it shouldn't matter whether the certificate is self-signed or properly signed with the already present CA.

i'm not so sure about this and i haven't tried it. my understanding is that if the server no longer uses self-signed certificates, it would reject any client connections over HTTPS that do not pass authentication.
e.g. curl -k... would no longer work?

If i understand it correctly, the issue kubernetes/kubernetes#80063 is more about mapping the whole PKI dir into the container, external PKIs and shorter renewal intervals.

that is true. however, the discussion there was also about the fact that today users can customize their kube-scheduler and KCM deployments via kubeadm to enable the usage of the custom signed serving certificates for these components if they want their metrics and health checks to be accessible over "true" TLS (i.e. pass the flags and mount the certificates using extraArgs, extraVolumes under ClusterConfiguration).
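
as a rough sketch of that customization (the certificate and key files below are placeholders that you would have to create and sign from the kubeadm CA yourself; kubeadm does not generate them):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    tls-cert-file: /etc/kubernetes/pki/kube-controller-manager.crt
    tls-private-key-file: /etc/kubernetes/pki/kube-controller-manager.key
scheduler:
  extraArgs:
    tls-cert-file: /etc/kubernetes/pki/kube-scheduler.crt
    tls-private-key-file: /etc/kubernetes/pki/kube-scheduler.key
  # the scheduler pod does not mount the pki directory by default,
  # so mount it explicitly (adjust if your setup differs)
  extraVolumes:
  - name: scheduler-serving-certs
    hostPath: /etc/kubernetes/pki
    mountPath: /etc/kubernetes/pki
    readOnly: true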

with the requirement of kubeadm managing these extra certificates for renewal, i'm leaning towards -1 initially, but i would like to get feedback from others too.

cc @randomvariable @fabriziopandini @detiber

i'm not so sure about this and i haven't tried it. my understanding is that if the server no longer uses self-signed certificates, it would reject any client connections over HTTPS that do not pass authentication.
e.g. curl -k... would no longer work?

So do you mean the kube-controller-manager and kube-scheduler would no longer accept the health check requests because they do not pass a client certificate or any other authentication?
Or do you mean the "client side" of the Kubernetes health check would not connect because the certificate is not self-signed? I'm not sure about the first case, but if the docs are correct, the second case should not happen.

Please also keep in mind that the current certificate is not really self-signed either: kube-controller-manager and kube-scheduler seem to create an internal, temporary CA on startup and sign their serving certificate with it.

You are absolutely right about the existing possibility to mount certificates and set the options with the extraArgs.

The thing that has changed is that Kubeadm by default deactivates the insecure port with --port=0. I'm aware that this port is deprecated in the upstream components (kube-scheduler and kube-controller-manager) anyway, but I think that Kubeadm should configure these components properly, especially since Kubeadm already "manages" a CA from which these certificates could relatively easily be signed.

Another approach would be to let these two components handle their front-facing certificates on their own like kubelet does.
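
For reference, the kubelet already has an opt-in version of this via the certificates API: with serverTLSBootstrap enabled in the KubeletConfiguration it requests its serving certificate as a CSR (signer kubernetes.io/kubelet-serving) instead of self-signing it, although those CSRs still have to be approved. A minimal sketch:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# request the kubelet serving certificate via the certificates.k8s.io API
# instead of generating a self-signed one; the resulting CSRs must be
# approved before they are signed
serverTLSBootstrap: true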

Long term, I would love to see the ability to leverage the certificates API to do automated request and renewal of serving certificates for kube-scheduler and kube-controller-manager similar to the work that is being done to enable this support for the kubelet (https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md)

So do you mean the kube-controller-manager and kube-scheduler would no longer accept the health check requests because they do not pass a client certificate or any other authentication?

that was my understanding. then again, we do serve the kube-apiserver on HTTPS and its probe does not have/pass certificates, so perhaps it would just work.

I'm aware that this port is deprecated in the upstream components (kube-scheduler and kube-controller-manager) anyway, but I think that Kubeadm should configure these components properly, especially since Kubeadm already "manages" a CA from which these certificates could relatively easily be signed.

but again the problem is that this is yet another set of certificates that kubeadm has to manage during renewal, and that must be considered in our "copy certificates" functionality for HA support. it is not a strict requirement and kubeadm already supports it for users that want to do that using extraArgs. we have a similar case for the kubelet serving certificate which is "self-signed".

i'd say, at minimum it would be worthy of an enhancement proposal (KEP):
https://github.com/kubernetes/enhancements/tree/master/keps

Long term, I would love to see the ability to leverage the certificates API to do automated request and renewal of serving certificates for kube-scheduler and kube-controller-manager similar to the work that is being done to enable this support for the kubelet (https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md)

i tried to follow the latest iterations of the CSR API closely, but i have not seen discussions around CSRs for the serving certificates of these components via the KCM CA. my guess would be that there might be some sort of a blocker for doing that, given a lot of planning went in the v1 of the API.

i'd say, at minimum it would be worthy of an enhancement proposal (KEP):
https://github.com/kubernetes/enhancements/tree/master/keps

Would it be okay with you if I started the process?

for a feature that is already possible via the kubeadm config/API, the benefits need to justify the maintenance complexity.
after all, kubeadm's goal is to create a "minimal viable cluster" by default.

to me it always seems better to first collect some support (+1s) on the idea before proceeding with the KEP...
the KEP process can be quite involved and my estimate is that at this stage the KEP will not pass. so, it is probably better to flesh out the idea more in a discussion here.

What if kube-scheduler and kube-controller-manager managed their front-facing server certificates with the certificates.k8s.io API? There would need to be a new controller to automatically sign the CSRs.
As a fallback, these components could still use their self-created CA.

  1. Check if the certificate flags are set - if yes, use them and start the component
  2. If no front-facing certificate flags are set, generate a CSR and have it signed by the corresponding controller
  3. If step 2 fails or is disabled, proceed the same way as today (generate an own CA etc.)

Kubeadm wouldn't have to do anything, and there would still be the possibility to provide your own certificates if needed.
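
Purely as a hypothetical sketch (no such signer exists in Kubernetes today; the signer name below is made up), the request in step 2 could be a CSR object similar to what the kubelet files for its serving certificate:

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: kube-scheduler-serving-node-1           # hypothetical name
spec:
  # hypothetical signer that the new controller would handle
  signerName: example.com/kube-scheduler-serving
  # base64-encoded PEM CSR with the node's hostname/IPs as SANs
  request: "<base64-encoded PKCS#10 CSR>"
  usages:
  - digital signature
  - key encipherment
  - server auth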

this could work, but i guess we will have to own the source code and container image for this new controller.

BTW, does the kube-scheduler even support /metrics?

for 1.18.0 it just reports:

no kind is registered for the type v1.Status in scheme

KCM on the other hand reports what i've expected to see:

  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
  "reason": "Forbidden",
  "details": {

I just double-checked it.
The problem seems to be that the scheduler does not provide a clean error message if it is not properly authenticated.
If you authenticate against the /metrics endpoint with a token of an authorized service account, the metrics are provided.

The problem seems to be that the scheduler does not provide a clean error message if it is not properly authenticated.

would you care to log a ticket for that in kubernetes/kubernetes and tag it with /sig scheduling instrumentation?
i could not find an existing one by searching in the repository.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

I was trying to set up metrics scraping with Prometheus and am pretty confused about the current state of things. Is there a recommended way to monitor kube-scheduler (S) and kube-controller-manager (CM) metrics?

https://github.com/prometheus-operator/kube-prometheus/issues/718

Prometheus is running on a node different from the master nodes and is expected to scrape S/CM metrics from multiple master nodes. Two issues:

  • Both S and CM bind to 127.0.0.1 by default, which makes it impossible for Prometheus to access the metrics. Binding to 0.0.0.0 (the currently recommended workaround) is too broad, as it may bind to external IP addresses and expose the metrics to the internet. The etcd/apiserver approach of advertising specific addresses (--advertise-client-urls with the node IP and probably 127.0.0.1) seems like a better idea.
  • Moving to HTTPS, it seems reasonable to have TLS certificates on this endpoint so that a client (Prometheus) can trust it. As a workaround it is possible to set insecureSkipVerify: true in the Prometheus scrape config, but obviously it is better not to cut corners. The X509v3 Subject Alternative Name should be similar to the etcd one: DNS:localhost, DNS:node-1, IP Address:<node IP, e.g. 10.1.1.1>, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1

Authentication already happens via service bearer token.

I can create another issue, but first I'd prefer to get some feedback, as the current issue is related.

I was trying to set up metrics scraping with Prometheus and am pretty confused about the current state of things. Is there a recommended way to monitor kube-scheduler (S) and kube-controller-manager (CM) metrics?

you can sign certificates for a user that is authorized to access the /metrics endpoints.
e.g.

rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get", "post"]

https://kubernetes.io/docs/reference/access-authn-authz/rbac/
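
for example, the complete RBAC objects could look like this (the names are just placeholders; the user is whatever identity your client certificate or token maps to):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader
rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: metrics-reader   # CN of the signed client certificate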

creating certificates:
https://kubernetes.io/docs/concepts/cluster-administration/certificates/

you can then feed these to a TLS client that tries to access the endpoint.
this can be verified locally with curl too.

alternatively for the legacy behavior of insecure metrics you can grant the user system:anonymous /metrics access.
not really recommended.

EDIT:

Authentication already happens via service bearer token.

sorry, i missed that part. in that case there is likely a lack of authz.

I think you didn't understand me. To access the metrics endpoint 3 things must happen:

  1. Port must be accessible.
  2. Assuming this is https, the client must trust the server (or opt to not care).
  3. The server must trust the client (or opt to not care).

You are talking about (3), but this is the only part that works. Parts (1) and (2) are broken.

  1. Prometheus is executed on some node (probably a non-master one), with pod IP e.g. 10.2.1.4. Even if it runs on a master, it must scrape the other master nodes as well, and it discovers S and CM via the corresponding K8s services, so it gets node IP addresses (e.g. 10.1.1.1, 10.1.1.2, ...) and not 127.0.0.1. With this, 10.1.1.1:10257 and 10.1.1.1:10259 are inaccessible from anywhere, including 10.2.1.4. As I said, one workaround is to add the following to the kubeadm config:
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0

And then propagate to configs via

kubeadm init phase control-plane scheduler --config kubeadm.yml
kubeadm init phase control-plane controller-manager --config kubeadm.yml
# then restart kubelet

That does not work well in my case. I have an internet-facing interface (e.g. 80.1.1.1, 80.1.1.2, ...) on the nodes. Binding to 0.0.0.0 also binds to 80.1.1.x, and the metrics become available over the internet. I could stop using kubeadm and manually fix the manifests to bind to 10.1.1.x (a different value on each node), but that is most likely going to break components talking to S/CM, because AFAIU it is not possible to bind to both 127.0.0.1 and 10.1.1.x, and because of (2).

  2. For a client to trust a server, the server must supply a server TLS certificate signed by a CA trusted by the client, with no discrepancies. Not sure if Prometheus checks the CA, but it certainly checks the IP Address entries in the X509v3 Subject Alternative Name part of the certificate. With --bind-address=127.0.0.1 Prometheus gives dial tcp 10.1.1.1:10257: connect: connection refused. With --bind-address=0.0.0.0 Prometheus reports x509: certificate is valid for 127.0.0.1, not 10.1.1.1.

  3. This part works. When I add insecureSkipVerify: true to the Prometheus scrape config, it successfully scrapes the metrics. The scrape config:

- job_name: monitoring/main-kube-prometheus-stack-kube-scheduler/0
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
... 
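
The workaround mentioned above amounts to flipping that one field, i.e. giving up on verifying the serving certificate:

tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  insecure_skip_verify: true   # skip verification of the self-signed serving cert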

The service account token already has the get metrics permission in its role:

rules:
- nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
  verbs: ["get"]

My current workaround is to apply the firewall rules below on every node BEFORE applying the above steps. Replace 10.1.1.1 with the node IP:

cat <<EOF >/etc/local.d/k8s-firewall.start
iptables -A INPUT -p tcp -d 127.0.0.1,10.1.1.1 -m multiport --dports 10257,10259 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 10257,10259 -j DROP
EOF
chmod +x /etc/local.d/k8s-firewall.start
/etc/local.d/k8s-firewall.start

we don't really provide support for prometheus in the kubeadm issue tracker. what i'm seeing here can fail for other deployers too, not only for kubeadm.

you can ask in #kubeadm on the k8s slack, in the support channels, or on the prometheus issue tracker, because there are likely other users already doing what you are trying to do.

The issue is NOT specific to Prometheus. It is specific to kubeadm. With the way kubeadm sets up the scheduler and controller-manager, unless the metrics collector is deployed on every master node, any pull-based central metrics scraper

  • cannot read metrics from these components
  • cannot trust the certificates provided by these components

What is the point of providing a metrics endpoint if it cannot be accessed? I guess kubeadm doesn't do it on purpose, right?

I guess kubeadm doesn't do it on purpose, right?

yes, because it feels like an extension and not something that all users would need.

the discussion above had the following:

... today users can customize their kube-scheduler and KCM deployments via kubeadm to enable the usage of the custom signed serving certificates for these components if they want their metrics and health checks to be accessible over "true" TLS (i.e. pass the flags and mount the certificates using extraArgs, extraVolumes under ClusterConfiguration).

https://github.com/kubernetes/kubeadm/issues/2244#issuecomment-668646073

so technically you should be able to pass extra flags to the components and set them up.
