kubeadm should include FQDN when creating certificates

Created on 28 Sep 2017 · 19 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

kubeadm version: &version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:30:51Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version: 1.7.4
  • Cloud provider or hardware configuration: on-prem with KVM instances
  • OS: Debian 8
  • Kernel: Linux k8-master-0401 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26) x86_64 GNU/Linux
  • Others:

What happened?

Attempting to use the kubeadm-generated ca.crt outside the cluster, on a host that has kubectl installed.

$ kubectl get pod
Unable to connect to the server: x509: certificate is valid for k8-master-0401, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not k8-master-0401.italy.smartsheet.com

What you expected to happen?

I would have expected the FQDN of the API server to be added to the subject alternative names, so that the certificate would be seen as valid by hosts outside the cluster. Also, it would be good to add a switch to kubeadm that allows additional names to be added, to support multiple masters.

How to reproduce it (as minimally and precisely as possible)?

Install a new cluster and then use the CA cert from a node that is not part of the cluster to execute calls to the API server.

Anything else we need to know?

kind/support lifecycle/rotten


All 19 comments

If you set nodeName to the FQDN it works; nodeName defaults to the hostname.

@hickey What happens if you specify apiServerCertSANs? See config file. That will allow you to append the FQDN as a SAN.
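A config file of that era (the v1alpha1 MasterConfiguration API; field names changed in later kubeadm API versions, e.g. apiServer.certSANs in v1beta*) might look roughly like this. The FQDN is the one from the report above; the file name is arbitrary:

```shell
# Sketch of a kubeadm config carrying an extra SAN (v1alpha1-era field name).
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
apiServerCertSANs:
- k8-master-0401.italy.smartsheet.com
EOF
# Then initialize the master with it:
#   kubeadm init --config kubeadm-config.yaml
```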

@jbeda @mattmoyer @roberthbailey @mikedanese @kubernetes/sig-auth-feature-requests Do you think including the FQDN in the apiserver cert by default is a good idea? I'm a bit hesitant...

Where would we get the fqdn?

Many times hosts can be reached via multiple DNS names. The apiServerCertSANs option should work for now at least.

(Also, a playbook on how to add a name to the cert and swap it in might be good...)

Where would we get the fqdn?

That wasn't clear to me either. I vote for closing this as out of scope.
Should we document this behavior (non-goal) though?

I'd like to hear from @hickey but it sounds like maybe we just need better documentation for apiServerCertSANs. If you don't already know what a SAN is from context it might be hard to discover.

@hickey does --apiserver-cert-extra-sans (documented here) work for you?
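For anyone wanting to check what actually ended up in a serving cert, a rough sketch: generate a throwaway self-signed cert carrying an extra DNS SAN (standing in for what --apiserver-cert-extra-sans would add), then inspect it with openssl the same way you would inspect /etc/kubernetes/pki/apiserver.crt. Assumes OpenSSL 1.1.1+ for -addext:

```shell
# Create a demo cert with the extra SAN from the report above.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=DNS:kubernetes,DNS:k8-master-0401.italy.smartsheet.com"

# Print the SANs section, as you would for /etc/kubernetes/pki/apiserver.crt.
openssl x509 -in /tmp/demo.crt -noout -text | grep -A1 "Subject Alternative Name"
```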

Good morning,

I think part of the issue is documentation. When I first started using Kubeadm I don't really remember finding much documentation. It might also be that I was shotgunning the use of Kubeadm without fully researching what documentation there was (my bad).

I just found a link to the Kubeadm reference pages a bit ago and reviewed them. Yes, there are a couple of switches I wish I had used in setting up the current cluster I am working with. It will probably be a while before I can take down the cluster, rebuild it, and test the --apiserver-cert-extra-sans switch. If it adds the hostnames I supply to the certificate's SANs, then I would expect it to work well.

I am not sure I understand why the FQDN would not be added to the cert by default. That allows the cert to be recognized and used outside the cluster. It would still take adding the CA cert to a cert chain to be fully trusted. Even with a cert chain, a hostname that is only resolvable by hosts within the cluster is (I think) completely useless from a cert perspective.

I think the key question then comes down to how to properly obtain the FQDN--more specifically, the correct (resolvable) FQDN. The quick and dirty approach is to use /bin/hostname -f, but as can easily be demonstrated, it is quite easy to get an incorrect hostname, or one that does not resolve properly in DNS. A better approach would be to do a full forward and reverse resolution using the net library. Something like https://www.google.com/search?q=golang+get+fqdn&ie=utf-8&oe=utf-8 I would expect to be fairly successful.
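The forward/reverse-lookup heuristic described above might be sketched roughly like this (assuming getent and awk are available; it falls back to the short hostname when either lookup fails):

```shell
# 1. Start from the short hostname.
short=$(hostname)

# 2. Forward-resolve it to an address (getent consults /etc/hosts and DNS,
#    per nsswitch.conf).
addr=$(getent hosts "$short" | awk '{print $1; exit}')

fqdn=""
if [ -n "$addr" ]; then
  # 3. Reverse-resolve the address; the canonical name is the FQDN candidate.
  fqdn=$(getent hosts "$addr" | awk '{print $2; exit}')
fi

# 4. Fall back to the short name if either lookup failed.
echo "${fqdn:-$short}"
```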

There will always be times when the FQDN is not discoverable and the cluster admin needs to set the FQDN on the cert manually (although I would still argue that this is largely pointless without DNS being properly set up). Maybe, if FQDN discovery and DNS resolution are unsuccessful, kubeadm should emit warning messages and suggest validating DNS before proceeding.

Sorry I am being long winded here, but I hope that this spawns a couple of ideas to make Kubeadm a more robust tool. I have found it a good tool already and thank @jbeda for suggesting that I give it a try. I would also be interested in hearing more from @luxas on his reservations of adding the FQDN to the cert--just wondering if I am missing something or not understanding a use case.

@hickey Thanks for taking the time to respond, we appreciate the feedback! So in terms of docs, we're making that a priority in v1.9. I agree that the amount of flags makes it hard to grok what does what. We're working on making our ref docs more accessible (see https://github.com/kubernetes/kubeadm/issues/265), so hopefully that should mitigate confusion in the future.

In terms of using the FQDN as a SAN by default, as @jbeda mentioned (and you did too in your comment), identifying a valid hostname is tricky. For example, Rackspace VM hostnames are set to the name of the instance as created by the user, so my-server would obviously be unroutable. Perhaps the question is to what degree kubeadm should try to find the right FQDN. One option might be to look in a specified place (e.g. the hostname) and, if it's not resolvable with DNS, give up and stick with the current behaviour. We could even feature-gate it so folks who do not expect this behaviour are not negatively impacted.

I can add it to our community meeting agenda for tomorrow. Please feel free to attend if you'd like to be part of that discussion (meeting info is available here). If not, I can post back here what folks think about the best step forward.

You could always script generate a config.

@jamiehannaford I would have loved to attend the meeting this morning, but mornings are often hard to get going. If I had had it on my schedule, I could have streamed it on the motorcycle on the way to work--I bet that would have been a first! :-)

Anyway, my first 10 second thought about getting the FQDN was to just read /etc/hostname, but that is frequently wrong--or more specifically the domain is frequently wrong. The more I think about the issue the only real way to detect the FQDN is to fully resolve a hostname in DNS. If the hostname fails validation in DNS, then the only alternative is to create the certs like they are today. To correct that situation, the admin would need to manually set specific values on the command line or through a config file.

I can envision one or two cases where DNS validation would fail but is completely valid. In these cases the admin would need to configure kubeadm with appropriate values until the rest of the infrastructure has been stood up.

Also, the above link for a thought on doing the DNS validation was wrong. Not sure why my browser copied the wrong link. The correct link should have been https://github.com/Showmax/go-fqdn/blob/master/fqdn.go.

I just think it's cleaner to let the users who need FQDNs in the apiserver serving cert set this themselves.
Creating "hacks" in kubeadm to determine the FQDN would have a pretty high error rate ("oh, I didn't expect/want that value to go into the cert")--too high, in my opinion.

TL;DR: Computing the FQDN is hard. We would have to perform non-obvious steps to get it, and still not all users may want it. We might end up in a situation where a user doesn't want it, but we enforce it and they can't opt out. I think that is worse than requiring those with this need to pass the value via the flag we already have for it.

Thanks for your detailed responses though @hickey :clap:!

I can agree that it is less than desirable to set an incorrect FQDN, or to include the FQDN in the SANs when an admin does not want it included. I am having a hard time coming up with a use case in which an admin would not want the FQDN included in the SANs, but that is probably a lack of creativity on my part. There have been one or two statements questioning whether it would be good to include the FQDN, but nothing has come back when I have asked for clarification or better understanding. So is it more a fear of breaking someone/something, rather than a concrete situation in which something would break?

I feel like I am harping on this issue, and that is partly because I see Kubeadm as a tool that tries to select sane defaults (or defaults that almost everyone can agree on), and it tends to be a one-shot deal. Once the cluster is created and workloads are added, it becomes costly to go back and reconstruct things. I have now had to do that a couple of times to get things worked out (partly because there is more to Kubeadm than one first sees, and partly because, being new to Kubernetes, I found out a week or two later that something should have been set up differently). Now I know what switches to use to get a valid configuration, so there is not much pain for me any more. I am just looking to help future users of Kubeadm avoid the pain, or at least lessen it to where it is just minor changes to a cluster.

As I wrote the above, I started thinking that maybe another approach to this issue (and probably many others) is to present the admin with a sort of checklist before the cluster is initialized. The checklist would attempt to itemize how the cluster is going to be constructed and identify any inconsistency or abnormal setting. Certainly a larger discussion, and probably one that should be tracked in a separate issue if others see merit in the idea.

I think the best we can do here is document that if you want to make the API server accessible via [your FQDN here], you should use the --apiserver-cert-extra-sans flag.

Does that sound reasonable? Again, I'm not in favor of adding "hacks" that might or might not work for a user and carry a great risk of creating a "false positive" (a SAN added where it shouldn't have been). WDYT?

Interestingly having a somewhat related issue it seems.

  • With --cloud-provider=aws (with custom provisioning, not kubeadm),
  • Hostnames of the instances (Container Linux) are set to ip-172-16-46-149 (not fully qualified),
  • So the CSRs for TLS bootstrapping are requested with subject= /O=system:nodes/CN=system:node:ip-172-16-46-149,
  • The controller-manager expects the hostname to be fully qualified (ip-172-16-46-149.us-west-1.compute.internal), so it cannot find the instance when querying the cloud provider and decides to delete the node.

If I manually set the hostname to the static FQDN, TLS bootstrapping still occurs properly and the node stays in the list (not removed by the controller-manager).

Node ip-172-16-46-149 event: Deleting Node ip-172-16-46-149 because it's not present according to cloud provider

@Quentin-M Using the AWS cloud provider is an edge case, as it modifies the Node API object as you've noticed. That means you need to run kubeadm init --node-name [AWS FQDN], where you get the FQDN AWS wants from the internal metadata servers.
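As a rough sketch of that workaround (assuming the standard EC2 instance metadata endpoint; the kubeadm command is printed rather than executed, since kubeadm init is destructive, and it falls back to `hostname -f` off-EC2):

```shell
# Fetch the node name the AWS cloud provider expects from the EC2 instance
# metadata service; fall back to the local FQDN if the endpoint is unreachable.
NODE_NAME=$(curl -sf --max-time 2 \
  http://169.254.169.254/latest/meta-data/local-hostname 2>/dev/null || hostname -f)

# Print the command you would run on the master:
echo "kubeadm init --node-name ${NODE_NAME}"
```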

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
