I thought we were finally at the point where the whole kubernetes nixos module could be called stable, but somehow the whole bootstrap process got messed up. Some things are supposed to be stateless, and introducing state only creates more problems. Systemd has tools for expressing dependencies, but `ConditionPathExists` was not designed for this; it is purely for enabling or disabling units based on paths.
I have to say I'm sad (panda), and I would probably replace my k8s tests with something else, because it's not only a technical issue, but also the fact that these changes can be merged into master so easily without even running any tests.
@srhb @johanot @calbrecht
Just run the tests and check how they fail.
All the custom magic scripts in kubernetes that do `until X; bad-check; done`, all the `ConditionPathExists` attempts, and all the unnecessary additional targets make me a sad panda (a sketch of the kind of pattern I mean is below).
Remove all the magic scripts and leave statelessness to do its magic.
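A hypothetical sketch of that pattern, purely for illustration (the service name is real, but the script and the certificate path are made up):

```nix
{
  # Hypothetical illustration of the kind of wait-loop being criticized;
  # not taken from the actual kubernetes module.
  systemd.services.kube-apiserver.preStart = ''
    # Poll until the PKI certificates show up on disk.
    until [ -f /var/lib/kubernetes/secrets/kube-apiserver.pem ]; do
      echo "waiting for certificates..."
      sleep 1
    done
  '';
}
```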
I am away on business but I'll try and remember to come back to this. As a first note, I think I do agree, especially since I've been trying to figure out how to fix the dependencies in the control plane startup and have thus far not found a way that makes sense and is simple given different distributions of components per node.
@offlinehacker, aha. From another point of view, what is the benefit of services failing repeatedly because the PKI certs are passed as parameters to the daemons but are not yet present on the filesystem, which can leave the whole cluster unable to start because the stateless "magic" does not work reliably?
Hello everyone, some comments from a passer-by who is highly interested in this work but is very new to kubernetes.
@offlinehacker thanks for your presentations about kubernetes on nixos and early work. It gave me the warm fuzzy feeling I needed to get started and ignore gravity, rancheros, and other options. Thanks ++
@calbrecht thanks for your recent work, it gave me the confidence that someone else was able to dive in and understand such a complex set of derivations and contribute improvements after painful rounds of feedback. I hope to be able to contribute some time in the future.
Now, to the point:
From my experience with other nixos services like postgresql, I expected the services to be available and usable after doing just `services.kubernetes.enable = true`, so that inexperienced users can start to use kubectl and figure out the rest later.
As a data point, below is how an outsider expected things to work after reading the docs; I thought all the PKI problems would go away when I set `easyCerts` to true.
What is the expected configuration for beginners, both for the stateless option that @offlinehacker suggests and for the current stateful option, @calbrecht? And what are the expected configuration steps for someone who comes with a Let's Encrypt wildcard certificate and a domain already pointing at the machine?
```nix
{ config, lib, pkgs, ... }:
let
  domain = "orgnization.gov.country";
in {
  imports = [ <nixpkgs/nixos/modules/installer/scan/not-detected.nix> ];

  boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "usb_storage" "usbhid" "sd_mod" "sr_mod" "sdhci_pci" ];
  boot.kernelModules = [ "kvm-intel" ];
  boot.extraModulePackages = [ ];

  fileSystems."/" = { device = "/dev/disk/by-label/nixos"; fsType = "btrfs"; };
  fileSystems."/boot" = { device = "/dev/disk/by-label/boot"; fsType = "vfat"; };
  swapDevices = [ ];

  nix.maxJobs = lib.mkDefault 8;
  powerManagement.cpuFreqGovernor = lib.mkDefault "powersave";
  system.stateVersion = "19.03";

  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;
  boot.loader.timeout = 0;

  networking.hostName = "master";
  networking.extraHosts = ''
    127.0.0.1 ${domain}
    127.0.0.1 traefik.${domain}
    127.0.0.1 dashboard.${domain}
    127.0.0.1 master.${domain}
  '';
  networking.networkmanager.enable = true;

  environment.systemPackages = with pkgs; [ wget vim htop kubectl docker-compose ];

  services.openssh.enable = true;

  services.kubernetes = {
    roles = [ "master" "node" ];
    masterAddress = "master.${domain}";
    apiserverAddress = "master.${domain}";
    easyCerts = true;
  };

  users.users.x = {
    isNormalUser = true;
    home = "/x";
    extraGroups = [ "docker" ];
  };
}
```
And this is what I got when running kubectl:

```
$ kubectl cluster-info

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```
@calbrecht I understand the pain. I tried to fix it once using the same tactics, but it brought in more complexity, and I decided it was good enough as it was. I will investigate a bit further, but my disappointment comes from the fact that it used to work quite reliably and now it does not work anymore (also, we have a typo here: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/cluster/kubernetes/default.nix#L278).
I'm also aware that the development side of nixos modules is not thoroughly documented, and we need to somehow enable @GrahamcOfBorg to run the nixos tests automatically on changes (@grahamc?).
Btw, sorry for my initial drama text.
@ingenieroariel Your kubectl doesn't work because you have to give it a working kubeconfig containing credentials for the cluster. See the last paragraph of the manual section: https://nixos.org/nixos/manual/index.html#sec-kubernetes.
I tend to agree with @offlinehacker that perhaps the current systemd dependency chain is a little too "magic"; especially considering the issues we've seen lately and the fact that the kubernetes components are, after all, designed to run individually, self-heal, and start/stop independently. But on the other hand, there sure was a lot of restarting and log noise going on before the work of @calbrecht, in no small part due to PKI bootstrapping.
In general, it is difficult to satisfy everyone. Regarding kubectl, the trade-off is that pre-configuring kubectl with admin credentials system-wide might be inconvenient or even a security risk.
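To make that trade-off concrete: the convenient option would look roughly like the sketch below, which hands the cluster-admin kubeconfig to every user on the machine as the kubectl default (the path is a placeholder for wherever the admin kubeconfig actually lives), and that is exactly what might be a security risk. The alternative is to give the kubeconfig only to the users who should administer the cluster.

```nix
{
  # Sketch of the "convenient but risky" option: every user on this machine
  # gets the cluster-admin kubeconfig as the default for kubectl.
  # The path is a placeholder; adjust it to your setup.
  environment.variables.KUBECONFIG = "/etc/kubernetes/cluster-admin.kubeconfig";
}
```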
The issue with `ConditionPathExists` is that it will silently skip the unit if the path does not exist:

> Before starting a unit, verify that the specified condition is true. If it is not true, the starting of the unit will be (mostly silently) skipped, however all ordering dependencies of it are still respected. A failing condition will not result in the unit being moved into the "failed" state. The condition is checked at the time the queued start job is to be executed.
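In NixOS module terms the condition ends up on the unit roughly like this (a minimal sketch with a placeholder path); when the file is missing at the time the start job runs, systemd skips the unit rather than failing and retrying it:

```nix
{
  # Minimal sketch: if the path does not exist when the start job runs,
  # systemd silently skips this unit instead of marking it failed, so
  # nothing retries it automatically when the file appears later.
  systemd.services.kube-apiserver.unitConfig.ConditionPathExists =
    "/var/lib/kubernetes/secrets/kube-apiserver.pem";
}
```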
Hmm, so `kubelet` actually uses `sd_notify`, so we could use this signal to wait for kubelet to start, but I guess this is all still too soon: https://github.com/kubernetes/kubernetes/blob/559b11410ee51998316e3e80d864886d88314607/cmd/kubelet/app/server.go#L724
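If that `sd_notify` call really corresponds to kubelet being ready, then in principle the unit could be declared with `Type=notify`, so that anything ordered after `kubelet.service` waits for the readiness signal. A rough sketch of the idea, not the current module code:

```nix
{
  # Sketch: rely on kubelet's sd_notify support so that units ordered after
  # kubelet.service only start once kubelet signals READY=1.
  systemd.services.kubelet.serviceConfig.Type = "notify";

  # Example of a dependent unit that would then wait for kubelet readiness.
  systemd.services.kube-addon-manager = {
    wants = [ "kubelet.service" ];
    after = [ "kubelet.service" ];
  };
}
```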
I will try to improve it and make a pull request; some while loops are useful for getting rid of all the noise.
@calbrecht I agree with that completely, I think I have an idea :) https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/cluster/kubernetes/addon-manager.nix#L92
Regarding the tests, @offlinehacker: AFAIK they run out of memory on ofborg? I don't know whether this has an effect on ofborg or not: https://github.com/NixOS/nixpkgs/blob/master/nixos/tests/kubernetes/base.nix#L37
> @calbrecht I agree with that completely, I think I have an idea :) https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/cluster/kubernetes/addon-manager.nix#L92
@offlinehacker I wrote that comment originally, and I think I know what you are thinking. :P
> The issue with `ConditionPathExists` is that it will silently skip the unit if the path does not exist:
>
> > Before starting a unit, verify that the specified condition is true. If it is not true, the starting of the unit will be (mostly silently) skipped, however all ordering dependencies of it are still respected. A failing condition will not result in the unit being moved into the "failed" state. The condition is checked at the time the queued start job is to be executed.
Therefore every service that has a `ConditionPathExists` also has a path unit that triggers on `PathExists` and `PathChanged`. That ensures the service is triggered once all the paths are present, and whenever they change (see the sketch below). I fail to see an issue here.
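A simplified sketch of that pairing, with placeholder paths rather than the exact module code:

```nix
{
  # A .path unit watching the certificate file; when the file exists or
  # changes, systemd starts the service of the same name.
  systemd.paths.kube-apiserver = {
    wantedBy = [ "multi-user.target" ];
    pathConfig = {
      PathExists = "/var/lib/kubernetes/secrets/kube-apiserver.pem";
      PathChanged = "/var/lib/kubernetes/secrets/kube-apiserver.pem";
    };
  };

  # The condition keeps the service from starting before the file exists;
  # the path unit above starts it once the file appears.
  systemd.services.kube-apiserver.unitConfig.ConditionPathExists =
    "/var/lib/kubernetes/secrets/kube-apiserver.pem";
}
```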
> [...] the fact that the kubernetes components are, after all, designed to run individually, self-heal, and start/stop independently.

It seems to me that this works reliably when the components are running in containers, but not when the cluster is built from systemd services.
No biggie, I wanted to make some changes to the k8s module anyway, and now I'm in full refactoring mode, as I have some free time. I think we can show the coreos and redhat folks how to do k8s properly. I will take this into account and think about the best way to wait for files to exist; I think there are other ways to delay execution that do not directly pollute `apiserver.nix` and the other files. I also want to integrate https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/ as an alternative to certmgr, and to replace the addon-manager with something more nixos-like. I will make a WIP pull request when I have something that's reviewable.
Personally I think something was merged a bit prematurely, but it was at a time when everybody else seemed too busy to chip in with comments, and that's really not the fault of the people who ended up merging the change that I guess was the target of the initial rant.
Since there are actually a few companies and people willing to throw resources at making this better, we do have a chance of making the module really cool, but only if we work together instead of starting over all the time.
So maybe it's time to stop and rethink things _together_. Statements like "i wanted to do some changes to k8s module anyway, now i'm full-in refactoring mode" honestly sound like wanting to just throw away something somebody else made with good intentions, while also ignoring the problems it attempted to solve. A bit like "here comes superman to save the world".
This way of communicating (and working, for that matter) is not really something that helps foster an environment where new people want to contribute. And apologizing later doesn't really do much to fix the unfortunate tone this discussion started out with.
@offlinehacker when doing #45670 I actually went down the rabbit hole of kubelet-tls-bootstrapping :) At that time (maybe it has improved since), it unfortunately turned out to be not very nix-friendly, with too many imperative steps. It might be possible to reduce the number of manual bootstrapping actions to ~one, though. But in any case, you'll need to somehow join/approve nodes to the cluster manually. Also, kubelet and kube-proxy are not the only components that need certificates. Eventually I went with a certmgr-only solution, mainly because it works for both master and node components.
Don't get me wrong. I'd love to get rid of certmgr and cfssl in favor of a more kubernetes-native solution, but it would make me sad (panda) if we were to sacrifice declarativeness or security in doing so.
And I agree with @adamtulinius about the working-together part. :-) Anytime you think you found the solution to all the problems in the world, someone else has very likely already tried and failed. So let's learn from each other's mistakes and successes - and triumph.
@adamtulinius I'm not trying to throw anything away; I just want to improve the kubernetes module with the best intentions and take the existing work into account. I admit my initial message was a bit harsh. I'm not trying to be a superman, I just want to make improvements, as I have a bit of time to do that and I need it for my other projects (kubenix). I haven't been very involved in the nix community lately, and I'm not sure how many resources we have; a year or two ago the state was basically "do it yourself". What would be your proposal, so that maybe we can split the work and improve this together?
@johanot I didn't know you had already tried it, and yeah, that makes sense.
I wish putting thoughts into words were as easy as putting them into code. Sorry for any misunderstandings; I swear I just want to help, my communication skills just aren't the best.
After a quick chat in #nixos, @srhb rightly noted, and I agreed, that there seems to be a larger issue to resolve here. What follows is what _I_ think this issue is.
The current k8s infrastructure in NixOS tries to be two things at once - a low level configuration system for configuring and starting k8s services, and a magic single-shot-give-me-kubernetes solution. This results in bugs like the ones around control-plane-online.service, where default configurations seemingly work fine but some custom configurations (including ones where the user just wants to start particular k8s services) fail. IMO, those bugs stem from a very organically-grown and abstraction-leaking model of the current codebase.
I think a good course of action would be to explicitly split the current codebase into two layers - one that _just_ lets you start k8s services with given kubeconfig options and flags, and then one that uses this functionality to bring an automagic declarative k8s cluster to life. All startup sequencing and bootstrapping could thus be very opinionated and made to fit the approach of the latter layer, while the first layer could be used by advanced users who want to bootstrap their clusters differently.
Of course, this is not a small change. There are loads of technical details to iron out and a lot of implementation work, but I do think it's the right time to at least consider this split now.
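As a strawman only, to make the proposed split concrete (the option names and values below are hypothetical, not existing module options):

```nix
{
  # Hypothetical low-level layer: run individual components with explicitly
  # supplied flags and kubeconfigs, with no automatic bootstrapping.
  services.kubernetes.apiserver = {
    enable = true;
    extraOpts = "--v=2";                             # placeholder flag
  };
  services.kubernetes.kubelet = {
    enable = true;
    kubeconfig = "/path/to/kubelet.kubeconfig";      # placeholder path
  };

  # Hypothetical high-level layer: the opinionated "give me a cluster"
  # experience, which would itself configure the layer above.
  services.kubernetes.cluster = {
    enable = true;
    roles = [ "master" "node" ];
  };
}
```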
@q3k completely agreed, there need to be clear boundaries. I'm still working on this and will create a pull request once it's ready for review.
I was wondering whether we could perhaps piggyback on https://github.com/kubernetes/kubeadm to do most of the state reconciliation for us (it's heavily tested and maintained by the k8s community) whilst we focus on our core expertise: reproducible configuration and package management. Maybe that way we can have the best of both worlds and make our code a bit less fragile and less of a maintenance burden?
I've been slowly messing around with these ideas here: https://github.com/arianvp/nixos-stuff/blob/master/modules/k8s/default.nix
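One possible shape for that, purely as a sketch and not a tested configuration (it assumes the kubeadm binary shipped in the kubernetes package, uses a placeholder config path, and treats the presence of `/etc/kubernetes/admin.conf` as a crude "already initialized" marker):

```nix
{ pkgs, ... }:
{
  # Sketch only: let kubeadm do the control-plane bootstrapping and state
  # reconciliation once, driven by a declaratively managed config file.
  systemd.services.kubeadm-init = {
    wantedBy = [ "multi-user.target" ];
    after = [ "network-online.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      # kubeadm writes /etc/kubernetes/admin.conf on success, so skip the
      # init step if it is already there.
      if [ ! -e /etc/kubernetes/admin.conf ]; then
        ${pkgs.kubernetes}/bin/kubeadm init --config /etc/kubeadm.yaml
      fi
    '';
  };
}
```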
+1 for going with kubeadm. That way we could add additional layers on top, like selecting the networking solution. I would also replace docker with CRI-O, which seems more lightweight. I don't think we have to extend the scope of kubeadm here, but we should initially bring it into nixpkgs. WDYT?
Kubeadm is already in Nixpkgs in the Kubernetes package. Just no module support for it yet.
I suggest we close this issue, since the original bug report here refers to the systemd dependencies which have now been reverted by #67563. I hope that some of you would like to continue the general discussion about the kubernetes module on discourse: https://discourse.nixos.org/t/kubernetes-the-nixos-module-of-the-future/3922
With #67563 merged, let's close this.