What happened:
We have an autoscaling group of K8S masters that use kubeadm init to initialize when they come up. The kube-apiserver.yaml manifest intermittently (but quite often) comes up with all the correct volumes defined but some mounts missing, typically the OS certs and kubernetes/pki directories. This obviously prevents the apiserver from starting. Hand-editing the mounts back in fixes the issue (an illustrative excerpt follows this report).
What you expected to happen:
All volumes defined and mounted.
Environment:
K8S 1.11.3 on AWS, running Ubuntu 18.04 LTS, kernel 4.15.3.
/kind bug
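For illustration, a buggy manifest looks roughly like the abbreviated sketch below. The ca-certs and k8s-certs names are what kubeadm typically uses for the OS certs and kubernetes/pki volumes; the audit entry and its path stand in for our custom volume and are not the exact values from our nodes.

# Abbreviated sketch of a buggy kube-apiserver.yaml, not an exact attachment.
# All volumes are still defined, but the mounts for ca-certs and k8s-certs are gone.
    volumeMounts:
    - mountPath: /var/log/kubernetes/audit   # illustrative custom audit mount
      name: audit
    # ...no volumeMounts for ca-certs or k8s-certs here...
  volumes:
  - hostPath:
      path: /etc/ssl/certs
    name: ca-certs
  - hostPath:
      path: /etc/kubernetes/pki
    name: k8s-certs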
How can we reproduce this?
Unfortunately, in my setup, merely setting up a master config and running kubeadm exhibits the bug something like 70% of the time. Here is my config:
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.11.3
tokenTTL: "0s"
api:
  bindPort: 443
apiServerCertSANs:
We then init with:
kubeadm init --skip-preflight-checks --config /tmp/master.yaml
My only theory is that the one custom volume we have for auditing somehow trips up kubeadm; a sketch of how such a volume is declared follows.
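For reference, an extra apiserver volume like that is declared with apiServerExtraVolumes in the v1alpha1 config. The snippet below is only a sketch: the name and host path are illustrative, not our exact audit declaration.

apiServerExtraVolumes:
# Sketch only: illustrative name and path, not our exact audit volume.
- name: audit
  hostPath: /var/log/kubernetes/audit
  mountPath: /var/log/kubernetes/audit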
I'll try to reproduce it
I've been trying to debug it as well. I've built a custom kubeadm binary with extra logging and hope to use it next time I see a new node join in this way.
What kubeadm version are you using?
1.11.3 as well.
kubeadm having a non-deterministic write to a manifest file is puzzling to say the least.
we don't have fancy logic that writes files concurrently or anything similar, so i'm really not sure what the cause here might be.
please provide the two outcomes as file attachments:
- valid api-server manifest.
- invalid api-server manifest with missing volumes.
also please try reproducing with latest stable - 1.12.x.
Sure. My theory is that somehow the volumeMount slice gets overwritten or recreated during the gathering phase. But I can't find an obvious spot in the code where that would happen. Let me get those attachments...
Correct:
kube-apiserver-correct.yaml.txt
Incorrect:
kube-apiserver-buggy.yaml.txt
Okay, I think I have the problem. We post-process the yaml file to get around the bug where you could only have readOnly mounts in kubeadm (https://github.com/kubernetes/kubeadm/issues/628): we would sed out the readOnly statement using a multi-line regex. Due to the non-deterministic order of maps, if that mount occurred anywhere but last and another readOnly mount followed, the regex swallowed up the mounts in between (a reconstructed sketch is below). For some reason this didn't happen on 1.9, so possibly the order used to be deterministic, with extra volume mounts always occurring last and thus not exhibiting the "bug".
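That post-processing step was essentially a greedy multi-line substitution of the following shape. This is a reconstruction for illustration only, with an assumed audit volume name; it is not the exact command we run, but it shows the failure mode.

# Reconstruction, not the exact command: strip "readOnly: true" from the
# audit mount. With -z the whole manifest is a single pattern space, so the
# greedy ".*" can run from the audit mount all the way to a *later*
# "readOnly: true", deleting every volumeMount entry in between whenever
# the audit mount is not the last one in the list.
sed -i -z 's|\(name: audit\n\).*readOnly: true\n|\1|' \
    /etc/kubernetes/manifests/kube-apiserver.yaml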
We should be able to remove that step in 1.11 with the addition of the writable flag. Let me verify this and then we can close this.
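Concretely, with the v1alpha2 config in 1.11 the extra volume can be declared writable up front, so the sed step should be able to go away entirely. Again only a sketch, with an illustrative name and path:

# Sketch (v1alpha2, kubeadm 1.11+); name and path are illustrative.
apiServerExtraVolumes:
- name: audit
  hostPath: /var/log/kubernetes/audit
  mountPath: /var/log/kubernetes/audit
  writable: true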
we are also sorting volumes in 1.13:
https://github.com/kubernetes/kubernetes/pull/70027
Yep, that was it. Sorry for the confusion.
no problem, glad it worked!