RKE version:
v1.1.4
Docker version: (docker version,docker info preferred)
[core@squirtle ~]$ docker info
Client:
Debug Mode: false
Server:
Containers: 10
Running: 7
Paused: 0
Stopped: 3
Images: 4
Server Version: 19.03.11
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: /usr/libexec/docker/docker-init
containerd version:
runc version: fbdbaf85ecbc0e077f336c03062710435607dbf1
init version:
Security Options:
seccomp
Profile: default
selinux
Kernel Version: 5.7.8-200.fc32.x86_64
Operating System: Fedora CoreOS 32.20200715.3.0
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.63GiB
Name: squirtle
ID: T6WG:NF2V:6Q4R:YTML:JQKA:ZQE6:JNBJ:MQG3:VVO3:FAFM:JXRB:2ZCE
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
[core@squirtle ~]$ uname -r
5.7.8-200.fc32.x86_64
[core@squirtle ~]$ cat /etc/os-release
NAME=Fedora
VERSION="32.20200715.3.0 (CoreOS)"
ID=fedora
VERSION_ID=32
VERSION_CODENAME=""
PLATFORM_ID="platform:f32"
PRETTY_NAME="Fedora CoreOS 32.20200715.3.0"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:32"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=32
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=32
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='32.20200715.3.0'
cluster.yml file:
cluster_name: pokedex
ssh_key_path: ~/.ssh/id_desktop
nodes:
  - address: bulbasaur.lan
    user: core
    role:
      - etcd
      - controlplane
      - worker
  - address: charmander.lan
    user: core
    role:
      - etcd
      - controlplane
      - worker
  - address: squirtle.lan
    user: core
    role:
      - controlplane
      - etcd
      - worker
authorization:
  mode: none
ingress:
  provider: nginx
prefix_path: /opt/rke
# attempt to mitigate loading lib/modules with z flag:
services:
  kubeproxy:
    extra_binds:
      - "/lib/modules:/lib/modules:ro"
Steps to Reproduce:
rke up
Results:
Eventually the following error will print:
FATA[0613] [workerPlane] Failed to bring up Worker Plane: [Failed to start [kube-proxy] container on host [squirtle.lan]: Error response from daemon: error setting label on mount source '/usr/lib/modules': relabeling content in /usr is not allowed]
This happens because this line does not detect Fedora CoreOS, so RKE tries to mount /lib/modules (a symlink to /usr/lib/modules) with the z flag. That relabeling cannot succeed because /usr is read-only.
Solution:
Add logic to detect FCOS, or create an override that forces /lib/modules to be mounted read-only, similar to other services like this.
I have created a test fix by patching plan.go to always mount /lib/modules read-only. I have never contributed to or worked with Go before, so I'll need some guidance on how to implement this properly, but from what I can tell, the bug is fixed when I mount it read-only.
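To make the proposal concrete, here is a minimal sketch of the idea, not RKE's actual code. The helper name `modulesBind`, the substring match on "Fedora CoreOS", and the `z,ro` default are assumptions for illustration only; the real detection in plan.go may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// modulesBind returns the bind-mount string for /lib/modules.
// Hypothetical helper: the name and the OS check are illustrative only.
func modulesBind(operatingSystem string) string {
	// On Fedora CoreOS, /lib/modules resolves into /usr, which Docker
	// refuses to relabel, so leave out the SELinux "z" option there.
	if strings.Contains(strings.ToLower(operatingSystem), "fedora coreos") {
		return "/lib/modules:/lib/modules:ro"
	}
	// Assumed default for other hosts: relabel and mount read-only.
	return "/lib/modules:/lib/modules:z,ro"
}

func main() {
	// The first string is the "Operating System" field from the docker info output above.
	fmt.Println(modulesBind("Fedora CoreOS 32.20200715.3.0"))
	fmt.Println(modulesBind("Ubuntu 20.04 LTS"))
}
```

Matching on the OS name is the simplest option, but it only covers distributions someone remembered to list. An explicit override in cluster.yml would cover the rest.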
Confirming this on FCOS with RKE v1.1.4:
ERRO[0050] Failed to upgrade worker components on NotReady hosts, error: [Failed to start [kube-proxy] container on host [444.444.444.444]: Error response from daemon: error setting label on mount source '/usr/lib/modules': relabeling content in /usr is not allowed]
rpm-ostree output from fcos:
* ostree://fedora:fedora/x86_64/coreos/stable
Version: 32.20200715.3.0 (2020-07-27T11:36:29Z)
Commit: a3b08ee51b1d950afd9d0d73f32d5424ad52c7703a6b5830e0dc11c3a682d869
GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
@superseb I see this fix was merged in. Do you know when it will be available/released? We just hit this issue today as well.
Depending on the Rancher 2.5 release date (unknown to the community), will this be ported to an RKE 1.1.x release and made available to Rancher 2.4 (custom cluster install)?
Available to test in https://github.com/rancher/rke/releases/tag/v1.2.0-rc9
The bug is reproduced with RKE v1.1.4
Steps:
Provision a fedora-coreos-31 instance (AMI ami-001b07efbfa9bc41f).
Prepare cluster.yml file:
nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
  kubeproxy:
    extra_binds:
      - "/lib/modules:/lib/modules:ro"
Result:
Cluster creation fails with:
FATA[0107] [workerPlane] Failed to bring up Worker Plane: [Failed to create [kube-proxy] container on host [<ip>]: Failed to create Docker container [kube-proxy] on host [<ip>]: Error response from daemon: Duplicate mount point: /lib/modules]
The bug fix is verified with RKE v1.2.0-rc13
Steps:
Provision a fedora-coreos-31 instance (AMI ami-001b07efbfa9bc41f).
Prepare cluster.yml file:
nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
  kubeproxy:
    extra_binds:
      - "/lib/modules:/lib/modules:ro"
Result:
Cluster creation succeeds:
INFO[0113] Starting container [kube-proxy] on host [<ip>], try #1
INFO[0113] [worker] Successfully started [kube-proxy] container on host [<ip>]
INFO[0113] [healthcheck] Start Healthcheck on service [kube-proxy] on host [<ip>]
INFO[0114] [healthcheck] service [kube-proxy] on host [<ip>] is healthy
INFO[0114] Image [rancher/rke-tools:v0.1.64] exists on host [<ip>]
INFO[0115] Starting container [rke-log-linker] on host [<ip>], try #1
...
INFO[0141] [addons] Executing deploy job rke-ingress-controller
INFO[0147] [ingress] ingress controller nginx deployed successfully
INFO[0147] [addons] Setting up user addons
INFO[0147] [addons] no user addons defined
INFO[0147] Finished building Kubernetes cluster successfully
Reopening to test without extra_binds for kubeproxy.
The bug is reproduced with RKE v1.1.4
Steps:
Provision a fedora-coreos-31 instance (AMI ami-001b07efbfa9bc41f).
Prepare cluster.yml file:
nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
Result:
Cluster creation fails with:
FATA[0112] [workerPlane] Failed to bring up Worker Plane: [Failed to start [kube-proxy] container on host [52.15.34.171]: Error response from daemon: error setting label on mount source '/usr/lib/modules': relabeling content in /usr is not allowed]
The bug fix is verified with RKE v1.2.0-rc13
Steps:
Provision a fedora-coreos-31 instance (AMI ami-001b07efbfa9bc41f).
Prepare cluster.yml file:
nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
Result:
Cluster creation succeeds:
INFO[0114] Starting container [kube-proxy] on host [<ip>], try #1
INFO[0114] [worker] Successfully started [kube-proxy] container on host [<ip>]
INFO[0114] [healthcheck] Start Healthcheck on service [kube-proxy] on host [<ip>]
INFO[0115] [healthcheck] service [kube-proxy] on host [<ip>] is healthy
...
INFO[0136] [addons] Executing deploy job rke-ingress-controller
INFO[0142] [ingress] ingress controller nginx deployed successfully
INFO[0142] [addons] Setting up user addons
INFO[0142] [addons] no user addons defined
INFO[0142] Finished building Kubernetes cluster successfully
Hi,
I get the following error with Flatcar Linux:
[controlPlane] Failed to upgrade Control Plane: [[Failed to start [kube-proxy] container on host [xxx]: Error response from daemon: error setting label on mount source '/usr/lib64/modules': relabeling content in /usr is not allowed]]
Your fix was about /usr/lib, but I have /usr/lib64. Is a different fix needed?
Did you find a fix?
@mikekuzak: Yes, I disabled SELinux. That's more of a workaround than a fix.
/usr/lib64 is a symlink to /usr/lib, and I could not find anything in the RKE code...
@dirien This did not happen before. Is it the new stable Flatcar version that broke it?
I'm running Rancher 2.3.9 on Flatcar stable (2605.6.0).
@aaronRancher Will this fix be available for the Rancher 2.3.x and 2.4.x branches?
@mikekuzak this sounds related to https://github.com/rancher/rke/pull/2214 more than the version of Flatcar.
@vbatts I think so too; with SELinux disabled it works fine.
@mikekuzak this sounds related to #2214 more than the version of Flatcar.
@mikekuzak, @dirien, @vbatts,
I commented on #2214. After testing different Rancher versions and downgrading Flatcar, the last working Flatcar release is 2512.5.0.
@mikekuzak Will be backporting to 2.4, but not 2.3.