RKE: /lib/modules not mounting read-only on Fedora CoreOS

Created on 10 Aug 2020 · 17 Comments · Source: rancher/rke

RKE version:
v1.1.4
Docker version: (docker version,docker info preferred)

[core@squirtle ~]$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 10
  Running: 7
  Paused: 0
  Stopped: 3
 Images: 4
 Server Version: 19.03.11
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: journald
 Cgroup Driver: systemd
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: /usr/libexec/docker/docker-init
 containerd version: 
 runc version: fbdbaf85ecbc0e077f336c03062710435607dbf1
 init version: 
 Security Options:
  seccomp
   Profile: default
  selinux
 Kernel Version: 5.7.8-200.fc32.x86_64
 Operating System: Fedora CoreOS 32.20200715.3.0
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.63GiB
 Name: squirtle
 ID: T6WG:NF2V:6Q4R:YTML:JQKA:ZQE6:JNBJ:MQG3:VVO3:FAFM:JXRB:2ZCE
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

[core@squirtle ~]$ uname -r
5.7.8-200.fc32.x86_64
[core@squirtle ~]$ cat /etc/os-release
NAME=Fedora
VERSION="32.20200715.3.0 (CoreOS)"
ID=fedora
VERSION_ID=32
VERSION_CODENAME=""
PLATFORM_ID="platform:f32"
PRETTY_NAME="Fedora CoreOS 32.20200715.3.0"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:32"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=32
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=32
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='32.20200715.3.0'



cluster.yml file:

cluster_name: pokedex
ssh_key_path: ~/.ssh/id_desktop
nodes:
    - address: bulbasaur.lan
      user: core
      role:
        - etcd
        - controlplane
        - worker
    - address: charmander.lan
      user: core
      role:
        - etcd
        - controlplane
        - worker
    - address: squirtle.lan
      user: core
      role:
        - controlplane
        - etcd
        - worker

authorization:
    mode: none
ingress:
    provider: nginx
prefix_path: /opt/rke
# attempt to mitigate loading lib/modules with z flag:
services:
  kubeproxy:
    extra_binds:
      - "/lib/modules:/lib/modules:ro"

Steps to Reproduce:
rke up
Results:
Eventually the following error will print:
FATA[0613] [workerPlane] Failed to bring up Worker Plane: [Failed to start [kube-proxy] container on host [squirtle.lan]: Error response from daemon: error setting label on mount source '/usr/lib/modules': relabeling content in /usr is not allowed]
This happens because the relevant line in RKE does not detect Fedora CoreOS, so RKE tries to mount /lib/modules (a symlink to /usr/lib/modules) with the z flag. That fails because /usr is read-only.
Solution:
Add logic to detect FCOS, or create an override that forces /lib/modules to be mounted read-only, similar to how other services are handled.
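
The detection proposed above can be sketched in Python. RKE itself is written in Go, so this is an illustration only and not the project's code. It keys on the `ID=fedora` and `VARIANT_ID=coreos` fields from the os-release output shown earlier. The helper names and the `z,ro` default option string are assumptions made for this example.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, removing shell quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # handles "double" and 'single' quoted values
        fields[key] = parts[0] if parts else ""
    return fields


def is_fedora_coreos(fields):
    """True when the host identifies itself as Fedora CoreOS."""
    return fields.get("ID") == "fedora" and fields.get("VARIANT_ID") == "coreos"


def lib_modules_bind(fields):
    """Choose the kube-proxy bind string for /lib/modules.

    Assumption: the default bind carries the SELinux relabel flag ("z,ro"),
    and Fedora CoreOS needs a plain read-only bind instead.
    """
    mode = "ro" if is_fedora_coreos(fields) else "z,ro"
    return "/lib/modules:/lib/modules:" + mode


if __name__ == "__main__":
    with open("/etc/os-release") as handle:
        print(lib_modules_bind(parse_os_release(handle.read())))
```

With the os-release values from this report, the sketch selects the plain `ro` bind, which avoids the relabel step that Docker rejects.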

All 17 comments

I have created a test fix by patching plan.go to always mount /lib/modules read-only. I've never contributed to or worked with Go before, so I'll need some guidance on how to implement this properly, but from what I can tell, the bug is fixed when I mount it read-only.

Confirming this on FCOS with RKE v1.1.4:

ERRO[0050] Failed to upgrade worker components on NotReady hosts, error: [Failed to start [kube-proxy] container on host [444.444.444.444]: Error response from daemon: error setting label on mount source '/usr/lib/modules': relabeling content in /usr is not allowed]

rpm-ostree output from fcos:

* ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200715.3.0 (2020-07-27T11:36:29Z)
                    Commit: a3b08ee51b1d950afd9d0d73f32d5424ad52c7703a6b5830e0dc11c3a682d869
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0

@superseb I see this fix was merged in. Do you know when it will be available/released? We just hit this issue today as well.

Depending on the Rancher 2.5 release date (unknown to the community), will this be ported to an RKE 1.1.x release and made available to Rancher 2.4 (custom cluster install)?

The bug is reproduced with RKE v1.1.4

Steps:
Provision a fedora-coreos-31 instance (ami-001b07efbfa9bc41f).
Prepare cluster.yml file:

nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
  kubeproxy:
    extra_binds:
      - "/lib/modules:/lib/modules:ro"

Result:
Cluster creation fails with:

FATA[0107] [workerPlane] Failed to bring up Worker Plane: [Failed to create [kube-proxy] container on host [<ip>]: Failed to create Docker container [kube-proxy] on host [<ip>]: Error response from daemon: Duplicate mount point: /lib/modules]
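
The `Duplicate mount point` error suggests that in v1.1.4 the `extra_binds` entry is added next to a built-in bind for the same container path instead of replacing it. As a minimal Python sketch of that check (the function names and the assumed default bind are hypothetical, not RKE code), binds of the form `host:container[:options]` can be grouped by container path:

```python
def container_path(bind):
    """Return the container-side path of a 'host:container[:options]' bind."""
    parts = bind.split(":")
    # A bare path with no colon has only a container-side path.
    return parts[1] if len(parts) > 1 else parts[0]


def duplicate_mount_points(binds):
    """List container paths used by more than one bind, in first-seen order."""
    seen = set()
    duplicates = []
    for bind in binds:
        dest = container_path(bind)
        if dest in seen and dest not in duplicates:
            duplicates.append(dest)
        seen.add(dest)
    return duplicates


# Assumed built-in bind plus the extra_binds entry from the cluster.yml above.
binds = [
    "/lib/modules:/lib/modules:z,ro",  # assumption: default bind with the z flag
    "/lib/modules:/lib/modules:ro",    # user-supplied extra_binds entry
]
print(duplicate_mount_points(binds))
```

Two binds that target the same container path are reported once, which matches the single path named in the Docker error.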

The bug fix is verified with RKE v1.2.0-rc13

Steps:
Provision a fedora-coreos-31 instance (ami-001b07efbfa9bc41f).
Prepare cluster.yml file:

nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
  kubeproxy:
    extra_binds:
      - "/lib/modules:/lib/modules:ro"

Result:
Cluster creation succeeds:

INFO[0113] Starting container [kube-proxy] on host [<ip>], try #1
INFO[0113] [worker] Successfully started [kube-proxy] container on host [<ip>]
INFO[0113] [healthcheck] Start Healthcheck on service [kube-proxy] on host [<ip>]
INFO[0114] [healthcheck] service [kube-proxy] on host [<ip>] is healthy
INFO[0114] Image [rancher/rke-tools:v0.1.64] exists on host [<ip>]
INFO[0115] Starting container [rke-log-linker] on host [<ip>], try #1
...
INFO[0141] [addons] Executing deploy job rke-ingress-controller
INFO[0147] [ingress] ingress controller nginx deployed successfully
INFO[0147] [addons] Setting up user addons
INFO[0147] [addons] no user addons defined
INFO[0147] Finished building Kubernetes cluster successfully

Reopening to test without extra_binds: for kubeproxy

The bug is reproduced with RKE v1.1.4

Steps:
Provision a fedora-coreos-31 instance (ami-001b07efbfa9bc41f).
Prepare cluster.yml file:

nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

Result:
Cluster creation fails with:

FATA[0112] [workerPlane] Failed to bring up Worker Plane: [Failed to start [kube-proxy] container on host [52.15.34.171]: Error response from daemon: error setting label on mount source '/usr/lib/modules': relabeling content in /usr is not allowed]

The bug fix is verified with RKE v1.2.0-rc13

Steps:
Provision a fedora-coreos-31 instance (ami-001b07efbfa9bc41f).
Prepare cluster.yml file:

nodes:
  - address: <ip>
    internal_address: <ip>
    user: core
    role: [controlplane,worker,etcd]
    ssh_key_path: <cert>
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

Result:
Cluster creation succeeds:

INFO[0114] Starting container [kube-proxy] on host [<ip>], try #1
INFO[0114] [worker] Successfully started [kube-proxy] container on host [<ip>]
INFO[0114] [healthcheck] Start Healthcheck on service [kube-proxy] on host [<ip>]
INFO[0115] [healthcheck] service [kube-proxy] on host [<ip>] is healthy
...
INFO[0136] [addons] Executing deploy job rke-ingress-controller
INFO[0142] [ingress] ingress controller nginx deployed successfully
INFO[0142] [addons] Setting up user addons
INFO[0142] [addons] no user addons defined
INFO[0142] Finished building Kubernetes cluster successfully

Hi,

I get the following error with Flatcar Linux:
[controlPlane] Failed to upgrade Control Plane: [[Failed to start [kube-proxy] container on host [xxx]: Error response from daemon: error setting label on mount source '/usr/lib64/modules': relabeling content in /usr is not allowed]]
Your fix was about /usr/lib, but I have /usr/lib64. Is a different fix needed?

Did you find a fix?

@mikekuzak: yes, I disabled SELinux. That's more of a workaround than a fix.

/usr/lib64 is a symlink to /usr/lib, and I could not find anything in the RKE code...
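
Because the mount source passes through symlinks (/lib/modules on Fedora CoreOS, /usr/lib64 on Flatcar), a check on the literal path string can miss that the resolved target sits under /usr. A minimal Python sketch of resolving the path first follows. The function name is hypothetical, and this is not how RKE decides the bind mode.

```python
import os


def resolves_under(path, root):
    """Return True if path, after following symlinks, lies inside root."""
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    # commonpath compares whole path components, so /usrlocal is not under /usr.
    return os.path.commonpath([real_path, real_root]) == real_root


if __name__ == "__main__":
    # The result depends on the filesystem layout of the machine running this.
    print(resolves_under("/lib/modules", "/usr"))
```

A caller could use such a check to skip the `z` relabel option whenever the resolved source is inside /usr, independent of the distribution name.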

@dirien This did not happen before. Is it the new stable Flatcar version that broke it?
I'm running Rancher 2.3.9 on Flatcar stable (2605.6.0).

@aaronRancher Will this fix be available for the Rancher 2.3.x and 2.4.x branches?

@mikekuzak this sounds related to https://github.com/rancher/rke/pull/2214 more than the version of Flatcar.

@vbatts I think so too. When disabling SELinux it works fine.

@mikekuzak, @dirien, @vbatts,
I commented on #2214. After testing different Rancher versions and downgrading Flatcar, the last working Flatcar release is 2512.5.0.

@mikekuzak Will be backporting to 2.4, but not 2.3.
