Describe the bug
Error installing Longhorn on k3s.
instance-manager-r-434985db 1/1 Running 0 31m
instance-manager-e-867139e6 1/1 Running 0 31m
instance-manager-r-2bafdd7b 1/1 Running 0 31m
instance-manager-e-66826b2f 1/1 Running 0 31m
instance-manager-r-a2f1d4c5 1/1 Running 0 31m
instance-manager-e-ab3218de 1/1 Running 0 31m
longhorn-manager-h6d48 0/1 CrashLoopBackOff 6 32m
engine-image-ei-eee5f438-k6p5p 0/1 CrashLoopBackOff 10 32m
longhorn-ui-8486987944-msczm 0/1 CrashLoopBackOff 10 32m
engine-image-ei-eee5f438-zj49j 0/1 CrashLoopBackOff 10 32m
longhorn-manager-6n9zn 0/1 CrashLoopBackOff 7 32m
longhorn-manager-7l269 0/1 CrashLoopBackOff 7 32m
longhorn-driver-deployer-cd74cb75b-hhgkt 0/1 Init:CrashLoopBackOff 11 32m
engine-image-ei-eee5f438-x65mf 0/1 CreateContainerError 0 32m
To Reproduce
I just ran the command:
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
Log
Logs of longhorn-manager-h6d48:
time="2020-06-15T14:32:33Z" level=info msg="Start overwriting built-in settings with customized values"
time="2020-06-15T14:32:33Z" level=info msg="cannot list the content of the src directory /var/lib/rancher/longhorn/engine-binaries for the copy, will do nothing: Failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net bash -c ls /var/lib/rancher/longhorn/engine-binaries/*], output , stderr, ls: cannot access /var/lib/rancher/longhorn/engine-binaries/*: No such file or directory\n, error exit status 2"
time="2020-06-15T14:32:33Z" level=info msg="New upgrade leader elected: k3s-node01"
time="2020-06-15T14:32:38Z" level=info msg="New upgrade leader elected: k3s-node02"
time="2020-06-15T14:32:58Z" level=info msg="Start upgrading"
time="2020-06-15T14:32:58Z" level=info msg="No API version upgrade is needed"
time="2020-06-15T14:32:58Z" level=info msg="Finish upgrading"
E0615 14:32:58.582171 1 leaderelection.go:282] Failed to release lock: Lease.coordination.k8s.io "longhorn-manager-upgrade-lock" is invalid: spec.leaseDurationSeconds: Invalid value: 0: must be greater than 0
time="2020-06-15T14:32:58Z" level=info msg="Upgrade leader lost: k3s-master"
time="2020-06-15T14:32:58Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.0 to be ready"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn Kubernetes node controller"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn replica controller"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn engine controller"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn volume controller"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn Engine Image controller"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn node controller"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn websocket controller"
time="2020-06-15T14:32:58Z" level=info msg="Start Longhorn Setting controller"
time="2020-06-15T14:32:58Z" level=info msg="Starting Longhorn instance manager controller"
time="2020-06-15T14:32:58Z" level=info msg="Start kubernetes controller"
time="2020-06-15T14:32:58Z" level=debug msg="Start monitoring instance manager instance-manager-r-a2f1d4c5"
time="2020-06-15T14:32:58Z" level=debug msg="Start monitoring instance manager instance-manager-e-ab3218de"
time="2020-06-15T14:33:04Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.0 to be ready"
time="2020-06-15T14:33:08Z" level=debug msg="Failed to check for the latest upgrade: Post \"https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade\": dial tcp: lookup longhorn-upgrade-responder.rancher.io on 10.43.0.10:53: read udp 10.42.0.227:41022->10.43.0.10:53: read: connection refused"
time="2020-06-15T14:33:10Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.0 to be ready"
time="2020-06-15T14:34:58Z" level=fatal msg="Error starting manager: failed to wait for engine image longhornio/longhorn-engine:v1.0.0: Wait for engine image longhornio/longhorn-engine:v1.0.0 timed out"
Logs of engine-image-ei-eee5f438-k6p5p:
/bin/bash: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory
Logs of longhorn-ui-8486987944-msczm:
/bin/bash: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory
Environment:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3s-master Ready master 11h v1.18.3+k3s1 192.168.109.10 <none> CentOS Linux 7 (Core) 3.10.0-1127.10.1.el7.x86_64 containerd://1.3.3-k3s2
k3s-node01 Ready <none> 11h v1.18.3+k3s1 192.168.109.11 <none> CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 containerd://1.3.3-k3s2
k3s-node02 Ready <none> 11h v1.18.3+k3s1 192.168.109.12 <none> CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 containerd://1.3.3-k3s2
Are you running SELinux?
When I disable SELinux, Longhorn works. Thank you.
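For anyone wanting to confirm that SELinux is the cause before disabling it, here is a minimal sketch that prints the current mode and, when the node is enforcing, lists recent AVC denials. It assumes the standard `getenforce` and `ausearch` tools are installed and that it runs as root; the `selinux_mode` helper is a hypothetical name introduced only for this example.

```shell
#!/bin/sh
# Sketch: check whether SELinux is enforcing and list recent denials.
set -eu

# Print the current SELinux mode, or "unknown" when getenforce is not installed.
# (selinux_mode is a hypothetical helper, not part of any SELinux package.)
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "unknown"
  fi
}

mode="$(selinux_mode)"
echo "SELinux mode: $mode"

if [ "$mode" = "Enforcing" ] && command -v ausearch >/dev/null 2>&1; then
  # AVC records show which process was denied access to what.
  # ausearch exits non-zero when nothing matches, so handle that case.
  ausearch -m avc -ts recent || echo "no recent AVC denials found"
fi
```

Running `setenforce 0` switches the node to permissive mode until the next reboot, which is a less drastic test than disabling SELinux in the config file.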
Reproduced using master (06/16) on RHEL with SELinux enabled.
While the installation succeeded, no volumes were successfully created.
```
time="2020-06-16T18:37:41Z" level=warning msg="Error syncing Longhorn engine longhorn-system/wordpress-maria-db-e-f09ec2d5: fail to sync engine for longhorn-system/wordpress-maria-db-e-f09ec2d5: fail to start rebuild for wordpress-maria-db-r-f67ec3cb of wordpress-maria-db-e-f09ec2d5: timed out waiting for the condition"
time="2020-06-16T18:37:41Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"wordpress-maria-db-e-f09ec2d5\", UID:\"892556bc-b801-447f-b19f-4ff9cc981620\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"2363174\", FieldPath:\"\"}): type: 'Normal' reason: 'Rebuilding' Start rebuilding replica wordpress-maria-db-r-f67ec3cb with Address 10.42.1.14:10000 for wordpress-maria-db"
time="2020-06-16T18:38:01Z" level=error msg="Failed rebuilding 10.42.1.14:10000 of wordpress-maria-db: failed to add replica address='tcp://10.42.1.14:10000' to controller 'wordpress-maria-db': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-master/longhorn [--url 10.42.0.71:10000 add tcp://10.42.1.14:10000], output , stderr, time=\"2020-06-16T18:38:01Z\" level=fatal msg=\"Error running add replica command: failed to get replica 10.42.1.14:10000: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \\"transport: Error while dialing dial tcp 10.42.1.14:10000: i/o timeout\\"\"\n, error exit status 1"
time="2020-06-16T18:38:01Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"wordpress-maria-db-e-f09ec2d5\", UID:\"892556bc-b801-447f-b19f-4ff9cc981620\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"2363174\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedRebuilding' Failed rebuilding replica with Address 10.42.1.14:10000: failed to add replica address='tcp://10.42.1.14:10000' to controller 'wordpress-maria-db': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-master/longhorn [--url 10.42.0.71:10000 add tcp://10.42.1.14:10000], output , stderr, time=\"2020-06-16T18:38:01Z\" level=fatal msg=\"Error running add replica command: failed to get replica 10.42.1.14:10000: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \\"transport: Error while dialing dial tcp 10.42.1.14:10000: i/o timeout\\"\"\n, error exit status 1"
time="2020-06-16T18:38:01Z" level=error msg="Removed failed rebuilding replica 10.42.1.14:10000 of wordpress-maria-db"
time="2020-06-16T18:38:01Z" level=info msg="Engine wordpress-maria-db-e-f09ec2d5 is still in backoff for replica wordpress-maria-db-r-f67ec3cb rebuild failure"
```
Workaround: Install a policy as follows on all LH nodes:
ausearch -c 'csi-resizer' --raw | audit2allow -M my-csiresizer
semodule -i my-csiresizer.pp
ausearch -c 'csi-provisioner' --raw | audit2allow -M my-csiprovisioner
semodule -i my-csiprovisioner.pp
ausearch -c 'csi-attacher' --raw | audit2allow -M my-csiattacher
semodule -i my-csiattacher.pp
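The six commands above follow one pattern per CSI sidecar, so they can be wrapped in a loop. This is only a sketch: the `module_name` and `install_policy` helpers and the `DRY_RUN` switch are hypothetical names introduced here, and the script defaults to printing the commands rather than running them. The denials must already be in the audit log for `audit2allow` to generate anything.

```shell
#!/bin/sh
# Sketch: build and load one local SELinux policy module per CSI sidecar.
# Run as root on every Longhorn node. DRY_RUN is a hypothetical switch:
# 1 (the default here) prints the commands, 0 executes them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Derive the module name used in this thread, e.g. csi-resizer -> my-csiresizer.
module_name() {
  printf 'my-%s\n' "$(printf '%s' "$1" | tr -d '-')"
}

# Generate a policy module from the audit records of one command and load it.
install_policy() {
  comm="$1"
  module="$(module_name "$comm")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "ausearch -c '$comm' --raw | audit2allow -M $module"
    echo "semodule -i $module.pp"
    return 0
  fi
  ausearch -c "$comm" --raw | audit2allow -M "$module"
  semodule -i "$module.pp"
}

for name in csi-resizer csi-provisioner csi-attacher; do
  install_policy "$name"
done
```

To apply the policies for real, run the script with `DRY_RUN=0` after reviewing the printed commands.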
With https://github.com/longhorn/longhorn/issues/1273 fixed, we should be able to support SELinux in v1.0.1.
In fact @janeczku has tested #1273 and it doesn't work for him on RHEL 7.8 with SELinux. @khushboo-rancher can you try to reproduce the issue?
This issue is also related to https://github.com/rancher/rancher/issues/26789
Longhorn v1.0.1 deploys successfully on a k3s cluster on RHEL 7.8 with SELinux.
The P1 workflow below was also tested and worked fine:
Volume claim template using the longhorn class, which creates a volume in Longhorn.
Node details:
cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.8 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.8"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.8:GA:server"
[root@ip-xx-xx-xx-xx ec2-user]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
[root@ip-xx-xx-xx-xx ec2-user]# getenforce
Enforcing
Note:
The deployment problem can be reproduced with Longhorn v1.0.0.