Environment:
Hi,
I have been able to get LH running and I see the volume has been successfully created on the underlying storage.

The pod with the volume is not starting.

longhorn-manager
time="2020-04-15T17:12:18Z" level=debug msg="Replica controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-1535f3bf"
time="2020-04-15T17:12:18Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-1535f3bf state, old state , new state stopped"
time="2020-04-15T17:12:18Z" level=debug msg="Replica controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-06f4526e"
time="2020-04-15T17:12:18Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-06f4526e state, old state , new state stopped"
time="2020-04-15T17:12:19Z" level=debug msg="Replica controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985"
time="2020-04-15T17:12:25Z" level=debug msg="Volume controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256"
time="2020-04-15T17:12:25Z" level=debug msg="Prepare to create instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985"
time="2020-04-15T17:12:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985\", UID:\"2fcd4749-4cc4-46f6-9964-2e7d9a0ccaf5\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"2864440\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985"
time="2020-04-15T17:12:25Z" level=debug msg="Requeue longhorn-system/pvc-d061512e-870a-4ece-bd45-2f04672d5256 due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-06f4526e\": the object has been modified; please apply your changes to the latest version and try again"
time="2020-04-15T17:12:26Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985 starts running, IP 10.42.4.71"
time="2020-04-15T17:12:26Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985 starts running, Port 10000"
time="2020-04-15T17:12:26Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985 state, old state stopped, new state running"
time="2020-04-15T17:12:26Z" level=debug msg="Engine controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87"
time="2020-04-15T17:12:26Z" level=debug msg="Instance process pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 had been created, need to wait for instance manager update"
time="2020-04-15T17:12:26Z" level=debug msg="Instance process pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 had been created, need to wait for instance manager update"
time="2020-04-15T17:12:27Z" level=debug msg="Instance process pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 had been created, need to wait for instance manager update"
time="2020-04-15T17:12:27Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 state, old state stopped, new state starting"
time="2020-04-15T17:12:30Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 starts running, IP 10.42.4.70"
time="2020-04-15T17:12:30Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 starts running, Port 10000"
time="2020-04-15T17:12:30Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 state, old state starting, new state running"
time="2020-04-15T17:12:30Z" level=debug msg="Start monitoring pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87"
time="2020-04-15T17:12:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\", UID:\"fecc104d-8ec6-48c4-8af0-0ab25ec1ba90\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"2864445\", FieldPath:\"\"}): type: 'Normal' reason: 'Attached' volume pvc-d061512e-870a-4ece-bd45-2f04672d5256 has been attached to us2-k8swkr02"
longhorn-driver
time="2020-04-15T13:40:25Z" level=debug msg="Deploying CSI driver"
time="2020-04-15T13:40:25Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Pending"
time="2020-04-15T13:40:26Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Pending"
time="2020-04-15T13:40:27Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Running"
time="2020-04-15T13:40:28Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Running"
time="2020-04-15T13:40:29Z" level=info msg="Proc found: kubelet"
time="2020-04-15T13:40:29Z" level=info msg="Try to find arg [--root-dir] in cmdline: [kubelet --cluster-dns=10.43.0.10 --hostname-override=us2-k8swkr01 --pod-infra-container-image=rancher/pause:3.1 --make-iptables-util-chains=true --cni-conf-dir=/etc/cni/net.d --authentication-token-webhook=true --resolv-conf=/etc/resolv.conf --cloud-provider= --cgroups-per-qos=True --v=2 --read-only-port=10255 --cni-bin-dir=/opt/cni/bin --root-dir=/var/lib/kubelet --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-node.yaml --authorization-mode=Webhook --address=0.0.0.0 --volume-plugin-dir=/var/lib/kubelet/volumeplugins --network-plugin=cni --client-ca-file=/etc/kubernetes/ssl/kube-ca.pem --fail-swap-on=false --streaming-connection-idle-timeout=30m --event-qps=0 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --anonymous-auth=false --cluster-domain=cluster.local --cgroup-driver=cgroupfs --resolv-conf=/run/systemd/resolve/resolv.conf ]"
time="2020-04-15T13:40:29Z" level=info msg="Detected root dir path: /var/lib/kubelet"
time="2020-04-15T13:40:29Z" level=info msg="Upgrading Longhorn related components for CSI v1.1.0"
time="2020-04-15T13:40:29Z" level=debug msg="Creating CSI Driver driver.longhorn.io"
time="2020-04-15T13:40:29Z" level=debug msg="Created CSI Driver driver.longhorn.io"
time="2020-04-15T13:40:29Z" level=debug msg="Creating service csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Created service csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Creating deployment csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Created deployment csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Creating service csi-provisioner"
time="2020-04-15T13:40:29Z" level=debug msg="Created service csi-provisioner"
time="2020-04-15T13:40:29Z" level=debug msg="Creating deployment csi-provisioner"
time="2020-04-15T13:40:29Z" level=debug msg="Created deployment csi-provisioner"
time="2020-04-15T13:40:30Z" level=debug msg="Creating service csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Created service csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Creating deployment csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Created deployment csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Creating daemon set longhorn-csi-plugin"
time="2020-04-15T13:40:30Z" level=debug msg="Created daemon set longhorn-csi-plugin"
time="2020-04-15T13:40:30Z" level=debug msg="Creating service compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="Created service compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="Creating deployment compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="Created deployment compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="CSI deployment done"
conpatibility-node-driver-registrar
I0415 13:40:32.718021 1 main.go:110] Version: v1.2.0-0-g6ef000ae
I0415 13:40:32.718114 1 main.go:120] Attempting to open a gRPC connection with: "/var/lib/kubelet/plugins/io.rancher.longhorn/csi.sock"
I0415 13:40:32.718146 1 connection.go:151] Connecting to unix:///var/lib/kubelet/plugins/io.rancher.longhorn/csi.sock
I0415 13:40:33.720229 1 main.go:127] Calling CSI driver to discover driver name
I0415 13:40:33.720281 1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0415 13:40:33.720297 1 connection.go:181] GRPC request: {}
I0415 13:40:33.725650 1 connection.go:183] GRPC response: {"name":"io.rancher.longhorn","vendor_version":"v0.8.0"}
I0415 13:40:33.726588 1 connection.go:184] GRPC error: <nil>
I0415 13:40:33.726608 1 main.go:137] CSI driver name: "io.rancher.longhorn"
I0415 13:40:33.726763 1 node_register.go:58] Starting Registration Server at: /registration/io.rancher.longhorn-reg.sock
I0415 13:40:33.727143 1 node_register.go:67] Registration Server started at: /registration/io.rancher.longhorn-reg.sock
I0415 13:41:22.090870 1 main.go:77] Received GetInfo call: &InfoRequest{}
I0415 13:41:22.706991 1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
longhorn-csi-plugin
E0416 08:47:25.148934 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256" failed: type:("ext4") target:("/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount") options:(["defaults"])error:(exit status 1)
time="2020-04-16T08:47:25Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"
time="2020-04-16T08:49:27Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:49:27Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:49:27Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC call: /csi.v1.Node/NodePublishVolume"
time="2020-04-16T08:49:27Z" level=info msg="GRPC request: {\"target_path\":\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"baseImage\":\"\",\"fromBackup\":\"\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1586958032802-8081-driver.longhorn.io\"},\"volume_id\":\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\"}"
time="2020-04-16T08:49:27Z" level=info msg="NodeServer NodePublishVolume req: volume_id:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\" target_path:\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"baseImage\" value:\"\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1586958032802-8081-driver.longhorn.io\" > "
E0416 08:49:27.567704 1 mount_linux.go:143] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount
Output: mount: /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount: /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 already mounted or mount point busy.
E0416 08:49:27.576477 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256" failed: type:("ext4") target:("/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount") options:(["defaults"])error:(exit status 1)
time="2020-04-16T08:49:27Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"
time="2020-04-16T08:51:29Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:51:29Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:51:29Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC call: /csi.v1.Node/NodePublishVolume"
time="2020-04-16T08:51:29Z" level=info msg="GRPC request: {\"target_path\":\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"baseImage\":\"\",\"fromBackup\":\"\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1586958032802-8081-driver.longhorn.io\"},\"volume_id\":\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\"}"
time="2020-04-16T08:51:29Z" level=info msg="NodeServer NodePublishVolume req: volume_id:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\" target_path:\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"baseImage\" value:\"\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1586958032802-8081-driver.longhorn.io\" > "
E0416 08:51:30.015597 1 mount_linux.go:143] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount
Output: mount: /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount: /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 already mounted or mount point busy.
E0416 08:51:30.024131 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256" failed: type:("ext4") target:("/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount") options:(["defaults"])error:(exit status 1)
time="2020-04-16T08:51:30Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"
Expected behavior:
Pod should mount volume and startup.
Output: mount: /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount: /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 already mounted or mount point busy.
Seems different from #1208
The keyword for the issue is:
already mounted or mount point busy.
We've just saw this error on another setup. Somehow the longhorn device pointed to the wrong disk on the node (root disk this time). You can confirm the issue by checking the major:minor number of longhorn device /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 against your root disk.
We're trying to reproduce the issue in house and fix it in the upcoming v0.8.1.
ls -lrt /dev/longhorn/
brw-rw----. 1 root root 8, 144 Apr 15 13:12 pvc-d061512e-870a-4ece-bd45-2f04672d5256
We do have a 3Par Storage attached that we are trying to use.
/var/storage/lh
brw-rw----. 1 root disk 254, 2 Apr 14 07:56 dm-2
When I changed the Default Data Path in LH to the local os device, same issue in the logs.
/opt/lh
brw-rw----. 1 root disk 8, 3 Apr 14 07:57 sda3
CoreOS will flip the disk to /sda4 next update
@mikekuzak Can you show the dmesg when longhorn attach a disk to your node? That should show who is the "real block device" created by iscsi (which Longhorn should use it's major/minor number to create the longhorn device)
Also, a certain block device (other than Longhorn) must have 8, 114 in your /dev directory. It would be great if you can check and let me know.
@mikekuzak Also, can you show me the result of the following commands on the failed mounted node?
iscsiadm -m session -P 3
As well as
ls -l /dev/
And
ls -l /dev/disk/by-path/
Thanks.
@mikekuzak Can you show the dmesg when longhorn attach a disk to your node? That should show who is the "real block device" created by iscsi (which Longhorn should use it's major/minor number to create the longhorn device)
Also, a certain block device (other than Longhorn) must have
8, 114in your/devdirectory. It would be great if you can check and let me know.
brw-rw----. 1 root disk 8, 144 Apr 17 07:39 sdj
93942.823644] Loading iSCSI transport class v2.0-870.
[93943.842976] iscsi: registered transport (tcp)
[93944.130703] scsi host3: iSCSI Initiator over TCP/IP
[93944.427887] scsi 3:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[93944.467653] scsi 3:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[93944.506074] sd 3:0:0:1: Power-on or device reset occurred
[93944.531867] sd 3:0:0:1: [sdj] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[93944.566204] sd 3:0:0:1: [sdj] Write Protect is off
[93944.588145] sd 3:0:0:1: [sdj] Mode Sense: 69 00 10 08
[93944.588630] sd 3:0:0:1: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[93945.534129] sd 3:0:0:1: [sdj] Attached SCSI disk
[94640.940259] sd 3:0:0:1: [sdj] Synchronizing SCSI cache
[98745.128909] perf: interrupt took too long (9779 > 9770), lowering kernel.perf_event_max_sample_rate to 20000
[105369.068349] scsi host3: iSCSI Initiator over TCP/IP
[105369.363967] scsi 3:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[105369.405748] scsi 3:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[105369.445411] sd 3:0:0:1: Power-on or device reset occurred
[105369.472664] sd 3:0:0:1: [sdj] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[105369.508768] sd 3:0:0:1: [sdj] Write Protect is off
[105369.531914] sd 3:0:0:1: [sdj] Mode Sense: 69 00 10 08
[105369.532333] sd 3:0:0:1: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[105370.472161] sd 3:0:0:1: [sdj] Attached SCSI disk
[257852.626427] sd 3:0:0:1: [sdj] Synchronizing SCSI cache
[258178.382097] scsi host3: iSCSI Initiator over TCP/IP
[258178.677325] scsi 3:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[258178.717078] scsi 3:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[258178.755253] sd 3:0:0:1: Power-on or device reset occurred
[258178.781934] sd 3:0:0:1: [sdj] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[258178.816633] sd 3:0:0:1: [sdj] Write Protect is off
[258178.840110] sd 3:0:0:1: [sdj] Mode Sense: 69 00 10 08
[258178.840932] sd 3:0:0:1: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[258179.781407] sd 3:0:0:1: [sdj] Attached SCSI disk
@mikekuzak Also, can you show me the result of the following commands on the failed mounted node?
iscsiadm -m session -P 3As well as
ls -l /dev/And
ls -l /dev/disk/by-path/Thanks.
iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 2.0-873
Target: iqn.2019-10.io.longhorn:pvc-ad3781eb-8004-4c5f-bc22-1b8fb7fa65f0
Current Portal: 10.42.4.70:3260,1
Persistent Portal: 10.42.4.70:3260,1
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.2005-03.org.open-iscsi:f18fb68d1193
Iface IPaddress: 10.4.201.102
Iface HWaddress: <empty>
Iface Netdev: <empty>
SID: 3
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
*********
Timeouts:
*********
Recovery Timeout: 5
Target Reset Timeout: 30
LUN Reset Timeout: 30
Abort Timeout: 15
*****
CHAP:
*****
username: <empty>
password: ********
username_in: <empty>
password_in: ********
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 8192
FirstBurstLength: 65536
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 3 State: running
scsi3 Channel 00 Id 0 Lun: 0
scsi3 Channel 00 Id 0 Lun: 1
Attached scsi disk sdj State: running
ls -l /dev/
total 0
crw-r--r--. 1 root root 10, 235 Apr 14 07:56 autofs
drwxr-xr-x. 2 root root 620 Apr 17 07:39 block
drwxr-xr-x. 2 root root 640 Apr 17 07:39 bsg
crw-------. 1 root root 10, 234 Apr 14 07:56 btrfs-control
drwxr-xr-x. 3 root root 60 Apr 14 07:56 bus
drwxr-xr-x. 2 root root 3420 Apr 17 07:39 char
crw-------. 1 root root 5, 1 Apr 14 07:56 console
lrwxrwxrwx. 1 root root 11 Apr 14 07:56 core -> /proc/kcore
drwxr-xr-x. 2 root root 60 Apr 14 07:56 cpu
crw-------. 1 root root 10, 62 Apr 14 07:56 cpu_dma_latency
crw-------. 1 root root 10, 203 Apr 14 07:56 cuse
drwxr-xr-x. 9 root root 180 Apr 14 07:56 disk
brw-rw----. 1 root disk 254, 0 Apr 14 07:56 dm-0
brw-rw----. 1 root disk 254, 1 Apr 14 07:56 dm-1
brw-rw----. 1 root disk 254, 2 Apr 14 07:56 dm-2
brw-rw----. 1 root disk 254, 3 Apr 17 07:39 dm-3
lrwxrwxrwx. 1 root root 13 Apr 14 07:56 fd -> /proc/self/fd
crw-rw-rw-. 1 root root 1, 7 Apr 14 07:56 full
crw-rw-rw-. 1 root root 10, 229 Apr 14 07:56 fuse
crw-------. 1 root root 10, 228 Apr 14 07:56 hpet
drwxr-xr-x. 2 root root 360 Apr 14 07:56 hpilo
drwxr-xr-x. 2 root root 0 Apr 14 07:56 hugepages
crw-------. 1 root root 10, 183 Apr 14 07:56 hwrng
lrwxrwxrwx. 1 root root 12 Apr 14 07:56 initctl -> /run/initctl
drwxr-xr-x. 2 root root 60 Apr 14 07:56 input
crw-------. 1 root root 246, 0 Apr 14 07:56 ipmi0
crw-r--r--. 1 root root 1, 11 Apr 14 07:56 kmsg
crw-rw----. 1 root kvm 10, 232 Apr 14 07:56 kvm
lrwxrwxrwx. 1 root root 28 Apr 14 07:56 log -> /run/systemd/journal/dev-log
drwxr-xr-x. 2 root root 60 Apr 17 07:39 longhorn
crw-rw----. 1 root disk 10, 237 Apr 14 07:56 loop-control
drwxr-xr-x. 2 root root 140 Apr 17 07:39 mapper
crw-r-----. 1 root kmem 1, 1 Apr 14 07:56 mem
crw-------. 1 root root 10, 59 Apr 14 07:56 memory_bandwidth
drwxrwxrwt. 2 root root 40 Apr 14 07:56 mqueue
drwxr-xr-x. 2 root root 60 Apr 14 07:56 net
crw-------. 1 root root 10, 61 Apr 14 07:56 network_latency
crw-------. 1 root root 10, 60 Apr 14 07:56 network_throughput
crw-rw-rw-. 1 root root 1, 3 Apr 14 07:56 null
crw-r-----. 1 root kmem 1, 4 Apr 14 07:56 port
crw-------. 1 root root 108, 0 Apr 14 07:56 ppp
crw-rw-rw-. 1 root tty 5, 2 Apr 20 07:14 ptmx
crw-------. 1 root root 251, 0 Apr 14 07:56 ptp0
crw-------. 1 root root 251, 1 Apr 14 07:56 ptp1
drwxr-xr-x. 2 root root 0 Apr 14 07:56 pts
crw-rw-rw-. 1 root root 1, 8 Apr 14 07:56 random
lrwxrwxrwx. 1 root root 4 Apr 14 07:56 rtc -> rtc0
crw-------. 1 root root 253, 0 Apr 14 07:56 rtc0
brw-rw----. 1 root disk 8, 0 Apr 14 07:57 sda
brw-rw----. 1 root disk 8, 1 Apr 14 07:57 sda1
brw-rw----. 1 root disk 8, 2 Apr 14 07:57 sda2
brw-rw----. 1 root disk 8, 3 Apr 14 07:57 sda3
brw-rw----. 1 root disk 8, 4 Apr 14 07:57 sda4
brw-rw----. 1 root disk 8, 6 Apr 14 07:57 sda6
brw-rw----. 1 root disk 8, 7 Apr 14 07:57 sda7
brw-rw----. 1 root disk 8, 9 Apr 14 07:57 sda9
brw-rw----. 1 root disk 8, 16 Apr 14 07:56 sdb
brw-rw----. 1 root disk 8, 17 Apr 14 07:56 sdb1
brw-rw----. 1 root disk 8, 32 Apr 14 07:56 sdc
brw-rw----. 1 root disk 8, 33 Apr 14 07:56 sdc1
brw-rw----. 1 root disk 8, 48 Apr 14 07:56 sdd
brw-rw----. 1 root disk 8, 49 Apr 14 07:56 sdd1
brw-rw----. 1 root disk 8, 64 Apr 14 07:56 sde
brw-rw----. 1 root disk 8, 65 Apr 14 07:56 sde1
brw-rw----. 1 root disk 8, 80 Apr 14 07:56 sdf
brw-rw----. 1 root disk 8, 81 Apr 14 07:56 sdf1
brw-rw----. 1 root disk 8, 96 Apr 14 07:56 sdg
brw-rw----. 1 root disk 8, 97 Apr 14 07:56 sdg1
brw-rw----. 1 root disk 8, 112 Apr 14 07:56 sdh
brw-rw----. 1 root disk 8, 113 Apr 14 07:56 sdh1
brw-rw----. 1 root disk 8, 128 Apr 14 07:56 sdi
brw-rw----. 1 root disk 8, 129 Apr 14 07:56 sdi1
brw-rw----. 1 root disk 8, 144 Apr 17 07:39 sdj
drwxrwxrwt. 2 root root 40 Apr 14 07:56 shm
lrwxrwxrwx. 1 root root 15 Apr 14 07:56 stderr -> /proc/self/fd/2
lrwxrwxrwx. 1 root root 15 Apr 14 07:56 stdin -> /proc/self/fd/0
lrwxrwxrwx. 1 root root 15 Apr 14 07:56 stdout -> /proc/self/fd/1
-rwxr-xr-x. 1 root root 0 Apr 15 09:40 termination-log
crw-rw-rw-. 1 root tty 5, 0 Apr 14 07:56 tty
crw--w----. 1 root tty 4, 0 Apr 14 07:56 tty0
crw--w----. 1 root tty 4, 1 Apr 17 07:38 tty1
crw--w----. 1 root tty 4, 10 Apr 14 07:56 tty10
crw--w----. 1 root tty 4, 11 Apr 14 07:56 tty11
crw--w----. 1 root tty 4, 12 Apr 14 07:56 tty12
crw--w----. 1 root tty 4, 13 Apr 14 07:56 tty13
crw--w----. 1 root tty 4, 14 Apr 14 07:56 tty14
crw--w----. 1 root tty 4, 15 Apr 14 07:56 tty15
crw--w----. 1 root tty 4, 16 Apr 14 07:56 tty16
crw--w----. 1 root tty 4, 17 Apr 14 07:56 tty17
crw--w----. 1 root tty 4, 18 Apr 14 07:56 tty18
crw--w----. 1 root tty 4, 19 Apr 14 07:56 tty19
crw--w----. 1 root tty 4, 2 Apr 14 07:56 tty2
crw--w----. 1 root tty 4, 20 Apr 14 07:56 tty20
crw--w----. 1 root tty 4, 21 Apr 14 07:56 tty21
crw--w----. 1 root tty 4, 22 Apr 14 07:56 tty22
crw--w----. 1 root tty 4, 23 Apr 14 07:56 tty23
crw--w----. 1 root tty 4, 24 Apr 14 07:56 tty24
crw--w----. 1 root tty 4, 25 Apr 14 07:56 tty25
crw--w----. 1 root tty 4, 26 Apr 14 07:56 tty26
crw--w----. 1 root tty 4, 27 Apr 14 07:56 tty27
crw--w----. 1 root tty 4, 28 Apr 14 07:56 tty28
crw--w----. 1 root tty 4, 29 Apr 14 07:56 tty29
crw--w----. 1 root tty 4, 3 Apr 14 07:56 tty3
crw--w----. 1 root tty 4, 30 Apr 14 07:56 tty30
crw--w----. 1 root tty 4, 31 Apr 14 07:56 tty31
crw--w----. 1 root tty 4, 32 Apr 14 07:56 tty32
crw--w----. 1 root tty 4, 33 Apr 14 07:56 tty33
crw--w----. 1 root tty 4, 34 Apr 14 07:56 tty34
crw--w----. 1 root tty 4, 35 Apr 14 07:56 tty35
crw--w----. 1 root tty 4, 36 Apr 14 07:56 tty36
crw--w----. 1 root tty 4, 37 Apr 14 07:56 tty37
crw--w----. 1 root tty 4, 38 Apr 14 07:56 tty38
crw--w----. 1 root tty 4, 39 Apr 14 07:56 tty39
crw--w----. 1 root tty 4, 4 Apr 14 07:56 tty4
crw--w----. 1 root tty 4, 40 Apr 14 07:56 tty40
crw--w----. 1 root tty 4, 41 Apr 14 07:56 tty41
crw--w----. 1 root tty 4, 42 Apr 14 07:56 tty42
crw--w----. 1 root tty 4, 43 Apr 14 07:56 tty43
crw--w----. 1 root tty 4, 44 Apr 14 07:56 tty44
crw--w----. 1 root tty 4, 45 Apr 14 07:56 tty45
crw--w----. 1 root tty 4, 46 Apr 14 07:56 tty46
crw--w----. 1 root tty 4, 47 Apr 14 07:56 tty47
crw--w----. 1 root tty 4, 48 Apr 14 07:56 tty48
crw--w----. 1 root tty 4, 49 Apr 14 07:56 tty49
crw--w----. 1 root tty 4, 5 Apr 14 07:56 tty5
crw--w----. 1 root tty 4, 50 Apr 14 07:56 tty50
crw--w----. 1 root tty 4, 51 Apr 14 07:56 tty51
crw--w----. 1 root tty 4, 52 Apr 14 07:56 tty52
crw--w----. 1 root tty 4, 53 Apr 14 07:56 tty53
crw--w----. 1 root tty 4, 54 Apr 14 07:56 tty54
crw--w----. 1 root tty 4, 55 Apr 14 07:56 tty55
crw--w----. 1 root tty 4, 56 Apr 14 07:56 tty56
crw--w----. 1 root tty 4, 57 Apr 14 07:56 tty57
crw--w----. 1 root tty 4, 58 Apr 14 07:56 tty58
crw--w----. 1 root tty 4, 59 Apr 14 07:56 tty59
crw--w----. 1 root tty 4, 6 Apr 14 07:56 tty6
crw--w----. 1 root tty 4, 60 Apr 14 07:56 tty60
crw--w----. 1 root tty 4, 61 Apr 14 07:56 tty61
crw--w----. 1 root tty 4, 62 Apr 14 07:56 tty62
crw--w----. 1 root tty 4, 63 Apr 14 07:56 tty63
crw--w----. 1 root tty 4, 7 Apr 14 07:56 tty7
crw--w----. 1 root tty 4, 8 Apr 14 07:56 tty8
crw--w----. 1 root tty 4, 9 Apr 14 07:56 tty9
crw--w----. 1 root tty 4, 64 Apr 17 07:38 ttyS0
crw-rw----. 1 root dialout 4, 65 Apr 14 07:56 ttyS1
crw-rw----. 1 root dialout 4, 66 Apr 14 07:56 ttyS2
crw-rw----. 1 root dialout 4, 67 Apr 14 07:56 ttyS3
crw-------. 1 root root 5, 3 Apr 14 07:56 ttyprintk
crw-rw-rw-. 1 root root 1, 9 Apr 14 07:56 urandom
crw-rw----. 1 root tty 7, 0 Apr 14 07:56 vcs
crw-rw----. 1 root tty 7, 1 Apr 14 07:56 vcs1
crw-rw----. 1 root tty 7, 2 Apr 14 07:56 vcs2
crw-rw----. 1 root tty 7, 3 Apr 14 07:56 vcs3
crw-rw----. 1 root tty 7, 4 Apr 14 07:56 vcs4
crw-rw----. 1 root tty 7, 5 Apr 14 07:56 vcs5
crw-rw----. 1 root tty 7, 6 Apr 14 07:56 vcs6
crw-rw----. 1 root tty 7, 128 Apr 14 07:56 vcsa
crw-rw----. 1 root tty 7, 129 Apr 14 07:56 vcsa1
crw-rw----. 1 root tty 7, 130 Apr 14 07:56 vcsa2
crw-rw----. 1 root tty 7, 131 Apr 14 07:56 vcsa3
crw-rw----. 1 root tty 7, 132 Apr 14 07:56 vcsa4
crw-rw----. 1 root tty 7, 133 Apr 14 07:56 vcsa5
crw-rw----. 1 root tty 7, 134 Apr 14 07:56 vcsa6
crw-rw----. 1 root tty 7, 64 Apr 14 07:56 vcsu
crw-rw----. 1 root tty 7, 65 Apr 14 07:56 vcsu1
crw-rw----. 1 root tty 7, 66 Apr 14 07:56 vcsu2
crw-rw----. 1 root tty 7, 67 Apr 14 07:56 vcsu3
crw-rw----. 1 root tty 7, 68 Apr 14 07:56 vcsu4
crw-rw----. 1 root tty 7, 69 Apr 14 07:56 vcsu5
crw-rw----. 1 root tty 7, 70 Apr 14 07:56 vcsu6
drwxr-xr-x. 2 root root 60 Apr 14 07:56 vfio
crw-------. 1 root root 10, 63 Apr 14 07:56 vga_arbiter
crw-------. 1 root root 10, 238 Apr 14 07:56 vhost-net
crw-------. 1 root root 10, 241 Apr 14 07:56 vhost-vsock
crw-rw-rw-. 1 root root 1, 5 Apr 14 07:56 zero
ls -l /dev/disk/by-path/
total 0
lrwxrwxrwx. 1 root root 9 Apr 17 07:39 ip-10.42.4.70:3260-iscsi-iqn.2019-10.io.longhorn:pvc-ad3781eb-8004-4c5f-bc22-1b8fb7fa65f0-lun-1 -> ../../sdj
lrwxrwxrwx. 1 root root 9 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part3 -> ../../sda3
lrwxrwxrwx. 1 root root 10 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part4 -> ../../sda4
lrwxrwxrwx. 1 root root 10 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part6 -> ../../sda6
lrwxrwxrwx. 1 root root 10 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part7 -> ../../sda7
lrwxrwxrwx. 1 root root 10 Apr 14 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part9 -> ../../sda9
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.0-fc-0x20110002ac00886f-lun-0 -> ../../sdc
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.0-fc-0x20110002ac00886f-lun-0-part1 -> ../../sdc1
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.0-fc-0x21110002ac00886f-lun-0 -> ../../sdd
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.0-fc-0x21110002ac00886f-lun-0-part1 -> ../../sdd1
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.0-fc-0x22110002ac00886f-lun-0 -> ../../sde
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.0-fc-0x22110002ac00886f-lun-0-part1 -> ../../sde1
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.0-fc-0x23110002ac00886f-lun-0 -> ../../sdf
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.0-fc-0x23110002ac00886f-lun-0-part1 -> ../../sdf1
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.1-fc-0x20120002ac00886f-lun-0 -> ../../sdb
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.1-fc-0x20120002ac00886f-lun-0-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.1-fc-0x21120002ac00886f-lun-0 -> ../../sdg
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.1-fc-0x21120002ac00886f-lun-0-part1 -> ../../sdg1
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.1-fc-0x22120002ac00886f-lun-0 -> ../../sdh
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.1-fc-0x22120002ac00886f-lun-0-part1 -> ../../sdh1
lrwxrwxrwx. 1 root root 9 Apr 14 07:56 pci-0000:05:00.1-fc-0x23120002ac00886f-lun-0 -> ../../sdi
lrwxrwxrwx. 1 root root 10 Apr 14 07:56 pci-0000:05:00.1-fc-0x23120002ac00886f-lun-0-part1 -> ../../sdi1
@mikekuzak Your case is different from what we saw from another issue. In your case, Longhorn has identified the correct disk (/dev/sdj). Is it possible the volume has been already mounted on the disk somewhere else?
Can you show me the result of mount which should print out the mountable of the node?
@mikekuzak Your case is different from what we saw from another issue. In your case, Longhorn has identified the correct disk (/dev/sdj). Is it possible the volume has been already mounted on the disk somewhere else?
Can you show me the result of
mountwhich should print out the mountable of the node?
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=65968628k,nr_inodes=16492157,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
/dev/sda9 on / type ext4 (rw,relatime,seclabel)
/dev/mapper/usr on /usr type ext4 (ro,relatime,seclabel)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
none on /run/torcx/unpack type tmpfs (ro,relatime,seclabel)
systemd-1 on /boot type autofs (rw,relatime,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=9697)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
tmpfs on /media type tmpfs (rw,nosuid,nodev,noexec,relatime,seclabel)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=39,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=9713)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
/dev/mapper/3PARd0-part1 on /var/3PARd0 type ext4 (rw,relatime,seclabel,stripe=4096)
/dev/sda6 on /usr/share/oem type ext4 (rw,nodev,relatime,seclabel)
/dev/sda1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
.... a lot of docker overlays ..
Can you do a mount|grep sdj? Just want to make sure /dev/sdj wasn't mounted. It might get mixed in the docker overlay mounts.
Not mounted. query returns empty.
@mikekuzak Can you mount /dev/sdj or /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 manually on the node?
Hi I think it has something to do with iscsid
May 27 10:31:55 myhost iscsid[40060]: connect to 10.42.4.171:3260 failed (No route to host)
May 27 10:31:55 myhost iscsid[40060]: connect to 10.42.4.126:3260 failed (No route to host)
May 27 10:31:55 myhost iscsid[40060]: connect to 10.42.4.171:3260 failed (No route to host)
May 27 10:31:55 myhost iscsid[40060]: connect to 10.42.4.87:3260 failed (No route to host)
May 27 10:31:57 myhost iscsid[40060]: connect to 10.42.4.200:3260 failed (No route to host)
May 27 10:31:58 myhost systemd[1]: iscsid.service: Start-post operation timed out. Stopping.
May 27 10:31:58 myhost systemd[1]: iscsid.service: Control process exited, code=killed, status=15/TERM
May 27 10:31:58 myhost systemd[1]: iscsid.service: Failed with result 'timeout'.
May 27 10:31:58 myhost systemd[1]: Failed to start Open-iSCSI.
Any movement here? I am having the same issues. I am happy to provide any debug information.
Thanks!
@mikekuzak , good find. It looks like iscsid failed to start. And could you help to run ps -elf | grep iscsid to see if iscsid is running or not? And I think for iscsiadm requires iscsid to be running for discovery operations. If iscsid is not running correctly, this may impact on iscsiadm operations.
Not sure if its related but I'm getting the same error
Using Ubuntu 20.04.1 with microk8s I saw this in the log for longhorn-csi-plugin
E0810 19:15:50.073635 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd" failed: type:("ext4") target:("/var/snap/microk8s/common/var/lib/kubelet/pods/50b21f88-47ad-44db-9778-9e5d5dd33039/volumes/kubernetes.io~csi/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd/mount") options:(["defaults"])error:(exit status 1)
time="2020-08-10T19:15:50Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"
time="2020-08-10T19:17:52Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-08-10T19:17:52Z" level=info msg="GRPC request: {}"
time="2020-08-10T19:17:52Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-08-10T19:17:52Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-08-10T19:17:52Z" level=info msg="GRPC request: {}"
time="2020-08-10T19:17:52Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-08-10T19:17:52Z" level=info msg="GRPC call: /csi.v1.Node/NodePublishVolume"
time="2020-08-10T19:17:52Z" level=info msg="GRPC request: {\"target_path\":\"/var/snap/microk8s/common/var/lib/kubelet/pods/50b21f88-47ad-44db-9778-9e5d5dd33039/volumes/kubernetes.io~csi/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd/mount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"baseImage\":\"\",\"fromBackup\":\"\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1597084292457-8081-driver.longhorn.io\"},\"volume_id\":\"pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd\"}"
time="2020-08-10T19:17:52Z" level=info msg="NodeServer NodePublishVolume req: volume_id:\"pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd\" target_path:\"/var/snap/microk8s/common/var/lib/kubelet/pods/50b21f88-47ad-44db-9778-9e5d5dd33039/volumes/kubernetes.io~csi/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"baseImage\" value:\"\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1597084292457-8081-driver.longhorn.io\" > "
E0810 19:17:52.201912 1 mount_linux.go:143] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd /var/snap/microk8s/common/var/lib/kubelet/pods/50b21f88-47ad-44db-9778-9e5d5dd33039/volumes/kubernetes.io~csi/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd/mount
Output: mount: /var/snap/microk8s/common/var/lib/kubelet/pods/50b21f88-47ad-44db-9778-9e5d5dd33039/volumes/kubernetes.io~csi/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd/mount: /dev/longhorn/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd already mounted or mount point busy.
E0810 19:17:52.210900 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd" failed: type:("ext4") target:("/var/snap/microk8s/common/var/lib/kubelet/pods/50b21f88-47ad-44db-9778-9e5d5dd33039/volumes/kubernetes.io~csi/pvc-3d5558f0-9d07-4db1-914b-0b6ebd98b5bd/mount") options:(["defaults"])error:(exit status 1)
time="2020-08-10T19:17:52Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"
@ryantiger658 @mogwai There were two reasons we saw for this issue. The first one happens with AWS mapping mechanism and has been fixed in v1.0.0. #1223
The other one is there is a local process pick up the block device created by Longhorn automatically. Normally what we saw is multipath process. To find out the culprit:
ls -l /dev/longhorn/. The major:minor number will be shown as e.g. 8, 32 before the device name.ls -l /dev and find the device for the same major:minor number, e.g. /dev/sde.lsof to get the list of file handler in use, then grep for the device name (e.g. sde or /dev/longhorn/xxx. You should find one there.Then you can figure out what to do next.
For the multipath process, you can use e.g. multipath -v2 -d to see if the Longhorn device was picked up automatically.
If it's not the multipath process, lsof should have the related info.
@mogwai Can you check your longhorn-manager log? The block device wasn't created. Maybe iscsid isn't running/installed on the node.
ls -l /dev/longhorn
brw-rw---- 1 root root 8, 32 Aug 10 21:50 pvc-39c0db31-628d-41d0-b05a-4568ec02e487
brw-rw---- 1 root disk 8, 32 Aug 10 21:50 sdc
lsof | grep sdc
multipath 514292 root 8r BLK 8,32 0t0 534 /dev/sdc
multipath 514292 514297 multipath root 8r BLK 8,32 0t0 534 /dev/sdc
multipath 514292 514299 multipath root 8r BLK 8,32 0t0 534 /dev/sdc
multipath 514292 514300 multipath root 8r BLK 8,32 0t0 534 /dev/sdc
multipath 514292 514301 multipath root 8r BLK 8,32 0t0 534 /dev/sdc
multipath 514292 514302 multipath root 8r BLK 8,32 0t0 534 /dev/sdc
multipath 514292 514304 multipath root 8r BLK 8,32 0t0 534 /dev/sdc
ime="2020-08-10T21:58:07Z" level=debug msg="Created volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3: {Size:2147483648 Frontend:blockdev FromBackup: NumberOfReplicas:3 StaleReplicaTimeout:30 NodeID: EngineImage:longhornio/longhorn-engine:v1.0.1 RecurringJobs:[] BaseImage: Standby:false DiskSelector:[] NodeSelector:[] DisableFrontend:false}"
10.1.72.37 - - [10/Aug/2020:21:58:07 +0000] "POST /v1/volumes HTTP/1.1" 200 972 "" "Go-http-client/1.1"
time="2020-08-10T21:58:07Z" level=debug msg="Volume controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3"
time="2020-08-10T21:58:07Z" level=debug msg="Engine controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 state, old state , new state stopped"
time="2020-08-10T21:58:07Z" level=debug msg="Replica controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c state, old state , new state stopped"
time="2020-08-10T21:58:07Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-6e95ced3"
time="2020-08-10T21:58:07Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-741589fd"
time="2020-08-10T21:58:07Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-dd83e4cc"
time="2020-08-10T21:58:07Z" level=debug msg="Replica controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 state, old state , new state stopped"
time="2020-08-10T21:58:07Z" level=debug msg="Replica controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a state, old state , new state stopped"
time="2020-08-10T21:58:08Z" level=debug msg="Requeue longhorn-system/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a\": the object has been modified; please apply your changes to the latest version and try again"
time="2020-08-10T21:58:08Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-4be5c245"
time="2020-08-10T21:58:08Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-4882c3f4"
time="2020-08-10T21:58:08Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3\", UID:\"d8b269b0-6449-49e8-9dab-1965f672dd94\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199323\", FieldPath:\"\"}): type: 'Normal' reason: 'Detached' volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 has been detached"
10.1.72.37 - - [10/Aug/2020:21:58:09 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 2280 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:11 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3738 "" "Go-http-client/1.1"
time="2020-08-10T21:58:11Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3\", UID:\"d8b269b0-6449-49e8-9dab-1965f672dd94\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199361\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Persistent Volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 started to use/reuse Longhorn volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3"
10.1.72.37 - - [10/Aug/2020:21:58:15 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3918 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:15 +0000] "GET /v1/nodes/kmaster HTTP/1.1" 200 1729 "" "Go-http-client/1.1"
time="2020-08-10T21:58:15Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c"
time="2020-08-10T21:58:15Z" level=debug msg="Attaching volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 to kmaster with disableFrontend set false"
10.1.72.37 - - [10/Aug/2020:21:58:15 +0000] "POST /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3?action=attach HTTP/1.1" 200 2500 "" "Go-http-client/1.1"
time="2020-08-10T21:58:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c\", UID:\"7fba87ea-1a3b-47e0-b713-bf8a065c303d\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199405\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c"
time="2020-08-10T21:58:15Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58"
time="2020-08-10T21:58:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58\", UID:\"a146e9c6-4ff0-436c-b55b-f93645c0e1b2\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199407\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58"
time="2020-08-10T21:58:16Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a"
time="2020-08-10T21:58:16Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a\", UID:\"adc69756-d3d7-43d2-be7f-d438b65de987\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199412\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a"
time="2020-08-10T21:58:16Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 had been created, need to wait for instance manager update"
time="2020-08-10T21:58:16Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 state, old state stopped, new state starting"
time="2020-08-10T21:58:16Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c had been created, need to wait for instance manager update"
time="2020-08-10T21:58:16Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c state, old state stopped, new state starting"
time="2020-08-10T21:58:16Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a had been created, need to wait for instance manager update"
time="2020-08-10T21:58:16Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a state, old state stopped, new state starting"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c starts running, IP 10.1.72.54"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a starts running, IP 10.1.72.54"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a starts running, Port 10030"
time="2020-08-10T21:58:17Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a state, old state starting, new state running"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c starts running, Port 10000"
time="2020-08-10T21:58:17Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c state, old state starting, new state running"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 starts running, IP 10.1.72.54"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 starts running, Port 10015"
time="2020-08-10T21:58:17Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 state, old state starting, new state running"
time="2020-08-10T21:58:17Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:17Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983\", UID:\"9dd2009a-3d5d-47bc-91c8-fdcb3e98cf47\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199431\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:17Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 had been created, need to wait for instance manager update"
10.1.72.37 - - [10/Aug/2020:21:58:17 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3333 "" "Go-http-client/1.1"
time="2020-08-10T21:58:18Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 had been created, need to wait for instance manager update"
time="2020-08-10T21:58:18Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 state, old state stopped, new state starting"
10.1.72.37 - - [10/Aug/2020:21:58:19 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3360 "" "Go-http-client/1.1"
time="2020-08-10T21:58:21Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 starts running, IP 10.1.72.53"
time="2020-08-10T21:58:21Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 starts running, Port 10000"
time="2020-08-10T21:58:21Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 state, old state starting, new state running"
time="2020-08-10T21:58:21Z" level=debug msg="Start monitoring pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:21Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3\", UID:\"d8b269b0-6449-49e8-9dab-1965f672dd94\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199415\", FieldPath:\"\"}): type: 'Normal' reason: 'Attached' volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 has been attached to kmaster"
10.1.72.37 - - [10/Aug/2020:21:58:21 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:23 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:24 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:25 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:27 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 6163 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:27 +0000] "GET /v1/settings/mkfs-ext4-parameters HTTP/1.1" 200 590 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:32 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 6163 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:32 +0000] "GET /v1/settings/mkfs-ext4-parameters HTTP/1.1" 200 590 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:40 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 6163 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:40 +0000] "GET /v1/settings/mkfs-ext4-parameters HTTP/1.1" 200 590 "" "Go-http-client/1.1"
Yeah, your multipath process hijacked the device. :)
How do I destroy it :)?
I tried killing the process but it just restarts.
I also tried removing the multipath for that device using the following:
multipath -l
mpatha (360000000000000000e00000000010001) dm-0 IET,VIRTUAL-DISK
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=active
`- 4:0:0:1 sdb 8:16 active undef running
multipath -f mpatha
This worked and the container is now running :muscle: but multipath picks it up automatically when a new pvc is created.
To prevent multipath daemon from adding additional block devices created by Longhorn
First check devices created by Longhorn using lsblk
root@localhost:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 79.5G 0 disk /
sdb 8:16 0 512M 0 disk [SWAP]
sdc 8:32 0 1G 0 disk /var/lib/kubelet/pods/c2c2b848-1f40-4727-8a52-03a74f9c76b9/volumes/kubernetes.io~csi/pvc-859bc3c9-faa8-4f54-85e4-b12935b5ae3c/mount
sdd 8:48 0 1G 0 disk /var/lib/kubelet/pods/063a181a-66ac-4644-8268-9215305f9b73/volumes/kubernetes.io~csi/pvc-837eb6ac-45fe-4de7-9c97-8d371eb02190/mount
sde 8:64 0 1G 0 disk /var/lib/kubelet/pods/4c80842d-7257-4b91-b668-bb5b111da003/volumes/kubernetes.io~csi/pvc-c01cee3e-f292-4979-b183-6546d6397fbd/mount
sdf 8:80 0 1G 0 disk /var/lib/kubelet/pods/052dadd9-042a-451c-9bb1-2d9418f0381f/volumes/kubernetes.io~csi/pvc-ba7a5c9a-d84d-4cb0-959d-3db39f34d81b/mount
sdg 8:96 0 1G 0 disk /var/lib/kubelet/pods/7399b073-c262-4963-8c7f-9e481272ea36/volumes/kubernetes.io~csi/pvc-2b122b42-141a-4181-b8fd-ce3cf91f6a64/mount
sdh 8:112 0 1G 0 disk /var/lib/kubelet/pods/a63d919d-201b-4eb1-9d84-6440926211a9/volumes/kubernetes.io~csi/pvc-b7731785-8364-42a8-9e7d-7516801ab7e0/mount
sdi 8:128 0 1G 0 disk /var/lib/kubelet/pods/3e056ee4-bab4-4230-9054-ab214bdf711f/volumes/kubernetes.io~csi/pvc-89d37a02-8480-4317-b0f1-f17b2a886d1d/mount
root@localhost:~#
Notice that Longhorn device names start with /dev/sd[x]
/etc/multipath.conf if not existeddevnode "^sd[a-z0-9]+"blacklist {
devnode "^sd[a-z0-9]+"
}
Restart multipath service
# systemctl restart multipathd.service
Verify that configuration is applied
# multipath -t
The default configurations for multipath blacklist section is preventing the following device names by default
^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]
^(td|hd|vd)[a-z]
Ref: http://manpages.ubuntu.com/manpages/bionic/man5/multipath.conf.5.html#blacklist%20section
Just posting so that it may help someone who stumbles across this as well.
I also experienced a similar issue with Longhorn and iscsid. After rebuilding my k8s cluster from scratch and installing Longhorn, I was attempting to restore a PVC from a Minio/S3 backup. My backup would restore, but mounting the PV into the pod resulted in (kubectl describe pod my-pod-12345-123456):
mount: /var/lib/kubelet/pods/22696c7f-277b-4fef-b587-f8099810b8fe/volumes/kubernetes.io~csi/pvc-6b821895-0b91-465b-8cac-4297c8d80019/mount: /dev/longhorn/pvc-6b821895-0b91-465b-8cac-4297c8d80019 already mounted or mount point busy
So after reading this thread I realized that my LVM was mounted to /var/lib/longhorn using the /dev/disk/by-uuid/ method, like so (/etc/fstab):
/dev/disk/by-uuid/c3aec7da-1442-40cf-a3fd-884adr3d972a /var/lib/longhorn ext4 defaults 0 0
Since this was an LVM I instead decided to just mount the disk using the logical volume name, converting the previous line in /etc/fstab to one like this:
/dev/longhorn/disk /var/lib/longhorn ext4 defaults 0 0
As far as I can tell, this has resolved my issue of not being able to mount restored PV/PVCs into pods.
@crutonjohn Thanks for the information. Though I am a bit confused about why it's happened in your case. Btw, I don't think /dev/longhorn/disk is a Longhorn volume right? It should be the LVM device provided to Longhorn IIUC.
@crutonjohn Thanks for the information. Though I am a bit confused about why it's happened in your case. Btw, I don't think
/dev/longhorn/diskis a Longhorn volume right? It should be the LVM device provided to Longhorn IIUC.
correct, /dev/longhorn/disk is an LVM.
i'm not really sure either, idk if it's a different bug or what but the symptoms were sorta the same. i came up with the solution with the amalgamation of info here and just thought i'd share.
@crutonjohn Thanks for the information. Though I am a bit confused about why it's happened in your case. Btw, I don't think
/dev/longhorn/diskis a Longhorn volume right? It should be the LVM device provided to Longhorn IIUC.correct,
/dev/longhorn/diskis an LVM.i'm not really sure either, idk if it's a different bug or what but the symptoms were sorta the same. i came up with the solution with the amalgamation of info here and just thought i'd share.
okay, i had to rebuild my cluster again tonight for reasons and it looks like this is happening again.
i'm gonna have to try the multipath fix tomorrow and see if that fixes it. sorry for the thread spam.
@crutonjohn Thanks for the information. Though I am a bit confused about why it's happened in your case. Btw, I don't think
/dev/longhorn/diskis a Longhorn volume right? It should be the LVM device provided to Longhorn IIUC.correct,
/dev/longhorn/diskis an LVM.
i'm not really sure either, idk if it's a different bug or what but the symptoms were sorta the same. i came up with the solution with the amalgamation of info here and just thought i'd share.okay, i had to rebuild my cluster again tonight for reasons and it looks like this is happening again.
i'm gonna have to try the multipath fix tomorrow and see if that fixes it. sorry for the thread spam.
@meldafrawi 's suggestion worked: https://github.com/longhorn/longhorn/issues/1210#issuecomment-671689746
Most helpful comment
To prevent multipath daemon from adding additional block devices created by Longhorn
First check devices created by Longhorn using
lsblkNotice that Longhorn device names start with
/dev/sd[x]/etc/multipath.confif not existeddevnode "^sd[a-z0-9]+"Restart multipath service
# systemctl restart multipathd.serviceVerify that configuration is applied
# multipath -tRef: http://manpages.ubuntu.com/manpages/bionic/man5/multipath.conf.5.html#blacklist%20section