I am running Longhorn 1.0.2 in a self-managed Kubernetes cluster.
Everything works fine.
But I observe the following warning when I launch a new pod (in this case a PostgreSQL database):
MountVolume.SetUp failed for volume "xxx" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume xxx
These are the events from the pod description after successful deployment:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler persistentvolumeclaim "dbdata" not found
Warning FailedScheduling <unknown> default-scheduler persistentvolumeclaim "dbdata" not found
Warning FailedScheduling <unknown> default-scheduler running "VolumeBinding" filter plugin for pod "postgres-6f8fc59cd8-p5p4g": pod has unbound immediate PersistentVolumeClaims
Normal Scheduled <unknown> default-scheduler Successfully assigned documents-imixs-com/postgres-6f8fc59cd8-p5p4g to ixchel-worker-2
Normal SuccessfulAttachVolume 4m11s attachdetach-controller AttachVolume.Attach succeeded for volume "documents-imixs-com-dbdata"
Warning FailedMount 4m6s (x2 over 4m7s) kubelet, ixchel-worker-2 MountVolume.SetUp failed for volume "documents-imixs-com-dbdata" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume documents-imixs-com-dbdata
Normal Pulled 4m4s kubelet, ixchel-worker-2 Container image "postgres:9.6.1" already present on machine
Normal Created 4m4s kubelet, ixchel-worker-2 Created container postgres
Normal Started 4m4s kubelet, ixchel-worker-2 Started container postgres
My PVC/PV configuration looks like this:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: documents-imixs-com-dbdata
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  claimRef:
    namespace: documents-imixs-com
    name: dbdata
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: xxx
  storageClassName: longhorn-durable
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbdata
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-durable
  resources:
    requests:
      storage: 2Gi
  volumeName: "xxx"
Is this warning something I should be concerned about, or can I safely ignore it?
Thanks for your help
===
Ralph
No, this error message is misleading. It's actually a transient, intermediate error during attachment, and Kubernetes/Longhorn will retry when the error happens.
ok, thanks for clarifying.
The error message issue has been reported multiple times. Let's use this issue to track the fix for the error message.
We can change the error message for the case where the volume is not attached to "the volume %v hasn't been attached yet", and keep "there is no block device frontend xxx" for the other conditions.
Also, we need to check why FailedMount was reported, since the previous step already said the attachment was successful.
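The proposed message split could look roughly like the sketch below. This is a Python illustration only, not Longhorn's Go code: the function name `mount_error_message` and the `state` and `frontend` values are hypothetical stand-ins for whatever the CSI driver actually inspects.

```python
from typing import Optional


def mount_error_message(volume: str, state: str, frontend: str) -> Optional[str]:
    """Return the error a mount attempt would report, or None if mountable.

    Hypothetical model: `state` is "attached" once attachment has finished,
    and `frontend` is "blockdev" when a block device endpoint is exposed.
    """
    if state != "attached":
        # Transient case: the attach request was issued but has not finished.
        return f"The volume {volume} hasn't been attached yet"
    if frontend != "blockdev":
        # Any other condition keeps the original wording.
        return f"There is no block device frontend for volume {volume}"
    return None
```

With this split, a reader of the pod events can tell a volume that is still attaching apart from one that really has no block device frontend.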
At the beginning I failed to reproduce this issue by creating a pod. Thanks @khushboo-rancher for pointing out that it's easier to reproduce with a deployment.
The reason we see FailedMount after the attachment succeeds is that ControllerPublishVolume doesn't check the endpoint, but NodePublishVolume does.
Also, if the volume is not attached, Longhorn issues the attachment request, but the attachment may take a while to finish, so when NodePublishVolume checks the endpoint, the volume sometimes hasn't been attached yet.
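The race described above can be modelled with a small simulation. This is a sketch under stated assumptions, not the driver's implementation: `FakeVolume`, `controller_publish`, `node_publish`, and `mount_with_retries` are hypothetical names, the controller-side step corresponds to the CSI ControllerPublishVolume call, and the retry loop stands in for the kubelet retrying the mount.

```python
from typing import List, Tuple


class FakeVolume:
    """Hypothetical volume whose endpoint appears only after some polls."""

    def __init__(self, name: str, polls_until_ready: int) -> None:
        self.name = name
        self.attach_requested = False
        self._polls_left = polls_until_ready

    def endpoint(self) -> str:
        # Each poll brings the asynchronous attachment one step closer.
        if not self.attach_requested:
            return ""
        if self._polls_left > 0:
            self._polls_left -= 1
            return ""
        return f"/dev/longhorn/{self.name}"


def controller_publish(vol: FakeVolume) -> None:
    # Issues the attach request and returns without checking the endpoint.
    vol.attach_requested = True


def node_publish(vol: FakeVolume) -> str:
    # Checks the endpoint and fails if it is not there yet.
    ep = vol.endpoint()
    if not ep:
        raise ValueError(f"There is no block device frontend for volume {vol.name}")
    return ep


def mount_with_retries(vol: FakeVolume, max_attempts: int) -> Tuple[str, List[str]]:
    """Return the endpoint (empty if never ready) and the failures seen."""
    controller_publish(vol)  # reported as a successful attach
    failures: List[str] = []
    for _ in range(max_attempts):
        try:
            return node_publish(vol), failures
        except ValueError as err:
            failures.append(str(err))  # surfaces as a FailedMount warning
    return "", failures
```

For example, `mount_with_retries(FakeVolume("existing", 2), 5)` records two failures before the mount succeeds, which mirrors the "x2" FailedMount count in the events followed by a started container.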
[x] Does the PR include the explanation for the fix or the feature?
[x] Is the backend code merged (Manager, Engine, Instance Manager, BackupStore etc)?
The PR is at https://github.com/longhorn/longhorn-manager/pull/737
fix csi attachment issue PR: https://github.com/longhorn/longhorn-manager/pull/742
[x] Is the reproduce steps/test steps documented?
[x] Which areas/issues this PR might have potential impacts on?
Area
Issues
[x] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
The compatibility issue is filed at
[x] If labeled: area/ui Has the UI issue filed or ready to be merged?
The UI issue/PR is at
[x] if labeled: require/doc Has the necessary document PR submitted or merged?
The Doc issue/PR is at
[x] If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case?
The automation skeleton PR is at
The automation test case PR is at
[x] if labeled: require/automation-engine Has the engine integration test been merged?
The engine automation PR is at
[x] if labeled: require/manual-test-plan Has the manual test plan been documented?
The updated manual test plan is at
Here is a test deployment YAML:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: existing
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  claimRef:
    namespace: default
    name: dbdata
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: existing
  storageClassName: longhorn
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbdata
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
  volumeName: "existing"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: volume-pv-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: volume-pv-test
        image: nginx:stable-alpine
        imagePullPolicy: Always
        livenessProbe:
          exec:
            command:
              - ls
              - /data/lost+found
          initialDelaySeconds: 5
          periodSeconds: 5
        volumeMounts:
        - name: vol
          mountPath: /data
        ports:
        - containerPort: 80
      volumes:
      - name: vol
        persistentVolumeClaim:
          claimName: dbdata
Verified with longhorn-master - 10/29/2020
Validation - Pass
No FailedMount warning is seen after the successful attachment.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/wk-5-0 to khushboo-test-lh-wk1
Normal SuccessfulAttachVolume 2m1s attachdetach-controller AttachVolume.Attach succeeded for volume "volume-5"
Normal Pulling 113s kubelet, khushboo-test-lh-wk1 Pulling image "ubuntu:xenial"
Normal Pulled 113s kubelet, khushboo-test-lh-wk1 Successfully pulled image "ubuntu:xenial"
Normal Created 113s kubelet, khushboo-test-lh-wk1 Created container wk-5
Normal Started 113s kubelet, khushboo-test-lh-wk1 Started container wk-5
The modification leads to a bug:
The volume cannot be used by a pod after the CSI expansion completes.
It also causes the related integration tests to fail, e.g.,
test_csi_offline_expansion
test_csi_expansion_with_replica_failure
The fix of this bug: https://github.com/longhorn/longhorn-manager/pull/752
Verified with longhorn-master - 11/04/2020 after https://github.com/longhorn/longhorn-manager/pull/752 got merged.
Validation - Pass
No FailedMount warning is seen after the successful attachment.
Hey, I'm new to Rancher and Longhorn. Can somebody tell me how to get this fix into my deployment?
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16s default-scheduler persistentvolumeclaim "longhorn-nfs-provisioner" not found
Warning FailedScheduling 15s default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 15s default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 10s default-scheduler Successfully assigned longhorn-system/longhorn-nfs-provisioner-7d9d6c788f-gvgdl to worker1
Normal SuccessfulAttachVolume 6s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-fcd87456-2dce-4daf-9953-1713fb564835"
Warning FailedMount 2s (x2 over 2s) kubelet MountVolume.SetUp failed for volume "pvc-fcd87456-2dce-4daf-9953-1713fb564835" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume pvc-fcd87456-2dce-4daf-9953-1713fb564835
@venomone The fix will be available in Longhorn v1.1.0, which is planned to be released soon. Currently, the fix is only available with the Longhorn master image.
We recommend upgrading to the released version v1.1.0 once it is available to get this fix and other cool features.
Could you please give me a quick hint on how to upgrade to the latest available version? I already tried this myself by editing the Longhorn deployment YAML to use the images with the master tags from Docker Hub, but my deployment won't come up. I already redeployed my whole k8s cluster with no success.
Maybe our install guide from the imixs-cloud project can help you. It is nearly the same as the Longhorn docs, but maybe you can find a hint there to solve your issue.
@venomone This issue is not a blocker and shouldn't affect the functionality of Longhorn. It should succeed eventually. If the mount operation cannot succeed in the end, then there is something else in play there.
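One way to tell the harmless case from a real problem is to look at what follows the FailedMount warning in the pod's events. The sketch below assumes the events have already been collected into a list of dicts ordered oldest first, each with a `reason` key; that shape and the `mount_outcome` helper are hypothetical and not part of any Kubernetes or Longhorn API.

```python
from typing import Dict, List


def mount_outcome(events: List[Dict[str, str]]) -> str:
    """Classify a pod's events (assumed oldest first) by mount behaviour.

    Returns "clean" if the container started with no FailedMount,
    "transient" if FailedMount was followed by Started,
    "stuck" if FailedMount was seen and the container never started,
    and "pending" if neither has happened yet.
    """
    saw_failed_mount = False
    for event in events:
        if event["reason"] == "FailedMount":
            saw_failed_mount = True
        elif event["reason"] == "Started":
            return "transient" if saw_failed_mount else "clean"
    return "stuck" if saw_failed_mount else "pending"
```

The events in the original report (SuccessfulAttachVolume, FailedMount, then Started) would be classified as "transient". A result of "stuck" is only meaningful after giving the retries some time, since the events just above end at FailedMount a couple of seconds after the attach.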