Longhorn: [Question] MountVolume.SetUp failed - There is no block device frontend

Created on 23 Aug 2020 · 16 Comments · Source: longhorn/longhorn

I am running Longhorn 1.0.2 in a self-managed Kubernetes cluster.
Everything works fine.
But I observe the following warning when I launch a new pod (in this case a Postgres database):

MountVolume.SetUp failed for volume "xxx" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume xxx

These are the events from the pod description after a successful deployment:

Events:
  Type     Reason                  Age                  From                      Message
  ----     ------                  ----                 ----                      -------
  Warning  FailedScheduling        <unknown>            default-scheduler         persistentvolumeclaim "dbdata" not found
  Warning  FailedScheduling        <unknown>            default-scheduler         persistentvolumeclaim "dbdata" not found
  Warning  FailedScheduling        <unknown>            default-scheduler         running "VolumeBinding" filter plugin for pod "postgres-6f8fc59cd8-p5p4g": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled               <unknown>            default-scheduler         Successfully assigned documents-imixs-com/postgres-6f8fc59cd8-p5p4g to ixchel-worker-2
  Normal   SuccessfulAttachVolume  4m11s                attachdetach-controller   AttachVolume.Attach succeeded for volume "documents-imixs-com-dbdata"
  Warning  FailedMount             4m6s (x2 over 4m7s)  kubelet, ixchel-worker-2  MountVolume.SetUp failed for volume "documents-imixs-com-dbdata" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume documents-imixs-com-dbdata
  Normal   Pulled                  4m4s                 kubelet, ixchel-worker-2  Container image "postgres:9.6.1" already present on machine
  Normal   Created                 4m4s                 kubelet, ixchel-worker-2  Created container postgres
  Normal   Started                 4m4s                 kubelet, ixchel-worker-2  Started container postgres

My PVC/PV configuration looks like this:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: documents-imixs-com-dbdata
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  claimRef:
    namespace: documents-imixs-com
    name: dbdata
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: xxx
  storageClassName: longhorn-durable

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbdata
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-durable
  resources:
    requests:
      storage: 2Gi
  volumeName: "xxx"

Is this warning something I should be concerned about, or can I safely ignore it?

Thanks for the help

===
Ralph

area/driver bug priority/2

rsoika

All 16 comments

No, this error message is misleading. It's actually a transient/intermediate error during the attachment, and Kubernetes/Longhorn will retry the operation when the error happens.

shuo-wu on 24 Aug 2020
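
The retry behaviour described above can be pictured with a minimal sketch. It is only an illustration: `TransientMountError`, `mount_with_retry`, and `make_flaky_mount` are hypothetical names, and the fixed delay is a simplification rather than the kubelet's actual retry policy.

```python
import time


class TransientMountError(Exception):
    """Stand-in for the error returned while the volume is still attaching."""


def mount_with_retry(mount, attempts=5, delay=0.01):
    """Call mount() until it succeeds; return the number of tries used."""
    for attempt in range(1, attempts + 1):
        try:
            mount()
            return attempt
        except TransientMountError:
            if attempt == attempts:
                raise  # out of retries: surface the error to the caller
            time.sleep(delay)


def make_flaky_mount(failures):
    """Build a hypothetical mount call that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def mount():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientMountError(
                "There is no block device frontend for volume xxx"
            )

    return mount


# Two failed attempts followed by a success, like the "x2" FailedMount event.
tries = mount_with_retry(make_flaky_mount(2))
```

With two simulated failures the call succeeds on the third try, which mirrors a pod that logs FailedMount twice and then starts normally.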

ok, thanks for clarifying.

rsoika on 24 Aug 2020

The error message issue has been reported multiple times. Let's use this issue to track the fix for the error message.

yasker on 25 Aug 2020

👍5

We can change the error message for the case where the volume is not attached to "volume %v hasn't been attached yet", and keep "there is no block device frontend xxx" as the error message for other conditions.

yasker on 2 Oct 2020
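
As a rough sketch of the proposal above, the message could be chosen from the volume state as follows. The function name `endpoint_error` and its parameters are assumptions made for illustration; this is not the Longhorn manager code.

```python
def endpoint_error(volume_name, attached, endpoint):
    """Return an error message for an unusable volume, or None if it is ready.

    `attached` and `endpoint` are hypothetical inputs standing in for the
    volume state that the CSI plugin would inspect.
    """
    if not attached:
        # The common transient case: the attachment has not finished yet.
        return f"volume {volume_name} hasn't been attached yet"
    if not endpoint:
        # Attached, but no block device is exposed: keep the original wording.
        return f"There is no block device frontend for volume {volume_name}"
    return None
```

Separating the two cases lets the event log distinguish "still attaching" from a volume that is attached but exposes no block device.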

Also, we need to check why FailedMount was reported, since the previous step already reported that the attachment succeeded.

yasker on 4 Oct 2020

At first I failed to reproduce this issue by creating a pod. Thanks @khushboo-rancher for pointing out that it's easier to reproduce with a deployment.
The reason we see FailedMount after the attachment succeeds is that ControllerPublishVolume doesn't check the endpoint, but NodePublishVolume does.
Also, if the volume is not attached, Longhorn sends the attachment request, but the attachment may take a while to finish, so when NodePublishVolume checks the endpoint, the volume sometimes hasn't been attached yet.

boknowswiki on 27 Oct 2020
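
The race described above can be modelled with a small simulation. Everything here is hypothetical (`FakeVolume`, `controller_publish`, `node_publish`, the device path), and waiting for the endpoint in the controller step is shown only as one way to close the gap, not necessarily what the linked PRs do.

```python
class FakeVolume:
    """Hypothetical volume whose endpoint appears only after a few polls."""

    def __init__(self, polls_until_ready):
        self._polls_left = polls_until_ready
        self.attach_requested = False

    def request_attach(self):
        self.attach_requested = True

    def endpoint(self):
        # The endpoint stays empty until the attachment has finished.
        if not self.attach_requested:
            return ""
        if self._polls_left > 0:
            self._polls_left -= 1
            return ""
        return "/dev/longhorn/example"  # illustrative path only


def controller_publish(volume, wait_for_endpoint, max_polls=10):
    """Request attachment; optionally block until the endpoint is visible."""
    volume.request_attach()
    if not wait_for_endpoint:
        return  # reports success even though the attachment may be unfinished
    for _ in range(max_polls):
        if volume.endpoint():
            return
    raise TimeoutError("attachment did not finish in time")


def node_publish(volume):
    """Mount step: fails if the endpoint is not there yet."""
    if not volume.endpoint():
        raise RuntimeError("There is no block device frontend for volume")
    return "mounted"
```

Without the wait, `node_publish` can run before the endpoint exists and fail once or twice; with the wait, the controller step returns only when the mount step is able to succeed.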

Pre-merged Checklist

[x] Does the PR include the explanation for the fix or the feature?
[x] Is the backend code merged (Manager, Engine, Instance Manager, BackupStore etc)?
The PR is at https://github.com/longhorn/longhorn-manager/pull/737
fix csi attachment issue PR: https://github.com/longhorn/longhorn-manager/pull/742
[x] ~~Is the reproduce steps/test steps documented?~~
[x] Which areas/issues this PR might have potential impacts on?
Area
Issues
[x] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
The compatibility issue is filed at
[x] If labeled: area/ui Has the UI issue filed or ready to be merged?
The UI issue/PR is at
[x] if labeled: require/doc Has the necessary document PR submitted or merged?
The Doc issue/PR is at
[x] If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case?
The automation skeleton PR is at
The automation test case PR is at
[x] if labeled: require/automation-engine Has the engine integration test been merged?
The engine automation PR is at
[x] if labeled: require/manual-test-plan Has the manual test plan been documented?
The updated manual test plan is at

longhorn-io-github-bot on 27 Oct 2020

Here is a test deployment YAML:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: existing
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  claimRef:
    namespace: default
    name: dbdata
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: existing
  storageClassName: longhorn

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbdata
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
  volumeName: "existing"

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: volume-pv-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: volume-pv-test
        image: nginx:stable-alpine
        imagePullPolicy: Always
        livenessProbe:
          exec:
            command:
              - ls
              - /data/lost+found
          initialDelaySeconds: 5
          periodSeconds: 5
        volumeMounts:
        - name: vol 
          mountPath: /data
        ports:
        - containerPort: 80
      volumes:
      - name: vol 
        persistentVolumeClaim:
          claimName: dbdata

boknowswiki on 27 Oct 2020

Verified with longhorn-master - 10/29/2020

Validation - Pass

No FailedMount event is seen after the successful attachment.

Events:
  Type    Reason                  Age        From                           Message
  ----    ------                  ----       ----                           -------
  Normal  Scheduled               <unknown>  default-scheduler              Successfully assigned default/wk-5-0 to khushboo-test-lh-wk1
  Normal  SuccessfulAttachVolume  2m1s       attachdetach-controller        AttachVolume.Attach succeeded for volume "volume-5"
  Normal  Pulling                 113s       kubelet, khushboo-test-lh-wk1  Pulling image "ubuntu:xenial"
  Normal  Pulled                  113s       kubelet, khushboo-test-lh-wk1  Successfully pulled image "ubuntu:xenial"
  Normal  Created                 113s       kubelet, khushboo-test-lh-wk1  Created container wk-5
  Normal  Started                 113s       kubelet, khushboo-test-lh-wk1  Started container wk-5

khushboo-rancher on 29 Oct 2020

The modification leads to a bug:
The volume cannot be used by a pod after the CSI expansion completes.

It also causes the related integration tests to fail, e.g.,
test_csi_offline_expansion
test_csi_expansion_with_replica_failure

The fix of this bug: https://github.com/longhorn/longhorn-manager/pull/752

shuo-wu on 30 Oct 2020

Verified with longhorn-master - 11/04/2020 after https://github.com/longhorn/longhorn-manager/pull/752 got merged.

Validation - Pass

No FailedMount event is seen after the successful attachment.

khushboo-rancher on 4 Nov 2020

Hey, I'm new to Rancher and Longhorn. Can somebody tell me how to get this fix into my deployment?


Events:
  Type     Reason                  Age              From                     Message
  ----     ------                  ----             ----                     -------
  Warning  FailedScheduling        16s              default-scheduler        persistentvolumeclaim "longhorn-nfs-provisioner" not found
  Warning  FailedScheduling        15s              default-scheduler        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling        15s              default-scheduler        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               10s              default-scheduler        Successfully assigned longhorn-system/longhorn-nfs-provisioner-7d9d6c788f-gvgdl to worker1
  Normal   SuccessfulAttachVolume  6s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-fcd87456-2dce-4daf-9953-1713fb564835"
  Warning  FailedMount             2s (x2 over 2s)  kubelet                  MountVolume.SetUp failed for volume "pvc-fcd87456-2dce-4daf-9953-1713fb564835" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume pvc-fcd87456-2dce-4daf-9953-1713fb564835

venomone on 7 Nov 2020

@venomone The fix will be available in Longhorn v1.1.0, which is planned to be released soon. Currently, the fix is only available with the Longhorn master image.
We recommend waiting for the v1.1.0 release and then upgrading your Longhorn to v1.1.0 to get this fix and other cool features.

khushboo-rancher on 8 Nov 2020

Could you please give me a quick hint on how to upgrade to the latest available version? I already tried this myself by editing the Longhorn deployment YAML to use the images with the master tags from Docker Hub, but my deployment won't come up. I already redeployed my whole k8s cluster with no success.

venomone on 8 Nov 2020

Maybe our install guide from the imixs-cloud project can help you. It is nearly the same as the one in the Longhorn docs, but maybe you can find a hint there to solve your issue.

rsoika on 9 Nov 2020

@venomone This issue is not a blocker and shouldn't affect the functionality of Longhorn. It should succeed eventually. If the mount operation cannot succeed in the end, then there is something else in play there.

yasker on 9 Nov 2020
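
One rough way to tell the harmless case from a real problem is to check whether the container started after the last FailedMount event. The helper below is a hypothetical sketch that works on (type, reason) pairs copied by hand from a pod description; it does not call any Kubernetes API.

```python
def container_started_after_mount_failures(events):
    """events: chronological list of (type, reason) pairs from a pod description.

    Returns True if a "Started" event follows the last "FailedMount" event
    (or if there was no FailedMount at all and the container started).
    """
    reasons = [reason for _type, reason in events]
    last_failure = max(
        (i for i, reason in enumerate(reasons) if reason == "FailedMount"),
        default=-1,
    )
    return "Started" in reasons[last_failure + 1:]


# Events from the original report: the mount failed, then the pod started.
recovered = container_started_after_mount_failures([
    ("Warning", "FailedScheduling"),
    ("Normal", "Scheduled"),
    ("Normal", "SuccessfulAttachVolume"),
    ("Warning", "FailedMount"),
    ("Normal", "Pulled"),
    ("Normal", "Created"),
    ("Normal", "Started"),
])

# An event list that ends with FailedMount: no start has been seen yet.
still_failing = container_started_after_mount_failures([
    ("Normal", "Scheduled"),
    ("Normal", "SuccessfulAttachVolume"),
    ("Warning", "FailedMount"),
])
```

If the result stays False after the pod has had time to retry, the mount is not recovering on its own and something other than this transient error is likely involved.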
