Longhorn: Automatically attach volume if it was detached unexpectedly

Created on 5 Nov 2019 · 8 Comments · Source: longhorn/longhorn

As long as there are still healthy replicas available, we can attach the volume.

It should cover node reboot #375, docker restart #762, Kubernetes upgrade #703, and recovery from a volume remounted as read-only #381.

https://github.com/longhorn/longhorn-manager/pull/453 should help.

aremanager enhancement

All 8 comments

Yes!!!
That's what I have been waiting for.

Validation: Failed
Longhorn version: 0.7.0-rc1

Steps to test:

  • Create the Longhorn StorageClass using
    kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn-manager/master/examples/storageclass.yaml
  • Create a pod using a Longhorn volume as persistent storage
    kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn-manager/master/examples/pvc.yaml
  • Inside the pod, create a file in the /data directory
  • Restart the docker service on the node the pod is scheduled to using systemctl restart docker

Expected result: the pod should be restarted, and the Longhorn volume should be detached and reattached. (PASSED)

  • Check the content of the file
    FAILED: listing the content of /data shows no files.
    I had to manually delete and re-create the pods to be able to access the created file.

@shuo-wu can you check? Seems remount doesn't work.

Need to restart the pod container to bring the volume back.

We need documentation for this.

In short, the pod should be configured with a liveness probe to allow it to restart if it cannot access the volume.
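
A minimal sketch of such a pod, not a tested manifest. The pod and container name, image, and probe command are taken from the event log further down in this thread; the PVC name and the probe timings are assumptions and need to be adapted. The probe also assumes an ext4 volume, since /data/lost+found only exists there.

```yaml
# Sketch only. Names, image and probe command come from the event log in this
# thread; claimName and the probe timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      livenessProbe:
        exec:
          # Fails (for example with an I/O error) when the mount has gone stale,
          # so the kubelet restarts the container.
          command: ["ls", "/data/lost+found"]
        initialDelaySeconds: 5   # assumed value
        periodSeconds: 5         # assumed value
      volumeMounts:
        - name: vol
          mountPath: /data
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: example-longhorn-pvc   # hypothetical PVC name
```

With a probe like this, a container whose volume is no longer readable fails the check and is restarted by the kubelet instead of running on against a broken mount.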

Validation: FAILED

Case: Kubernetes upgrade

Steps to reproduce:

  • Using:

    • rancher v2.3.2

    • k8s v1.14.8

    • Longhorn v0.7.0-rc1

  • Create a pod using the example in https://github.com/shuo-wu/longhorn/blob/14b524130eccb0c32eb3d1fecfaed51d7612b1d0/docs/restore-volume.md

  • Create a new file and write some data to it.

  • Using rancher, upgrade the k8s version from v1.14.8 to v1.15.5 and wait for the upgrade to complete.

  • Expected result: the volume gets detached and re-attached automatically, and the volume becomes accessible after the pod is restarted.

Error: the pod gets stuck in CrashLoopBackOff

  Type     Reason                  Age                     From                     Message
  ----     ------                  ----                    ----                     -------
  Warning  FailedScheduling        35m (x2 over 35m)       default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled               35m                     default-scheduler        Successfully assigned default/volume-test to lh-worker2
  Normal   SuccessfulAttachVolume  35m                     attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630"
  Normal   Pulling                 34m                     kubelet, lh-worker2      Pulling image "nginx:stable-alpine"
  Normal   Pulled                  34m                     kubelet, lh-worker2      Successfully pulled image "nginx:stable-alpine"
  Normal   Created                 34m                     kubelet, lh-worker2      Created container volume-test
  Normal   Started                 34m                     kubelet, lh-worker2      Started container volume-test
  Warning  FailedMount             25m (x5 over 25m)       kubelet, lh-worker2      MountVolume.MountDevice failed for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630" : driver name driver.longhorn.io not found in the list of registered CSI drivers
  Normal   SandboxChanged          25m                     kubelet, lh-worker2      Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  25m                     kubelet, lh-worker2      Container image "nginx:stable-alpine" already present on machine
  Normal   Created                 25m                     kubelet, lh-worker2      Created container volume-test
  Normal   Started                 25m                     kubelet, lh-worker2      Started container volume-test
  Warning  Unhealthy               20m                     kubelet, lh-worker2      Liveness probe errored: rpc error: code = Unknown desc = cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: connect: no such file or directory
  Warning  FailedMount             16m                     kubelet, lh-worker2      MountVolume.MountDevice failed for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630" : driver name driver.longhorn.io not found in the list of registered CSI drivers
  Warning  FailedMount             16m (x6 over 16m)       kubelet, lh-worker2      MountVolume.SetUp failed for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume pvc-3fa008c1-0586-11ea-8955-f23c92f1b630
  Normal   SandboxChanged          15m (x2 over 15m)       kubelet, lh-worker2      Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  15m                     kubelet, lh-worker2      Container image "nginx:stable-alpine" already present on machine
  Normal   Created                 15m                     kubelet, lh-worker2      Created container volume-test
  Normal   Started                 15m                     kubelet, lh-worker2      Started container volume-test
  Normal   Pulled                  8m57s (x3 over 9m58s)   kubelet, lh-worker2      Container image "nginx:stable-alpine" already present on machine
  Normal   Created                 8m57s (x3 over 9m58s)   kubelet, lh-worker2      Created container volume-test
  Normal   Started                 8m57s (x3 over 9m58s)   kubelet, lh-worker2      Started container volume-test
  Normal   Killing                 8m38s (x4 over 9m58s)   kubelet, lh-worker2      Container volume-test failed liveness probe, will be restarted
  Warning  Unhealthy               8m38s (x9 over 9m53s)   kubelet, lh-worker2      Liveness probe failed: ls: /data/lost+found: I/O error
  Warning  BackOff                 4m54s (x21 over 9m23s)  kubelet, lh-worker2      Back-off restarting failed container

Validation: PARTIALLY PASSED

Case: node reboot

Steps to reproduce:

  • Using:

    • rancher v2.3.2

    • k8s v1.14.8

    • Longhorn v0.7.0-rc1

  • Create a pod using the example in https://github.com/shuo-wu/longhorn/blob/14b524130eccb0c32eb3d1fecfaed51d7612b1d0/docs/restore-volume.md

  • Create a new file and write some data to it.

  • Reboot the node the pod is scheduled to.

  • Wait for the node to be ready and for the pod to be restarted.

  • Expected result: the volume gets detached and re-attached automatically, and the volume becomes accessible after the pod is restarted.

Case: docker service restart

Steps to reproduce:

  • Using:

    • rancher v2.3.2

    • k8s v1.14.8

    • Longhorn v0.7.0-rc1

  • Create a pod using the example in https://github.com/shuo-wu/longhorn/blob/14b524130eccb0c32eb3d1fecfaed51d7612b1d0/docs/restore-volume.md

  • Create a new file and write some data to it.

  • Restart the docker service on the node the pod is scheduled to using systemctl restart docker.service

  • Wait for the node to be ready and for the pod to be restarted.

  • Expected result: the volume gets detached and re-attached automatically, and the volume becomes accessible after the pod is restarted.

Validation: PASSED

Case: Kubernetes upgrade
