Longhorn: Automatically attach volume if it was detached unexpectedly

Created on 5 Nov 2019 · 8 Comments · Source: longhorn/longhorn

As long as there are still healthy replicas available, we can automatically re-attach the volume.

It should cover node reboot (#375), Docker restart (#762), Kubernetes upgrade (#703), and recovery from a volume remounted as read-only (#381).

https://github.com/longhorn/longhorn-manager/pull/453 should help.
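The rule above (re-attach only while at least one healthy replica remains) can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the longhorn-manager code from PR #453; the `Volume`/`Replica` fields and `should_auto_reattach` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    healthy: bool  # hypothetical: True if this replica can still serve data


@dataclass
class Volume:
    name: str
    attached: bool          # current attachment state
    desired_attached: bool  # hypothetical: a workload still expects the volume attached
    replicas: List[Replica] = field(default_factory=list)


def healthy_replica_count(volume: Volume) -> int:
    return sum(1 for r in volume.replicas if r.healthy)


def should_auto_reattach(volume: Volume) -> bool:
    """Re-attach only if the volume was detached unexpectedly and a healthy replica remains."""
    detached_unexpectedly = volume.desired_attached and not volume.attached
    return detached_unexpectedly and healthy_replica_count(volume) > 0
```

A volume with no healthy replicas stays detached, since attaching it would only expose a faulted volume to the workload.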

area/manager enhancement

yasker

All 8 comments

Yes!!!
That's what I have been waiting for.

chrisbulgaria on 5 Nov 2019

Validation: Failed
Longhorn version: 0.7.0-rc1

Steps to test:

- Create the Longhorn StorageClass using:
  kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn-manager/master/examples/storageclass.yaml
- Create a pod that uses a Longhorn volume as persistent storage:
  kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn-manager/master/examples/pvc.yaml
- Inside the pod, create a file in the /data directory.
- Restart the Docker service on the node the pod is scheduled to, using systemctl restart docker.

Expected result: the pod should be restarted, and the Longhorn volume should be detached and re-attached. (PASSED)

Check the content of the file.
FAILED: listing the content of /data shows no files.
I had to manually delete and re-create the pod to be able to access the created file.
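The write-then-check part of the manual steps above can be scripted so the result is unambiguous. A minimal sketch, assuming `kubectl` is on the PATH and the pod name and file path are supplied by the tester (`volume-test` and `/data/test-file` in the usage below are hypothetical); the runner is injectable so the logic can be exercised without a cluster. The Docker restart itself still has to happen on the node between the two calls.

```python
import shlex
import subprocess
from typing import Callable, List

# A runner takes kubectl arguments and returns stdout; injectable for offline testing.
Runner = Callable[[List[str]], str]


def kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return its stdout."""
    result = subprocess.run(["kubectl", *args], check=True, capture_output=True, text=True)
    return result.stdout


def write_marker(pod: str, path: str, content: str, run: Runner = kubectl) -> None:
    """Before the disruption: write a known string to a file on the Longhorn volume."""
    run(["exec", pod, "--", "sh", "-c", f"echo {shlex.quote(content)} > {shlex.quote(path)}"])


def marker_survived(pod: str, path: str, expected: str, run: Runner = kubectl) -> bool:
    """After the disruption: check that the file is still readable and unchanged."""
    try:
        return run(["exec", pod, "--", "cat", path]).strip() == expected
    except subprocess.CalledProcessError:
        # kubectl exec exits non-zero if the file is missing or the mount is unreadable
        return False
```

Usage: call `write_marker("volume-test", "/data/test-file", "hello")`, restart Docker on the node, wait for the pod to come back, then `marker_survived("volume-test", "/data/test-file", "hello")` should return `True` if the remount worked.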

meldafrawi on 8 Nov 2019

@shuo-wu, can you check? It seems the remount doesn't work.

yasker on 8 Nov 2019

The pod's container needs to be restarted to bring the volume back.

shuo-wu on 8 Nov 2019

We need documentation for this.

In short, the pod should be configured with a liveness probe to allow it to restart if it cannot access the volume.
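A pod spec with such a liveness probe might look like the manifest built below. This is a sketch under assumptions: the probe command mirrors the `ls /data/lost+found` check that appears in the event log later in this thread, the image and pod name match that log, and the claim name `test-pvc` and the probe timings are hypothetical. The manifest is emitted as JSON, which `kubectl apply -f -` also accepts.

```python
import json


def pod_with_liveness_probe(pod_name: str, claim_name: str, mount_path: str = "/data") -> dict:
    """Build a Pod manifest whose liveness probe fails when the volume mount is unreadable."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": pod_name,
                "image": "nginx:stable-alpine",
                "volumeMounts": [{"name": "vol", "mountPath": mount_path}],
                "livenessProbe": {
                    # ls exits non-zero on an I/O error, so kubelet restarts the container
                    "exec": {"command": ["ls", f"{mount_path}/lost+found"]},
                    "initialDelaySeconds": 5,  # hypothetical timing
                    "periodSeconds": 5,        # hypothetical timing
                },
            }],
            "volumes": [{"name": "vol", "persistentVolumeClaim": {"claimName": claim_name}}],
        },
    }


if __name__ == "__main__":
    # e.g. python make_pod.py | kubectl apply -f -
    print(json.dumps(pod_with_liveness_probe("volume-test", "test-pvc"), indent=2))
```

Probing `lost+found` assumes an ext4 filesystem on the volume, where that directory exists at the mount root.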

yasker on 8 Nov 2019

Validation: FAILED

Case: Kubernetes upgrade

Steps to reproduce:

Using:
- Rancher v2.3.2
- k8s v1.14.8
- Longhorn v0.7.0-rc1
- Create a pod using the example in https://github.com/shuo-wu/longhorn/blob/14b524130eccb0c32eb3d1fecfaed51d7612b1d0/docs/restore-volume.md
- Create a new file and write some data to it.
- Using Rancher, upgrade the k8s version from v1.14.8 to v1.15.5 and wait for the upgrade to complete.
- Expected result: the volume gets detached and re-attached automatically, and the volume becomes accessible after the pod is restarted.

Error: the pod gets stuck in a CrashLoopBackOff loop.

  Type     Reason                  Age                     From                     Message
  ----     ------                  ----                    ----                     -------
  Warning  FailedScheduling        35m (x2 over 35m)       default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled               35m                     default-scheduler        Successfully assigned default/volume-test to lh-worker2
  Normal   SuccessfulAttachVolume  35m                     attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630"
  Normal   Pulling                 34m                     kubelet, lh-worker2      Pulling image "nginx:stable-alpine"
  Normal   Pulled                  34m                     kubelet, lh-worker2      Successfully pulled image "nginx:stable-alpine"
  Normal   Created                 34m                     kubelet, lh-worker2      Created container volume-test
  Normal   Started                 34m                     kubelet, lh-worker2      Started container volume-test
  Warning  FailedMount             25m (x5 over 25m)       kubelet, lh-worker2      MountVolume.MountDevice failed for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630" : driver name driver.longhorn.io not found in the list of registered CSI drivers
  Normal   SandboxChanged          25m                     kubelet, lh-worker2      Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  25m                     kubelet, lh-worker2      Container image "nginx:stable-alpine" already present on machine
  Normal   Created                 25m                     kubelet, lh-worker2      Created container volume-test
  Normal   Started                 25m                     kubelet, lh-worker2      Started container volume-test
  Warning  Unhealthy               20m                     kubelet, lh-worker2      Liveness probe errored: rpc error: code = Unknown desc = cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: connect: no such file or directory
  Warning  FailedMount             16m                     kubelet, lh-worker2      MountVolume.MountDevice failed for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630" : driver name driver.longhorn.io not found in the list of registered CSI drivers
  Warning  FailedMount             16m (x6 over 16m)       kubelet, lh-worker2      MountVolume.SetUp failed for volume "pvc-3fa008c1-0586-11ea-8955-f23c92f1b630" : rpc error: code = InvalidArgument desc = There is no block device frontend for volume pvc-3fa008c1-0586-11ea-8955-f23c92f1b630
  Normal   SandboxChanged          15m (x2 over 15m)       kubelet, lh-worker2      Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  15m                     kubelet, lh-worker2      Container image "nginx:stable-alpine" already present on machine
  Normal   Created                 15m                     kubelet, lh-worker2      Created container volume-test
  Normal   Started                 15m                     kubelet, lh-worker2      Started container volume-test
  Normal   Pulled                  8m57s (x3 over 9m58s)   kubelet, lh-worker2      Container image "nginx:stable-alpine" already present on machine
  Normal   Created                 8m57s (x3 over 9m58s)   kubelet, lh-worker2      Created container volume-test
  Normal   Started                 8m57s (x3 over 9m58s)   kubelet, lh-worker2      Started container volume-test
  Normal   Killing                 8m38s (x4 over 9m58s)   kubelet, lh-worker2      Container volume-test failed liveness probe, will be restarted
  Warning  Unhealthy               8m38s (x9 over 9m53s)   kubelet, lh-worker2      Liveness probe failed: ls: /data/lost+found: I/O error
  Warning  BackOff                 4m54s (x21 over 9m23s)  kubelet, lh-worker2      Back-off restarting failed container
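When triaging a long event list like the one above, it can help to tally Warning events by reason, including the `(xN over ...)` repetition counts. A minimal sketch, assuming the `kubectl describe pod` Events columns are padded with two or more spaces while the values themselves contain only single spaces, as in the output above; `parse_events` and `warning_counts` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    kind: str         # "Normal" or "Warning"
    reason: str
    age: str
    source: str
    message: str
    occurrences: int  # taken from "(xN over ...)" when present, else 1


def parse_events(text: str) -> List[Event]:
    """Parse the Events table printed by `kubectl describe pod`."""
    events: List[Event] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the header row, and the dashed separator row.
        if not line or line.startswith(("Type ", "----")):
            continue
        # Columns are separated by 2+ spaces; split at most 4 times so the message stays whole.
        parts = re.split(r"\s{2,}", line, maxsplit=4)
        if len(parts) != 5:
            continue
        kind, reason, age, source, message = parts
        repeat = re.search(r"\(x(\d+) over", age)
        events.append(Event(kind, reason, age, source, message, int(repeat.group(1)) if repeat else 1))
    return events


def warning_counts(events: List[Event]) -> Dict[str, int]:
    """Total occurrences of each Warning reason."""
    counts: Counter = Counter()
    for event in events:
        if event.kind == "Warning":
            counts[event.reason] += event.occurrences
    return dict(counts)
```

Applied to the log above, this would surface FailedMount and Unhealthy as the dominant warnings, which points at the mount path rather than the container image.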

meldafrawi on 12 Nov 2019

Validation: PARTIALLY PASSED

Case: node reboot

Steps to reproduce:

Using:
- Rancher v2.3.2
- k8s v1.14.8
- Longhorn v0.7.0-rc1
- Create a pod using the example in https://github.com/shuo-wu/longhorn/blob/14b524130eccb0c32eb3d1fecfaed51d7612b1d0/docs/restore-volume.md
- Create a new file and write some data to it.
- Reboot the node the pod is scheduled to.
- Wait for the node to become ready and for the pod to be restarted.
- Expected result: the volume gets detached and re-attached automatically, and the volume becomes accessible after the pod is restarted.

Case: docker service restart

Steps to reproduce:

Using:
- Rancher v2.3.2
- k8s v1.14.8
- Longhorn v0.7.0-rc1
- Create a pod using the example in https://github.com/shuo-wu/longhorn/blob/14b524130eccb0c32eb3d1fecfaed51d7612b1d0/docs/restore-volume.md
- Create a new file and write some data to it.
- Restart the Docker service on the node the pod is scheduled to, using systemctl restart docker.service
- Wait for the node to become ready and for the pod to be restarted.
- Expected result: the volume gets detached and re-attached automatically, and the volume becomes accessible after the pod is restarted.
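The "wait for the node to be ready, and pod gets restarted" step in both cases can be automated by polling the pod until its Ready condition is True and its container restart count has gone up. A minimal sketch, assuming `kubectl get pod <name> -o json` is reachable from where it runs and that the disruption shows up as an in-place container restart; the helper names, timeout, and interval are hypothetical. If the pod is deleted and re-created instead, comparing `metadata.uid` would be the better signal.

```python
import json
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], str]


def kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return its stdout."""
    return subprocess.run(["kubectl", *args], check=True, capture_output=True, text=True).stdout


def pod_ready(pod_obj: dict) -> bool:
    conditions = pod_obj.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def restart_count(pod_obj: dict) -> int:
    statuses = pod_obj.get("status", {}).get("containerStatuses", [])
    return sum(s.get("restartCount", 0) for s in statuses)


def wait_for_restart_and_ready(pod: str, baseline_restarts: int, run: Runner = kubectl,
                               timeout: float = 600.0, interval: float = 5.0,
                               sleep: Callable[[float], None] = time.sleep,
                               now: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the pod has restarted since the baseline and is Ready, or time out."""
    deadline = now() + timeout
    while now() < deadline:
        try:
            obj = json.loads(run(["get", "pod", pod, "-o", "json"]))
        except subprocess.CalledProcessError:
            obj = {}  # the API server or node may be unreachable during the disruption
        if obj and pod_ready(obj) and restart_count(obj) > baseline_restarts:
            return True
        sleep(interval)
    return False
```

Record `restart_count` before rebooting the node or restarting Docker, then pass it as `baseline_restarts`; once this returns `True`, the file written earlier can be checked.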