v0.5.0.
The error message for attaching:
logs/longhorn-manager-k4qqc/longhorn-manager.log:2019-05-27T09:23:06.347103712Z time="2019-05-27T09:23:06Z" level=warning msg="Error syncing Longhorn volume longhorn-system/pvc-499dbb42-78b8-11e9-9f61-000c299a5b45: fail to sync longhorn-system/pvc-499dbb42-78b8-11e9-9f61-000c299a5b45: fail to reconcile volume state for pvc-499dbb42-78b8-11e9-9f61-000c299a5b45: no healthy replica for starting"
In this case it was caused by the following issue, though it can have other causes.
logs/longhorn-manager-k4qqc/longhorn-manager.log:2019-05-27T09:23:06.346648747Z E0527 09:23:06.346487 1 volume_controller.go:200] fail to sync longhorn-system/pvc-499dbb42-78b8-11e9-9f61-000c299a5b45: Timeout: request did not complete within requested timeout 30s
logs/longhorn-manager-k4qqc/longhorn-manager.log:2019-05-27T09:23:06.346702385Z time="2019-05-27T09:23:06Z" level=warning msg="Dropping Longhorn volume longhorn-system/pvc-499dbb42-78b8-11e9-9f61-000c299a5b45 out of the queue: fail to sync longhorn-system/pvc-499dbb42-78b8-11e9-9f61-000c299a5b45: Timeout: request did not complete within requested timeout 30s"
This happens because the volume's faulted state can currently be triggered only once, at the moment all replicas fail. We should reconcile to the faulted state whenever we detect that all the replicas have failed.
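To check which volumes are hitting this error, the manager logs can be filtered for the "no healthy replica for starting" message. The following is only a rough sketch: the function name `affected_volumes` is hypothetical, and it assumes the log lines have the same shape as the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Longhorn.
# Reads longhorn-manager log text on stdin and prints each volume name that
# logged "no healthy replica for starting", one per line, without duplicates.
affected_volumes() {
  sed -n 's/.*fail to reconcile volume state for \([^:]*\): no healthy replica for starting.*/\1/p' | sort -u
}

# Example usage (the log path pattern is an assumption based on the paths above):
#   cat logs/longhorn-manager-*/longhorn-manager.log | affected_volumes
```

Lines that do not contain the message are ignored, so the whole log can be piped in as-is.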
You can use the following command to forcefully mark the volume as faulted so that it can be salvaged:
kubectl -n longhorn-system patch lhv <volume_name> --type="merge" -p '{"status":{"robustness":"faulted"}}'
pvc-499dbb42-78b8-11e9-9f61-000c299a5b45
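If the workaround has to be applied to more than one volume, the patch can be wrapped in a small function that also reads the field back. This is a sketch under assumptions: `mark_faulted` is a hypothetical name, and reading `.status.robustness` with `-o jsonpath` assumes the field is exposed on the `lhv` resource in the same place the patch above writes it.

```shell
#!/bin/sh
# Hypothetical helper, not part of Longhorn.
# Marks one Longhorn volume as faulted using the patch command from this issue,
# then prints the robustness value that the API server reports afterwards.
mark_faulted() {
  vol="$1"
  if [ -z "$vol" ]; then
    echo "usage: mark_faulted <volume_name>" >&2
    return 2
  fi
  # Same patch as above, with the volume name filled in.
  kubectl -n longhorn-system patch lhv "$vol" --type="merge" \
    -p '{"status":{"robustness":"faulted"}}' || return 1
  # Assumption: the robustness field can be read back at this path.
  kubectl -n longhorn-system get lhv "$vol" -o jsonpath='{.status.robustness}'
}

# Example usage:
#   mark_faulted pvc-499dbb42-78b8-11e9-9f61-000c299a5b45
```

The function stops without reading the field back if the patch itself fails, so a rejected patch is not mistaken for success.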
Steps to reproduce it:
Health.
kubectl -n longhorn-system patch lhv <volume-name> --type="merge" -p '{"status":{"robustness":""}}'
Without fix: the volume is in unknown state.
After fix: the volume is still in faulted state.
fixed