Longhorn rollback from 0.6.0 to 0.5.0 does not work:
Failed to install app longhorn. Error: UPGRADE FAILED: transport is closing
It's a Rancher bug. See:
https://github.com/rancher/rancher/issues/23017
Is there any way to recover from this without waiting for the stable Rancher release after next?
After upgrading from 0.5.0 to 0.6.1, it works fine, but the error remains.


If you delete it, everything is fine.

@aleksey005 Delete what exactly, the catalog-installed app or the resources it provisions? Does that disconnect/corrupt mounted volumes? Will it pick up the existing volumes after reinstalling?
@tbq I backed up the Longhorn volumes to S3, then deleted the failed application from Apps (Longhorn), then restored the volumes using the Longhorn backup restore procedure.
@tbq !!! If you use this method, you need to keep and reuse the same volume names as before; then everything goes well.
@rbq Deleting the app will remove all the data; make sure you have backups before doing it.
We're investigating possible ways to work around the Rancher issue, stay tuned.
I have the same problem.
Steps to reproduce this issue:
Environment setup before reproducing: Rancher v2.2.7; Kubernetes v1.13.9; one Kubernetes cluster with 3 nodes.
Set the chart repo URL to https://github.com/shuo-wu/charts.git with branch test as the test chart, and refresh the chart before reproducing.
Run kubectl apply -f https://raw.githubusercontent.com/shuo-wu/longhorn/test/deploy/pv-test.yaml to create the environment that triggers the bug later.
Launch Longhorn v0.5.0 from the catalog and wait for it to become active.
Upgrade to v0.6.0 (the faulty version in this chart). Wait until the error message "Failed to install app longhorn-system. Error: UPGRADE FAILED: timed out waiting for the condition" appears on the app detail page (typically after 5 to 10 minutes).
Try to roll back to the previous revision (v0.5.0). The error message "Failed to install app longhorn-system. Error: UPGRADE FAILED: transport is closing" appears, and at this point we can neither upgrade nor roll back.
We're still validating the workaround.
@yasker Thank you!
I experimented with the workaround and managed to revert to 0.5. My Helm ConfigMaps seem to be called longhorn.v${n} though, without the -system part.
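Since the release ConfigMap names differ between setups (longhorn.v${n} here versus longhorn-system.v${n}), it may help to list them before deleting anything. A minimal read-only sketch, assuming Helm v2 (Tiller) with its default ConfigMap storage in kube-system, where each revision is stored as <release>.v<N> and labelled OWNER=TILLER and NAME=<release>; list_helm_release_cms is a hypothetical helper name, not part of Longhorn or Rancher.

```shell
# Hypothetical helper (not part of Longhorn or Rancher).
# Assumption: Helm v2 / Tiller stores each release revision as a ConfigMap in
# kube-system, labelled OWNER=TILLER and NAME=<release>.
list_helm_release_cms() {
  local release="${1:?usage: list_helm_release_cms <release-name>}"
  # -L adds the STATUS and VERSION labels as extra output columns.
  kubectl -n kube-system get configmaps \
    -l "OWNER=TILLER,NAME=${release}" \
    -L STATUS,VERSION
}
```

For example, list_helm_release_cms longhorn (or list_helm_release_cms longhorn-system, depending on how the app was named in Rancher) shows which revisions exist and the status label Tiller recorded for each one, without changing anything.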
Then I tried to upgrade to 0.6.1 and it failed with:
Failed to install app longhorn. Error: UPGRADE FAILED: no ConfigMap with the name "longhorn-default-setting" found
I definitely didn't delete that one:
$ kubectl -n longhorn-system get cm
NAME                                           DATA   AGE
external-attacher-leader-io-rancher-longhorn   0      212d
longhorn-default-setting                       1      2d23h
Anyway, I then managed to revert to 0.5 again, this time by killing the workloads, removing the Helm ConfigMap, and finally killing the respawning Longhorn 0.6.1 uninstaller workload several times, as it kept logging errors about missing API permissions.
[edit:] Don't try this at home. The deployment showed up as okay, but actually wasn't.
@rbq Saw your edit. What's wrong with your deployment?
The error message "Failed to install app longhorn. Error: UPGRADE FAILED: no ConfigMap with the name "longhorn-default-setting" found" is actually caused by a Helm bug: if resources introduced by a new version already exist in your cluster before the upgrade, this error is triggered when you try to upgrade to that version.
For Longhorn, the new version is v0.6.0 or v0.6.1. The failed v0.6.0 upgrade introduces two new resources: a ConfigMap named longhorn-default-setting and a CRD named instancemanagers.longhorn.rancher.io. To upgrade to v0.6.1, besides removing the Helm ConfigMaps and the Longhorn workloads, you also need to force-delete these resources:
kubectl -n longhorn-system delete cm longhorn-default-setting
kubectl patch -p '{"metadata":{"finalizers": null}}' crd instancemanagers.longhorn.rancher.io
kubectl delete crd instancemanagers.longhorn.rancher.io
The missing API permissions errors are caused by the uninstaller of the new version (v0.6.1) running against the old Longhorn system. You can remove it directly:
kubectl -n longhorn-system delete jobs longhorn-uninstall
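The commands from this comment can be wrapped in a small function with a dry-run switch, so they can be reviewed before anything is deleted. This is a sketch, not the official workaround: run_kubectl, cleanup_longhorn_v060_leftovers and the DRY_RUN variable are hypothetical names, and it does not remove the Helm ConfigMaps or the Longhorn workloads, which still have to be handled separately as described above.

```shell
# Hypothetical helpers wrapping the commands from this comment.
# With DRY_RUN set to any non-empty value, commands are printed, not executed.
run_kubectl() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf 'kubectl %s\n' "$*"
  else
    kubectl "$@"
  fi
}

cleanup_longhorn_v060_leftovers() {
  # ConfigMap introduced by the failed v0.6.0 upgrade.
  run_kubectl -n longhorn-system delete cm longhorn-default-setting
  # CRD introduced by v0.6.0: clear its finalizers first, then delete it.
  run_kubectl patch -p '{"metadata":{"finalizers": null}}' crd instancemanagers.longhorn.rancher.io
  run_kubectl delete crd instancemanagers.longhorn.rancher.io
  # v0.6.1 uninstaller job that fails with missing API permissions on the old system.
  run_kubectl -n longhorn-system delete jobs longhorn-uninstall
}
```

DRY_RUN=1 cleanup_longhorn_v060_leftovers prints the four kubectl commands (arguments joined by spaces, so the JSON quoting is not shown); running it without DRY_RUN executes them in order. Each command still runs if an earlier one fails, for example with NotFound, unless the calling shell has set -e.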
@yasker The UI couldn't connect to the nodes and provisioning volumes didn't work. I then saw that there were still some workloads with version 0.6.1 running and removed those. That seemed to trigger Longhorn going on a killing spree, setting all volumes to “Deleting …” without any way to abort. I ended up uninstalling Longhorn, manually removing the rest, patching away a bunch of finalizers, deleting resources, moving the leftover data on the nodes aside and re-installing 0.6.1.
Thanks @rbq. How did you trigger the uninstaller? It shouldn't be triggered during either the upgrade or the rollback process, unless there is something we aren't aware of.
Also, a quick update: we're still working on the workaround. The previous steps have some issues and gaps; we're currently validating the updated version. We should be able to release it next week.
On a single-node cluster, Longhorn is installed and updated without failures.
Validation: PASSED
Steps to test:
We will release the fix with v0.6.2 soon.
We will keep the issue open until Rancher fixes https://github.com/rancher/rancher/issues/21070
Along with the Longhorn v0.6.2 release, the workaround is now available at https://github.com/longhorn/longhorn/wiki/Longhorn-v0.6.0-Upgrade:-Workaround-for-recovering-from-a-rollback-failure-in-Rancher
Rancher v2.3 has been fixed to mitigate the issue. See https://github.com/rancher/rancher/issues/21070 for details.
@yasker issue still exists 😢
Longhorn: 1.0.0
Rancher: 2.4.4
Kubernetes deployed on Ubuntu 18.04 via RKE 1.1.2
UPD: Setting the Helm Wait option to false fixes the problem of the app freezing in the installing state and then becoming failed.
@yasker I just upgraded Longhorn from 0.8.0 to 1.0.0 in the Rancher UI, and it is stuck in the installing state, just like for @TemaSM.
I did the upgrade on Rancher 2.2.12. I have since upgraded to Rancher 2.4.5 (2.2.12 -> 2.3.0 -> 2.3.8 -> 2.4.5), and I also upgraded k8s to v1.18.6.
The Longhorn app is still stuck in the installing state. Although it is still functioning and all of my volumes are working fine, I can't upgrade Longhorn to 1.0.1.
@cwt @TemaSM is there any more specific failure information you can share? For example, @cwt, since you mentioned Longhorn is stuck in the installing state, can you check the workloads page to see whether any workload is not in a healthy state?
@yasker The workloads are all green. On the app page -> longhorn-system, I got this message:
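From the command line, a rough equivalent of checking the workloads page is to look for pods in longhorn-system that are not fully ready. A minimal sketch, assuming the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE); unhealthy_pods is a hypothetical helper name, and pods in the Completed state (finished jobs) are treated as healthy.

```shell
# Hypothetical helper: print pods whose STATUS is not Running, or whose READY
# column shows fewer ready containers than total (e.g. 1/2).
# Completed pods are skipped.
unhealthy_pods() {
  local ns="${1:-longhorn-system}"
  kubectl -n "$ns" get pods --no-headers |
    awk '$3 == "Completed" { next }
         { split($2, ready, "/")
           if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3 }'
}
```

Empty output means every pod in the namespace reports Running with all containers ready. This only looks at pods, so it complements rather than replaces the workload status shown in Rancher.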
Failed to install app longhorn-system. Error: UPGRADE FAILED: a release named longhorn-system is in use, cannot re-use a name that is still in use
As I said previously, everything seems to work fine: I can create volumes, snapshots, and backups. All of my pods that mount Longhorn volumes are working fine too. It's just that the status is installing and I can't upgrade to 1.0.1.
@cwt It would be hard for us to figure this out from the Longhorn side, since it seems related to some pre-existing conditions. I found a related bug that you might find useful: https://github.com/helm/helm/issues/4174
@yasker Thanks for your help anyway. Since I already have backups of all volumes, I plan to remove Longhorn and reinstall it. I think that should fix my problem.
... is there any more specific failure information you can share? ...
-> No specific/detailed info. Everything is just as @cwt described:
... As I said previously, everything seems to work fine: I can create volumes, snapshots, and backups. All of my pods that mount Longhorn volumes are working fine too. It's just that the status is installing ...
@yasker Anyway, thanks for your time! I decided to look into Longhorn again later, when it becomes more stable.
@TemaSM ))) Longhorn is stable; I have been using it for a long time and there have very rarely been any glitches. The only things I would like to see improved are performance and making it a complete analog of a file system, that is, so that applications can delete data.
@aleksey005 I mean stable for my needs. For example, I currently need to somehow automate mounting all of Longhorn's block devices into the nodes' filesystems. If you have any tips on doing this, I would appreciate them.
@TemaSM Why are you asking this question in a closed issue? Open a new one or make a feature request for such functionality. Culture comes first.
https://github.com/longhorn/longhorn/issues/906#issuecomment-678664652
@TemaSM We don't currently support automatically mounting the block devices used by Longhorn. Feel free to create an issue for the enhancement so we can track it.