What steps did you take and what happened:
I deployed the latest beta version in order to use a custom CA certificate. Backups with restic complete successfully, and I can see the data in MinIO under mybubket/restic.
But when I try to restore, restic fails with: x509: certificate signed by unknown authority
What did you expect to happen:
The restic restore should connect successfully, just as the restic backup does.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
deployment_velero.log
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
velero_backup_describe.log
velero backup logs <backupname>
velero_backup.log
velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
velero_restore_describe.log
velero restore logs <restorename>
velero_restore.log
Anything else you would like to add:
Environment:
Velero version (use velero version):
Client:
Version: v1.4.0-beta.1
Git commit: 8bf75bd4f28b27483ba2d7954ab61f406d8a7db5
Server:
Version: v1.4.0-beta.1
Velero features (use velero client config get features):
features:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:08:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:00:06Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version:
rke v1.0.6
Cloud provider or hardware configuration:
hardware
OS (e.g. from /etc/os-release):
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ltec-fil-m-01 Ready controlplane,etcd 245d v1.15.11 10.195.177.52
ltec-fil-m-02 Ready controlplane,etcd 245d v1.15.11 10.195.177.53
ltec-fil-m-03 Ready controlplane,etcd 245d v1.15.11 10.195.177.54
ltec-fil-w-01 Ready worker 245d v1.15.11 10.195.177.55
ltec-fil-w-02 Ready worker 245d v1.15.11 10.195.177.56
ltec-fil-w-03 Ready worker 207d v1.15.11 10.195.177.57
ltec-fil-w-99 Ready worker 121d v1.15.11 10.195.200.99
Ah. I see what's going on here. While we're correctly passing the --cacert flag to the actual restic restore command, we're not passing it to the restic stats command here: https://github.com/vmware-tanzu/velero/blob/master/pkg/restic/exec_commands.go#L188-L191.
I'll work on a fix for this.
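To illustrate the class of bug described above, here is a minimal shell sketch, not Velero's actual code. It assumes a hypothetical helper, `restic_cmdline`, and two illustrative variables, `REPO` and `CACERT`. The point is that every restic subcommand (`stats` as well as `restore`) is built through the same path, so the `--cacert` flag cannot be forgotten for one of them. The helper only prints the command line; it does not run restic.

```shell
# Hypothetical helper: print the restic command line for any subcommand,
# adding the same global flags every time.
# REPO and CACERT are illustrative variables, not Velero settings.
restic_cmdline() {
    subcommand="$1"
    shift
    line="restic $subcommand --repo=${REPO:-}"
    # Add the CA bundle whenever one is configured, for every subcommand.
    if [ -n "${CACERT:-}" ]; then
        line="$line --cacert=$CACERT"
    fi
    # Append the subcommand-specific arguments unchanged.
    for arg in "$@"; do
        line="$line $arg"
    done
    printf '%s\n' "$line"
}
```

With `CACERT` set, both `restic_cmdline stats --json` and `restic_cmdline restore <snapshot-id> --target=.` print a command line that includes the `--cacert` flag.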
@leitaof if you're available, it'd be great to have you test out a fix for this. I should have a docker image up shortly that you can use.
@skriss Sure, I will test it this afternoon with the new Docker image.
Awesome, thanks!
OK, the image with the fix is: steveheptio/velero:fix-2562. You can swap it with:
kubectl -n velero set image deployment/velero velero=steveheptio/velero:fix-2562
kubectl -n velero set image daemonset/restic restic=steveheptio/velero:fix-2562
I tried the restore, but my pod cannot find the fixed image because it is looking in the velero repository instead of steveheptio:
29m Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
29m Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
29m Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
16m Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image
Ah, shoot. I retagged the image with the fix as steveheptio/velero:v1.4.0-beta.1 (despite the tag, it does include the fix). You can use that updated image, which should avoid the error you got:
kubectl -n velero set image deployment/velero velero=steveheptio/velero:v1.4.0-beta.1
kubectl -n velero set image daemonset/restic restic=steveheptio/velero:v1.4.0-beta.1
You'll have to delete the partially-restored workload and try again after updating the images.
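Before retrying the restore, it may help to confirm which images the server components are actually running. The sketch below assumes a hypothetical function name, `show_velero_images`, and assumes that the velero and restic containers are the first container in their pod templates; adjust the index if your manifests differ.

```shell
# Hypothetical check: print the image currently configured for the
# Velero deployment and the restic daemonset.
# Assumes the relevant container is the first one in each pod template.
show_velero_images() {
    for target in deployment/velero daemonset/restic; do
        image=$(kubectl -n velero get "$target" \
            -o jsonpath='{.spec.template.spec.containers[0].image}')
        printf '%s -> %s\n' "$target" "$image"
    done
}
```

Calling `show_velero_images` after the `set image` commands should list the swapped-in image for both workloads.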
Still the same error:
2m55s Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
2m55s Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
2m55s Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
4m17s Normal SandboxChanged pod/nexus-694dff6965-cbh6p Pod sandbox changed, it will be killed and re-created.
3m8s Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image "velero/velero-restic-restore-helper:fix-2562"
Events from the updated image:
10m Normal Created pod/velero-779455f468-bvwqh Created container velero
10m Normal Pulling pod/velero-779455f468-bvwqh Pulling image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Pulled pod/velero-779455f468-bvwqh Successfully pulled image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Started pod/velero-779455f468-bvwqh Started container velero
10m Normal ScalingReplicaSet deployment/velero Scaled down replica set velero-775cc8b8fd to 0
10m Normal SuccessfulDelete replicaset/velero-775cc8b8fd Deleted pod: velero-775cc8b8fd-svsjv
10m Normal Killing pod/velero-775cc8b8fd-svsjv Stopping container velero
10m Normal Killing pod/restic-g2jgq Stopping container restic
10m Normal SuccessfulDelete daemonset/restic Deleted pod: restic-g2jgq
10m Normal Pulling pod/restic-7p25r Pulling image "steveheptio/velero:v1.4.0-beta.1"
10m Normal SuccessfulCreate daemonset/restic Created pod: restic-7p25r
10m Normal Scheduled pod/restic-7p25r Successfully assigned velero/restic-7p25r to ltec-fil-w-99
10m Normal Pulled pod/restic-7p25r Successfully pulled image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Created pod/restic-7p25r Created container restic
Did you delete this pod: pod/nexus-694dff6965-cbh6p (or the entire namespace) and start a new restore?
Yes, I deleted the namespace, and just to be sure I deleted it again and ran the restore again.
LAST SEEN TYPE REASON OBJECT MESSAGE
71s Normal CREATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
71s Normal CREATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
71s Normal CREATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
71s Normal CREATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
71s Normal Scheduled pod/nexus-694dff6965-cbh6p Successfully assigned nexus/nexus-694dff6965-cbh6p to ltec-fil-w-99
71s Normal ProvisioningSucceeded persistentvolumeclaim/nexus-data Successfully provisioned volume pvc-af93e765-518a-4535-9b8a-14bf2b557b6f
71s Normal Provisioning persistentvolumeclaim/nexus-data External provisioner is provisioning volume for claim "nexus/nexus-data"
71s Normal ExternalProvisioning persistentvolumeclaim/nexus-data waiting for a volume to be created, either by external provisioner "ltec-fil-nfs-client-provisioner" or manually created by system administrator
7s Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image "velero/velero-restic-restore-helper:fix-2562"
7s Warning Failed pod/nexus-694dff6965-cbh6p Error: ImagePullBackOff
29s Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
29s Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
29s Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
53s Normal UPDATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
53s Normal UPDATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
53s Normal UPDATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
53s Normal UPDATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
OK, here's the other way to work around this: you can override which image it tries to pull for the restic restore helper by providing a configmap that specifies the image to use:
kubectl -n velero create configmap restic-restore-action-config --from-literal=image=velero/velero-restic-restore-helper:v1.4.0-beta.1
kubectl -n velero label configmap restic-restore-action-config velero.io/plugin-config=
kubectl -n velero label configmap restic-restore-action-config velero.io/restic=RestoreItemAction
After setting this up, you'll need to (a) delete the partially-restored workload/namespace in your cluster, and (b) try a new restore.
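As a sanity check before retrying, the override can be read back from the configmap created above. This is a sketch only; `check_helper_image` is a hypothetical function name, and it compares the configmap's `image` key against the value you expect.

```shell
# Hypothetical check: succeed only if the configmap's "image" key holds
# the expected restore-helper image.
check_helper_image() {
    expected="$1"
    actual=$(kubectl -n velero get configmap restic-restore-action-config \
        -o jsonpath='{.data.image}')
    if [ "$actual" = "$expected" ]; then
        echo "restore helper image override is $actual"
    else
        echo "unexpected restore helper image: '$actual'" >&2
        return 1
    fi
}
```

For example, `check_helper_image velero/velero-restic-restore-helper:v1.4.0-beta.1` returns non-zero if the configmap is missing or holds a different image.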
Thanks for the patience!
Ah, I think I see why you were still getting the issue with pulling the fix-2562 restore helper tag: retagging the core velero image wasn't sufficient to change which tag it pulls for the restic restore helper. The velero binary needed to be fully recompiled with the different version tag.
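The behaviour described here can be pictured with a small shell sketch. It is an illustration under stated assumptions, not Velero's implementation: `VERSION` stands in for the version string compiled into the velero binary, and `helper_image` is a hypothetical function. The default helper image follows the compiled-in version, regardless of how the container image was tagged, unless an explicit override (such as the configmap value) is supplied.

```shell
# Hypothetical model of the image selection described in this thread.
# VERSION stands in for the version compiled into the velero binary.
helper_image() {
    override="${1:-}"
    if [ -n "$override" ]; then
        # An explicit override (e.g. from the configmap) wins.
        printf '%s\n' "$override"
    else
        # Otherwise the tag follows the compiled-in version.
        printf '%s\n' "velero/velero-restic-restore-helper:${VERSION}"
    fi
}
```

With `VERSION=fix-2562`, calling `helper_image` with no argument yields the `fix-2562` helper tag that failed to pull in the events above, even if the velero image itself was retagged.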
@leitaof we went ahead and merged the code change since it seemed straight-forward and low-risk, but we'd still like to have your verification!
No problem, but I will test it tomorrow and give you feedback afterwards.
I have tested the restore with the latest v1.4.0 and it works properly.
Thanks guys for your good work!
Awesome, thanks again for the testing and feedback!