Velero: [v1.4.0-beta.1] Unable to restore restic data with custom certificate option.

Created on 22 May 2020 · 15 Comments · Source: vmware-tanzu/velero

What steps did you take and what happened:
I have deployed the latest beta version in order to use a custom CA certificate. Backups with restic complete properly, and I can see the data in minio under mybubket/restic.

But when I try to restore, restic fails with x509: certificate signed by unknown authority

What did you expect to happen:
The connection should work for the restic restore the same way it does for the restic backup.

Environment:

  • Velero version (use velero version):
    Client:
    Version: v1.4.0-beta.1
    Git commit: 8bf75bd4f28b27483ba2d7954ab61f406d8a7db5
    Server:
    Version: v1.4.0-beta.1

  • Velero features (use velero client config get features):
    features:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:08:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:00:06Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version:
    rke v1.0.6

  • Cloud provider or hardware configuration:
    hardware

  • OS (e.g. from /etc/os-release):
    kubectl get nodes -o wide
    NAME            STATUS   ROLES               AGE    VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
    ltec-fil-m-01   Ready    controlplane,etcd   245d   v1.15.11   10.195.177.52   <none>        Ubuntu 16.04.6 LTS   4.4.0-169-generic   docker://18.9.9
    ltec-fil-m-02   Ready    controlplane,etcd   245d   v1.15.11   10.195.177.53   <none>        Ubuntu 16.04.6 LTS   4.4.0-169-generic   docker://18.9.9
    ltec-fil-m-03   Ready    controlplane,etcd   245d   v1.15.11   10.195.177.54   <none>        Ubuntu 16.04.6 LTS   4.4.0-169-generic   docker://18.9.9
    ltec-fil-w-01   Ready    worker              245d   v1.15.11   10.195.177.55   <none>        Ubuntu 16.04.6 LTS   4.4.0-169-generic   docker://18.9.9
    ltec-fil-w-02   Ready    worker              245d   v1.15.11   10.195.177.56   <none>        Ubuntu 16.04.6 LTS   4.4.0-169-generic   docker://18.9.9
    ltec-fil-w-03   Ready    worker              207d   v1.15.11   10.195.177.57   <none>        Ubuntu 16.04.6 LTS   4.4.0-169-generic   docker://18.9.9
    ltec-fil-w-99   Ready    worker              121d   v1.15.11   10.195.200.99   <none>        Ubuntu 16.04.6 LTS   4.4.0-169-generic   docker://18.9.9

Bug

All 15 comments

Ah. I see what's going on here. While we're correctly passing the --cacert flag to the actual restic restore command, we're not passing it to the restic stats command here: https://github.com/vmware-tanzu/velero/blob/master/pkg/restic/exec_commands.go#L188-L191.

I'll work on a fix for this.
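To illustrate the class of bug described above, here is a minimal sketch, not Velero's actual code. It assumes a hypothetical helper, build_restic_cmd, that assembles every restic invocation in one place so the --cacert flag cannot be left off an individual subcommand such as stats. The repository URL and certificate path are placeholders, and the other arguments a real restore needs (snapshot ID, target) are omitted.

```shell
# Sketch only: build_restic_cmd is a hypothetical helper, not Velero code.
# It prints a restic command line for the given subcommand and appends
# --cacert whenever a CA bundle path is supplied.
build_restic_cmd() {
  subcommand="$1"
  repo="$2"
  cacert="${3:-}"
  cmd="restic $subcommand --repo=$repo"
  if [ -n "$cacert" ]; then
    cmd="$cmd --cacert=$cacert"
  fi
  printf '%s\n' "$cmd"
}

# Placeholder repository URL and CA path, for illustration only.
repo="s3:https://minio.example.com/mybubket/restic/nexus"
build_restic_cmd restore "$repo" /tmp/ca.crt
build_restic_cmd stats "$repo" /tmp/ca.crt
```

Routing both subcommands through the same builder means the stats call carries the same CA bundle as the restore call, which is the behaviour the fix is meant to provide.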

@leitaof if you're available, it'd be great to have you test out a fix for this. I should have a docker image up shortly that you can use.

@skriss Sure, I will test it this afternoon with the new docker image.

Awesome, thanks!

OK, the image with the fix is: steveheptio/velero:fix-2562. You can swap it with:

kubectl -n velero set image deployment/velero velero=steveheptio/velero:fix-2562
kubectl -n velero set image daemonset/restic restic=steveheptio/velero:fix-2562

I have tried the restore, but my pod is unable to find the fixed image because it is searching in the velero repo instead of steveheptio:

29m Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
29m Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
29m Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
16m Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image

Ah, shoot. I retagged the image with the fix as steveheptio/velero:v1.4.0-beta.1 (despite the tag, it does include the fix). You can use that updated image, which should avoid the error you got:

kubectl -n velero set image deployment/velero velero=steveheptio/velero:v1.4.0-beta.1
kubectl -n velero set image daemonset/restic restic=steveheptio/velero:v1.4.0-beta.1

You'll have to delete the partially-restored workload and try again after updating the images.
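Before retrying, it may help to confirm that both workloads actually picked up the new image. The following is a small sketch under the assumption that the Velero container is the first container in each pod template; show_velero_images is a hypothetical helper name, while the namespace and resource names are taken from the commands above.

```shell
# Sketch: print the image each Velero workload is configured to run.
# show_velero_images is a hypothetical helper; it assumes the Velero
# container is the first container in each pod template.
show_velero_images() {
  for workload in deployment/velero daemonset/restic; do
    image="$(kubectl -n velero get "$workload" \
      -o jsonpath='{.spec.template.spec.containers[0].image}')"
    printf '%s -> %s\n' "$workload" "$image"
  done
}

# Usage (requires access to the cluster):
#   show_velero_images
```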

Still same error

2m55s Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
2m55s Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
2m55s Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
4m17s Normal SandboxChanged pod/nexus-694dff6965-cbh6p Pod sandbox changed, it will be killed and re-created.
3m8s Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image "velero/velero-restic-restore-helper:fix-2562"

Events from updated image
10m Normal Created pod/velero-779455f468-bvwqh Created container velero
10m Normal Pulling pod/velero-779455f468-bvwqh Pulling image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Pulled pod/velero-779455f468-bvwqh Successfully pulled image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Started pod/velero-779455f468-bvwqh Started container velero
10m Normal ScalingReplicaSet deployment/velero Scaled down replica set velero-775cc8b8fd to 0
10m Normal SuccessfulDelete replicaset/velero-775cc8b8fd Deleted pod: velero-775cc8b8fd-svsjv
10m Normal Killing pod/velero-775cc8b8fd-svsjv Stopping container velero
10m Normal Killing pod/restic-g2jgq Stopping container restic
10m Normal SuccessfulDelete daemonset/restic Deleted pod: restic-g2jgq
10m Normal Pulling pod/restic-7p25r Pulling image "steveheptio/velero:v1.4.0-beta.1"
10m Normal SuccessfulCreate daemonset/restic Created pod: restic-7p25r
10m Normal Scheduled pod/restic-7p25r Successfully assigned velero/restic-7p25r to ltec-fil-w-99
10m Normal Pulled pod/restic-7p25r Successfully pulled image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Created pod/restic-7p25r Created container restic

Did you delete this pod: pod/nexus-694dff6965-cbh6p (or the entire namespace) and start a new restore?

Yes, I deleted the namespace, and just to be sure I deleted it again and ran the restore again.

LAST SEEN TYPE REASON OBJECT MESSAGE
71s Normal CREATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
71s Normal CREATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
71s Normal CREATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
71s Normal CREATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
71s Normal Scheduled pod/nexus-694dff6965-cbh6p Successfully assigned nexus/nexus-694dff6965-cbh6p to ltec-fil-w-99
71s Normal ProvisioningSucceeded persistentvolumeclaim/nexus-data Successfully provisioned volume pvc-af93e765-518a-4535-9b8a-14bf2b557b6f
71s Normal Provisioning persistentvolumeclaim/nexus-data External provisioner is provisioning volume for claim "nexus/nexus-data"
71s Normal ExternalProvisioning persistentvolumeclaim/nexus-data waiting for a volume to be created, either by external provisioner "ltec-fil-nfs-client-provisioner" or manually created by system administrator
7s Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image "velero/velero-restic-restore-helper:fix-2562"
7s Warning Failed pod/nexus-694dff6965-cbh6p Error: ImagePullBackOff
29s Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
29s Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
29s Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
53s Normal UPDATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
53s Normal UPDATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
53s Normal UPDATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
53s Normal UPDATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress

OK, here's the other way to work around this: you can override which image it tries to pull for the restic restore helper by providing a configmap that specifies the specific image to use:

kubectl -n velero create configmap restic-restore-action-config --from-literal=image=velero/velero-restic-restore-helper:v1.4.0-beta.1
kubectl -n velero label configmap restic-restore-action-config velero.io/plugin-config=
kubectl -n velero label configmap restic-restore-action-config velero.io/restic=RestoreItemAction

After setting this up, you'll need to (a) delete the partially-restored workload/namespace in your cluster, and (b) try a new restore.
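Steps (a) and (b) could be wrapped roughly as follows. This is a sketch only: retry_restore is a hypothetical helper, and the namespace and backup name are supplied by the caller rather than taken from this thread.

```shell
# Sketch: retry_restore is a hypothetical wrapper around the two steps.
retry_restore() {
  namespace="$1"
  backup="$2"
  # (a) remove the partially restored workload and wait for deletion
  kubectl delete namespace "$namespace" --wait=true || return 1
  # (b) start a fresh restore from the same backup
  velero restore create --from-backup "$backup"
}

# Usage (requires cluster access and the velero CLI):
#   retry_restore nexus <backup-name>
```

Waiting for the namespace deletion to finish before starting the restore avoids restoring into a namespace that is still terminating.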

Thanks for the patience!

Ah, I think I see why you were still getting the issue with pulling the fix-2562 restore helper tag - retagging the core velero image wasn't sufficient to have it change which tag it pulled for the restic restore helper; the velero binary needed to be fully recompiled with the different version tag.
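The effect described above can be pictured with a toy sketch. helper_image_for is a hypothetical function that only mirrors the observed behaviour, where the helper tag follows the version compiled into the velero binary; it is not how Velero is implemented.

```shell
# Sketch: helper_image_for is hypothetical. It mirrors the behaviour seen
# in the pod events, where the restore helper tag follows the version
# compiled into the velero binary, not the tag of the image that ships it.
helper_image_for() {
  compiled_version="$1"
  printf 'velero/velero-restic-restore-helper:%s\n' "$compiled_version"
}

# A binary built as "fix-2562" keeps asking for the fix-2562 helper,
# even when its container image is retagged as v1.4.0-beta.1.
helper_image_for fix-2562
```

This is why only a recompiled binary, or the configmap override, changes which helper image the restored pod tries to pull.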

@leitaof we went ahead and merged the code change since it seemed straight-forward and low-risk, but we'd still like to have your verification!

No problem, but I will test it tomorrow and give you feedback afterwards.

I have tested the restore with the latest v1.4.0 and it works properly.
Thanks guys for your good work!

awesome, thanks again for the testing and feedback!
