Velero: add support for restic rclone backend

Created on 4 Mar 2019  ·  40 comments  ·  Source: vmware-tanzu/velero

What steps did you take and what happened:
I installed Velero + restic on a Scaleway Kubernetes cluster. After trying some simple backups, I added persistent volume backup for pods in a namespace.

I created the backup:

velero backup create gitea --include-namespaces=git

No bucket is created, and the backup stays stuck in "InProgress":

velero backup get
NAME      STATUS       CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
gitea     InProgress   0001-01-01 00:00:00 +0000 UTC   29d       default            <none>

Note that other backups, without persistent volume backup, work fine. Buckets are created and backups are stored.

What did you expect to happen:
Backup of the namespace + volumes in my bucket.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

Name:         gitea
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  InProgress

Namespaces:
  Included:  git
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    <n/a>
Completed:  <n/a>

Expiration:  2019-04-03 12:15:09 +0200 CEST

Validation errors:  <none>

Persistent Volumes: <none included>
  • velero backup logs <backupname>
velero backup logs gitea
An error occurred: request failed: <?xml version='1.0' encoding='UTF-8'?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><RequestId>tx26c87a0b00024765b7a2f-005c7cfe63</RequestId><Key>backups/gitea/gitea-logs.gz</Key></Error>

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Velero version (use velero version): v0.11.0
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: kubespray
  • Cloud provider or hardware configuration: Scaleway
  • OS (e.g. from /etc/os-release): CentOS 7.4 on the server, Fedora 29 on the client
Labels: Enhancement, User, Needs info, P2 - Long-term important, Restic

All 40 comments

OK, I found the problem, and I cannot fix it.

 velero restic repo get -o yaml

 message: |-
    error running command=restic init --repo=s3:https://s3.nl-ams.scw.cloud/velero/restic/git --password-file=/tmp/velero-restic-credentials-git393484349 --cache-dir=/scratch/.cache/restic, stdout=, stderr=Fatal: create repository at s3:https://s3.nl-ams.scw.cloud/velero/restic/git failed: client.BucketExists: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'nl-ams'

It's the same problem I had with the "mc" command earlier; I needed to use the "s3v2" API version to avoid it.

Is there a way to fix this?

To be sure that everything is OK:

  • I set the region to "nl-ams" in the BackupStorageLocation object
  • it works for backups without persistent volumes
  • it gives the above error only when I want to back up volumes

So it seems that restic doesn't use the provided region parameter.

Here is my backupstoragelocation:

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: nl-ams
    s3ForcePathStyle: "true"
    s3Url: https://s3.nl-ams.scw.cloud

Hmm. I'm guessing this is an issue within restic itself. Could you try manually creating a restic repo in your object storage? The restic docs should explain how to do that. Let me know if that does or does not work.

You're right:

restic -r s3:https://s3.nl-ams.scw.cloud/restic init
Fatal: create repository at s3:https://s3.nl-ams.scw.cloud/restic failed: client.BucketExists: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'nl-ams'

I found this bug report: https://github.com/restic/restic/issues/2023, where it is explained that restic's "rclone" backend can be used on a preexisting bucket. I don't know whether that could be usable with Velero.

Hello there,

I am currently facing the same issue with Scaleway.
@metal3d did you find a workaround for this issue?

Same here. @metal3d have you found a workaround in the meantime? And you, @Hyrsham?

Hello people,

I've been able to set up Velero to use Scaleway's Kubernetes & Object Storage in the following way.

  • Create a Kube cluster in the scw interface
  • download the kubeconfig
  • create a bucket in the object storage interface (here velero-test-newton in the fr-par region)
  • `brew install velero`
  • create a credentials file:

```
cat > credentials << EOF
[default]
aws_access_key_id=SCW
aws_secret_access_key=
EOF
```

  • install Velero:

```
velero install \
    --provider velero.io/aws \
    --bucket velero-test-newton \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --backup-location-config s3Url=https://s3.fr-par.scw.cloud,region=fr-par \
    --use-volume-snapshots=false \
    --secret-file=./credentials \
    --kubeconfig ./kubeconfig-k8s-serene-bardeen.yaml
```
  • create an example app in Kubernetes:

```
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-example
  labels:
    app: nginx
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-logs
  namespace: nginx-example
  labels:
    app: nginx
spec:
  storageClassName: do-block-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  namespace: nginx-example
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
      containers:
        - image: nginx:stable
          name: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: "/var/log/nginx"
              name: nginx-logs
              readOnly: false
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
  namespace: nginx-example
spec:
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
```
  • `kubectl apply -f ./nginx-example --kubeconfig kubeconfig-k8s-serene-bardeen.yaml`
  • `velero backup create nginx-backup --selector app=nginx --kubeconfig kubeconfig-k8s-serene-bardeen.yaml`
  • `velero backup describe nginx-backup --kubeconfig kubeconfig-k8s-serene-bardeen.yaml` gives:

Name:         nginx-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=nginx

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-01-16 13:53:37 +0100 CET
Completed:  2020-01-16 13:53:40 +0100 CET

Expiration:  2020-02-15 13:53:37 +0100 CET

Persistent Volumes: <none included>
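As an aside, "Persistent Volumes: <none included>" is expected with Velero's opt-in restic integration unless the pod declares which volumes to back up via the backup.velero.io/backup-volumes annotation. A minimal sketch (the generated pod name here is illustrative; yours will differ):

```
# Opt the nginx pod's log volume into restic backup.
# backup.velero.io/backup-volumes is Velero's opt-in annotation for its
# restic integration; namespace and volume name match the example above.
kubectl -n nginx-example annotate pod nginx-deploy-694c85cdc8-25jk9 \
    backup.velero.io/backup-volumes=nginx-logs
```

In practice this annotation is usually added to the Deployment's pod template so it survives pod restarts.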

Hi @newtoncorp, if you do a describe with --details, does it show that the volume has been backed up correctly? How come the storage class in the PVC is DigitalOcean's if you are on Scaleway's managed Kubernetes? I installed Velero exactly the same way, but Restic wouldn't restore PVCs from Scaleway object storage. I haven't tried backups because I was just restoring to a new cluster after migrating the backups from Exoscale to Scaleway. Because Scaleway doesn't seem very S3-compatible, and because I need versioning (which Exoscale doesn't support), I switched to DigitalOcean Spaces and all is good. A shame, though, because Scaleway offers 75GB of free storage and is very cheap.

Can you clarify how you got backups and restores working with Scaleway?

Hello again,

I made a mistake while testing (I don't really know Kubernetes): I've removed the storage class, as the default on Scaleway's Kubernetes is fine. So the pod is running now.

16:58:44 ~/Work/tutorials/velero » kubectl get pods --all-namespaces --kubeconfig kubeconfig-k8s-serene-bardeen.yaml | grep nginx-example
nginx-example          nginx-deploy-694c85cdc8-25jk9                1/1     Running   0          89s
------------------------------------------------------------
17:00:42 ~/Work/tutorials/velero » velero backup create nginx-backup-scw --selector app=nginx --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
Backup request "nginx-backup-scw" submitted successfully.
Run `velero backup describe nginx-backup-scw` or `velero backup logs nginx-backup-scw` for more details.
------------------------------------------------------------
17:00:49 ~/Work/tutorials/velero » velero backup describe nginx-backup-scw --details --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
Name:         nginx-backup-scw
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=nginx

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-01-16 17:00:50 +0100 CET
Completed:  2020-01-16 17:00:53 +0100 CET

Expiration:  2020-02-15 17:00:50 +0100 CET

Resource List:
  apiextensions.k8s.io/v1beta1/CustomResourceDefinition:
    - ciliumidentities.cilium.io
  apps/v1/Deployment:
    - nginx-example/nginx-deploy
  apps/v1/ReplicaSet:
    - nginx-example/nginx-deploy-694c85cdc8
  cilium.io/v2/CiliumIdentity:
    - 17697
  v1/Endpoints:
    - nginx-example/nginx-svc
  v1/Namespace:
    - nginx-example
  v1/PersistentVolume:
    - pvc-26ec25fe-6928-425c-b940-6cb055e90fe0
  v1/PersistentVolumeClaim:
    - nginx-example/nginx-logs
  v1/Pod:
    - nginx-example/nginx-deploy-694c85cdc8-25jk9
  v1/Service:
    - nginx-example/nginx-svc

Persistent Volumes: <none included>
------------------------------------------------------------
17:01:24 ~/Work/tutorials/velero » kubectl delete -f ./nginx-example --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
namespace "nginx-example" deleted
persistentvolumeclaim "nginx-logs" deleted
deployment.apps "nginx-deploy" deleted
service "nginx-svc" deleted
------------------------------------------------------------
17:02:01 ~/Work/tutorials/velero » velero restore create toto-scw --from-backup nginx-backup-scw --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
Restore request "toto-scw" submitted successfully.
Run `velero restore describe toto-scw` or `velero restore logs toto-scw` for more details.
------------------------------------------------------------
17:02:40 ~/Work/tutorials/velero » velero restore describe toto-scw --kubeconfig kubeconfig-k8s-serene-bardeen.yaml --details
Name:         toto-scw
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  nginx-backup-scw

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto
------------------------------------------------------------
17:02:54 ~/Work/tutorials/velero » kubectl get pods --all-namespaces --kubeconfig kubeconfig-k8s-serene-bardeen.yaml | grep nginx-example
nginx-example          nginx-deploy-694c85cdc8-25jk9                1/1     Running   0          24s
------------------------------------------------------------
17:05:50 ~/Work/tutorials/velero » aws s3 ls --recursive s3://velero-test-newton
2020-01-16 17:00:53       2535 backups/nginx-backup-scw/nginx-backup-scw-logs.gz
2020-01-16 17:00:53         29 backups/nginx-backup-scw/nginx-backup-scw-podvolumebackups.json.gz
2020-01-16 17:00:53        286 backups/nginx-backup-scw/nginx-backup-scw-resource-list.json.gz
2020-01-16 17:00:53         29 backups/nginx-backup-scw/nginx-backup-scw-volumesnapshots.json.gz
2020-01-16 17:00:53       3705 backups/nginx-backup-scw/nginx-backup-scw.tar.gz
2020-01-16 17:00:53        873 backups/nginx-backup-scw/velero-backup.json
2020-01-16 13:53:40       2272 backups/nginx-backup/nginx-backup-logs.gz
2020-01-16 13:53:40         29 backups/nginx-backup/nginx-backup-podvolumebackups.json.gz
2020-01-16 13:53:40        177 backups/nginx-backup/nginx-backup-resource-list.json.gz
2020-01-16 13:53:40         29 backups/nginx-backup/nginx-backup-volumesnapshots.json.gz
2020-01-16 13:53:40       2509 backups/nginx-backup/nginx-backup.tar.gz
2020-01-16 13:53:40        865 backups/nginx-backup/velero-backup.json
2020-01-16 17:02:42       1139 restores/toto-scw/restore-toto-scw-logs.gz
2020-01-16 17:02:42         49 restores/toto-scw/restore-toto-scw-results.gz
2020-01-16 16:48:20        900 restores/toto/restore-toto-logs.gz
2020-01-16 16:48:20        200 restores/toto/restore-toto-results.gz
2020-01-16 16:52:09        816 restores/toto2/restore-toto2-logs.gz
2020-01-16 16:52:09         49 restores/toto2/restore-toto2-results.gz
2020-01-16 16:52:56        813 restores/toto3/restore-toto3-logs.gz
2020-01-16 16:52:57         49 restores/toto3/restore-toto3-results.gz

Maybe something you are missing is a PVC translation? I can ask someone from the Kubernetes team to answer if you need more help.

The issue lies with the restic version not being the latest one. Velero uses restic 0.9.5, but the fix to use a custom region arrived in restic 0.9.6. The only solution is to update Velero to use a recent restic release. cc @metal3d @vitobotta
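A quick way to confirm which restic binary a given Velero install ships (a sketch, assuming the restic daemonset runs in the velero namespace under its default name from `velero install --use-restic`):

```
# Print the restic version bundled in Velero's restic daemonset.
# Namespace and daemonset name assume a default restic-enabled install;
# adjust them if yours differs.
kubectl -n velero exec daemonset/restic -- restic version
```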

@newtoncorp In my case I only tried restoring and not backing up. Perhaps the problem is mainly with restores, I have no idea. @Sh4d1 For now I have switched back to DigitalOcean Spaces since their API is very compatible with S3 and I have no problems with backups/restores, CORS/direct uploads, versioning... Scaleway is cheaper though so I will try it again when Velero/Restic is upgraded.

@vitobotta what issue exactly do you have when restoring? And on what type of cluster (managed or not, and where) ?

@Sh4d1 It was giving me two errors, one weird one CPU something which I never got with other storage backends, and the other about the region being wrong or something like that. But the settings were definitely correct.

Tested with restic 0.9.6 and it works:

Restic Backups:
  Completed:
    nginx-example/nginx-deploy-694c85cdc8-nn55h: nginx-logs

Hi @Sh4d1 cool, I'm going to try again. What should I do with the current version of Velero to try this?

@vitobotta you can either use the master version of the container image or wait for a new release.

@nrb any idea when a new release is going to be cut?

@Sh4d1 Cool, will give it a try, thanks! :)

@Sh4d1 We're aiming for v1.3 at the end of March, and a v1.2.1 to address CRD restoration issues sometime this week. @skriss Do you think we could include the restic version bump in 1.2.1?

We could consider it since it's just a patch version - I can take a look at the release notes and see if there's anything risky for us.

@Sh4d1 Awesome, with master I was able to restore just fine 👍

Yay 🎉 and with a little bit of luck it'll end up in 1.2.1 😁

@Sh4d1 I am having problems with backups though :(

Fatal: invalid id "fefb274a": no matching ID found: unable to find summary in restic backup command output

What can cause this? I have migrated the bucket from DO to Scaleway. With DO I didn't have this. Any idea?

Hmm, good question 🤔 @newtoncorp any idea? 😅

Nope, I think I read too fast :sweat_smile:

@Sh4d1 I'm having problems with stale locks as well. Argh, restic unlock doesn't work.

@vitobotta only on Scaleway's S3?

How did you migrate the bucket? I had no problem when backing up directly to Scaleway's S3 🤔

I managed to remove the lock on one repo. Now I am trying to back up one thing at a time. I migrated the backups with rclone.
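For reference, that kind of bucket migration looks roughly like this with rclone (the remote names `do` and `scw` are hypothetical and would have been configured beforehand with `rclone config`; bucket names are illustrative):

```
# Copy all Velero objects (backups/, restores/, restic/) from a
# DigitalOcean Spaces bucket to a Scaleway bucket.
# "do" and "scw" are hypothetical rclone remotes.
rclone copy do:velero-backups scw:velero-backups --progress
```

One caveat: restic repositories record their own location, so after moving the restic/ prefix to a new endpoint, the ResticRepository objects in the velero namespace may still reference the old repository URL; that mismatch could contribute to errors like the ones in this thread.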

I managed to back up redis and drone, but it's stuck backing up harbor for some reason with no errors...

Hmm, weird. I'm really not an object storage expert, so I can't really help, I think :(
Maybe it's worth opening another issue with some details though :)

I see the problem. Velero says that the volume is being backed up with restic, but in the restic logs there is no sign of this backup. And apparently I can't delete it because it's in progress.

I restarted the pods and the backup after deleting the previous one, and I get this in the restic logs:

restic-cl22h restic time="2020-01-21T23:20:36Z" level=info msg="No parent snapshot found for PVC, not using --parent flag for this backup" backup=velero/test-harbor controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_
controller.go:253" name=test-harbor-z6xfd namespace=velero
restic-cl22h restic panic: runtime error: slice bounds out of range
restic-cl22h restic
restic-cl22h restic goroutine 94 [running]:
restic-cl22h restic github.com/vmware-tanzu/velero/pkg/restic.getLastLine(0xc000770000, 0x0, 0x200, 0x0, 0xc0000aa001, 0x0)
restic-cl22h restic     /go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:157 +0xcf
restic-cl22h restic github.com/vmware-tanzu/velero/pkg/restic.RunBackup.func1(0xc0006aae10, 0x1f999e0, 0xc00063e640, 0xc0006aadb0, 0xc000702480)
restic-cl22h restic     /go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:99 +0x129
restic-cl22h restic created by github.com/vmware-tanzu/velero/pkg/restic.RunBackup
restic-cl22h restic     /go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:94 +0x153

You'd better open up a new issue with this problem, I think @vitobotta :stuck_out_tongue:

OK...

@vitobotta I'll try to take a look this week though :)

Thanks. I am thinking of trying with an empty bucket on Scaleway. I have migrated everything to this new cluster and don't want to repeat it all once again...

@Sh4d1 Everything worked just fine and I was able to do a full backup (including 6 volumes) with an empty bucket 👍 Apparently it didn't like the existing backups for some reason :D

@vitobotta ah, perfect, glad to hear it 😄
