Velero: add support for restic rclone backend

Created on 4 Mar 2019  ·  40 comments  ·  Source: vmware-tanzu/velero

What steps did you take and what happened:
I installed Velero + restic on a Scaleway Kubernetes cluster. After trying some simple backups, I added persistent volume backup for pods in a namespace.

I created the backup:

velero backup create gitea --include-namespaces=git

No bucket is created, and the backup stays stuck in "InProgress":

velero backup get
NAME      STATUS       CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
gitea     InProgress   0001-01-01 00:00:00 +0000 UTC   29d       default            <none>

Note that other backups, without persistent volume backup, work fine. Buckets are created and backups are stored.

What did you expect to happen:
Backup of the namespace + volumes in my bucket.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

Name:         gitea
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  InProgress

Namespaces:
  Included:  git
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    <n/a>
Completed:  <n/a>

Expiration:  2019-04-03 12:15:09 +0200 CEST

Validation errors:  <none>

Persistent Volumes: <none included>
  • velero backup logs <backupname>
velero backup logs gitea
An error occurred: request failed: <?xml version='1.0' encoding='UTF-8'?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><RequestId>tx26c87a0b00024765b7a2f-005c7cfe63</RequestId><Key>backups/gitea/gitea-logs.gz</Key></Error>

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Velero version (use velero version): v0.11.0
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: kubespray
  • Cloud provider or hardware configuration: Scaleway
  • OS (e.g. from /etc/os-release): CentOS 7.4 on the server, Fedora 29 on the client
Labels: Enhancement, User, Needs info, P2 - Long-term important, Restic

All 40 comments

OK, I found the problem, and I cannot fix it.

 velero restic repo get -o yaml

 message: |-
    error running command=restic init --repo=s3:https://s3.nl-ams.scw.cloud/velero/restic/git --password-file=/tmp/velero-restic-credentials-git393484349 --cache-dir=/scratch/.cache/restic, stdout=, stderr=Fatal: create repository at s3:https://s3.nl-ams.scw.cloud/velero/restic/git failed: client.BucketExists: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'nl-ams'

It's the same problem I had with the "mc" command earlier; I needed to use the "s3v2" API version to avoid it.

Is there a way to fix this?

To be sure that everything is OK:

  • I set the region to "nl-ams" in the BackupStorageLocation object
  • it works for backups without persistent volumes
  • it gives the above error only when I want to back up volumes

So it seems that restic doesn't use the provided region parameter.

Here is my backupstoragelocation:

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: nl-ams
    s3ForcePathStyle: "true"
    s3Url: https://s3.nl-ams.scw.cloud

Hmm. I'm guessing this is an issue within restic itself. Could you try manually creating a restic repo in your object storage? The restic docs should explain how to do that. Let me know if that does or does not work.

You're right:

restic -r s3:https://s3.nl-ams.scw.cloud/restic init
Fatal: create repository at s3:https://s3.nl-ams.scw.cloud/restic failed: client.BucketExists: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'nl-ams'

I found this bug report: https://github.com/restic/restic/issues/2023, where it is explained that restic's "rclone" backend can be used on a preexisting bucket. I don't know whether that could be usable with Velero.

Hello there,

I am currently facing the same issue with Scaleway.
@metal3d did you find a workaround for this issue?

Same here. @metal3d have you found a workaround in the meantime? And you, @Hyrsham?

Hello people,

I've been able to set up Velero to use Scaleway's Kubernetes & Object Storage in the following way.

  • Create a Kube cluster in the scw interface
  • download the kubeconfig
  • create a bucket in the object storage interface (here velero-test-newton in the fr-par region)
  • `brew install velero`
  • create a credentials file:

```
cat > credentials << EOF
[default]
aws_access_key_id=SCW
aws_secret_access_key=
EOF
```

  • install Velero:

```
velero install \
    --provider velero.io/aws \
    --bucket velero-test-newton \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --backup-location-config s3Url=https://s3.fr-par.scw.cloud,region=fr-par \
    --use-volume-snapshots=false \
    --secret-file=./credentials \
    --kubeconfig ./kubeconfig-k8s-serene-bardeen.yaml
```
  • create an example app in Kubernetes:

```
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-example
  labels:
    app: nginx
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-logs
  namespace: nginx-example
  labels:
    app: nginx
spec:
  storageClassName: do-block-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  namespace: nginx-example
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
      containers:
        - image: nginx:stable
          name: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: "/var/log/nginx"
              name: nginx-logs
              readOnly: false
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
  namespace: nginx-example
spec:
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
```
  • `kubectl apply -f ./nginx-example --kubeconfig kubeconfig-k8s-serene-bardeen.yaml`
  • `velero backup create nginx-backup --selector app=nginx --kubeconfig kubeconfig-k8s-serene-bardeen.yaml`
  • `velero backup describe nginx-backup --kubeconfig kubeconfig-k8s-serene-bardeen.yaml` gives:

Name:         nginx-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=nginx

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-01-16 13:53:37 +0100 CET
Completed:  2020-01-16 13:53:40 +0100 CET

Expiration:  2020-02-15 13:53:37 +0100 CET

Persistent Volumes: <none included>
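As an aside, "Persistent Volumes: <none included>" is expected with Velero's opt-in restic integration unless the pod declares which volumes to back up via the backup.velero.io/backup-volumes annotation. A minimal sketch (the generated pod name here is illustrative; yours will differ):

```
# Opt the nginx pod's log volume into restic backup.
# backup.velero.io/backup-volumes is Velero's opt-in annotation for its
# restic integration; namespace and volume name match the example above.
kubectl -n nginx-example annotate pod nginx-deploy-694c85cdc8-25jk9 \
    backup.velero.io/backup-volumes=nginx-logs
```

In practice this annotation is usually added to the Deployment's pod template so it survives pod restarts.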

Hi @newtoncorp, if you do a describe with --details, does it show that the volume has been backed up correctly? How come the storage class in the PVC is DigitalOcean's if you are on Scaleway's managed Kubernetes? I installed Velero exactly the same way, but Restic wouldn't restore PVCs from Scaleway object storage. I haven't tried backups because I was just restoring to a new cluster after migrating the backups from Exoscale to Scaleway. Because Scaleway doesn't seem very S3-compatible, and because I need versioning (which Exoscale doesn't support), I switched to DigitalOcean Spaces and all is good. A shame, though, because Scaleway offers 75GB of free storage and is very cheap.

Can you clarify how you got backups and restores working with Scaleway?

Hello again,

I made a mistake while testing (I don't really know Kubernetes): I've removed the storage class, as the default on Scaleway's Kubernetes is fine. So the pod is running now.

16:58:44 ~/Work/tutorials/velero » kubectl get pods --all-namespaces --kubeconfig kubeconfig-k8s-serene-bardeen.yaml | grep nginx-example
nginx-example          nginx-deploy-694c85cdc8-25jk9                1/1     Running   0          89s
------------------------------------------------------------
17:00:42 ~/Work/tutorials/velero » velero backup create nginx-backup-scw --selector app=nginx --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
Backup request "nginx-backup-scw" submitted successfully.
Run `velero backup describe nginx-backup-scw` or `velero backup logs nginx-backup-scw` for more details.
------------------------------------------------------------
17:00:49 ~/Work/tutorials/velero » velero backup describe nginx-backup-scw --details --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
Name:         nginx-backup-scw
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=nginx

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-01-16 17:00:50 +0100 CET
Completed:  2020-01-16 17:00:53 +0100 CET

Expiration:  2020-02-15 17:00:50 +0100 CET

Resource List:
  apiextensions.k8s.io/v1beta1/CustomResourceDefinition:
    - ciliumidentities.cilium.io
  apps/v1/Deployment:
    - nginx-example/nginx-deploy
  apps/v1/ReplicaSet:
    - nginx-example/nginx-deploy-694c85cdc8
  cilium.io/v2/CiliumIdentity:
    - 17697
  v1/Endpoints:
    - nginx-example/nginx-svc
  v1/Namespace:
    - nginx-example
  v1/PersistentVolume:
    - pvc-26ec25fe-6928-425c-b940-6cb055e90fe0
  v1/PersistentVolumeClaim:
    - nginx-example/nginx-logs
  v1/Pod:
    - nginx-example/nginx-deploy-694c85cdc8-25jk9
  v1/Service:
    - nginx-example/nginx-svc

Persistent Volumes: <none included>
------------------------------------------------------------
17:01:24 ~/Work/tutorials/velero » kubectl delete -f ./nginx-example --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
namespace "nginx-example" deleted
persistentvolumeclaim "nginx-logs" deleted
deployment.apps "nginx-deploy" deleted
service "nginx-svc" deleted
------------------------------------------------------------
17:02:01 ~/Work/tutorials/velero » velero restore create toto-scw --from-backup nginx-backup-scw --kubeconfig kubeconfig-k8s-serene-bardeen.yaml
Restore request "toto-scw" submitted successfully.
Run `velero restore describe toto-scw` or `velero restore logs toto-scw` for more details.
------------------------------------------------------------
17:02:40 ~/Work/tutorials/velero » velero restore describe toto-scw --kubeconfig kubeconfig-k8s-serene-bardeen.yaml --details
Name:         toto-scw
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  nginx-backup-scw

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto
------------------------------------------------------------
17:02:54 ~/Work/tutorials/velero » kubectl get pods --all-namespaces --kubeconfig kubeconfig-k8s-serene-bardeen.yaml | grep nginx-example
nginx-example          nginx-deploy-694c85cdc8-25jk9                1/1     Running   0          24s
------------------------------------------------------------
17:05:50 ~/Work/tutorials/velero » aws s3 ls --recursive s3://velero-test-newton
2020-01-16 17:00:53       2535 backups/nginx-backup-scw/nginx-backup-scw-logs.gz
2020-01-16 17:00:53         29 backups/nginx-backup-scw/nginx-backup-scw-podvolumebackups.json.gz
2020-01-16 17:00:53        286 backups/nginx-backup-scw/nginx-backup-scw-resource-list.json.gz
2020-01-16 17:00:53         29 backups/nginx-backup-scw/nginx-backup-scw-volumesnapshots.json.gz
2020-01-16 17:00:53       3705 backups/nginx-backup-scw/nginx-backup-scw.tar.gz
2020-01-16 17:00:53        873 backups/nginx-backup-scw/velero-backup.json
2020-01-16 13:53:40       2272 backups/nginx-backup/nginx-backup-logs.gz
2020-01-16 13:53:40         29 backups/nginx-backup/nginx-backup-podvolumebackups.json.gz
2020-01-16 13:53:40        177 backups/nginx-backup/nginx-backup-resource-list.json.gz
2020-01-16 13:53:40         29 backups/nginx-backup/nginx-backup-volumesnapshots.json.gz
2020-01-16 13:53:40       2509 backups/nginx-backup/nginx-backup.tar.gz
2020-01-16 13:53:40        865 backups/nginx-backup/velero-backup.json
2020-01-16 17:02:42       1139 restores/toto-scw/restore-toto-scw-logs.gz
2020-01-16 17:02:42         49 restores/toto-scw/restore-toto-scw-results.gz
2020-01-16 16:48:20        900 restores/toto/restore-toto-logs.gz
2020-01-16 16:48:20        200 restores/toto/restore-toto-results.gz
2020-01-16 16:52:09        816 restores/toto2/restore-toto2-logs.gz
2020-01-16 16:52:09         49 restores/toto2/restore-toto2-results.gz
2020-01-16 16:52:56        813 restores/toto3/restore-toto3-logs.gz
2020-01-16 16:52:57         49 restores/toto3/restore-toto3-results.gz

Maybe something you are missing is a PVC translation? I can ask someone from the Kubernetes team to answer if you need more help.

The issue lies with the restic version not being the latest one. Velero uses restic 0.9.5, but the fix to use a custom region arrived in restic 0.9.6. The only solution is to update Velero to use a recent restic release. cc @metal3d @vitobotta
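A quick way to confirm which restic binary a given Velero install ships (a sketch, assuming the restic daemonset runs in the velero namespace under its default name from `velero install --use-restic`):

```
# Print the restic version bundled in Velero's restic daemonset.
# Namespace and daemonset name assume a default restic-enabled install;
# adjust them if yours differs.
kubectl -n velero exec daemonset/restic -- restic version
```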

@newtoncorp In my case I only tried restoring and not backing up. Perhaps the problem is mainly with restores, I have no idea. @Sh4d1 For now I have switched back to DigitalOcean Spaces since their API is very compatible with S3 and I have no problems with backups/restores, CORS/direct uploads, versioning... Scaleway is cheaper though so I will try it again when Velero/Restic is upgraded.

@vitobotta what issue exactly do you have when restoring? And on what type of cluster (managed or not, and where) ?

@Sh4d1 It was giving me two errors, one weird one CPU something which I never got with other storage backends, and the other about the region being wrong or something like that. But the settings were definitely correct.

Tested with restic 0.9.6 and it works:

Restic Backups:
  Completed:
    nginx-example/nginx-deploy-694c85cdc8-nn55h: nginx-logs

Hi @Sh4d1 cool, I'm going to try again. What should I do with the current version of Velero to try this?

@vitobotta you can either use the master version of the container image or wait for a new release.

@nrb any idea when a new release is going to be cut?

@Sh4d1 Cool, will give it a try, thanks! :)

@Sh4d1 We're aiming for v1.3 at the end of March, and a v1.2.1 to address CRD restoration issues sometime this week. @skriss Do you think we could include the restic version bump in 1.2.1?

We could consider it since it's just a patch version - I can take a look at the release notes and see if there's anything risky for us.

@Sh4d1 Awesome, with master I was able to restore just fine 👍

Yay 🎉 and with a little bit of luck it'll end up in 1.2.1 😁

@Sh4d1 I am having problems with backups though :(

Fatal: invalid id "fefb274a": no matching ID found: unable to find summary in restic backup command output

What can cause this? I have migrated the bucket from DO to Scaleway. With DO I didn't have this. Any idea?

Hmm, good question 🤔 @newtoncorp any idea? 😅

Nope, I think I read too fast :sweat_smile:

@Sh4d1 I'm having problems with stale locks as well. Argh, restic unlock doesn't work.

@vitobotta only on Scaleway's S3?

How did you migrate the bucket? I had no problem when backing up directly to Scaleway's S3 🤔

I managed to remove the lock on one repo. Now I am trying to back up one thing at a time. I migrated the backups with rclone.
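For reference, that kind of bucket migration looks roughly like this with rclone (the remote names `do` and `scw` are hypothetical and would have been configured beforehand with `rclone config`; bucket names are illustrative):

```
# Copy all Velero objects (backups/, restores/, restic/) from a
# DigitalOcean Spaces bucket to a Scaleway bucket.
# "do" and "scw" are hypothetical rclone remotes.
rclone copy do:velero-backups scw:velero-backups --progress
```

One caveat: restic repositories record their own location, so after moving the restic/ prefix to a new endpoint, the ResticRepository objects in the velero namespace may still reference the old repository URL; that mismatch could contribute to errors like the ones in this thread.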

I managed to back up redis and drone, but it's stuck backing up harbor for some reason with no errors...

Hmm, weird. I'm really not an object storage expert, so I can't really help, I think :(
Maybe it's worth opening another issue with some details though :)

I see the problem. Velero says that the volume is being backed up with restic, but in the restic logs there is no sign of this backup. And apparently I can't delete it because it's in progress.

I restarted the pods and the backup after deleting the previous one, and I get this in the restic logs:

restic-cl22h restic time="2020-01-21T23:20:36Z" level=info msg="No parent snapshot found for PVC, not using --parent flag for this backup" backup=velero/test-harbor controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_
controller.go:253" name=test-harbor-z6xfd namespace=velero
restic-cl22h restic panic: runtime error: slice bounds out of range
restic-cl22h restic
restic-cl22h restic goroutine 94 [running]:
restic-cl22h restic github.com/vmware-tanzu/velero/pkg/restic.getLastLine(0xc000770000, 0x0, 0x200, 0x0, 0xc0000aa001, 0x0)
restic-cl22h restic     /go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:157 +0xcf
restic-cl22h restic github.com/vmware-tanzu/velero/pkg/restic.RunBackup.func1(0xc0006aae10, 0x1f999e0, 0xc00063e640, 0xc0006aadb0, 0xc000702480)
restic-cl22h restic     /go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:99 +0x129
restic-cl22h restic created by github.com/vmware-tanzu/velero/pkg/restic.RunBackup
restic-cl22h restic     /go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:94 +0x153

You'd better open up a new issue with this problem, I think @vitobotta :stuck_out_tongue:

OK...

@vitobotta I'll try to take a look this week though :)

Thanks. I am thinking of trying with an empty bucket on Scaleway. I have migrated everything to this new cluster and don't want to repeat it all once again...

@Sh4d1 Everything worked just fine and I was able to do a full backup (including 6 volumes) with an empty bucket 👍 Apparently it didn't like the existing backups for some reason :D

@vitobotta ah, perfect, glad to hear it 😄
