What steps did you take and what happened:
I'm trying to restore a restic volume.
My backup contains 2 volumes belonging to 2 deployments.
Backup
tools-bitbucket-backup-prv2 Completed 2019-10-18 08:45:44 +0200 CEST 29d cluster-tools <none>
Persistent Volumes: <none included>
Restic Backups:
Completed:
ok101-bitbucket-pr/bitbucket-postgresql-5-vvwsm: bitbucket-postgresql-data
ok101-bitbucket-pr/bitbucket-server-14-ddrpp: bitbucket-server-data
When I restore from this backup, the Postgres pod is restored properly, but the Bitbucket server is not:
tools-bitbucket-backup-prv2-20191018090027 tools-bitbucket-backup-prv2 InProgress 0 0 2019-10-18 09:00:27 +0200 CEST <none>
velero restore describe tools-bitbucket-backup-prv2-20191018090027
Restic Restores:
Completed:
ok101-bitbucket-pr/bitbucket-postgresql-5-vvwsm: bitbucket-postgresql-data
New:
ok101-bitbucket-pr/bitbucket-server-14-ddrpp: bitbucket-server-data
The init container is not created on the bitbucket-server pod, so the restic restore stays stuck in the "New" phase, but the pod itself is created and running. It shouldn't be.
kubectl get po
NAME READY STATUS RESTARTS AGE
bitbucket-server-15-glk7c 1/1 Running 0 5m
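A quick way to confirm that no restic-wait init container was injected (a sketch, using the pod name from the output above):

```shell
# List the init containers on the restored pod; Velero's restore helper
# should appear as "restic-wait" when the injection worked.
kubectl -n ok101-bitbucket-pr get pod bitbucket-server-15-glk7c \
  -o jsonpath='{.spec.initContainers[*].name}'
# Empty output means no restic-wait init container was injected.
```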
**Restic log**
time="2019-10-18T07:00:48Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok24-velero/velero-7f5d784896-l66m7 logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:00:48Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok24-velero/velero-7f5d784896-l66m7 logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:00:59Z" level=debug msg="Restore's pod ok101-bitbucket-pr/bitbucket-postgresql-5-vvwsm not found, not enqueueing." controller=pod-volume-restore error="pod \"bitbucket-postgresql-5-vvwsm\" not found" logSource="pkg/controller/pod_volume_restore_controller.go:137" name=tools-bitbucket-backup-prv2-20191018090027-6dkgf namespace=ok24-velero restore=ok24-velero/tools-bitbucket-backup-prv2-20191018090027
time="2019-10-18T07:00:59Z" level=debug msg="Restore's pod ok101-bitbucket-pr/bitbucket-server-14-ddrpp not found, not enqueueing." controller=pod-volume-restore error="pod \"bitbucket-server-14-ddrpp\" not found" logSource="pkg/controller/pod_volume_restore_controller.go:137" name=tools-bitbucket-backup-prv2-20191018090027-gghc8 namespace=ok24-velero restore=ok24-velero/tools-bitbucket-backup-prv2-20191018090027
time="2019-10-18T07:01:01Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-deploy logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:01Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-deploy logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:03Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-deploy logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:09Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-glk7c logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:09Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-glk7c logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:11Z" level=debug msg="Restore is not new, not enqueuing" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:131" name=tools-bitbucket-backup-prv2-20191018090027-6dkgf namespace=ok24-velero restore=ok24-velero/tools-bitbucket-backup-prv2-20191018090027
time="2019-10-18T07:01:12Z" level=debug msg="Restore is not new, not enqueuing" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:131" name=tools-bitbucket-backup-prv2-20191018090027-6dkgf namespace=ok24-velero restore=ok24-velero/tools-bitbucket-backup-prv2-20191018090027
time="2019-10-18T07:01:13Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-glk7c logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:14Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-deploy logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:14Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok101-bitbucket-pr/bitbucket-server-15-deploy logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:19Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok24-velero/velero-7f5d784896-l66m7 logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:20Z" level=debug msg="Pod is not running restic-wait init container, not enqueuing restores for pod" controller=pod-volume-restore key=ok24-velero/velero-7f5d784896-l66m7 logSource="pkg/controller/pod_volume_restore_controller.go:170"
time="2019-10-18T07:01:22Z" level=debug msg="Restore is not new, not enqueuing" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:131" name=tools-bitbucket-backup-prv2-20191018090027-6dkgf namespace=ok24-velero restore=ok24-velero/tools-bitbucket-backup-prv2-20191018090027
time="2019-10-18T07:01:31Z" level=debug msg="Restore is not new, not enqueuing" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:131" name=tools-bitbucket-backup-prv2-20191018090027-6dkgf namespace=ok24-velero restore=ok24-velero/tools-bitbucket-backup-prv2-20191018090027
W1018 07:05:36.261218 1 reflector.go:302] github.com/vmware-tanzu/velero/pkg/cmd/cli/restic/server.go:197: watch of *v1.Secret ended with: The resourceVersion for the provided watch is too old.
W1018 07:05:52.312131 1 reflector.go:302] github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117: watch of *v1.PodVolumeBackup ended with: The resourceVersion for the provided watch is too old.
**Velero Log**
https://gist.github.com/Stolr/02ee7e4ee7d662b94df52de93f953ab3
**PodVolumeRestore**
kubectl -n ok24-velero get podvolumerestores -l velero.io/restore-name=tools-bitbucket-backup-prv2-20191018090027 -o yaml
apiVersion: v1
items:
- apiVersion: velero.io/v1
  kind: PodVolumeRestore
  metadata:
    creationTimestamp: 2019-10-18T07:00:59Z
    generateName: tools-bitbucket-backup-prv2-20191018090027-
    generation: 1
    labels:
      velero.io/pod-uid: 0a33afb5-f175-11e9-967b-005056b9b6b7
      velero.io/restore-name: tools-bitbucket-backup-prv2-20191018090027
      velero.io/restore-uid: f7012bc8-f174-11e9-bf99-005056b9c7f4
    name: tools-bitbucket-backup-prv2-20191018090027-6dkgf
    namespace: ok24-velero
    ownerReferences:
    - apiVersion: velero.io/v1
      controller: true
      kind: Restore
      name: tools-bitbucket-backup-prv2-20191018090027
      uid: f7012bc8-f174-11e9-bf99-005056b9c7f4
    resourceVersion: "853596"
    selfLink: /apis/velero.io/v1/namespaces/ok24-velero/podvolumerestores/tools-bitbucket-backup-prv2-20191018090027-6dkgf
    uid: 0a35ffec-f175-11e9-967b-005056b9b6b7
  spec:
    backupStorageLocation: cluster-tools
    pod:
      kind: Pod
      name: bitbucket-postgresql-5-vvwsm
      namespace: ok101-bitbucket-pr
      uid: 0a33afb5-f175-11e9-967b-005056b9b6b7
    repoIdentifier: s3:http://oca-miniolb.oca.local/velero/tools/restic/ok101-bitbucket-pr
    snapshotID: 4bd49d6e
    volume: bitbucket-postgresql-data
  status:
    completionTimestamp: 2019-10-18T07:01:31Z
    message: ""
    phase: Completed
    progress:
      bytesDone: 83536468
      totalBytes: 83536468
    startTimestamp: 2019-10-18T07:01:10Z
- apiVersion: velero.io/v1
  kind: PodVolumeRestore
  metadata:
    creationTimestamp: 2019-10-18T07:00:59Z
    generateName: tools-bitbucket-backup-prv2-20191018090027-
    generation: 1
    labels:
      velero.io/pod-uid: 0a3b5638-f175-11e9-967b-005056b9b6b7
      velero.io/restore-name: tools-bitbucket-backup-prv2-20191018090027
      velero.io/restore-uid: f7012bc8-f174-11e9-bf99-005056b9c7f4
    name: tools-bitbucket-backup-prv2-20191018090027-gghc8
    namespace: ok24-velero
    ownerReferences:
    - apiVersion: velero.io/v1
      controller: true
      kind: Restore
      name: tools-bitbucket-backup-prv2-20191018090027
      uid: f7012bc8-f174-11e9-bf99-005056b9c7f4
    resourceVersion: "853188"
    selfLink: /apis/velero.io/v1/namespaces/ok24-velero/podvolumerestores/tools-bitbucket-backup-prv2-20191018090027-gghc8
    uid: 0a3d1615-f175-11e9-967b-005056b9b6b7
  spec:
    backupStorageLocation: cluster-tools
    pod:
      kind: Pod
      name: bitbucket-server-14-ddrpp
      namespace: ok101-bitbucket-pr
      uid: 0a3b5638-f175-11e9-967b-005056b9b6b7
    repoIdentifier: s3:http://oca-miniolb.oca.local/velero/tools/restic/ok101-bitbucket-pr
    snapshotID: e5a5986e
    volume: bitbucket-server-data
  status:
    completionTimestamp: null
    message: ""
    phase: ""
    startTimestamp: null
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
**Environment**
velero version
Client:
Version: v1.1.0
Git commit: a357f21
Server:
Version: v1.1.0
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
openshift v3.11.0+bdd37ad-314
kubernetes v1.11.0+d4cacc0
The namespace does not exist before the restore, so every resource is new on the cluster.
Any ideas?
Thanks a lot.
hmm, based on the following lines:
W1018 07:05:36.261218 1 reflector.go:302] github.com/vmware-tanzu/velero/pkg/cmd/cli/restic/server.go:197: watch of *v1.Secret ended with: The resourceVersion for the provided watch is too old.
W1018 07:05:52.312131 1 reflector.go:302] github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117: watch of *v1.PodVolumeBackup ended with: The resourceVersion for the provided watch is too old.
it looks like there might be an issue with the informer caches.
Could you try deleting all of the restic daemonset pods, letting them get re-created, and then trying another restore? (you'll want to delete the target namespace as well before kicking off the new restore)
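Something along these lines should do it (a sketch; it assumes the `name=restic` label the default daemonset ships with, and the namespace should be wherever your restic daemonset runs, e.g. `ok24-velero` in your case):

```shell
# Delete the restic daemonset pods so they get re-created with fresh
# informer caches, then wait for the replacements to be ready.
kubectl -n velero delete pods -l name=restic
kubectl -n velero rollout status daemonset/restic
```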
Hi @skriss
Thanks for the answer.
I already tried this.
I'm restoring onto another cluster. It's a fresh one, so there should not be any cache, right?
Could restoring to another cluster be the issue?
I'm not able to try it until Monday, but I'm not sure it will fix the issue since I already tried on a fresh instance.
Any other ideas? :)
Hey,
So I made a fresh new install to test this and make sure it's not a cache issue.
Here is the whole procedure to help you debug (I install the restic DaemonSet before Velero because I adapted it for OKD; maybe that is the issue).
Installation on Cluster Tools && Cluster Tools-B:
kubectl create ns velero
namespace/velero created
oc annotate namespace velero openshift.io/node-selector=""
namespace/velero annotated
oc adm policy add-scc-to-user privileged system:serviceaccount:velero:velero
scc "privileged" added to: ["system:serviceaccount:velero:velero"]
oc apply -f serviceAccount.yaml
serviceaccount/velero created
kubectl apply -f daemonSetrestic.yaml
daemonset.extensions/restic created
velero install \
--provider aws \
--bucket velero \
--use-restic \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://oca-miniolb.oca.local/
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
CustomResourceDefinition/resticrepositories.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: already exists, proceeding
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: already exists, proceeding
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: created
DaemonSet/restic: attempting to create resource
DaemonSet/restic: already exists, proceeding
DaemonSet/restic: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
Cluster Tools Backup location
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: 2019-10-21T06:14:03Z
  generation: 1
  labels:
    component: velero
  name: default
  namespace: velero
  resourceVersion: "48443311"
  selfLink: /apis/velero.io/v1/namespaces/velero/backupstoragelocations/default
  uid: fb1ae7ac-f3c9-11e9-843a-005056b9cf2b
spec:
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://oca-miniolb.oca.local/
  objectStorage:
    bucket: velero
    prefix: "tools"
  provider: aws
Cluster Tools-B Backup location
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: 2019-10-21T06:16:31Z
  generation: 1
  labels:
    component: velero
  name: default
  namespace: velero
  resourceVersion: "1743001"
  selfLink: /apis/velero.io/v1/namespaces/velero/backupstoragelocations/default
  uid: 53050648-f3ca-11e9-8991-005056b92845
spec:
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://oca-miniolb.oca.local/
  objectStorage:
    bucket: velero
    prefix: "tools-b"
  provider: aws
velero backup-location create cluster-tools \
--provider aws \
--bucket velero \
--access-mode ReadOnly \
--config region=minio,s3ForcePathStyle="true",s3Url=http://oca-miniolb.oca.local/
Then I edited the BackupStorageLocation cluster-tools to add the "tools" prefix.
Now everything is running fine:
kubectl get po
NAME READY STATUS RESTARTS AGE
restic-2s7n8 1/1 Running 0 6m
restic-2zcr7 1/1 Running 0 6m
restic-7wbrt 1/1 Running 0 6m
restic-8zn8n 1/1 Running 0 6m
restic-cfq77 1/1 Running 0 6m
restic-djvrj 1/1 Running 0 6m
restic-kpnvt 1/1 Running 0 6m
restic-n58w2 1/1 Running 0 6m
restic-n5c6x 1/1 Running 0 6m
restic-ssvpp 1/1 Running 0 6m
restic-wjxj7 1/1 Running 0 6m
restic-wsj94 1/1 Running 0 6m
restic-xgxjt 1/1 Running 0 6m
restic-zvfvj 1/1 Running 0 6m
velero-df87fbb89-m2tbh 1/1 Running 2 6m
**Cluster Tools:**
Creating the backup
kubectl -n ok101-bitbucket-pr annotate pod/bitbucket-postgresql-5-vvwsm backup.velero.io/backup-volumes=bitbucket-postgresql-data
kubectl -n ok101-bitbucket-pr annotate pod/bitbucket-server-14-ddrpp backup.velero.io/backup-volumes=bitbucket-server-data
velero backup create tools-bitbucket-backup --include-namespaces=ok101-bitbucket-pr
Backup get
velero backup get
NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR
tools-bitbucket-backup PartiallyFailed (2 errors) 2019-10-21 08:23:26 +0200 CEST 29d default <none>
Velero Logs:
https://gist.github.com/Stolr/23d0dd11b301150ccb336a12b77107a1
**Backup description**
velero backup describe tools-bitbucket-backup --details
https://gist.github.com/Stolr/9b862178df8f951cbd9b50357bd502c8
**Backup logs**
velero backup logs tools-bitbucket-backup
https://gist.github.com/Stolr/293051c52536541fec55f924f76386be
I can see there are 2 errors, but it says my restic backups are completed, so they should not be relevant. They are probably caused by some incorrect pods in that namespace. The first time, I didn't have those errors, but the restic issue was still there.
**Now, on Cluster Tools-B**
velero backup get
NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR
tools-bitbucket-backup PartiallyFailed (2 errors) 2019-10-21 08:23:26 +0200 CEST 29d cluster-tools
velero restore create --include-namespaces=ok101-bitbucket-pr --from-backup tools-bitbucket-backup
Restore request "tools-bitbucket-backup-20191021083745" submitted successfully.
Run velero restore describe tools-bitbucket-backup-20191021083745 or velero restore logs tools-bitbucket-backup-20191021083745 for more details.
Same issue:
The restore stays InProgress because the restic volume is not restored.
velero restore get
NAME BACKUP STATUS WARNINGS ERRORS CREATED SELECTOR
tools-bitbucket-backup-20191021083745 tools-bitbucket-backup InProgress 0 0 2019-10-21 08:37:45 +0200 CEST
kubectl get po -n ok101-bitbucket-pr
NAME READY STATUS RESTARTS AGE
bitbucket-postgresql-5-vvwsm 0/1 PodInitializing 0 39s
bitbucket-pr-data-backup-1571351700-gfhf4 0/1 Pending 0 39s
bitbucket-pr-data-backup-1571438100-jb2dv 0/1 Pending 0 39s
bitbucket-pr-postgresql-backup-1571436000-rtsf7 0/1 Pending 0 39s
bitbucket-server-15-6528q 1/1 Running 0 34s
**Restic Logs**
https://gist.github.com/Stolr/dabac536a5235b87ecd184045ab2e7b5
**Velero Logs**
https://gist.github.com/Stolr/5dfc2f7fea9c63f0ddbd61d9276ac984
**Restore Logs**
Not available:
Logs for restore "tools-bitbucket-backup-20191021083745" are not available until it's finished processing. Please wait until the restore has a phase of Completed or Failed and try again.
**PodVolumeRestore**
kubectl -n velero get podvolumerestores -l velero.io/restore-name=tools-bitbucket-backup-20191021083745 -o yaml
apiVersion: v1
items:
My Bitbucket data is not restored, and no init container is created. But the Postgres one is working as expected.
Do you see anything in all these logs that could explain this?
Thanks for your help!
@Stolr i'm not exactly sure what's going on, but I do see that during the backup, the pod that's being backed up is bitbucket-server-14-ddrpp, and then during/after restore, you end up with pod bitbucket-server-15-6528q. I do see in the Velero server log that during the restore, pod bitbucket-server-14-ddrpp is restored, but it seems like it's probably being deleted and replaced with bitbucket-server-15-6528q.
I'm not super-familiar with OpenShift's deploymentconfigs and (apparently) their use of replication controllers, but in plain vanilla Kubernetes, the way this would work is we'd restore pod "14", then restore the replicaset controlling it, and that replicaset would see pod "14" and "adopt" it. It seems like something about the deploymentconfig/replicationcontroller is possibly preventing this "adoption" from happening, and triggering the creation of a new pod "15".
Does this ring any bells for you? Maybe we can figure it out together :)
@Stolr @skriss sorry to bump into conversation, just a thought. Instead of annotating the pod itself, can you try annotating the pod template spec of the parent controller, i.e.: Deployment or ReplicationController?
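For example, something like this (a sketch; the deployment and volume names are taken from this thread and would need adjusting):

```shell
# Put the restic annotation on the pod template spec, so any pod the
# controller (re-)creates carries it, instead of annotating only the
# currently running pod.
kubectl -n ok101-bitbucket-pr patch deployment bitbucket-server \
  --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"bitbucket-server-data"}}}}}'
```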
@skriss Wow thanks !!
You are right, the deployment number is not the same.
For some reason, OpenShift triggers a new deploy, probably because of all the resources being restored. There is no way to get back to revision 14, even with a rollback.
I'm also not super familiar with OpenShift.
I got rid of that DeploymentConfig (no point in using it) and adapted everything to use a normal Deployment.
Everything is working as expected using a Deployment.
@yashbhutwala: I might try this when I'm able to. Thanks for your answer.
Thanks to you both for your help. Since this issue is related to OpenShift, feel free to close or rename it.
Best Regards
Thanks again for helping me getting through this.
@sseago @dymurray do you guys have any thoughts on what's going on here? (https://github.com/vmware-tanzu/velero/issues/1981#issuecomment-544709044)
Off the top of my head, I'm not sure what's going on, although I haven't looked at the logs in detail yet. The redeployment of a new pod may well be affecting things here, since the new pod probably won't have the restic annotation.

For the work my group has been doing, we actually do a two-phase backup/restore, in part to eliminate as much complexity as possible from the environment restic is working in. We create a full backup without any restic annotations, and then a limited backup with just the PVs/PVCs and the pods which mount them, with the restic annotations. On restore, we first restore the restic backup (pods only; no deployments, deploymentconfigs, etc.) -- this is when the restic copies happen. Then those restored pods are deleted and we do the full restore (without restic annotations).

I don't know that all of this is necessary for a basic backup/restore -- in our case we're using it for app migration from one cluster to another, with the possibility of running the restic/PV migration more than once before the final migration. In any case, if you're restoring deploymentconfigs which then roll out new pods post-restore, that could definitely interfere with restic.

I don't know what the appropriate general-purpose answer is here -- our approach has been for a very specific migration use case. I wonder whether the same issue comes up with non-OpenShift resources: DaemonSets, Deployments, etc. Annotating the pod template spec, as suggested above (in addition to annotating the pod), may be the way to go here. I'm not sure whether it will resolve this issue completely or not, though.
To add to what Scott said: yes, we hit this same problem very early on. This problem extends beyond OCP-specific restores; my understanding is that any pod managed by another resource faces this risk.
If a pod is managed by another resource, the restic restore will generally fail, since both the pod and the managing resource are restored, which causes the initial pod (with the restic annotation) to be overwritten. I could have sworn there was an open issue on this, but I can't seem to find it right now.
@dymurray not sure if this covers all of what you say, but I opened an issue a month ago about a similar problem. See: #1919
> If a pod is managed by another resource the restic restore will generally fail since both the pod and the managing resource is restored which causes the initial pod (with the restic annotation) to be overwritten.
We haven't seen this, at least not with pods managed by replicasets/deployments. Per my comment (https://github.com/vmware-tanzu/velero/issues/1981#issuecomment-544709044), during a restic restore, we first restore the pod & trigger a restic restore, then restore the owning replicaset and deployment. The pod is successfully "adopted" by the replicaset, since the pod's spec matches the pod template spec from the replicaset.
If that behavior were different, then I agree it would likely cause problems with restic restores, which seems to be what we're seeing here. Can you shed any more light onto why the DeploymentConfig restore is triggering the creation of a new pod, rather than adopting the existing one?
From what I've seen with DeploymentConfigs they don't always trigger new pods, but sometimes they do. I believe they actually do (initially) adopt the restored pod, as expected, but if there's a ConfigChange trigger registered, then the restore event on the DeploymentConfig will sometimes trigger that if the restore process looks like a configuration change. Most of my experience here is in restoring resources to a different cluster than the backup came from, with some spec params modified by a plugin on restore ("image" references, for example, if the image is located in an in-cluster registry). The pod as restored will run for a short amount of time, but will terminate as soon as the ConfigChange triggered replacement is ready. Most recently, this week I've restored a couple DeploymentConfigs to the same cluster as the backup was run in, and in that case I did not see a replacement being created post-restore.
So I spent some time digging into this, and based on what I've learned I can say that yes, the method Velero currently takes with restic restores has its shortcomings. Currently, we are lucky that a deployment doesn't trigger a new generation of the pod in 99% of restore use cases. If you specifically trigger a redeploy during the restic restore, then things will break, as shown in #1919.
With deploymentconfigs, there are a number of triggers you can set which will cause a pod to be redeployed, but the bigger issue is that currently with DCs the pod is restored first with the restic annotation, and then later adopted by the DC controller and redeployed, wiping the annotation out. If a plugin is used to skip restoring a pod when it's managed by a DC, in conjunction with placing the annotation on the DC pod template spec, then the restic restore has a good chance of succeeding. But the broader concern that Kubernetes could trigger a new rollout for deployments and deploymentconfigs during restore is a larger problem that still needs to be solved.
Open to ideas on how to improve this. The data populator KEP that's making the rounds upstream may be relevant/useful, though AFAIK it's only for PVs, not arbitrary pod volumes.
Well, I had just the same problem! The restore completed with no errors in the logs, but the PV is completely empty! Sucks.
I wanted to restore only the PVC along with the PV itself, and did:
velero restore create --from-backup daily-20200528020046 --include-namespaces test-project --include-resources persistentvolumeclaims,persistentvolumes --restore-volumes=true
Completed, no errors. But there is no data at all.
I did not suspect that. Is there any way to make it work with restic?
I have a DeploymentConfig, but "replicas" is set to 0 and I removed ConfigChange from its triggers.
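For reference, removing the ConfigChange trigger can be done with something like this (the DeploymentConfig name here is a placeholder):

```shell
# Remove the ConfigChange trigger so a restore does not kick off a new
# rollout that would replace the restic-annotated pod.
oc -n test-project set triggers dc/my-app --from-config --remove
```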
Interestingly, I had tested this before, but only after removing the whole project, and then it was OK and the data was even there. So it only works when restoring whole projects? Is it not possible to restore just a volume?
I can confirm: I can only restore volumes by restoring a whole project, i.e. a whole namespace, and it must be empty.
You cannot restore volumes if objects like deployments already exist. You cannot restore a PVC together with its PV separately using restic.
So in my case, I needed to restore to a mapped temporary namespace, go there and scale everything down, then spin up a new pod just to attach the PV and rsync the data out of the volume to my host. Then I deleted the temporary namespace, ran the helper pod again in my original project, attached the PV there, and rsynced all the data back. Later I did a chown with the user ID of the container, removed the helper pod, and finally scaled the deployment back up. It worked and the data from the backup snapshot was there, but the whole process is very inconvenient and clumsy.
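A sketch of the helper-pod step from that workaround (the namespace, PVC name, and image are placeholders for whatever your setup uses):

```shell
# Run a throwaway pod that mounts the restored PVC, then copy the data
# out to the host once it's ready.
kubectl -n temp-restore run pv-helper --image=busybox --restart=Never \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "pv-helper",
      "image": "busybox",
      "command": ["sleep", "3600"],
      "volumeMounts": [{"name": "data", "mountPath": "/data"}]
    }],
    "volumes": [{"name": "data",
                 "persistentVolumeClaim": {"claimName": "restored-pvc"}}]
  }
}'
kubectl -n temp-restore wait --for=condition=Ready pod/pv-helper
kubectl -n temp-restore cp pv-helper:/data ./pv-data
```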
I'm facing this issue when restoring a backup of prometheus-operator. My restore tests were done in the same cluster where the backup lives, but in another namespace. The production application was still live in its own namespace.
My cluster is running in EKS, version 1.16.
There are three PVs that should be backed up: grafana, prometheus, and alertmanager. The Prometheus and Grafana PVs could be restored without problems, but the Alertmanager PV could not, because the alertmanager StatefulSet is dynamically created by an Alertmanager object (from the monitoring.coreos.com/v1 API). I can see in the Velero logs that it successfully restored the alertmanager pod and injected the restic-wait container into it. But when the Alertmanager object is restored, it creates the StatefulSet, which replaces the pod.
These are the Velero logs proving the restic-wait container was created on the alertmanager pod:
time="2020-08-18T11:23:39Z" level=info msg="Restoring resource 'pods' into namespace 'monitoring-restored'" logSource="pkg/restore/restore.go:702" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Getting client for /v1, Kind=Pod" logSource="pkg/restore/restore.go:746" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing item action for pods" logSource="pkg/restore/restore.go:964" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing AddPVCFromPodAction" cmd=/velero logSource="pkg/restore/add_pvc_from_pod_action.go:44" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Adding PVC monitoring/alertmanager-prometheus-operator-alertmanager-db-alertmanager-prometheus-operator-alertmanager-0 as an additional item to restore" cmd=/velero logSource="pkg/restore/add_pvc_from_pod_action.go:58" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Skipping persistentvolumeclaims/monitoring-restored/alertmanager-prometheus-operator-alertmanager-db-alertmanager-prometheus-operator-alertmanager-0 because it's already been restored." logSource="pkg/restore/restore.go:844" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing item action for pods" logSource="pkg/restore/restore.go:964" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing item action for pods" logSource="pkg/restore/restore.go:964" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Executing ResticRestoreAction" cmd=/velero logSource="pkg/restore/restic_restore_action.go:69" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Restic backups for pod found" cmd=/velero logSource="pkg/restore/restic_restore_action.go:95" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Getting plugin config" cmd=/velero logSource="pkg/restore/restic_restore_action.go:99" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="No config found for plugin" cmd=/velero logSource="pkg/restore/restic_restore_action.go:160" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Using image \"velero/velero-restic-restore-helper:v1.4.2\"" cmd=/velero logSource="pkg/restore/restic_restore_action.go:106" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="No config found for plugin" cmd=/velero logSource="pkg/restore/restic_restore_action.go:195" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="No config found for plugin" cmd=/velero logSource="pkg/restore/restic_restore_action.go:206" pluginName=velero pod=monitoring/alertmanager-prometheus-operator-alertmanager-0 restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Done executing ResticRestoreAction" cmd=/velero logSource="pkg/restore/restic_restore_action.go:155" pluginName=velero restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=info msg="Attempting to restore Pod: alertmanager-prometheus-operator-alertmanager-0" logSource="pkg/restore/restore.go:1070" restore=velero/monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Acquiring lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:122" volumeNamespace=monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Acquired lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:131" volumeNamespace=monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Ready repository found" backupLocation=default logSource="pkg/restic/repository_ensurer.go:147" volumeNamespace=monitoring
time="2020-08-18T11:23:39Z" level=debug msg="Released lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:128" volumeNamespace=monitoring
One second later, the Alertmanager object itself is restored:
time="2020-08-18T11:23:40Z" level=info msg="Restoring resource 'alertmanagers.monitoring.coreos.com' into namespace 'monitoring-restored'" logSource="pkg/restore/restore.go:702" restore=velero/monitoring
time="2020-08-18T11:23:40Z" level=info msg="Getting client for monitoring.coreos.com/v1, Kind=Alertmanager" logSource="pkg/restore/restore.go:746" restore=velero/monitoring
time="2020-08-18T11:23:40Z" level=info msg="Attempting to restore Alertmanager: prometheus-operator-alertmanager" logSource="pkg/restore/restore.go:1070" restore=velero/monitoring
This is the backup's content:
velero backup describe monitoring --details
Name: monitoring
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.16.8-eks-e16311
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=16+
Phase: Completed
Errors: 0
Warnings: 0
Namespaces:
Included: monitoring
Excluded: <none>
Resources:
Included: *
Excluded: certificates.cert-manager.io, certificaterequests.cert-manager.io, orders.acme.cert-manager.io
Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1
Started: 2020-08-18 10:17:16 +0200 CEST
Completed: 2020-08-18 10:17:57 +0200 CEST
Expiration: 2020-09-17 10:17:16 +0200 CEST
Total items to be backed up: 234
Items backed up: 234
Resource List:
apiextensions.k8s.io/v1/CustomResourceDefinition:
- alertmanagers.monitoring.coreos.com
- prometheuses.monitoring.coreos.com
- prometheusrules.monitoring.coreos.com
- servicemonitors.monitoring.coreos.com
apps/v1/ControllerRevision:
- monitoring/alertmanager-prometheus-operator-alertmanager-54df75fb5b
- monitoring/prometheus-operator-prometheus-node-exporter-599f4fbbfd
- monitoring/prometheus-prometheus-operator-prometheus-6cbd9d8d8b
apps/v1/DaemonSet:
- monitoring/prometheus-operator-prometheus-node-exporter
apps/v1/Deployment:
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
apps/v1/ReplicaSet:
- monitoring/prometheus-operator-grafana-5986dbf74f
- monitoring/prometheus-operator-grafana-7ff4f8b97b
- monitoring/prometheus-operator-kube-state-metrics-6f8cc5ffd5
- monitoring/prometheus-operator-operator-fd978d8d7
apps/v1/StatefulSet:
- monitoring/alertmanager-prometheus-operator-alertmanager
- monitoring/prometheus-prometheus-operator-prometheus
extensions/v1beta1/Ingress:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-prometheus
monitoring.coreos.com/v1/Alertmanager:
- monitoring/prometheus-operator-alertmanager
monitoring.coreos.com/v1/Prometheus:
- monitoring/prometheus-operator-prometheus
monitoring.coreos.com/v1/PrometheusRule:
- monitoring/prometheus-operator-alertmanager.rules
- monitoring/prometheus-operator-etcd
- monitoring/prometheus-operator-general.rules
- monitoring/prometheus-operator-k8s.rules
- monitoring/prometheus-operator-kube-apiserver-slos
- monitoring/prometheus-operator-kube-apiserver.rules
- monitoring/prometheus-operator-kube-prometheus-general.rules
- monitoring/prometheus-operator-kube-prometheus-node-recording.rules
- monitoring/prometheus-operator-kube-scheduler.rules
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-kubelet.rules
- monitoring/prometheus-operator-kubernetes-apps
- monitoring/prometheus-operator-kubernetes-resources
- monitoring/prometheus-operator-kubernetes-storage
- monitoring/prometheus-operator-kubernetes-system
- monitoring/prometheus-operator-kubernetes-system-apiserver
- monitoring/prometheus-operator-kubernetes-system-controller-manager
- monitoring/prometheus-operator-kubernetes-system-kubelet
- monitoring/prometheus-operator-kubernetes-system-scheduler
- monitoring/prometheus-operator-node-exporter
- monitoring/prometheus-operator-node-exporter.rules
- monitoring/prometheus-operator-node-network
- monitoring/prometheus-operator-node.rules
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-operator
monitoring.coreos.com/v1/ServiceMonitor:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-apiserver
- monitoring/prometheus-operator-coredns
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-controller-manager
- monitoring/prometheus-operator-kube-etcd
- monitoring/prometheus-operator-kube-proxy
- monitoring/prometheus-operator-kube-scheduler
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-kubelet
- monitoring/prometheus-operator-node-exporter
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
networking.k8s.io/v1beta1/Ingress:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-prometheus
rbac.authorization.k8s.io/v1/ClusterRole:
- prometheus-operator-grafana-clusterrole
- prometheus-operator-kube-state-metrics
- prometheus-operator-operator
- prometheus-operator-operator-psp
- prometheus-operator-prometheus
- prometheus-operator-prometheus-psp
- psp-prometheus-operator-kube-state-metrics
- psp-prometheus-operator-prometheus-node-exporter
rbac.authorization.k8s.io/v1/ClusterRoleBinding:
- prometheus-operator-grafana-clusterrolebinding
- prometheus-operator-kube-state-metrics
- prometheus-operator-operator
- prometheus-operator-operator-psp
- prometheus-operator-prometheus
- prometheus-operator-prometheus-psp
- psp-prometheus-operator-kube-state-metrics
- psp-prometheus-operator-prometheus-node-exporter
rbac.authorization.k8s.io/v1/Role:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test
rbac.authorization.k8s.io/v1/RoleBinding:
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test
v1/ConfigMap:
- monitoring/prometheus-operator-apiserver
- monitoring/prometheus-operator-cluster-total
- monitoring/prometheus-operator-controller-manager
- monitoring/prometheus-operator-etcd
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-config-dashboards
- monitoring/prometheus-operator-grafana-datasource
- monitoring/prometheus-operator-grafana-test
- monitoring/prometheus-operator-k8s-coredns
- monitoring/prometheus-operator-k8s-resources-cluster
- monitoring/prometheus-operator-k8s-resources-namespace
- monitoring/prometheus-operator-k8s-resources-node
- monitoring/prometheus-operator-k8s-resources-pod
- monitoring/prometheus-operator-k8s-resources-workload
- monitoring/prometheus-operator-k8s-resources-workloads-namespace
- monitoring/prometheus-operator-kubelet
- monitoring/prometheus-operator-namespace-by-pod
- monitoring/prometheus-operator-namespace-by-workload
- monitoring/prometheus-operator-node-cluster-rsrc-use
- monitoring/prometheus-operator-node-rsrc-use
- monitoring/prometheus-operator-nodes
- monitoring/prometheus-operator-persistentvolumesusage
- monitoring/prometheus-operator-pod-total
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-proxy
- monitoring/prometheus-operator-scheduler
- monitoring/prometheus-operator-statefulset
- monitoring/prometheus-operator-workload-total
- monitoring/prometheus-prometheus-operator-prometheus-rulefiles-0
v1/Endpoints:
- monitoring/alertmanager-operated
- monitoring/prometheus-operated
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-node-exporter
v1/Event:
- monitoring/prometheus-operator-admission-create-ngxh5.162c4eac037b378f
- monitoring/prometheus-operator-admission-create-ngxh5.162c4eac3d7a4c20
- monitoring/prometheus-operator-admission-create-ngxh5.162c4eacfd856868
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead0a39ac70
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead13445eeb
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead713ac0dc
- monitoring/prometheus-operator-admission-create-ngxh5.162c4ead8cff268e
- monitoring/prometheus-operator-admission-create.162c4eac0309e0cb
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb4cb068bca
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb517441275
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb51d3ac352
- monitoring/prometheus-operator-admission-patch-4pt6r.162c4eb52b6739be
- monitoring/prometheus-operator-admission-patch.162c4eb4ca533c92
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e619ce31070
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e637284176b
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e71b870b6b4
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7b2a4186a1
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ba8d7beae
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c23594737
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c2b84195d
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c36882b14
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7c67022081
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7e1d1be052
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7e2fa25cfc
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7e40871cd9
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ec738b8ab
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7eca575457
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ed9a8b480
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e7ed9d7309c
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e826db5397b
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e82883b0412
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4e82a018a2d3
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4eb25495f5c9
- monitoring/prometheus-operator-grafana-5986dbf74f-nv429.162c4eb25496faf5
- monitoring/prometheus-operator-grafana-5986dbf74f-q7q88.162c4e6199f96154
- monitoring/prometheus-operator-grafana-5986dbf74f-q7q88.162c4e619a02fad3
- monitoring/prometheus-operator-grafana-5986dbf74f-q7q88.162c4e619a049a56
- monitoring/prometheus-operator-grafana-5986dbf74f.162c4e619c9d5316
- monitoring/prometheus-operator-grafana-5986dbf74f.162c4eb254704b48
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf32ce5cd2
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf7c0b856d
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf7f2b718e
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf874cd7a1
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf9133924c
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf9468b3d9
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaf9decda56
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafce8b1887
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafd2252390
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafdbc8ec47
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafdc3af0c0
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eafe8f543b5
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk.162c4eaff4fd17b2
- monitoring/prometheus-operator-grafana-7ff4f8b97b.162c4eaf31f3b2ba
- monitoring/prometheus-operator-grafana.162c4eaf3087e1e1
- monitoring/prometheus-operator-grafana.162c4eb253680d39
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e71bff6bc71
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7210a514b8
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e735b1ee1b9
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7405ed1b22
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e74199ddb10
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7b19464cfd
- monitoring/prometheus-operator-prometheus-node-exporter-slszj.162c4e7b1a45f166
- monitoring/prometheus-operator-prometheus-node-exporter.162c4e71bdefdbaf
- monitoring/prometheus-operator-prometheus-node-exporter.162c4e7b1a499523
v1/Namespace:
- monitoring
v1/PersistentVolume:
- pvc-502cf99f-99fb-4a83-abd9-2a15bcf2a30d
- pvc-7107894a-2ede-473e-9c24-2cb5a3f9d7f1
- pvc-e6d638c0-b4a8-4bcf-a9d1-1f66c387c7e9
v1/PersistentVolumeClaim:
- monitoring/alertmanager-prometheus-operator-alertmanager-db-alertmanager-prometheus-operator-alertmanager-0
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-prometheus-operator-prometheus-db-prometheus-prometheus-operator-prometheus-0
v1/Pod:
- monitoring/alertmanager-prometheus-operator-alertmanager-0
- monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk
- monitoring/prometheus-operator-kube-state-metrics-6f8cc5ffd5-47jbw
- monitoring/prometheus-operator-operator-fd978d8d7-cf956
- monitoring/prometheus-operator-prometheus-node-exporter-fxl7s
- monitoring/prometheus-prometheus-operator-prometheus-0
v1/Secret:
- monitoring/alertmanager-prometheus-operator-alertmanager
- monitoring/alertmanager.ict.navinfo.cloud-tls
- monitoring/default-token-vf8dm
- monitoring/grafana.ict.navinfo.cloud-tls
- monitoring/ict-admission
- monitoring/prometheus-operator-admission
- monitoring/prometheus-operator-alertmanager-token-jxljb
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test-token-q5lsl
- monitoring/prometheus-operator-grafana-token-949ch
- monitoring/prometheus-operator-kube-state-metrics-token-9gsz5
- monitoring/prometheus-operator-operator-token-556vs
- monitoring/prometheus-operator-prometheus-node-exporter-token-9f545
- monitoring/prometheus-operator-prometheus-token-bxb9w
- monitoring/prometheus-prometheus-operator-prometheus
- monitoring/prometheus-prometheus-operator-prometheus-tls-assets
- monitoring/prometheus.ict.navinfo.cloud-tls
- monitoring/sh.helm.release.v1.prometheus-operator.v1
- monitoring/sh.helm.release.v1.prometheus-operator.v2
v1/Service:
- monitoring/alertmanager-operated
- monitoring/prometheus-operated
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-node-exporter
v1/ServiceAccount:
- monitoring/default
- monitoring/prometheus-operator-alertmanager
- monitoring/prometheus-operator-grafana
- monitoring/prometheus-operator-grafana-test
- monitoring/prometheus-operator-kube-state-metrics
- monitoring/prometheus-operator-operator
- monitoring/prometheus-operator-prometheus
- monitoring/prometheus-operator-prometheus-node-exporter
Velero-Native Snapshots: <none included>
Restic Backups:
Completed:
monitoring/alertmanager-prometheus-operator-alertmanager-0: alertmanager-prometheus-operator-alertmanager-db
monitoring/prometheus-operator-grafana-7ff4f8b97b-jxwzk: storage
monitoring/prometheus-prometheus-operator-prometheus-0: prometheus-prometheus-operator-prometheus-db
These are the restore details. Note that Velero could not restore the alertmanager-prometheus-operator-alertmanager StatefulSet because it had already been created by the Alertmanager object, and likewise could not restore the prometheus-prometheus-operator-prometheus StatefulSet because it is created by the Prometheus object (another prometheus-operator CRD). The Prometheus PV could still be restored, however, because the operator-created StatefulSet was able to "adopt" the restored Pod. I have no clue why the Alertmanager StatefulSet could not "adopt" the restored Alertmanager Pod; perhaps a race condition or something else.
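One way to check whether adoption happened is to inspect the restored Pod's ownerReferences; this verification step is a hypothetical suggestion, not something run in the original report, and it requires kubectl access to the target cluster:

```shell
# If the StatefulSet controller adopted the restored Pod, an
# ownerReference of kind "StatefulSet" should appear in the output.
kubectl -n monitoring-restored get pod alertmanager-prometheus-operator-alertmanager-0 \
  -o jsonpath='{.metadata.ownerReferences[*].kind}'
```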
Name: monitoring
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: PartiallyFailed (run 'velero restore logs monitoring' for more information)
Warnings:
Velero: <none>
Cluster: could not restore, customresourcedefinitions.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, customresourcedefinitions.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, customresourcedefinitions.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-grafana-clusterrolebinding" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-kube-state-metrics" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-operator-psp" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-operator" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-prometheus-psp" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-prometheus" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "psp-prometheus-operator-kube-state-metrics" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, clusterrolebindings.rbac.authorization.k8s.io "psp-prometheus-operator-prometheus-node-exporter" already exists. Warning: the in-cluster version is different than the backed-up version.
Namespaces:
monitoring-restored: could not restore, endpoints "alertmanager-operated" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, services "alertmanager-operated" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, services "prometheus-operated" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, statefulsets.apps "alertmanager-prometheus-operator-alertmanager" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, statefulsets.apps "prometheus-prometheus-operator-prometheus" already exists. Warning: the in-cluster version is different than the backed-up version.
Errors:
Velero: timed out waiting for all PodVolumeRestores to complete
Cluster: <none>
Namespaces: <none>
Backup: monitoring
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: monitoring=monitoring-restored
Label selector: <none>
Restore PVs: auto
Restic Restores:
Completed:
monitoring-restored/prometheus-operator-grafana-7ff4f8b97b-jxwzk: storage
monitoring-restored/prometheus-prometheus-operator-prometheus-0: prometheus-prometheus-operator-prometheus-db
New:
monitoring-restored/alertmanager-prometheus-operator-alertmanager-0: alertmanager-prometheus-operator-alertmanager-db
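A restic restore stuck in the New phase usually means the restic-wait (restore helper) init container was never injected into the recreated Pod, as the pod_volume_restore controller logs above suggest. A quick way to check (a hypothetical command assuming kubectl access, not from the thread):

```shell
# List init containers on the restored Pod; a restic restore that is
# progressing should show a "restic-wait" init container here.
kubectl -n monitoring-restored get pod alertmanager-prometheus-operator-alertmanager-0 \
  -o jsonpath='{.spec.initContainers[*].name}'
```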
I'll try restoring the Pods and PVs first, then the rest.
The PV restore using the command below completed successfully:
velero restore create monitoring-1 --from-backup monitoring --namespace-mappings monitoring:monitoring-restored \
--exclude-resources=alertmanager.monitoring.coreos.com,prometheuses.monitoring.coreos.com
After that, I could restore the Alertmanager and Prometheus objects without issues:
velero restore create monitoring-cdrs --from-backup monitoring --namespace-mappings monitoring:monitoring-restored \
--include-resources=alertmanager.monitoring.coreos.com,prometheuses.monitoring.coreos.com
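After the two restores, a generic way to confirm that both completed and that the restic pod volume restores are no longer stuck in "New" (a verification step suggested here, not part of the original report):

```shell
# Inspect both restores; the Restic Restores section should list the
# alertmanager volume under Completed rather than New.
velero restore describe monitoring-1 --details
velero restore describe monitoring-cdrs --details
```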
Closing this because this issue was (mostly) resolved for the reporter, but @sseago or @dymurray feel free to reopen if you want to work on this.
Most helpful comment
Off the top of my head, I'm not sure what's going on, although I haven't looked at the logs in detail yet. The redeployment of a new pod may well be affecting things here, since the new pod probably won't have the restic annotation.
For the work my group has been doing, we actually do a two-phase backup/restore, in part to eliminate as much complexity as possible from the environment restic is working in. We create a full backup without any restic annotations, and then a limited backup with just the PVs/PVCs and the pods that mount them, with the restic annotations. Then, on restore, we first restore the restic backup (pods only; no deployments, deploymentconfigs, etc.) -- this is when the restic copies happen. Then those restored pods are deleted and we do the full restore (without restic annotations). I don't know that all of this is necessary for a basic backup/restore -- in our case we're using it for app migration from one cluster to another, with the possibility of running the restic/PV migration more than once before the final migration.
In any case, if you're restoring deploymentconfigs which then roll out new pods post-restore, that could definitely interfere with restic. I don't know what the appropriate general-purpose answer is here -- our approach has been for a very specific migration use case. I wonder whether the same issue comes up with non-OpenShift resources: DaemonSets, Deployments, etc. Annotating the pod template spec, as suggested above (in addition to annotating the pod), may be the way to go here. I'm not sure whether it will resolve this issue completely or not, though.
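The two-phase approach described above could be sketched roughly as follows. All names here (backup names, namespace `myapp`, pod `mypod`, volume `data`) are placeholders, and the exact resource selection is an assumption; only the `backup.velero.io/backup-volumes` annotation is the standard Velero restic opt-in mechanism:

```shell
# Phase 1: full backup of the app, with no restic volume annotations set.
velero backup create myapp-full --include-namespaces myapp

# Annotate the pods that mount the volumes so restic picks them up.
# "data" is a placeholder volume name from the pod spec.
kubectl -n myapp annotate pod mypod backup.velero.io/backup-volumes=data

# Phase 2: limited backup of just the PVs/PVCs and the annotated pods.
velero backup create myapp-pv --include-namespaces myapp \
  --include-resources=pods,persistentvolumeclaims,persistentvolumes

# On restore: first restore the restic backup (the restic copies happen now),
# then delete those restored pods and run the full restore.
velero restore create --from-backup myapp-pv
kubectl -n myapp delete pod mypod
velero restore create --from-backup myapp-full
```

The point of the ordering is that the restic data lands on the PVs while no controller is fighting over the pods; the final full restore then recreates the workloads on top of the already-populated volumes.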