What steps did you take and what happened:
I created a backup that included some PVCs backed up using restic (with the backup.velero.io/backup-volumes annotation). The backup completed successfully. I was using S3. These PVCs are bound to normal EBS PVs, so I know I could use snapshots, but I was trying out restic.
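For reference, a sketch of how the volumes were annotated (not my exact manifests; the pod and volume names match the backup output below), with the annotation value being the comma-separated volume names:
# annotate the 'data' volume of this pod for restic backup (normally set in the pod template)
kubectl -n sgs annotate pod etcd-0-79894c7d66-jv9hz backup.velero.io/backup-volumes=data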
I then deleted the cluster, set up a new kops cluster, installed Velero using the same manifests, and created a restore.
The restore then gets stuck pending.
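For reference, the restore was created with something like:
velero restore create --from-backup bts3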
What did you expect to happen:
Everything to restore.
The output of the following commands will help us better understand what's going on:
kubectl logs deployment/velero -n velero
<snip>
time="2019-03-14T01:55:43Z" level=info msg="Restoring cluster level resource 'persistentvolumes' from: /tmp/725321603/resources/persistentvolumes/cluster" backup=bts3 logSource="pkg/restore/restore.go:696" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="Getting client for /v1, Kind=PersistentVolume" backup=bts3 logSource="pkg/restore/restore.go:754" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-50d4544c-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="Attempting to restore PersistentVolume: pvc-50d4544c-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="error restoring pvc-50d4544c-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="successfully restored persistent volume from snapshot" backup=bts3 logSource="pkg/restore/restore.go:1166" persistentVolume=pvc-510b74ed-4526-11e9-9f1f-0aa34d933732 providerSnapshotID=snap-094014b90adb832dc restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-510b74ed-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="error restoring pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="error restoring pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-54284933-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-54284933-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="error restoring pvc-54284933-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Restoring resource 'persistentvolumeclaims' into namespace 'sgs' from: /tmp/725321603/resources/persistentvolumeclaims/namespaces/sgs" backup=bts3 logSource="pkg/restore/restore.go:694" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Getting client for /v1, Kind=PersistentVolumeClaim" backup=bts3 logSource="pkg/restore/restore.go:754" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: etcd-pv-0" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: etcd-pv-1" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: etcd-pv-2" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: postgres-pv-0" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: postgres-pv-1" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:56:43Z" level=warning msg="Timeout reached waiting for persistent volume pvc-50d4544c-4526-11e9-9f1f-0aa34d933732 to become ready" backup=bts3 logSource="pkg/restore/restore.go:830" restore=velero/bts3-20190313185509
<snip>
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
Name: bts3
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: <none>
Phase: Completed
Namespaces:
Included: *
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1
Started: 2019-03-12 18:48:59 -0700 PDT
Completed: 2019-03-12 18:51:17 -0700 PDT
Expiration: 2019-04-11 18:48:59 -0700 PDT
Validation errors: <none>
Persistent Volumes:
pvc-510b74ed-4526-11e9-9f1f-0aa34d933732:
Snapshot ID: snap-094014b90adb832dc
Type: gp2
Availability Zone: us-west-2c
IOPS: <N/A>
Restic Backups:
New:
sgs/etcd-0-79894c7d66-jv9hz: data, data, data
sgs/etcd-1-74d4fdd498-9wdxw: data, data, data
sgs/etcd-2-bbdb44bdd-xlb27: data, data, data
sgs/postgres-0-fd9f5dfd4-fql6b: data, data
sgs/postgres-0-fd9f5dfd4-h6mks: data
Anything else you would like to add:
The restore also hangs, so there are no logs for the restore itself, only what I extracted from the velero pod.
Also, this is the output of kubectl get pvc:
sgs etcd-pv-0 Lost pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732 0 gp2-us-west-2c 8m
sgs etcd-pv-1 Lost pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732 0 gp2-us-west-2c 8m
sgs etcd-pv-2 Lost pvc-54284933-4526-11e9-9f1f-0aa34d933732 0 gp2-us-west-2c 8m
sgs postgres-pv-0 Lost pvc-50d4544c-4526-11e9-9f1f-0aa34d933732 0 gp2-us-west-2c 8m
sgs postgres-pv-1 Bound pvc-510b74ed-4526-11e9-9f1f-0aa34d933732 100Gi RWO gp2-us-west-2c 8m
postgres-pv-1 is a volume I forgot to set the annotation on, so it used a normal snapshot. The other PVCs had the annotation set.
Environment:
velero version:
kubectl version:
/etc/os-release:

It doesn't look like any of your restic backups ran. Can you confirm you deployed the restic daemonset, and that the pods are running?
kubectl -n velero get daemonsets
kubectl -n velero get pods
The restic pods were running, one per node. How can you tell the restic backups didn't run? The backup phase was Completed and there were no errors, so I assumed they all ran successfully. Or do you mean the restores?
from your velero backup describe output:
Restic Backups:
New:
sgs/etcd-0-79894c7d66-jv9hz: data, data, data
sgs/etcd-1-74d4fdd498-9wdxw: data, data, data
sgs/etcd-2-bbdb44bdd-xlb27: data, data, data
sgs/postgres-0-fd9f5dfd4-fql6b: data, data
sgs/postgres-0-fd9f5dfd4-h6mks: data
If they had run, they would show as Completed. The fact that they're still in the New phase implies they weren't even picked up by the relevant controller. Do you still have the setup running? I'd want to see the pods, and the logs from the restic pods.
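If you do spin it back up, the underlying PodVolumeBackup custom resources would also be worth a look -- something like this (assuming they carry the velero.io/backup-name label, which they should):
kubectl -n velero get podvolumebackups -l velero.io/backup-name=bts3 -o yaml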
I tore it down, but happy to recreate it for you if that would help. Give me a few minutes
Huh, when trying to recreate it I didn't get the same result. Looks like the restic backups say Completed this time:
Name: bts5
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: <none>
Phase: Completed
Namespaces:
Included: *
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1
Started: 2019-03-14 11:32:52 -0700 PDT
Completed: 2019-03-14 11:35:10 -0700 PDT
Expiration: 2019-04-13 11:32:52 -0700 PDT
Validation errors: <none>
Persistent Volumes: <none included>
Restic Backups:
Completed:
sgs/etcd-0-79894c7d66-lzgmq: data
sgs/etcd-1-74d4fdd498-lrt8h: data
sgs/etcd-2-bbdb44bdd-75fhl: data
sgs/postgres-0-fd9f5dfd4-29kss: data
sgs/postgres-1-5dc54b4dc6-85q8t: data
I don't recall doing anything different, but now I know to check for that. If this restores successfully I'll close the issue
OK sounds good - this one looks right!
The restore appears to have hung on:
time="2019-03-14T18:54:34Z" level=info msg="Restoring resource 'volumesnapshotlocations.velero.io' into namespace 'velero' from: /tmp/755586717/resources/volumesnapshotlocations.velero.io/namespaces/velero" backup=bts5 logSource="pkg/restore/restore.go:694" restore=velero/bts5-20190314115236
time="2019-03-14T18:54:34Z" level=info msg="Getting client for velero.io/v1, Kind=VolumeSnapshotLocation" backup=bts5 logSource="pkg/restore/restore.go:754" restore=velero/bts5-20190314115236
time="2019-03-14T18:54:34Z" level=info msg="Attempting to restore VolumeSnapshotLocation: aws-default" backup=bts5 logSource="pkg/restore/restore.go:903" restore=velero/bts5-20190314115236
time="2019-03-14T18:54:34Z" level=info msg="Skipping restore of VolumeSnapshotLocation: aws-default because it already exists in the cluster and is unchanged from the backed up version" backup=bts5 logSource="pkg/restore/restore.go:926" restore=velero/bts5-20190314115236
time="2019-03-14T18:54:35Z" level=error msg="No ready channel found for repository velero/sgs-default-flllc" logSource="pkg/restic/repository_ensurer.go:78"
Full log output:
https://gist.github.com/immesys/f2fc1d0efa5bfcfefcd213c75c411e0b
The restic pods all say
time="2019-03-14T18:48:58Z" level=info msg="Setting log-level to INFO"
time="2019-03-14T18:48:58Z" level=info msg="Starting Velero restic server ab2fc65c02ecdcb9bb1690e3afe67a3b3196aebc" logSource="pkg/cmd/cli/restic/server.go:57"
time="2019-03-14T18:48:58Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:145"
time="2019-03-14T18:48:58Z" level=info msg="Controllers started successfully" logSource="pkg/cmd/cli/restic/server.go:186"
time="2019-03-14T18:48:58Z" level=info msg="Starting controller" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:76"
time="2019-03-14T18:48:58Z" level=info msg="Waiting for caches to sync" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:79"
time="2019-03-14T18:48:58Z" level=info msg="Starting controller" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:76"
time="2019-03-14T18:48:58Z" level=info msg="Waiting for caches to sync" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:79"
time="2019-03-14T18:48:59Z" level=info msg="Caches are synced" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:83"
time="2019-03-14T18:48:59Z" level=info msg="Caches are synced" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:83"
what does velero restore describe NAME --details show?
Name: bts5-20190314115236
Namespace: velero
Labels: <none>
Annotations: <none>
Backup: bts5
Namespaces:
Included: *
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.ark.heptio.com, backups.velero.io, restores.ark.heptio.com, restores.velero.io
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
Phase: InProgress
Validation errors: <none>
Warnings: <none>
Errors: <none>
Restic Restores:
New:
sgs/etcd-0-79894c7d66-lzgmq: data
sgs/etcd-1-74d4fdd498-lrt8h: data
sgs/etcd-2-bbdb44bdd-75fhl: data
sgs/postgres-0-fd9f5dfd4-29kss: data
sgs/postgres-1-5dc54b4dc6-85q8t: data
can you describe a couple of these pods, e.g. kubectl -n sgs describe pod etcd-0-79894c7d66-lzgmq?
etcd-0
Name: etcd-0-79894c7d66-lzgmq
Namespace: sgs
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: cluster=etcd
name=etcd-0
pod-template-hash=3545073822
velero-restore=bts5-20190314115236
velero.io/backup-name=bts5
velero.io/restore-name=bts5-20190314115236
Annotations: backup.velero.io/backup-volumes: data
snapshot.velero.io/data: 4f564002
Status: Pending
IP:
Controlled By: ReplicaSet/etcd-0-79894c7d66
Init Containers:
restic-wait:
Image: gcr.io/heptio-images/velero-restic-restore-helper:v0.11.0
Port: <none>
Host Port: <none>
Args:
55541829-468a-11e9-8017-0a17ae9bd8c6
Environment:
POD_NAMESPACE: sgs (v1:metadata.namespace)
POD_NAME: etcd-0-79894c7d66-lzgmq (v1:metadata.name)
Mounts:
/restores/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-ncw9q (ro)
Containers:
etcd:
Image: quay.io/coreos/etcd:v3.3.11
Ports: 2379/TCP, 2380/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/usr/local/bin/etcd
--name
etcd-0
--data-dir
/data
--initial-advertise-peer-urls
http://etcd-0:2380
--listen-peer-urls
http://0.0.0.0:2380
--listen-client-urls
http://0.0.0.0:2379
--advertise-client-urls
http://etcd-0:2379
--initial-cluster
etcd-0=http://etcd-0:2380,etcd-1=http://etcd-1:2380,etcd-2=http://etcd-2:2380
--initial-cluster-state
new
Environment:
ETCD_AUTO_COMPACTION_RETENTION: 1
ETCD_QUOTA_BACKEND_BYTES: 8589934592
Mounts:
/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-ncw9q (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: etcd-pv-0
ReadOnly: false
default-token-ncw9q:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-ncw9q
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 10m (x22 over 11m) default-scheduler could not find v1.PersistentVolume "pvc-44e44335-4686-11e9-a0b6-0ae02beae008" (repeated 3 times)
PVCs:
kubectl -n sgs get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
etcd-pv-0 Lost pvc-44e44335-4686-11e9-a0b6-0ae02beae008 0 gp2-us-west-2c 12m
etcd-pv-1 Lost pvc-4519f4be-4686-11e9-a0b6-0ae02beae008 0 gp2-us-west-2c 12m
etcd-pv-2 Lost pvc-45467a0f-4686-11e9-a0b6-0ae02beae008 0 gp2-us-west-2c 12m
postgres-pv-0 Lost pvc-48549731-4686-11e9-a0b6-0ae02beae008 0 gp2-us-west-2c 12m
postgres-pv-1 Lost pvc-4888fac7-4686-11e9-a0b6-0ae02beae008 0 gp2-us-west-2c 12m
Similar for postgres:
Name: postgres-0-fd9f5dfd4-29kss
Namespace: sgs
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: cluster=postgres
name=postgres-0
pod-template-hash=985918980
role=master
velero-restore=bts5-20190314115236
velero.io/backup-name=bts5
velero.io/restore-name=bts5-20190314115236
Annotations: backup.velero.io/backup-volumes: data
snapshot.velero.io/data: b20d51b3
Status: Pending
IP:
Controlled By: ReplicaSet/postgres-0-fd9f5dfd4
Init Containers:
restic-wait:
Image: gcr.io/heptio-images/velero-restic-restore-helper:v0.11.0
Port: <none>
Host Port: <none>
Args:
55541829-468a-11e9-8017-0a17ae9bd8c6
Environment:
POD_NAMESPACE: sgs (v1:metadata.namespace)
POD_NAME: postgres-0-fd9f5dfd4-29kss (v1:metadata.name)
Mounts:
/restores/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-ncw9q (ro)
Containers:
postgres:
Image: quay.io/pingthingsio/postgres:v5.1.15
Port: 5432/TCP
Host Port: 0/TCP
Environment:
POSTGRES_PASSWORD: <set to the key 'pgpassword' in secret 'postgres-root-password'> Optional: false
POSTGRES_RPASSWORD: <set to the key 'pgpassword' in secret 'postgres-repl-password'> Optional: false
ROLE: master
Mounts:
/var/lib/postgresql/backups from data (rw)
/var/lib/postgresql/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-ncw9q (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: postgres-pv-0
ReadOnly: false
default-token-ncw9q:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-ncw9q
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 11m (x23 over 12m) default-scheduler could not find v1.PersistentVolume "pvc-48549731-4686-11e9-a0b6-0ae02beae008" (repeated 3 times)
kubectl describe pv pvc-44e44335-4686-11e9-a0b6-0ae02beae008?
michael@shadowpad:~/w/devops/bts$ kubectl describe pv pvc-44e44335-4686-11e9-a0b6-0ae02beae008
Error from server (NotFound): persistentvolumes "pvc-44e44335-4686-11e9-a0b6-0ae02beae008" not found
michael@shadowpad:~/w/devops/bts$ kubectl get pv --all-namespaces
No resources found.
Do these PVs have a reclaim policy of Retain?
I can check what they were, but to be clear, I am restoring the backup into a completely different cluster (I did a kops delete, kops create)
Yes the reclaim policy was Retain
OK... so I'm pretty sure you're running into https://github.com/heptio/velero/issues/1151 :( https://github.com/heptio/velero/issues/1151#issuecomment-463849604 is a proposal for how to fix this case. Unfortunately it's not implemented yet.
So, if I just use snapshots instead of Restic I should avoid that issue for the time being?
yeah, snapshots should work fine in this scenario
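(i.e. drop the backup.velero.io/backup-volumes annotations from your pod templates so the EBS volumes get snapshotted instead -- roughly velero backup create <backupname> --snapshot-volumes, though snapshotting is already the default when the annotation isn't present)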
Great. Thank you for your help. I'll follow #1151 and wait for a resolution. The restic backup is ideal because I can restore in a different region (AWS main to AWS govcloud) but snapshots will do for now.
I got into a similar situation where I was trying to do restores that included restic volumes. All of the restores were getting stuck in STATUS=New. The way I resolved it was to delete the velero pod, so the velero deployment would recreate it. Then, the newly created velero pod started picking up the restores and their STATUS changed to InProgress.
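(e.g. kubectl -n velero delete pod <velero-pod-name> and let the deployment recreate it, or with a newer kubectl, kubectl -n velero rollout restart deployment/velero)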
Same issue as @ngealy (Velero 1.1.0).
When I create a restore, I need to delete the velero pod.
Not really convenient.
Is there a fix planned for this?
Thanks
@Stolr I would need to see some logs/etc. to be able to diagnose what your issue is. This is not typical.
@skriss I am facing exactly the same situation described by @Stolr and @ngealy, and the worst part is that Velero doesn't have any interesting logs for the restore, even running in debug mode...
$ velero restore describe jenkins --details
Name: jenkins
Namespace: velero
Labels: <none>
Annotations: <none>
Phase:
Backup: jenkins
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
$ velero restore get
NAME BACKUP STATUS ERRORS WARNINGS CREATED SELECTOR
jenkins jenkins New 0 0 2020-08-18 11:39:33 +0200 CEST <none>
$ kubectl -n velero logs velero-5d599c5ccd-qbzgm |grep jenkins
time="2020-08-18T09:33:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T09:33:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:33:56Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins119613165 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T09:33:56Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:33:56Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:38:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T09:38:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:39:02Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins738132903 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T09:39:02Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:39:02Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:43:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T09:43:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:43:56Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins121428593 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T09:43:56Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:43:56Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:48:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T09:48:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:48:57Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins108811211 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T09:48:57Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:48:57Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:53:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T09:53:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:53:56Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins471339061 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T09:53:56Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:53:56Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:58:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T09:58:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:58:56Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins156838959 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T09:58:56Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T09:58:56Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:03:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T10:03:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:03:56Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins350813753 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T10:03:56Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:03:56Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:08:56Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T10:08:56Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:08:57Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins982469654 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T10:08:57Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:08:57Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:13:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T10:13:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:13:57Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins157038205 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T10:13:57Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:13:57Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:18:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T10:18:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:18:56Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins014176695 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T10:18:56Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:18:56Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:23:55Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/jenkins-default-8zbcw logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-08-18T10:23:55Z" level=debug msg="Checking repository for stale locks" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:139" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:23:56Z" level=debug msg="Ran restic command" command="restic unlock --repo=s3:s3-eu-west-1.amazonaws.com/nie-backup-ict-prod/restic/jenkins --password-file=/tmp/velero-restic-credentials-jenkins239810817 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:291" repository=jenkins stderr= stdout="successfully removed locks\n"
time="2020-08-18T10:23:56Z" level=debug msg=resticRepositoryController.runMaintenanceIfDue controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:216" name=jenkins-default-8zbcw namespace=velero
time="2020-08-18T10:23:56Z" level=debug msg="not due for maintenance" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:221" name=jenkins-default-8zbcw namespace=velero
Perhaps this is because, a few hours before creating this restore, another restore with the same name and the same source backup was created, and it finished successfully without any errors.
The only difference is that the old restore went into a different namespace via the --namespace-mappings flag (just for validation purposes). After it was done and validated, I deleted the restore in Velero, plus the restored namespace and the application namespace. Then I created a new restore with the same settings as the old one, but without the --namespace-mappings flag, and it got stuck in Status: New. Only after restarting Velero's pod did the restore start. The commands are sketched below.
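For reference, something along these lines -- the mapped namespace here is just a placeholder:
# first restore, into a validation namespace, which worked
velero restore create jenkins --from-backup jenkins --namespace-mappings jenkins:jenkins-validation
# second restore, same settings minus the mapping, which got stuck in New
velero restore create jenkins --from-backup jenkins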