Velero: problem when restoring CRDs

Created on 16 Jun 2020 · 20 comments · Source: vmware-tanzu/velero

What steps did you take and what happened:

I created 3 CRDs, named App, Component, and Revision. Every Deployment I create has an ownerReference field relating it to these objects, so when I delete the App object, the Kubernetes garbage collector cleans up everything related to it. To simulate a disaster, I deleted the CRDs I had created. When I restored to the previous environment, I found the CRDs did not come back. The log shows:

time="2020-06-16T15:06:38+08:00" level=info msg="Attempting to restore APIService: v1.neo.io" logSource="pkg/restore/restore.go:1070" restore=velero/velero-default-20200615144056-20200616150508
time="2020-06-16T15:06:41+08:00" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:383" resource=apps.neo.io restore=velero/velero-default-20200615144056-20200616150508
time="2020-06-16T15:06:57+08:00" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:383" resource=components.neo.io restore=velero/velero-default-20200615144056-20200616150508
time="2020-06-16T15:07:28+08:00" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:383" resource=revisions.neo.io restore=velero/velero-default-20200615144056-20200616150508

What did you expect to happen:

I expected everything to be restored, but the CRDs were not. I checked the ownerReference field; it is empty now. I notice the log level is info, not error. I re-created the CRDs manually and ran the restore again; then the App, Component, and Revision objects came back, but the ownerReference field is still empty. It seems the relationship between them is lost.
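One quick way to check whether the ownerReferences survived a restore is to read them back with a jsonpath query. A minimal sketch, assuming a Deployment; the helper name, namespace, and Deployment name are placeholders:

```shell
# Sketch: print the names in a Deployment's ownerReferences; an empty
# result indicates the owner link was lost during restore.
owner_ref_names() {
  kubectl -n "$1" get deploy "$2" -o jsonpath='{.metadata.ownerReferences[*].name}'
}
# Usage on a live cluster:
#   owner_ref_names default my-deployment
```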

Needs info Needs investigation

All 20 comments

@diemus Looks like the CRD discovery did not work as expected.
Can you please share the output of kubectl get --raw "/apis" | jq . before and after running restore?
Also, can you confirm that your backup has the custom resource definition for your types?
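To check the second point, the CRD names recorded in a backup can be pulled out of the `velero backup describe --details` output. A rough sketch, assuming the velero CLI is installed; the helper name is made up:

```shell
# Sketch: print the CRD names listed under the "CustomResourceDefinition:"
# heading in the resource list of a Velero backup.
crds_in_backup() {
  velero backup describe "$1" --details \
    | awk '/CustomResourceDefinition:/ {f = 1; next}
           /^[[:space:]]*- / {if (f) print $2; next}
           {f = 0}'
}
# Usage: crds_in_backup velero-default-20200616011056
```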

This is how our CRD is defined; its only purpose is to create a reference so that Kubernetes can do the GC work for us:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: apps.neo.io
spec:
  group: neo.io
  version: v1
  scope: Namespaced
  names:
    plural: apps
    singular: app
    kind: App

Yes, I am pretty sure the backup has the CRDs in it. This is the describe output of the backup:

Name:         velero-default-20200616011056
Namespace:    velero
Labels:       app.kubernetes.io/instance=velero
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=velero
              helm.sh/chart=velero-2.12.0
              velero.io/schedule-name=velero-default
              velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.16.8
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=16

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-06-16 09:10:56 +0800 CST
Completed:  2020-06-16 09:11:16 +0800 CST

Expiration:  2020-07-16 09:10:56 +0800 CST

Velero-Native Snapshots: <none included>

This is what I have in the backup:

/tmp/resources/customresourcedefinitions.apiextensions.k8s.io/v1beta1/cluster
...
-rwxr-xr-x 1 root root 1369 6月  16 09:11 apps.neo.io.json
-rwxr-xr-x 1 root root 1453 6月  16 09:11 components.neo.io.json
-rwxr-xr-x 1 root root 1439 6月  16 09:11 revisions.neo.io.json
...
/tmp/resources
...
drwxr-xr-x 4 root root 4096 6月  18 10:40 apps.neo.io
drwxr-xr-x 4 root root 4096 6月  18 10:40 components.neo.io
drwxr-xr-x 4 root root 4096 6月  18 10:40 revisions.neo.io
...

kubectl get --raw "/apis" | jq . before

{
  "kind": "APIGroupList",
  "apiVersion": "v1",
  "groups": [
    {
      "name": "apiregistration.k8s.io",
      "versions": [
        {
          "groupVersion": "apiregistration.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "apiregistration.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "apiregistration.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "extensions",
      "versions": [
        {
          "groupVersion": "extensions/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "extensions/v1beta1",
        "version": "v1beta1"
      }
    },
    {
      "name": "apps",
      "versions": [
        {
          "groupVersion": "apps/v1",
          "version": "v1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "apps/v1",
        "version": "v1"
      }
    },
    {
      "name": "events.k8s.io",
      "versions": [
        {
          "groupVersion": "events.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "events.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    },
    {
      "name": "authentication.k8s.io",
      "versions": [
        {
          "groupVersion": "authentication.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "authentication.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "authentication.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "authorization.k8s.io",
      "versions": [
        {
          "groupVersion": "authorization.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "authorization.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "authorization.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "autoscaling",
      "versions": [
        {
          "groupVersion": "autoscaling/v1",
          "version": "v1"
        },
        {
          "groupVersion": "autoscaling/v2beta1",
          "version": "v2beta1"
        },
        {
          "groupVersion": "autoscaling/v2beta2",
          "version": "v2beta2"
        }
      ],
      "preferredVersion": {
        "groupVersion": "autoscaling/v1",
        "version": "v1"
      }
    },
    {
      "name": "batch",
      "versions": [
        {
          "groupVersion": "batch/v1",
          "version": "v1"
        },
        {
          "groupVersion": "batch/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "batch/v1",
        "version": "v1"
      }
    },
    {
      "name": "certificates.k8s.io",
      "versions": [
        {
          "groupVersion": "certificates.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "certificates.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    },
    {
      "name": "networking.k8s.io",
      "versions": [
        {
          "groupVersion": "networking.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "networking.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "networking.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "policy",
      "versions": [
        {
          "groupVersion": "policy/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "policy/v1beta1",
        "version": "v1beta1"
      }
    },
    {
      "name": "rbac.authorization.k8s.io",
      "versions": [
        {
          "groupVersion": "rbac.authorization.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "rbac.authorization.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "rbac.authorization.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "storage.k8s.io",
      "versions": [
        {
          "groupVersion": "storage.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "storage.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "storage.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "admissionregistration.k8s.io",
      "versions": [
        {
          "groupVersion": "admissionregistration.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "admissionregistration.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "admissionregistration.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "apiextensions.k8s.io",
      "versions": [
        {
          "groupVersion": "apiextensions.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "apiextensions.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "apiextensions.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "scheduling.k8s.io",
      "versions": [
        {
          "groupVersion": "scheduling.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "scheduling.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "scheduling.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "coordination.k8s.io",
      "versions": [
        {
          "groupVersion": "coordination.k8s.io/v1",
          "version": "v1"
        },
        {
          "groupVersion": "coordination.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "coordination.k8s.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "node.k8s.io",
      "versions": [
        {
          "groupVersion": "node.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "node.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    },
    {
      "name": "velero.io",
      "versions": [
        {
          "groupVersion": "velero.io/v1",
          "version": "v1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "velero.io/v1",
        "version": "v1"
      }
    },
    {
      "name": "kubeapps.com",
      "versions": [
        {
          "groupVersion": "kubeapps.com/v1alpha1",
          "version": "v1alpha1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "kubeapps.com/v1alpha1",
        "version": "v1alpha1"
      }
    },
    {
      "name": "longhorn.io",
      "versions": [
        {
          "groupVersion": "longhorn.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "longhorn.io/v1beta1",
        "version": "v1beta1"
      }
    },
    {
      "name": "external.metrics.k8s.io",
      "versions": [
        {
          "groupVersion": "external.metrics.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    },
    {
      "name": "metrics.k8s.io",
      "versions": [
        {
          "groupVersion": "metrics.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "metrics.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    }
  ]
}

By the way, what is the meaning of "Skipping restore of resource because it cannot be resolved via discovery", and of "CRD discovery"? This time the restore process got stuck in the InProgress status. It seems my cluster has some problems today; I think I should run the test again after I fix those problems, to rule out that factor.

Velero uses the Kubernetes discovery API to discover the APIs known to the API server.
This is used to create dynamic clients that perform Create and Get operations on resources of those types.
Coming to the log message

"Skipping restore of resource because it cannot be resolved via discovery"

This means that the API server wasn't aware of the type apps.neo.io at the time Velero tried to restore an object of that type.

Further, if a backup has a custom resource definition in it, the restore will create the CRD before attempting to restore objects of that kind.

I would also suggest checking earlier in the logs to see whether the CRDs were successfully restored before Velero attempted to restore the objects of those CRD types.
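Because CRDs are restored before their CRs, one mitigation on the user side is to wait until each restored CRD reports the Established condition (at which point its group should be resolvable via discovery) before re-running a restore. A sketch using `kubectl wait`; the helper name is made up, and the CRD names in the usage example are the ones from this issue:

```shell
# Sketch: block until each CRD is Established, so its API group shows up
# in discovery before CRs of that type are restored.
wait_for_crds() {
  for crd in "$@"; do
    kubectl wait --for condition=established --timeout=60s "crd/$crd" || return 1
  done
}
# Usage on a live cluster:
#   wait_for_crds apps.neo.io components.neo.io revisions.neo.io
```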

@diemus This was fixed in v1.4.2 with PR https://github.com/vmware-tanzu/velero/pull/2683.
Please upgrade to v1.4.2 to get the fix.
I am going to close this issue. Please feel free to re-open if you are seeing this in v1.4.2

/reopen

I have the same problem on version 1.4.2. It looks like all the CRDs are there, but the CRs fail to restore with the above message _on the first restore_.
tl;dr: the workaround is to run the restore twice.
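The two-pass workaround can be scripted as two back-to-back restores from the same backup. A minimal sketch, assuming the velero CLI; the restore names are made up:

```shell
# Sketch of the run-it-twice workaround: the first pass restores the CRDs,
# the second pass picks up CRs skipped before their CRDs were registered.
restore_twice() {
  backup=$1
  velero restore create "${backup}-pass1" --from-backup "$backup" --wait
  velero restore create "${backup}-pass2" --from-backup "$backup" --wait
}
# Usage: restore_twice staging-backup-1
```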

Velero: 1.4.2
k8s: 1.15

first restore log:

time="2020-08-11T12:00:09Z" level=info msg="Attempting to restore CustomResourceDefinition: applications.deployosaurus.dev" logSource="pkg/restore/restore.go:1070" restore=velero/staging-restore-test
time="2020-08-11T12:00:09Z" level=info msg="Executing item action for customresourcedefinitions.apiextensions.k8s.io" logSource="pkg/restore/restore.go:964" restore=velero/staging-restore-test
...
time="2020-08-11T12:02:09Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:383" resource=applications.deployosaurus.dev restore=velero/staging-restore-test

I googled and found this issue, so I tried running the restore again:

time="2020-08-11T12:51:42Z" level=info msg="Attempting to restore CustomResourceDefinition: applications.deployosaurus.dev" logSource="pkg/restore/restore.go:1070" restore=velero/staging-restore-test-2
time="2020-08-11T12:51:42Z" level=info msg="Restore of CustomResourceDefinition, applications.deployosaurus.dev skipped: it already exists in the cluster and is the same as the backed up version" logSource="pkg/restore/restore.go:1127" restore=velero/staging-restore-test-2
...
time="2020-08-11T12:54:17Z" level=info msg="Restoring resource 'applications.deployosaurus.dev' into namespace 'internal'" logSource="pkg/restore/restore.go:702" restore=velero/staging-restore-test-2
time="2020-08-11T12:54:17Z" level=info msg="Getting client for deployosaurus.dev/v1alpha1, Kind=Application" logSource="pkg/restore/restore.go:746" restore=velero/staging-restore-test-2
time="2020-08-11T12:54:17Z" level=info msg="Attempting to restore Application: graphql" logSource="pkg/restore/restore.go:1070" restore=velero/staging-restore-test-2

Now the CR instances are there. So I guess the API doesn't update in time for the initial restore to find the CRD after it is registered?


here's the output from kubectl get --raw "/apis" | jq .
after.log
before.log

@rnrsr Just to be sure, do you have Velero v1.4.2 on the client and server? Is the image on the Velero deployment v1.4.2?

If the image is indeed v1.4.2 on the deployment, could you provide a copy of the CRD and an instance of a CR that we could try to reproduce this with?

@rnrsr You will also need to re-create your backup after you've upgraded the velero server to 1.4.2.

@nrb Yes, my client, and both the source and target clusters, are all on 1.4.2.

Here are an example CR and CRD:
cr.log
crd.log

It could actually be _any_ CRD though; none of them were restored on the first attempt (cert-manager, prometheusrules, etc.).

@ashish-amarnath the backup I tested with was run _after_ upgrading to 1.4.2 - is there anything else I could try?

@rnrsr trying to apply the crd.log I get this error

$ kubectl apply -f ./crd.log
error: error validating "./crd.log": error validating data: ValidationError(CustomResourceDefinition.spec): unknown field "storedVersions" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceDefinitionSpec; if you choose to ignore these errors, turn validation off with --validate=false
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-05-01T02:22:44Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Can you please give us the correct CRD and the CR to reproduce this issue? This seems to be working as expected for the CRDs that I've tested.

This is the test I ran.

  1. On a Kubernetes 1.16 cluster
    $ kubectl version
    Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-05-01T02:22:44Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  1. Deploy cert-manager and verify installation

    $ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.16.1/cert-manager.yaml
    $ kubectl get crd -oname | grep cert
    customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io
    customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io
    customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io
    customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io
    customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io
    customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io
    $ kubectl -n cert-manager-test get certificaterequests.cert-manager.io,certificates.cert-manager.io,challenges.acme.cert-manager.io,clusterissuers.cert-manager.io,issuers.cert-manager.io,orders.acme.cert-manager.io
    NAME                                                       READY   AGE
    certificaterequest.cert-manager.io/selfsigned-cert-25g4n   True    5m30s
    
    NAME                                          READY   SECRET                AGE
    certificate.cert-manager.io/selfsigned-cert   True    selfsigned-cert-tls   5m30s
    
    NAME                                     READY   AGE
    issuer.cert-manager.io/test-selfsigned   True    5m30s
    
  2. With Velero installed: using the 1.5.0-beta.1 client and velero/velero:main server container image

    $ velero version
    Client:
        Version: v1.5.0-beta.1
        Git commit: 45168087f117b23ef08cd4c95849ab9e660aa6fe
    Server:
        Version: main
    
  3. Take a backup and confirm that the CRs and the corresponding CRDs are included in the backup.

    $ velero backup create crd-backup --include-namespaces=cert-manager-test
    $ velero backup describe crd-backup --details | grep -A999 "Resource List:"
    Resource List:
    apiextensions.k8s.io/v1/CustomResourceDefinition:
        - certificaterequests.cert-manager.io
        - certificates.cert-manager.io
        - issuers.cert-manager.io
    cert-manager.io/v1beta1/Certificate:
        - cert-manager-test/selfsigned-cert
    cert-manager.io/v1beta1/CertificateRequest:
        - cert-manager-test/selfsigned-cert-25g4n
    cert-manager.io/v1beta1/Issuer:
        - cert-manager-test/test-selfsigned
    v1/Event:
        - cert-manager-test/selfsigned-cert-25g4n.162eb9144137253c
        - cert-manager-test/selfsigned-cert.162eb9143972c248
        - cert-manager-test/selfsigned-cert.162eb9143f09ee5c
        - cert-manager-test/selfsigned-cert.162eb91440bb0b50
        - cert-manager-test/selfsigned-cert.162eb9146439add4
    v1/Namespace:
        - cert-manager-test
    v1/Secret:
        - cert-manager-test/default-token-wsmzp
        - cert-manager-test/selfsigned-cert-tls
    v1/ServiceAccount:
        - cert-manager-test/default
    
    Velero-Native Snapshots: <none included>
    
  4. Delete the namespace and the CRDs

    $ kubectl delete ns cert-manager-test; kubectl get crd -oname | grep cert | xargs kubectl delete
    namespace "cert-manager-test" deleted
    customresourcedefinition.apiextensions.k8s.io "certificaterequests.cert-manager.io" deleted
    customresourcedefinition.apiextensions.k8s.io "certificates.cert-manager.io" deleted
    customresourcedefinition.apiextensions.k8s.io "challenges.acme.cert-manager.io" deleted
    customresourcedefinition.apiextensions.k8s.io "clusterissuers.cert-manager.io" deleted
    customresourcedefinition.apiextensions.k8s.io "issuers.cert-manager.io" deleted
    customresourcedefinition.apiextensions.k8s.io "orders.acme.cert-manager.io" deleted
    
  5. Confirm resource deletion

    $ kubectl get ns cert-manager-test; kubectl get crd -oname | grep cert
    Error from server (NotFound): namespaces "cert-manager-test" not found
    
  6. Restore from backup

    velero restore create crd-restore --from-backup crd-backup --wait
    Restore request "crd-restore" submitted successfully.
    Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
    ...
    Restore completed with status: Completed. You may check for more information using the commands `velero restore describe crd-restore` and `velero restore logs crd-restore`.
    
  7. Confirm resources from backup were restored

    $ kubectl get ns cert-manager-test; kubectl get crd -oname | grep cert
    NAME                STATUS   AGE
    cert-manager-test   Active   9s
    customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io
    customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io
    customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io
    

    Note that only those CRDs whose CRs were in the backup were restored.


You need to remove storedVersions from the bottom of the spec; try this:
crd (1).log

  1. On a Kubernetes 1.16 cluster

We're using 1.15

  1. With Velero installed: using the 1.5.0-beta.1 client and velero/velero:main server container image

We're using 1.4.2 and the image used on all cluster deployments is: velero/velero:v1.4.2


Here's the command i used to install velero:

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.1.0 \
    --bucket $BUCKET \
    --backup-location-config region=$REGION \
    --snapshot-location-config region=$REGION \
    --secret-file ./credentials

@rnrsr I was able to backup and restore the CRD that you shared.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-05-01T02:31:02Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
$ # I renamed the files you uploaded
$ kubectl apply -f ./crd-1.yaml
$ kubectl apply -f ./cr.yaml
$ ./velero version
Client:
    Version: v1.4.2
    Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
Server:
    Version: v1.4.2

$ ./velero backup create crd-b1 --include-namespaces internal --wait
Backup request "crd-b1" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
.
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe crd-b1` and `velero backup logs crd-b1`.
$ ./velero backup describe crd-b1 --details | grep -A 20 "Resource List:"
Resource List:
  apiextensions.k8s.io/v1beta1/CustomResourceDefinition:
    - applications.deployosaurus.dev
  deployosaurus.dev/v1alpha1/Application:
    - internal/polls
  v1/Namespace:
    - internal
  v1/Secret:
    - internal/default-token-scg6r
  v1/ServiceAccount:
    - internal/default

Velero-Native Snapshots: <none included>

$ kubectl delete ns internal; kubectl delete crd applications.deployosaurus.dev
namespace "internal" deleted
customresourcedefinition.apiextensions.k8s.io "applications.deployosaurus.dev" deleted
$ kubectl get ns internal; kubectl get crd | grep deployosaurus
Error from server (NotFound): namespaces "internal" not found
$ ./velero restore create crd-r1 --from-backup crd-b1 --wait
Restore request "crd-r1" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.

Restore completed with status: Completed. You may check for more information using the commands `velero restore describe crd-r1` and `velero restore logs crd-r1`.
$ kubectl get ns internal; kubectl get crd | grep deployosaurus; kubectl -n internal get applications.deployosaurus.dev
NAME       STATUS   AGE
internal   Active   38s
applications.deployosaurus.dev      2020-08-28T16:21:36Z
NAME    AGE
polls   38s

Thanks @ashish-amarnath. I think, then, it must be related to my use case: a full-cluster restore.

  1. run a backup of everything except the velero namespace: velero backup create staging-backup-N --exclude-namespaces=velero --include-resources=* --include-cluster-resources
  2. create a new cluster
  3. install with the command i posted above
  4. run the restore: velero create restore my-restore-$(date +%s) --from-backup=staging-backup-N --include-resources=* --include-cluster-resources
  5. I see the error I posted originally, where it does not recognise the CRs
  6. re-run the restore again, this time it works

I wonder if it is because of the sheer quantity of things it updated on the API? Feels like a timing issue to me.
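One way to test the timing hypothesis is to poll discovery for the custom group after the first restore pass, and only re-run the restore once the group is visible. A sketch; the helper names are made up, and the group in the usage example is just one from this thread:

```shell
# Sketch: poll until an API group is visible in discovery, up to a limit.
group_discovered() {
  kubectl api-resources --api-group="$1" --no-headers 2>/dev/null | grep -q .
}
wait_for_group() {
  tries=$2
  i=0
  while [ "$i" -lt "$tries" ]; do
    group_discovered "$1" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
# Usage: wait_for_group deployosaurus.dev 30
```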

https://github.com/vmware-tanzu/velero/issues/2948 might explain why this is failing in 1.4.2 despite the fix.

Yeah, after more testing it turns out it's not always the same CRs that get missed, so it is pure luck which ones get restored and which ones don't. E.g. my last test only missed the PrometheusRule CRs, so it is not consistent.

@rnrsr and @sseago I am going to close this issue and follow this up in #2948 which has an associated PR as well.
LMK if you disagree.
