Following on from the conversation on Slack (here, here, here and here):
The 10,000-foot view is that I have a cluster with Rook volumes in a couple of namespaces. I needed to update the cluster, so I created a new cluster beside it with the appropriate changes (AWS-deployed, self-managed, not EKS), and now I need to migrate the Rook volumes and state in two namespaces to the other cluster.
Ark works wonderfully for everything except the Rook PVs. And while I'm at it, let me thank you for creating Ark. I spent a few days researching ways to back up and restore cluster state, and for me at least, Ark is a clear winner.
What I tried:
And that's where I'm at.
Looking around, I discovered backy2, so perhaps that in conjunction with the Ark Rook plugin may be successful. I'll let you know if I have any luck. Any other suggestions are gratefully welcomed.
Following up. The restore with restic failed:
$ ark restore describe prod-tools4-20180615165209
Name: prod-tools4-20180615165209
Namespace: heptio-ark
Labels: <none>
Annotations: <none>
Backup: prod-tools4
Namespaces:
  Included:  *
  Excluded:  <none>
Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io
  Cluster-scoped:  auto
Namespace mappings:  <none>
Label selector:  <none>
Restore PVs:  auto
Phase:  Completed
Validation errors:  <none>
Warnings:
  Ark:      <none>
  Cluster:  not restored: clusterinformations.crd.projectcalico.org "default" already exists and is different from backed up version.
            not restored: clusterrolebindings.rbac.authorization.k8s.io "cert-manager-certs" already exists and is different from backed up version.
            not restored: clusterroles.rbac.authorization.k8s.io "cert-manager-certs" already exists and is different from backed up version.
            not restored: clusterroles.rbac.authorization.k8s.io "prometheus" already exists and is different from backed up version.
            not restored: customresourcedefinitions.apiextensions.k8s.io "certificates.certmanager.k8s.io" already exists and is different from backed up version.
            not restored: customresourcedefinitions.apiextensions.k8s.io "clusterissuers.certmanager.k8s.io" already exists and is different from backed up version.
            not restored: customresourcedefinitions.apiextensions.k8s.io "issuers.certmanager.k8s.io" already exists and is different from backed up version.
            not restored: felixconfigurations.crd.projectcalico.org "default" already exists and is different from backed up version.
            not restored: ippools.crd.projectcalico.org "default-ipv4-ippool" already exists and is different from backed up version.
  Namespaces:
    prod-tools:  not restored: serviceaccounts "default" already exists and is different from backed up version.
Errors:
  Ark:         pod volume restore failed: error restoring volume: error identifying path of volume: expected one matching path, got 0
  Cluster:     <none>
  Namespaces:  <none>
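(For reference, the restore above would have been created and inspected with commands along these lines; this is only a sketch, and exact syntax may differ slightly between Ark versions:)
$ ark restore create prod-tools4     # creates a restore named prod-tools4-<timestamp> from that backup
$ ark restore describe prod-tools4-20180615165209
$ ark restore logs prod-tools4-20180615165209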
The error I get from the dashboard is:
MountVolume.SetUp failed for volume "pvc-5377f68a-4fb0-11e8-aa2e-0690e07debf2" : mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume replicapool/pvc-5377f68a-4fb0-11e8-aa2e-0690e07debf2: failed to map image replicapool/pvc-5377f68a-4fb0-11e8-aa2e-0690e07debf2 cluster rook-system. failed to map image replicapool/pvc-5377f68a-4fb0-11e8-aa2e-0690e07debf2: Failed to complete 'rbd': exit status 2. . output: rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (2) No such file or directory
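That message points at the node-side rbd map step. A rough way to dig further (just a sketch; the toolbox pod name and namespace are assumptions and vary by Rook version) would be:
# on the node that tried to mount the volume
$ dmesg | tail
# from a Rook toolbox pod, check whether the RBD image actually exists in the pool
$ kubectl -n rook exec -it rook-tools -- rbd ls replicapool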
The restic log looks like this:
time="2018-06-15T15:29:42Z" level=info msg="Setting log-level to INFO"
time="2018-06-15T15:29:42Z" level=info msg="Starting Ark restic server v0.9.0-alpha.2" logSource="pkg/cmd/cli/restic/server.go:42"
time="2018-06-15T15:29:42Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:112"
time="2018-06-15T15:29:42Z" level=info msg="Controllers started successfully" logSource="pkg/cmd/cli/restic/server.go:150"
time="2018-06-15T15:29:42Z" level=info msg="Starting controller" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:77"
time="2018-06-15T15:29:42Z" level=info msg="Waiting for caches to sync" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:80"
time="2018-06-15T15:29:42Z" level=info msg="Starting controller" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:77"
time="2018-06-15T15:29:42Z" level=info msg="Waiting for caches to sync" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:80"
time="2018-06-15T15:29:42Z" level=info msg="Caches are synced" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:84"
time="2018-06-15T15:29:42Z" level=info msg="Caches are synced" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:84"
time="2018-06-15T15:52:17Z" level=error msg="Unable to get item's pod prod-tools/mongo-mongodb-68bfb98d5c-wbl58, not enqueueing." controller=pod-volume-restore error="pod \"mongo-mongodb-68bfb98d5c-wbl58\" not found" key=heptio-ark/prod-tools4-20180615165209-7t9g5 logSource="pkg/controller/pod_volume_restore_controller.go:194"
Not quite what I expected. :slightly_frowning_face:
Nonetheless, I'm going to persist in trying to get something to work for me. :/
@skriss could you please assist?
@pms1969 could you provide the output of ark restore logs prod-tools4-20180615165209? Thanks!
@skriss Will do. In the middle of a production problem right now, but will get it for you as soon as I'm done with it.
ark-restore.log
@skriss log attached. Not that it reveals much.
Are any of the volumes you're trying to back up with restic hostPath?
@ncdc not sure what that means :(
There are a few, but the one in question is created via the mongodb Helm chart.
My relevant values for the PVC are:
persistence:
  enabled: true
  storageClass: "rook-block"
  accessMode: ReadWriteOnce
  size: 8Gi
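(For reference, those values map onto the chart roughly like this; a sketch only, since the release name, the stable/mongodb chart, and the Helm 2 syntax are assumptions based on the pod name seen above:)
$ helm install --name mongo stable/mongodb \
    --namespace prod-tools \
    --set persistence.enabled=true \
    --set persistence.storageClass=rook-block \
    --set persistence.accessMode=ReadWriteOnce \
    --set persistence.size=8Gi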
Ok, so the volumes you'd specified in the backup.ark.heptio.com/backup-volumes pod annotation are all Rook PVCs?
Yes
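(For anyone following along: that annotation is set on the pod and lists the volume names as they appear in the pod spec, not the PVC names. A hypothetical example, using the pod from the log above and a placeholder volume name:)
$ kubectl -n prod-tools annotate pod mongo-mongodb-68bfb98d5c-wbl58 \
    backup.ark.heptio.com/backup-volumes=<volume-name-from-pod-spec>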
@pms1969 I was able to reproduce this issue and I see what the problem is. During restore, Ark is not properly waiting for the PV/PVC to be created and mounted before attempting to restore the contents of the volume using restic. I'll start working on a fix for this and hope to get it out to you ASAP. Thanks for testing and reporting!!
@skriss no, no.. Thank you.
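(To illustrate what the restore needed to wait for: the PVC has to be re-created and bound, and the replacement pod's volume mounted, before restic can find a path to write into. A manual check would look something like the following; this is only a sketch, not the actual fix, and the PVC name is an assumption based on the chart's naming:)
$ until kubectl -n prod-tools get pvc mongo-mongodb -o jsonpath='{.status.phase}' | grep -q Bound; do sleep 5; done
$ kubectl -n prod-tools get pvc mongo-mongodb   # should now show STATUS Bound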
@skriss adding to the v0.9.0 milestone, please let me know if that doesn't seem right to you.
@rosskukulinski yes. there were actually a few issues at play here but all should be resolved with the next alpha/beta.
The issues that were at play here should all be resolved now in master, so I'm going to close this issue out, but feel free to reopen or open a new one as needed. We should be putting out a new tagged alpha/beta shortly for testing!