I have been using Velero (with restic) for local backup and restore. I noticed that during a restore, a folder named .velero is created in the volumes of every StatefulSet that is restored.
Could we have an enhancement that removes .velero as a cleanup step?
Even though the folder doesn't contain any data, it causes an issue because workloads like Kafka parse the folder and complain that it is not part of their topic partition table.
As a workaround, I manually delete it and restart the pod, but I would love to have this minor enhancement included in a newer release.
[supreme@worker-16070 kafka-data]$ ls -a
----trimmed----
__consumer_offsets-8
__consumer_offsets-9
.lock
log-start-offset-checkpoint
meta.properties
recovery-point-offset-checkpoint
replication-offset-checkpoint
.velero
Thanks for the report @vinayus! This seems like a fairly low effort fix, so I'll add it to the backlog.
There are a couple of things that make this change non-trivial.
The doneFile is created as ioutil.WriteFile(filepath.Join(volumePath, ".velero", string(restoreUID)), nil, 0644).
Changing the doneFile mode to 0666 in the WriteFile call doesn't work because of the umask in the restic pod:
root@restic-jq4rh:/host_pods# hostname
restic-jq4rh
root@restic-jq4rh:/host_pods# umask
0022
root@restic-jq4rh:/host_pods#
Calling os.Chmod to set 0777 on the .velero directory and 0666 on the doneFile lets the restic-wait init container delete the doneFile.
However, the restic-wait init container cannot delete the .velero directory even though its permissions are 0777.
From the velero-restic-restore-helper code:
Permissions on /restores/pvc2-vm/.velero [drwxrwxrwx]
Failed to delete directory /restores/pvc1-vm/.velero: unlinkat /restores/pvc1-vm/.velero: permission denied
From exec-ing into the restic-wait container
$ kcn csi-app exec -it csi-app1 -c restic-wait bash
nobody@csi-app1:/$ hostname
csi-app1
nobody@csi-app1:/$ rm -rf /restores/pvc1-vm/.velero/
rm: cannot remove '/restores/pvc1-vm/.velero/': Permission denied
nobody@csi-app1:/$ rm -rf /restores/pvc2-vm/.velero/
rm: cannot remove '/restores/pvc2-vm/.velero/': Permission denied
This is most likely because of how the volume is shared between the restic daemon set and the restic-wait container.
The solution is to have the restic daemon set pod delete the .velero directory once the restic-wait init container has completed. This requires the restic-wait init container to signal to the restic daemon set pod that it has finished waiting. The deletion of the doneFile can be used as that signal.
@nrb removing this from the 1.5 milestone.
Closing because duplicate of #2812