Hello,
First, thank you for ark 👍😄
What steps did you take and what happened:
I'm currently testing the creation of a backup for a particular namespace.
This namespace contains various services, some of them using PVC. I've added freeze hooks to those services.
$ ark backup create A_BACKUP --include-namespaces A_NAMESPACE
$ ark backup get
Alas! The status of this backup is FAILED.
Looking into the logs with:
$ ark backup logs A_BACKUP
It shows this message (full logs below):
time="2019-01-14T16:41:25Z" level=info msg="Backup completed with errors: command terminated with exit code 1" backup=heptio-ark/test-master logSource="pkg/backup/backup.go:289"
What did you expect to happen:
A successful status or at least a human readable error.
The output of the following commands will help us better understand what's going on:
Logs and describe outputs are available here:
https://gist.github.com/gulien/0c5c390f80e68b45b4957bc270d6b4e5
Anything else you would like to add:
One of my services might not be configured correctly.
Indeed, the annotations are:
annotations:
  pre.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
  pre.hook.backup.ark.heptio.com/command: '["/sbin/fsfreeze", "--freeze", "/var/www/html/sites"]'
  post.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
  post.hook.backup.ark.heptio.com/command: '["/sbin/fsfreeze", "--unfreeze", "/var/www/html/sites"]'
While the fsfreeze is configured with two volume mounts (both are PVC):
- name: {{ .Chart.Name }}-fsfreeze
  image: gcr.io/heptio-images/fsfreeze-pause:v0.10.1
  securityContext:
    privileged: true
  volumeMounts:
    - mountPath: "/var/www/html/sites/all/custom/resources"
      name: {{ include "service.name" . }}-all-data
      readOnly: false
    - mountPath: "/var/www/html/sites/default/files"
      name: {{ include "service.name" . }}-default-data
      readOnly: false
Not sure if related though.
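For completeness, here is roughly how the hook annotations and the fsfreeze sidecar fit together in the chart template (a trimmed sketch, assuming a standard Deployment; helper names and claim names come from my chart and may differ from the real manifest):

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        pre.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
        pre.hook.backup.ark.heptio.com/command: '["/sbin/fsfreeze", "--freeze", "/var/www/html/sites"]'
        post.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
        post.hook.backup.ark.heptio.com/command: '["/sbin/fsfreeze", "--unfreeze", "/var/www/html/sites"]'
    spec:
      containers:
        # main application container omitted
        - name: {{ .Chart.Name }}-fsfreeze
          image: gcr.io/heptio-images/fsfreeze-pause:v0.10.1
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/var/www/html/sites/all/custom/resources"
              name: {{ include "service.name" . }}-all-data
            - mountPath: "/var/www/html/sites/default/files"
              name: {{ include "service.name" . }}-default-data
      volumes:
        - name: {{ include "service.name" . }}-all-data
          persistentVolumeClaim:
            claimName: all-pvc
        - name: {{ include "service.name" . }}-default-data
          persistentVolumeClaim:
            claimName: default-pvc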
Environment:
After some investigation, it seems that the mentioned hook does not work:
"stderr: fsfreeze: /var/www/html/sites: freeze failed: Not supported\n
Any idea on how to freeze correctly two volumes?
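One thing I've been wondering about is whether each hook should target the actual mount points rather than their parent directory, e.g. by wrapping both calls in a single shell command (an untested sketch, assuming the fsfreeze image ships /bin/sh):

pre.hook.backup.ark.heptio.com/command: '["/bin/sh", "-c", "/sbin/fsfreeze --freeze /var/www/html/sites/all/custom/resources && /sbin/fsfreeze --freeze /var/www/html/sites/default/files"]'
post.hook.backup.ark.heptio.com/command: '["/bin/sh", "-c", "/sbin/fsfreeze --unfreeze /var/www/html/sites/default/files; /sbin/fsfreeze --unfreeze /var/www/html/sites/all/custom/resources"]'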
@gulien could you please kubectl describe pod/NAME -o yaml the pod in question? I'd like to see the volumes section.
@ncdc Here you go (no YAML output format is available with this command):
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from default-token-hdzw5 (ro)
  /var/www/html/sites/all/custom/resources from all-data (rw)
  /var/www/html/sites/default/files from default-data (rw)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  all-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  all-pvc
    ReadOnly:   false
  default-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  default-pvc
    ReadOnly:   false
  default-token-hdzw5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hdzw5
    Optional:    false
Note: I've removed the fsfreeze container and it now works as expected. There are not a lot of I/O operations on those volumes, so I guess it's acceptable?
My bad, I meant kubectl get ... -o yaml 😄
Can you get the yaml for the PVCs named all-pvc and default-pvc and then get the yaml for their corresponding PVs?
Alright 😄
all-pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:31Z
  finalizers:
  - kubernetes.io/pvc-protection
  name: all-pvc
  namespace: prod
  resourceVersion: "5640161"
  selfLink: /api/v1/namespaces/prod/persistentvolumeclaims/all-pvc
  uid: foo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 150Gi
  storageClassName: standard
  volumeName: pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 150Gi
  phase: Bound
And corresponding all-pv:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:35Z
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: europe-west1
    failure-domain.beta.kubernetes.io/zone: europe-west1-d
  name: pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
  resourceVersion: "5640159"
  selfLink: /api/v1/persistentvolumes/pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
  uid: c04dda17-142b-11e9-9d9a-42010a8400f6
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 150Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: all-pvc
    namespace: prod
    resourceVersion: "5640137"
    uid: bdffb3e0-142b-11e9-9d9a-42010a8400f6
  gcePersistentDisk:
    fsType: ext4
    pdName: gke-ff43462e-dyn-pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
status:
  phase: Bound
default-pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:31Z
  finalizers:
  - kubernetes.io/pvc-protection
  name: default-pvc
  namespace: prod
  resourceVersion: "5640168"
  selfLink: /api/v1/namespaces/prod/persistentvolumeclaims/default-pvc
  uid: bar
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
  volumeName: pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound
And corresponding default-pv:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:35Z
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: europe-west1
    failure-domain.beta.kubernetes.io/zone: europe-west1-d
  name: pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
  resourceVersion: "5640166"
  selfLink: /api/v1/persistentvolumes/pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
  uid: c05e516c-142b-11e9-9d9a-42010a8400f6
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: default-pvc
    namespace: prod
    resourceVersion: "5640141"
    uid: be01fcb0-142b-11e9-9d9a-42010a8400f6
  gcePersistentDisk:
    fsType: ext4
    pdName: gke-ff43462e-dyn-pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
status:
  phase: Bound
Thank you!
@gulien thanks, what I meant by the yaml for the PVs was kubectl get pv/pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6 -o yaml and kubectl get pv/pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6 -o yaml - could you please share that?
😅 @ncdc I've just updated my last comment.
Ok, as far as I know, that should work. We'll have to do some manual testing in GKE and report back to you.
+1
same issue here:
$ kubectl exec -it -c fsfreeze -n nginx-example nginx-deployment-6b8dc99f69-xnv74 sh
/ # /sbin/fsfreeze --freeze /var/log/nginx
fsfreeze: /var/log/nginx: freeze failed: Not supported
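In case it helps narrow things down, it may be worth checking which filesystem actually backs that path, since fsfreeze only works on filesystems that support the freeze ioctl (ext4/xfs/btrfs do; overlayfs and network mounts don't). A quick check from inside the container, assuming stat is available in the image:
/ # mount | grep /var/log/nginx
/ # stat -f -c %T /var/log/nginx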
I have this same issue on Azure Kubernetes Service (AKS). I'm using the demo nginx-app, provided with the installation package. These are the specs for my storage class (azure-file):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard-lrs
provisioner: kubernetes.io/azure-file
mountOptions:
FYI: I did some experimenting here, and I can confirm that fsfreeze is not supported on GKE when the nodes are running Google's Container-Optimized OS (cos), which is the default. After switching the nodes to use Ubuntu, the fsfreeze hooks work just fine.
So, if you're using cos, you will need to come up with an alternate command to use for your hook to quiesce/unquiesce your application.
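For example, a rough best-effort sketch (sync only flushes dirty pages to disk and does not block new writes, so it's weaker than a real freeze; the container name is a placeholder and the right command depends on your application):
pre.hook.backup.ark.heptio.com/container: fsfreeze
pre.hook.backup.ark.heptio.com/command: '["/bin/sync"]'
Alternatively, on GKE you can move the relevant node pool to Ubuntu as described above, along these lines (double-check the flags against your gcloud version):
$ gcloud container node-pools create ubuntu-pool --cluster=MY_CLUSTER --image-type=UBUNTU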
I'm closing this out, as it's not a bug in Velero, but a limitation with fsfreeze and specific underlying operating systems/file systems. Users must identify & configure appropriate commands to run to quiesce/un-quiesce their apps if they're running on such environments.