Velero: fsfreeze hooks don't work on Google Cloud when using Container-Optimized OS (COS)

Created on 14 Jan 2019 · 12 comments · Source: vmware-tanzu/velero

Hello,

First, thank you for Ark 👍😄

What steps did you take and what happened:

I'm currently testing the creation of a backup for a particular namespace.

This namespace contains various services, some of them using PVC. I've added freeze hooks to those services.

$ ark backup create A_BACKUP --include-namespaces A_NAMESPACE
$ ark backup get

Alas! The status of this backup is FAILED.

Looking into the logs with:

$ ark backup logs A_BACKUP

It shows this message (full logs below):

time="2019-01-14T16:41:25Z" level=info msg="Backup completed with errors: command terminated with exit code 1" backup=heptio-ark/test-master logSource="pkg/backup/backup.go:289"

What did you expect to happen:

A successful status or at least a human readable error.

The output of the following commands will help us better understand what's going on:

Logs and describe outputs are available here:

https://gist.github.com/gulien/0c5c390f80e68b45b4957bc270d6b4e5

Anything else you would like to add:

One of my services might not be configured correctly.

Indeed, the annotations are:

annotations:
        pre.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
        pre.hook.backup.ark.heptio.com/command: '["/sbin/fsfreeze", "--freeze", "/var/www/html/sites"]'
        post.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
        post.hook.backup.ark.heptio.com/command: '["/sbin/fsfreeze", "--unfreeze", "/var/www/html/sites"]'

While the fsfreeze container is configured with two volume mounts (both are PVCs):

        - name: {{ .Chart.Name }}-fsfreeze
          image: gcr.io/heptio-images/fsfreeze-pause:v0.10.1
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/var/www/html/sites/all/custom/resources"
              name: {{ include "service.name" . }}-all-data
              readOnly: false
            - mountPath: "/var/www/html/sites/default/files"
              name: {{ include "service.name" . }}-default-data
              readOnly: false

Not sure if related though.

Environment:

  • Ark version: v0.10.1
  • Kubernetes version: v1.10.7
  • Cloud provider or hardware configuration: Google Cloud
  • OS: macOS
Labels: Area/Cloud/GCP, Bug

All 12 comments

After some investigation, it seems that the hook itself is failing:

"stderr: fsfreeze: /var/www/html/sites: freeze failed: Not supported\n

Any idea how to correctly freeze two volumes?
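A sketch of one possible workaround (untested; it assumes the fsfreeze image ships /bin/sh): since each hook annotation only accepts a single command, both mount points can be frozen in one shell invocation. Note also that /var/www/html/sites itself is not a mount point here, so freezing it targets the container's root filesystem (typically overlayfs, which does not support freezing); pointing fsfreeze at the actual mount paths may behave differently. The unfreeze commands are chained with ';' so the second runs even if the first fails:

annotations:
        pre.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
        pre.hook.backup.ark.heptio.com/command: '["/bin/sh", "-c", "/sbin/fsfreeze --freeze /var/www/html/sites/all/custom/resources && /sbin/fsfreeze --freeze /var/www/html/sites/default/files"]'
        post.hook.backup.ark.heptio.com/container: {{ .Chart.Name }}-fsfreeze
        post.hook.backup.ark.heptio.com/command: '["/bin/sh", "-c", "/sbin/fsfreeze --unfreeze /var/www/html/sites/default/files; /sbin/fsfreeze --unfreeze /var/www/html/sites/all/custom/resources"]'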

@gulien could you please kubectl describe pod/NAME -o yaml the pod in question? I'd like to see the volumes section.

@ncdc Here you go (no YAML output is available with this command):

Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hdzw5 (ro)
      /var/www/html/sites/all/custom/resources from all-data (rw)
      /var/www/html/sites/default/files from default-data (rw)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  all-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  all-pvc
    ReadOnly:   false
  default-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  default-pvc
    ReadOnly:   false
  default-token-hdzw5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hdzw5
    Optional:    false

Note: I've removed the fsfreeze container and it now works as expected. There aren't many I/O operations on those volumes, so I guess that's acceptable?

My bad, I meant kubectl get ... -o yaml 😄

Can you get the yaml for the PVCs named all-pvc and default-pvc and then get the yaml for their corresponding PVs?

Alright 😄

all-pvc

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:31Z
  finalizers:
  - kubernetes.io/pvc-protection
  name: all-pvc
  namespace: prod
  resourceVersion: "5640161"
  selfLink: /api/v1/namespaces/prod/persistentvolumeclaims/all-pvc
  uid: foo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 150Gi
  storageClassName: standard
  volumeName: pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 150Gi
  phase: Bound

And corresponding all-pv:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:35Z
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: europe-west1
    failure-domain.beta.kubernetes.io/zone: europe-west1-d
  name: pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
  resourceVersion: "5640159"
  selfLink: /api/v1/persistentvolumes/pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
  uid: c04dda17-142b-11e9-9d9a-42010a8400f6
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 150Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: all-pvc
    namespace: prod
    resourceVersion: "5640137"
    uid: bdffb3e0-142b-11e9-9d9a-42010a8400f6
  gcePersistentDisk:
    fsType: ext4
    pdName: gke-ff43462e-dyn-pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
status:
  phase: Bound

default-pvc

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:31Z
  finalizers:
  - kubernetes.io/pvc-protection
  name: default-pvc
  namespace: prod
  resourceVersion: "5640168"
  selfLink: /api/v1/namespaces/prod/persistentvolumeclaims/default-pvc
  uid: bar
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
  volumeName: pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound

And corresponding default-pv:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
  creationTimestamp: 2019-01-09T16:29:35Z
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: europe-west1
    failure-domain.beta.kubernetes.io/zone: europe-west1-d
  name: pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
  resourceVersion: "5640166"
  selfLink: /api/v1/persistentvolumes/pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
  uid: c05e516c-142b-11e9-9d9a-42010a8400f6
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: default-pvc
    namespace: prod
    resourceVersion: "5640141"
    uid: be01fcb0-142b-11e9-9d9a-42010a8400f6
  gcePersistentDisk:
    fsType: ext4
    pdName: gke-ff43462e-dyn-pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
status:
  phase: Bound

Thank you!

@gulien thanks, what I meant by the yaml for the PVs was kubectl get pv/pvc-bdffb3e0-142b-11e9-9d9a-42010a8400f6 -o yaml and kubectl get pv/pvc-be01fcb0-142b-11e9-9d9a-42010a8400f6 -o yaml - could you please share that?

😅 @ncdc I've just updated my last comment.

Ok, as far as I know, that should work. We'll have to do some manual testing in GKE and report back to you.

+1
same issue here:

$ kubectl exec -it -c fsfreeze -n nginx-example nginx-deployment-6b8dc99f69-xnv74 sh
/ # /sbin/fsfreeze --freeze /var/log/nginx
fsfreeze: /var/log/nginx: freeze failed: Not supported
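A quick way to see why the freeze fails is to check which filesystem actually backs the path: fsfreeze relies on a kernel freeze ioctl that overlayfs and network filesystems such as CIFS/NFS generally do not implement. A hypothetical check from inside the same container:

/ # mount | grep /var/log/nginx

If the path does not show up as its own mount point, or shows a type like overlay or cifs, the freeze will fail with "Not supported" no matter how the hook is configured.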

I have this same issue on Azure Kubernetes Service (AKS). I'm using the demo nginx-app provided with the installation package. This is the spec for my storage class (azure-file):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard-lrs
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=1000
  - gid=1000
allowVolumeExpansion: true
parameters:
  skuName: Standard_LRS
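One thing worth noting for the AKS case (an observation, not something tested here): azure-file volumes are mounted over SMB/CIFS, which has no freeze support at all, so fsfreeze will fail on them regardless of the node OS. A disk-backed storage class mounts an ext4 block device and at least makes freezing possible; a rough sketch, with a placeholder name:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-standard
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: Managed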

FYI: I did some experimenting here, and I can confirm that fsfreeze is not supported on GKE when the nodes are running Google's Container-Optimized OS (cos), which is the default. After switching the nodes to use Ubuntu, the fsfreeze hooks work just fine.
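For anyone who wants to try this, a sketch of creating an Ubuntu node pool (cluster and pool names are placeholders; this adds a new pool rather than converting an existing one):

$ gcloud container node-pools create ubuntu-pool \
    --cluster=my-cluster \
    --image-type=UBUNTU

Workloads then need to land on the new pool before the fsfreeze hooks will succeed.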

So, if you're using cos, you will need to come up with an alternate command to use for your hook to quiesce/unquiesce your application.
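As a weaker but widely supported fallback, a pre-hook can at least flush dirty pages with sync; unlike fsfreeze it does not block new writes while the snapshot is taken, so treat this as a best-effort sketch rather than a true quiesce:

pre.hook.backup.ark.heptio.com/command: '["/bin/sh", "-c", "sync"]'

An application-level quiesce (for example, a database's own flush-and-lock command) is preferable when one is available.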

I'm closing this out, as it's not a bug in Velero, but a limitation of fsfreeze on specific underlying operating systems/file systems. Users must identify and configure appropriate commands to quiesce/un-quiesce their apps if they're running in such environments.
