Origin: deleting pvc fails to recycle pv

Created on 7 Sep 2017 · 12 Comments · Source: openshift/origin

deleting pvc fails to recycle pv

Version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
Steps To Reproduce
  1. Create a pvc of, say, 100Mi.
  2. Use the pvc in a pod.
  3. Delete the pod/deployment and wait for the pods to clear. Then delete the pvc, which works fine.
  4. oc get pv shows this (after oc login -u admin:system):
NAME      CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM                       STORAGECLASS   REASON    AGE
pv0005    100Gi      RWO,ROX,RWX   Recycle         Failed      kafka-dev/datadir-kafka-1                            11m
  5. oc describe pv pv0005 shows this:
Name:       pv0005
Labels:     volume=pv0005
Annotations:    pv.kubernetes.io/bound-by-controller=yes
StorageClass:   
Status:     Failed
Claim:      kafka-dev/datadir-kafka-1
Reclaim Policy: Recycle
Access Modes:   RWO,ROX,RWX
Capacity:   100Gi
Message:    Recycle failed: unexpected error creating recycler pod:  pods "recycler-for-pv0013" is forbidden: service account openshift-infra/pv-recycler-controller was not found, retry after the service account is created

Source:
    Type:   HostPath (bare host directory volume)
    Path:   /var/lib/origin/openshift.local.pv/pv0005
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                -------------   --------    ------          -------
  5m        5m      1   persistentvolume-controller         Warning     VolumeFailedRecycle Recycle failed: unexpected error creating recycler pod:  pods "recycler-for-pv0013" is forbidden: service account openshift-infra/pv-recycler-controller was not found, retry after the service account is created
Additional Information

I did an oc cluster down, then moved /var/lib/origin to .old and rebooted. Same issue.
However, oc cluster down/up shows all pv's as fine/available when the cluster comes back up.
Using Ubuntu 16.04 LTS, up to date with packages.
I would expect origin to just recycle pv's automatically; there should be no need for me to do anything, right?
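The reproduction steps above can be sketched as a shell session (the template filename and the exact resource names are assumptions based on the template below):

```shell
# Sketch of the reproduction steps; kafka-template.yaml and the resource
# names are assumed, adjust them to match your environment.
oc new-app -f kafka-template.yaml   # steps 1-2: creates the StatefulSet and its PVCs
oc delete statefulset kafka         # step 3: delete the workload
oc get pods -w                      # wait for the kafka pods to clear
oc delete pvc datadir-kafka-1       # deleting the PVC itself works fine
oc login -u admin:system
oc get pv                           # step 4: the backing PV shows STATUS Failed
```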

Here is the template to create the setup. You can do oc new-app -f <template> to create it:

apiVersion: v1
kind: Template
metadata:
  name: kafka
  annotations:
#    openshift.io/display-name: "Kafka Container Cluster"
    description: "Kafka"
    iconClass: "icon-openjdk"
    tags: "kafka,zookeeper"
objects:
# A headless service to create DNS records
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    name: broker
  spec:
    ports:
    - port: 9092
    # [podname].broker.kafka.svc.cluster.local
    clusterIP: None
    selector:
      app: kafka
# The real service
- apiVersion: v1
  kind: Service
  metadata:
    name: kafka
  spec:
    ports:
    - port: 9092
    selector:
      app: kafka
- apiVersion: apps/v1beta1
  kind: StatefulSet
  metadata:
    name: kafka
  spec:
    serviceName: "broker"
    replicas: ${REPLICAS}
    template:
      metadata:
        labels:
          app: kafka
        annotations:
          pod.alpha.kubernetes.io/initialized: "true"
      spec:
        containers:
        - name: broker
          image: ${IMAGE}
          ports:
          - containerPort: 9092
          command:
          - sh
          - -c
          - "./bin/kafka-server-start.sh config/server.properties --override broker.id=$(hostname | awk -F'-' '{print $2}')"
          volumeMounts:
          - name: datadir
            mountPath: /opt/kafka/data
    volumeClaimTemplates:
    - metadata:
        name: datadir
        annotations:
          volume.alpha.kubernetes.io/storage-class: anything
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: ${PVC_SIZE}
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: datadir-kafka-0
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${PVC_SIZE}
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: datadir-kafka-1
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${PVC_SIZE}
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: datadir-kafka-2
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${PVC_SIZE}
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: datadir-kafka-3
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${PVC_SIZE}
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: datadir-kafka-4
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${PVC_SIZE}
parameters:
- description: Number of kafka pods (max 5; only 5 pvcs in the template)
  name: REPLICAS
  value: '3'
- description: datadir-kafka pvc size
  name: PVC_SIZE
  value: 100Mi
- description: Kafka container image
  name: IMAGE
  value: spicysomtam/kafka:0.10
labels:
  template: kafka
component/storage kind/bug lifecycle/rotten priority/P2

Most helpful comment

+1 for oc create serviceaccount pv-recycler-controller -n openshift-infra resolving the issue... OpenShift as a whole seems pretty buggy... Directions were also missing to chmod an NFS PV before using it.

All 12 comments

I am having this issue as well. Upgrading a v1.5 cluster to v3.6 was fine, but a new v3.6 cluster does not appear to have the correct service account for the pv recycler.

It looks like there was a move of the openshift controller roles that might not have been reflected in the recycler? That would mean that upgraded clusters still have the old roles and service accounts.

714f56a3aa75f047a05fd12fe3beb577417b6879

Ran into the same issue after moving from 1.4 to 3.6. Resolved it by creating the missing service account

$ oc describe pv FAILED_PV
...
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                -------------   --------    ------          -------
  13m       13m     1   persistentvolume-controller         Warning     VolumeFailedRecycle Recycle failed: unexpected error creating recycler pod:  pods "recycler-for-pv007" is forbidden: service account openshift-infra/pv-recycler-controller was not found, retry after the service account is created
oc create serviceaccount pv-recycler-controller -n openshift-infra

@alexcreek I did the same thing but my PVs are still in Failed state.

oc get ClusterRoles | grep persistent-
oc get clusterrolebindings | grep pv-recycler-controller
oadm policy add-cluster-role-to-user system:persistent-volume-provisioner pv-recycler-controller
oadm policy add-cluster-role-to-user system:controller:persistent-volume-binder pv-recycler-controller
Still no luck.

Edit:
Those PVs were claimed by PVCs that were all deleted during a project wipeout (oc delete all --all). I don't yet know why pv-recycler-controller didn't clear them, but for anyone trying to get their PVs free again, this is how I made mine available: I edited each one individually with oc edit pv pv_name and deleted the claimRef section from the YAML. Then they were "Available" again.
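For anyone who would rather not hand-edit each PV, the same claimRef removal can be done non-interactively with oc patch (pv_name is a placeholder; note this only frees the claim, it does not scrub the data on the volume):

```shell
# Remove the stale claimRef from a Failed PV so it returns to Available.
# pv_name is a placeholder; run once per affected PV.
oc patch pv pv_name --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'
```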

Edit2:
I have created a CASE for this in Red Hat. Will be posting updates here.

+1 for oc create serviceaccount pv-recycler-controller -n openshift-infra resolving the issue... OpenShift as a whole seems pretty buggy... Directions were also missing to chmod an NFS PV before using it.

Using the following command solved my problem.

oc create serviceaccount pv-recycler-controller -n openshift-infra 

But shouldn't it be provisioned by the ansible install?

My Version:

oc v3.6.1+008f2d5
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://centos-51:8443
kubernetes v1.6.1+5115d708d7

It solved my issue too! Thanks.
Using v3.6.0+c4dd4cf

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

/reopen

@ginigangadharan: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
