Unable to bind a PVC from a block storage StorageClass.
The claim stays in Pending state. The events show:
waiting for a volume to be created, either by external provisioner "gluster.org/glusterblock-infra-storage" or manually created by system administrator
oc v3.11.0+62803d0-1
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
openshift v3.11.0+d545883-301
kubernetes v1.11.0+d4cacc0
PVC in pending state showing:
waiting for a volume to be created, either by external provisioner "gluster.org/glusterblock-infra-storage" or manually created by system administrator
PVC should bind to PV dynamically.
OC ADM DIAGNOSTICS:
oc_adm_diag.txt
OC GET ALL
oc_get_all.txt
[root@master1 ~]# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
metrics-cassandra-1 Pending glusterfs-registry-block 1h
[root@master1 ~]# oc describe pvc metrics-cassandra-1
Name: metrics-cassandra-1
Namespace: openshift-infra
StorageClass: glusterfs-registry-block
Status: Pending
Volume:
Labels:
Annotations: volume.beta.kubernetes.io/storage-class=glusterfs-registry-block
volume.beta.kubernetes.io/storage-provisioner=gluster.org/glusterblock-infra-storage
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 3m (x332 over 1h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "gluster.org/glusterblock-infra-storage" or manually created by system administrator
[root@master1 ~]# oc get storageclass
NAME PROVISIONER AGE
glusterfs-registry-block gluster.org/glusterblock-infra-storage 1h
glusterfs-storage kubernetes.io/glusterfs 1h
glusterfs-storage-block gluster.org/glusterblock-app-storage 1h
[root@master1 ~]# oc describe storageclass glusterfs-registry-block
Name: glusterfs-registry-block
IsDefaultClass: No
Annotations:
Provisioner: gluster.org/glusterblock-infra-storage
Parameters: chapauthenabled=true,hacount=3,restsecretname=heketi-registry-admin-secret-block,restsecretnamespace=infra-storage,resturl=http://heketi-registry.infra-storage.svc:8080,restuser=admin
AllowVolumeExpansion:
MountOptions:
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events:
Inventory:
openshift_master_dynamic_provisioning_enabled=true
openshift_hosted_registry_storage_kind=glusterfs
openshift_hosted_registry_storage_volume_size=20Gi
openshift_hosted_registry_selector='node-role.kubernetes.io/infra=true'
openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_hawkular_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_cassandra_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_heapster_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_storage_volume_size=10Gi
openshift_metrics_cassandra_pvc_storage_class_name="glusterfs-registry-block"
openshift_logging_install_logging=true
openshift_logging_kibana_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_curator_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_pvc_size=10Gi
openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_storage_class_name="glusterfs-registry-block"
openshift_logging_es_pvc_dynamic=true
openshift_storage_glusterfs_timeout=900
openshift_storage_glusterfs_namespace=app-storage
openshift_storage_glusterfs_storageclass=true
openshift_storage_glusterfs_storageclass_default=false
openshift_storage_glusterfs_block_deploy=true
openshift_storage_glusterfs_block_host_vol_size=50
openshift_storage_glusterfs_block_storageclass=true
openshift_storage_glusterfs_block_storageclass_default=false
openshift_storage_glusterfs_wipe=true
openshift_storage_glusterfs_registry_timeout=900
openshift_storage_glusterfs_registry_namespace=infra-storage
openshift_storage_glusterfs_registry_block_deploy=true
openshift_storage_glusterfs_registry_block_host_vol_size=30
openshift_storage_glusterfs_registry_block_storageclass=true
openshift_storage_glusterfs_registry_block_storageclass_default=false
openshift_storage_glusterfs_registry_wipe=true
[glusterfs]
node1.openshift.local glusterfs_devices='[ "/dev/sdc" ]'
node2.openshift.local glusterfs_devices='[ "/dev/sdc" ]'
node3.openshift.local glusterfs_devices='[ "/dev/sdc" ]'
[glusterfs_registry]
master1.openshift.local glusterfs_devices='[ "/dev/sdc" ]'
master2.openshift.local glusterfs_devices='[ "/dev/sdc" ]'
master3.openshift.local glusterfs_devices='[ "/dev/sdc" ]'
On the heketi pod I got this message on deploy:
Setting up heketi database
File: /var/lib/heketi/heketi.db
Size: 65536 Blocks: 104 IO Block: 131072 regular file
Device: 100004h/1048580d Inode: 12260247531261666053 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-09-29 02:35:55.934796553 +0000
Modify: 2019-09-29 02:45:25.856185586 +0000
Change: 2019-09-29 02:45:25.858185601 +0000
Birth: -
Heketi v9.0.0-1-g57a5f356-release-9
[heketi] INFO 2019/09/29 02:50:15 Loaded kubernetes executor
[heketi] INFO 2019/09/29 02:50:15 Volumes per cluster limit is set to default value of 1000
[heketi] INFO 2019/09/29 02:50:15 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2019/09/29 02:50:15 Block: New Block Hosting Volume size 30 GB
[heketi] INFO 2019/09/29 02:50:15 Started Node Health Cache Monitor
[heketi] INFO 2019/09/29 02:50:15 GlusterFS Application Loaded
[heketi] INFO 2019/09/29 02:50:15 Started background pending operations cleaner
But no block volume was created.
Hello!
I think I managed to solve the problem.
Since the PVC was waiting for the provisioner and I couldn't see any logs from the provisioner, I suspected that the "provisioner" field in the StorageClass definition was wrong:
Provisioner: gluster.org/glusterblock-infra-storage
Although there was an environment variable in the pod:

So I changed the provisioner to "gluster.org/glusterblock" in the StorageClass:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: glusterfs-registry-block-teste
provisioner: gluster.org/glusterblock
parameters:
  chapauthenabled: 'true'
  hacount: '3'
  restsecretname: heketi-registry-admin-secret-block
  restsecretnamespace: infra-storage
  resturl: 'http://heketi-registry.infra-storage.svc:8080'
  restuser: admin
reclaimPolicy: Delete
volumeBindingMode: Immediate
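To check the new class quickly, a throwaway PVC can be generated and inspected before applying it with `oc create -f` (the claim name `test-block-claim` is made up for this sketch):

```shell
# Hypothetical smoke test: write a 1Gi PVC manifest against the new
# StorageClass and confirm it references the right class name.
SC_NAME="glusterfs-registry-block-teste"
cat > /tmp/test-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-block-claim
spec:
  storageClassName: ${SC_NAME}
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
grep -q "storageClassName: ${SC_NAME}" /tmp/test-pvc.yaml && echo "manifest ok"
```

If the provisioner name now matches, this claim should leave Pending within a minute or so.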
After that I had to change the secret "heketi-registry-admin-secret-block" in namespace "infra-storage", because its type was "gluster.org/glusterblock-infra-storage" while the provisioner was looking for "gluster.org/glusterblock".
I had to copy the content of the secret, delete it, and recreate it with the expected type "gluster.org/glusterblock".
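The copy/delete/recreate dance can be scripted. This sketch uses a stand-in manifest in place of the real `oc get secret heketi-registry-admin-secret-block -o yaml -n infra-storage` output (the data value here is a dummy):

```shell
# Stand-in for the dumped secret manifest.
cat > /tmp/secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: heketi-registry-admin-secret-block
  namespace: infra-storage
type: gluster.org/glusterblock-infra-storage
data:
  key: c2VjcmV0
EOF
# Rewrite the type to what the provisioner expects, then against a real
# cluster: oc delete secret ... && oc create -f /tmp/secret.yaml
sed -i 's|^type: gluster.org/glusterblock-infra-storage$|type: gluster.org/glusterblock|' /tmp/secret.yaml
grep '^type:' /tmp/secret.yaml
```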
Update.
The only thing I'm concerned about is that I have two provisioners (app-storage and infra-storage) and both are receiving requests to delete volumes. I'm not sure that would be a problem unless a volume name matches on both ends.
Is that expected behaviour? Both have distinct resturl values in their StorageClasses, but something is definitely wrong with the "provisioner_name" environment variable, which is what I think should isolate the two provisioners.
Good catch! I was having similar issues with a newly deployed OKD 3.11 cluster, while an older cluster is working fine. After comparing the two, your notes matched up: the older cluster uses a provisioner type of "gluster.org/glusterblock". I suspect a bug was introduced into the Ansible playbooks, or an Ansible version upgrade has broken deprecated logic in the playbooks.
This may be it right here:
openshift-ansible/roles/openshift_storage_glusterfs/templates/gluster-block-storageclass.yml.j2
You will see that it sets "provisioner: gluster.org/glusterblock-{{ glusterfs_namespace }}". I suspect that this should really be "provisioner: gluster.org/glusterblock"
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-{{ glusterfs_name }}-block
{% if glusterfs_block_storageclass_default is defined and glusterfs_block_storageclass_default %}
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
{% endif %}
provisioner: gluster.org/glusterblock-{{ glusterfs_namespace }}
parameters:
  resturl: "http://{% if glusterfs_heketi_is_native %}heketi-{{ glusterfs_name }}.{{ glusterfs_namespace }}.svc:8080{% else %}{{ glusterfs_heketi_url }}:{{ glusterfs_heketi_port }}{% endif %}"
  restuser: "admin"
  chapauthenabled: "true"
  hacount: "3"
{% if glusterfs_heketi_admin_key is defined %}
  restsecretnamespace: "{{ glusterfs_namespace }}"
  restsecretname: "heketi-{{ glusterfs_name }}-admin-secret-block"
{%- endif -%}
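The mismatch the template above produces can be seen with a quick shell expansion (namespace value taken from the registry install in this issue):

```shell
# The template suffixes the provisioner name with the namespace...
glusterfs_namespace="infra-storage"
templated="gluster.org/glusterblock-${glusterfs_namespace}"
# ...while older provisioner images register under the bare constant.
deployed="gluster.org/glusterblock"
echo "StorageClass asks for: ${templated}"
echo "Provisioner registers: ${deployed}"
[ "${templated}" = "${deployed}" ] || echo "mismatch: PVC stays Pending"
```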
This change was introduced on Aug. 28 to address this:
https://bugzilla.redhat.com/show_bug.cgi?id=1738394
So, we may be missing some additional context? Perhaps there's a necessary change to the provisioner itself to indicate which namespace it is in.
At this point, I am suspecting that the glusterblock-provisioner container image is not properly consuming the parameter: PROVISIONER_NAME from the deployment config. Perhaps there is a bug there, or we may just need a newer image that properly consumes that parameter.
Also look here:
https://github.com/kubernetes-incubator/external-storage/issues/1168
The provisioner was modified to use the PROVISIONER_NAME provided in the environment.
This functionality is either still not working, or we ended up with older container images that still have the bug.
The containers currently in quay.io or docker hub are older than the code changes to the Ansible role for Gluster install. There also appear to be some lingering issues with the glusterblock-provisioner code. It looks like it has "gluster.org/glusterblock" set as a global constant for the provisioner name, but does not replace that constant everywhere with the environment variable.
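The intended env-override behavior, sketched in shell for illustration (the actual provisioner is Go code, so this is just the pattern, not its implementation): fall back to the historical constant only when PROVISIONER_NAME is unset.

```shell
# Fall back to the bare constant only when the env var is absent; the
# buggy images effectively keep the constant in some code paths even
# when PROVISIONER_NAME is set.
default_name="gluster.org/glusterblock"
effective_name="${PROVISIONER_NAME:-$default_name}"
echo "provisioner will register as: ${effective_name}"
```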
I'm testing a fix this morning.
I have a fix working in my OKD 3.11 cluster. The repo at "https://github.com/kubernetes-incubator/external-storage" is marked as deprecated, so I don't know if anyone will answer a pull request. However, you can get the fix from here: "https://github.com/cgruver/external-storage".
You will need to build a local instance of the glusterblock-provisioner container image and push it to your registry. I'm using Sonatype Nexus as a local and proxy registry for my OKD clusters. I have a local registry path called openshift which is where I put the container images for installation and updates.
git clone https://github.com/cgruver/external-storage.git
cd external-storage/gluster/block
export REGISTRY=your.registry.com:5000/openshift/
export VERSION=v3.11
make container
docker login your.registry.com:5000
docker push ${REGISTRY}glusterblock-provisioner:${VERSION}
The last thing that you will need to do is modify the DeploymentConfig for the glusterblock-provisioner to pull the correct image, if you are not doing a clean install.
If you are doing a clean install, then your Ansible inventory file just needs to know where the image is:
openshift_storage_glusterfs_block_image=your.registry.com:5000/openshift/glusterblock-provisioner:v3.11
openshift_storage_glusterfs_registry_block_image=your.registry.com:5000/openshift/glusterblock-provisioner:v3.11
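For an existing cluster, the image reference composes from the same REGISTRY/VERSION variables used in the build above; the DeploymentConfig name below is a guess for the infra-storage deployment, so check yours with `oc get dc -n infra-storage`:

```shell
# Compose the image reference (REGISTRY already ends in a slash).
REGISTRY="your.registry.com:5000/openshift/"
VERSION="v3.11"
IMAGE="${REGISTRY}glusterblock-provisioner:${VERSION}"
echo "${IMAGE}"
# Then point the provisioner at it, e.g.:
#   oc set image dc/glusterblock-infra-storage-provisioner-dc \
#     glusterblock-provisioner="${IMAGE}" -n infra-storage
```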
That should fix it...
I'm going to provide cleaner code in the fix, and submit a pull request to the original repo.
Another workaround: Revert this commit (openshift-ansible):
https://github.com/openshift/openshift-ansible/commit/3f19e9b7c1f399f84443f4b4087f32c716b4628d#diff-2d2e8abc086610e44d88bbb4810b6acd
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
This issue was corrected by https://github.com/kubernetes-incubator/external-storage/pull/1245
/close
@cgruver: Closing this issue.