After a successful installation of OpenShift Origin v3.7, I tried to add a disk to the glusterfs nodes, expecting to increase the available storage by adding the device names to the inventory file and rerunning the playbook. The playbook aborted when loading the heketi topology file.
[openshift@os-bastion-1 openshift-ansible]$ ansible --version
ansible 2.5.0
config file = /opt/openshift/openshift-ansible/ansible.cfg
configured module search path = [u'/home/openshift/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /bin/ansible
python version = 2.7.5 (default, Aug 4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
[openshift@os-bastion-1 openshift-ansible]$ git describe
openshift-ansible-3.7.44-1-30-g5b7a769
Steps to reproduce:
Run playbooks/byo/config.yml and glusterfs nodes
Add a device to glusterfs_devices
Run playbooks/byo/config.yml again
Expected result: More storage available.
The playbook run stopped at
TASK [openshift_storage_glusterfs : Load heketi topology]
with the following error
{
"changed": true,
"cmd": [
"oc",
"--config=/tmp/openshift-glusterfs-ansible-lUnfw5/admin.kubeconfig",
"rsh",
"--namespace=glusterfs",
"heketi-storage-1-kktjv",
"heketi-cli",
"-s",
"http://localhost:8080",
"--user",
"admin",
"--secret",
"redacted",
"topology",
"load",
"--json=/tmp/openshift-glusterfs-ansible-lUnfw5/topology.json",
"2>&1"
],
"delta": "0:00:03.610762",
"end": "2018-04-20 12:05:38.335115",
"failed_when_result": true,
"rc": 0,
"start": "2018-04-20 12:05:34.724353",
"stderr": "",
"stderr_lines": [],
"stdout_lines": [
" Found node os-storage-1.lab.com on cluster 6455ee6a8726324e54cdb1dddd3b6ddc",
" Found device /dev/sdc",
" Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-storage-s7hk9: WARNING: Not using lvmetad because config setting use_lvmetad=0.",
" WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).",
" Device /dev/sdd not found (or ignored by filtering).",
" Found node os-storage-2.lab.com on cluster 6455ee6a8726324e54cdb1dddd3b6ddc",
" Found device /dev/sdc",
" Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-storage-p8whz: WARNING: Not using lvmetad because config setting use_lvmetad=0.",
" WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).",
" Device /dev/sdd not found (or ignored by filtering).",
" Found node os-storage-3.lab.com on cluster 6455ee6a8726324e54cdb1dddd3b6ddc",
" Found device /dev/sdc",
" Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-storage-nmbm5: WARNING: Not using lvmetad because config setting use_lvmetad=0.",
" WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).",
" Device /dev/sdd not found (or ignored by filtering)."
]
}
CentOS Linux release 7.4.1708 (Core)
openshift_hosted_registry_storage_kind=glusterfs
openshift_storage_glusterfs_registry_storageclass=True
openshift_storage_glusterfs_storageclass_default=True
[glusterfs]
os-storage-1.lab.com glusterfs_ip='10.3.1.122' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
os-storage-2.lab.com glusterfs_ip='10.3.1.123' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
os-storage-3.lab.com glusterfs_ip='10.3.1.124' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
lsblk shows some mpath device below /dev/sdd:
[root@os-storage-3 ~]# lsblk /dev/sdd
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdd 8:48 0 100G 0 disk
└─1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c 253:3 0 100G 0 mpath
The GlusterFS playbooks are not guaranteed to be idempotent, and thus running them more than once per installation is not supported. To add new devices and nodes to the GlusterFS cluster you need to do so through the heketi-cli client. An example command:
oc rsh <heketi_pod> heketi-cli -s http://localhost:8080 --user admin --secret <admin_key> topology info
You can find the admin_key by running oc describe po <heketi_pod> and checking the env variables. See the device add help subcommand of heketi-cli for more information on the exact syntax.
Reading the variables
heketipod="$(oc get pod -n glusterfs | grep heketi-storage | awk '{print $1}')"
heketikey="$(oc get deploymentconfigs/heketi-storage -n glusterfs --template='{{.spec.template.spec.containers}}' | grep -oP '(?<=\[name:HEKETI_ADMIN_KEY value:)\S+(?=\])')"
oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" topology info
When I then manually add the device:
[root@os-master-1 ~]# oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" device add --name=/dev/sdd --node=0ecc0dcbf279ce5bceeaff1e026a3dd0
Error: Unable to execute command on glusterfs-storage-p8whz: WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
Device /dev/sdd not found (or ignored by filtering).
command terminated with exit code 255
The same problem occurs.
Does lsblk from inside the GlusterFS containers show the device? If not, did you try running pvscan --cache from inside the GlusterFS pods?
Hi there,
I fiddled around a little more and I'm pretty sure the multipath thing is causing the problem.
First of all, I can see the devices even from inside the pods, so everything is good there. The main problem is that even on the storage host, pvcreate /dev/sdd fails with the same error.
When I instead use the multipath device, I can create the pv using:
pvcreate /dev/mapper/1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c
Also, I can delete the multipath device using multipath -f, and after that creating the pv with /dev/sdd works again too.
I'm not very familiar with the whole multipathing business, but as far as I understand, it gets configured when roles/openshift_node/tasks/storage_plugins/iscsi.yml runs, and therefore the problem doesn't occur on the first install.
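A minimal sketch for checking whether a disk has been claimed by multipath before handing it to heketi. The function name mpath_holder is hypothetical; it only assumes the two-column "NAME TYPE" output that lsblk -rno NAME,TYPE prints, and it reports the /dev/mapper path to use in place of the raw /dev/sd* name.

```shell
# Print the /dev/mapper path of every multipath map found in the input.
# Expects "NAME TYPE" pairs on stdin, as printed by: lsblk -rno NAME,TYPE <disk>
# (mpath_holder is a hypothetical helper name, not part of any tool.)
mpath_holder() {
    awk '$2 == "mpath" { print "/dev/mapper/" $1 }'
}
```

On a storage node this could be used as lsblk -rno NAME,TYPE /dev/sdd | mpath_holder; empty output would suggest the plain /dev/sdd name is still usable.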
I added the multipath devices using heketi, and that seemed to work:
Node Id: a6399d6114ba9b2ff377a305b4e76a25
State: online
Cluster Id: 6455ee6a8726324e54cdb1dddd3b6ddc
Zone: 1
Management Hostnames: os-storage-1.lab.com
Storage Hostnames: 10.3.1.148
Devices:
Id:9e07c4579599eac9bb5c97f637be0759 Name:/dev/sdc State:online Size (GiB):499 Used (GiB):103 Free (GiB):396
Bricks:
Id:995b4c692f354d1329b866d499119090 Size (GiB):1 Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_995b4c692f354d1329b866d499119090/brick
Id:9a3125acdf8c9c77141c4812c54a48e8 Size (GiB):2 Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_9a3125acdf8c9c77141c4812c54a48e8/brick
Id:b9d6823a45b04c1cbc9072f5f3af56d0 Size (GiB):50 Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_b9d6823a45b04c1cbc9072f5f3af56d0/brick
Id:d5e843ba70b23396dd4498aec95e0b09 Size (GiB):50 Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_d5e843ba70b23396dd4498aec95e0b09/brick
Id:bbca5e50263de14c4f58ea553101a4f7 Name:/dev/mapper/1VENDOR_NFS_4_0_3885_2d256b78_f72f_49af_ba67_9667a406a204 State:online Size (GiB):99 Used (GiB):0 Free (GiB):99
Bricks:
Does this look good?
Just to recap: The currently suggested method for extending CNS with additional disks is by using the heketi-cli?
Thanks for your help and for taking the time
Yes, heketi-cli is the current recommended method.
Hmm... this seems somewhat cumbersome. Can you provide the exact commands you used to get it to work? Also can you say more on why this doesn't impact initial deployment, is it because device mapper isn't enabled initially thus the pre-existing devices aren't remapped?
I used heketi's device add, changing the --name from /dev/sdd to the generated device-mapper name:
heketipod="$(oc get pod -n glusterfs | grep heketi-storage | awk '{print $1}')"
heketikey="$(oc get deploymentconfigs/heketi-storage -n glusterfs --template='{{.spec.template.spec.containers}}' | grep -oP '(?<=\[name:HEKETI_ADMIN_KEY value:)\S+(?=\])')"
oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" device add --name=/dev/mapper/1VENDOR_NFS_4_0_3885_2d256b78_f72f_49af_ba67_9667a406a204 --node=0ecc0dcbf279ce5bceeaff1e026a3dd0
I did that three times in total since I have 3 nodes.
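Since the device-mapper name differs per node, repeating the command by hand is error-prone. The following is only a sketch of how the three invocations could be generated for review: heketi_device_add_cmds is a hypothetical helper that reads "node_id device" pairs and prints the matching device add commands without running them. It assumes heketipod and heketikey are set as shown above in the shell where the printed commands are later pasted.

```shell
# Dry-run helper (hypothetical name): read "node_id device" pairs from stdin
# and print one heketi-cli "device add" command per pair. Nothing is executed;
# "$heketipod" and "$heketikey" are printed literally so they expand later.
heketi_device_add_cmds() {
    while read -r node dev; do
        # Skip blank lines.
        [ -n "$node" ] || continue
        printf 'oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" device add --name=%s --node=%s\n' "$dev" "$node"
    done
}
```

Feeding it lines such as "0ecc0dcbf279ce5bceeaff1e026a3dd0 /dev/mapper/<map_name>" (one per node, with the node IDs taken from topology info) prints the commands so they can be checked before running them.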
Also can you say more on why this doesn't impact initial deployment, is it because device mapper isn't enabled initially thus the pre-existing devices aren't remapped?
Yes, that's exactly what I am talking about. My freshly installed hosts have only the /dev/sd* devices.
Oh, okay, so the multipath -f thing was only to get the /dev/sd* naming working again. Got it.
Hmm... this presents a problem. I'm not sure how to best get around this. Was this a disk that was already in the machine prior to OpenShift installation, or was it a disk that was added to the node after installation?
We actually had this same issue on a fresh OCP 3.7 deployment where automation was added outside of openshift-ansible to prepare each node hosting glusterfs by staging prereqs (including installing multipathd and loading dm_multipath) prior to the first execution of cns-deploy topology load. We worked around the issue with multipath -F and systemctl stop multipathd, then reran the original cns-deploy. After successful execution we restarted multipathd.
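The workaround above could be sketched roughly as follows. The helper names run, multipath_quiesce and multipath_resume and the DRY_RUN variable are hypothetical; only multipath -F and the systemctl calls on multipathd come from the description. It is meant to be run as root on each storage node, with the deployment itself rerun from wherever it is normally started.

```shell
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
# (run, multipath_quiesce, multipath_resume and DRY_RUN are hypothetical names.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Flush all unused multipath maps, then stop the daemon so the plain
# /dev/sd* names can be used by the topology load.
multipath_quiesce() {
    run multipath -F && run systemctl stop multipathd
}

# Start the daemon again after the deployment has finished successfully.
multipath_resume() {
    run systemctl start multipathd
}
```

Setting DRY_RUN=1 first shows what would be executed; the order is multipath_quiesce on every storage node, then the deployment, then multipath_resume.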
@jarrpa yes, those were just my debugging steps in order to understand the issue.
The disks were introduced after the initial installation. I'm using VMs and just did a shutdown, add disk, boot for each of the VMs.
With @liveaverage's method of completely flushing the multipath device maps, it should be possible to extend glusterfs using the playbook (I haven't tried, though). For me this is a valid way.
If the playbook at some point wants to handle cns storage scaling "officially" this issue would have to be resolved in a clean way.
All right, thanks! Can the issue be closed, then?
Yes. Thank you
PR https://github.com/openshift/openshift-ansible/pull/7367 will solve the issue. The 3.7 merge is still pending (https://github.com/openshift/openshift-ansible/pull/8152)