Rook: Ceph: Allow setting min_size for pools to prevent suspended I/O

Created on 17 Jan 2020 · 4 comments · Source: rook/rook

Is this a bug report or feature request?

  • Feature Request

What should the feature do:
Allow specification of min_size for Ceph pools (RBD and CephFS).

What is use case behind this feature:
In our Rook-operated Ceph cluster (4 worker/storage nodes, 2 workers per datacenter, failureDomain set to zone, node labels set accordingly) with .spec.replicated.size set to 2 in the CephBlockPool, min_size is automatically set to 2 on the pools as well. This suspends I/O when one datacenter is lost, even though a complete copy of the data is still available. Manually setting min_size to 1 recovers from this situation and allows data access even when one site is down. See https://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas and https://docs.ceph.com/docs/jewel/rados/operations/pools/#size.
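For reference, the manual recovery is currently done from the toolbox pod roughly as follows (a sketch using the pool and tools-pod names from the environment below):

$ kubectl exec -ti -n rook-ceph rook-ceph-tools-595594bf67-hp5pv bash
# ceph osd pool set rbdpool min_size 1
# ceph osd pool get rbdpool min_size
min_size: 1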

Environment:

[root@rook-ceph-tools-595594bf67-hp5pv /]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF 
 -1       0.09357 root default                               
 -4       0.04678     zone dc1                               
-11       0.02339         host n0204                         
  2   ssd 0.02339             osd.2      up  1.00000 1.00000 
 -3       0.02339         host n0205                         
  1   ssd 0.02339             osd.1      up  1.00000 1.00000 
 -8       0.04678     zone dc2                               
 -7       0.02339         host n0206                         
  0   ssd 0.02339             osd.0      up  1.00000 1.00000 
-13       0.02339         host n0207                         
  3   ssd 0.02339             osd.3      up  1.00000 1.00000 

CephBlockPool:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbdpool
  namespace: rook-ceph
spec:
  crushRoot: ""
  deviceClass: ""
  erasureCoded:
    algorithm: ""
    codingChunks: 0
    dataChunks: 0
  failureDomain: zone
  replicated:
    min_size: 1
    size: 2

$ kubectl exec -ti -n rook-ceph rook-ceph-tools-595594bf67-hp5pv bash
# ceph osd pool get rbdpool min_size
min_size: 2
# ceph osd pool get rbdpool size    
size: 2
# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)

All 4 comments

https://github.com/rook/rook/issues/2543 seems to be related. In https://github.com/rook/rook/pull/4638, min_size was removed, which now causes issues for me.

When it comes to min_size, we deliberately decided to let Ceph choose the right value. If you think the value should be different, you should either:

  • set it manually from the toolbox, or
  • open a tracker issue with Ceph so the default itself can be improved.

Unfortunately, this is not something we are willing to fix in Rook.

Frankly, this is neither helpful nor consistent. The Rook operator aims to bundle operational know-how, so it makes sense for it to customize and configure whatever is helpful on behalf of the admin. Recommending manual intervention is nothing more than a workaround. Maybe it makes sense to generalize the idea and allow overriding any parameter on a pool; there will always be situations where the default settings are not suitable.
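As a rough illustration of that idea, the pool CRD could expose a generic key/value map that the operator simply passes through to ceph osd pool set. The parameters field below is only a hypothetical sketch of such an override, not an existing Rook API:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbdpool
  namespace: rook-ceph
spec:
  failureDomain: zone
  replicated:
    size: 2
  # hypothetical pass-through: the operator would run
  # "ceph osd pool set rbdpool <key> <value>" for each entry
  parameters:
    min_size: "1"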

Rook aims at deploying, managing, upgrading, and maintaining a desired state for a storage cluster. The goal is not to create yet another interface on top of Ceph; the priority is to make Ceph smarter, so if the defaults are not suitable then Ceph should tune itself. We have done this on several occasions and will continue to do so.
The only time Rook steps in is when something critical arises in the management strategy and needs a fix.

Why not open a tracker bug with Ceph?
