RKE: failed to resize Ceph RBD image

Created on 18 Nov 2020 · 14 Comments · Source: rancher/rke

RKE version:
rke_linux-amd64_v1.2.3

kubectl version 
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-20T20:41:06Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.12", GitCommit:"7cd5e9086de8ae25d6a1514d0c87bac67ca4a481", GitTreeState:"clean", BuildDate:"2020-11-12T09:11:15Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Docker version: (docker version,docker info preferred)

$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 28
  Running: 15
  Paused: 0
  Stopped: 13
 Images: 12
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-124-generic
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64  
 CPUs: 4
 Total Memory: 3.852GiB
 Name: test-k8s-v18-01 
 ID: 6FQB:YXFQ:XOQM:EYNK:FGYY:QJBR:66BO:WDEX:IZBQ:ZUVY:KT5B:KZ4N
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false   
 Insecure Registries:  
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

$  uname -r
4.15.0-124-generic

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Test cluster with 3 VMs in proxmox

cluster.yml file:
Attached to the issue as cluster.yml.zip

Steps to Reproduce:
Install a new cluster and configure a new StorageClass named rbd as the default storage class for the cluster.

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: rbd
parameters:
  adminId: admin-test-k8s-v18
  adminSecretName: ceph-key
  adminSecretNamespace: kube-system
  fsType: ext4
  imageFeatures: layering
  imageFormat: "2"
  monitors: x.x.x.25:6789,x.x.x.16:6789,x.x.x.26:6789
  pool: test-k8s-v18
  userId: admin-test-k8s-v18
  userSecretName: ceph-key
  userSecretNamespace: kube-system
provisioner: kubernetes.io/rbd
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: v1
data:
  key: some_base64_data==
kind: Secret
metadata:
  name: ceph-key
type: kubernetes.io/rbd

Deploy a simple PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-resize-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Wait until the PersistentVolumeClaim is created successfully and the PersistentVolume is provisioned (usually 1-2 minutes).
Change spec.resources.requests.storage to 2Gi and apply the update to the pvc-resize-test PVC.

Results:
In the description of the updated PVC, the events contain many 'error expanding volume' messages, and the PV cannot be resized by kubernetes.io/rbd.

  k-test describe pvc pvc-resize-test
Name:         kube-system
Namespace:     default
StorageClass:  rbd
Status:        Bound
Volume:        pvc-032f042a-1e1c-4d5b-8e16-7dd2c2546973
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                 {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"pvc-resize-test","namespace":"kube-system"},"spec":...
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
               volume.kubernetes.io/storage-resizer: kubernetes.io/rbd
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    <none>
Conditions:
  Type       Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----       ------  -----------------                 ------------------                ------  -------
  Resizing   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 18 Nov 2020 16:38:46 +0200           
Events:
  Type     Reason                 Age   From                         Message
  ----     ------                 ----  ----                         -------
  Normal   ProvisioningSucceeded  45m   persistentvolume-controller  Successfully provisioned volume pvc-032f042a-1e1c-4d5b-8e16-7dd2c2546973 using kubernetes.io/rbd
  Warning  VolumeResizeFailed     43m   volume_expand                error expanding volume "kube-system/pvc-resize-test" of plugin "kubernetes.io/rbd": rbd info failed, error: parse rbd info output failed: 2020-11-18 14:38:47.044399 7fced6d3b0c0 -1 did not load config file, using default settings.
2020-11-18 14:38:47.052824 7fced6d3b0c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.052875 7fced6d3b0c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.052878 7fced6d3b0c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.052879 7fced6d3b0c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.054238 7fced6d3b0c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.054259 7fced6d3b0c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.054261 7fced6d3b0c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.054263 7fced6d3b0c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
{"name":"kubernetes-dynamic-pvc-27bef4bd-355d-4deb-bfcc-b209cbd7c3f7","size":2147483648,"objects":512,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.d7eaf26b8b4567","format":2,"features":["layering"],"flags":[],"create_timestamp":"Wed Nov 18 14:36:39 2020"}
, invalid character '-' after top-level value
  Warning  VolumeResizeFailed  43m  volume_expand  error expanding volume "kube-system/pvc-resize-test" of plugin "kubernetes.io/rbd": rbd info failed, error: parse rbd info output failed: 2020-11-18 14:38:47.136128 7f36d00620c0 -1 did not load config file, using default settings.
2020-11-18 14:38:47.141907 7f36d00620c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.141937 7f36d00620c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.141959 7f36d00620c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.141961 7f36d00620c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.143274 7f36d00620c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.143295 7f36d00620c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.143298 7f36d00620c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.143301 7f36d00620c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
{"name":"kubernetes-dynamic-pvc-27bef4bd-355d-4deb-bfcc-b209cbd7c3f7","size":2147483648,"objects":512,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.d7eaf26b8b4567","format":2,"features":["layering"],"flags":[],"create_timestamp":"Wed Nov 18 14:36:39 2020"}
, invalid character '-' after top-level value

Short summary:
The kubernetes.io/rbd plugin can provision new PersistentVolumes, but fails to resize them.
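
The trailing "invalid character '-' after top-level value" in the events above is consistent with a JSON parser reading the leading "2020" of the first warning timestamp as a complete number and then stopping at the "-" that follows. The Python sketch below reproduces that failure mode on an abbreviated copy of the logged output and shows one way to pull the JSON object out of the mixed text. It is an illustration only, not the upstream fix; the function name parse_rbd_info is hypothetical.

```python
import json


def parse_rbd_info(output: str) -> dict:
    """Return the first JSON object found in mixed warning/JSON output.

    Hypothetical helper for illustration; not part of Kubernetes or Ceph.
    """
    decoder = json.JSONDecoder()
    for line in output.splitlines():
        line = line.strip()
        # Warning lines start with a timestamp; the payload starts with "{".
        if line.startswith("{"):
            obj, _ = decoder.raw_decode(line)
            return obj
    raise ValueError("no JSON object found in rbd output")


# Abbreviated from the event log above: config warnings followed by JSON.
sample = (
    "2020-11-18 14:38:47.044399 7fced6d3b0c0 -1 did not load config file, "
    "using default settings.\n"
    "2020-11-18 14:38:47.052875 7fced6d3b0c0 -1 parse_file: cannot open "
    "/etc/ceph/ceph.conf: (2) No such file or directory\n"
    '{"name":"kubernetes-dynamic-pvc-27bef4bd-355d-4deb-bfcc-b209cbd7c3f7",'
    '"size":2147483648,"format":2,"features":["layering"]}\n'
)

# Parsing the whole output as JSON fails, as in the reported error.
try:
    json.loads(sample)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# Skipping the warning lines recovers the image metadata.
info = parse_rbd_info(sample)
print(naive_ok, info["size"])
```

This also explains why the empty /etc/ceph/ceph.conf workaround discussed in the comments helps: with the file present, the config warnings are no longer printed, so only the JSON remains to be parsed.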


All 14 comments

This workaround fixed the resize. It should be executed once on every controlplane node:

docker exec -it  kube-controller-manager sh -c "touch /etc/ceph/ceph.conf"

This seems to be the same as https://github.com/kubernetes/kubernetes/issues/72393, without a fix and with the workaround you provided. I think it's harmless to add this to the image, can you test if you run into the same thing with k8s 1.19?

Unfortunately, this issue also exists in 1.19.4 (kubernetes: rancher/hyperkube:v1.19.4-rancher1 in cluster.yml), and the workaround I provided works there as well.

./kubectl --kubeconfig=kube_config_cluster.yml version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:09:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}

If necessary, I can add detailed error logs for 1.19.4, but they are similar to those from 1.18.

@superseb Absolutely, please add it to the image! It has always been a nuisance...
I also add an empty /etc/ceph/ceph.keyring, but don't know if it's actually required.

@adampl I've tested on cluster versions 1.17.13, 1.18.12, and 1.19.4, and an empty /etc/ceph/ceph.conf file is enough to restore rbd csi resize.

@niko-lay Hmm, yet these comments suggest that both files are required. What Ceph version have you tested with?

@adampl I use the Ceph provided by Proxmox:

# ceph version
ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable)

I'm still on Ceph 12.2.13 Luminous, and now I wonder if it will work without the keyring file. I guess I will have to check it somehow.

Kubernetes 1.20 has just been released with the fix for this issue, so we should rather focus on getting it ported to older versions: https://github.com/kubernetes/kubernetes/pull/92027

Apparently the backport was forgotten. The current workaround is to add extra_binds to the service and bind-mount an empty file from the host into the container(s), so that this works from provisioning onward.

This issue/PR has been automatically marked as stale because it has not had activity (commit/comment/label) for 60 days. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

not stale

@adampl This fix is in 1.20, which will be supported in the next release. Since upstream did not backport the fix, the workaround can be used for older versions until 1.20 is available. Let me know if you see a specific need to backport this to older versions.

OK, you can close it - if @niko-lay is OK with that.
