RKE version:
rke_linux-amd64_v1.2.3
kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-20T20:41:06Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.12", GitCommit:"7cd5e9086de8ae25d6a1514d0c87bac67ca4a481", GitTreeState:"clean", BuildDate:"2020-11-12T09:11:15Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Docker version: (docker version,docker info preferred)
$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 28
  Running: 15
  Paused: 0
  Stopped: 13
 Images: 12
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-124-generic
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.852GiB
 Name: test-k8s-v18-01
 ID: 6FQB:YXFQ:XOQM:EYNK:FGYY:QJBR:66BO:WDEX:IZBQ:ZUVY:KT5B:KZ4N
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ uname -r
4.15.0-124-generic
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Test cluster with 3 VMs in Proxmox
cluster.yml file:
attached to the issue as cluster.yml.zip
Steps to Reproduce:
Install a new cluster and configure a new StorageClass `rbd` as the default storage class for the cluster:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: rbd
parameters:
  adminId: admin-test-k8s-v18
  adminSecretName: ceph-key
  adminSecretNamespace: kube-system
  fsType: ext4
  imageFeatures: layering
  imageFormat: "2"
  monitors: x.x.x.25:6789,x.x.x.16:6789,x.x.x.26:6789
  pool: test-k8s-v18
  userId: admin-test-k8s-v18
  userSecretName: ceph-key
  userSecretNamespace: kube-system
provisioner: kubernetes.io/rbd
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: v1
data:
  key: some_base64_data==
kind: Secret
metadata:
  name: ceph-key
type: kubernetes.io/rbd
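The Secret's `data.key` must hold the base64-encoded Ceph key of the user named in `adminId`/`userId`. The Secret must also live in the namespace given by `adminSecretNamespace`/`userSecretNamespace` (`kube-system` here).

Below is a minimal sketch of creating it rather than hand-encoding the key. It assumes a host where both the `ceph` CLI (with rights to read auth keys) and `kubectl` (pointed at this cluster) are available. The function name `create_ceph_key_secret` is hypothetical.

```shell
#!/bin/sh
# Sketch: create the ceph-key Secret referenced by the rbd StorageClass.
# ASSUMPTIONS: `ceph` and `kubectl` are both usable from this host;
# create_ceph_key_secret is a hypothetical helper name.
create_ceph_key_secret() {
  ceph_user="$1"   # Ceph client ID, same value as adminId/userId
  namespace="$2"   # must match adminSecretNamespace/userSecretNamespace
  # `ceph auth get-key client.<id>` prints the raw (not base64) key.
  key=$(ceph auth get-key "client.${ceph_user}") || return 1
  # kubectl base64-encodes --from-literal values into data.key itself.
  kubectl create secret generic ceph-key \
    --type=kubernetes.io/rbd \
    --from-literal=key="${key}" \
    --namespace="${namespace}"
}
```

For this report, the call would be `create_ceph_key_secret admin-test-k8s-v18 kube-system`.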
Deploy a simple PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-resize-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Wait until the PersistentVolumeClaim is created successfully and the PersistentVolume is provisioned (usually 1-2 minutes).
Change spec.resources.requests.storage to 2Gi and re-apply the pvc-resize-test claim.
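The two steps above can be scripted roughly as follows. This sketch assumes `kubectl` already points at the test cluster. It also assumes the claim lives in `kube-system`, as the events in the `describe` output below indicate. The function names, the 5 s poll interval and the default 180 s timeout are hypothetical choices.

```shell
#!/bin/sh
# Sketch: wait for the claim to bind, then request a larger size.
# ASSUMPTIONS: kubectl targets the test cluster; helper names, poll
# interval and timeout are hypothetical.

# Poll .status.phase every 5 s until it reads "Bound" or the timeout expires.
wait_pvc_bound() {
  pvc="$1"; ns="$2"; timeout="${3:-180}"
  waited=0
  while [ "$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}')" != "Bound" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 5
    waited=$((waited + 5))
  done
}

# Raise spec.resources.requests.storage with a JSON merge patch.
resize_pvc() {
  pvc="$1"; ns="$2"; size="$3"
  kubectl patch pvc "$pvc" -n "$ns" --type=merge \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"${size}\"}}}}"
}
```

Usage: `wait_pvc_bound pvc-resize-test kube-system && resize_pvc pvc-resize-test kube-system 2Gi`.

A merge patch touches only the storage request, so the rest of the claim stays as applied.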
Results:
Describing the updated PVC shows many 'error expanding volume' messages, and the PV cannot be resized by kubernetes.io/rbd:
k-test describe pvc pvc-resize-test
Name: pvc-resize-test
Namespace: kube-system
StorageClass: rbd
Status: Bound
Volume: pvc-032f042a-1e1c-4d5b-8e16-7dd2c2546973
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"pvc-resize-test","namespace":"kube-system"},"spec":...
pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
volume.kubernetes.io/storage-resizer: kubernetes.io/rbd
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 2Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: <none>
Conditions:
Type Status LastProbeTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
Resizing True Mon, 01 Jan 0001 00:00:00 +0000 Wed, 18 Nov 2020 16:38:46 +0200
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ProvisioningSucceeded 45m persistentvolume-controller Successfully provisioned volume pvc-032f042a-1e1c-4d5b-8e16-7dd2c2546973 using kubernetes.io/rbd
Warning VolumeResizeFailed 43m volume_expand error expanding volume "kube-system/pvc-resize-test" of plugin "kubernetes.io/rbd": rbd info failed, error: parse rbd info output failed: 2020-11-18 14:38:47.044399 7fced6d3b0c0 -1 did not load config file, using default settings.
2020-11-18 14:38:47.052824 7fced6d3b0c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.052875 7fced6d3b0c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.052878 7fced6d3b0c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.052879 7fced6d3b0c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.054238 7fced6d3b0c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.054259 7fced6d3b0c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.054261 7fced6d3b0c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.054263 7fced6d3b0c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
{"name":"kubernetes-dynamic-pvc-27bef4bd-355d-4deb-bfcc-b209cbd7c3f7","size":2147483648,"objects":512,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.d7eaf26b8b4567","format":2,"features":["layering"],"flags":[],"create_timestamp":"Wed Nov 18 14:36:39 2020"}
, invalid character '-' after top-level value
Warning VolumeResizeFailed 43m volume_expand error expanding volume "kube-system/pvc-resize-test" of plugin "kubernetes.io/rbd": rbd info failed, error: parse rbd info output failed: 2020-11-18 14:38:47.136128 7f36d00620c0 -1 did not load config file, using default settings.
2020-11-18 14:38:47.141907 7f36d00620c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.141937 7f36d00620c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.141959 7f36d00620c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.141961 7f36d00620c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.143274 7f36d00620c0 -1 Errors while parsing config file!
2020-11-18 14:38:47.143295 7f36d00620c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.143298 7f36d00620c0 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-11-18 14:38:47.143301 7f36d00620c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
{"name":"kubernetes-dynamic-pvc-27bef4bd-355d-4deb-bfcc-b209cbd7c3f7","size":2147483648,"objects":512,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.d7eaf26b8b4567","format":2,"features":["layering"],"flags":[],"create_timestamp":"Wed Nov 18 14:36:39 2020"}
, invalid character '-' after top-level value
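The decoder error at the end of each event fits this output. The text handed to Go's JSON parser starts with the Ceph warning timestamp `2020-11-18`, so the parser reads `2020` as a complete top-level number and then fails on the following `-`. The `rbd info` JSON object itself is intact.

As a diagnostic aid only (it does not fix the plugin), the sketch below separates the warning lines from the JSON in a captured copy of such output. The helper names are hypothetical, and the sketch assumes the JSON object starts on a line beginning with `{`.

```shell
#!/bin/sh
# Diagnostic sketch for captured `rbd info` output with Ceph config
# warnings mixed in. Hypothetical helpers, not part of Kubernetes.

# Print everything from the first line starting with '{' to the end.
rbd_json_part() {
  printf '%s\n' "$1" | sed -n '/^{/,$p'
}

# Print only the lines before that point (the Ceph warnings).
rbd_noise_part() {
  printf '%s\n' "$1" | sed '/^{/,$d'
}
```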
Short summary:
The kubernetes.io/rbd plugin can provision new PersistentVolumes but fails to resize them.
This workaround fixes resizing; it has to be executed once on every control plane node:
docker exec -it kube-controller-manager sh -c "touch /etc/ceph/ceph.conf"
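For several control plane nodes, the same command can be run in a loop. This is a sketch that assumes SSH access to each node as a user allowed to run `docker`; the helper name is hypothetical.

`-it` is dropped because no TTY is attached through ssh. `touch` is called directly instead of via `sh -c "..."`, because ssh re-joins the arguments into one remote command string and would lose the inner quoting.

```shell
#!/bin/sh
# Sketch: apply the empty-ceph.conf workaround on every control plane node.
# ASSUMPTIONS: SSH access to each node as a docker-capable user; the
# helper name is hypothetical.
touch_ceph_conf_on_nodes() {
  status=0
  for node in "$@"; do
    if ssh "$node" docker exec kube-controller-manager touch /etc/ceph/ceph.conf; then
      echo "ok: $node"
    else
      echo "failed: $node" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `touch_ceph_conf_on_nodes test-k8s-v18-01 test-k8s-v18-02 test-k8s-v18-03`. Only the first name appears in the report; the other two are hypothetical.

The file is written into the container's writable layer. It will most likely be lost whenever RKE recreates kube-controller-manager, so the command may need repeating after such changes.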
This seems to be the same as https://github.com/kubernetes/kubernetes/issues/72393, which has no fix and uses the same workaround you provided. I think it's harmless to add this to the image. Can you test whether you run into the same thing with k8s 1.19?
Unfortunately, this issue also exists in 1.19.4 (kubernetes: rancher/hyperkube:v1.19.4-rancher1 from cluster.yml), and my workaround works there too:
./kubectl --kubeconfig=kube_config_cluster.yml version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:09:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
If necessary, I can add detailed error logs for 1.19.4, but they are similar to those for 1.18.
@superseb Absolutely, please add it to the image! It has always been a nuisance...
I also add an empty /etc/ceph/ceph.keyring, but I don't know whether it's actually required.
@adampl I've tested on 1.17.13, 1.18.12 and 1.19.4 clusters, and an empty /etc/ceph/ceph.conf file is enough to restore rbd volume resizing.
@niko-lay Hmm, yet these comments suggest that both files are required. What Ceph version have you tested with?
@adampl I use the Ceph that ships with Proxmox:
# ceph version
ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable)
I'm still on Ceph 12.2.13 Luminous, and now I wonder whether it will work without the keyring file. I guess I will have to check.
Kubernetes 1.20 has just been released with the fix for this issue, so we should rather focus on getting it ported to older versions: https://github.com/kubernetes/kubernetes/pull/92027
Apparently backporting was forgotten. The current workaround is to add extra_binds to the service and bind-mount an empty file from the host into the container(s), so that this works from the moment the cluster is provisioned.
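Below is a sketch of that extra_binds workaround for an RKE cluster.yml. It assumes the script runs as root on each node that hosts kube-controller-manager, and that the printed fragment is merged by hand into the existing `services:` section before running `rke up`.

`kube-controller` and `extra_binds` are RKE cluster.yml keys. The helper names and the optional root-prefix argument (there only so the script can be tried against a scratch directory) are hypothetical. Whether other services such as `kubelet` also need the bind is not settled in this thread.

```shell
#!/bin/sh
# Sketch of the extra_binds workaround for RKE.
# ASSUMPTIONS: run as root on each kube-controller-manager node; helper
# names and the optional root prefix are hypothetical.

# Create the empty host file. It must exist before the container is
# (re)created; otherwise Docker creates a directory at that host path.
prepare_host_ceph_conf() {
  root="${1:-}"
  mkdir -p "${root}/etc/ceph" && touch "${root}/etc/ceph/ceph.conf"
}

# Print the cluster.yml fragment to merge into the services: section.
ceph_conf_extra_binds() {
  cat <<'EOF'
services:
  kube-controller:
    extra_binds:
      - "/etc/ceph/ceph.conf:/etc/ceph/ceph.conf"
EOF
}
```

Unlike the `docker exec` workaround, the bind is part of the container definition. It should therefore survive RKE recreating the container.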
This issue/PR has been automatically marked as stale because it has not had activity (commit/comment/label) for 60 days. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
not stale
@adampl This fix is in 1.20, which will be supported in the next release. Since upstream did not backport the fix, the workaround can be used for older versions until 1.20 is available. Let me know if you see a specific need to backport this to older versions.
OK, you can close it - if @niko-lay is OK with that.