RKE version:
v0.3.2
Docker version: (docker version,docker info preferred)
...
Server Version: 18.06.3-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932181eb9de8e72e92e616e86e
runc version: a592beb5bc4c4092b1c0cac971afed27687340c5
init version: fec3683b97ad9c3ef73f284f176e12c44b448662
Security Options:
seccomp
Profile: default
selinux
Kernel Version: 4.19.66-coreos
Operating System: Container Linux by CoreOS 2191.4.1 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 16.26GiB
Name: kw1.example.com
ID: KTTJ:Q3RN:ZLSU:WTLK:EWZ3:TB3T:DWPK:ONVB:EDQB:Z57U:4SQA:KR1L
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
HTTP Proxy: http://proxy.example.com
HTTPS Proxy: http://proxy.example.com
No Proxy: localhost, 127.0.0.0/8, repo.example.com, 172.16.0.0/16, proxy.example.com, proxy.example.com, proxy.example.com
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Operating system and kernel: (cat /etc/os-release, uname -r preferred) 4.19.66-coreos
$ uname -r
4.19.66-coreos
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Bare-metal
cluster.yml file:
# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: 172.16.101.190
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: "master-0"
  user: arash
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
- address: 172.16.101.191
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: "master-1"
  user: arash
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
- address: 172.16.101.192
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: "master-2"
  user: arash
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
- address: 172.16.101.193
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: "worker-0"
  user: arash
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
- address: 172.16.101.194
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: "worker-1"
  user: arash
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
- address: 172.16.101.195
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: "worker-2"
  user: arash
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.1.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.0.0.0/16
    service_cluster_ip_range: 10.1.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args:
      pods-per-core: 50
      max-pods: 1000
      volume-plugin-dir: /opt/kubernetes/kubelet-plugins/volume/exec
    extra_binds:
    - /opt/kubernetes/kubelet-plugins/volume/exec:/opt/kubernetes/kubelet-plugins/volume/exec
    extra_env: []
    extra_binds: []
    extra_env: []
    cluster_domain: kube.example.com
    infra_container_image: ""
    cluster_dns_server: 10.1.0.10
    fail_swap_on: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: calico
  options: {}
authentication:
  strategy: x509
  sans: [172.16.101.196]
  webhook: null
addons: ""
addons_include: []
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: "v1.15.5-rancher1-2"
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
cluster_name: "test"
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
restore:
  restore: false
  snapshot_name: ""
dns: null
Results:
$ kubectl describe -n kube-system pod calico-node-d24qb
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/calico-node-d24qb to master-1
Normal Pulled 33m kubelet, master-1 Container image "rancher/calico-cni:v3.8.1" already present on machine
Normal Created 33m kubelet, master-1 Created container upgrade-ipam
Normal Started 33m kubelet, master-1 Started container upgrade-ipam
Normal Pulled 33m kubelet, master-1 Container image "rancher/calico-cni:v3.8.1" already present on machine
Normal Created 33m kubelet, master-1 Created container install-cni
Normal Started 33m kubelet, master-1 Started container install-cni
Normal Created 32m (x4 over 33m) kubelet, master-1 Created container flexvol-driver
Warning Failed 32m (x4 over 33m) kubelet, master-1 Error: failed to start container "flexvol-driver": Error response from daemon: error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds': mkdir /usr/libexec/kubernetes: read-only file system
Normal Pulled 31m (x5 over 33m) kubelet, master-1 Container image "rancher/calico-pod2daemon-flexvol:v3.8.1" already present on machine
Warning BackOff 3m2s (x140 over 32m) kubelet, master-1 Back-off restarting failed container
When we try to upgrade the Kubernetes cluster to v1.16.2-rancher1-1, the CNI pods do not use our flexvolume path. Because the filesystem on CoreOS is read-only, the flexvol-driver init container cannot create its directory, so the pod never starts and ends up in Init:CrashLoopBackOff.
Same problem with a fresh CoreOS cluster: flexvol-driver reports mkdir /usr/libexec/kubernetes: read-only file system
I have the same problem here.
This is the same issue as in https://github.com/projectcalico/calico/issues/2712.
A workaround is documented in https://docs.projectcalico.org/v3.10/reference/faq#are-the-calico-manifests-compatible-with-coreos.
You have to configure kube-controller with a writable flexvolume plugin directory, for example:
kube-controller:
  extra_args:
    flex-volume-plugin-dir: "/var/lib/kubelet/volumeplugins/"
And then edit the daemonset accordingly:
- name: flexvol-driver-host
  hostPath:
    type: DirectoryOrCreate
    path: /var/lib/kubelet/volumeplugins/nodeagent~uds
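Before settling on a replacement directory, it can help to confirm on the node that the location really accepts writes. The following is a minimal sketch, not part of RKE or Calico: the helper names `nearest_existing_ancestor` and `can_create_under` are hypothetical, and the check is only an approximation, since it probes the closest directory that already exists instead of creating the target path.

```python
import os
import tempfile


def nearest_existing_ancestor(path: str) -> str:
    """Walk up from `path` until an existing directory is found."""
    current = os.path.abspath(path)
    while not os.path.isdir(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def can_create_under(path: str) -> bool:
    """Approximate whether `path` could be created, without creating it.

    A temporary file is created and removed in the nearest existing
    ancestor; on a read-only mount this raises OSError and the function
    returns False.
    """
    base = nearest_existing_ancestor(path)
    try:
        with tempfile.NamedTemporaryFile(dir=base):
            pass
        return True
    except OSError:
        return False
```

Run on a node, `can_create_under("/usr/libexec/kubernetes/kubelet-plugins/volume/exec")` would be expected to return False when /usr is mounted read-only, while a path such as `/var/lib/kubelet/volumeplugins` should return True if /var is writable for the calling user.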
Any progress on fixing this? This makes Calico unusable on RKE.
@superseb this is really annoying, and our production cluster is stuck at v1.15.7.
Any chance of a workaround, or any tips from the RKE side?
I ran into this as well. The Calico daemonset that is built into Rancher hard-codes the flexvol path for Kubernetes 1.16+, and our Rancher clusters are stuck at 1.15 as well.
@superseb Our clusters are locked to version 1.15.x, and this puts us in a bad situation.
I've managed to find a workaround on a test cluster.
After upgrading the cluster, I edited the calico-node daemonset:
kubectl --kubeconfig kube_config_cluster.yml edit daemonset calico-node -n kube-system
Then I replaced the host path /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds, which Calico was trying to use, with the same subdirectory under /opt/kubernetes/kubelet-plugins/volume/exec, which is writable on CoreOS, and Calico started working.
- hostPath:
    path: /opt/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    type: DirectoryOrCreate
  name: flexvol-driver-host
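The manual edit above could also be expressed as a JSON Patch, which avoids editing the daemonset by hand on each cluster. This is a sketch under assumptions: the function name `flexvol_patch` is hypothetical, and it assumes the daemonset has been fetched as JSON (for example with `kubectl -n kube-system get daemonset calico-node -o json`) and parsed into a dict.

```python
import json


def flexvol_patch(daemonset: dict, new_path: str,
                  volume_name: str = "flexvol-driver-host") -> list:
    """Build a JSON Patch that replaces the hostPath of the flexvol volume.

    The volume is looked up by name, so the patch does not depend on the
    volume's position in the manifest staying the same between releases.
    """
    volumes = daemonset["spec"]["template"]["spec"]["volumes"]
    for index, volume in enumerate(volumes):
        if volume.get("name") == volume_name and "hostPath" in volume:
            return [{
                "op": "replace",
                "path": f"/spec/template/spec/volumes/{index}/hostPath/path",
                "value": new_path,
            }]
    raise KeyError(f"volume {volume_name!r} with a hostPath not found")


def patch_as_json(daemonset: dict, new_path: str) -> str:
    """Serialize the patch for use as the -p argument of kubectl patch."""
    return json.dumps(flexvol_patch(daemonset, new_path))
```

The resulting string can be passed to `kubectl -n kube-system patch daemonset calico-node --type=json -p '<patch>'`. Like the manual edit, the patch changes only the live object.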
I'm looking for a way to set this path as an option in the network plugin section of the cluster.yml file, so that the change persists across future upgrades.
Something like:
network:
  plugin: calico
  options:
    - flexvol-driver-path: /opt/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
Any ideas?
This needs to be validated in RKE RCs for both v1.0 (cut from branch release/v2.3) and v1.1 (cut from branch master).
Available to test with v1.0.5-rc1 and v1.1.0-rc1.
The following validations were done with RKE CLI v1.0.5-rc1 and v1.1.0-rc13.
This AMI in AWS is used for nodes: CoreOS-stable-2345.3.0-hvm (ami-08c51fc1b1cc85501)
Provision the following two clusters using the designated RKE CLI:
# rancher-cluster.yml
# cluster 1
nodes:
  - address:
    internal_address:
    user: core
    role: [etcd, controlplane, worker]
    ssh_key_path:
network:
  plugin: calico
  options:
    calico_flex_volume_plugin_dir: /opt/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    flannel_backend_type: vxlan
services:
  kube-controller:
    extra_args:
      flex-volume-plugin-dir: /opt/kubernetes/kubelet-plugins/volume/exec/
# rancher-cluster.yml
# cluster 2
nodes:
  - address:
    internal_address:
    user: core
    role: [etcd, controlplane, worker]
    ssh_key_path:
network:
  plugin: canal
  options:
    canal_flex_volume_plugin_dir: /opt/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    flannel_backend_type: vxlan
services:
  kube-controller:
    extra_args:
      flex-volume-plugin-dir: /opt/kubernetes/kubelet-plugins/volume/exec/
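In both validation configs the network plugin's flexvolume option points at a `nodeagent~uds` directory directly beneath the kube-controller `flex-volume-plugin-dir`. A quick consistency check for that relationship might look like the sketch below. It assumes the cluster file has already been loaded into a dict (for example with a YAML parser), that the option key follows the `<plugin>_flex_volume_plugin_dir` pattern seen in the two configs, and that the "direct child" relationship is the intended one; the function name is hypothetical.

```python
import posixpath


def flexvol_paths_consistent(cluster: dict) -> bool:
    """Check that the CNI flexvolume dir sits directly under the
    kube-controller flex-volume-plugin-dir.

    Returns False when either setting is missing.
    """
    network = cluster.get("network", {})
    plugin = network.get("plugin", "")
    options = network.get("options") or {}
    driver_dir = options.get(f"{plugin}_flex_volume_plugin_dir")

    controller_args = (
        cluster.get("services", {})
        .get("kube-controller", {})
        .get("extra_args") or {}
    )
    controller_dir = controller_args.get("flex-volume-plugin-dir")

    if not driver_dir or not controller_dir:
        return False

    # normpath drops trailing slashes so ".../exec/" and ".../exec" compare equal.
    parent_of_driver = posixpath.dirname(posixpath.normpath(driver_dir))
    return parent_of_driver == posixpath.normpath(controller_dir)
```

A mismatch here would only indicate that the two settings disagree; it says nothing about whether the directory is writable on the node.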
Results:
I have the same problem after upgrading from v1.15 to v1.16:
https://forums.rancher.com/t/failed-k8s-upgrade-from-v1-15-to-1-16/17096
@mikekuzak it should be fixed in Rancher 2.3.6. See this commit: https://github.com/rancher/rancher/commit/de91c61b02f95c188216e5e211229150bdd5705d
You should be able to configure the Flexvolume path