For general technical and non-technical questions, we are happy to help you on our Rook.io Slack.
Sounds great, how would I get the required @rook.io email address?
Did you already search the existing open issues for anything similar?
yes
Is this a bug report or feature request?
Bug report
Deviation from expected behavior:
NAME                    READY   STATUS             RESTARTS   AGE
rook-ceph-agent-nhjmt   0/1     CrashLoopBackOff   3          1m
rook-ceph-agent-sfcdh   0/1     CrashLoopBackOff   3          1m
rook-ceph-agent-vtjbq   0/1     CrashLoopBackOff   3          1m
Expected behavior:
running rook-ceph-agent-xxxxx pods
How to reproduce it (minimal and precise):
cd rook-0.8.3/cluster/examples/kubernetes/ceph
kubectl create -f operator.yaml
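Creating the operator should also bring up the rook-ceph-agent daemonset shown above; the pod status can be checked with:
$ kubectl -n rook-ceph-system get pods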
Environment:
Kernel (uname -a): Linux ip-172-xx-xx-xx 4.15.0-24-generic #26-Ubuntu SMP Wed Jun 13 08:44:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Rook version (rook version inside of a Rook Pod):
Kubernetes version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Storage backend status (ceph health in the Rook Ceph toolbox):
[root@rook-ceph-operator-745f756bd8-vsbgw /]# ceph health
2018-10-03 22:05:02.707056 7fd245fc5700 -1 Errors while parsing config file!
2018-10-03 22:05:02.707067 7fd245fc5700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-10-03 22:05:02.707068 7fd245fc5700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-10-03 22:05:02.707068 7fd245fc5700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
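Side note: the shell prompt above shows ceph health was run inside the operator pod, which carries no /etc/ceph/ceph.conf; that alone explains this particular error. Assuming the toolbox.yaml from the same examples directory is available (pod name and namespace below follow the v0.8 docs and may differ on your cluster), the check would look like:
$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph exec -it rook-ceph-tools -- ceph health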
@mabushey can you share the logs for one of the crashlooping agents? Also kubectl describe on the pod may be helpful.
$ kubectl -n rook-ceph-system logs rook-ceph-agent-nhjmt
failed to open log file "/var/log/pods/7f041868-c756-11e8-8fa0-064ce4bb95ae/rook-ceph-agent/9.log": open /var/log/pods/7f041868-c756-11e8-8fa0-064ce4bb95ae/rook-ceph-agent/9.log: no such file or directory
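When the current log file is already gone like this, the output of the previous container attempt can sometimes still be fetched with the --previous flag:
$ kubectl -n rook-ceph-system logs --previous rook-ceph-agent-nhjmt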
$ kubectl -n rook-ceph-system describe pod rook-ceph-agent-nhjmt
Name: rook-ceph-agent-nhjmt
Namespace: rook-ceph-system
Priority: 0
PriorityClassName: <none>
Node: ip-10-132-3-115.us-west-2.compute.internal/10.132.3.115
Start Time: Wed, 03 Oct 2018 21:51:34 +0000
Labels: app=rook-ceph-agent
controller-revision-hash=1106037285
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.132.3.115
Controlled By: DaemonSet/rook-ceph-agent
Containers:
rook-ceph-agent:
Container ID: docker://30f328ec772cb6b11cca9a9851490526e4e89b2bda8c926d75dd8e7ac69db0ac
Image: rook/ceph:v0.8.3
Image ID: docker-pullable://rook/ceph@sha256:a53bfec40e05d771b420c060fbd580d5b92f71c9c3e7129323e130cb4b54082a
Port: <none>
Host Port: <none>
Args:
ceph
agent
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Message: error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system
Exit Code: 128
Started: Wed, 03 Oct 2018 22:12:54 +0000
Finished: Wed, 03 Oct 2018 22:12:54 +0000
Ready: False
Restart Count: 9
Environment:
POD_NAMESPACE: rook-ceph-system (v1:metadata.namespace)
NODE_NAME: (v1:spec.nodeName)
Mounts:
/dev from dev (rw)
/flexmnt from flexvolume (rw)
/lib/modules from libmodules (rw)
/sys from sys (rw)
/var/run/secrets/kubernetes.io/serviceaccount from rook-ceph-system-token-g4cvd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
flexvolume:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
HostPathType:
dev:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType:
sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
libmodules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
rook-ceph-system-token-g4cvd:
Type: Secret (a volume populated by a Secret)
SecretName: rook-ceph-system-token-g4cvd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 23m (x5 over 24m) kubelet, ip-10-132-3-115.us-west-2.compute.internal Container image "rook/ceph:v0.8.3" already present on machine
Normal Created 23m (x5 over 24m) kubelet, ip-10-132-3-115.us-west-2.compute.internal Created container
Warning Failed 23m (x5 over 24m) kubelet, ip-10-132-3-115.us-west-2.compute.internal Error: failed to start container "rook-ceph-agent": Error response from daemon: error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system
Warning BackOff 4m (x90 over 24m) kubelet, ip-10-132-3-115.us-west-2.compute.internal Back-off restarting failed container
This is the key error:
error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system
Looks like you need to configure the FlexVolume plugin directory as described in the Rook FlexVolume configuration docs.
Thank you. I have CoreOS, so the default path is read-only.
I added
- name: FLEXVOLUME_DIR_PATH
  value: "/var/lib/kubelet/volumeplugins"
to operator.yaml and the agents come up now.
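For context, that env var goes on the rook-ceph-operator container in operator.yaml (the operator passes it through to the agents it deploys). A minimal sketch of the relevant fragment, with everything else elided:
containers:
- name: rook-ceph-operator
  image: rook/ceph:v0.8.3
  env:
  # Host directory where the kubelet looks for FlexVolume plugins;
  # must match the kubelet's --volume-plugin-dir on every node.
  - name: FLEXVOLUME_DIR_PATH
    value: "/var/lib/kubelet/volumeplugins"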
There's a section, "Configuring the Kubernetes kubelet", that makes no sense to me. I don't understand "You need to add the flexvolume flag with the path to all nodes' kubelet in the Kubernetes cluster." What is "all nodes' kubelet" supposed to mean? See the sketch below for how I read it.
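Reading that sentence charitably, it seems to mean: on every node, the kubelet process itself must be started with the FlexVolume directory flag. A sketch of the relevant kubelet argument (how you set it depends on how the kubelet is managed on your distro):
kubelet ... --volume-plugin-dir=/var/lib/kubelet/volumeplugins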
I found https://github.com/kubernetes/kops/issues/5539:
kops update cluster --state=s3://myco-k8s                # preview
Then add to the cluster spec:
kubelet:
  volumePluginDirectory: /var/lib/kubelet/volumeplugins
kops update cluster --state=s3://myco-k8s --yes          # apply
kops rolling-update cluster --state=s3://myco-k8s        # preview the node roll
kops rolling-update cluster --state=s3://myco-k8s --yes  # roll the nodes
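Once the nodes have rolled, the agents should leave CrashLoopBackOff; I'd verify with something like:
$ kubectl -n rook-ceph-system get pods -l app=rook-ceph-agent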