rook-ceph-agent in CrashLoopBackOff

Created on 4 Oct 2018 · 5 Comments · Source: rook/rook

For general technical and non-technical questions, we are happy to help you on our Rook.io Slack.
Sounds great, how would I get the required @rook.io email address?
Did you already search the existing open issues for anything similar?
yes

Is this a bug report or feature request?

Bug Report

Deviation from expected behavior:
rook-ceph-agent-nhjmt 0/1 CrashLoopBackOff 3 1m
rook-ceph-agent-sfcdh 0/1 CrashLoopBackOff 3 1m
rook-ceph-agent-vtjbq 0/1 CrashLoopBackOff 3 1m

Expected behavior:
running rook-ceph-agent-xxxxx pods

How to reproduce it (minimal and precise):
cd rook-0.8.3/cluster/examples/kubernetes/ceph
kubectl create -f operator.yaml

Environment:

OS (e.g. from /etc/os-release): client - ubuntu 18.04, master/nodes - coreOS
Kernel (e.g. uname -a): Linux ip-172-xx-xx-xx 4.15.0-24-generic #26-Ubuntu SMP Wed Jun 13 08:44:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linu

Cloud provider or hardware configuration: AWS via kops / Terraform
Rook version (use rook version inside of a Rook Pod):
[root@rook-ceph-operator-745f756bd8-vsbgw /]# rook version
rook: v0.8.3
Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): AWS / Kops
Storage backend status (e.g. for Ceph use ceph health in the [Rook Ceph toolbox](https://rook.io/docs/Rook/master/toolbox.html)):
Not sure what toolbox means, and the link is a 404.

[root@rook-ceph-operator-745f756bd8-vsbgw /]# ceph health
2018-10-03 22:05:02.707056 7fd245fc5700 -1 Errors while parsing config file!
2018-10-03 22:05:02.707067 7fd245fc5700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-10-03 22:05:02.707068 7fd245fc5700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-10-03 22:05:02.707068 7fd245fc5700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)

Istio version: 1.0.2

mabushey

All 5 comments

@mabushey can you share the logs for one of the crashlooping agents? Also kubectl describe on the pod may be helpful.

travisn on 4 Oct 2018

$ kubectl -n rook-ceph-system logs rook-ceph-agent-nhjmt
failed to open log file "/var/log/pods/7f041868-c756-11e8-8fa0-064ce4bb95ae/rook-ceph-agent/9.log": open /var/log/pods/7f041868-c756-11e8-8fa0-064ce4bb95ae/rook-ceph-agent/9.log: no such file or directory

$ kubectl -n rook-ceph-system describe pod rook-ceph-agent-nhjmt

Name:               rook-ceph-agent-nhjmt
Namespace:          rook-ceph-system
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-132-3-115.us-west-2.compute.internal/10.132.3.115
Start Time:         Wed, 03 Oct 2018 21:51:34 +0000
Labels:             app=rook-ceph-agent
                    controller-revision-hash=1106037285
                    pod-template-generation=1
Annotations:        <none>
Status:             Running
IP:                 10.132.3.115
Controlled By:      DaemonSet/rook-ceph-agent
Containers:
  rook-ceph-agent:
    Container ID:  docker://30f328ec772cb6b11cca9a9851490526e4e89b2bda8c926d75dd8e7ac69db0ac
    Image:         rook/ceph:v0.8.3
    Image ID:      docker-pullable://rook/ceph@sha256:a53bfec40e05d771b420c060fbd580d5b92f71c9c3e7129323e130cb4b54082a
    Port:          <none>
    Host Port:     <none>
    Args:
      ceph
      agent
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system
      Exit Code:    128
      Started:      Wed, 03 Oct 2018 22:12:54 +0000
      Finished:     Wed, 03 Oct 2018 22:12:54 +0000
    Ready:          False
    Restart Count:  9
    Environment:
      POD_NAMESPACE:  rook-ceph-system (v1:metadata.namespace)
      NODE_NAME:       (v1:spec.nodeName)
    Mounts:
      /dev from dev (rw)
      /flexmnt from flexvolume (rw)
      /lib/modules from libmodules (rw)
      /sys from sys (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rook-ceph-system-token-g4cvd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  flexvolume:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    HostPathType:
  dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  libmodules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  rook-ceph-system-token-g4cvd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-system-token-g4cvd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason   Age                From                                                 Message
  ----     ------   ----               ----                                                 -------
  Normal   Pulled   23m (x5 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Container image "rook/ceph:v0.8.3" already present on machine
  Normal   Created  23m (x5 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Created container
  Warning  Failed   23m (x5 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Error: failed to start container "rook-ceph-agent": Error response from daemon: error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system
  Warning  BackOff  4m (x90 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Back-off restarting failed container

mabushey on 4 Oct 2018

This is the key error:

error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system

Looks like you need to configure the flex volume as described here
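
One way to confirm this on a node before changing anything is to test whether the flexvolume directory could be created there at all. The following is only a sketch, assuming shell access to the node; the function name `flexvolume_dir_writable` is made up for illustration and is not part of Rook or Kubernetes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rook): succeeds if the given directory
# is writable, or could be created because its nearest existing ancestor
# is a writable directory. On a read-only mount the -w test fails.
flexvolume_dir_writable() {
  dir="$1"
  # Walk up until we reach a path that actually exists.
  while [ ! -e "$dir" ]; do
    dir=$(dirname "$dir")
  done
  [ -d "$dir" ] && [ -w "$dir" ]
}

# Example, run on the node itself:
#   flexvolume_dir_writable /usr/libexec/kubernetes/kubelet-plugins/volume/exec \
#     || echo "default flexvolume path is not writable on this node"
```

If the check fails for the default path but passes for a directory under /var/lib/kubelet, that directory is a candidate for the flexvolume path.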

travisn on 4 Oct 2018

Thank you. I have CoreOS, so the default path is on a read-only file system.
I added

- name: FLEXVOLUME_DIR_PATH
  value: "/var/lib/kubelet/volumeplugins"

to operator.yaml and the agents come up now.
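
For an operator that is already deployed, the same variable can in principle be set without editing the file. This is a sketch only: it assumes the operator Deployment is named `rook-ceph-operator` (inferred from the pod name above) and lives in the `rook-ceph-system` namespace. The function name and the `KUBECTL` override are illustrative, and the thread does not confirm whether the operator re-creates existing agent pods on its own after the change.

```shell
#!/bin/sh
# Hypothetical wrapper: set FLEXVOLUME_DIR_PATH on the operator Deployment.
# Assumed names: deployment "rook-ceph-operator", namespace "rook-ceph-system".
# Set KUBECTL=echo to print the command instead of running it.
set_flexvolume_dir_path() {
  ${KUBECTL:-kubectl} set env --namespace=rook-ceph-system \
    deployment/rook-ceph-operator "FLEXVOLUME_DIR_PATH=$1"
}

# Example:
#   set_flexvolume_dir_path /var/lib/kubelet/volumeplugins
```

Keeping the value in operator.yaml as well means a later `kubectl create -f` or apply does not silently revert it.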

There's a section, "Configuring the Kubernetes kubelet", that makes no sense to me. I don't know what "You need to add the flexvolume flag with the path to all nodes’s kubelet in the Kubernetes cluster" means. What is an "all nodes’s kubelet"?

mabushey on 4 Oct 2018

I found https://github.com/kubernetes/kops/issues/5539:

kops edit cluster --state=s3://myco-k8s
add to spec:

  kubelet:
    volumePluginDirectory: /var/lib/kubelet/volumeplugins

kops update cluster --state=s3://myco-k8s --yes

kops rolling-update cluster --state=s3://myco-k8s
kops rolling-update cluster --state=s3://myco-k8s --yes
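
After the rolling update, it may be worth checking on a node that the kubelet really runs with the new directory, and that it matches the FLEXVOLUME_DIR_PATH given to the operator. A rough sketch, assuming the kubelet receives the setting as a `--volume-plugin-dir=<path>` argument and that procps `ps` is available on the node; the function name is made up for illustration, and the space-separated form `--volume-plugin-dir <path>` is not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the value of --volume-plugin-dir=... found in
# a kubelet command line passed as a single string. Prints nothing if the
# flag is absent.
volume_plugin_dir_from_args() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--volume-plugin-dir=//p' | head -n 1
}

# Example, run on a node:
#   volume_plugin_dir_from_args "$(ps -o args= -C kubelet)"
```

An empty result would suggest the node has not picked up the kops change yet.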

mabushey on 4 Oct 2018
