rook-ceph-agent in CrashLoopBackOff

Created on 4 Oct 2018 · 5 comments · Source: rook/rook

  1. For general technical and non-technical questions, we are happy to help you on our Rook.io Slack.
    Sounds great, how would I get the required @rook.io email address?

  2. Did you already search the existing open issues for anything similar?
    yes

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
rook-ceph-agent-nhjmt 0/1 CrashLoopBackOff 3 1m
rook-ceph-agent-sfcdh 0/1 CrashLoopBackOff 3 1m
rook-ceph-agent-vtjbq 0/1 CrashLoopBackOff 3 1m

Expected behavior:
rook-ceph-agent-xxxxx pods in the Running state

How to reproduce it (minimal and precise):
cd rook-0.8.3/cluster/examples/kubernetes/ceph
kubectl create -f operator.yaml

Environment:

  • OS (e.g. from /etc/os-release): client: Ubuntu 18.04; master/nodes: CoreOS
  • Kernel (e.g. uname -a): Linux ip-172-xx-xx-xx 4.15.0-24-generic #26-Ubuntu SMP Wed Jun 13 08:44:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration: AWS via kops / Terraform
  • Rook version (use rook version inside of a Rook Pod):
    [root@rook-ceph-operator-745f756bd8-vsbgw /]# rook version
    rook: v0.8.3
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): AWS / Kops
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox, https://rook.io/docs/Rook/master/toolbox.html):
    Not sure what toolbox means, and the link is a 404.
[root@rook-ceph-operator-745f756bd8-vsbgw /]# ceph health
2018-10-03 22:05:02.707056 7fd245fc5700 -1 Errors while parsing config file!
2018-10-03 22:05:02.707067 7fd245fc5700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-10-03 22:05:02.707068 7fd245fc5700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-10-03 22:05:02.707068 7fd245fc5700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
  • Istio version: 1.0.2

All 5 comments

@mabushey, can you share the logs for one of the crashlooping agents? The output of kubectl describe on the pod may also be helpful.

$ kubectl -n rook-ceph-system logs rook-ceph-agent-nhjmt
failed to open log file "/var/log/pods/7f041868-c756-11e8-8fa0-064ce4bb95ae/rook-ceph-agent/9.log": open /var/log/pods/7f041868-c756-11e8-8fa0-064ce4bb95ae/rook-ceph-agent/9.log: no such file or directory

$ kubectl -n rook-ceph-system describe pod rook-ceph-agent-nhjmt

Name:               rook-ceph-agent-nhjmt
Namespace:          rook-ceph-system
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-132-3-115.us-west-2.compute.internal/10.132.3.115
Start Time:         Wed, 03 Oct 2018 21:51:34 +0000
Labels:             app=rook-ceph-agent
                    controller-revision-hash=1106037285
                    pod-template-generation=1
Annotations:        <none>
Status:             Running
IP:                 10.132.3.115
Controlled By:      DaemonSet/rook-ceph-agent
Containers:
  rook-ceph-agent:
    Container ID:  docker://30f328ec772cb6b11cca9a9851490526e4e89b2bda8c926d75dd8e7ac69db0ac
    Image:         rook/ceph:v0.8.3
    Image ID:      docker-pullable://rook/ceph@sha256:a53bfec40e05d771b420c060fbd580d5b92f71c9c3e7129323e130cb4b54082a
    Port:          <none>
    Host Port:     <none>
    Args:
      ceph
      agent
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system
      Exit Code:    128
      Started:      Wed, 03 Oct 2018 22:12:54 +0000
      Finished:     Wed, 03 Oct 2018 22:12:54 +0000
    Ready:          False
    Restart Count:  9
    Environment:
      POD_NAMESPACE:  rook-ceph-system (v1:metadata.namespace)
      NODE_NAME:       (v1:spec.nodeName)
    Mounts:
      /dev from dev (rw)
      /flexmnt from flexvolume (rw)
      /lib/modules from libmodules (rw)
      /sys from sys (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rook-ceph-system-token-g4cvd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  flexvolume:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    HostPathType:
  dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  libmodules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  rook-ceph-system-token-g4cvd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-system-token-g4cvd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason   Age                From                                                 Message
  ----     ------   ----               ----                                                 -------
  Normal   Pulled   23m (x5 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Container image "rook/ceph:v0.8.3" already present on machine
  Normal   Created  23m (x5 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Created container
  Warning  Failed   23m (x5 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Error: failed to start container "rook-ceph-agent": Error response from daemon: error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system
  Warning  BackOff  4m (x90 over 24m)  kubelet, ip-10-132-3-115.us-west-2.compute.internal  Back-off restarting failed container

This is the key error:

error while creating mount source path '/usr/libexec/kubernetes/kubelet-plugins/volume/exec': mkdir /usr/libexec/kubernetes: read-only file system

Looks like you need to configure the flex volume as described here
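Because the container never started (Reason: ContainerCannotRun), kubectl logs has nothing to return; the runtime error is recorded in the pod status instead. A minimal shell sketch that prints only that message, assuming kubectl is configured for the cluster and the pod has a single container (true for the agent pod above). The function name last_termination_message is hypothetical:

```shell
# Hypothetical helper: print the last termination message of a pod's first container.
# Assumes kubectl is configured for the target cluster and the pod has one container.
last_termination_message() {
  ns="$1"
  pod="$2"
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.message}'
}

# Example: last_termination_message rook-ceph-system rook-ceph-agent-nhjmt
```

This reads the same field that kubectl describe shows under "Last State: Terminated / Message".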

Thank you. I'm running CoreOS, where the default path (/usr/libexec/kubernetes/kubelet-plugins/volume/exec) is on a read-only file system.
I added

- name: FLEXVOLUME_DIR_PATH
  value: "/var/lib/kubelet/volumeplugins"

to the env section of the rook-ceph-operator container in operator.yaml, and the agents come up now.
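If the operator is already running, the same setting can be applied without re-creating it from operator.yaml, using kubectl set env. A minimal sketch, assuming the operator Deployment is named rook-ceph-operator in the rook-ceph-system namespace (inferred from the operator pod name earlier in this thread). The function name set_flexvolume_dir is hypothetical:

```shell
# Hypothetical helper: set FLEXVOLUME_DIR_PATH on the Rook operator Deployment.
# Deployment name and namespace are assumptions inferred from this thread.
set_flexvolume_dir() {
  # Default to the writable path used in this thread.
  dir="${1:-/var/lib/kubelet/volumeplugins}"
  kubectl -n rook-ceph-system set env deployment/rook-ceph-operator \
    "FLEXVOLUME_DIR_PATH=${dir}"
}

# Example: set_flexvolume_dir /var/lib/kubelet/volumeplugins
```

Changing the environment rolls out a new operator pod. This has not been checked against v0.8.3: if the existing rook-ceph-agent pods still mount the old path afterwards, the agent DaemonSet may need to be recreated.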

There's a section, "Configuring the Kubernetes kubelet", that makes no sense to me. It says: "You need to add the flexvolume flag with the path to all nodes’s kubelet in the Kubernetes cluster." What is an "all nodes’s kubelet"?
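The phrase refers to the kubelet process running on every node: each kubelet must be started with --volume-plugin-dir pointing at the same directory given in FLEXVOLUME_DIR_PATH, since that is where the kubelet looks for FlexVolume drivers (its default is the /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ path seen in the pod description). A minimal sketch for checking one node, assuming procps ps is available and the kubelet runs as a host process named kubelet; on CoreOS it may run inside a container, in which case pass its command line explicitly. The function name kubelet_plugin_dir is hypothetical:

```shell
# Hypothetical check, run on a node: print the kubelet's --volume-plugin-dir value.
# $1 (optional): a kubelet command line; defaults to the running kubelet's, via ps.
# Returns non-zero if the flag is not set.
kubelet_plugin_dir() {
  cmdline="${1:-$(ps -o args= -C kubelet)}"
  want_next=0
  # Unquoted on purpose: split the command line into words (kubelet args rarely contain globs).
  for arg in $cmdline; do
    if [ "$want_next" = 1 ]; then
      printf '%s\n' "$arg"   # value given as a separate word: --volume-plugin-dir /path
      return 0
    fi
    case "$arg" in
      --volume-plugin-dir=*) printf '%s\n' "${arg#--volume-plugin-dir=}"; return 0 ;;
      --volume-plugin-dir) want_next=1 ;;
    esac
  done
  return 1
}
```

If the function prints nothing and fails, that kubelet is still using its default path.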

I found https://github.com/kubernetes/kops/issues/5539:

kops edit cluster --state=s3://myco-k8s
and add to the spec:

  kubelet:
    volumePluginDirectory: /var/lib/kubelet/volumeplugins

kops update cluster --state=s3://myco-k8s --yes

kops rolling-update cluster --state=s3://myco-k8s
kops rolling-update cluster --state=s3://myco-k8s --yes
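A minimal sketch for confirming that the stored cluster spec carries the setting, assuming kops get cluster -o yaml prints the spec for the clusters in the given state store. The function name check_kops_plugin_dir is hypothetical:

```shell
# Hypothetical check: succeed if the kops cluster spec sets the expected plugin dir.
check_kops_plugin_dir() {
  state="$1"
  dir="${2:-/var/lib/kubelet/volumeplugins}"
  # -F: match the path literally; -q: report only through the exit status.
  kops get cluster --state="$state" -o yaml \
    | grep -qF "volumePluginDirectory: ${dir}"
}

# Example: check_kops_plugin_dir s3://myco-k8s && echo "spec is set"
```

This only confirms the stored spec; the nodes pick it up after kops update cluster --yes and the rolling update, which can be confirmed on a node by inspecting the kubelet's --volume-plugin-dir flag.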
