Charts: [stable/prometheus-operator] pod has unbound immediate PersistentVolumeClaims

Created on 7 Apr 2020 · 6 comments · Source: helm/charts

Describe the bug
I am trying to deploy the stable/prometheus-operator chart to a minikube cluster for testing.
Everything starts as expected, except the prometheus container that uses the specified PV.

Version of Helm and Kubernetes:
Helm:
version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"dirty", GoVersion:"go1.14.1"}

minikube version: v1.9.0
commit: 48fefd43444d2f8852f527c78f0141b377b1e42a

kubectl:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"archive", BuildDate:"2020-02-29T16:37:45Z", GoVersion:"go1.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

Which chart:

stable/prometheus-operator

What happened:

The prometheus container is hanging in a CrashLoopBackOff state:

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  <unknown>          default-scheduler  running "VolumeBinding" filter plugin for pod "prometheus-joking-joker-prometheus-op-prometheus-0": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         <unknown>          default-scheduler  Successfully assigned monitoring/prometheus-joking-joker-prometheus-op-prometheus-0 to minikube
  Warning  FailedScheduling  <unknown>          default-scheduler  running "VolumeBinding" filter plugin for pod "prometheus-joking-joker-prometheus-op-prometheus-0": pod has unbound immediate PersistentVolumeClaims

I can't explain this: when I look into minikube via minikube ssh, I see an empty prometheus-db directory in /mnt/sda1/data.

What you expected to happen:

I expected the prometheus-operator to use this empty directory and start up correctly.

How to reproduce it (as minimally and precisely as possible):

First I created the required monitoring namespace on a fresh cluster:
kubectl create namespace monitoring

This is my persistent volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-prometheus-operator-1586196970-prometheus-0
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 5G
  hostPath:
    path: "/data/"

Applied via kubectl apply -f pv.yml

This is my values.yml:

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          volumeName: pvc-prometheus-operator-1586196970-prometheus-0
          accessModes: ["ReadWriteOnce"]
          storageClassName: manual
          resources:
            requests:
              storage: 5G

Applied via helm install joking-joker stable/prometheus-operator --namespace=monitoring -f values.yml
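
The resulting PV/PVC binding can be sanity-checked before describing the objects, for example with (commands added for illustration, not part of the original report):

kubectl get pv
kubectl get pvc -n monitoring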

kubectl describe pv output:

Name:            pvc-prometheus-operator-1586196970-prometheus-0
Labels:          <none>
Annotations:     kubectl.kubernetes.io/last-applied-configuration:
                   {"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"name":"pvc-prometheus-operator-1586196970-prometheus-0"},"spec"...
                 pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    manual
Status:          Bound
Claim:           monitoring/prometheus-joking-joker-prometheus-op-prometheus-db-prometheus-joking-joker-prometheus-op-prometheus-0
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        5G
Node Affinity:   <none>
Message:         
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /data/
    HostPathType:  
Events:            <none>

kubectl describe pvc -n monitoring output:

Name:          prometheus-joking-joker-prometheus-op-prometheus-db-prometheus-joking-joker-prometheus-op-prometheus-0
Namespace:     monitoring
StorageClass:  manual
Status:        Bound
Volume:        pvc-prometheus-operator-1586196970-prometheus-0
Labels:        app=prometheus
               prometheus=joking-joker-prometheus-op-prometheus
Annotations:   pv.kubernetes.io/bind-completed: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5G
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    prometheus-joking-joker-prometheus-op-prometheus-0
Events:        <none>

Anything else we need to know:

I have tried using the standard storage class and also creating my own storage class; everything ends with the same result.


All 6 comments

OK, I had a closer look at the logs:

โฏ k logs -n monitoring po/prometheus-joking-joker-prometheus-op-prometheus-0 -c prometheus
level=info ts=2020-04-07T14:06:44.059Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=HEAD, revision=d9613e5c466c6e9de548c4dae1b9aabf9aaf7c57)"
level=info ts=2020-04-07T14:06:44.059Z caller=main.go:331 build_context="(go=go1.13.5, user=root@688433cf4ff7, date=20200106-14:50:51)"
level=info ts=2020-04-07T14:06:44.060Z caller=main.go:332 host_details="(Linux 4.19.107 #1 SMP Thu Mar 26 11:33:10 PDT 2020 x86_64 prometheus-joking-joker-prometheus-op-prometheus-0 (none))"
level=info ts=2020-04-07T14:06:44.060Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-04-07T14:06:44.060Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-04-07T14:06:44.060Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7ffcf5a280e3, 0xb, 0x14, 0x2c635a0, 0xc0004d13e0, 0x2c635a0)
    /app/promql/query_logger.go:115 +0x48c
main.main()
    /app/cmd/prometheus/main.go:362 +0x5229

This looks like a permission problem, so for testing I changed the permissions on /data to 777. Now it works as expected, but I don't think this is a good way to solve it, so maybe a noob question: how would I solve this issue without setting 777 on the directory? I had hoped the helm chart would set the right permissions automatically.
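
One way to avoid chmod 777 (a sketch, not from this thread) would be an explicit pod securityContext via the chart's values; the chart exposes prometheus.prometheusSpec.securityContext, and the UID/GID values below mirror what the chart documents as defaults, so verify them against the chart version in use:

prometheus:
  prometheusSpec:
    securityContext:
      runAsUser: 1000      # UID the prometheus container runs as (assumed chart default)
      runAsNonRoot: true
      fsGroup: 2000        # group ownership Kubernetes applies to supported volume types

Note that fsGroup is generally not applied to hostPath volumes, so with the hostPath PV above this alone may not be enough; see the initContainer approach discussed in the comments below.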

@shibumi same issue here :-/

A workaround would be much appreciated

It seems to be a pretty well-known issue with several other references.
It looks like we need to run an initContainer that changes the volume permissions to match the securityContext UID & GID.

But I'm still wondering how to add this container, since prometheus-operator doesn't support the initContainer feature in its prometheus template :-/

Any advice appreciated.
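
For reference, later versions of the Prometheus CRD and of this chart do expose prometheus.prometheusSpec.initContainers; assuming a chart version that supports it, a sketch of the chown approach could look like the following (the container name and image are arbitrary, and the volume name is inferred from the PVC name shown earlier in this issue):

prometheus:
  prometheusSpec:
    initContainers:
      - name: init-chown-data   # hypothetical name
        image: busybox:1.31
        securityContext:
          runAsUser: 0          # needs root to chown the hostPath directory
        command: ["sh", "-c", "chown -R 1000:2000 /prometheus"]
        volumeMounts:
          - name: prometheus-joking-joker-prometheus-op-prometheus-db   # inferred from the PVC name above
            mountPath: /prometheus

The 1000:2000 ownership matches the securityContext sketch above; adjust it to whatever UID/GID the prometheus container actually runs with.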

@BarthV I guess this is related then:
https://github.com/helm/charts/issues/12176

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

This issue is being automatically closed due to inactivity.
