Charts: [stable/prometheus-operator] mkdir /prometheus/wal: permission denied

Created on 13 Mar 2019 · 13 comments · Source: helm/charts

Is this a request for help?

Yes.

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Version of Helm and Kubernetes:
helm version:

```
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
```

The issue is the same with helm v2.12.3.
kubernetes version:

```
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
```

**Which chart**:
stable/prometheus-operator

**What happened**:
The `prometheus` container keeps crashing:
`prometheus-test-prometheus-operator-prometheus-0       2/3     CrashLoopBackOff   6`
The logs:

```
kubectl logs -f prometheus-test-prometheus-operator-prometheus-0 -c prometheus
level=warn ts=2019-03-13T11:06:40.410423052Z caller=main.go:295 deprecation_notice="\"storage.tsdb.retention\" flag is deprecated use \"storage.tsdb.retention.time\" instead."
level=info ts=2019-03-13T11:06:40.410620036Z caller=main.go:302 msg="Starting Prometheus" version="(version=2.7.1, branch=HEAD, revision=62e591f928ddf6b3468308b7ac1de1c63aa7fcf3)"
level=info ts=2019-03-13T11:06:40.410660121Z caller=main.go:303 build_context="(go=go1.11.5, user=root@f9f82868fc43, date=20190131-11:16:59)"
level=info ts=2019-03-13T11:06:40.410706994Z caller=main.go:304 host_details="(Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 prometheus-test-prometheus-operator-prometheus-0 (none))"
level=info ts=2019-03-13T11:06:40.410747325Z caller=main.go:305 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-03-13T11:06:40.410781187Z caller=main.go:306 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-03-13T11:06:40.413968368Z caller=main.go:620 msg="Starting TSDB ..."
level=info ts=2019-03-13T11:06:40.414104205Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-03-13T11:06:40.414561626Z caller=main.go:489 msg="Stopping scrape discovery manager..."
level=info ts=2019-03-13T11:06:40.414595497Z caller=main.go:503 msg="Stopping notify discovery manager..."
level=info ts=2019-03-13T11:06:40.414606866Z caller=main.go:525 msg="Stopping scrape manager..."
level=info ts=2019-03-13T11:06:40.414617696Z caller=main.go:499 msg="Notify discovery manager stopped"
level=info ts=2019-03-13T11:06:40.414657126Z caller=main.go:485 msg="Scrape discovery manager stopped"
level=info ts=2019-03-13T11:06:40.41467679Z caller=main.go:519 msg="Scrape manager stopped"
level=info ts=2019-03-13T11:06:40.414699787Z caller=manager.go:736 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-03-13T11:06:40.414719773Z caller=manager.go:742 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-03-13T11:06:40.414748959Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2019-03-13T11:06:40.414769872Z caller=main.go:679 msg="Notifier manager stopped"
level=error ts=2019-03-13T11:06:40.415381657Z caller=main.go:688 err="opening storage failed: create dir: mkdir /prometheus/wal: permission denied"
```

**What you expected to happen**:
The pod should be running.


**How to reproduce it** (as minimally and precisely as possible):

```
helm install stable/prometheus-operator --name test \
--set prometheusOperator.securityContext.fsGroup=2000 \
--set prometheusOperator.securityContext.runAsNonRoot=true \
--set prometheusOperator.securityContext.runAsUser=1000 \
--set prometheus.prometheusSpec.securityContext.fsGroup=2000 \
--set prometheus.prometheusSpec.securityContext.runAsNonRoot=true \
--set prometheus.prometheusSpec.securityContext.runAsUser=1000 \
--set alertmanager.alertmanagerSpec.securityContext.fsGroup=2000 \
--set alertmanager.alertmanagerSpec.securityContext.runAsNonRoot=true \
--set alertmanager.alertmanagerSpec.securityContext.runAsUser=1000 \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=1Gi \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.selector.matchLabels.app="prometheus" \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=1Gi \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.selector.matchLabels.app="alertmanager"
```

I created two PersistentVolumes and tried both `nfs` and `hostPath`, but neither worked.

**Anything else we need to know**:
Here is the same issue: https://github.com/coreos/prometheus-operator/issues/966 .
Has it been fixed?

Most helpful comment

@vsliouniaev Thanks for your reply! It helped me! I edited the statefulset with this:

```
  securityContext:
    fsGroup: 0
    runAsNonRoot: false
    runAsUser: 0
```

It works!

By the way, it seems that `--set XXX.securityContext.runAsUser=0` cannot be used, apparently because of how Helm handles `--set` values.

All 13 comments

@justlaputa @vsliouniaev Could anyone help me? I'm sorry to bother you.

This seems like an issue that is not specific to the helm chart. The closest I've come across was an issue that had something to do with the volume provisioner: https://github.com/coreos/prometheus-operator/issues/2182
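
One thing worth checking in that case: kubelet only applies `fsGroup` ownership changes to volume types that support it, and `hostPath` and NFS volumes are left as-is, so the directory behind the PersistentVolume has to be writable by the non-root user up front. A minimal sketch, run on the node (or NFS server) backing the volume; the path `/data/prometheus` is a hypothetical example, and the IDs match the `runAsUser=1000`/`fsGroup=2000` values from the reproduce step above:

```sh
# Hypothetical hostPath directory backing the PersistentVolume; adjust to your PV spec.
PROM_DATA=/data/prometheus

# Give ownership to the non-root UID/GID the Prometheus pod runs with.
sudo mkdir -p "$PROM_DATA"
sudo chown -R 1000:2000 "$PROM_DATA"
sudo chmod -R g+rwX "$PROM_DATA"
```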

@vsliouniaev Thanks for your reply! It helped me! I edited the statefulset with this:

```
  securityContext:
    fsGroup: 0
    runAsNonRoot: false
    runAsUser: 0
```

It works!

By the way, it seems that `--set XXX.securityContext.runAsUser=0` cannot be used, apparently because of how Helm handles `--set` values.
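
If `--set` refuses the value, one workaround is to pass the same overrides through a values file instead. A sketch; the file name `root-security-values.yaml` is just an example, and (as noted below) running as root is discouraged:

```yaml
# root-security-values.yaml (example file name)
prometheus:
  prometheusSpec:
    securityContext:
      fsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
```

With Helm 2 this can be applied via e.g. `helm upgrade test stable/prometheus-operator -f root-security-values.yaml`.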

Is it a good idea to run it as root in a production environment?

@dongwangdw it is not recommended to do this. From a security point of view this is a bad idea. There are some other setups where even this wouldn't help.

@vsliouniaev
Yes, nowadays many platforms run containers without root access, so I don't think this is a good solution.

I found several issues about this, but none of them gave the root cause or a solution.
I saw you hit the same problem in https://github.com/coreos/prometheus-operator/issues/2182.

Did you get it resolved?

I got it working by adding the securityContext without setting the user to root.

I cannot understand why a securityContext needs to be set at all.
What is the root cause?

This hasn't been required for the last 5 months. Are you having the same issue?
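
For anyone wanting to verify that, the chart's current defaults can be checked directly. A sketch using the Helm 2 CLI that this thread is on:

```sh
# Show the chart's default values and look for the securityContext block.
helm inspect values stable/prometheus-operator | grep -n -A 4 securityContext
```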

Yes, I resolved it by setting a securityContext, but then the added securityContext caused another issue.

I added this to my Prometheus definition to avoid running as root:

```yaml
  ...
  securityContext:
    fsGroup: 489
    runAsUser: 489
  ...
  initContainers:
  - command:
    - chown
    - -R
    - 489:489
    - /prometheus
    image: busybox:1.30
    imagePullPolicy: IfNotPresent
    name: init-chown-data
    resources: {}
    securityContext:
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-service-prometheus-db
      subPath: prometheus-db
```

@maresja1
How did you add it? Do you use the stable/prometheus-operator helm chart? I see that it's possible to set securityContext via values.yaml, but I can't see any option for init containers.

@jdomag I just had this problem and maresja's advice works perfectly. The config he pasted goes in your "prometheus" object. I don't know where the helm chart puts that, but you should have one, as that's the resource that the operator uses to create your prometheus instance. You may need to adjust the name of the volume mount.
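
If you are going through the stable/prometheus-operator chart rather than editing the `Prometheus` object directly, a values-file sketch along these lines should give the same result, assuming your chart version exposes `prometheus.prometheusSpec.initContainers` (check the chart's values.yaml). The volume name is the one from the comment above and may need adjusting to match your StatefulSet:

```yaml
prometheus:
  prometheusSpec:
    securityContext:
      fsGroup: 489
      runAsUser: 489
    # Assumes the chart passes initContainers through to the Prometheus CR.
    initContainers:
    - name: init-chown-data
      image: busybox:1.30
      securityContext:
        runAsUser: 0
      command: ["chown", "-R", "489:489", "/prometheus"]
      volumeMounts:
      - name: prometheus-service-prometheus-db
        mountPath: /prometheus
        subPath: prometheus-db
```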

@sunyl527 Thanks! This is useful.
