Prometheus-operator: Using helm to install Prometheus reports errors

Created on 30 Oct 2017 · 48 comments · Source: prometheus-operator/prometheus-operator

[root@k8s-master01 prometheus-operator]# helm install --name kube-prometheus --namespace=monitoring helm/kube-prometheus
Error: found in requirements.yaml, but missing in charts/ directory: alertmanager, prometheus
[root@k8s-master01 prometheus-operator]# pwd
/root/prometheus-operator
[root@k8s-master01 prometheus-operator]# ls helm/
alertmanager grafana kube-prometheus prometheus prometheus-operator

The error is as above and I don't know where the problem is. The charts are present under the helm directory, but I'm still getting the error. Help me...

All 48 comments

I have this error too.

There are some known issues with the helm charts, I recommend reading through this article: https://itnext.io/kubernetes-monitoring-with-prometheus-in-15-minutes-8e54d1de2e13

@brancz Following that article doesn't solve the problem mentioned above. I'm getting this also

@gianrubio ^ I think you have the most experience, could you maybe look into this?

Comment out lines 2 to 9 in kube-prometheus/requirements.yaml and run the command again. This will be fixed once #676 is merged.
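For orientation, a sketch of the kind of entries being commented out; the dependency names come from the error message above, but the repository paths and versions here are illustrative, not the file's actual values:

# kube-prometheus/requirements.yaml (sketch)
dependencies:
# - name: alertmanager
#   repository: file://../alertmanager
#   version: 0.0.5
# - name: prometheus
#   repository: file://../prometheus
#   version: 0.0.5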

Also, helm upgrade --install --namespace=monitoring --set rbacEnable=false kube-prometheus helm/kube-prometheus

This fails, as the required Helm subcharts don't get passed rbacEnable=false. Is there a way to pass this down, or does this essentially make the flag useless?

@rawkode Use --set exporter-kube-state.rbacEnable=false.
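Helm scopes --set values for a chart dependency under that subchart's name, so the flag has to be set once per subchart rather than inherited from the parent. A sketch combining the command above with the per-subchart override:

helm upgrade --install --namespace=monitoring \
  --set rbacEnable=false \
  --set exporter-kube-state.rbacEnable=false \
  kube-prometheus helm/kube-prometheus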

@rawkode #776 fixed the helm dependencies.

@gianrubio Yeah, so the Helm install works now - but sadly none of the kube-exporters ever enter a healthy state and there are no endpoints for prometheus or alertmanager services.

# values.yaml
prometheus:
  rbacEnable: false

exporter-kube-state:
  rbacEnable: false
# CI
- helm init --client-only
- helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
- helm upgrade --install --namespace=monitoring -f values.yaml kube-prometheus coreos/kube-prometheus

Am I just missing configuration, or am I just being silly? I'm really keen to get this working and willing to help in any way, if it's not just me being daft.

@rawkode can you elaborate what you mean by "a healthy state"? Are the pods crashlooping?

NAME                                                   READY     STATUS              RESTARTS   AGE
alertmanager-alertmanager-0                            0/2       ContainerCreating   0          12h
grafana-grafana-5fdf676c68-fftsx                       2/2       Running             0          15h
kube-prometheus-exporter-kube-state-659754647d-fnc5w   0/2       ContainerCreating   0          10s
kube-prometheus-exporter-node-4jr7v                    0/1       CrashLoopBackOff    1          10s
kube-prometheus-exporter-node-927lq                    0/1       CrashLoopBackOff    1          10s
kube-prometheus-exporter-node-9zn72                    0/1       CrashLoopBackOff    1          10s
kube-prometheus-exporter-node-rkc5v                    0/1       Error               1          10s
kube-prometheus-exporter-node-xsjdw                    0/1       CrashLoopBackOff    1          10s

Logs from a pod

flag provided but not defined: -path.procfs

Can you check the logs of one of those containers? Sounds like something is actually mis-configured.

The only error in the logs is the one mentioned above, every exporter is the same

I believe that was introduced by this PR: https://github.com/coreos/prometheus-operator/pull/754

I think we may have missed the v prefix in the image tag.

AlertManager logs also

  Warning  FailedMount            28s (x10 over 4m)  kubelet, gke-production-preemptible-pool-2aeb813f-28kc  MountVolume.SetUp failed for volume "config-volume" : secrets "alertmanager-alertmanager" not found
  Warning  FailedMount            20s (x2 over 2m)   kubelet, gke-production-preemptible-pool-2aeb813f-28kc  Unable to mount volumes for pod "alertmanager-alertmanager-0_monitoring(9050459e-d032-11e7-8074-42010a84003b)": timeout expired waiting for volumes to attach/mount for pod "monitoring"/"alertmanager-alertmanager-0". list of unattached/unmounted volumes=[config-volume]
  Warning  FailedSync             20s (x2 over 2m)   kubelet, gke-production-preemptible-pool-2aeb813f-28kc  Error syncing pod

That one looks like the alertmanager configuration is not created. @gianrubio are you aware whether the charts create the alertmanager config Secret?

@gianrubio are you aware whether the charts create the alertmanager config Secret?
Not sure if the helm upgrade command can handle the change; I just installed a fresh chart using helm install and it created the secret. However, the node-exporter pods are in a CrashLoopBackOff state.

I'll submit a PR to fix that; #754 broke it.

 ./kubectl logs -n monitoring kube-prometheus-exporter-node-70n1w
flag provided but not defined: -path.procfs
Usage of /bin/node_exporter:

Any idea how to resolve the secret problem? It looks like the operator should be creating this

Regarding the Secret problem, can you check whether a Secret called alertmanager-<alertmanager-object-name> exists in the same namespace as the Alertmanager object?
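For example, a quick way to check (namespace taken from this thread):

kubectl -n monitoring get secrets | grep alertmanager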

It does

NAME                           TYPE                                  DATA      AGE
alertmanager-kube-prometheus   Opaque                                1         36m

but the pod is looking for alertmanager-alertmanager

That's curious; from the templates it looks like the Alertmanager name should be the same as the helm release name. And the Secret also seems to be correctly templated.
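For reference, the naming convention implied here, written as a hypothetical template excerpt rather than the chart's literal source:

# alertmanager chart, secret template (sketch)
metadata:
  name: alertmanager-{{ .Release.Name }}

So a release named kube-prometheus should produce a Secret named alertmanager-kube-prometheus, which matches the listing above.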

I don't know if it's useful, but this is the output of describe alertmanager kube-prometheus

Name:         kube-prometheus
Namespace:    monitoring
Labels:       alertmanager=kube-prometheus
              app=alertmanager
              chart=alertmanager-0.0.5
              heritage=Tiller
              release=kube-prometheus
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         Alertmanager
Metadata:
  Cluster Name:                   
  Creation Timestamp:             2017-11-23T09:45:56Z
  Deletion Grace Period Seconds:  <nil>
  Deletion Timestamp:             <nil>
  Resource Version:               12855941
  Self Link:                      /apis/monitoring.coreos.com/v1/namespaces/monitoring/alertmanagers/kube-prometheus
  UID:                            1abb7e56-d033-11e7-8074-42010a84003b
Spec:
  Base Image:    quay.io/prometheus/alertmanager
  External URL:  http://kube-prometheus-alertmanager.monitoring:9093
  Paused:        false
  Replicas:      1
  Resources:
  Version:  v0.5.1
Events:     <none>

StatefulSet config looks incorrect:

Name:               alertmanager-alertmanager
Namespace:          monitoring
CreationTimestamp:  Sat, 18 Nov 2017 18:20:23 +0000
Selector:           alertmanager=alertmanager,app=alertmanager
Labels:             alertmanager=alertmanager
                    app=alertmanager
                    chart=alertmanager-0.0.4
                    heritage=Tiller
                    release=alertmanager
Annotations:        <none>
Replicas:           1 desired | 1 total
Pods Status:        0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  alertmanager=alertmanager
           app=alertmanager
  Containers:
   alertmanager:
    Image:  quay.io/prometheus/alertmanager:v0.7.1
    Ports:  9093/TCP, 6783/TCP
    Args:
      -config.file=/etc/alertmanager/config/alertmanager.yaml
      -web.listen-address=:9093
      -mesh.listen-address=:6783
      -storage.path=/etc/alertmanager/data
      -web.external-url=http://alertmanager-alertmanager.monitoring:9093
      -web.route-prefix=/
      -mesh.peer=alertmanager-alertmanager-0.alertmanager-operated.monitoring.svc
    Requests:
      memory:     200Mi
    Liveness:     http-get http://:web/api/v1/status delay=0s timeout=3s period=10s #success=1 #failure=10
    Readiness:    http-get http://:web/api/v1/status delay=3s timeout=3s period=5s #success=1 #failure=10
    Environment:  <none>
    Mounts:
      /etc/alertmanager/config from config-volume (rw)
      /var/alertmanager/data from alertmanager-alertmanager-db (rw)
   config-reloader:
    Image:  quay.io/coreos/configmap-reload:v0.0.1
    Port:   <none>
    Args:
      -webhook-url=http://localhost:9093/-/reload
      -volume-dir=/etc/alertmanager/config
    Limits:
      cpu:        5m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /etc/alertmanager/config from config-volume (ro)
  Volumes:
   config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-alertmanager
    Optional:    false
   alertmanager-alertmanager-db:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
Volume Claims:  <none>
Events:         <none>

There seems to be no concept of using the release name within the alertmanager, or prometheus, operators:

https://github.com/coreos/prometheus-operator/blob/master/pkg/alertmanager/operator.go#L283

There seems to be no concept of using the release name within the alertmanager, or prometheus, operators:

https://github.com/coreos/prometheus-operator/blob/master/pkg/alertmanager/operator.go#L283

That code is unrelated. The Prometheus Operator just uses the name you give the objects, which is the release name as that's what's templated via the helm charts.

@rawkode I just reproduced all the steps and I can't reproduce the alertmanager error. Can you do a clean install and see if this solves your issue? I'm working on a better helm integration; it will take a few days.

@gianrubio What values.yaml did you use?

@gianrubio OK, I worked out why I have this problem. The operators / statefulsets aren't removed during helm delete --purge, so I assume they were a hangover from before the charts repo existed.

The two -operated services also need to be removed manually.
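For anyone hitting the same leftovers, a manual cleanup sketch (resource names are taken from this thread; adjust them to your release, and note this deletes monitoring state):

kubectl -n monitoring delete alertmanager,prometheus --all
kubectl -n monitoring delete statefulset alertmanager-alertmanager
kubectl -n monitoring delete svc alertmanager-operated prometheus-operated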

The operators / statefulsets aren't removed during helm delete --purge, so I assume they were a hangover from before the charts repo existed

Can you file a new issue about that? We should add a warning about the delete --purge behaviour.

Can you file a new issue about that? We should add a warning about the delete --purge behaviour.

Starting with Kubernetes 1.8 and v0.14.0 of the Prometheus Operator, this problem will disappear, as all resources have the appropriate OwnerReference, meaning that when helm delete --purge is run the objects will be garbage collected. (Pre Kubernetes 1.8, OwnerReferences didn't do anything for CRDs, but starting with 1.8 it's fixed.)
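Illustratively, the operator-generated StatefulSet then carries an ownerReference pointing back at the Alertmanager object (the uid below is copied from the describe output earlier in this thread):

metadata:
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    kind: Alertmanager
    name: kube-prometheus
    uid: 1abb7e56-d033-11e7-8074-42010a84003b

With that in place, deleting the Alertmanager object (directly or via helm delete --purge) lets the garbage collector remove the StatefulSet and its pods.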

I've actually purged everything and deployed again. I've still got the node-exporters failing until the repo is updated.

I can't seem to browse to alertmanager using http://localhost:8001/api/v1/proxy/namespaces/monitoring/services/kube-prometheus-alertmanager:9093

Hopefully this is finally me being silly :joy:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "no endpoints available for service \"kube-prometheus-alertmanager\"",
  "reason": "ServiceUnavailable",
  "code": 503http://localhost:8001/api/v1/proxy/namespaces/monitoring/services/kube-prometheus-alertmanager:9093
}

I've actually purged everything and deployed again. I've still got the node-exporters failing until the repo is updated.

When the helm charts are merged to master, the published chart repository is not automatically updated; I'm also working on this :)

Any thoughts on why I can't browse to the service?

Not sure. I realise that kube-prometheus is creating a new service that is already generated by alertmanager (I promise to fix all these issues).

Could you list the services and endpoints in this namespace?
kubectl get svc,ep -n monitoring

Thanks for your continued help with this, @gianrubio - it's appreciated a whole lot and I'm happy to help with the Helm charts, if possible:

NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
svc/kube-prometheus-alertmanager          ClusterIP   10.55.240.229   <none>        9093/TCP   1h
svc/kube-prometheus-exporter-kube-state   ClusterIP   10.55.243.125   <none>        80/TCP     1h
svc/kube-prometheus-exporter-node         ClusterIP   10.55.240.218   <none>        9100/TCP   1h
svc/kube-prometheus-prometheus            ClusterIP   10.55.250.2     <none>        9090/TCP   1h

NAME                                     ENDPOINTS        AGE
ep/kube-prometheus-alertmanager          <none>           1h
ep/kube-prometheus-exporter-kube-state   10.52.6.5:8080   1h
ep/kube-prometheus-exporter-node                          1h
ep/kube-prometheus-prometheus            <none>           1h

I missed one step in the helm packaging; a PR will come at the end of the day ;)
Contributions to the helm charts are always welcome.

Thanks :+1:

Curious: are these charts supposed to spin up Grafana also?

@rawkode I don't think it is installed with the kube-prometheus chart today, I believe the Grafana one is separate currently. Feel free to change that :slightly_smiling_face: .
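A sketch, assuming the standalone Grafana chart is published in the same coreos repo used earlier in this thread:

helm install --name grafana --namespace=monitoring coreos/grafana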

@gianrubio Any update on the node exporters?

It'll be updated after #784 is merged.

Hi. I tried to install the Prometheus Operator using helm, step by step from this article: https://itnext.io/kubernetes-monitoring-with-prometheus-in-15-minutes-8e54d1de2e13.

kubectl create ns monitoring
git clone git@github.com:coreos/prometheus-operator.git
cd prometheus-operator

This works fine.

helm install --name prometheus-operator --set rbacEnable=true --namespace=monitoring helm/prometheus-operator

This works fine.

helm install --name prometheus --set serviceMonitorsSelector.app=prometheus --set ruleSelector.app=prometheus --namespace=monitoring helm/prometheus
helm install --name alertmanager --namespace=monitoring helm/alertmanager
helm install --name grafana --namespace=monitoring helm/grafana

These work fine.

helm install --name kube-prometheus --namespace=monitoring helm/kube-prometheus

And here I get the error:

Error: found in requirements.yaml, but missing in charts/ directory: alertmanager, prometheus, exporter-kube-api, exporter-kube-controller-manager, exporter-kube-dns, exporter-kube-etcd, exporter-kube-scheduler, exporter-kube-state, exporter-kubelets, exporter-kubernetes, exporter-node, grafana
Maybe I missed a step? Or is a step missing from the article? Or is this a known issue?

kubectl get pods --namespace monitoring

NAME                                                       READY     STATUS    RESTARTS   AGE
alertmanager-alertmanager-0                                2/2       Running   0          10m
grafana-grafana-5fdf676c68-mzrgm                           2/2       Running   0          10m
prometheus-operator-prometheus-operator-5d86ffb9bf-zw8rm   1/1       Running   0          11m
prometheus-prometheus-0                                    2/2       Running   0          10m

All the previous steps seem to work fine.

@xpowerman please see the comment https://github.com/coreos/prometheus-operator/issues/717#issuecomment-343531797
I'm still working on fixing all these helm issues.

Still same problem for me.

I've just tried to install prometheus-operator from this repo master:
https://github.com/coreos/prometheus-operator

On a Google Cloud Kubernetes cluster with autoscaling (min 4 nodes, max 6).

Commands run on the newly created cluster:
helm init
kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=<your-account-email>
(after cloning the repo)
cd prometheus-operator
kubectl apply -f bundle.yaml
kubectl create namespace monitoring
cd helm
helm install --name prometheus-operator --namespace monitoring --set exporter-kube-state.rbacEnable=false ./kube-prometheus/

And the result is still the same as @rawkode's:

kubectl get pods --namespace=monitoring

NAME                                                       READY     STATUS              RESTARTS   AGE
alertmanager-alertmanager-0                                0/2       ContainerCreating   0          3s
alertmanager-kube-prometheus-0                             1/2       Running             0          3s
dashboard-kubernetes-dashboard-7ffc6f67d4-z4bzt            1/1       Running             0          33m
grafana-grafana-5fdf676c68-gtw67                           2/2       Running             0          11m
kube-prometheus-exporter-kube-state-79886f7544-6jnmq       2/2       Running             0          10m
kube-prometheus-exporter-node-b88xm                        0/1       CrashLoopBackOff    7          11m
kube-prometheus-exporter-node-bv6jt                        0/1       CrashLoopBackOff    7          11m
kube-prometheus-exporter-node-jz9nj                        0/1       CrashLoopBackOff    6          10m
kube-prometheus-exporter-node-npzss                        0/1       CrashLoopBackOff    7          11m
kube-prometheus-exporter-node-xbfhz                        0/1       CrashLoopBackOff    6          11m
kube-prometheus-grafana-5cb5c9cdb8-9zkwr                   2/2       Running             0          11m
prometheus-kube-prometheus-0                               2/2       Running             0          11m
prometheus-operator-prometheus-operator-5d86ffb9bf-fn4c2   1/1       Running             0          12m
prometheus-prometheus-0                                    2/2       Running             0          11m


kubectl describe pod --namespace monitoring kube-prometheus-exporter-node-b88xm

Name:           kube-prometheus-exporter-node-b88xm
Namespace:      monitoring
Node:           gke-xussof-cluster-default-pool-9c62e8bb-f6lr/10.128.0.3
Start Time:     Mon, 11 Dec 2017 14:29:00 +0100
Labels:         app=kube-prometheus-exporter-node
                component=node-exporter
                controller-revision-hash=204858866
                pod-template-generation=1
                release=kube-prometheus
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"monitoring","name":"kube-prometheus-exporter-node","uid":"3f901855-de77-11e7-90c4-...
Status:         Running
IP:             10.128.0.3
Created By:     DaemonSet/kube-prometheus-exporter-node
Controlled By:  DaemonSet/kube-prometheus-exporter-node
Containers:
  node-exporter:
    Container ID:  docker://33c1bc317cd70c786e0f5576916ffc43f316673ec829bc7264f08c97c6007353
    Image:         quay.io/prometheus/node-exporter:v0.14.0
    Image ID:      docker-pullable://quay.io/prometheus/node-exporter@sha256:b376a1b4f6734ed610b448603bc0560106c2e601471b49f72dda5bd40da095dd
    Port:          9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
      --web.listen-address=0.0.0.0:9100
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 11 Dec 2017 14:39:46 +0100
      Finished:     Mon, 11 Dec 2017 14:39:46 +0100
    Ready:          False
    Restart Count:  7
    Limits:
      cpu:     200m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  30Mi
    Environment:  <none>
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mhvd8 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  proc:
    Type:  HostPath (bare host directory volume)
    Path:  /proc
  sys:
    Type:  HostPath (bare host directory volume)
    Path:  /sys
  default-token-mhvd8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mhvd8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                 Age                 From                                                    Message
  ----     ------                 ----                ----                                                    -------
  Normal   SuccessfulMountVolume  11m                 kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  MountVolume.SetUp succeeded for volume "sys"
  Normal   SuccessfulMountVolume  11m                 kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  MountVolume.SetUp succeeded for volume "proc"
  Normal   SuccessfulMountVolume  11m                 kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  MountVolume.SetUp succeeded for volume "default-token-mhvd8"
  Normal   Pulled                 11m (x4 over 11m)   kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  Container image "quay.io/prometheus/node-exporter:v0.14.0" already present on machine
  Normal   Created                11m (x4 over 11m)   kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  Created container
  Normal   Started                11m (x4 over 11m)   kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  Started container
  Warning  BackOff                11m (x5 over 11m)   kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  Back-off restarting failed container
  Warning  FailedSync             1m (x48 over 11m)   kubelet, gke-xussof-cluster-default-pool-9c62e8bb-f6lr  Error syncing pod

@xussof we're waiting for #784 to be merged; for now you could edit the node-exporter daemonset and apply the changes from #781.
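Until then, a workaround sketch: the chart passes the new kingpin-style --path.* flags (see the describe output above) while the v0.14.0 image only understands the older single-dash flags, so bumping the image to the v0.15 line should reconcile them. Assuming that is essentially what #781 does:

kubectl -n monitoring set image daemonset/kube-prometheus-exporter-node \
  node-exporter=quay.io/prometheus/node-exporter:v0.15.1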

Has anybody been able to come up with a solution for this?

helm install bitnami/kube-prometheus prometheus --values /tmp/prometheus.values --namespace monitoring
Error: failed to download "prometheus" (hint: running `helm repo update` may help)
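A likely cause, though not confirmed in this thread: with Helm 3 the release name comes first (helm install NAME CHART), so the arguments above are swapped and helm tries to download a chart literally named "prometheus". A sketch of the corrected invocation, assuming the standard bitnami repo:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install prometheus bitnami/kube-prometheus --values /tmp/prometheus.values --namespace monitoring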

Same problem
