Describe the bug
When Prometheus crashes for any reason it is unable to come back up again.
Version of Helm and Kubernetes:
Which chart:
[stable/prometheus-operator]
What happened:
Any crash of the Prometheus pod apparently corrupts the Prometheus WAL. Once this happens, the pod is unable to recover in time: the liveness probe kills it before it can work through the corrupt WAL.
The following options were used to install the chart:
Name: pulse-monitor
Namespace: monitoring
Persistent volume configuration:
prometheusSpec:
  storageSpec:
    volumeClaimTemplate:
      metadata:
        name: pvc
      spec:
        storageClassName: managed-premium
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
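For completeness, the install itself was nothing special; a minimal sketch of the command (assuming Helm 3 and a values.yaml holding the block above; the exact flags may have differed):

$ helm install pulse-monitor stable/prometheus-operator \
    --namespace monitoring \
    -f values.yaml    # values.yaml contains the prometheusSpec.storageSpec block shown above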
Once the pod has crashed and restarts, it tries to come up but its health probes fail with HTTP 503 (not available).
See the broken state below: the Prometheus pod is stuck at 2/3 containers ready.
$ kubectl get po -n monitoring
NAME                                                      READY   STATUS    RESTARTS   AGE
alertmanager-pulse-monitor-prometheus-o-alertmanager-0    2/2     Running   0          4d19h
prometheus-pulse-monitor-prometheus-o-prometheus-0        2/3     Running   0          3m33s
pulse-monitor-grafana-f974449-vmbms                       2/2     Running   0          4d19h
pulse-monitor-kube-state-metrics-84ff56f86b-4rtln         1/1     Running   0          4d19h
pulse-monitor-prometheus-node-exporter-k2b6j              1/1     Running   0          3d23h
pulse-monitor-prometheus-node-exporter-lw4bg              1/1     Running   0          4d19h
pulse-monitor-prometheus-node-exporter-m9b69              1/1     Running   0          4d19h
pulse-monitor-prometheus-node-exporter-p48fx              1/1     Running   0          4d19h
pulse-monitor-prometheus-o-operator-57c9cbbdbc-7cgff      2/2     Running   0          4d19h
Describing the broken pod shows something like this:
Name:           prometheus-pulse-monitor-prometheus-o-prometheus-0
Namespace:      monitoring
Priority:       0
Node:           aks-pulsedev01-14986555-vmss000000/172.15.20.4
Start Time:     Tue, 10 Mar 2020 09:57:09 +1100
Labels:         app=prometheus
                controller-revision-hash=prometheus-pulse-monitor-prometheus-o-prometheus-8d546bfb4
                prometheus=pulse-monitor-prometheus-o-prometheus
                statefulset.kubernetes.io/pod-name=prometheus-pulse-monitor-prometheus-o-prometheus-0
Annotations:    <none>
Status:         Running
IP:             172.15.20.40
IPs:            <none>
Controlled By:  StatefulSet/prometheus-pulse-monitor-prometheus-o-prometheus
Containers:
  prometheus:
    Container ID:  docker://8699fb42be37a1cd5815ff354cf85ae30087842a11d41b27b04e21dbd2b6fc32
    Image:         quay.io/prometheus/prometheus:v2.15.2
    Image ID:      docker-pullable://quay.io/prometheus/prometheus@sha256:914525123cf76a15a6aaeac069fcb445ce8fb125113d1bc5b15854bc1e8b6353
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --storage.tsdb.retention.time=10d
      --web.enable-lifecycle
      --storage.tsdb.no-lockfile
      --web.external-url=http://pulse-monitor-prometheus-o-prometheus.monitoring:9090
      --web.route-prefix=/
    State:          Running
      Started:      Tue, 10 Mar 2020 09:59:46 +1100
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6
    Readiness:      http-get http://:web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=120
    Environment:    <none>
    Mounts:
      /etc/prometheus/certs from tls-assets (ro)
      /etc/prometheus/config_out from config-out (ro)
      /etc/prometheus/rules/prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0 from prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0 (rw)
      /prometheus from prometheus-pulse-monitor-prometheus-o-prometheus-db (rw,path="prometheus-db")
      /var/run/secrets/kubernetes.io/serviceaccount from pulse-monitor-prometheus-o-prometheus-token-mwxld (ro)
  prometheus-config-reloader:
    Container ID:  docker://c3dae7bda4ef4dfc6bb186865e0ac558f5d9d23fb4ac30ee603d09b499021620
    Image:         quay.io/coreos/prometheus-config-reloader:v0.36.0
    Image ID:      docker-pullable://quay.io/coreos/prometheus-config-reloader@sha256:74cb2dcf9d8c61f90fb28b82a0358962fbda956a798c762e0ddf1214bb7a9955
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/prometheus-config-reloader
    Args:
      --log-format=logfmt
      --reload-url=http://127.0.0.1:9090/-/reload
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
    State:          Running
      Started:      Tue, 10 Mar 2020 09:59:54 +1100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  25Mi
    Requests:
      cpu:     100m
      memory:  25Mi
    Environment:
      POD_NAME:  prometheus-pulse-monitor-prometheus-o-prometheus-0 (v1:metadata.name)
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from pulse-monitor-prometheus-o-prometheus-token-mwxld (ro)
  rules-configmap-reloader:
    Container ID:  docker://d492c9db712280f7ef4fddf1cf84bdbcf950ee25bd258da4451321d3a4594307
    Image:         quay.io/coreos/configmap-reload:v0.0.1
    Image ID:      docker-pullable://quay.io/coreos/configmap-reload@sha256:e2fd60ff0ae4500a75b80ebaa30e0e7deba9ad107833e8ca53f0047c42c5a057
    Port:          <none>
    Host Port:     <none>
    Args:
      --webhook-url=http://127.0.0.1:9090/-/reload
      --volume-dir=/etc/prometheus/rules/prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0
    State:          Running
      Started:      Tue, 10 Mar 2020 09:59:59 +1100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  25Mi
    Requests:
      cpu:     100m
      memory:  25Mi
    Environment:  <none>
    Mounts:
      /etc/prometheus/rules/prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0 from prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from pulse-monitor-prometheus-o-prometheus-token-mwxld (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  prometheus-pulse-monitor-prometheus-o-prometheus-db:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-pulse-monitor-prometheus-o-prometheus-db-prometheus-pulse-monitor-prometheus-o-prometheus-0
    ReadOnly:   false
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-pulse-monitor-prometheus-o-prometheus
    Optional:    false
  tls-assets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-pulse-monitor-prometheus-o-prometheus-tls-assets
    Optional:    false
  config-out:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0
    Optional:  false
  pulse-monitor-prometheus-o-prometheus-token-mwxld:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pulse-monitor-prometheus-o-prometheus-token-mwxld
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                      From                                          Message
  ----     ------                  ----                     ----                                          -------
  Normal   Scheduled               6m15s                    default-scheduler                             Successfully assigned monitoring/prometheus-pulse-monitor-prometheus-o-prometheus-0 to aks-pulsedev01-14986555-vmss000000
  Warning  FailedAttachVolume      6m15s                    attachdetach-controller                       Multi-Attach error for volume "pvc-fff51b37-c203-48d1-8d58-71dfe4ce3880" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  5m3s                     attachdetach-controller                       AttachVolume.Attach succeeded for volume "pvc-fff51b37-c203-48d1-8d58-71dfe4ce3880"
  Warning  FailedMount             4m12s                    kubelet, aks-pulsedev01-14986555-vmss000000   Unable to mount volumes for pod "prometheus-pulse-monitor-prometheus-o-prometheus-0_monitoring(a3265ac5-0337-4b12-8725-9b55b838dcaa)": timeout expired waiting for volumes to attach or mount for pod "monitoring"/"prometheus-pulse-monitor-prometheus-o-prometheus-0". list of unmounted volumes=[prometheus-pulse-monitor-prometheus-o-prometheus-db]. list of unattached volumes=[prometheus-pulse-monitor-prometheus-o-prometheus-db config tls-assets config-out prometheus-pulse-monitor-prometheus-o-prometheus-rulefiles-0 pulse-monitor-prometheus-o-prometheus-token-mwxld]
  Normal   Pulling                 3m56s                    kubelet, aks-pulsedev01-14986555-vmss000000   Pulling image "quay.io/prometheus/prometheus:v2.15.2"
  Normal   Pulled                  3m45s                    kubelet, aks-pulsedev01-14986555-vmss000000   Successfully pulled image "quay.io/prometheus/prometheus:v2.15.2"
  Normal   Created                 3m38s                    kubelet, aks-pulsedev01-14986555-vmss000000   Created container prometheus
  Normal   Started                 3m38s                    kubelet, aks-pulsedev01-14986555-vmss000000   Started container prometheus
  Normal   Pulling                 3m38s                    kubelet, aks-pulsedev01-14986555-vmss000000   Pulling image "quay.io/coreos/prometheus-config-reloader:v0.36.0"
  Normal   Pulled                  3m32s                    kubelet, aks-pulsedev01-14986555-vmss000000   Successfully pulled image "quay.io/coreos/prometheus-config-reloader:v0.36.0"
  Normal   Created                 3m30s                    kubelet, aks-pulsedev01-14986555-vmss000000   Created container prometheus-config-reloader
  Normal   Started                 3m30s                    kubelet, aks-pulsedev01-14986555-vmss000000   Started container prometheus-config-reloader
  Normal   Pulling                 3m30s                    kubelet, aks-pulsedev01-14986555-vmss000000   Pulling image "quay.io/coreos/configmap-reload:v0.0.1"
  Normal   Pulled                  3m26s                    kubelet, aks-pulsedev01-14986555-vmss000000   Successfully pulled image "quay.io/coreos/configmap-reload:v0.0.1"
  Normal   Created                 3m25s                    kubelet, aks-pulsedev01-14986555-vmss000000   Created container rules-configmap-reloader
  Normal   Started                 3m25s                    kubelet, aks-pulsedev01-14986555-vmss000000   Started container rules-configmap-reloader
  Warning  Unhealthy               2m27s (x12 over 3m22s)   kubelet, aks-pulsedev01-14986555-vmss000000   Readiness probe failed: HTTP probe failed with statuscode: 503
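The Multi-Attach warning above is the usual symptom of the Azure managed disk still being attached to the node the pod previously ran on; with a ReadWriteOnce disk the new node cannot attach it until Azure finishes detaching (which is why SuccessfulAttachVolume only appears 72 seconds later). A rough way to check which VM still holds the disk (the disk resource ID below is a placeholder you would copy from the PV's source section):

$ kubectl describe pv pvc-fff51b37-c203-48d1-8d58-71dfe4ce3880
    # the Source section shows the Azure disk URI backing this volume
$ az disk show --ids <disk-resource-id-from-above> --query managedBy -o tsv
    # managedBy is the VM / VMSS instance the disk is currently attached to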
When looking at the logs of the prometheus container in the pod, I see the following:
cornelius@namphi-ubuntu:/var/crash$ kubectl logs -f -n monitoring prometheus-pulse-monitor-prometheus-o-prometheus-0 prometheus
level=info ts=2020-03-10T01:06:52.353Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=HEAD, revision=d9613e5c466c6e9de548c4dae1b9aabf9aaf7c57)"
level=info ts=2020-03-10T01:06:52.353Z caller=main.go:331 build_context="(go=go1.13.5, user=root@688433cf4ff7, date=20200106-14:50:51)"
level=info ts=2020-03-10T01:06:52.353Z caller=main.go:332 host_details="(Linux 4.15.0-1069-azure #74-Ubuntu SMP Fri Feb 7 17:22:24 UTC 2020 x86_64 prometheus-pulse-monitor-prometheus-o-prometheus-0 (none))"
level=info ts=2020-03-10T01:06:52.353Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-03-10T01:06:52.353Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-03-10T01:06:52.456Z caller=main.go:648 msg="Starting TSDB ..."
level=info ts=2020-03-10T01:06:52.456Z caller=web.go:506 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-03-10T01:06:52.495Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1583637494073 maxt=1583647200000 ulid=01E2WWVRZ0PXGCKYH9CQ7PV6WH
level=info ts=2020-03-10T01:06:52.534Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1583647200000 maxt=1583712000000 ulid=01E2YKW6FP8AP7YKVDXF8KTRES
level=info ts=2020-03-10T01:06:52.573Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1583733600000 maxt=1583740800000 ulid=01E2Z8BEXGZQPHV850CARM95CV
level=info ts=2020-03-10T01:06:52.573Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1583712000000 maxt=1583733600000 ulid=01E2Z8HXVKZV5HJYKA82HYJ2R2
level=info ts=2020-03-10T01:06:52.573Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1583740800000 maxt=1583748000000 ulid=01E2ZF765BDTS0H5C4Y3RSGSAR
level=info ts=2020-03-10T01:07:32.098Z caller=head.go:584 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-03-10T01:09:28.424Z caller=head.go:608 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-03-10T01:09:42.418Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=62 maxSegment=242
level=info ts=2020-03-10T01:09:56.439Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=63 maxSegment=242
level=info ts=2020-03-10T01:10:10.551Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=64 maxSegment=242
level=info ts=2020-03-10T01:10:25.043Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=65 maxSegment=242
level=info ts=2020-03-10T01:10:32.093Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=66 maxSegment=242
level=info ts=2020-03-10T01:10:46.789Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=67 maxSegment=242
level=info ts=2020-03-10T01:11:02.835Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=68 maxSegment=242
level=info ts=2020-03-10T01:11:20.737Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=69 maxSegment=242
level=info ts=2020-03-10T01:11:38.966Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=70 maxSegment=242
level=warn ts=2020-03-10T01:11:46.148Z caller=main.go:494 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2020-03-10T01:11:46.148Z caller=main.go:517 msg="Stopping scrape discovery manager..."
level=info ts=2020-03-10T01:11:46.148Z caller=main.go:531 msg="Stopping notify discovery manager..."
level=info ts=2020-03-10T01:11:46.148Z caller=main.go:553 msg="Stopping scrape manager..."
level=info ts=2020-03-10T01:11:46.148Z caller=main.go:513 msg="Scrape discovery manager stopped"
level=info ts=2020-03-10T01:11:46.148Z caller=main.go:734 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2020-03-10T01:11:46.148Z caller=main.go:527 msg="Notify discovery manager stopped"
level=info ts=2020-03-10T01:11:46.148Z caller=main.go:547 msg="Scrape manager stopped"
level=info ts=2020-03-10T01:11:46.156Z caller=kubernetes.go:190 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-03-10T01:11:46.158Z caller=kubernetes.go:190 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-03-10T01:11:46.159Z caller=kubernetes.go:190 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-03-10T01:11:46.160Z caller=kubernetes.go:190 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-03-10T01:11:46.161Z caller=kubernetes.go:190 component="discovery manager notify" discovery=k8s msg="Using pod service account via in-cluster config"
Essentially, it looks like the pod tries to replay the WAL after the crash, but it cannot finish within the time allowed by the liveness probe, so it goes into a never-ending restart loop.
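As a stop-gap only (it throws away the samples that existed only in the WAL, roughly the most recent couple of hours), one workaround is to remove the WAL directory so Prometheus starts from its last persisted block; a sketch, assuming you can exec into the container during one of its restart windows:

$ kubectl exec -n monitoring prometheus-pulse-monitor-prometheus-o-prometheus-0 \
    -c prometheus -- rm -rf /prometheus/wal
    # /prometheus is the --storage.tsdb.path shown in the pod args above
$ kubectl delete pod -n monitoring prometheus-pulse-monitor-prometheus-o-prometheus-0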
What you expected to happen:
The pod to restart successfully. I suspect this is due to some underlying PVC issue on AKS.
How to reproduce it (as minimally and precisely as possible):
I have reliably reproduced this on several AKS clusters. Note that the pod needs to run out of memory or suffer some other hard crash for the issue to appear; killing/deleting the pod does not reproduce it.
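For anyone trying to reproduce it, one (hedged) way to force that kind of hard crash is to give the Prometheus container a deliberately tight memory limit through the chart's prometheusSpec.resources; the 400Mi figure is an arbitrary example, not a recommendation:

prometheusSpec:
  resources:
    limits:
      memory: 400Mi    # low enough that scraping / WAL replay gets the container OOM-killed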
Anything else we need to know:
To be honest, I don't think this is an issue with the Helm install/chart; I am just reaching out to see if anyone has some guidance for me. I will probably switch to a VM-based Prometheus installation shortly, as this is not workable on AKS as it stands.
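One setting that may shorten the WAL replay on restart (and therefore the window in which the liveness probe can kill the pod) is WAL compression. If I read the operator docs correctly, the Prometheus CRD exposes this as walCompression and the chart passes it through under prometheusSpec; treat the exact field name as an assumption and verify it against your operator/chart version:

prometheusSpec:
  walCompression: true    # should map to Prometheus's --storage.tsdb.wal-compression flag (2.11+)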
10 days and not a single response. Time to close this and move on.
Decided to use a VM-based Prometheus instead.
I am facing the same issue too. @Namphibian, did you find any solution?
@yogesh9391 I finally managed it with some PVC values. I will post a solution here a little later.
@Namphibian, would you be able to explain what you changed?