Alertmanager: Unable to create data directory

Created on 22 Jun 2018  ·  9 Comments  ·  Source: prometheus/alertmanager

What did you do?
Upgraded Alertmanager from 0.14.0 to 0.15.0 in Kubernetes.
The only change in the Deployment manifest is .spec.template.spec.containers[0].image.

What did you see instead? Under which circumstances?

level=info ts=2018-06-22T16:43:31.7236863Z caller=main.go:174 msg="Starting Alertmanager" version="(version=0.15.0, branch=HEAD, revision=462c969d85cf1a473587754d55e4a3c4a2abc63c)"
level=info ts=2018-06-22T16:43:31.723793048Z caller=main.go:175 build_context="(go=go1.10.3, user=root@bec9939eb862, date=20180622-11:58:41)"
level=error ts=2018-06-22T16:43:31.723826343Z caller=main.go:179 msg="Unable to create data directory" err="mkdir data/: read-only file system"

Environment
Kubernetes-1.10.5 on bare metal (ubuntu xenial)
docker-ce-17.03.2

  • System information:

    Linux 4.13.0-43-generic x86_64

  • Alertmanager version:

    version=0.15.0, branch=HEAD, revision=462c969d85cf1a473587754d55e4a3c4a2abc63c

  • Prometheus version:

    prometheus, version 2.3.1 (branch: HEAD, revision: 188ca45bd85ce843071e768d855722a9d9dabe03)

  • Alertmanager configuration file:

global:
  smtp_from: '[email protected]'
  smtp_smarthost: 'smarthost'
  smtp_auth_username: 'username'
  smtp_auth_password: 'password'

# The directory from which notification templates are read.
templates:
- '/etc/alertmanager/templates/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']
  #, 'alertname', 'cluster', 'service']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This ensures that multiple alerts for the same group that start
  # firing shortly after one another are batched together in the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend it.
  repeat_interval: 3h

  # A default receiver
  receiver: martian-ops

  # All the above attributes are inherited by all child routes and can
  # be overwritten on each.

  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - match_re:
      service: ^(.*)$
      severity: ^(warning|critical)$
    receiver: martian-ops
    continue: true
  - match_re:
      service: ^(.*)$
      severity: info
    receiver: martian-ops-info
    continue: true
    # The service has a sub-route for critical alerts, any alerts
    # that do not match, i.e. severity != critical, fall-back to the
    # parent node and are sent to 'martian-ops'
    # routes:
  - match:
      severity: critical
    receiver: telephone-ops
    continue: true

  - receiver: 'amplifr-slack'
    match_re:
      cluster: amplifr
      severity: ^(warning|critical)$
    continue: true

  - receiver: 'amplifr-email'
    match_re:
      cluster: amplifr
      severity: ^(warning|critical)$
    continue: true

  - receiver: 'martian-alerta'
    match_re:
      cluster: amplifr
      severity: ^(warning|critical)$
    continue: true

# Inhibition rules allow muting a set of alerts while another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'cluster', 'component', 'service']
- source_match:
    severity: 'warning'
  target_match:
    severity: 'info'
  equal: ['alertname', 'cluster', 'component', 'service']

receivers:
- name: 'martian-ops'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
  slack_configs:
    - api_url: 'https://hooks.slack.com/services/some/hook'
      channel: '#admin-alerts'
      username: 'amplifr-prometheus'
      text: '{{ .CommonAnnotations.description }}'
      send_resolved: true
- name: 'martian-ops-info'
  slack_configs:
    - api_url: 'https://hooks.slack.com/services/some/otherhook'
      channel: '#info-alerts'
      username: 'amplifr-prometheus'
      text: '{{ .CommonAnnotations.description }}'
      send_resolved: true

- name: 'amplifr-email'
  email_configs:
  - to: '[email protected]'
    send_resolved: true

- name: 'amplifr-slack'
  slack_configs:
    - api_url: 'https://hooks.slack.com/services/some/other/hook'
      channel: '#admin'
      username: 'prometheus'
      icon_url: 'https://prometheus.io/assets/prometheus_logo-cb55bb5c346.png'
      text: '{{ .CommonAnnotations.description }}'
      send_resolved: true

- name: 'telephone-ops'
  opsgenie_configs:
    - api_key: 'key'
      message: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonAnnotations.summary }}'
      description: '{{ .CommonAnnotations.description }}'
      tags: '{{ .CommonLabels.severity }},{{ .CommonLabels.cluster }},{{ .CommonLabels.component }},{{ .CommonLabels.service }}'

- name: 'martian-alerta'
  webhook_configs:
  - url: 'http://alerta.alerta.svc.kubernetes.local/webhooks/prometheus?api-key=key'
    send_resolved: true
  • Logs:
$ kubectl logs alertmanager-74c85949d7-dpch5
level=info ts=2018-06-22T16:43:31.7236863Z caller=main.go:174 msg="Starting Alertmanager" version="(version=0.15.0, branch=HEAD, revision=462c969d85cf1a473587754d55e4a3c4a2abc63c)"
level=info ts=2018-06-22T16:43:31.723793048Z caller=main.go:175 build_context="(go=go1.10.3, user=root@bec9939eb862, date=20180622-11:58:41)"
level=error ts=2018-06-22T16:43:31.723826343Z caller=main.go:179 msg="Unable to create data directory" err="mkdir data/: read-only file system"

All 9 comments

Deployment manifest:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "9093"
    prometheus.io/scrape: "true"
  labels:
    app: alertmanager
  name: alertmanager
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: alertmanager
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: alertmanager
      name: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/alertmanager.yml
        image: prom/alertmanager:v0.15.0
        imagePullPolicy: IfNotPresent
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager
          name: config-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: alertmanager
        name: config-volume

This is caused by this change:
https://github.com/prometheus/alertmanager/pull/1313/files

The default WORKDIR in the Dockerfile changed, and the default --storage.path is the relative path data/.

So Alertmanager tries to create the directory inside your mounted ConfigMap, which is read-only.

You can work around it for now by setting --storage.path="/alertmanager/data", I suppose.

IMHO the proper solution would be to make the default storage.path an absolute path here:
https://github.com/prometheus/alertmanager/blob/release-0.15/cmd/alertmanager/main.go#L143
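
For reference, the workaround could look roughly like the fragment below when applied to the Deployment from this thread. This is only a sketch: the --storage.path flag and the /alertmanager/data path come from the comment above, while the data-volume name and the emptyDir volume are assumptions added for illustration.

```yaml
# Fragment of .spec.template.spec from the Deployment in this issue.
containers:
- name: alertmanager
  image: prom/alertmanager:v0.15.0
  args:
  - --config.file=/etc/alertmanager/alertmanager.yml
  # Absolute path, so the data directory no longer depends on WORKDIR.
  - --storage.path=/alertmanager/data
  volumeMounts:
  - mountPath: /etc/alertmanager
    name: config-volume
  # Assumption: a separate writable volume for Alertmanager's data.
  - mountPath: /alertmanager
    name: data-volume
volumes:
- configMap:
    name: alertmanager
  name: config-volume
# Assumption: emptyDir is used only for illustration; its contents are
# lost when the pod is removed. Use a PersistentVolumeClaim to keep data.
- emptyDir: {}
  name: data-volume
```

The extra volume is optional. Unless the container's root filesystem is read-only, the flag alone should be enough, because the directory then lands outside the read-only ConfigMap mount.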

@stuartnelson3 I actually wanted to send a PR, but thinking about running AM outside of Docker, setting the storage.path default to an absolute path is not a good idea either.

The problem is that your commit introduced an issue: anyone using Docker with an overridden CMD will, after upgrading from 0.14 to 0.15, start storing data in /etc/alertmanager/data instead of the former /alertmanager/data.

I suppose the best solution would be to revert ENTRYPOINT and CMD to their former state, as in 0.14?
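
For the plain Docker case described above, where the image's CMD is overridden, pinning the storage path explicitly avoids depending on the image defaults. The Compose file below is a hypothetical sketch: the service name, the alertmanager-data volume, and the local ./alertmanager.yml file are assumptions, and it relies on the image's ENTRYPOINT being the Alertmanager binary, as the args-only Deployment in this thread implies.

```yaml
version: "3"
services:
  alertmanager:
    image: prom/alertmanager:v0.15.0
    # "command" replaces the image's CMD, so every flag must be listed here.
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml
      # Absolute path: data stays in /alertmanager/data regardless of WORKDIR.
      - --storage.path=/alertmanager/data
    volumes:
      # Assumption: the config lives next to this Compose file.
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      # Assumption: a named volume keeps the data across container restarts.
      - alertmanager-data:/alertmanager
    ports:
      - "9093:9093"
volumes:
  alertmanager-data: {}
```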

@FUSAKLA thank you, custom --storage.path= saved my upgrade.

Thanks @Bregor for creating the issue! Thanks @FUSAKLA for the quick help.

I myself am not sure why this was moved to /etc/alertmanager. I would like to move this discussion to the initial PR (See https://github.com/prometheus/alertmanager/pull/1313#issuecomment-399819727).

I will close here as the immediate fix proposed by @FUSAKLA seems to solve the issue for you @Bregor. Feel free to reopen here or get involved in https://github.com/prometheus/alertmanager/pull/1313.

Give permission to the directory where Alertmanager is installed or downloaded.

How can I access the "data" directory from a terminal?

Thanks

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

@roidelapluie it might make sense to you, but for end users these results are what show up first when googling these problems
