What did you do?
Upgraded Alertmanager from 0.14.0 to 0.15.0 in Kubernetes.
The only change in the deployment manifest is .spec.template.spec.containers[0].image
What did you see instead? Under which circumstances?
level=info ts=2018-06-22T16:43:31.7236863Z caller=main.go:174 msg="Starting Alertmanager" version="(version=0.15.0, branch=HEAD, revision=462c969d85cf1a473587754d55e4a3c4a2abc63c)"
level=info ts=2018-06-22T16:43:31.723793048Z caller=main.go:175 build_context="(go=go1.10.3, user=root@bec9939eb862, date=20180622-11:58:41)"
level=error ts=2018-06-22T16:43:31.723826343Z caller=main.go:179 msg="Unable to create data directory" err="mkdir data/: read-only file system"
Environment
Kubernetes-1.10.5 on bare metal (ubuntu xenial)
docker-ce-17.03.2
System information:
Linux 4.13.0-43-generic x86_64
Alertmanager version:
version=0.15.0, branch=HEAD, revision=462c969d85cf1a473587754d55e4a3c4a2abc63c
Prometheus version:
prometheus, version 2.3.1 (branch: HEAD, revision: 188ca45bd85ce843071e768d855722a9d9dabe03)
Alertmanager configuration file:
global:
  smtp_from: '[email protected]'
  smtp_smarthost: 'smarthost'
  smtp_auth_username: 'username'
  smtp_auth_password: 'password'
# The directory from which notification templates are read.
templates:
- '/etc/alertmanager/templates/*.tmpl'
# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']
  #, 'alertname', 'cluster', 'service']
  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 30s
  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m
  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h
  # A default receiver
  receiver: martian-ops
  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.
  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - match_re:
      service: ^(.*)$
      severity: ^(warning|critical)$
    receiver: martian-ops
    continue: true
  - match_re:
      service: ^(.*)$
      severity: info
    receiver: martian-ops-info
    continue: true
    # The service has a sub-route for critical alerts, any alerts
    # that do not match, i.e. severity != critical, fall-back to the
    # parent node and are sent to 'martian-ops'
    # routes:
  - match:
      severity: critical
    receiver: telephone-ops
    continue: true
  - receiver: 'amplifr-slack'
    match_re:
      cluster: amplifr
      severity: ^(warning|critical)$
    continue: true
  - receiver: 'amplifr-email'
    match_re:
      cluster: amplifr
      severity: ^(warning|critical)$
    continue: true
  - receiver: 'martian-alerta'
    match_re:
      cluster: amplifr
      severity: ^(warning|critical)$
    continue: true
# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'cluster', 'component', 'service']
- source_match:
    severity: 'warning'
  target_match:
    severity: 'info'
  equal: ['alertname', 'cluster', 'component', 'service']
receivers:
- name: 'martian-ops'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/some/hook'
    channel: '#admin-alerts'
    username: 'amplifr-prometheus'
    text: '{{ .CommonAnnotations.description }}'
    send_resolved: true
- name: 'martian-ops-info'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/some/otherhook'
    channel: '#info-alerts'
    username: 'amplifr-prometheus'
    text: '{{ .CommonAnnotations.description }}'
    send_resolved: true
- name: 'amplifr-email'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
- name: 'amplifr-slack'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/some/other/hook'
    channel: '#admin'
    username: 'prometheus'
    icon_url: 'https://prometheus.io/assets/prometheus_logo-cb55bb5c346.png'
    text: '{{ .CommonAnnotations.description }}'
    send_resolved: true
- name: 'telephone-ops'
  opsgenie_configs:
  - api_key: 'key'
    message: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonAnnotations.summary }}'
    description: '{{ .CommonAnnotations.description }}'
    tags: '{{ .CommonLabels.severity }},{{ .CommonLabels.cluster }},{{ .CommonLabels.component }},{{ .CommonLabels.service }}'
- name: 'martian-alerta'
  webhook_configs:
  - url: 'http://alerta.alerta.svc.kubernetes.local/webhooks/prometheus?api-key=key'
    send_resolved: true
$ kubectl logs alertmanager-74c85949d7-dpch5
level=info ts=2018-06-22T16:43:31.7236863Z caller=main.go:174 msg="Starting Alertmanager" version="(version=0.15.0, branch=HEAD, revision=462c969d85cf1a473587754d55e4a3c4a2abc63c)"
level=info ts=2018-06-22T16:43:31.723793048Z caller=main.go:175 build_context="(go=go1.10.3, user=root@bec9939eb862, date=20180622-11:58:41)"
level=error ts=2018-06-22T16:43:31.723826343Z caller=main.go:179 msg="Unable to create data directory" err="mkdir data/: read-only file system"
Deployment manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "9093"
    prometheus.io/scrape: "true"
  labels:
    app: alertmanager
  name: alertmanager
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: alertmanager
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: alertmanager
      name: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/alertmanager.yml
        image: prom/alertmanager:v0.15.0
        imagePullPolicy: IfNotPresent
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager
          name: config-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: alertmanager
        name: config-volume
It's caused by this commit:
https://github.com/prometheus/alertmanager/pull/1313/files
The default WORKDIR in the Dockerfile changed, and the default --storage.path is the relative path data/.
So Alertmanager tries to create the directory inside your mounted ConfigMap, which fails with the read-only file system error.
You can temporarily work around it by setting --storage.path="/alertmanager/data", I suppose.
IMHO the proper solution would be to set the default storage.path to an absolute path here:
https://github.com/prometheus/alertmanager/blob/release-0.15/cmd/alertmanager/main.go#L143
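For reference, the workaround could look roughly like the following fragment of the Deployment manifest from the issue. This is only a sketch: it assumes that /alertmanager is writable inside the container (the thread refers to it as the former data location in 0.14), and everything except the added --storage.path argument is copied from the reporter's manifest.

```yaml
# Fragment of .spec.template.spec in the Deployment from the issue.
containers:
- args:
  - --config.file=/etc/alertmanager/alertmanager.yml
  # Absolute path outside the ConfigMap mount at /etc/alertmanager.
  # Assumption: /alertmanager is writable in the container.
  - --storage.path=/alertmanager/data
  image: prom/alertmanager:v0.15.0
  name: alertmanager
```

With an absolute path, the data directory no longer depends on the image's WORKDIR, so it does not end up under the mounted /etc/alertmanager.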
@stuartnelson3 I actually wanted to send a PR, but now that I think about running AM outside of Docker, setting the storage.path default to an absolute path is not a good idea either.
The problem is that your commit introduced an issue: anyone using Docker with an overridden CMD will, after upgrading from 0.14 to 0.15, start storing data in /etc/alertmanager/data instead of the former /alertmanager/data.
I suppose the best solution would be to revert ENTRYPOINT and CMD to their former state, as in 0.14?
@FUSAKLA thank you, custom --storage.path= saved my upgrade.
Thanks @Bregor for creating the issue! Thanks @FUSAKLA for the quick help.
I myself am not sure why this was moved to /etc/alertmanager. I would like to move this discussion to the initial PR (See https://github.com/prometheus/alertmanager/pull/1313#issuecomment-399819727).
I will close here as the immediate fix proposed by @FUSAKLA seems to solve the issue for you @Bregor. Feel free to reopen here or get involved in https://github.com/prometheus/alertmanager/pull/1313.
Give permission on the directory where Alertmanager is installed or downloaded.
How can I access the "data" directory from a terminal?
Thanks
It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.
@roidelapluie it might make sense to you, but for end users these results are what show up first when googling these problems