What did you do?
Deployed 3 instances of Alertmanager v0.15.0-rc.1
What did you expect to see?
All three Alertmanager instances should form a cluster and be listed on the Alertmanager Status page.
What did you see instead? Under which circumstances?
Clustering did not work as expected: the instances did not discover each other.
Environment
QA
Alertmanager version:
alertmanager -version v0.15.0-rc.1
Prometheus version:
2.0
Alertmanager configuration file:
alertmanager:
  image: prom/alertmanager:0.15.0-rc.1
  ports:
    - mapping: "9093:9093"
      name: alertmanager
    - mapping: "8001:8001"
      name: meshport
  constraints:
    - [ "hostname", "CLUSTER", "localhost.aaa.net" ]
  parameters:
    - key: memory-swap
      value: "-1"
  healthChecks:
    - protocol: "HTTP"
      path: "/-/healthy"
      gracePeriodSeconds: 300
      intervalSeconds: 60
      portIndex: 0
      timeoutSeconds: 20
      maxConsecutiveFailures: 3
  args:
    - "--storage.path"
    - "/alertmanager"
    - "--config.file"
    - "/etc/alertmanager/config.yml"
    - "--cluster.listen-address"
    - ":8001"
    - "--cluster.gossip-interval"
    - "100ms"
    - "--log.level"
    - info
    - "--cluster.peer"
    - r00000000:8001
    - "--cluster.peer"
    - r00000001:8001
Can you please share the Alertmanager logs from all instances?
Also, which orchestration system do you use to deploy Alertmanager? I suspect Marathon, but please confirm.
Yes, we use Mesos/Marathon for orchestration, as well as Nomad.
The current setup runs on Mesos/Marathon.
Please find attached a screenshot of the Alertmanager Status page, which doesn't show the peers (the other two Alertmanagers).

I will add the logs soon.
@vivekbny I am not very familiar with _Mesos/Marathon_, sorry in advance for any stupid statements.
--cluster.listen-address must be an address that the other Alertmanager peers can route to. Setting it to just the port (:8001) only works if all peers listen on the same loopback device. We fixed the same issue in the Prometheus Operator for Kubernetes.
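As a minimal sketch of that suggestion, the args for one instance might look like the fragment below. It assumes the instance can bind to, and is reachable by its peers at, the hypothetical address 10.0.0.11; substitute the instance's real routable IP. The peer names are the ones used in the arguments posted later in this thread.

```yaml
args:
  - "--storage.path"
  - "/alertmanager"
  - "--config.file"
  - "/etc/alertmanager/config.yml"
  # 10.0.0.11 is a hypothetical placeholder for this instance's routable IP,
  # replacing the bare ":8001" that the peers cannot route to.
  - "--cluster.listen-address"
  - "10.0.0.11:8001"
  - "--cluster.peer"
  - AlertmanagerB:8001
  - "--cluster.peer"
  - AlertmanagerC:8001
```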
Log:
level=warn ts=2018-04-13T10:57:33.269880787Z caller=cluster.go:150 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
Alertmanager arguments
Alertmanager 1
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerB:8001
- "--cluster.peer"
- AlertmanagerC:8001
Alertmanager 2
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerA:8001
- "--cluster.peer"
- AlertmanagerC:8001
Alertmanager 3
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerA:8001
- "--cluster.peer"
- AlertmanagerB:8001
Any update on this?
Can you add all of the startup logs? I'm wondering if there are any errors being reported, such as DNS resolution errors in the clustering library.
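If the info-level startup logs turn out to be too sparse, one option (an assumption, not something requested in this thread) is to temporarily raise the existing --log.level flag from info to debug, which may surface more detail from the clustering code. Only the changed flag pair is shown; the rest of the args stay as posted.

```yaml
args:
  # Temporarily raised from "info" for troubleshooting; revert once peers are found.
  - "--log.level"
  - debug
```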
As @mxinden stated, --cluster.listen-address also needs a routable address, such as ip:port, not just the port.
@vivekbny, as described in my comment above: have you tried setting the --cluster.listen-address flag to the Alertmanager's name instead of just a port? For Alertmanager 1 this would be AlertmanagerA:8001.
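Applied to all three instances posted above, that suggestion might look like the sketch below. It assumes each name (AlertmanagerA, AlertmanagerB, AlertmanagerC) resolves, from inside every container, to an address the named instance can bind to and the other peers can reach. Only the cluster flags are shown; the remaining flags stay as in the posted arguments. The `---` separators keep the three `args` lists as separate YAML documents.

```yaml
# Alertmanager 1: listens on its own name, gossips with B and C
args:
  - "--cluster.listen-address"
  - AlertmanagerA:8001
  - "--cluster.peer"
  - AlertmanagerB:8001
  - "--cluster.peer"
  - AlertmanagerC:8001
---
# Alertmanager 2: listens on its own name, gossips with A and C
args:
  - "--cluster.listen-address"
  - AlertmanagerB:8001
  - "--cluster.peer"
  - AlertmanagerA:8001
  - "--cluster.peer"
  - AlertmanagerC:8001
---
# Alertmanager 3: listens on its own name, gossips with A and B
args:
  - "--cluster.listen-address"
  - AlertmanagerC:8001
  - "--cluster.peer"
  - AlertmanagerA:8001
  - "--cluster.peer"
  - AlertmanagerB:8001
```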