Alertmanager: Clustering is not working with v0.15.0-rc.1

Created on 12 Apr 2018 · 7 Comments · Source: prometheus/alertmanager

What did you do?
Deployed 3 instances of Alertmanager v0.15.0-rc.1

What did you expect to see?
All three Alertmanagers should be clustered and listed on the Status page of each Alertmanager.

What did you see instead? Under which circumstances?
Clustering did not work: the Status page did not show the other two Alertmanagers as peers.

Environment
QA

Alertmanager version:

alertmanager -version v0.15.0-rc.1
Prometheus version:

2.0
Alertmanager configuration file:

  alertmanager:
    image: prom/alertmanager:0.15.0-rc.1
    ports:
      - mapping: "9093:9093"
        name: alertmanager
      - mapping: "8001:8001"
        name: meshport        
    constraints:
      - [ "hostname", "CLUSTER", "localhost.aaa.net" ]

    parameters:
      - key: memory-swap
        value: "-1"
    healthChecks:
      - protocol: "HTTP"
        path: "/-/healthy"
        gracePeriodSeconds: 300
        intervalSeconds: 60
        portIndex: 0
        timeoutSeconds: 20
        maxConsecutiveFailures: 3
    args:
      - "--storage.path"
      - "/alertmanager"
      - "--config.file"
      - "/etc/alertmanager/config.yml"
      - "--cluster.listen-address"
      - ":8001"
      - "--cluster.gossip-interval"
      - "100ms"
      - "--log.level"
      - info
      - "--cluster.peer"
      - r00000000:8001
      - "--cluster.peer"
      - r00000001:8001

component/high availability · kind/more-info-needed · kind/support

vivekbny

Most helpful comment

@vivekbny as described in my comment above: have you tried setting the --cluster.listen-address flag to the Alertmanager's hostname instead of just a port? For Alertmanager 1 this would be AlertmanagerA:8001.

mxinden on 18 Apr 2018

All 7 comments

Can you share the Alertmanager logs from all instances, please?
Also, which orchestration system do you use to deploy Alertmanager? I suspect it is Marathon, but please confirm.

simonpasquier on 12 Apr 2018

Yes, we use Mesos/Marathon for orchestration, and Nomad as well.
Currently the setup is in Mesos/Marathon.
Please find attached a screenshot of the AM status page, which doesn't show the peers (the other 2 AMs).

[Screenshot: AM status page]

I will add the logs soon.

vivekbny on 12 Apr 2018

@vivekbny I am not very familiar with _Mesos/Marathon_, sorry in advance for any stupid statements.

--cluster.listen-address needs to be routable by the other Alertmanager peers. Setting it to just the port (:8001) only works if all peers listen on the same loopback device. We fixed the same issue in the Prometheus operator for Kubernetes.
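
A minimal sketch of what this might look like for one instance, in the same args format used in this issue. It assumes the hostname AlertmanagerA (the name used elsewhere in this thread for the first instance) resolves to an address that the other two peers can reach and that this instance can bind to; both are assumptions, not something confirmed for this Mesos/Marathon setup.

```yaml
# Hypothetical args for Alertmanager 1 (hostname AlertmanagerA is assumed).
# Only the value after --cluster.listen-address differs from the original args.
args:
  - "--storage.path"
  - "/alertmanager"
  - "--config.file"
  - "/etc/alertmanager/config.yml"
  - "--cluster.listen-address"
  - "AlertmanagerA:8001"   # was ":8001", which carries no host the peers can route to
  - "--cluster.gossip-interval"
  - "100ms"
  - "--log.level"
  - info
  - "--cluster.peer"
  - "AlertmanagerB:8001"
  - "--cluster.peer"
  - "AlertmanagerC:8001"
```

The other two instances would presumably follow the same pattern, each listening on its own hostname and listing the remaining two as peers.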

mxinden on 13 Apr 2018

Log:
level=warn ts=2018-04-13T10:57:33.269880787Z caller=cluster.go:150 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"

Alertmanager arguments:
Alertmanager 1
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerB:8001
- "--cluster.peer"
- AlertmanagerC:8001

Alertmanager 2
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerA:8001
- "--cluster.peer"
- AlertmanagerC:8001

Alertmanager 3
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerA:8001
- "--cluster.peer"
- AlertmanagerB:8001