Alertmanager: Clustering is not working with v0.15.0-rc.1

Created on 12 Apr 2018 · 7 Comments · Source: prometheus/alertmanager

What did you do?
Deployed 3 instances of Alertmanager v0.15.0-rc.1

What did you expect to see?
All three Alertmanagers should form a cluster and be listed on the Alertmanager status page.

What did you see instead? Under which circumstances?
Clustering did not work as expected: the status page does not list the other instances as peers.

Environment
QA

  • Alertmanager version:

    alertmanager -version v0.15.0-rc.1

  • Prometheus version:

    2.0

  • Alertmanager configuration file:

  alertmanager:
    image: prom/alertmanager:0.15.0-rc.1
    ports:
      - mapping: "9093:9093"
        name: alertmanager
      - mapping: "8001:8001"
        name: meshport        
    constraints:
      - [ "hostname", "CLUSTER", "localhost.aaa.net" ]

    parameters:
      - key: memory-swap
        value: "-1"
    healthChecks:
      - protocol: "HTTP"
        path: "/-/healthy"
        gracePeriodSeconds: 300
        intervalSeconds: 60
        portIndex: 0
        timeoutSeconds: 20
        maxConsecutiveFailures: 3
    args:
      - "--storage.path"
      - "/alertmanager"
      - "--config.file"
      - "/etc/alertmanager/config.yml"
      - "--cluster.listen-address"
      - ":8001"
      - "--cluster.gossip-interval"
      - "100ms"
      - "--log.level"
      - info
      - "--cluster.peer"
      - r00000000:8001
      - "--cluster.peer"
      - r00000001:8001

Labels: component/high availability, kind/more-info-needed, kind/support

All 7 comments

Can you please share the Alertmanager logs from all instances?
Also, which orchestration system do you use to deploy Alertmanager? I suspect Marathon, but please confirm.

Yes, we use Mesos/Marathon for orchestration, and Nomad as well.
The current setup runs on Mesos/Marathon.
Please find attached a screenshot of the Alertmanager status page, which doesn't show the peers (the other two Alertmanagers).

[Screenshot: Alertmanager status page]

I will add the logs soon.

@vivekbny I am not very familiar with _Mesos/Marathon_, sorry in advance for any stupid statements.

--cluster.listen-address needs to be routable from the other Alertmanager peers. Setting it to just a port (:8001) only works if all peers listen on the same loopback device. We fixed the same issue in the Prometheus Operator for Kubernetes.
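To make the routability point concrete, here is a minimal Go sketch of a pre-flight check. listenAddrProblem is a hypothetical helper, not part of Alertmanager. It only inspects the flag value and flags a bare port, an unspecified address (0.0.0.0 or ::), or a loopback address. It does not reproduce how Alertmanager itself picks the address it announces to peers, and it does not check whether a hostname resolves.

```go
package main

import (
	"fmt"
	"net"
)

// listenAddrProblem is a hypothetical check (not Alertmanager code) that
// returns a reason why a --cluster.listen-address value may not be reachable
// by peers on other hosts, or "" if no problem is detected.
func listenAddrProblem(addr string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Sprintf("cannot parse %q: %v", addr, err)
	}
	if port == "" {
		return "no port given"
	}
	if host == "" {
		// ":8001" listens on all interfaces but does not say which
		// address other peers should use to reach this node.
		return "only a port is given; peers need a routable host or IP"
	}
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsUnspecified() {
			return "unspecified address (0.0.0.0 or ::) is not routable"
		}
		if ip.IsLoopback() {
			return "loopback address is only reachable from the same host"
		}
	}
	// A hostname or a non-loopback IP: assumed routable here.
	return ""
}

func main() {
	for _, a := range []string{":8001", "0.0.0.0:8001", "127.0.0.1:8001", "AlertmanagerA:8001"} {
		if p := listenAddrProblem(a); p != "" {
			fmt.Printf("%-20s %s\n", a, p)
		} else {
			fmt.Printf("%-20s ok\n", a)
		}
	}
}
```

With the arguments used in this issue, ":8001" is the value that gets flagged.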

Log:
level=warn ts=2018-04-13T10:57:33.269880787Z caller=cluster.go:150 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"

Alertmanager arguments
Alertmanager 1
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerB:8001
- "--cluster.peer"
- AlertmanagerC:8001

Alertmanager 2
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerA:8001
- "--cluster.peer"
- AlertmanagerC:8001

Alertmanager 3
args:
- "--storage.path"
- "/alertmanager"
- "--config.file"
- "/etc/alertmanager/config.yml"
- "--cluster.listen-address"
- ":8001"
- "--cluster.gossip-interval"
- "100ms"
- "--log.level"
- info
- "--cluster.peer"
- AlertmanagerA:8001
- "--cluster.peer"
- AlertmanagerB:8001

Any update on this?

Can you add all of the startup logs? I'm wondering if there are any errors being reported, such as DNS resolution errors in the clustering library.

As @mxinden stated, you also need a routable address for --cluster.listen-address, such as ip:port, not just the port.

@vivekbny as described in my comment above: have you tried setting the --cluster.listen-address flag to the Alertmanager name instead of just a port? For Alertmanager 1 this would be AlertmanagerA:8001.
