Alertmanager: OpsGenie notify fails with 422 status code

Created on 15 Nov 2017 · 12 Comments · Source: prometheus/alertmanager

What did you do?

I have a simple rule on a test instance:

groups:
- name: example
  rules:
  - alert: BlaAlert
    expr: up == 1
    for: 0s
    labels: []
    annotations:
      severity: critical
      message: Something wrong
      summary: OpsMeOut

I logged the request it sends:

{"alias":"1d1d7ea4d3921fc7909a53d3738caeb62f0e9e5c6affccbf5f3323d2ee1667c9","message":"[FIRING:3] BlaAlert (critical)","description":"OpsMeOut
Alerts Firing:
Labels:
 - alertname = BlaAlert
 - instance = localhost:9090
 - job = prometheus
 - severity = critical
Annotations:
 - summary = OpsMeOut
Source: http://prometheus-2:9090/graph?g0.expr=up+%3D%3D+1\\u0026g0.tab=1
Labels:
 - alertname = BlaAlert
 - instance = localhost:9100
 - job = node
 - severity = critical
Annotations:
 - summary = OpsMeOut
Source: http://prometheus-2:9090/graph?g0.expr=up+%3D%3D+1\\u0026g0.tab=1
Labels:
 - alertname = BlaAlert
 - instance = localhost:9187
 - job = postgres
 - severity = critical
Annotations:
 - summary = OpsMeOut
Source: http://prometheus-2:9090/graph?g0.expr=up+%3D%3D+1\\u0026g0.tab=1

","details":{},"source":"http://prometheus20:9093/#/alerts?receiver=opsgenie","teams":"test-prometheus"}

What did you expect to see?

I expected a successful 200 response from OpsGenie. Instead I got a 422 response, which means "Semantic errors in request body".

But I don't see any problem with the request, because it has the message field, which is the only required field as far as I can see in the OpsGenie API documentation.

Environment

  • System information:

Linux prometheus20 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Prometheus version:
prometheus, version 2.0.0 (branch: HEAD, revision: 0a74f98628a0463dddc90528220c94de5032d1a0)
  build user:       root@615b82cb36b6
  build date:       20171108-07:11:59
  go version:       go1.9.2
  • Alertmanager version:
alertmanager, version 0.10.0 (branch: HEAD, revision: 133c888ef3644b47a52acbaeffb09f4cc637df1b)
  build user:       root@01302b7cd08a
  build date:       20171109-15:34:53
  go version:       go1.9.2
  • Alertmanager configuration file:
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: '[email protected]'

templates: []

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h 
  receiver: sink
  routes:
  - match_re:
      service: ^.*$
    receiver: sink
    routes:
    - match:
        severity: critical
      receiver: opsgenie

receivers:
  - name: 'sink'
    email_configs:
      - to: '[email protected]'
  - name: 'opsgenie'
    opsgenie_configs:
      - api_key: 'foo-valid-opsgenie-key'
        teams: 'test-prometheus'
  • Prometheus configuration file:
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - localhost:9093
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "*_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'postgres'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9187']
  • Logs:
level=debug ts=2017-11-15T14:16:46.144797798Z caller=notify.go:600 component=dispatcher msg="Notify attempt failed" attempt=1 integration=opsgenie err="unexpected status code 422"

All 12 comments

@Tom-Fawcett created the PR to move to V2, perhaps he has some insight

hey @XooR this is a bug caused by the API upgrade. The problem is that the 'teams' should be sent in a different structure. For now, you can make it work by removing the 'teams' option from your configuration. I'll have a PR to fix it by EOD.
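To illustrate the "different structure" mentioned above, here is a minimal Python sketch rather than Alertmanager's actual code. It assumes the v2 API expects `teams` as a list of objects with a `name` key instead of the plain string seen in the logged request. The helper name `to_v2_teams` is hypothetical.

```python
import json


def to_v2_teams(teams_config: str) -> list:
    """Turn a comma-separated teams string into a list of {"name": ...} objects.

    Assumption: this list-of-objects shape is what the v2 API accepts.
    """
    names = (name.strip() for name in teams_config.split(","))
    # Skip empty entries such as the one left by a trailing comma.
    return [{"name": name} for name in names if name]


# Payload shaped like the logged request, with the long description omitted.
payload = {
    "message": "[FIRING:3] BlaAlert (critical)",
    "teams": to_v2_teams("test-prometheus"),
}
print(json.dumps(payload))
```

In the failing request, `teams` was the bare string `"test-prometheus"`, which would explain why the body parsed as valid JSON but was still rejected as semantically invalid.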

Yep sorry guys :disappointed: I caused this bug.

@XooR for now I would advise either not using the teams config, or rolling back a version.

@stuartnelson3 @josedonizetti I identified this issue a few days ago, and created #1101, however, it hasn't progressed.

@Tom-Fawcett Awesome, the PR exists already.

I would like to chime in that I have this same problem with the newer version 0.10.0 alertmanager but it's the 'tags' field causing my problems, not the 'teams' field (I don't use the teams field in alertmanager for opsgenie). When I remove the 'tags' field from my config opsgenie successfully gets the post.
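Since both `teams` and `tags` can trigger the 422, a quick check on a logged request body can show which field is the culprit. The sketch below is only an illustration. It assumes the v2 API wants both fields as lists, and the names `LIST_FIELDS` and `string_valued_list_fields` are hypothetical.

```python
import json

# Assumption: fields the v2 API expects as lists rather than plain strings.
LIST_FIELDS = ("teams", "tags")


def string_valued_list_fields(body: str) -> list:
    """Return the names of LIST_FIELDS that appear in the JSON body as strings."""
    payload = json.loads(body)
    return [field for field in LIST_FIELDS if isinstance(payload.get(field), str)]


# Shortened version of the request logged in the report above.
logged = (
    '{"message": "[FIRING:3] BlaAlert (critical)", '
    '"details": {}, "teams": "test-prometheus"}'
)
print(string_valued_list_fields(logged))
```

Any field name this prints is a candidate to remove from the receiver configuration until a fixed build is available.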

@fatalsaint apologies, I'll fix that.

@josedonizetti @fatalsaint #1108

@fatalsaint @XooR are either of you able to build from head and confirm the behavior is fixed for you?

My setup utilizes the Docker containers. I can try when the master tag in hub gets updated. The Dockerfile in the repo looks like it requires a pre-built binary.

Tested it on the account I've used to debug, and it worked perfectly. Both tags and teams.

Master tag updated, so I tested tags, and it worked.

Thanks for the fast response all.

I'm also confirming that it works with both the tags and teams options.
