VictoriaMetrics: Please add openstack_sd to vmagent and victoria-metrics promscrape

Created on 28 Aug 2020 · 12 Comments · Source: VictoriaMetrics/VictoriaMetrics

Is your feature request related to a problem? Please describe.
We have two sites running OpenStack. We use Prometheus openstack_sd to monitor hypervisors and OpenStack virtual machines. The lack of openstack_sd in vmagent and victoria-metrics breaks our monitoring approach and makes replacing Prometheus impossible.

Describe the solution you'd like
We kindly ask you to implement openstack_sd in vmagent and victoria-metrics. It will allow us to migrate quite easily.

Describe alternatives you've considered
We could consider another SD mechanism, but that would require a major reconfiguration of scrape configs and relabeling. There is little point in making such an effort, especially since it would require additional software to be deployed and maintained.

Additional context
We'd like to migrate to the VM stack because we see two major benefits:

VictoriaMetrics is very efficient
vmagent consumes 15-30% of the RAM used by Prometheus running the same configs with minimal retention

Thanks!

enhancement vmagent

bojleros

👍12

All 12 comments

OpenStack service discovery has been added into vmagent starting from the commit cbe3cf683bdc5f4520a791cfed9759d4fa68f114 .

@bojleros , could you build vmagent from the commit cbe3cf683bdc5f4520a791cfed9759d4fa68f114 according to these docs and verify whether OpenStack service discovery works as expected?

valyala on 5 Oct 2020

FYI, support for openstack_sd_config has been added in release v1.43.0. It supports only OpenStack API v3.

valyala on 6 Oct 2020

@valyala, pardon the delay, I was on a late holiday. Thank you for this release! We are going to test it soon.

bojleros on 12 Oct 2020

error when discovering openstack targets for job_name "nodes_openstacksd": cannot refresh OpenStack api token: auth failed, bad status code: 404, want: 201; skipping it

@valyala Should we expect an SD configuration taken from Prometheus to work out of the box? We were using Prometheus with OpenStack Stein, with SD configured this way:

- identity_endpoint: https://ostack.example.com:5000/v3.0
  username: dedicatedMonUser
  region: RegionOne
  domain_id: default
  project_name: admin
  password: a_password
  role: instance
  port: 9100
  tls_config:
    ca_file: /etc/ipa/ca.crt

bojleros on 16 Oct 2020

Can you try replacing the identity_endpoint https://ostack.example.com:5000/v3.0 with https://ostack.example.com:5000/v3? If it helps, I'll add a hot-fix.
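For context, here is a minimal sketch of why the trailing `.0` could matter. It assumes the token URL is formed by appending `/auth/tokens` to `identity_endpoint`; Keystone v3 issues tokens via `POST /v3/auth/tokens` and answers 201 on success, which matches the "want: 201" in the error. The helper names are hypothetical and this is not vmagent's actual code.

```python
import re


def normalize_identity_endpoint(endpoint: str) -> str:
    """Rewrite a trailing /v3.0 path segment to /v3; leave other endpoints as-is.

    Hypothetical helper for illustration only.
    """
    trimmed = endpoint.rstrip("/")
    return re.sub(r"/v3\.0$", "/v3", trimmed)


def token_url(endpoint: str) -> str:
    """Build the Keystone v3 token URL (assumed shape: <endpoint>/auth/tokens)."""
    return normalize_identity_endpoint(endpoint) + "/auth/tokens"


print(token_url("https://ostack.example.com:5000/v3.0"))
```

With this kind of normalization, both the `/v3.0` and `/v3` spellings of the endpoint resolve to the same token URL.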

f41gh7 on 16 Oct 2020

👍1

Thank you @f41gh7. It started to work. Now I get the following error in the log, but it looks like metrics are being delivered to VictoriaMetrics:

Oct 16 09:13:42 vmagent-infra01 vmagent[16822]: 2020-10-16T09:13:42.308Z error VictoriaMetrics/lib/promscrape/scraper.go:273 skipping duplicate scrape target with identical labels; endpoint=http://x.x.x.20:9100/metrics, labels={app="haproxy", app_group="ceph", env="production", instance="lbl02-s3", job="nodes_openstacksd", location="wdc02", project="infrastructure", role="s3", status="ACTIVE"}; make sure service discovery and relabeling is set up properly; see also https://victoriametrics.github.io/vmagent.html#troubleshooting
Oct 16 09:13:42 vmagent-infra01 vmagent[16822]: 2020-10-16T09:13:42.308Z error VictoriaMetrics/lib/promscrape/scraper.go:273 skipping duplicate scrape target with identical labels; endpoint=http://x.x.x.24:9100/metrics, labels={app="haproxy", app_group="ceph", env="production", instance="lbl01-s3", job="nodes_openstacksd", location="wdc02", project="infrastructure", role="s3", status="ACTIVE"}; make sure service discovery and relabeling is set up properly; see also https://victoriametrics.github.io/vmagent.html#troubleshooting

What I see here is two unique series. It is the instance label that makes the difference. This relabeling also works on Prometheus:

relabel_configs:
  # Keep only instances which are flagged for scraping
  - source_labels: [__meta_openstack_tag_monitoring_node_scrape]
    action: keep
    regex: 'true'

  # define project label , each project id must be listed here
  - source_labels: [__meta_openstack_project_id]
    regex: '7ee2774c3782489093fdb7dedabec07d'
    #aka admin
    replacement: 'infrastructure'
    target_label: project

  #use custom node_exporter port if openstack vm tag is defined
  - source_labels:
    - __address__
    - __meta_openstack_tag_monitoring_node_port
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__

  #use custom node_exporter address if openstack vm tag is defined
  - source_labels:
    - __meta_openstack_tag_monitoring_node_addr
    - __address__
    action: replace
    regex: \s*(\d+\.\d+\.\d+\.\d+)\s*;\S+:(\d+)
    replacement: $1:$2
    target_label: __address__

  #populate standard tags
  - source_labels: [__meta_openstack_tag_app_group]
    target_label: app_group
  - source_labels: [__meta_openstack_tag_app]
    target_label: app
  - source_labels: [__meta_openstack_tag_role]
    target_label: role
  - source_labels: [__meta_openstack_instance_name]
    target_label: instance
  - source_labels: [instance]
    action: replace
    regex: (\S+):\d+
    replacement: $1
    target_label: instance
  - source_labels: [__meta_openstack_instance_status]
    target_label: status

bojleros on 16 Oct 2020

@bojleros thx, endpoint will be fixed.

Prometheus simply suppresses such errors. You can get the same behavior with the flag -promscrape.suppressDuplicateScrapeTargetErrors=true.

I don't know whether it is possible to set up keep_if_equal for OpenStack.

f41gh7 on 16 Oct 2020

@f41gh7 Are you sure this needs additional suppression? This warning looks very useful for finding misconfiguration, so maybe it is not a good idea to disable it.

We have two virtual servers on openstack:
lbl01-s3
lbl02-s3

They have the same set of OpenStack tags/metadata because they share the same role. This error message is misleading, since it does not take the instance label into account. Maybe that is because it is generated before relabeling but prints the state just after relabeling was applied. Do you think this deserves a separate issue?

Regards,
Bart

bojleros on 16 Oct 2020

This message indicates that you have 2 duplicate targets. I assume that this part of the configuration rewrites the original __address__ label and produces duplicate values:

  #use custom node_exporter address if openstack vm tag is defined
  - source_labels:
    - __meta_openstack_tag_monitoring_node_addr
    - __address__
    action: replace
    regex: \s*(\d+\.\d+\.\d+\.\d+)\s*;\S+:(\d+)
    replacement: $1:$2
    target_label: __address__

Let me give you an example. Say the virtual server lbl01-s3 is attached to 2 pools, p1 and p2. Service discovery generates a target for each private IP address in these pools. In this case there will be target 1 with __address__ = p1_private_ip:cfg_port and target 2 with __address__ = p2_private_ip:cfg_port.
During relabeling you replace private_ip with the value of the tag __meta_openstack_tag_monitoring_node_addr and get 2 targets with the same __address__.

Targets are checked for uniqueness after relabeling. You can see the original label values at the http://localhost:8429/targets?show_original_labels=true API endpoint.
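The collapse described above can be reproduced with a small simulation. This is only a sketch: `relabel_replace` is a simplified, hypothetical stand-in for a Prometheus-style `replace` rule (source labels joined with `;`, regex anchored at both ends), not vmagent's implementation, and the IP addresses are made up for illustration.

```python
import re


def relabel_replace(labels, source_labels, regex, replacement, target_label, separator=";"):
    """Apply one Prometheus-style `replace` rule to a label set (simplified)."""
    # Missing source labels are treated as empty strings.
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # relabeling regexes are fully anchored
    if match is None:
        return labels  # rule does not apply, labels stay unchanged
    # Translate $1, $2 into Python's \g<1>, \g<2> group references.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    out = dict(labels)
    out[target_label] = match.expand(template)
    return out


# Two discovered targets for one server: one per private IP (hypothetical values),
# both carrying the same monitoring_node_addr tag.
targets = [
    {"__address__": "10.0.1.5:9100", "__meta_openstack_tag_monitoring_node_addr": "192.0.2.20"},
    {"__address__": "10.0.2.5:9100", "__meta_openstack_tag_monitoring_node_addr": "192.0.2.20"},
]

relabeled = [
    relabel_replace(
        t,
        source_labels=["__meta_openstack_tag_monitoring_node_addr", "__address__"],
        regex=r"\s*(\d+\.\d+\.\d+\.\d+)\s*;\S+:(\d+)",
        replacement="$1:$2",
        target_label="__address__",
    )
    for t in targets
]

addresses = [t["__address__"] for t in relabeled]
print(addresses)  # both targets end up with 192.0.2.20:9100
```

Since uniqueness is checked after relabeling, two targets that started with different addresses are reported as duplicates once the rule rewrites both to the tag value.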

f41gh7 on 16 Oct 2020

👍1

@f41gh7 Thank you :) 👍

bojleros on 16 Oct 2020

Btw, try upgrading to v1.44.0 - it provides the original set of labels in error log, so it may simplify finding the root cause for duplicate scrape targets.

valyala on 16 Oct 2020

👍1

FYI, the bugfix for OpenStack v3.0 API has been included in v1.45.0.

valyala on 2 Nov 2020
