Is your feature request related to a problem? Please describe.
We have two sites running OpenStack. We use Prometheus openstack_sd to monitor hypervisors and OpenStack virtual machines. The lack of openstack_sd in vmagent and VictoriaMetrics breaks our monitoring approach and makes replacing Prometheus impossible.
Describe the solution you'd like
We kindly ask you to implement openstack_sd in vmagent and VictoriaMetrics. It would allow us to migrate quite easily.
Describe alternatives you've considered
We could consider another SD, but it would require a major reconfiguration of scrape configs and relabeling. There is no point in such an effort, especially since it would require additional software to be deployed and maintained.
Additional context
We'd like to migrate to the VM stack since we observe two major benefits:
Thanks!
OpenStack service discovery has been added to vmagent starting from commit cbe3cf683bdc5f4520a791cfed9759d4fa68f114.
@bojleros, could you build vmagent from commit cbe3cf683bdc5f4520a791cfed9759d4fa68f114 according to these docs and verify whether OpenStack service discovery works as expected?
FYI, support for openstack_sd_config has been added in release v1.43.0. It supports only OpenStack API v3.
@valyala, pardon, I was on a late holiday. Thank you for this release! We are going to test it soon.
error when discovering openstack targets for job_name "nodes_openstacksd": cannot refresh OpenStack api token: auth failed, bad status code: 404, want: 201; skipping it
@valyala, should we expect an SD configuration taken from Prometheus to work out of the box? We were using Prometheus with OpenStack Stein SD configured this way:
- identity_endpoint: https://ostack.example.com:5000/v3.0
  username: dedicatedMonUser
  region: RegionOne
  domain_id: default
  project_name: admin
  password: a_password
  role: instance
  port: 9100
  tls_config:
    ca_file: /etc/ipa/ca.crt
Can you try replacing identity_endpoint https://ostack.example.com:5000/v3.0 with https://ostack.example.com:5000/v3? If it helps, I'll add a hot-fix.
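Keystone's v3 identity API issues tokens via `POST <endpoint>/auth/tokens` and answers `201 Created` on success, which is why the error says `want: 201`. A 404 suggests the token request went to a path under `/v3.0` that Keystone does not serve. Here is a minimal sketch of one way a client could normalize the endpoint before building the token URL; the helper name `token_url` and the `/v3.0` alias handling are assumptions for illustration, not vmagent's actual fix.

```python
from urllib.parse import urlsplit, urlunsplit


def token_url(identity_endpoint):
    """Build a Keystone v3 token URL from an identity_endpoint (hypothetical helper).

    A trailing "/v3.0" is treated as an alias for "/v3" before "/auth/tokens"
    is appended; this normalization is an assumption of the sketch.
    """
    parts = urlsplit(identity_endpoint)
    path = parts.path.rstrip("/")
    if path.endswith("/v3.0"):
        path = path[: -len(".0")]  # ".../v3.0" -> ".../v3"
    return urlunsplit((parts.scheme, parts.netloc, path + "/auth/tokens", "", ""))


print(token_url("https://ostack.example.com:5000/v3.0"))
# https://ostack.example.com:5000/v3/auth/tokens
```

Without such a normalization, the client has to be configured with the `/v3` form directly, which is the workaround suggested here.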
Thank you @f41gh7. It started to work. Now I get the following error in the log, but it looks like metrics are being delivered to VictoriaMetrics:
Oct 16 09:13:42 vmagent-infra01 vmagent[16822]: 2020-10-16T09:13:42.308Z error VictoriaMetrics/lib/promscrape/scraper.go:273 skipping duplicate scrape target with identical labels; endpoint=http://x.x.x.20:9100/metrics, labels={app="haproxy", app_group="ceph", env="production", instance="lbl02-s3", job="nodes_openstacksd", location="wdc02", project="infrastructure", role="s3", status="ACTIVE"}; make sure service discovery and relabeling is set up properly; see also https://victoriametrics.github.io/vmagent.html#troubleshooting
Oct 16 09:13:42 vmagent-infra01 vmagent[16822]: 2020-10-16T09:13:42.308Z error VictoriaMetrics/lib/promscrape/scraper.go:273 skipping duplicate scrape target with identical labels; endpoint=http://x.x.x.24:9100/metrics, labels={app="haproxy", app_group="ceph", env="production", instance="lbl01-s3", job="nodes_openstacksd", location="wdc02", project="infrastructure", role="s3", status="ACTIVE"}; make sure service discovery and relabeling is set up properly; see also https://victoriametrics.github.io/vmagent.html#troubleshooting
What I see here are two unique series; it is the instance label that makes the difference. This relabeling also works in Prometheus:
relabel_configs:
  # Keep only instances that are flagged for scraping
  - source_labels: [__meta_openstack_tag_monitoring_node_scrape]
    action: keep
    regex: 'true'
  # Define the project label; each project ID must be listed here
  - source_labels: [__meta_openstack_project_id]
    regex: '7ee2774c3782489093fdb7dedabec07d'
    # aka admin
    replacement: 'infrastructure'
    target_label: project
  # Use a custom node_exporter port if the OpenStack VM tag is defined
  - source_labels:
      - __address__
      - __meta_openstack_tag_monitoring_node_port
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  # Use a custom node_exporter address if the OpenStack VM tag is defined
  - source_labels:
      - __meta_openstack_tag_monitoring_node_addr
      - __address__
    action: replace
    regex: \s*(\d+\.\d+\.\d+\.\d+)\s*;\S+:(\d+)
    replacement: $1:$2
    target_label: __address__
  # Populate standard labels from OpenStack tags
  - source_labels: [__meta_openstack_tag_app_group]
    target_label: app_group
  - source_labels: [__meta_openstack_tag_app]
    target_label: app
  - source_labels: [__meta_openstack_tag_role]
    target_label: role
  - source_labels: [__meta_openstack_instance_name]
    target_label: instance
  - source_labels: [instance]
    action: replace
    regex: (\S+):\d+
    replacement: $1
    target_label: instance
  - source_labels: [__meta_openstack_instance_status]
    target_label: status
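To see why the two `__address__` rules are safe no-ops when the tags are absent, here is a minimal Python model of a Prometheus-style `action: replace` rule: the values of `source_labels` are joined with `;`, the regex must match the whole joined string, and `$N` references in `replacement` are expanded from the capture groups. It is a simplified sketch (it ignores other actions and templated `target_label` values), not vmagent's implementation; the function name `relabel_replace` and the sample IPs are hypothetical.

```python
import re


def relabel_replace(labels, source_labels, target_label, regex="(.*)",
                    replacement="$1", separator=";"):
    """Simplified model of a Prometheus-style `action: replace` rule.

    Returns a new label dict; the input dict is not modified.
    """
    # Values of source_labels are joined with the separator; missing labels count as "".
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Relabeling regexes are anchored at both ends, which fullmatch reproduces.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: the rule leaves the target unchanged
    # Expand ${N} and $N references with the corresponding capture groups.
    result = re.sub(
        r"\$\{(\d+)\}|\$(\d+)",
        lambda ref: match.group(int(ref.group(1) or ref.group(2))) or "",
        replacement,
    )
    out = dict(labels)
    if result:
        out[target_label] = result
    else:
        out.pop(target_label, None)  # an empty result removes the target label
    return out


# The "custom node_exporter port" rule from the config above.
port_rule = {
    "source_labels": ["__address__", "__meta_openstack_tag_monitoring_node_port"],
    "regex": r"([^:]+)(?::\d+)?;(\d+)",
    "replacement": "$1:$2",
    "target_label": "__address__",
}
# The "custom node_exporter address" rule from the config above.
addr_rule = {
    "source_labels": ["__meta_openstack_tag_monitoring_node_addr", "__address__"],
    "regex": r"\s*(\d+\.\d+\.\d+\.\d+)\s*;\S+:(\d+)",
    "replacement": "$1:$2",
    "target_label": "__address__",
}

tagged = {"__address__": "10.0.0.5:9100",
          "__meta_openstack_tag_monitoring_node_port": "9200",
          "__meta_openstack_tag_monitoring_node_addr": " 192.168.1.7 "}
untagged = {"__address__": "10.0.0.5:9100"}

print(relabel_replace(tagged, **port_rule)["__address__"])    # 10.0.0.5:9200
print(relabel_replace(untagged, **port_rule)["__address__"])  # 10.0.0.5:9100
```

With the port tag missing, the joined value ends in `;`, so `(\d+)` cannot match and the discovered address is kept; the address rule behaves the same way because its regex requires an IPv4 address before the `;`.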
@bojleros thanks, the endpoint will be fixed.
Prometheus simply suppresses such errors; you can get the same behavior with the -promscrape.suppressDuplicateScrapeTargetErrors=true flag.
Related information: https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmagent#troubleshooting
I don't know whether it is possible to set up keep_if_equal for OpenStack.
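VictoriaMetrics' `keep_if_equal` action keeps a target only if all of its `source_labels` have equal values. The following Python sketch models that rule and applies it to a hypothetical VM discovered once per address pool: comparing `__meta_openstack_tag_monitoring_node_addr` with `__meta_openstack_private_ip` keeps only the target whose private IP matches the tag. It assumes vmagent exposes the Prometheus-compatible `__meta_openstack_private_ip` label and that the tag value has no surrounding whitespace (the `\s*` in the original regex suggests it might); the function name and sample values are illustrative.

```python
def keep_if_equal(targets, source_labels):
    """Model of keep_if_equal: keep a target only if all source_labels values are equal.

    Missing labels are treated as empty strings (an assumption of this sketch).
    """
    kept = []
    for labels in targets:
        values = {labels.get(name, "") for name in source_labels}
        if len(values) <= 1:  # all values identical
            kept.append(labels)
    return kept


# Hypothetical discovery result: the same VM seen once per address pool.
targets = [
    {"__address__": "10.0.0.5:9100", "__meta_openstack_address_pool": "p1",
     "__meta_openstack_private_ip": "10.0.0.5",
     "__meta_openstack_tag_monitoring_node_addr": "10.1.0.5"},
    {"__address__": "10.1.0.5:9100", "__meta_openstack_address_pool": "p2",
     "__meta_openstack_private_ip": "10.1.0.5",
     "__meta_openstack_tag_monitoring_node_addr": "10.1.0.5"},
]

kept = keep_if_equal(
    targets,
    ["__meta_openstack_tag_monitoring_node_addr", "__meta_openstack_private_ip"],
)
print([t["__meta_openstack_address_pool"] for t in kept])  # ['p2']
```

One caveat: under these semantics a VM without the tag compares an empty string with its private IP and is dropped, so a real config would need to restrict such a rule to tagged targets.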
@f41gh7, are you sure this needs to be suppressed? This warning is very useful for finding misconfigurations, so maybe it is not a good idea to disable it.
We have two virtual servers on OpenStack:
lbl01-s3
lbl02-s3
They have the same set of OpenStack tags/metadata because they share the same role. This error message is misleading, since it does not take the instance label into account. Maybe that is because the check runs before relabeling, while the message prints the state right after relabeling was applied. Do you think this deserves a separate issue?
Regards,
Bart
This message indicates that you have two duplicate targets, and I assume that this part of the configuration rewrites the original __address__ label and produces duplicate values:
  # Use a custom node_exporter address if the OpenStack VM tag is defined
  - source_labels:
      - __meta_openstack_tag_monitoring_node_addr
      - __address__
    action: replace
    regex: \s*(\d+\.\d+\.\d+\.\d+)\s*;\S+:(\d+)
    replacement: $1:$2
    target_label: __address__
Let me give you an example. You have the virtual server lbl01-s3 with two address pools, p1 and p2; service discovery generates a target for each private IP address in these pools. In our case, that gives target 1 with __address__ = p1_private_ip:cfg_port and target 2 with __address__ = p2_private_ip:cfg_port.
During relabeling, you replace private_ip with the value of the __meta_openstack_tag_monitoring_node_addr tag and get two targets with the same __address__.
Targets are checked for uniqueness after relabeling. You can see the original label values via the http://localhost:8429/targets?show_original_labels=true API.
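The lbl01-s3 scenario can be reproduced with a small Python model: two targets discovered for the VM (one per address pool) pass through the custom-address rule from the config, both end up as `192.168.1.7:9100`, and a uniqueness check on the relabeled labels flags them. The IPs, the pool names and the simplified target identity (address plus labels not starting with `__`) are assumptions for illustration; vmagent's real check may use a different key.

```python
import re
from collections import Counter

# Regex of the "custom node_exporter address" rule; fullmatch mimics anchoring.
ADDR_RULE = re.compile(r"\s*(\d+\.\d+\.\d+\.\d+)\s*;\S+:(\d+)")


def apply_addr_override(labels):
    """Model of the address-override rule: tag IP + discovered port -> __address__."""
    value = ";".join([labels.get("__meta_openstack_tag_monitoring_node_addr", ""),
                      labels["__address__"]])
    match = ADDR_RULE.fullmatch(value)
    out = dict(labels)
    if match:
        out["__address__"] = f"{match.group(1)}:{match.group(2)}"
    return out


def target_key(labels):
    """Simplified target identity: address plus labels not starting with '__'."""
    public = tuple(sorted((k, v) for k, v in labels.items() if not k.startswith("__")))
    return labels["__address__"], public


def find_duplicates(discovered):
    """Relabel every target, then report identities that occur more than once."""
    counts = Counter(target_key(apply_addr_override(t)) for t in discovered)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical discovery result for lbl01-s3. "instance" is pre-filled for brevity;
# in the real config it comes from __meta_openstack_instance_name.
discovered = [
    {"__address__": "10.0.0.5:9100", "__meta_openstack_address_pool": "p1",
     "__meta_openstack_tag_monitoring_node_addr": "192.168.1.7", "instance": "lbl01-s3"},
    {"__address__": "10.1.0.5:9100", "__meta_openstack_address_pool": "p2",
     "__meta_openstack_tag_monitoring_node_addr": "192.168.1.7", "instance": "lbl01-s3"},
]

print(find_duplicates(discovered))  # one duplicate: both collapse to 192.168.1.7:9100
```

Dropping the tag from both targets leaves their discovered addresses distinct, so no duplicate is reported; this is why each VM, not the pair lbl01-s3/lbl02-s3, produces its own warning.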
@f41gh7 Thank you :) :+1:
Btw, try upgrading to v1.44.0: it includes the original set of labels in the error log, which may simplify finding the root cause of duplicate scrape targets.
FYI, the bugfix for the OpenStack v3.0 API endpoint has been included in v1.45.0.