Ambassador: Allow Envoy admin to bind to 0.0.0.0 to scrape Envoy metrics directly with Prometheus

Created on 21 Nov 2018 · 16 Comments · Source: datawire/ambassador

Please describe your use case / problem.
Scraping Envoy metrics directly from Envoy's admin port at the /stats or /stats/prometheus routes. For HTTP metrics on instances with many clusters, Envoy's native metrics format is much cleaner than what the prom/statsd-exporter sidecar produces. This approach would also let us remove the statsd sidecar.

Describe the solution you'd like
An option to bind the Envoy admin interface to 0.0.0.0. Currently this is hard-coded as 127.0.0.1 in envoy.j2 at admin.address, which means you can't set up a metrics scraper outside an Ambassador Pod to pull these metrics directly from Envoy. It probably makes sense to make this configurable and leave the default at 127.0.0.1, so people get a more secure configuration out of the box, while allowing 0.0.0.0 for those who want it.

Describe alternatives you've considered
The alternative is customizing the envoy.j2 template and overriding admin.address to tcp://0.0.0.0:{{ admin.admin_port }}. This can be done either through a ConfigMap value or by building our own Ambassador container image. Both seem like overkill for a single bind-address override in one part of the template, and both add operational burden to keep the templates or images up to date.

Additional context
If Envoy admin is bound to 0.0.0.0, it needs to be made clear to people that the admin port should only be exposed to trusted networks, since it exposes the entire Envoy admin control plane.

All 16 comments

This would be really nice - I expect to start using Istio at some point, so having "native" Envoy Prometheus metrics from Ambassador's Envoy instances would mean less change when I do that.

Meanwhile, as a workaround, this is the configuration I'm using for the exporter to convert the stats back into the format Envoy exposes natively.

---
defaults:
  timer_type: histogram
mappings:
- match: envoy.cluster.*.*
  name: envoy_cluster_${2}
  labels:
    envoy_cluster_name: ${1}
- match: listener\.(.*)\.http\.(.*)\.(downstream.*)
  name: envoy_listener_http_${3}
  match_type: regex
  labels:
    envoy_listener_address: ${1}
    envoy_http_conn_manager_prefix: ${2}
- match: listener\.([[:alpha:]]+)\.(.*)
  name: envoy_listener_${2}
  match_type: regex
  labels:
    envoy_listener_address: ${1}
- match: listener\.([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+_[0-9]+)\.(.*)
  name: envoy_listener_${2}
  match_type: regex
  labels:
    envoy_listener_address: ${1}
- match: envoy.http.*.*
  name: envoy_http_${2}
  labels:
    envoy_http_conn_manager_prefix: ${1}
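
For anyone unsure how the first (glob) mapping above turns a statsd name into a labelled metric, here is a rough Python sketch. It assumes, as I understand the exporter's glob rules, that each `*` captures exactly one dot-separated component and that `${N}` refers to the N-th capture. The function names and the sample metric name are made up for illustration, and the real exporter does more (name sanitizing, regex mappings, timer types).

```python
import re


def glob_to_regex(glob):
    # Assumption: "*" matches one dot-separated component; everything else is literal.
    parts = [r"([^.]*)" if part == "*" else re.escape(part) for part in glob.split(".")]
    return re.compile(r"\.".join(parts))


def apply_mapping(statsd_name, glob, name_template, label_templates):
    # Hypothetical helper: returns (metric_name, labels), or None if the glob does not match.
    match = glob_to_regex(glob).fullmatch(statsd_name)
    if match is None:
        return None

    def expand(template):
        # Replace ${1}, ${2}, ... with the corresponding captured component.
        return re.sub(r"\$\{(\d+)\}", lambda ref: match.group(int(ref.group(1))), template)

    labels = {key: expand(value) for key, value in label_templates.items()}
    return expand(name_template), labels


# Sample statsd name (illustrative, not taken from a real deployment).
result = apply_mapping(
    "envoy.cluster.my_service.upstream_rq_total",
    "envoy.cluster.*.*",
    "envoy_cluster_${2}",
    {"envoy_cluster_name": "${1}"},
)
print(result)
```

Under these assumptions, a name with more than four components does not match `envoy.cluster.*.*` at all, so it falls through to the later mappings or to the exporter's default handling.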

Earlier I asked in Slack about exposing Envoy's native Prometheus metrics and was pointed to this issue by @richarddli.

Personally I don't think exposing the admin interface is a good idea, as it would require a trusted network (as previously mentioned). However, it should be fairly straightforward to proxy only the stats route, either inside Ambassador or with a sidecar.

As a proof of concept I am currently running an nginx sidecar to expose the metrics:

# ...
        containers:
        - name: metrics-proxy
          image: nginx:1.15
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 100m
              memory: 64Mi
            requests:
              cpu: 50m
              memory: 16Mi
          ports:
          - name: metrics
            containerPort: 9102
          volumeMounts:
          - mountPath: /etc/nginx/conf.d
            name: proxy-conf
            readOnly: true
# ...

With this simple nginx config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ambassador-proxy-config
data:
  proxy.conf: |
    server {
      listen 9102;
      listen [::]:9102;

      server_name _;

      location /stats {
        proxy_pass http://127.0.0.1:8001;
        proxy_http_version 1.1;
      }
    }

Now each Ambassador instance can be scraped by Prometheus. This approach is probably no worse than the statsd sidecar... If this is already sufficient, I wouldn't mind upstreaming an example and documentation.

P.S. As far as I know, Envoy's Prometheus metrics should be on par with statsd as of https://github.com/envoyproxy/envoy/pull/5601

@kflynn asked about our use case, since a mapping can expose the metrics as well. I work for a company that works with a lot of banks.
This basically means we use cryptic names on the "outside", e.g. jk1iodsa9083sod.somedomain.com, but proper names on the "inside", e.g. svc/customer-a-ambassador. Exposing the metrics, or even the diagnostics, leaks information.

As far as I understand, it is currently not possible to bind a mapping strictly to a specific port, e.g. add /metrics to just the admin port 8877.

Exposing the whole envoy admin interface is potentially a security concern. Being able to expose the desired envoy internals via the admin interface therefore sounds like a reasonable solution.

If this is something that aligns with Ambassador's goals, I am happy to contribute.

@rotemtam is looking into exposing /metrics/ via a proxy with diagd

Lost a whole day trying to find a simple way to expose the admin interface (or just the /stats/prometheus route) via Ambassador, and failed. IMHO the best solution is to expose this via diagd. Looking forward to this!

@richarddli do you happen to know why it is not possible to expose this via an Ambassador mapping, just like the Ambassador healthcheck and diagd? I tried setting the mapping's service to 127.0.0.1:8001; it registers on the 8877 listener, but diagd returns 404 even though I have set the prefix to /stats/prometheus (not sure why that traffic fails on 8877).

Has there been any progress, @rotemtam? Having /metrics proxy requests to the Envoy admin /stats/prometheus route would be the easiest solution, right?

I agree with @volatilemolotov above, and with this installed:

apiVersion: getambassador.io/v1
kind: Mapping
metadata:
    name: stats-mapping
spec:
    prefix: /metrics
    rewrite: /stats/prometheus
    service: 127.0.0.1:8001

I can curl http://$AMBASSADOR_IP/metrics and get what looks like a valid stats set. Anyone have a Prometheus operator config for this? :wink:

(Yeah, I'm doing some stuff where I actually need to use this in anger, so...)
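
To make the prefix/rewrite pair in the Mapping above concrete, here is a small Python sketch of the path translation as I understand it: the matched prefix is swapped for the rewrite value and the rest of the path and the query string are carried over. The function name is hypothetical and this is not Ambassador's or Envoy's actual code, just a model of the behaviour.

```python
def rewrite_path(request_path, prefix="/metrics", rewrite="/stats/prometheus"):
    # Hypothetical helper modelling the prefix -> rewrite substitution.
    path, sep, query = request_path.partition("?")
    if not path.startswith(prefix):
        return None  # this Mapping would not apply to the request
    # Swap the matched prefix for the rewrite value; keep any remainder and query string.
    return rewrite + path[len(prefix):] + sep + query


# What the upstream at 127.0.0.1:8001 would be asked for.
print(rewrite_path("/metrics"))
```

So a request for /metrics on the Ambassador listener would reach the Envoy admin port as /stats/prometheus, which is the path the scraper ultimately needs.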

OK, I believe that I have this working with Prometheus + Grafana and _no_ statsd exporter running at all. Anyone want to try to replicate?

  1. Use the Mapping above. Note that it's (obviously) a 0.70+ CRD, but hopefully it'll be obvious how to translate back to an annotation if necessary.

  2. Here's the definition I'm using with the Prometheus operator:

apiVersion: v1
kind: Service
metadata:
  name: ambassador-monitor
  namespace: default
  labels:
    service: ambassador-monitor
spec:
  selector:
    service: ambassador
  type: ClusterIP
  clusterIP: None
  ports:
  - name: prometheus-metrics
    port: 8080
    targetPort: 8080
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ambassador-monitor
  namespace: default
  labels:
    ambassador: monitoring
spec:
  selector:
    matchLabels:
      service: ambassador-monitor
  endpoints:
  - port: prometheus-metrics

Note the selector in the ambassador-monitor K8s Service: it's set up to refer to my actual Ambassador pod.

End result is that the Prometheus operator goes directly to scrape via the /metrics mapping at port 8080 of my Ambassador pod. Seems rather happier than having statsd in the mix.
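
The two manifests above chain together through label selectors, which is easy to get wrong. A minimal Python sketch of that matching logic follows, assuming plain equality-based selectors. The `selects` helper and the pod labels are assumptions for illustration; the other values are copied from the manifests.

```python
def selects(selector, labels):
    """Return True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Values copied from the ServiceMonitor and Service definitions above.
service_monitor_match_labels = {"service": "ambassador-monitor"}
service_labels = {"service": "ambassador-monitor"}
service_pod_selector = {"service": "ambassador"}

# Assumption: the Ambassador pod carries this label; adjust to your deployment.
ambassador_pod_labels = {"service": "ambassador"}

# The ServiceMonitor must find the Service, and the Service must find the pod.
monitor_finds_service = selects(service_monitor_match_labels, service_labels)
service_finds_pod = selects(service_pod_selector, ambassador_pod_labels)
print(monitor_finds_service, service_finds_pod)
```

If either check is false for your labels, nothing gets scraped. The endpoint's port name (prometheus-metrics) also has to match the name of a port on the Service.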

@kflynn I have tried using the following mapping:

  ---
  apiVersion: ambassador/v1
  kind: Mapping
  name: prom-mapping
  prefix: /metrics
  rewrite: /stats/prometheus
  service: 127.0.0.1:8001

and I get "not found". What could I have done wrong?

EDIT: my mapping in diag comes out like this:

http://localhost:8877/metrics | 127.0.0.1:8001

Why does it map to the 8877 listener? Judging by the Service in your example, yours is mapped to the 8080 listener. Is there a difference in behavior between CRD and annotation mappings, or should the two behave the same? I'm using Ambassador v0.60.3.

In my case the metrics were available at https://:8443/metrics, not on 8080. It's kind of cumbersome to make Prometheus scrape over HTTPS, though.

I added prometheus.io/scheme: 'https' as an annotation to the pod, and then added the following block to the Prometheus scrape configs (I am not using the operator):

    - job_name: "kubernetes-pods"
      kubernetes_sd_configs:
      - role: pod

      tls_config:
        insecure_skip_verify: true

      relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
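
For readers less familiar with relabeling, the last rule above can be modelled roughly like this in Python. It assumes the usual Prometheus behaviour for a `replace` action: the regex is matched against the whole source value, the default replacement is the first capture group, and nothing changes when the regex does not match. The function name is hypothetical.

```python
import re

SOURCE = "__meta_kubernetes_pod_annotation_prometheus_io_scheme"


def relabel_scheme(labels, regex=r"(https?)", target="__scheme__"):
    # Hypothetical model of the "replace" rule from the scrape config above.
    value = labels.get(SOURCE, "")  # a missing source label is treated as empty
    match = re.fullmatch(regex, value)  # relabel regexes are anchored at both ends
    updated = dict(labels)
    if match is not None:
        updated[target] = match.group(1)  # default replacement: first capture group
    return updated


# Pod annotated with prometheus.io/scheme: 'https'
annotated = relabel_scheme({"__scheme__": "http", SOURCE: "https"})
# Pod without the annotation keeps whatever scheme it already had
plain = relabel_scheme({"__scheme__": "http"})
print(annotated["__scheme__"], plain["__scheme__"])
```

Pods without the annotation keep their existing scheme, so under these assumptions the rule should be safe to leave in a shared kubernetes-pods job.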

8443 is valid if you have HTTPS enabled. I'm not sure why mine routes to 8877.

@kflynn I like the approach! Is there a way to make this mapping accessible only from within the cluster? I wouldn't want to share my Ambassador /metrics endpoint with the world.

@rotemtam There isn't at present; I've been thinking about that. You can use auth to protect it, of course...

@volatilemolotov There's definitely no reason that referencing port 8001 should land with the 8877 cluster (which is the diag API). Please open another issue -- what would help a lot, if you can provide it, is to kubectl exec into an Ambassador pod and run

python3 grab-snapshots.py

and then either include sanitized.tgz in the issue or send it to me on our Slack.

I'm going to close this one since we have the /metrics mapping now (thanks, @rotemtam! :smile:). If we need more, we can open additional issues.
