Please describe your use case / problem.
Scraping Envoy metrics directly from Envoy's admin port at the /stats or /stats/prometheus routes. On instances with many clusters, Envoy's native format for HTTP metrics is much cleaner than what the prom/statsd-exporter sidecar produces. This approach would also let us remove the statsd sidecar.
Describe the solution you'd like
An option to bind the Envoy admin interface to 0.0.0.0. Currently this is hard-coded as 127.0.0.1 in envoy.j2 at admin.address, which means you can't set up a metrics scraper outside an Ambassador Pod to pull these metrics directly from Envoy. It probably makes sense to make the address configurable and keep the default at 127.0.0.1, so people get a more secure configuration out of the box, while allowing 0.0.0.0 for those who want it.
Describe alternatives you've considered
The alternative is customizing the envoy.j2 template and overriding admin.address to tcp://0.0.0.0:{{ admin.admin_port }}. This can be done either with a ConfigMap value or by building our own Ambassador container image. Both seem like overkill for overriding a bind address in one part of the template, and both add the operational burden of keeping the templates or images up to date.
Additional context
If the Envoy admin interface is bound to 0.0.0.0, it needs to be brought to people's attention that the admin port should only be exposed to trusted networks, as it exposes the entire Envoy admin control plane.
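To make the request concrete, here is a minimal Python sketch of the behaviour being asked for: a configurable bind address that defaults to loopback, plus a check that could drive a warning when the admin port is exposed. The option name `admin_bind_address` and both function names are hypothetical and only for illustration; this is not an existing Ambassador setting, and the port 8001 is taken from the examples later in this thread.

```python
import ipaddress

DEFAULT_ADMIN_BIND = "127.0.0.1"  # the value currently hard-coded in envoy.j2


def admin_address(config: dict, admin_port: int = 8001) -> str:
    """Build the admin.address string, defaulting to loopback.

    `admin_bind_address` is a hypothetical option name, not a real setting.
    """
    bind = config.get("admin_bind_address", DEFAULT_ADMIN_BIND)
    # Validate early so a typo fails at config time rather than inside Envoy.
    ip = ipaddress.IPv4Address(bind)
    return f"tcp://{ip}:{admin_port}"


def is_exposed(config: dict) -> bool:
    """Return True when the admin interface would listen beyond loopback."""
    bind = config.get("admin_bind_address", DEFAULT_ADMIN_BIND)
    return not ipaddress.IPv4Address(bind).is_loopback
```

Leaving the option unset keeps today's behaviour, and `is_exposed` gives a natural place to log the "trusted networks only" warning.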
This would be really nice - I expect to start using Istio at some point, so having "native" Envoy Prometheus metrics from Ambassador's Envoy instances would mean less change when I do that.
Meanwhile, as a workaround, this is the configuration I'm using for the exporter to map the stats back to the names Envoy itself uses.
---
defaults:
  timer_type: histogram
mappings:
- match: envoy.cluster.*.*
  name: envoy_cluster_${2}
  labels:
    envoy_cluster_name: ${1}
- match: listener\.(.*)\.http\.(.*)\.(downstream.*)
  name: envoy_listener_http_${3}
  match_type: regex
  labels:
    envoy_listener_address: ${1}
    envoy_http_conn_manager_prefix: ${2}
- match: listener\.([[:alpha:]]+)\.(.*)
  name: envoy_listener_${2}
  match_type: regex
  labels:
    envoy_listener_address: ${1}
- match: listener\.([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+_[0-9]+)\.(.*)
  name: envoy_listener_${2}
  match_type: regex
  labels:
    envoy_listener_address: ${1}
- match: envoy.http.*.*
  name: envoy_http_${2}
  labels:
    envoy_http_conn_manager_prefix: ${1}
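For anyone wondering what the glob mappings in this config actually do to a metric name, here is a rough Python emulation. It is a simplified sketch, not the exporter's real implementation: it assumes each `*` captures exactly one dot-free component and that `${N}` refers to the N-th capture. The function names and the sample metric name are made up for illustration.

```python
import re


def glob_to_regex(glob: str):
    """Turn a dot-separated glob into a compiled regex.

    Assumption: each `*` captures exactly one component without dots.
    """
    parts = [r"([^.]*)" if p == "*" else re.escape(p) for p in glob.split(".")]
    return re.compile(r"\.".join(parts))


def apply_mapping(metric: str, glob: str, name: str, labels: dict):
    """Return (prometheus_name, labels), or None if the glob does not match."""
    m = glob_to_regex(glob).fullmatch(metric)
    if m is None:
        return None

    def expand(template: str) -> str:
        # Replace ${N} with the N-th captured component.
        return re.sub(r"\$\{(\d+)\}", lambda g: m.group(int(g.group(1))), template)

    return expand(name), {key: expand(value) for key, value in labels.items()}


# Illustrative input only; "cluster_a" is a made-up cluster name.
result = apply_mapping(
    "envoy.cluster.cluster_a.upstream_rq_total",
    "envoy.cluster.*.*",
    "envoy_cluster_${2}",
    {"envoy_cluster_name": "${1}"},
)
```

The point of the mapping is visible here: the cluster name moves out of the metric name and into a label, so every cluster shares one series name.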
Earlier I asked in Slack about exposing Envoy's native Prometheus metrics and was pointed to this issue by @richarddli.
Personally, I don't think exposing the admin interface is a good idea, as it would require a trusted network (as previously mentioned). However, it should be fairly straightforward to proxy only the stats route, either inside Ambassador or with a sidecar.
As a proof of concept I am currently running an nginx sidecar to expose the metrics:
# ...
containers:
- name: metrics-proxy
  image: nginx:1.15
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 100m
      memory: 64Mi
    requests:
      cpu: 50m
      memory: 16Mi
  ports:
  - name: metrics
    containerPort: 9102
  volumeMounts:
  - mountPath: /etc/nginx/conf.d
    name: proxy-conf
    readOnly: true
# ...
With this simple nginx config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ambassador-proxy-config
data:
  proxy.conf: |
    server {
      listen 9102;
      listen [::]:9102;
      server_name _;
      location /stats {
        proxy_pass http://127.0.0.1:8001;
        proxy_http_version 1.1;
      }
    }
Now each Ambassador instance can be scraped by Prometheus. This approach is probably not worse than the statsd sidecar... If this is already sufficient, I wouldn't mind upstreaming an example and documentation.
P.S. As far as I know envoy prometheus metrics should be on par with statsd as of https://github.com/envoyproxy/envoy/pull/5601
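The sidecar idea above boils down to "forward only the stats routes and refuse everything else". A minimal Python sketch of the same allowlist logic, using only the standard library, could look like the following. The upstream address and listen port are copied from the nginx config; the class and function names are made up, it handles GET only, and it is meant to illustrate the idea rather than replace nginx.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

ADMIN_UPSTREAM = "http://127.0.0.1:8001"  # Envoy admin, as in the nginx config
LISTEN_PORT = 9102                        # same port the sidecar exposes


def is_allowed(path: str) -> bool:
    """Mirror the `location /stats` prefix match; refuse everything else.

    Paths containing ".." are rejected outright, since this sketch does not
    normalize the request path before matching.
    """
    return path.startswith("/stats") and ".." not in path


class StatsOnlyProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_allowed(self.path):
            self.send_error(404)
            return
        try:
            with urlopen(ADMIN_UPSTREAM + self.path, timeout=5) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get("Content-Type", "text/plain")
        except OSError:
            # Covers connection failures, timeouts and upstream HTTP errors.
            self.send_error(502)
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    """Run the proxy until interrupted (not called automatically)."""
    ThreadingHTTPServer(("", LISTEN_PORT), StatsOnlyProxy).serve_forever()
```

Because only `do_GET` is defined, other methods are rejected by the base handler, which keeps the rest of the admin interface out of reach.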
@kflynn asked about our use case, because a mapping can expose the metrics as well. I am working for a company that works with a lot of banks.
This basically means we use cryptic names on the "outside", e.g. jk1iodsa9083sod.somedomain.com, but proper names on the "inside" svc/customer-a-ambassador. Exposing the metrics or even the diagnostics leaks information.
As far as I understand, it is currently not possible to bind a mapping strictly to a specific port, e.g. to add /metrics to just the admin port 8877.
Exposing the whole envoy admin interface is potentially a security concern. Being able to expose the desired envoy internals via the admin interface therefore sounds like a reasonable solution.
If this is something that aligns with Ambassador's goals, I am happy to contribute.
@rotemtam is looking into exposing /metrics/ via a proxy with diagd
I lost a whole day trying to find a simple way to expose the admin interface (or just the /stats/prometheus route) via Ambassador, and failed. IMHO the best solution is to expose this via diagd. Looking forward to this!
@richarddli do you happen to know why it's not possible to expose this via an Ambassador mapping, just like the Ambassador healthcheck and diagd? I tried to set the mapping to 127.0.0.1:8001 and it registers on the 8877 listener, but diagd returns 404 even though I have set the prefix /stats/prometheus (not sure why that traffic fails on 8877).
Has there been any progress, @rotemtam? Having /metrics proxy requests to the Envoy admin /stats/prometheus route would be the easiest solution, right?
I agree with @volatilemolotov above, and with this installed:
apiVersion: getambassador.io/v1
kind: Mapping
metadata:
  name: stats-mapping
spec:
  prefix: /metrics
  rewrite: /stats/prometheus
  service: 127.0.0.1:8001
I can curl http://$AMBASSADOR_IP/metrics and get what looks like a valid stats set. Anyone have a Prometheus operator config for this? :wink:
(Yeah, I'm doing some stuff where I actually need to use this in anger, so...)
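If you want something a bit stronger than eyeballing the curl output, a few lines of Python can check that the response parses as Prometheus text format. This is a simplified sketch: it assumes no trailing timestamps on sample lines, the function name is made up, and the sample text below is invented for illustration rather than real output from an Ambassador pod.

```python
def parse_samples(text: str) -> dict:
    """Map each sample's 'name{labels}' part to its float value.

    Simplified: skips blank lines and # HELP / # TYPE comments, and assumes
    the value is the last space-separated token (no timestamps).
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)  # raises ValueError on malformed lines
    return samples


# Made-up sample in the exposition format, for illustration only.
SAMPLE = """\
# TYPE envoy_cluster_upstream_rq_total counter
envoy_cluster_upstream_rq_total{envoy_cluster_name="cluster_a"} 42
envoy_server_live 1
"""

parsed = parse_samples(SAMPLE)
```

Feeding it the body fetched from the /metrics mapping (for example with `urllib.request.urlopen`) either returns a dict of series or raises on the first line that is not a valid sample.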
OK, I believe that I have this working with Prometheus + Grafana and _no_ statsd exporter running at all. Anyone want to try to replicate?
Use the Mapping above. Note that it's (obviously) a 0.70+ CRD, but hopefully it'll be obvious how to translate back to an annotation if necessary.
Here's the definition I'm using with the Prometheus operator:
apiVersion: v1
kind: Service
metadata:
  name: ambassador-monitor
  namespace: default
  labels:
    service: ambassador-monitor
spec:
  selector:
    service: ambassador
  type: ClusterIP
  clusterIP: None
  ports:
  - name: prometheus-metrics
    port: 8080
    targetPort: 8080
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ambassador-monitor
  namespace: default
  labels:
    ambassador: monitoring
spec:
  selector:
    matchLabels:
      service: ambassador-monitor
  endpoints:
  - port: prometheus-metrics
Note the selector in the ambassador-monitor K8s Service: it's set up to refer to my actual Ambassador pod.
End result is that the Prometheus operator goes directly to scrape via the /metrics mapping at port 8080 of my Ambassador pod. Seems rather happier than having statsd in the mix.
@kflynn I have tried using the following mapping:
---
apiVersion: ambassador/v1
kind: Mapping
name: prom-mapping
prefix: /metrics
rewrite: /stats/prometheus
service: 127.0.0.1:8001
and I get "not found". What could I have done wrong?
EDIT: my mapping in diag comes out like this :
http://localhost:8877/metrics | 127.0.0.1:8001
Why does it map to the 8877 listener? Judging by the Service in your example, yours is mapped to the 8080 listener. Is there a difference in behavior between CRD and annotation mappings, or should the two behave the same? I'm using Ambassador v0.60.3.
In my case the metrics were available in https://
I added prometheus.io/scheme: 'https' as an annotation to the pod, and then added the following block to the Prometheus scrape configs (I am not using the operator):
- job_name: "kubernetes-pods"
  kubernetes_sd_configs:
  - role: pod
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
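To show what the second relabel rule does with the pod annotation, here is a small Python emulation of a single `action: replace` step. It is a sketch under assumptions: Python's `re` stands in for the regex engine Prometheus uses, only `$N` references are expanded, and the function name is made up. It relies on two behaviours I'm confident of: relabel regexes are fully anchored, and the default replacement is `$1`.

```python
import re


def relabel_replace(labels, source_label, regex, target_label, replacement="$1"):
    """Emulate one `action: replace` rule on a copy of the label set.

    The regex must match the whole source value (hence fullmatch); when it
    does not match, the labels come back unchanged.
    """
    out = dict(labels)
    match = re.fullmatch(regex, labels.get(source_label, ""))
    if match is None:
        return out
    # Expand $N references in the replacement with the captured groups.
    out[target_label] = re.sub(
        r"\$(\d+)", lambda g: match.group(int(g.group(1))), replacement
    )
    return out


ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_scheme"

# A pod annotated with prometheus.io/scheme: 'https' gets scraped over HTTPS.
annotated = relabel_replace(
    {"__scheme__": "http", ANNOTATION: "https"}, ANNOTATION, r"(https?)", "__scheme__"
)
# A pod without the annotation keeps whatever scheme it already had.
plain = relabel_replace({"__scheme__": "http"}, ANNOTATION, r"(https?)", "__scheme__")
```

So pods without the annotation are still scraped with their existing scheme, and only annotated pods switch to HTTPS.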
8443 is valid if you have HTTPS enabled; I'm not sure why mine routes to 8877.
@kflynn I like the approach! Is there a way to make this mapping accessible only from within the cluster? I wouldn't want to share my Ambassador /metrics endpoint with the world.
@rotemtam There isn't at present; I've been thinking about that. You can use auth to protect it, of course...
@volatilemolotov There's definitely no reason that referencing port 8001 should land with the 8877 cluster (which is the diag API). Please open another issue -- what would help a lot, if you can provide it, is to kubectl exec into an Ambassador pod and run
python3 grab-snapshots.py
and then either include sanitized.tgz in the issue or send it to me on our Slack.
I'm going to close this one since we have the /metrics mapping now (thanks, @rotemtam! :smile:). If we need more, we can open additional issues.