Ambassador: v0.50.1 breaks statsd/prometheus metrics collection

Created on 11 Feb 2019 · 11 Comments · Source: datawire/ambassador

Describe the bug
Upgrading to v0.50.1 from v0.40.2 breaks metrics collection in Prometheus.

To Reproduce
Upgrade with the procedure in the datawire blog

Upgrading to Ambassador 0.50.1
Ambassador relies on Kubernetes deployments for updates. To update Ambassador, change your Kubernetes manifest to point to quay.io/datawire/ambassador:0.50.1 and run kubectl apply on the updated manifest. Kubernetes will apply a rolling update and update to 0.50.1.

Also note that it is not possible to upgrade via the Helm chart, as it still points to 0.40.2 and the chart is what defines the tag of the Docker image.

Expected behavior

  • Metrics are displayed in prometheus
  • Documentation reflects reality ;)

Versions (please complete the following information):

  • Ambassador: v0.50.1
  • Kubernetes environment: AWS via Kops
  • Version: 1.10.X

There are some hints scattered across issues and PRs about a change in this area. It would be nice to know how to get this working.

All 11 comments

Can you give some more specifics about how "things are broken"? We've tested statistics collection extensively in 0.50 and 0.50.1 and haven't seen any issues. Have you carefully followed the documentation here: https://www.getambassador.io/reference/statistics#exposing-statistics-via-statsd?

Thanks for the link @richarddli here is what my configuration looks like right now:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "6"
  creationTimestamp: null
  generation: 1
  labels:
    app: ambassador
    chart: ambassador-0.40.2
    heritage: Tiller
    release: ambassador
  name: ambassador
  selfLink: /apis/apps/v1/namespaces/ambassador/deployments/ambassador
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ambassador
      release: ambassador
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "9102"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: ambassador
        release: ambassador
        service: ambassador
    spec:
      containers:
      - args:
        - -statsd.listen-address=:8125
        - -statsd.mapping-config=/statsd-exporter/mapping-config.yaml
        image: prom/statsd-exporter:v0.6.0
        imagePullPolicy: IfNotPresent
        name: statsd-sink
        ports:
        - containerPort: 9102
          name: metrics
          protocol: TCP
        - containerPort: 8125
          name: listener
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /statsd-exporter/
          name: stats-exporter-mapping-config
          readOnly: true
      - env:
        - name: STATSD_ENABLED
          value: "true"
        - name: AMBASSADOR_ID
          value: production
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: quay.io/datawire/ambassador:0.50.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /ambassador/v0/check_alive
            port: admin
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
        name: ambassador
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 80
          name: https
          protocol: TCP
        - containerPort: 8877
          name: admin
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ambassador/v0/check_ready
            port: admin
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 300m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ambassador
      serviceAccountName: ambassador
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: exporterConfiguration
            path: mapping-config.yaml
          name: ambassador-config
        name: stats-exporter-mapping-config
status: {}
```

However, if I port-forward to port 9102 on any of the pods and open `/metrics`, I don't see any `envoy*` metrics. This used to work in `v0.40.2`.

These are my mappings:

```yaml
apiVersion: v1
data:
  exporterConfiguration: |
    ---
    mappings:
    - match: 'envoy.cluster.*.upstream_cx_connect_ms'
      name: "envoy_cluster_upstream_cx_connect_time"
      timer_type: 'histogram'
      labels:
        cluster_name: "$1"
    - match: 'envoy.cluster.*.upstream_rq_total'
      name: "envoy_cluster_upstream_rq_total"
      timer_type: 'histogram'
      labels:
        cluster_name: "$1"
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: ambassador-config
  selfLink: /api/v1/namespaces/ambassador/configmaps/ambassador-config
```

Any clues as to what I might be missing here? I have a pretty vanilla setup generated from the helm chart

Having a similar issue. Here is the error I am getting:

2019-02-11 14:45:44 diagd 0.50.1 [P44TAmbassadorEventWatcher] ERROR: Unable to resolve statsd-sink to IP : [Errno -2] Name does not resolve
2019-02-11 14:45:44 diagd 0.50.1 [P44TAmbassadorEventWatcher] ERROR: Stats will not be exported to statsd-sink

Update:

I was able to fix this issue by setting `STATSD_HOST` to `localhost` in the ambassador env. Might be worth having this set by default if possible?
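
For anyone landing here, a minimal sketch of what that change looks like. It assumes the statsd exporter runs as a sidecar in the same pod as Ambassador (as in the Deployment posted earlier in this thread), which is why `localhost` resolves to it. Only the two statsd-related variables are shown; the rest of the container spec is unchanged.

```yaml
# Fragment of the ambassador container spec in the Deployment.
# Assumption: the statsd exporter is a sidecar in the same pod,
# so it is reachable on localhost.
env:
- name: STATSD_ENABLED
  value: "true"
- name: STATSD_HOST
  value: localhost
```

Without `STATSD_HOST`, the log lines above suggest 0.50.1 tries to resolve the name `statsd-sink`, which does not exist in a sidecar-only setup.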

Thanks @tzilist, I can confirm this is indeed the problem. It seems that the project is moving away from sidecar containers to a single statsd sink (this is from a not-yet-merged PR for the Helm chart), so maybe it was only tested with such a configuration.
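
A rough sketch of what the single-sink alternative might look like, as opposed to the sidecar setup. The Service name `statsd-sink` is taken from the error log earlier in this thread; the selector label, the UDP protocol, and the port number are assumptions for illustration and are not copied from the chart PR. The Service would need to live in the same namespace as Ambassador for the short name to resolve.

```yaml
# Hypothetical Service fronting a standalone statsd exporter deployment.
apiVersion: v1
kind: Service
metadata:
  name: statsd-sink          # the name Ambassador 0.50.1 tried to resolve in the log above
spec:
  selector:
    service: statsd-sink     # assumed label on the exporter pods
  ports:
  - name: statsd
    protocol: UDP            # assumption: statsd traffic is sent over UDP
    port: 8125
    targetPort: 8125
```

With a Service like this in place, `STATSD_HOST` would not need to be overridden, since the default name would resolve.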

@AlexRRR glad to help :)
@richarddli It might be worth having someone run through the docs for setting up monitoring as well as the sections about istio, both seem to be outdated now. Thanks for all of the hard work you guys do though, loving ambassador so far!

@tzilist Thanks. Am I sensing that you're volunteering? :-)

@richarddli Oof, I wish I had time! I will in about 2 weeks, if they haven't been updated, I'll go through them and add some extra things I did to debug and set up ambassador properly :)

Can confirm that STATSD_ENABLED=true and STATSD_HOST=localhost works with the prom exporter :)

An updated ambassador chart was just merged to helm/charts! 🎉 Would be great if you could test it :)

@Flydiverny thank you for getting that in! I'll test it today!

I've updated the docs in #1220, and DaveOps is also working on some additional improvements. I believe that addresses this issue.
