Describe the bug
I am trying to update to Ambassador 0.40.0 and have set the environment variable STATSD_ENABLED just like the doc says plus I have deployed a statsd-sink service (https://github.com/datawire/ambassador/blob/master/statsd-sink/prometheus/prom-statsd-sink.yaml). When I start Ambassador, I first see:
"statsd-sink exists
statsd-sink is reachable, starting socat ..."
However, after that, I see the message "socat[46] W waitpid(-1, {}, WNOHANG): no child has exited" in the logs that is repeating itself. What am I doing wrong!?
Here are the logs of interest:
```statsd-sink exists
statsd-sink is reachable, starting socat ...
2018/09/27 17:38:37 socat[46] W waitpid(-1, {}, WNOHANG): no child has exited
[2018-09-27 17:38:40.766][28][info][main] source/server/drain_manager_impl.cc:63] shutting down parent after drain
2018/09/27 17:38:42 socat[46] W waitpid(-1, {}, WNOHANG): no child has exited
2018/09/27 17:38:44 socat[46] W waitpid(-1, {}, WNOHANG): no child has exited
2018/09/27 17:38:44 socat[46] W waitpid(-1, {}, WNOHANG): no child has exited
2018/09/27 17:38:45 socat[46] W waitpid(-1, {}, WNOHANG): no child has exited
Additionally, here's what I see when I check the running process in the container:
```/ambassador $ ps auxfw
PID USER TIME COMMAND
1 8888 0:00 {entrypoint.sh} /bin/sh ./entrypoint.sh
20 8888 0:01 {diagd} /usr/bin/python3 /usr/bin/diagd --no-debugging /ambassador/ambassador-config
21 8888 0:00 /usr/bin/python3 /ambassador/hot-restarter.py /ambassador/start-envoy.sh
22 8888 0:02 /usr/bin/python3 /ambassador/kubewatch.py watch /ambassador/ambassador-config /ambassador/envoy.json -p 21 --delay 15
26 8888 0:00 {entrypoint.sh} /bin/sh ./entrypoint.sh
46 8888 0:11 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
926 8888 0:01 {diagd} /usr/bin/python3 /usr/bin/diagd --no-debugging /ambassador/ambassador-config
2306 8888 0:03 /usr/local/bin/envoy -c /ambassador/envoy-2.json --restart-epoch 1 --drain-time-s 5 --service-cluster ambassador-tex-dev --parent-shutdown-time-s 10
6856 8888 0:00 sh
10201 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10202 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10203 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10204 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10205 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10206 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10207 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10208 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10209 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10210 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10211 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10212 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10213 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10214 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10215 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10216 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10217 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10218 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10219 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10220 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10221 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10222 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10223 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10224 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10225 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10226 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10227 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10228 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10229 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10230 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10231 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10232 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10233 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10234 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10235 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10236 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10237 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10238 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10239 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10240 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10241 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10242 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10243 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10244 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10245 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10246 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10247 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10248 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10249 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10250 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10251 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10252 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10253 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10254 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10255 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10256 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10257 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10258 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10259 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10260 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10261 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10262 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10263 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10264 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10265 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10266 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10267 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10268 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10269 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10270 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10271 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10272 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10273 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10274 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10275 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10276 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10277 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10278 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10279 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10280 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10281 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10282 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10283 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10284 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10285 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10286 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10287 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10288 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10289 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10290 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10291 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10292 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10293 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10294 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10295 8888 0:00 ps auxfw
10296 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10297 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10298 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
10299 8888 0:00 socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:statsd-sink:8125
To Reproduce
Create the following ressources in Kubernetes...
```$ kubectl describe deployment ambassador-dev
Name: ambassador-dev
CreationTimestamp: Fri, 08 Jun 2018 11:29:46 -0400
Annotations: deployment.kubernetes.io/revision=19
Selector: service=ambassador
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: service=ambassador
Annotations: sidecar.istio.io/inject=false
Service Account: ambassador-dev
Containers:
ambassador:
Image: quay.io/datawire/ambassador:0.40.0
Ports: 8080/TCP, 8877/TCP
Host Ports: 0/TCP, 0/TCP
Limits:
cpu: 200m
memory: 200Mi
Requests:
cpu: 200m
memory: 200Mi
Liveness: http-get http://:8877/ambassador/v0/check_alive delay=30s timeout=1s period=3s #success=1 #failure=3
Readiness: http-get http://:8877/ambassador/v0/check_ready delay=30s timeout=1s period=3s #success=1 #failure=3
Environment:
AMBASSADOR_NAMESPACE: (v1:metadata.namespace)
AMBASSADOR_SINGLE_NAMESPACE: true
STATSD_ENABLED: true
Mounts:
Volumes:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets:
$ kubectl describe deployment statsd-sink
Name: statsd-sink
CreationTimestamp: Thu, 27 Sep 2018 13:20:56 -0400
Annotations: deployment.kubernetes.io/revision=4
Selector: service=statsd-sink
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: service=statsd-sink
Containers:
statsd-sink:
Image: prom/statsd-exporter:v0.6.0
Ports: 8125/UDP, 9102/TCP
Host Ports: 0/UDP, 0/TCP
Args:
-statsd.listen-address=:8125
Limits:
cpu: 50m
memory: 100Mi
Requests:
cpu: 50m
memory: 100Mi
Environment:
Mounts:
Volumes:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets:
NewReplicaSet: statsd-sink-79d7cc6454 (1/1 replicas created)
$ kubectl describe service statsd-sink
Name: statsd-sink
Labels: service=statsd-sink
Annotations: prometheus.io/port=9102
prometheus.io/scrape=true
Selector: service=statsd-sink
Type: ClusterIP
IP: 100.68.166.210
Port: statsd-metrics 8125/UDP
TargetPort: 8125/UDP
Endpoints: 10.36.0.5:8125
Port: prometheus-metrics 9102/TCP
TargetPort: 9102/TCP
Endpoints: 10.36.0.5:9102
Session Affinity: None
Events:
```
Expected behavior
Ambassador shouldn't throw the errors and my metrics should be sent to statsd-sink which should be then gathered by my Prometheus instance.
Versions (please complete the following information):
Additional context
Nothing to add...
Socat appears to be creating and exiting processes as UDP packets are sent through socat to the stats-sink pod. Running ps you may see the processes build up but they are exited when they complete which you can see by running ps multiple times in the Ambassador container.
This error message may be due to the volume of packets being sent through socat at a time and a small timing error may occur where the parent process needs to wait for the children to exit. The message can be muted by turning off a verbosity flag when calling socat in entrypoint.sh. This is done by removing the -d in the call to socat: socat -d UDP-RECVFROM:8125,fork UDP-SENDTO:${STATSD_HOST}:8125.
This was present during earlier implementations of statsd as a sidecar to Ambassador (Ambassador 0.38.0 and earlier).
@nbkrause FYI we're currently running ambassador:0.32.2 and I can't seem to find the above behaviour in both containers of one of our ambassador pods (I looked out of curiosity) :
```
$ kubectl exec -ti ambassador-live-6f8f47cc85-2r6m9 sh
Defaulting container name to ambassador.
Use 'kubectl describe pod/ambassador-live-6f8f47cc85-2r6m9 -n kraken-live-pc' to see all of the containers in this pod.
/application # ps auxfww
PID USER TIME COMMAND
1 root 0:00 {entrypoint.sh} /bin/sh ./entrypoint.sh
11 root 39:56 {diagd} /usr/bin/python3 /usr/bin/diagd --no-debugging /etc/ambassador-config
12 root 0:00 /usr/bin/python3 /application/hot-restarter.py /application/start-envoy.sh
13 root 0:07 /usr/bin/python3 /application/kubewatch.py watch /etc/ambassador-config /etc/envoy.json -p 12 --delay 15
35 root 333:48 /usr/local/bin/envoy -c /etc/envoy-2.json --restart-epoch 1 --drain-time-s 5 --parent-shutdown-time-s 10
50 root 0:00 sh
56 root 0:00 ps auxfww
$ kubectl exec -ti ambassador-live-6f8f47cc85-2r6m9 -c statsd-sink sh
/ # ps auxfww
PID USER TIME COMMAND
1 root 106:13 /bin/statsd_exporter -statsd.listen-address=:8125
24 root 0:00 sh
30 root 0:00 ps auxfww```
@nbkrause I think you are looking at the wrong container. You are looking at statsd-sink, rather than the statsd container.
Also your version of ambassador uses the diagd to send stats to the sink, which has been changed in a new version of ambassador.
@bhvr-fmarceau Interesting. I will take a look at what changed between those implementations. Thank you. Statsd is still working when enabling STATSD_ENABLED on ambassador 0.40.0 correct? I have been able to see the same log output but it has not affected statsd's performance.
@mpluhar I have verified I was looking in the correct containers. What makes you believe I was looking in the wrong place? Do you see behavior different than I did when troubleshooting this?
@nbkrause apologies, I have tagged the wrong person!
@bhvr-fmarceau my initial comment was for you
I think the problem with those short lived socat processes is that they still generate a lot of DNS queries ( no matter if statsd runs in a different or in the same container).
@mpluhar @mpatelpa Just so we're clear, the metrics are reaching my statsd-exporter service. What I'm not comfortable with in the current state of things is yes DNS queries plus the spam in the logs! We are indexing our Ambassador logs in ES and we can't have all our Ambassador containers spamming the logs with the waitpid() child process messages...
With respect to the frequent DNS lookups, this seems to be a statsd limitation with how it uses UDP: https://github.com/Shopify/statsd-instrument/blob/master/README.md (see the DNS section). The only workaround seems to be to hard-code the ClusterIP, or install something like nscd.
Regarding the logs, we can remove the -d flag which should reduce verbosity.
NSCD seems reasonably straightforward: https://prefetch.net/blog/2011/03/27/configuring-nscd-to-cache-dns-host-lookups/
What do folks think?
Update: Alpine Linux doesn't have an nscd package, but looks like it has dnscache.
@richarddli We have considered adding a caching daemon as well, but it seems a bit of a hack. Also the DNS TTL (at least in my cluster is 30 seconds). There will be still quite a number of DNS requests, especially when running a number of ambassador instances.
This did not work for me either, though pod just went into a crash loop backoff. Issue was:
Args:
-statsd.listen-address=:8125
Solution for me was to use:
Args:
--statsd.listen-udp=:8125
Tagging for EA4. The newer versions of Envoy support emitting statsd UDP to a hostname (according to the proto) -- this is likely the best approach.
I believe that this is fixed in EA7 -- @rdl, can you verify?
@kflynn EA7 resolves this.
Most helpful comment
@mpluhar @mpatelpa Just so we're clear, the metrics are reaching my statsd-exporter service. What I'm not comfortable with in the current state of things is yes DNS queries plus the spam in the logs! We are indexing our Ambassador logs in ES and we can't have all our Ambassador containers spamming the logs with the waitpid() child process messages...