Ambassador: statsd socat beats the DNS to death

Created on 28 Jun 2018  ·  11 Comments  ·  Source: datawire/ambassador

When deploying Kubernetes on a single-node cluster using kubeadm, the coredns service shows high CPU usage (30-40% per process) if Ambassador is deployed.

How to reproduce:

  1. Install kubeadm on a single node.
  2. Set net.bridge.bridge-nf-call-iptables=1
  3. Allow the master to run pods: kubectl taint nodes --all node-role.kubernetes.io/master-
  4. Install a network plugin, flannel or romana (reproduced on both).
  5. Deploy ambassador.
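
Steps 2 and 3 above can be sketched as a small shell helper. This is only an illustration, not part of the original report: the `run` wrapper, the `DRY_RUN` variable, and the `prepare_single_node` function name are made up for this sketch, and steps 1, 4, and 5 are left out because the report does not name the exact manifests used.

```shell
# run: print each command; execute it only when DRY_RUN=0
# (hypothetical helper; the default is a dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# prepare_single_node: steps 2 and 3 from the list above.
prepare_single_node() {
  # Step 2: make bridged traffic visible to iptables.
  run sysctl -w net.bridge.bridge-nf-call-iptables=1
  # Step 3: the trailing "-" removes the master taint so pods can be
  # scheduled on the only node.
  run kubectl taint nodes --all node-role.kubernetes.io/master-
}
```

Calling `prepare_single_node` as-is only prints the commands; `DRY_RUN=0 prepare_single_node` (as root, on the kubeadm node) would actually run them.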

Versions:

  • Ambassador 0.34.3 or 0.35.0
  • kubeadm v1.11.0 on vds
  • kubectl v1.11.0

Ambassador was running with no connected services and without ingress. kube-dns and dnsmasq showed high CPU usage (I cannot attach a screenshot, since I have already updated k8s). After the upgrade, kube-dns was replaced with coredns, but the CPU usage is identical. After deleting Ambassador, CPU usage dropped back to a normal value (1-3% in my case). Ambassador 0.30.1 doesn't have this issue.

top screenshot: (image not preserved)

All 11 comments

@jar3b Thanks for the report! Are you open to running a prerelease Ambassador image to see if it exhibits this behavior?

@jar3b Actually, I take that back. First things first: are you on our Slack channel? 😄

Beyond that, do you have the statsd container configured? What happens if you delete it?

@kflynn No, I'm not in the channel.
For configuration I use this manual with RBAC enabled. Statsd isn't configured separately.
I looked at the logs, and there is a repeated message:

2018/06/28 15:25:11 socat[6173] E getaddrinfo("statsd-sink", "NULL", {1,0,2,17}, {}): Name does not resolve

about 100 times per second.
Then I found the solution in a similar issue - https://github.com/datawire/ambassador/issues/465 ...

Tested with the statsd container disabled - all is OK. Thanks for the right questions ;) This was my inattention.
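
To check whether a cluster is hitting the same failure, the socat errors can be counted per second from the container logs. The following is a rough sketch that assumes the log format shown in the message above; the function name `count_socat_dns_errors` is made up for illustration.

```shell
# count_socat_dns_errors: read log lines on stdin and print, for each
# timestamp (date + time, to the second), how many socat getaddrinfo
# errors were logged. Hypothetical helper; assumes the log format quoted above.
count_socat_dns_errors() {
  awk '/socat\[[0-9]+\] E getaddrinfo/ { count[$1 " " $2]++ }
       END { for (t in count) print t, count[t] }' | sort
}
```

It could be fed from something like `kubectl logs <ambassador-pod> -c <statsd-container> | count_socat_dns_errors`, where the pod and container names are placeholders that depend on the deployment. Counts near 100 per second would match the behavior described here.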

@jar3b Thanks for confirming that dropping the statsd container solved it for you! I'm leaving this open, because the statsd container needs to be better behaved than that.
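
One way a sidecar could be "better behaved" is to back off between failed lookups instead of retrying about 100 times per second. The sketch below is not the fix that was eventually merged, and it is not taken from the Ambassador image; `wait_for_host` and its parameters are hypothetical, and it assumes `getent` is available in the container.

```shell
# wait_for_host NAME [MAX_DELAY]: block until NAME resolves, doubling the
# pause after every failed lookup up to MAX_DELAY seconds (default 30).
# Hypothetical helper, shown only to illustrate backoff.
wait_for_host() {
  wfh_host=$1
  wfh_max=${2:-30}
  wfh_delay=1
  until getent hosts "$wfh_host" >/dev/null 2>&1; do
    echo "cannot resolve $wfh_host; retrying in ${wfh_delay}s" >&2
    sleep "$wfh_delay"
    wfh_delay=$((wfh_delay * 2))
    if [ "$wfh_delay" -gt "$wfh_max" ]; then
      wfh_delay=$wfh_max
    fi
  done
}
```

With a wrapper like this, a missing `statsd-sink` service would cost the DNS server roughly one query every 30 seconds per pod at steady state, rather than a continuous flood.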

I have the same issue: after scaling to several instances, the cluster is dying. Ambassador version 0.38.0 on GKE.

This is being worked on. Expect a fix in a few hours.

Sorry for the delay, loooong weekend :|
The fix is at #725, if someone wants to try it out. Container image for the PR: quay.io/datawire/ambassador:concaf-fix-socat-5d30875
Do read the instructions at https://github.com/datawire/ambassador/pull/725#issue-208206832

This might take a couple of days to merge because it's sort of a backwards incompatible change.

I'm trying to run this on our GKE.

@containscafeine should I remove the statsd sidecar from the deployment now?

@containscafeine I've tested it on our infrastructure and it's working great!

Before installing Ambassador:
https://i.imgur.com/9EM7qQf.jpg

Installing 0.38.0:
https://i.imgur.com/RTge7XT.jpg

Purging everything, installing your patched image, and removing statsd from the deployment:
https://i.imgur.com/I9siOCr.jpg
https://i.imgur.com/CTwPWbP.jpg

The service is working correctly (I don't know what else could go wrong :))

Thanks for the feedback @exu. This is now in 0.39.0 release - https://github.com/datawire/ambassador/releases/latest
