Athens: Athens leaks defunct zombie git processes

Created on 23 Mar 2019  路  14Comments  路  Source: gomods/athens

Describe the bug

I've had an Athens (0.2) pod running in Kubernetes since 24 Jan 2019. It's leaked 3,675 zombie processes since then.

Error Message

    PID TTY      STAT   TIME COMMAND
3136069 ?        Ss     0:00 /bin/sh
3136083 ?        R+     0:00  \_ ps afx
3136084 ?        R+     0:00  \_ less
      1 ?        Ssl  1216:37 /bin/athens-proxy -config_file=/config/config.toml
  14066 ?        Z      0:00 [git-remote-http] 
  16603 ?        Z      0:00 [git-remote-http] 
  16817 ?        Z      0:00 [git-remote-http] 
  21024 ?        Z      0:00 [git-remote-http] 
  21078 ?        Z      0:00 [git-remote-http] 
  21368 ?        Z      0:00 [git-remote-http] 
  21419 ?        Z      0:00 [git-remote-http] 
  21737 ?        Z      0:00 [git-remote-http] 
  21778 ?        Z      0:00 [git-remote-http] 
  21826 ?        Z      0:00 [git-remote-http] 
  21862 ?        Z      0:00 [git-remote-http] 
  30244 ?        Z      0:00 [git-remote-http] 
  32825 ?        Z      0:00 [git-remote-http] 
  33181 ?        Z      0:00 [git-remote-http] 
  34421 ?        Z      0:00 [git-remote-http] 
  35672 ?        Z      0:00 [git-remote-http] 
  44237 ?        Z      0:00 [git-remote-http] 
  44265 ?        Z      0:00 [git] 
  47997 ?        Z      0:00 [git-remote-http] 
  48175 ?        Z      0:00 [git-remote-http] 
....

/ # ps afx | grep defunct | wc -l
3675

To Reproduce

Run Athens for a bit.

Expected behavior

Processes are waited upon and don't leave zombies.

Environment (please complete the following information):

  • OS: linux
  • Go version : whatever the official 0.2 container was built with
  • Proxy version : 0.2
  • Storage (fs/mongodb/s3 etc.) : fs
bug

All 14 comments

@bradfitz Athens does not use git directly. It shells out to go which uses git internally.
The only two commands that Athens uses which could end up using git are

  1. go mod download here: https://github.com/gomods/athens/blob/master/pkg/module/go_get_fetcher.go#L131
  2. go list -m here: https://github.com/gomods/athens/blob/master/pkg/download/upstream_lister.go#L52

In both cases, we run cmd.Run() which waits for the go command to exit. Is there any more cleaning up we'd have to do on the Athens side to ensure children processes are cleaned up?

Otherwise, I imagine this might be a bug in go modules itself?

Also iirc Athens v0.2.0 is built with go version 1.11.x (I think 1.11.4 but not at my machine to confirm), in case that helps

@arschles Athens v0.2.0 uses golang:1.11-alpine while Athens v0.3.1 uses golang:1.12-alpine here

If I'm looking at the code from 1.11 correctly, running git commands for modules comes from here which seems to use Run as well.

Hopefully I'm not looking at the right code because otherwise I'm not sure where the processes are being abandoned.

@bradfitz is there any way to see the actual command that ran those abandoned processes?

Thanks

Actually I noticed that athens-proxy is pid 1 above.

I suspect this is just a packaging issue in your container.

You should probably wrap athens-proxy in https://github.com/krallin/tini (see https://github.com/golang/go/issues/23705 where Go discovered this too for some of our services)

@bradfitz

Disclaimer: I'm using this issue as a learning moment because it's the first time I hear about Docker/GKE not cleaning up properly finished processes. So thank you very much for taking the time to report the issue and help out :)

As for your last comment,

I suspect this is just a packaging issue in your container.

Do you mean we should just include tini? Or do you mean there's an actual mistake in our Dockerfile?

Here is the Dockerfile we use for reference: https://github.com/gomods/athens/blob/master/cmd/proxy/Dockerfile

According to Tini's author in What is the advantage of Tini?, if the code knows not to leave defunct processes then you don't need Tini. Tini is mainly for running processes that you don't have control over like Jenkins.

In our case we _do_ have control over both Athens and Go. So my thinking is that Tini would work but it would not solve the underlying solution of some bad code somewhere not properly waiting on child processes. The only possibility I see is that git might run a child git command and doesn't wait for it, but that sounds a lil ridiculous.

Furthermore, I saw the following in the tini README:

NOTE: If you are using Docker 1.13 or greater, Tini is included in Docker itself. This includes all versions of Docker CE. To enable Tini, just pass the --init flag to docker run.

Can you just pass the --init flag when you run Athens in your environment?

Which makes me think that --init should always be defaulted to true on the Docker side no? Because if all it does is fix an issue that has nothing to do with the software we write, then why not just default it to true.

Lastly, the issue you linked seemed to be a GKE problem here: https://github.com/golang/go/issues/23705#issuecomment-364530560

Could this mean that GKE might have this issue fixed and all you need to do is update the cluster?

Lastly, I'm obviously very happy to include tini in our Dockerfile, but I'd love to know _why_ and if there's a possibility of fixing some code somewhere, that'd be more ideal (we can include tini here for now, and hunt for the real problem down the line).

It also seems like tini wouldn't hurt to be included on the Athens Dockerfile in case someone else is using an old GKE cluster or an old Docker version...but it still blows my mind a tiny bit that we have to worry about this type of stuff on the application layer.

Thanks again!

Hmm i think i saw those defunct git processes at some point with 0.2, havent seen that happening since i have updated to 0.3. It was maybe related to this https://github.com/census-instrumentation/opencensus-go/issues/1046.

Nevermind i just checked and can see 33 zombie git processes with gomods/athens:v0.3.1.

What do folks think about this problem? @bradfitz did you happen to update GKE and see if that fixed the problem? Otherwise, can we give you a custom image, which executes Athens with tini, to try in your cluster?

@bradfitz v0.4.0 is released which has tini in the Dockerfile (docker pull gomods/athens:v0.4.0) -- feel free to check it out and please let us know if the issue still persists.

I personally have done something similar to https://github.com/ramr/go-reaper in my own containers.
It may be an option here.

@atomi https://github.com/gomods/athens/pull/1260 has a fix for this. Can you take a look and see if it's similar to what you were thinking?

@arschles I don't have any issues with tini. It makes a lot of sense since Athens already has dependencies from alpine such as vcs and openssh packages.
I think using tini in this case is more in line with "keeping it simple." Apologies for adding extra noise here. You're all good. :+1:

@atomi that's all good! I'm really glad you brought up your go-reaper project 馃槃, I'll go check it out sometime. Cheers!

@arschles It's not my project. I've just used it in the past before --init was available.

It' was an interesting solution at the time. Anyway, Cheers.

Was this page helpful?
0 / 5 - 0 ratings