Nomad: Tens of thousands of open file descriptors to a single nomad alloc logs directory

Created on 23 Nov 2016 · 56 comments · Source: hashicorp/nomad

Nomad version

Nomad v0.4.1

Operating system and Environment details

Linux ip-10-201-5-129 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Issue

Nomad has tens of thousands of open file descriptors to an alloc log directory.

nomad 2143 root *530r DIR 202,80 4096 8454154 /var/lib/ssi/nomad/alloc/14e62a40-8598-2fed-405e-ca237bc940c6/alloc/logs

Something similar to that repeated ~60000 times.
lsof -p 2143 | wc -l returns ~60000
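For reference, a generic way to see which paths account for the open descriptors (an illustrative lsof pipeline, not taken from the report above):

# group open descriptors by target path and show the biggest offenders
lsof -p 2143 | awk '{print $NF}' | sort | uniq -c | sort -rn | head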

I stopped the alloc but the descriptors are still there.

In addition, the nomad process is approaching 55 GB of memory used.

Labels: stage/needs-investigation, theme/api, type/bug

All 56 comments

Hey, could you post the open file descriptors in a gist?

A couple of questions:
1) Are you running raw_exec jobs?
2) Can you grab the memory usage of all nomad processes (a tree view would be great)?
3) Do you have enable_debug set? If so, can you grab a memory profile of the agent by hitting:

go tool pprof -output=mem.profile http://127.0.0.1:4646/debug/pprof/heap
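An alternative, if running go tool pprof directly against the agent is not convenient, is to save the raw profile and inspect it later (a sketch assuming enable_debug is on; not a command from this thread):

# fetch the raw heap profile from the agent's debug endpoint
curl -s -o heap.pprof http://127.0.0.1:4646/debug/pprof/heap
# inspect it offline, pointing pprof at the nomad binary for symbols
go tool pprof /usr/local/bin/nomad heap.pprof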

1) Not running raw_exec jobs.
2/3) Will do next time we run into the issue.


Here's a glimpse of htop. The process just ran out of memory and rebooted.
The logs are filled with "dropping update to alloc" messages.

Unfortunately, this agent did not have enable_debug set.


Here's a graph of memory usage from our alerting system.

@sheldonkwok can you attach the logs and the open file descriptors?

Does this happen regularly? The memory profile will help a lot!

Using the docker driver. Debug is on now so I'll have to catch it when it's in the bad state before it reboots.

Bad news: we get "net/http: timeout awaiting response headers" when trying to run go tool pprof. Notably, we were running the logging command before this, and the ~60,000 open file descriptors look like:

nomad   2113 root *012r      DIR  202,80     4096 8456000 /var/lib/ssi/nomad/alloc/ddf27f57-9179-2fbf-9c44-69f54a28f17b/alloc/logs
nomad   2113 root *013r      DIR  202,80     4096 8456000 /var/lib/ssi/nomad/alloc/ddf27f57-9179-2fbf-9c44-69f54a28f17b/alloc/logs
nomad   2113 root *014r      DIR  202,80     4096 8456000 /var/lib/ssi/nomad/alloc/ddf27f57-9179-2fbf-9c44-69f54a28f17b/alloc/logs

repeated many, many times.

Got anything in the client or server logs relevant to the issue, e.g. repeated actions?

A lot of
client: alloc "00e0bb67-624c-5221-087b-ba31f5740b43" in terminal status, waiting for destroy
in the client logs, but that seems more a product of the client running out of memory.

We encountered this issue again after hitting the logging api for one of the allocs. It's crashing multiple production nomad clients.

@sheldonkwok Can you show nomad fs of that folder? What does the task's logging look like? What is the log config?

Mode        Size    Modified Time          Name
-rw-r--r--  31 KiB  12/01/16 21:25:59 UTC  service.stderr.0
-rw-r--r--  0 B     12/01/16 06:31:26 UTC  service.stdout.0
      LogConfig: {
        MaxFileSizeMB: 10,
        MaxFiles: 10,
      },

Nothing too crazy it seems.
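For reference, that LogConfig corresponds to roughly this logs block in HCL job-spec syntax (a sketch, not the actual job file):

logs {
  max_files     = 10
  max_file_size = 10   # MB per log file
}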

Does this reproduce every time you log or just occasionally? What is the log command you are running? Are the file descriptors open on the nomad client or the executor?

/v1/client/fs/logs/<Allocation-ID>
follow=true,origin=start,offset=0
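For context, a request of that shape would look roughly like the following; the task name "service" is taken from the file names shown above and is illustrative:

# stream the task's stderr from the beginning of the log
curl "http://127.0.0.1:4646/v1/client/fs/logs/<Allocation-ID>?task=service&type=stderr&follow=true&origin=start&offset=0"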

We're not sure if it's every time or just occasionally. The descriptors are open on the client.

@sheldonkwok I cannot reproduce this on 0.4.1, 0.5.0, or master. It would be great if you could get a Vagrant box or some reproduction steps so I can debug this. As of now it is a bit hard, since you can't even attach a profiler.

We're still working on some reproduction steps, because nothing we do immediately crashes the system and we're only alerted of the issues later.

Managed to get the client into a state where the memory usage is low but the number of open file descriptors is still high. Here's the mem profile.

http://pastebin.com/XbPH9qhw

Could you get the profile too and send both the heap and profile output?

Thanks,
Alex


We upgraded to Nomad 0.5 and we're still experiencing this issue when we make multiple requests to an allocation's log route simultaneously. Our use case is that many of our devs want to see logs for their service at the same time.

@sheldonkwok Can you get me this: curl http://127.0.0.1:4646/debug/pprof/goroutine?debug=2

How are you retrieving the logs? Programmatically?

Will do next time it happens.
We are using the route /v1/client/fs/logs/<Allocation-ID> when one of the devs asks for logs.

My current thought is that the connection may never be closed, which is causing this. The goroutine dump should help correlate the number of descriptors to the number of times the endpoint has been called (goroutines handling the request).
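One rough way to check that correlation from the goroutine dump (the grep pattern is a guess; the exact frame names depend on the Nomad version):

# count goroutines that mention the log-streaming code path
curl -s "http://127.0.0.1:4646/debug/pprof/goroutine?debug=2" | grep -ci "logs"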

I was considering that but the number of file descriptors jumped from dozens to the tens of thousands pretty quickly when we only made a handful of simultaneous requests.

I reproduced this behavior with the nomad logs command. It happens when a job contains more than one task and you try to see logs from them, e.g.:
nomad logs -tail -f -stderr -job job_name

We unfortunately had to stop using the nomad logs command and the logs route because of this issue. We are instead ingesting the logs through Filebeat/Logstash/Elasticsearch. It doesn't handle streaming as well, though. We will continue to try to reproduce.

@sheldonkwok Sorry to hear that! Hope to be able to resolve this soon and get you using it again. Still haven't been able to reproduce it, unfortunately!

@leonoff When I do that, I get a message telling me to pick which task. Is that not the behavior you see?

@dadgar I don't see any message telling me to pick a task.
my job: https://gist.github.com/leonoff/1169e6337ba0ccc56c044a6a857207f1
step-by-step actions:

  1. nomad run -check-index 0 job.hcl
  2. wait until running
  3. nomad logs -tail -f -stderr -job hello
  4. see no logs (that's expected)
  5. exit, ^C
  6. CPU usage on nomad spikes to 100% and the count of open file descriptors grows to 2-3k

I'm running Nomad v0.5.0
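A rough way to script those steps and watch the client's descriptor count at the same time (a sketch; the sleep and the pgrep pattern are assumptions):

nomad run -check-index 0 job.hcl
# ... wait until the allocations are running ...
nomad logs -tail -f -stderr -job hello &
LOGS_PID=$!
sleep 10
kill "$LOGS_PID"   # stand-in for the ^C in step 5
# watch the open-descriptor count on the client agent
watch -n 1 'lsof -p "$(pgrep -f "nomad agent")" | wc -l'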

@leonoff @sheldonkwok Got it to reproduce! Thanks for the instructions @leonoff. For me, though, the file descriptors close after a while, and it is a few hundred, not thousands. Still bad behavior.

Will begin investigating!

@dadgar I suppose the number of open file descriptors at a time is limited by the CPU resource :). And yes, in most cases the file descriptors close after a while, but sometimes they don't, and the only way to stop it is to stop the job. Also, I noticed a strange "side effect": after an "incident", the average CPU usage of nomad is increased. E.g., while the job (specified above) is running, nomad uses 0.3-0.7% of CPU; after an incident (once the file descriptors have closed), it uses 20-30%.

Yeah I noticed the jump in CPU too.

I just found a bug in nomad-ui that would cause it to never stop tailing files.

If you use nomad-ui, it could be part of the cause, maybe.

@sheldonkwok @leonoff Can you please test the fix? We released new binaries https://releases.hashicorp.com/nomad/0.5.1-rc2/

Will do. Thanks for the quick turnaround!


@diptanu @dadgar Tested the new binaries; the issue is fixed for me.

@leonoff Awesome! Thank you!

@dadgar @diptanu Please see https://github.com/hashicorp/nomad/issues/2089.

Had no issues today with logging, but did have many clients panic.

The file descriptor issue just crept back onto our system today running Nomad 0.5.2... Will try to reproduce. Just letting everyone know.

This has still been happening a lot for us lately. Today there was an actual panic - I opened a separate ticket here, but there was heavy utilization of logs during the time of the crash. I still have a feeling there's an edge case in here somewhere...

@justinwalz Do you have a list of the file descriptors that were open?

I got to the machine after the nomad process had already died, but I will get a dump next time it occurs live.

Hi, it occurred again.

Nomad is using up a ton of RAM and has a high count of open file descriptors.

ps aux | grep nomad

root       2449  649 39.0 59051896 49124000 ?   Ssl  19:17 868:09 /usr/local/bin/nomad agent -config /etc/nomad

Truncated output from lsof -p 2449:

nomad   2449 root *810r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *811r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *812r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *813r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *814r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *815r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *816r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *817r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *818r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *819r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *820r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *821r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *822r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *823r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *824r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *825r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *826r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *827r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *828r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *829r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *830r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *831r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *832r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *833r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *834r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *835r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *836r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *837r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *838r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *839r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *840r      DIR             202,81     4096 2621596 /var/lib/ssi/nomad/alloc/32b3a6a3-7fde-418b-2006-7c65b8ec1660/alloc/logs
nomad   2449 root *841r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *842r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad   2449 root *843r      DIR             202,81     4096 2621588 /var/lib/ssi/nomad/alloc/bd1fa749-7fa1-da62-faf1-8a3187ecc508/alloc/logs
nomad -v
Nomad v0.5.2

To add more context, I've attached the client logs. They include some errors about the logs API and "too many open files".

For what it's worth, nomad healed itself: memory usage went back under control and the file descriptors returned to normal without needing a restart.

client-logs.tar.gz

@justinwalz Okay re-opening this. Do you have any reproduction steps?

We're still running into this issue every week. It makes the server unresponsive and forces reallocations for all of the jobs on it. This issue happens across all of our jobs intermittently when we attempt to fetch logs. It's difficult to reproduce but seems guaranteed to pop up with enough logging requests over time on a job. The shared logging config is

      LogConfig: {
        MaxFileSizeMB: 10, 
        MaxFiles: 10, 
      },  

@sheldonkwok I will make time to play with this soon. I was never able to reproduce it, however. Are you running both the server and client on one machine?

Thanks Alex. We are running 5 servers on their own machines and the clients are run separately as well.

Hey just tried the following and couldn't reproduce:

Ran this job:

job "example" {
  datacenters = ["us-west-2"]
  type = "service"

  group "cache" {
    count = 20

    restart {
      attempts = 60
      interval = "1m"
      delay = "1s"
      mode = "delay"
    }

    ephemeral_disk {
      size = 300
    }

    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
        command = "/bin/bash"
        args = ["-c", "base64 /dev/urandom | head -c 10000000"]
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}

The job produces 10 MB of logs every few seconds and I streamed the logs of all 20 allocations to 3 different machines. I did this for 30+ minutes and transferred many GBs of logs from the client.
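For anyone else trying to reproduce, a loop of roughly this shape can tail every allocation of the job at once (the status parsing is approximate; these are not the exact commands used for the test above):

# tail stderr of every running allocation of the example job in the background
for alloc in $(nomad status -verbose example | awk '/running/ {print $1}'); do
  nomad logs -f -stderr "$alloc" redis &
done
wait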

@sheldonkwok I did look at your logs, and it looks like the streaming connection is held for 50+ minutes but is still streaming stdout.0/stderr.0? Are the apps just not logging much?

I am going to try that as well and will report back.

Just tailed logs for over an hour and saw nothing as well.

@sheldonkwok any more detail on the logging pattern would be great.

Hey @dadgar, I really appreciate the efforts you've spent trying to reproduce it! A normal logging pattern that we see is multiple people tailing the logs of multiple services at the same time for many hours. Some of these services may be the same service as well.

@sheldonkwok Hm unfortunately I did try that.

So I had 3 nodes each running ~7 allocations. I then was following the logs of all 20 allocations on 3 different machines for over an hour and a half.

I repeated this for two different workloads:

1) Small trickle of logs outputted by each allocation
2) 10 MBs of logs every few seconds.

Neither reproduced the issue. Further, when I stopped the logging, not a single extra FD was left in use (as in, there was no leaking that would eventually cause a problem).

Unfortunately, we cannot reproduce it effectively either. It generally happens once a week. We have thousands of allocations being created and running daily over hundreds of machines. In addition, users can be tailing thousands of them at once. Thanks again @dadgar

@sheldonkwok Do you retain Nomad log files? Would you be willing to run a debug build that adds some logging that may help get to the bottom of this?

Sure, what do we set and how much extra logging does it generate?

@sheldonkwok You will set log_level = "DEBUG". I don't know how much extra logging it generates yet, as I need to spend some time building the debug build. I may get to it this week. Will let you know.
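For reference, the relevant agent settings would look roughly like this in the client's config (a sketch; the file path is an assumption):

# e.g. /etc/nomad/client.hcl
log_level    = "DEBUG"
enable_debug = true   # exposes the /debug/pprof endpoints used earlier in this thread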

@sheldonkwok I'm doing some cleanup of old issues and came across this one. The Nomad logging infrastructure has undergone many changes since this was originally opened. If you're still experiencing any problems would you please open a new issue?

Thanks!
