Nomad 0.4.1
Ubuntu 14.04
At a certain allocation/network request load we get a failure to bind on the specified NOMAD_ADDR port.
reserved_ports_issue.zip
@dadgar I've attached a minimal example. When I get to 60 allocations on my vagrant I hit the issue. The zip contains the .nomad file and http_server.js. NOTE: you may need to tweak the region and datacenters to match your vagrant.
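For anyone reading along, the attached job is shaped roughly like this (a minimal sketch reconstructed from the thread; the group/task names, driver, resource numbers, and paths are assumptions, not the exact attached file):

job "reservedports" {
  # tweak region/datacenters to match your vagrant
  region      = "global"
  datacenters = ["dev-vagrant-blu-1"]
  type        = "service"

  group "reservedports" {
    count = 60   # bumped 25 -> 40 -> 60 across the runs below

    task "reservedports" {
      driver = "exec"

      config {
        command = "/usr/bin/node"
        args    = ["/usr/lib/reserved_ports/http_server.js"]
      }

      resources {
        cpu    = 20   # MHz
        memory = 30   # MiB
        network {
          mbits = 1
          # dynamic port; handed to the task as NOMAD_ADDR_http / NOMAD_PORT_http
          port "http" {}
        }
      }
    }
  }
}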
As a side note I did 3 separate runs increasing the Count each time without tearing down the old allocations.
First run: Count = 25
Second run: Count = 40
Third Run: Count = 60
Below are the results from nomad status reservedports, nomad fs, and nomad alloc-status:
ID = reservedports
Name = reservedports
Type = service
Priority = 50
Datacenters = dev-vagrant-blu-1
Status = running
Periodic = false
Summary
Task Group Queued Starting Running Failed Complete Lost
reservedports 0 0 59 1 0 0
Allocations
ID Eval ID Node ID Task Group Desired Status Created At
42455534 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
c9ad80f5 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
bebab092 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
92acd6d8 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
82e7f17e 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
72c54c3a 8c48311c 7f3f9ab9 reservedports run failed 09/20/16 17:52:12 PDT
239e94ac 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
0525a315 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
2b450d21 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
41157bc5 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:52:12 PDT
1b325f6e 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
f92b367c 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
33724e53 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
d1c2e5c2 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
c51794ec 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
117dfe76 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
2e9a6744 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
aedff0fd 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
8768e29e 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
5d2df93a 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:52 PDT
42899d96 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
6323b9e5 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
6c072f74 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
21bb8e79 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
62659cb0 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
5a19abe4 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
895109a7 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
8c14bfe1 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
91b4e157 8c48311c 7f3f9ab9 reservedports run running 09/20/16 17:50:01 PDT
...
vagrant@blocks1:/usr/lib/reserved_ports$ nomad fs 72c54c3a alloc/logs/reservedports.stderr.0
events.js:141
throw er; // Unhandled 'error' event
^
Error: listen EADDRINUSE :::57967
at Object.exports._errnoException (util.js:873:11)
at exports._exceptionWithHostPort (util.js:896:20)
at Server._listen2 (net.js:1250:14)
at listen (net.js:1286:10)
at Server.listen (net.js:1382:5)
at Object.<anonymous> (/usr/lib/reserved_ports/http_server.js:66:8)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
vagrant@blocks1:/usr/lib/reserved_ports$ nomad alloc-status 72c54c3a
ID = 72c54c3a
Eval ID = 8c48311c
Name = reservedports.reservedports[58]
Node ID = 7f3f9ab9
Job ID = reservedports
Client Status = failed
Task "reservedports" is "dead"
Task Resources
CPU Memory Disk IOPS Addresses
20 MHz 30 MiB 300 MiB 0 http: 127.0.0.1:57967
Recent Events:
Time Type Description
09/20/16 17:52:39 PDT Not Restarting Error was unrecoverable
09/20/16 17:52:39 PDT Driver Failure failed to start task 'reservedports' for alloc '72c54c3a-7362-4a6e-5a9e-2ebbfc28307b': Couldn't create destination file /opt/nomad/data/alloc/72c54c3a-7362-4a6e-5a9e-2ebbfc28307b/reservedports/usr/bin/node: open /opt/nomad/data/alloc/72c54c3a-7362-4a6e-5a9e-2ebbfc28307b/reservedports/usr/bin/node: text file busy
09/20/16 17:52:21 PDT Restarting Task restarting in 18.027863591s
09/20/16 17:52:21 PDT Terminated Exit Code: 1
09/20/16 17:52:21 PDT Started Task started by client
09/20/16 17:52:13 PDT Received Task received by client
see attached
Hey @dadgar, was wondering how this one was going; let me know if you need anything more from me.
Sorry have not yet had a chance to dig into it, hopefully soon
Hey @jshaw86,
Just an update. I tried to reproduce this on both 0.4.1 and master with what you gave me. I went from count = 1 to count = 100, back and forth about 8 times and never hit this.
@dadgar hmmm, and you used the example I gave? Because you have to be doing some outbound TCP/HTTP calls to get it.
I will try to spin up the nomad vagrant and reproduce since it's been a week. I'm not sure what to tell you other than that we hit it a lot and it's very reproducible.
@jshaw86 Yeah, the steps I did were:
1) cd /nomad
2) vagrant up
3) Download and unzip your attached archive
4) Install nodejs 6.7
5) move the http_server.js to /usr/lib/reserved_ports
Then I tried with exec as you had it, but it was a bit slow to bring them up and down, so I also tried with raw_exec.
@dadgar another engineer just hit it in our staging environment:
ubuntu@nomad-sched2:~$ nomad alloc-status 5014d47f
ID = 5014d47f
Eval ID = 72157fe3
Name = block-4818.event_handler_4903_1_pdx3[0]
Node ID = 0c5a6638
Job ID = block-4818
Client Status = failed
Task "event_handler_4903_1_pdx3" is "dead"
Task Resources
CPU Memory Disk IOPS Addresses
30 MHz 100 MiB 100 MiB 0 http: 172.16.102.60:41354
Recent Events:
Time Type Description
10/03/16 23:47:57 UTC Not Restarting Error was unrecoverable
10/03/16 23:47:57 UTC Driver Failure failed to start task 'event_handler_4903_1_pdx3' for alloc '5014d47f-b1b1-6c8a-9d7c-58f5ba743c6b': Couldn't create destination file /opt/nomad/mnt/5014d47f-b1b1-6c8a-9d7c-58f5ba743c6b/event_handler_4903_1_pdx3/usr/bin/blocks-usage: open /opt/nomad/mnt/5014d47f-b1b1-6c8a-9d7c-58f5ba743c6b/event_handler_4903_1_pdx3/usr/bin/blocks-usage: text file busy
10/03/16 23:47:51 UTC Restarting Task restarting in 6.087348763s
10/03/16 23:47:51 UTC Terminated Exit Code: 1
10/03/16 23:47:50 UTC Started Task started by client
10/03/16 23:47:49 UTC Received Task received by client
ubuntu@nomad-sched2:~$ nomad fs 5014d47f alloc/logs
Mode Size Modified Time Name
-rw-r--r-- 650 B 10/03/16 23:47:51 UTC event_handler_4903_1_pdx3.stderr.0
-rw-r--r-- 0 B 10/03/16 23:47:50 UTC event_handler_4903_1_pdx3.stdout.0
ubuntu@nomad-sched2:~$ nomad fs 5014d47f alloc/logs/event_handler_4903_1_pdx3.stderr.0
events.js:141
throw er; // Unhandled 'error' event
^
Error: listen EADDRINUSE :::41354
at Object.exports._errnoException (util.js:907:11)
at exports._exceptionWithHostPort (util.js:930:20)
at Server._listen2 (net.js:1250:14)
at Server._listen2 (/usr/lib/pn-blocks/src/node_modules/continuation-local-storage/node_modules/async-listener/index.js:51:23)
at listen (net.js:1286:10)
at Server.listen (net.js:1382:5)
at Object.<anonymous> (/usr/lib/pn-blocks/src/main.js:346:8)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
@dadgar did you get up to instance count = 60 with exec before switching to raw_exec?
@jshaw86 Yeah, I was up to 100. If that has just happened, can you use lsof to see what is attached to that port?
If not, maybe a dump of all the allocations. I want to know if nomad double-assigned that address.
@dadgar nothing listed on lsof for that port, but we were about 5 minutes late, so whatever had it went away.
@jshaw86 Can you send me this?
curl http://127.0.0.1:4646/v1/allocations | jq -r '.[].ID' | xargs -n1 -I {} curl http://127.0.0.1:4646/v1/allocation/{} > dump
If it is sensitive, you have my email
@dadgar yea, i'm trying to repro on the nomad vagrant for you; will try to get an lsof and that dump for you
Awesome! Thanks! Want to resolve this for you guys so you can just cherry-pick the commit in the meantime.
@dadgar Hit it pretty easily with Count = 60; lsof was empty for port 43409. As an aside, some of the containers were having issues resolving httpbin and failed with ENOTFOUND, which shouldn't be related and is maybe just something with our office network right now; if you see it, ignore it or just do a dig on httpbin and replace the DNS name with the IP in the code. I'll send a new zip in a minute, but it's basically the same as before.
I've attached the curl to allocations here:
dump.txt
Here's my alloc status and fs
vagrant@nomad-server01:/opt/gopath/src/github.com/hashicorp/nomad$ nomad alloc-status 02001faf
ID = 02001faf
Eval ID = b3951268
Name = reservedports.reservedports[47]
Node ID = ee231055
Job ID = reservedports
Client Status = failed
Task "reservedports" is "dead"
Task Resources
CPU Memory Disk IOPS Addresses
20 MHz 30 MiB 300 MiB 0 http: 127.0.0.1:43409
Recent Events:
Time Type Description
10/04/16 00:54:33 UTC Not Restarting Error was unrecoverable
10/04/16 00:54:33 UTC Driver Failure failed to start task 'reservedports' for alloc '02001faf-eb76-64ca-2afa-6049dd09034b': Couldn't create destination file /opt/nomad/data/alloc/02001faf-eb76-64ca-2afa-6049dd09034b/reservedports/usr/bin/node: open /opt/nomad/data/alloc/02001faf-eb76-64ca-2afa-6049dd09034b/reservedports/usr/bin/node: text file busy
10/04/16 00:54:10 UTC Restarting Task restarting in 16.831422678s
10/04/16 00:54:10 UTC Terminated Exit Code: 1
10/04/16 00:54:10 UTC Started Task started by client
10/04/16 00:53:32 UTC Received Task received by client
vagrant@nomad-server01:/opt/gopath/src/github.com/hashicorp/nomad$ nomad fs 02001faf alloc/logs/reservedports.stderr.0
events.js:141
throw er; // Unhandled 'error' event
^
Error: listen EADDRINUSE :::43409
at Object.exports._errnoException (util.js:907:11)
at exports._exceptionWithHostPort (util.js:930:20)
at Server._listen2 (net.js:1250:14)
at listen (net.js:1286:10)
at Server.listen (net.js:1382:5)
at Object.<anonymous> (/usr/lib/reserved_ports/http_server.js:66:8)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
@dadgar Another one: two failures across 60 allocations: dump2.txt
Here is the zip. I've included an install.sh and a nomad_sched.json so you can have the exact config that I'm using on the nomad vagrant. Just run the install.sh, then execute the 3 echoed commands, and you should be good to go.
@dadgar make sure you up your memory to 4GB so you can run 60 jobs with the cpu/mem requirements of the job
@dadgar did this do it for you?
@jshaw86 I'm attempting to reproduce this myself, but I noticed all of your port conflicts have been in the default ephemeral port range (32768-60999 on my Ubuntu 16.04 box; see cat /proc/sys/net/ipv4/ip_local_port_range).
Do you mind adding those as reserved ports on your client and seeing if that prevents collisions?
If it does, the fix might be for nomad to automatically add that range to reserved ports.
You may need to try on 0.4.0 since there is that other reserved-port bug introduced in 0.4.1.
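For example, a client config along these lines should do it (a hedged sketch, not a tested config; the range below is the default ephemeral range reported on my 16.04 box, so adjust it to whatever cat /proc/sys/net/ipv4/ip_local_port_range shows on yours):

client {
  enabled = true

  reserved {
    # keep Nomad's dynamic port assignments out of the kernel's ephemeral range
    reserved_ports = "32768-60999"
  }
}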
@schmichael @dadgar I'm on 14.04 rather than 16.04, if it makes a difference. One of our SREs, Mckenzie, is going to reply to your suggestion about the ephemeral port range, as he did try that a while back and ran into some issues.
I do think, though, that there is some difference in ideology on how this should work. Currently the nomad implementation blacklists ports rather than whitelisting them, which is what's causing the issue. I think the way to avoid this is to give nomad a whitelist of ports and then move the kernel's ephemeral port range outside of what we give nomad. Mckenzie can provide more details as he's more experienced in this area.
I think Mckenzie and @diptanu had a chat about this at HashiConf as well along these lines
@jshaw86 They both should accomplish the same thing
@schmichael the problem we are faced with is that with a nomad agent port blacklist (reserved_ports), you can only tell nomad which ports not to use. This means that the nomad agent still uses the ephemeral port range to dynamically provision ports. This needs to be a whitelist.
Any application deployed onto the nomad agent that uses the ephemeral port range has the potential to consume a port that was meant to be provisioned by the nomad agent. This is where we get the conflict.
We have two buckets of ports, ip_local_reserved_ports (we'll call this local) and ip_local_port_range (we'll call this ephemeral).
The local bucket holds a list of ports that we have determined should not be dynamically assigned. These ports are used in configuration files to start applications outside of any nomad process. This is how we prevent any containers from getting these ports.
All of the containers deployed by nomad use the ephemeral bucket. Any application that is started in the container also uses the ephemeral port range to make outbound connections. Again, our conflict. This is where we are seeing port collision.
The nomad agent needs to be able to use a range of ports reserved from the local bucket, for its dynamic port allocation. Then any process in a container that needs to get a port, uses the ephemeral port range.
@schmichael and @dadgar I think we have some more issues, or at the least some confusion. From the docs (https://www.nomadproject.io/docs/jobspec/networking.html) the dynamic port range is 20000-60000. In testing, with
net.ipv4.ip_local_port_range = 32768 61000
Nomad does not respect the ephemeral port range and schedules at random from the hard-coded range. Here is a list of ports from our test:
PORT 55289
PORT 23032
PORT 30699
PORT 24497
PORT 39010
PORT 26186
PORT 28465
PORT 29137
PORT 25844
PORT 47243
PORT 30238
PORT 27880
PORT 57696
PORT 41776
PORT 28093
PORT 55455
If you can confirm that nomad does not use the ephemeral range and in fact uses a hard-coded (whitelist) range of 20000-60000 (from the kernel docs: "If possible, it is better these numbers have different parity."), then all we have to do is set
net.ipv4.ip_local_reserved_ports = 20000-60000
to solve this problem. At this time, this would leave us about 10k ports for outbound connections. I think we can get by with this, but we might want to consider making the hard-coded range a bit smaller.
source: https://github.com/hashicorp/nomad/blob/master/nomad/structs/network.go
@softslug what you are saying is correct. Additionally, if you need more ports you can add the port range 50000-60000 to the port blacklist, and now you have 20,000 ports for ephemeral usage (to whatever extreme you want). Nomad can't control which ephemeral port an application launched under it gets assigned; that is done by the kernel.
So you need to set the net.ipv4.ip_local_reserved_ports
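Putting those two together, a rough sketch (illustrative ranges only, not a recommendation for your particular hosts):

# Keep Nomad from handing out 50000-60000 dynamically
client {
  reserved {
    reserved_ports = "50000-60000"
  }
}

# ...and keep the kernel from picking ephemeral ports out of the range Nomad
# still uses for dynamic assignment (20000-49999 under this split), e.g.:
#   sysctl -w net.ipv4.ip_local_reserved_ports=20000-49999

With this split, outbound connections get roughly the 50000-60999 slice of the default ephemeral range.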
@jshaw86 @mac-abdon How did it go?
@dadgar we're building nomad off this: https://github.com/pubnub/nomad/commits/v0.4.2. It includes @diptanu's chroot changes, and we changed the port range that nomad can assign so we can better tune the kernel and get around the existing 0.4.1 reserved port bug on AWS. We will hopefully have this deployed with the kernel changes by EOB today. Will close if everything checks out. Thx!
Any update?
Yep, we deployed to both our staging servers and it looks like it's resolved. Going to wait to close till we're in production, which will be later this week.
We didn't realize the nomad servers did the port assignments, so it took us a bit longer to deploy in staging than we wanted; we practiced the roll there before doing it in production to make sure we didn't break anything, which is why it's taken longer. Thx!
We've successfully rolled this into most of our production infrastructure, so I'm going to close this. If anyone comes across this in the future, all we did was change the nomad port range to be the top end of the ephemeral port range: https://github.com/pubnub/nomad/commit/7db9c44e1ae2ded099059b9c9827be41d2030130. Note that 65536 is non-inclusive.
Then we tuned our sysctl ephemeral port range to be outside of that:
net.ipv4.ip_local_port_range = 10001 40000
net.ipv4.ip_local_reserved_ports = 40001-65535
Once this issue is resolved (https://github.com/hashicorp/nomad/issues/1617), you should be able to just specify reserved ports and change the sysctl ephemeral port range rather than having to do the code change.
So I think the real outcome of this is that we should make that range configurable and add some docs on how to avoid this issue.
@dadgar do you want another ticket for those or are you guys tracking internally?
Can you open another ticket!
@dadgar was this change (making the min/max port configurable) ever made? :)
@jippi Not yet. Working with a user to test a hypothesis and then will make the appropriate changes in Nomad to avoid this problem. It more or less breaks down to determining the kernel's ephemeral port range and trying to avoid an overlap, and also shrinking Nomad's own ephemeral port range.
@dadgar Ran into the same problem here. Any update on the change?