Nomad 0.3.0 and 0.3.1
Amazon Linux running the following kernel version:
Linux 4.1.19-24.31.amzn1.x86_64 x86_64 GNU/Linux
It seems there's an issue when executing a job that uses the isolated exec driver. I get "permission denied" while running simple jobs, for example one that pings a website. I started a thread in the Google group and @dadgar suggested moving the conversation to an issue. At the time of that thread I was using Nomad 0.3.0; we've since moved to 0.3.1 and the same thing is happening.
$ nomad alloc-status 619ea328
ID = 619ea328
Eval ID = dfdf0bf7
Name = staging-health.daemon[0]
Node ID = 8b976f92
Job ID = staging-health
Client Status = failed
Evaluated Nodes = 10
Filtered Nodes = 6
Exhausted Nodes = 0
Allocation Time = 179.833µs
Failures = 0
==> Task "health" is "dead"
Recent Events:
Time Type Description
21/03/16 18:26:31 VET Restarts Exceeded Task exceeded restart policy
21/03/16 18:26:31 VET Driver Failure error starting process via the plugin: error starting command: fork/exec /bin/ping: permission denied
21/03/16 18:26:29 VET Received Task received by client
==> Status
Allocation "619ea328" status "failed" (6/10 nodes filtered)
* Class "dev" filtered 2 nodes
* Class "prod" filtered 4 nodes
* Constraint "${node.class} = staging" filtered 6 nodes
* Score "8b976f92-2bfa-83bf-458c-7a9159006400.binpack" = 17.857757
* Score "d4b067c4-b831-db76-8256-f9e8959fa8ae.binpack" = 17.476360
* Score "52b3d380-ffa2-e6b9-1761-12d646cb4511.binpack" = 1.460294
* Score "88999bee-acb8-8399-da9c-b98002ff00d5.binpack" = 1.460294
==> Task Resources
Task: "health"
CPU Memory MB Disk MB IOPS Addresses
20 10 300 0 http: 172.17.18.167:20708
job "staging-health" {
  type = "service"
  priority = 50
  constraint {
    attribute = "${node.class}"
    value = "staging"
  }
  update {
    stagger = "30s"
    max_parallel = 1
  }
  group "ping" {
    count = 1
    restart {
      attempts = 15
      delay = "15s"
      interval = "5m"
      mode = "delay"
    }
    task "health" {
      driver = "exec"
      config {
        command = "/bin/ping"
        args = ["-c", "20", "google.com"]
      }
      resources {
        cpu = 20
        memory = 10
        network {
          mbits = 2
          port "http" {
          }
        }
      }
    }
  }
}
I hope this is useful! Thanks.
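For context on the error text: Nomad is written in Go, and a `fork/exec <path>: permission denied` error is Go's way of reporting that execve(2) returned EACCES. According to the execve man page, EACCES can mean the file lacks execute permission for the calling user, a directory in the path isn't searchable, or the filesystem is mounted noexec, so the message alone doesn't say which of these applies here. The minimal Python sketch below (not Nomad code; the helper and script names are hypothetical) reproduces just the first case, a file with no execute bit.

```python
import os
import subprocess
import tempfile


def try_exec(mode: int) -> str:
    """Create a tiny shell script with the given permission bits and try to run it.

    Returns "ok" if it ran, or "permission denied" if execve failed with EACCES.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.sh")  # hypothetical script name
        with open(script, "w") as f:
            f.write("#!/bin/sh\nexit 0\n")
        os.chmod(script, mode)
        try:
            subprocess.run([script], check=True)
            return "ok"
        except PermissionError:
            # execve(2) returned EACCES; Python raises PermissionError for it,
            # the same errno Go prints as "fork/exec ...: permission denied".
            return "permission denied"


if __name__ == "__main__":
    print(try_exec(0o644))  # no execute bit for anyone, so exec is refused
    print(try_exec(0o755))  # normally runs, unless the temp dir is mounted noexec
```

Even root gets EACCES for a regular file that has no execute bit set at all, which is why the 0o644 case fails regardless of the user running it.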
What AMI are you using? I could not reproduce this on a recent Amazon Linux AMI.
Sorry, I forgot to mention this. I'm running amzn-ami-hvm-2015.09.2.x86_64-gp2, but yum update has been run, so the kernel and packages have been updated.
Is #1009 related to this?
@consultantRR I do not believe they are related. Is there something you are seeing that leads you to that? It may help with debugging.
For the exec and java drivers running on RHEL 6.5, we get the same "permission denied" error:
10/20/16 11:12:09 CEST Driver Failure failed to start task 'config' for alloc '83253d09-b591-7be7-b486-3714d04fc859': fork/exec /usr/bin/java: permission denied
Setting a user made no difference.
raw_exec ran without problems when no user was set, but when a user was configured in the task, we got the same error as above.
@czerwina can you check the alloc directory permissions for the user you're using? For example, temporarily run chmod -R o+rwX on the entire alloc dir (make sure to revert to sane settings after testing).
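The `chmod -R o+rwX` test gives "other" (the class that covers a user like nobody, which neither owns the files nor is in their group) read and write access everywhere, while the capital `X` adds execute/search only to directories and to files that already have an execute bit for someone. As a rough illustration of exactly which bits that changes, here is a hypothetical Python equivalent; it is a debugging aid only, not a Nomad feature, and the permissions should be reverted afterwards.

```python
import os
import stat


def add_other_rwX(root: str) -> None:
    """Roughly `chmod -R o+rwX root` (hypothetical helper, for debugging only).

    "Other" gets read+write on everything; execute/search is added only to
    directories and to files that already have an execute bit for some class.
    """
    def fix(path: str) -> None:
        mode = os.lstat(path).st_mode
        if stat.S_ISLNK(mode):
            return  # skip symlinks, as chmod -R does for links found while recursing
        new = mode | stat.S_IROTH | stat.S_IWOTH
        if stat.S_ISDIR(mode) or mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
            new |= stat.S_IXOTH  # the capital-X rule
        os.chmod(path, stat.S_IMODE(new))

    fix(root)
    # Top-down walk: each subdirectory is fixed before os.walk descends into it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            fix(os.path.join(dirpath, name))
```

With this rule, a `-rw-rw----` file such as the app.jar shown further down would become `-rw-rw-rw-` (readable by nobody), while a `rwx------` directory would become `rwx---rwx`.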
I think I found the cause of the issue. I'm seeing something like this:
03/14/17 14:04:51 CET Driver Failure failed to start task "app" for alloc "54a0c7cb-ca12-43b1-0bb7-057ab60940c6": failed to start command path="usr/lib/jvm/java-1.7.0-openjdk-1.7.0.131.x86_64/jre/bin/java" --- args=["usr/lib/jvm/java-1.7.0-openjdk-1.7.0.131.x86_64/jre/bin/java" "-Xmx512m" "-Xms256m" "-Dserver.port=42693" "-jar" "/tmp/app.jar"]: fork/exec usr/lib/jvm/java-1.7.0-openjdk-1.7.0.131.x86_64/jre/bin/java: permission denied
When I looked in the allocation's tmp directory, I saw:
-rw-rw---- 1 root root 45343103 Mar 14 13:02 app.jar
Then, in executor.out (viewed with less):
2017/03/14 13:06:45.765696 [DEBUG] executor: launching command /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.131.x86_64/jre/bin/java -Xmx512m -Xms256m -Dserver.port=32386 -jar /tmp/app.jar
2017/03/14 13:06:45.765716 [DEBUG] 2017/03/14 13:06:45.765696 [DEBUG] executor: launching command /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.131.x86_64/jre/bin/java -Xmx512m -Xms256m -Dserver.port=32386 -jar /tmp/app.jar
2017/03/14 13:06:45.765716 [DEBUG] executor: running command as nobody
The crucial part is "executor: running command as nobody", combined with the lack of read permission for nobody :)
-rw-rw---- 1 root root 45343103 Mar 14 13:02 app.jar
Unfortunately, I don't know how to add read permission in the artifact stanza :(
Any help appreciated :)
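The diagnosis above rests on the standard POSIX rule that exactly one permission class applies to a non-root process: owner if its uid owns the file, otherwise group if it is in the file's group, otherwise "other". For `-rw-rw---- root root`, nobody falls into "other", which has no bits. The sketch below illustrates that rule in Python; the helper names and numeric IDs are hypothetical (nobody is uid 99 on RHEL-family systems and 65534 on Debian-family ones, so check /etc/passwd on the client), and it ignores ACLs, capabilities other than root's bypass, and setuid/sticky bits.

```python
import stat
from typing import AbstractSet

# Hypothetical IDs for illustration only.
ROOT_UID, ROOT_GID = 0, 0
NOBODY_UID, NOBODY_GID = 99, 99


def parse_mode(symbolic: str) -> int:
    """Convert the permission column of `ls -l`, e.g. "-rw-rw----", to mode bits."""
    flags = [stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
             stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
             stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH]
    bits = 0
    for ch, flag in zip(symbolic[1:10], flags):
        # Lowercase s/t imply execute; uppercase S/T mean the x bit is NOT set.
        if ch in "rwxst":
            bits |= flag
    return bits


def can_read(mode: int, owner_uid: int, owner_gid: int,
             uid: int, gids: AbstractSet[int]) -> bool:
    """Classic POSIX read check: owner class, else group class, else other."""
    if uid == 0:
        return True  # root bypasses read checks
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    jar = parse_mode("-rw-rw----")
    print(can_read(jar, ROOT_UID, ROOT_GID, NOBODY_UID, {NOBODY_GID}))
    print(can_read(jar | stat.S_IROTH, ROOT_UID, ROOT_GID, NOBODY_UID, {NOBODY_GID}))
```

The first check fails and the second succeeds, which matches the idea that adding read permission for "other" on the downloaded artifact would let the task user open it.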
Is this on the roadmap to be fixed?