I tried to run a Nomad job that should execute a Python application on a Raspberry Pi 3B+. When I try to run the job, I see the following error:
failed to launch command with executor: rpc error: code = Unknown desc = container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: cannot set memory limit: container could not join or create cgroup
This error seems to be related to https://github.com/hashicorp/nomad/issues/8635. The solution presented there was to check whether the Nomad agent is running as root. I verified that I am running the agent as root under systemd.
I tried modifying the ExecStart with and without sudo, and also tried setting User=root, but all without success.
My job is pretty trivial:
job "sunblinds-server" {
  datacenters = ["dc"]
  group "sunblinds" {
    task "sunblinds-api-and-ui" {
      driver = "exec"
      config {
        command = "/usr/bin/python3"
        args = [
          "./somfy/operateShutters.py",
          "-c",
          "/home/pi/sunblinds/operateShutters.conf",
          "-a",
          "-e",
          "-m"]
      }
      artifact {
        source = "git::https://github.com/Nickduino/Pi-Somfy"
        destination = "somfy/"
      }
    }
  }
}
And my systemd service:
[Unit]
Description=Nomad
Documentation=https://nomadproject.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=sudo /usr/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=infinity
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
StartLimitBurst=3
StartLimitIntervalSec=10
TasksMax=infinity
[Install]
WantedBy=multi-user.target
I tried the raw_exec driver, and that is working. Why does the exec driver fail to run this job?
Hi @trietsch! So the error you're seeing is bubbling up from our libcontainer dependency when we try to create the memory cgroup for the container (ref fs.go#L331) to use for resource isolation.
If you're running as root and can't do this, I'm wondering if there might be something unusual about the boot configuration, in particular whether the memory cgroup is mounted. You might already have the pids cgroup mounted, which is why raw_exec is working without specifically disabling cgroups. You should be able to check /proc/cgroups to verify that:
$ cat /proc/cgroups
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        2          1            1
cpu           5          71           1
cpuacct       5          71           1
blkio         4          71           1
memory        11         104          1
devices       8          71           1
freezer       6          1            1
net_cls       3          1            1
perf_event    10         1            1
net_prio      3          1            1
hugetlb       9          1            1
pids          7          72           1
(I've formatted this here for legibility, but you should be looking for the last column of the memory line to be 1).
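If you'd rather script that check than read the table by eye, something like the following should work. This is only a rough sketch: `parse_cgroups` and `memory_cgroup_enabled` are names I made up for illustration, not part of Nomad, and it assumes the four-column `/proc/cgroups` layout shown above.

```python
from pathlib import Path


def parse_cgroups(text: str) -> dict:
    """Map each cgroup controller name to True if its 'enabled' column is 1."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not the expected four-column layout
        status[fields[0]] = fields[3] == "1"
    return status


def memory_cgroup_enabled(path: str = "/proc/cgroups") -> bool:
    """Return True if the memory controller is listed and enabled."""
    return parse_cgroups(Path(path).read_text()).get("memory", False)


if __name__ == "__main__":
    # Only meaningful on a Linux host that exposes /proc/cgroups.
    if Path("/proc/cgroups").exists():
        print("memory cgroup enabled:", memory_cgroup_enabled())
```

If this reports `False` (or the memory line is missing entirely), the exec driver won't be able to set a memory limit, which matches the error above.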
I don't have much hands-on experience with Raspberry Pis, but my guess is you have a recent-enough kernel to use cgroups and that it's just a config issue. My understanding is that there's a bootstrap script where you can add the cgroup_enable=memory value you'll need. You didn't mention which OS you're using, but this post might be a helpful pointer for you regardless.
Thanks for all the info @tgross! I'll follow the post and use the info you provided to see whether I can get exec to work.
After running into this on my own Pi 4 running Raspberry Pi OS, I can confirm that all that was needed was to add cgroup_enable=memory to the kernel command line in /boot/cmdline.txt and reboot.
$ cat /boot/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=e65e3b8f-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cgroup_enable=memory
$ cat /proc/cgroups | grep memory
memory 9 68 1
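For anyone who wants to automate the edit across several Pis, here is a minimal sketch of the same change. `ensure_kernel_param` is a hypothetical helper name, and it assumes the command line is a single line of space-separated parameters with no quoted values containing spaces. It only builds the new string; writing it back to /boot/cmdline.txt (as root, after taking a backup) and rebooting is left to you.

```python
def ensure_kernel_param(cmdline: str, param: str = "cgroup_enable=memory") -> str:
    """Return the kernel command line with `param` appended if it is missing.

    Assumes simple space-separated parameters; quoted values containing
    spaces are not handled by this sketch.
    """
    tokens = cmdline.split()
    if param not in tokens:
        tokens.append(param)
    # cmdline.txt must stay a single line, so join with single spaces.
    return " ".join(tokens)


if __name__ == "__main__":
    before = "console=serial0,115200 console=tty1 rootwait"
    print(ensure_kernel_param(before))
```

Running it twice on the same input gives the same result, so it is safe to reapply.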
This blog post seems to do a good job of explaining what is happening and why.
Documented in https://github.com/hashicorp/nomad/pull/9442