Nomad v0.9.3 (c5e8b66c3789e4e7f9a83b4e188e9a937eea43ce)
In our case it's very important to know when a task was killed by OOM-killer.
Nomad has a metric that indicates a task was killed by OOM, but it currently does not work as expected:
For now we have to parse the Nomad logs, find lines like this:
2019/04/08 06:03:16.608873 [INFO] client: task "ebook-similarities-service-worker" for alloc "c5d0e905-946b-d847-412f-4d727da98ab2" failed: Wait returned exit code 137, signal 0, and error OOM Killed
and increment a counter in a Prometheus exporter.
This works well, but in v0.9.x I can't find this message in the log file.
Could you please bring it back?
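For readers unfamiliar with the workaround described above, here is a minimal sketch of the log-parsing step. It assumes the pre-0.9 log format quoted in this post; the names `OOM_LINE` and `count_oom_kills` are hypothetical, and a plain `Counter` stands in for the real Prometheus counter.

```python
import re
from collections import Counter

# Matches the pre-0.9 client log line quoted above and captures
# the task name and allocation ID.
OOM_LINE = re.compile(
    r'task "(?P<task>[^"]+)" for alloc "(?P<alloc>[^"]+)" failed: .*OOM Killed'
)


def count_oom_kills(lines, counts=None):
    """Increment a per-task counter for every OOM-kill log line."""
    counts = Counter() if counts is None else counts
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


sample = [
    # Log line copied from the issue text.
    '2019/04/08 06:03:16.608873 [INFO] client: task '
    '"ebook-similarities-service-worker" for alloc '
    '"c5d0e905-946b-d847-412f-4d727da98ab2" failed: Wait returned '
    'exit code 137, signal 0, and error OOM Killed',
    # Made-up unrelated line, only here to show that it is ignored.
    '2019/04/08 06:03:17.000000 [INFO] client: unrelated message',
]

oom_counts = count_oom_kills(sample)
```

In a real exporter the `Counter` would presumably be replaced by a labelled Prometheus counter, with the task name as the label.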
The weird part is that the UI does show the message at the task level, though.
Is there any news? :)
Is there any news, or are there any workarounds?
We also relied on parsing the Nomad logs for "OOM Killed", but that no longer works with newer releases. We are currently using v0.10.5.
It looks like this issue was addressed several years ago here:
https://github.com/hashicorp/nomad/issues/2203
and was moved to the Docker plugin here:
You get a werr (web error maybe?) for it, but no log message.
On a larger note, it seems to me that "Allocation Status" should be logged somewhere. If the allocation gets GC'd before you have a chance to investigate, you've lost vital information and context. These statuses don't go to telemetry either, so unless you're actively scraping the API periodically, you will lose them.
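As a rough illustration of the periodic API scraping mentioned above, the sketch below lists allocations from the Nomad HTTP API (`GET /v1/allocations`) and picks out task events that look like OOM kills. Several things here are assumptions rather than confirmed behavior: the agent address, the `oom_killed` key in the event `Details`, and the fallback check for "OOM Killed" in the event message. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_allocations(addr="http://127.0.0.1:4646"):
    """Fetch allocation stubs from a Nomad agent (address is an assumption)."""
    with urllib.request.urlopen(f"{addr}/v1/allocations") as resp:
        return json.load(resp)


def oom_killed_tasks(allocations):
    """Return (alloc_id, task_name) pairs whose task events indicate an OOM kill."""
    found = []
    for alloc in allocations:
        for task, state in (alloc.get("TaskStates") or {}).items():
            for event in state.get("Events") or []:
                details = event.get("Details") or {}
                message = event.get("DisplayMessage") or ""
                # Assumption: the event carries an "oom_killed" detail,
                # or at least mentions "OOM Killed" in its message.
                if details.get("oom_killed") == "true" or "OOM Killed" in message:
                    found.append((alloc.get("ID"), task))
                    break  # one hit per task is enough
    return found


# Hand-written example payload shaped like the API response (not real output).
example = [
    {
        "ID": "c5d0e905-946b-d847-412f-4d727da98ab2",
        "TaskStates": {
            "ebook-similarities-service-worker": {
                "Events": [
                    {"Type": "Started", "Details": {}},
                    {"Type": "Terminated", "Details": {"oom_killed": "true"}},
                ]
            }
        },
    },
    {"ID": "hypothetical-alloc-without-tasks", "TaskStates": None},
]

oom_tasks = oom_killed_tasks(example)
```

Running `oom_killed_tasks(fetch_allocations())` on a schedule and exporting the result would keep a record that outlives the allocation's garbage collection, at the cost of extra load on the API.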
As noted, this is closed by #2203.
@tgross it's still bugged as it won't show in the logs, not sure why you closed the issue.
We're doing some cleanup of stale issues. It looked like the feature request had been resolved by #2203 (I don't see a bug here, not sure why it was labelled as such). I can reopen and mark for discussion in the roadmap.