Nomad v0.8.6
Ubuntu 16.04.5 LTS
When a Nomad periodic job is submitted and then stopped via the Nomad UI, after some time I am no longer able to access any job details in the UI. I can still open the Nomad UI, but clicking on a job does not take me to its details.
When I checked the browser console, I saw the following error:
GET http://test.nomadserver:4646/v1/job/3559326d29233f205921eeacb6c29bb6 404 (Not Found)
Submit a periodic job and stop it. It sometimes takes a while for the issue to appear.
NA
NA
Test job similar to what we use:
job "job" {
  datacenters = ["test-dc"]
  type        = "batch"

  periodic {
    cron             = "*/1 * * * * *"
    prohibit_overlap = true
  }

  group "monitor" {
    count = 1

    task "monitor" {
      driver = "docker"

      config {
        image = "test/image:latest"
      }

      resources {
        cpu    = 200
        memory = 30
      }
    }
  }
}
A workaround I found is to submit the job again and then stop it; after that I can access the jobs again.
Hi,
Job details could be getting cleaned up by the periodic garbage collection (GC).
Can you try tweaking the GC thresholds (on the server agents) and see if it helps?
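For example, something along these lines in the server stanza of the agent configuration; the threshold values below are just placeholders to illustrate raising them, not recommendations:
server {
  enabled = true

  # Placeholder values: raising these keeps job and evaluation state around
  # longer before the periodic garbage collector is allowed to purge it.
  job_gc_threshold  = "24h"  # default 4h
  eval_gc_threshold = "6h"   # default 1h
}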
BTW */1 would be the same as just *, right?
Hi! I can confirm that we are experiencing the same issues.
After some investigation, I have found a flow to reproduce it: stop a periodic job while one of its child jobs is still running, then force a garbage collection (via /v1/system/gc). Once garbage collection has purged the parent job, the issue seems to reproduce.
This happens because the child is still running when the Nomad UI queries for jobs (/v1/jobs).
The client then tries to fetch the parent job and, since it has been purged, it receives a 404; for some reason it still saves the parent object in memory, without any data (null).
The JavaScript then fails when it tries to access that data in memory.
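To put it concretely, here is roughly what the sequence of requests looks like (the paths are the real API endpoints; the annotations are my interpretation):
GET /v1/jobs            -> 200, the still-running child job is listed
GET /v1/job/<parent-id> -> 404 (Not Found), the parent has already been garbage collected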
Thanks for the detailed steps, @losnir, it's the first time I'm actually able to reproduce this, so now hopefully I can figure out how to fix it.
Are you able to try this out with Nomad 0.9.2 or later? I believe it has been fixed since that version.
@backspace I can confirm that after upgrading to v0.9.5 the issue is resolved.
However, it looks like the upgrade introduced new issues with batch/periodic jobs (they disappear abruptly).
Will open another issue on the matter if needed.
Thanks!
Thanks for letting us know that it's fixed. Please do open a new issue with reproduction steps for this new bug and we can look into it. I appreciate your diligence!