Nomad: [UI] error loading jobs view in firefox

Created on 13 Apr 2020 · 13 Comments · Source: hashicorp/nomad

Nomad version

Nomad v0.10.5 (4eb2ca3f8d8786c897bb47878f1c12577011ddd3)

Operating system and Environment details

Firefox 75.0

Issue

Can't click through to an individual job from the jobs UI. (Opening the job in a new tab works as expected.)

This is the console after a cold load of /ui/jobs:

[Screenshot: Jobs - Nomad]

No additional errors or network activity on click of job.

Reproduction steps

Load jobs UI /ui/jobs
See errors in console
Click any job and see infinite spinner

theme/ui type/bug

Source

dbachelder

All 13 comments

Similar in Chrome:

[Screenshot: Jobs - Nomad]

dbachelder on 13 Apr 2020

Hi @dbachelder, I've been working to reproduce this and I found a condition that triggers it, but it's a bit fringe, so it may not be what you're hitting.

Do you happen to run parameterized or periodic jobs? Did you or someone else using Nomad happen to purge one of them but leave the child jobs around?

DingoEatingFuzz on 14 Apr 2020

@DingoEatingFuzz We don't have any parameterized or periodic jobs (that we are aware of).

dbachelder on 15 Apr 2020

@DingoEatingFuzz For us it started after we stopped a periodic job, at least that is my observation. The list of child jobs for that stopped periodic job contains many with the status dead. Maybe that helps.

frederikbosch on 15 Apr 2020

And once we relaunched that stopped periodic job, the error was gone again.

frederikbosch on 15 Apr 2020

@DingoEatingFuzz I stand corrected, we have a single periodic job, but it is running normally as far as I can tell. Is there further debugging I can do on my end to help?

dbachelder on 15 Apr 2020

My reproduction is what @frederikbosch observes here and what @gmichalec-pandora observes in #7710, which is that when a periodic or parameterized job is purged/gc'd, it orphans all child jobs. The UI then tries to fetch the parent job for the orphaned jobs, gets a 404, and goes 💥

I'm working on a fix for this, but in the meantime, manually purging dead child jobs should fix this.

Based on the way that job IDs work, if you have one periodic job that changed names, that too could lead to older children of the periodic job becoming orphaned.

DingoEatingFuzz on 15 Apr 2020

"manually purging dead child jobs should fix this."

You mean by running nomad system gc?

frederikbosch on 15 Apr 2020

I'd start with that, and if that doesn't work, you can manually delete jobs from existence with nomad stop -purge.

DingoEatingFuzz on 15 Apr 2020
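
For anyone scripting the cleanup instead of using the CLI: both steps mentioned above are also exposed by Nomad's HTTP API, as a forced garbage collection (PUT /v1/system/gc) and a job deregistration with purge=true (DELETE /v1/job/<id>). Below is a minimal Python sketch, not an official client. It assumes an agent at http://127.0.0.1:4646, the default namespace, and no ACL token; the function names are hypothetical. The requests are only built here, so nothing is deleted until send() is called.

```python
import urllib.parse
import urllib.request

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: default local agent address


def gc_request(addr=NOMAD_ADDR):
    """Build the HTTP request that corresponds to `nomad system gc`."""
    return urllib.request.Request(f"{addr}/v1/system/gc", method="PUT")


def purge_request(job_id, addr=NOMAD_ADDR):
    """Build the HTTP request that corresponds to `nomad stop -purge <job_id>`."""
    # Child job IDs contain "/" (for example "backup/periodic-1586900000"),
    # so the ID is percent-encoded before it goes into the URL path.
    quoted = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(
        f"{addr}/v1/job/{quoted}?purge=true", method="DELETE"
    )


def send(request):
    """Send a prepared request; urllib raises HTTPError on an error status."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the request separately from sending it makes it easy to print request.full_url and check which job would be purged before anything is removed.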

Is there an easy way to find the orphans?

dbachelder on 15 Apr 2020
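
One possible way to look for orphans, going by the explanation earlier in the thread (a child job whose parent has been purged or garbage collected): list all jobs and flag every job whose ParentID does not match the ID of any job in the list. This is a rough Python sketch and not necessarily how anyone in this thread did it. It assumes the job stubs returned by GET /v1/jobs carry ID and ParentID fields, an agent at http://127.0.0.1:4646, the default namespace, and no ACL token. The function names and sample data are hypothetical.

```python
import json
import urllib.request


def find_orphans(jobs):
    """Return the IDs of jobs whose ParentID names a job that is not in the list."""
    known_ids = {job["ID"] for job in jobs}
    return sorted(
        job["ID"]
        for job in jobs
        if job.get("ParentID") and job["ParentID"] not in known_ids
    )


def list_jobs(addr="http://127.0.0.1:4646"):
    """Fetch job stubs from GET /v1/jobs (assumes no ACL token is required)."""
    with urllib.request.urlopen(f"{addr}/v1/jobs") as response:
        return json.load(response)


# Hypothetical sample data: the parent "report" has been purged,
# while the parent "backup" still exists.
sample = [
    {"ID": "backup", "ParentID": "", "Status": "running"},
    {"ID": "backup/periodic-1586900000", "ParentID": "backup", "Status": "dead"},
    {"ID": "report/periodic-1586900000", "ParentID": "report", "Status": "dead"},
]

print(find_orphans(sample))
```

Against a live cluster, find_orphans(list_jobs()) would give the candidates to purge. This only covers the missing-parent case described above, so it may not catch every job that trips up the UI.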

Found them! It fixed our issue for now. In our prod cluster it seemed like I only had one job to clean up, which was a child of a still-valid periodic task. There was one active child running, and one dead one from the last run (6 hours ago).

dbachelder on 15 Apr 2020

Thanks for the help @DingoEatingFuzz !

frederikbosch on 16 Apr 2020

This should be fixed in 0.12.1 (See the explanation here).

I'm going to close this issue to centralize the conversation about this bug in #5936.

DingoEatingFuzz on 23 Jul 2020
