Nomad: [UI] error loading jobs view in firefox

Created on 13 Apr 2020 · 13 Comments · Source: hashicorp/nomad

Nomad version

Nomad v0.10.5 (4eb2ca3f8d8786c897bb47878f1c12577011ddd3)

Operating system and Environment details

Firefox 75.0

Issue

Can't click through to an individual job from the jobs UI (opening the job in a new tab works as expected).

This is the console after a cold load of /ui/jobs:

[Screenshot: Jobs - Nomad]

No additional errors or network activity on click of job.

Reproduction steps

Load jobs UI /ui/jobs
See errors in console
Click any job and see infinite spinner

Labels: theme/ui, type/bug

All 13 comments

Similar in Chrome:

[Screenshot: Jobs - Nomad]

Hi @dbachelder, I've been working to reproduce this and I found a condition that does this, but it's a bit fringe, so it may not be your condition as well.

Do you happen to run parameterized or periodic jobs? Did you or someone else using Nomad happen to purge one of them but leave the children jobs around?

@DingoEatingFuzz We don't have any parameterized or periodic jobs (that we are aware of).

@DingoEatingFuzz For us it started after we stopped a periodic job, at least that is my observation. In the list of that stopped periodic job are many with the status dead. Maybe that helps.

And once we relaunched that stopped periodic job, the error was gone again.

@DingoEatingFuzz I stand corrected, we have a single periodic job, but it is running normally as far as I can tell. Is there further debugging I can do on my end to help?

My reproduction is what @frederikbosch observes here and what @gmichalec-pandora observes in #7710, which is that when a periodic or parameterized job is purged/gc'd, it orphans all child jobs. The UI then tries to fetch the parent job for the orphaned jobs, gets a 404, and goes 💥

I'm working on a fix for this, but in the meantime, manually purging dead child jobs should fix this.

Based on the way that job IDs work, if you have one periodic job that changed names, that too could lead to older children of the periodic job becoming orphaned.
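As a rough illustration of the orphan condition described above, here is a minimal Python sketch that flags jobs whose parent is no longer present. It assumes each job is a dict with `ID` and `ParentID` keys, shaped like the entries in a Nomad job listing. The function name and sample data are hypothetical, and it ignores namespaces for simplicity.

```python
from typing import Dict, Iterable, List


def find_orphaned_jobs(jobs: Iterable[Dict[str, str]]) -> List[str]:
    """Return IDs of jobs whose ParentID names a job missing from the list."""
    jobs = list(jobs)
    known_ids = {job["ID"] for job in jobs}
    return [
        job["ID"]
        for job in jobs
        # A job with no parent (empty or absent ParentID) cannot be an orphan.
        if job.get("ParentID") and job["ParentID"] not in known_ids
    ]


# Hypothetical sample data: "report" was purged, but its child is still listed.
sample = [
    {"ID": "backup", "ParentID": ""},
    {"ID": "backup/periodic-1586736000", "ParentID": "backup"},
    {"ID": "report/periodic-1586736000", "ParentID": "report"},
]
orphans = find_orphaned_jobs(sample)
print(orphans)
```

Any job this reports is a candidate for the manual cleanup mentioned above. Verify each one before purging, since this only compares IDs within the list it is given.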

"manually purging dead child jobs should fix this."

You mean by running nomad system gc?

I'd start with that, and if that doesn't work you can manually delete jobs from existence with nomad stop -purge
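A small sketch of that cleanup order, assuming the orphaned job IDs are already known: it only builds the command lines (`nomad system gc` first, then `nomad stop -purge` per job) and prints them for review without running anything. The helper name and the example job ID are hypothetical.

```python
import shlex
from typing import Iterable, List


def purge_commands(job_ids: Iterable[str]) -> List[List[str]]:
    """Build argv lists: one garbage-collection pass, then a purge per job."""
    commands = [["nomad", "system", "gc"]]
    commands += [["nomad", "stop", "-purge", job_id] for job_id in job_ids]
    return commands


# Dry run: print the commands instead of executing them.
for argv in purge_commands(["report/periodic-1586736000"]):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the destructive step manual, which seems prudent since a purge removes the job entirely.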

Is there an easy way to find the orphans?

Found them! It fixed our issue for now. In our prod cluster it seemed like I only had one job to clean up, which was a child of a still-valid periodic job. There was one active child running, and one dead one from the last run (6 hours ago).

Thanks for the help @DingoEatingFuzz !

This should be fixed in 0.12.1 (See the explanation here).

I'm going to close this issue to centralize the conversation about this bug in #5936.
