Nomad: Nomad UI jobs page error

Created on 14 Apr 2020 · 6 Comments · Source: hashicorp/nomad

Nomad version

Output from nomad version

0.10.4+ent

Operating system and Environment details

Debian Stretch

Issue

If we load the 'jobs' page of the Nomad UI for some of our regions (e.g. http://nomad.service.sv5.consul:4646/ui/jobs), clicking on a link causes the loading animation to spin forever.
Looking at the JS console, it appears that several jobs returned by the /jobs API call don't actually exist: the API call for the specific job returns a 404 (e.g. http://nomad.service.consul:4646/v1/job/hedwig-prod.queue_reports 404 (Not Found)).
There is then a JS error:
vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:7454 Uncaught TypeError: Cannot read property 'eachAttribute' of null
    at e.get (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:7454)
    at e.<anonymous> (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:8972)
    at e.<computed> [as createSnapshot] (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:8956)
    at vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:7872
    at r._fetchRecord (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:7874)
    at h (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:7880)
    at r._flushPendingFetchForType (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:7887)
    at Map.forEach (<anonymous>)
    at r.flushAllPendingFetches (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:7879)
    at t.invoke (vendor-d62e8ec23cd05cedaa719acf0f8b5554.js:4951)

My guess is that the API is not properly handling the 404 errors and needs some armor to prevent accessing object properties of a 'null' API response.
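To illustrate the kind of armor meant here, a minimal sketch follows. It is not the Nomad UI's actual code: `resolveJob` and the `{ status, body }` response shape are hypothetical, and only the `ID` and `Name` fields of a job are assumed. The idea is to check for a 404 or an empty body before reading any property, and to return a placeholder the page can still render.

```javascript
// Sketch only: resolveJob and the { status, body } shape are hypothetical,
// not taken from the Nomad UI codebase.
function resolveJob(stub, response) {
  // Treat a 404 or an empty body as "this job no longer exists" instead of
  // reading properties off null, which is what produces the TypeError.
  if (response.status === 404 || response.body == null) {
    return { id: stub.ID, name: stub.Name, missing: true };
  }
  return { id: response.body.ID, name: response.body.Name, missing: false };
}

// A job that the list call returned but the per-job call could not find.
const listed = { ID: "hedwig-prod.queue_reports", Name: "hedwig-prod.queue_reports" };
const resolved = resolveJob(listed, { status: 404, body: null });
console.log(resolved.missing); // true
```

With a placeholder like this, the jobs page could show the entry as missing rather than leaving the loading animation spinning.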

Labels: theme/ui, type/bug

All 6 comments

Thank you @gmichalec-pandora. This feels like a duplicate of https://github.com/hashicorp/nomad/issues/7698, but with additional information.

I'm a bit puzzled about how you got into a state where the jobs API returns non-existent jobs that result in 404s. I can think of cases where jobs are deleted or GCed after the jobs page has loaded. Did you run into another case as well?

Hi - thanks for the quick response!

Yes - we have multiple instances of this happening across several regions. I'm still trying to track down what is going on at the Nomad end with these jobs.
Some were bad job specs that specified a bad datacenter for the region. Another was an orphaned batch job whose parent no longer exists. Keep in mind we have been running these regions for several years now across multiple upgrades, so some of this may just be cruft.
Regardless of whatever is happening on the backend, it seems that the UI needs to be a little more resilient.

(In reply to Mahmood Ali's comment of 4/13/20, 4:54 PM, quoted in full above.)

Thanks! We are on it. Absolutely, the frontend needs to be resilient to such errors. We also appreciate the scenarios, so we can incorporate them into the test suite or consider them when doing manual testing.

Just to update - after purging all the jobs that were 404-ing (AFAICT none were actually running, and had not been submitted for over 10 months), the nomad UI jobs page works fine now for all our regions. So, in some ways, it was good to help us ID some of the cruft we had running :)

Hi @gmichalec-pandora, I too was able to reproduce this condition via orphaned child jobs. The child job tries to load its parent to satisfy the belongsTo relationship modeled in the UI, but since the parent is a 404, it goes 💥.

I'll make sure to patch this up.
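For anyone wanting to spot such orphans ahead of time, here is a rough sketch. It assumes the job stubs returned by the jobs list API carry `ID` and `ParentID` fields, with `ParentID` empty for top-level jobs, and that all stubs come from the same namespace. `findOrphanedChildren` is a hypothetical helper, not part of Nomad or its UI, and the job IDs below are made up for illustration.

```javascript
// Sketch only: findOrphanedChildren is a hypothetical helper.
// It returns the stubs whose ParentID points at a job absent from the list.
function findOrphanedChildren(stubs) {
  const knownIds = new Set(stubs.map((stub) => stub.ID));
  return stubs.filter((stub) => stub.ParentID && !knownIds.has(stub.ParentID));
}

// Made-up sample data: one healthy parent/child pair and one orphan.
const sampleStubs = [
  { ID: "reports", ParentID: "" },
  { ID: "reports/periodic-1", ParentID: "reports" },
  { ID: "cleanup/periodic-9", ParentID: "cleanup" },
];
console.log(findOrphanedChildren(sampleStubs).map((stub) => stub.ID));
```

This only catches children whose parent is missing from the list. It would not catch a job that the list returns but the per-job call 404s on, as in the original report.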

This should be fixed in 0.12.1 (See the explanation here).

I'm going to close this issue to centralize the conversation about this bug in #5936.
