Directly after installing the salt-minion on an AGX Xavier, the system fell into an unrecoverable boot loop.
N/A - didn't get far enough to configure the minion.
NVIDIA JetPack 4.3
current Salt release.
Boot log recovered via bootloader serial console
xavier_RIP_0.txt
@theunkn0wn1 I'm sorry, I see nothing related to Salt in your log. Do you have a way to exclude the possibility of hardware failure? I'd suggest installing a clean OS by the official instructions, without Salt, and performing RAM and storage tests.
@saltstack/team-core thoughts?
Wondering how it was installed: pip or bootstrap?
The only ARM packages we have are Debian (Raspbian) and those are for armhf.
Bootstrap has issues at the moment that are being worked on: it cannot select 3000, so you have to use latest when specifying the version.
Wondering if one of the dependencies that Salt needs had problems when it was installed and caused the boot loop; a classic case of executing 0 is possibly the cause.
Looking at the Xavier log, there are lots of bad hardware issues there, e.g.
[ 0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb2_carveout': base 0x0000000000000000, size 0 MiB
Appears to get better later, but
[ 19.708051] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 19.708340] CPU2: SError detected, daif=140, spsr=0x60400045, mpidr=80000100, esr=be000000
It takes quite a bit of load, reboot, reload, reboot to get the system up.
Wondering if the system is coming up cleanly; there are way too many hardware errors in the log for my liking.
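One way to triage a recovered serial log like this is to count the lines that match known hardware-error markers. The following is only a minimal sketch and not part of Salt or NVIDIA tooling: the function name `scan_boot_log` and the marker list are assumptions, with the patterns taken from the log lines quoted above.

```python
import re
import sys
from collections import Counter

# Hypothetical marker list, derived only from the log lines quoted in this thread.
HARDWARE_MARKERS = {
    "serror": re.compile(r"SError detected"),
    "ras_uncorrectable": re.compile(r"Uncorrectable Errors"),
    "reserved_memory": re.compile(r"failed to reserve memory"),
}


def scan_boot_log(lines):
    """Count how many log lines match each hardware-error marker."""
    counts = Counter()
    for line in lines:
        for name, pattern in HARDWARE_MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python scan_boot_log.py xavier_RIP_0.txt
    with open(sys.argv[1], errors="replace") as handle:
        for name, count in sorted(scan_boot_log(handle).items()):
            print(f"{name}: {count}")
```

A non-zero count does not prove faulty hardware by itself. It only shows how often each marker appears, so that a failing boot can be compared against a clean one.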
@DmitryKuzmenko This was done with an Xavier that is pretty much fresh out of the box. There is no indication the hardware is faulty beyond what is pointed out in the above logs.
To recover from the boot loop, we re-flashed the device with a clean OS image from Nvidia's official install guide & tooling. (The same image I attempted to install Salt via bootstrap on.)
@dmurphy18 the faulty install was done via the bootstrap installer.
I don't think the system was coming up cleanly, given how it was hard-resetting as it reached the login prompt.
I did a manual install via the Hacking guide and the minion appears to be behaving itself. (I was not aware the pip package existed; the install documentation didn't make it clear that was an option.)
I do suspect something in the bootstrap script installed an amd64 package, which most probably contributed to the boot loop. (I don't have logs, and right now we cannot reproduce the failure mode due to time constraints.)
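The amd64 suspicion was never verified in this thread. If it needs checking on a Debian-based system such as JetPack, one approach is to list every package whose architecture differs from the native one. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it relies only on the standard `dpkg --print-architecture` and `dpkg-query -W --showformat=...` commands.

```python
import subprocess


def foreign_packages(dpkg_listing, native_arch):
    """Return (package, arch) pairs whose arch is neither native nor 'all'."""
    mismatched = []
    for line in dpkg_listing.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        package, arch = parts
        if arch not in (native_arch, "all"):
            mismatched.append((package, arch))
    return mismatched


def native_architecture():
    """Ask dpkg for the machine's native architecture (e.g. arm64)."""
    result = subprocess.run(
        ["dpkg", "--print-architecture"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def dpkg_listing():
    """Ask dpkg-query for one 'package architecture' pair per line."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package} ${Architecture}\n"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On the device, `foreign_packages(dpkg_listing(), native_architecture())` should return an empty list if nothing was installed for another architecture.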
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Thank you for updating this issue. It is no longer marked as stale.
@dmurphy18 @DmitryKuzmenko please follow up here
@theunkn0wn1 Wondering if you were able to make any progress on this issue. Note that there was a new point release last week, Salt 3000.1, and you should be able to bootstrap that version.
Xavier hardware is unavailable to me, especially with the virus and everyone working remotely. However, if you have any new logs of failed attempts, I would be interested in reviewing them for any insight that might help resolve your issue.
@dmurphy18 Due to the virus we are working remotely, and I also currently do not have access to Xavier hardware.
After doing a manual install via the Hacking guide, the Salt minion on our Xavier has been working perfectly and we haven't yet encountered issues that can be attributed to a buggy Salt minion. The logs provided in the original post are all the logs I have available that pertain to this issue thread.
We were working against a deadline, so I was unable to risk disabling the Xavier again until that deadline passed, which happened to be around the same time the campus was shut down. Once the virus situation resolves and we are allowed back into the lab, I will attempt the bootstrap against 3000.1 (or newer, depending on how far out it is), and will update this post with the results.
@theunkn0wn1 ATM I don't have a good place to put this issue, so it is marked as blocked to show that we are waiting to hear from you and so that the stale bot won't close it (we are getting rid of that bot soon). I will follow up again as well. Do keep us posted.
Checking in - I don't see any comments here so leaving it open, will check on it again in 2 weeks time.
@sagetherage Regarding the last comment, we're waiting for the end of the pandemic, so set your timer to 1-2 months.
Turns out one of my classmates took the Xavier home with him when the campus shut down so he could continue working with it, and I obtained it from him.
Currently preparing it with a clean JetPack image and will update soon :tm:
@sagetherage
Update: Installed a clean JetPack 4.3 onto the Xavier, then used the bootstrap installer for Salt as described here
Xavier survived a reboot and was able to successfully connect with a Salt master; appears to be functional.
root@83e392e64949:/# salt '*' test.version
xavier:
2019.8.0-69-g637fe0b
The only issue I see is that Salt installed itself against python2 instead of the system's python3 interpreter, but that's an out-of-scope issue.
@theunkn0wn1 Thank you for checking it! Much appreciated. Can we do anything more on this?
@DmitryKuzmenko As far as I can see the original issue has been resolved.
Going to close this now, thanks for the support!