Module `amdgpu` is loaded but not used. `lsmod` shows `amdgpu 0`. The module can be loaded and unloaded. The two-monitor configuration does not work. Consoles 1 to 6 are empty; only console 7, with X, is active.
The update was from `18.09.1571.7795a7ad5f0 (Jellyfish)` to `18.09.1676.7e88992a8c7 (Jellyfish)`.
No hardware configuration was changed in `/etc/nixos/`; I only added some software to the list of packages.
This is the working configuration:
- system: `"x86_64-linux"`
- host os: `Linux 4.14.86, NixOS, 18.09.1571.7795a7ad5f0 (Jellyfish)`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.1.3`
- channels(root): `"nixos-18.09.1676.7e88992a8c7"`
- channels(razor): `""`
- nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
To get the broken one I need to reboot.
Hmm, the only changes between those two revisions that seem relevant are stable kernel bumps.
@fpletz could changes in systemd or in some service be the reason?
I see some changes in https://github.com/NixOS/nixpkgs/compare/7795a7ad5f0...7e88992a8c7#diff-57c9ee2975e830e72ffa9f3b1e3f10b2R13
but I don't understand what those changes mean or how they could affect `amdgpu` driver initialization.
Can anyone tell me how to bisect to find the commit that caused this?
How do I set nix-channel to a specific commit in nixpkgs?
I've rolled back nix-channel and the `amdgpu` driver works. So now the contents of `/etc/nixos` are fully identical for the working and broken configurations. The difference is the nix channel generation: the previous generation works, but the latest does not.
# nix-env -p /nix/var/nix/profiles/per-user/root/channels --list-generations
1 2018-11-25 18:45:20
2 2018-11-25 21:05:03
3 2018-11-27 16:09:11
4 2018-11-27 16:21:43
5 2018-12-06 01:55:17
6 2018-12-07 17:49:25 (current)
7 2018-12-15 20:10:26
@fpletz I bisected nixpkgs and found the first bad commit:
99315692f699d89719dbc0fe8c0fc6f991e4843a is the first bad commit
commit 99315692f699d89719dbc0fe8c0fc6f991e4843a
Author: Tim Steinbach <[email protected]>
Date: Thu Dec 13 06:58:07 2018 -0500
linux: 4.14.87 -> 4.14.88
(cherry picked from commit f335fa6d74cf6ec7ace4d2fe7ef28e030126064e)
:040000 040000 603e84d9234cfb7bce102a4d33934d7bc564c947 3d9c8aed7a1dd5cf3b7902bc7cd00db8bb5216e8 M pkgs
As you expected, it is a kernel issue.
@NeQuissimus hello, it looks like this is your commit.
Thank you very much for looking into the issue yourself. There has since been a new kernel bump on the 18.09 branch to 4.14.89: 5bfaa294ea375e1b2759e90c88ec50335bdbcc9a
Does this maybe fix your issue?
I seem to have run into this too on 4.14.88.
X failed to start and logs show
Dec 18 10:46:48 fangorn kernel: amdgpu 0000:01:00.0: Direct firmware load for amdgpu/polaris11_k_mc.bin failed with error -2
Dec 18 10:46:48 fangorn kernel: mc: Failed to load firmware "amdgpu/polaris11_k_mc.bin"
Dec 18 10:46:48 fangorn kernel: [drm:gmc_v8_0_sw_init [amdgpu]] *ERROR* Failed to load mc firmware!
Dec 18 10:46:48 fangorn kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -2
Dec 18 10:46:48 fangorn kernel: amdgpu 0000:01:00.0: amdgpu_init failed
Dec 18 10:46:48 fangorn kernel: amdgpu 0000:01:00.0: Fatal error during GPU init
Dec 18 10:46:48 fangorn kernel: [drm] amdgpu: finishing device.
Dec 18 10:46:48 fangorn kernel: [TTM] Memory type 2 has not been initialized
Dec 18 10:46:48 fangorn kernel: amdgpu: probe of 0000:01:00.0 failed with error -2
Sadly, 4.14.89 from https://github.com/NixOS/nixpkgs/commit/5bfaa294ea375e1b2759e90c88ec50335bdbcc9a did not resolve it for me:
Dec 18 12:51:28 fangorn kernel: Linux version 4.14.89 (nixbld@localhost) (gcc version 7.3.0 (GCC)) #1-NixOS SMP Mon Dec 17 08:28:56 UTC 2018
...
Dec 18 12:51:29 fangorn kernel: amdgpu 0000:01:00.0: Direct firmware load for amdgpu/polaris11_k_mc.bin failed with error -2
Dec 18 12:51:29 fangorn kernel: mc: Failed to load firmware "amdgpu/polaris11_k_mc.bin"
Dec 18 12:51:29 fangorn kernel: [drm:gmc_v8_0_sw_init [amdgpu]] *ERROR* Failed to load mc firmware!
Dec 18 12:51:29 fangorn kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -2
Dec 18 12:51:29 fangorn kernel: amdgpu 0000:01:00.0: amdgpu_init failed
Dec 18 12:51:29 fangorn kernel: amdgpu 0000:01:00.0: Fatal error during GPU init
Dec 18 12:51:29 fangorn kernel: [drm] amdgpu: finishing device.
Dec 18 12:51:29 fangorn kernel: [TTM] Memory type 2 has not been initialized
Dec 18 12:51:29 fangorn kernel: amdgpu: probe of 0000:01:00.0 failed with error -2
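For anyone checking their own logs for the same symptom, here is a minimal Python sketch that pulls the failing firmware names out of kernel log lines like the ones above. The function name `failed_firmware` and the regular expression are my own illustration, not part of any tool mentioned in this thread. Error `-2` is `ENOENT`, meaning the kernel could not find the file.

```python
import re

# Matches the kernel message "Direct firmware load for <name> failed with error <n>".
FIRMWARE_FAILURE = re.compile(
    r"Direct firmware load for (?P<name>\S+) failed with error (?P<errno>-?\d+)"
)


def failed_firmware(log_lines):
    """Return a sorted list of unique (firmware name, error code) pairs (hypothetical helper)."""
    found = set()
    for line in log_lines:
        match = FIRMWARE_FAILURE.search(line)
        if match:
            found.add((match.group("name"), int(match.group("errno"))))
    return sorted(found)


# Two lines copied from the log above; only the first matches the pattern.
sample = [
    "Dec 18 12:51:29 fangorn kernel: amdgpu 0000:01:00.0: Direct firmware load "
    "for amdgpu/polaris11_k_mc.bin failed with error -2",
    'Dec 18 12:51:29 fangorn kernel: mc: Failed to load firmware "amdgpu/polaris11_k_mc.bin"',
]
print(failed_firmware(sample))  # [('amdgpu/polaris11_k_mc.bin', -2)]
```

The same function can be fed the lines of a saved `dmesg` or `journalctl -k` dump.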
@fpletz `5bfaa294ea375e1b2759e90c88ec50335bdbcc9a` changes nothing; the driver doesn't work.
Trying `4.19.9` with `boot.kernelPackages = pkgs.linuxPackages_latest` didn't work either.
Looks like linux-firmware might be out of date: my broken `amdgpu.ko` references `firmware=amdgpu/polaris11_k_mc.bin`, which AFAICT was added in https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=6cca1381f328e7df55ae8bb8ac515b945d35f9f5.
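A rough way to confirm this kind of mismatch is to compare the firmware names a module declares against the files actually present in the firmware directory. The Python sketch below assumes you already have the declared names as a list (for example from `modinfo -F firmware amdgpu`) and a firmware root directory; the helper name `missing_firmware` is hypothetical, and the demo uses a temporary directory rather than a real system path.

```python
import tempfile
from pathlib import Path


def missing_firmware(declared, firmware_root):
    """Return the declared firmware names that have no file under firmware_root."""
    root = Path(firmware_root)
    return [name for name in declared if not (root / name).is_file()]


# Demo with a throwaway directory standing in for an outdated firmware package:
# one firmware file exists, the newer "_k_" variant does not.
with tempfile.TemporaryDirectory() as root:
    present = Path(root) / "amdgpu" / "polaris11_mc.bin"
    present.parent.mkdir(parents=True)
    present.touch()
    declared = ["amdgpu/polaris11_mc.bin", "amdgpu/polaris11_k_mc.bin"]
    print(missing_firmware(declared, root))  # ['amdgpu/polaris11_k_mc.bin']
```

Any name this returns is one the kernel module may request but the installed firmware package cannot supply, which matches the `-2` load failure in the logs above.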
Yep, updating the firmware derivation to use the above commit worked for me.
Thanks! We bumped the firmware on master a few days ago anyway. I'll backport to 18.09 to fix the issue.
Ok, the firmware updates shouldn't be that intrusive. Backported to 18.09. Thank you for reporting and debugging this! :+1:
Commit `fbb7dbdb95d04f1517ec8011a3332a8c4c0e86e6` works. Thanks.