Nixpkgs: amdgpu driver failed to initialize

Created on 17 Dec 2018 · 18 Comments · Source: NixOS/nixpkgs

Issue description

The amdgpu module is loaded but not used. lsmod shows:

amdgpu 0

The module can be loaded and unloaded. The two-monitor configuration does not work. Consoles 1 to 6 are blank; only console 7, which runs X, is active.

Steps to reproduce

  1. Update the system from 18.09.1571.7795a7ad5f0 (Jellyfish) to 18.09.1676.7e88992a8c7 (Jellyfish)
  2. Reboot into the new system
  3. The driver is not initialized. The two-monitor configuration does not work. Consoles 1 to 6 are blank; only console 7, which runs X, is active.

No hardware configuration was changed in /etc/nixos/; I only added some software to the list of packages.

Technical details

This is the working configuration:

 - system: `"x86_64-linux"`
 - host os: `Linux 4.14.86, NixOS, 18.09.1571.7795a7ad5f0 (Jellyfish)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.1.3`
 - channels(root): `"nixos-18.09.1676.7e88992a8c7"`
 - channels(razor): `""`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

To get the details for the broken one I need to reboot.

regression hardware nixos

All 18 comments

Hmm, the only changes between those two revisions that seem relevant are stable kernel bumps.

@fpletz could changes in systemd or in some service be the reason?

I see some changes in https://github.com/NixOS/nixpkgs/compare/7795a7ad5f0...7e88992a8c7#diff-57c9ee2975e830e72ffa9f3b1e3f10b2R13

But I don't understand what those changes mean or how they could affect amdgpu driver initialization.

Can anyone tell me how to bisect to find the commit that caused this?
How do I pin nix-channel to a specific commit in nixpkgs?
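One way to approach the bisect asked about here, sketched under some assumptions: instead of pinning the channel, keep a local git checkout of nixpkgs and point nixos-rebuild at it with -I. The helper name rebuild_from_checkout and the DRY_RUN switch are hypothetical, introduced only for illustration; the two revisions are the good and bad ones from this report.

```shell
# Hypothetical helper: build the system from a local nixpkgs checkout and make
# it the default boot entry. Set DRY_RUN=1 to print the command instead.
rebuild_from_checkout() {
    checkout=$1
    set -- nixos-rebuild boot -I "nixpkgs=$checkout"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Typical session (run manually, as root, inside the nixpkgs checkout):
#   git bisect start 7e88992a8c7 7795a7ad5f0   # bad revision first, then good
#   rebuild_from_checkout "$PWD" && reboot
#   git bisect good    # or "git bisect bad" if amdgpu failed to initialize
# Repeat the rebuild/reboot/mark cycle until git names the first bad commit.
```

Because the failure only shows up after a reboot, each step has to be marked by hand; git bisect run does not fit this case.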

I've rolled back the nix-channel and the amdgpu driver works. So /etc/nixos is now fully identical for the working and the broken configuration; the only difference is the nix channel generation. The previous generation works, but the latest one does not.

# nix-env -p /nix/var/nix/profiles/per-user/root/channels --list-generations
   1   2018-11-25 18:45:20
   2   2018-11-25 21:05:03
   3   2018-11-27 16:09:11
   4   2018-11-27 16:21:43
   5   2018-12-06 01:55:17
   6   2018-12-07 17:49:25   (current)
   7   2018-12-15 20:10:26

@fpletz I bisected nixpkgs and found the first bad commit:

99315692f699d89719dbc0fe8c0fc6f991e4843a is the first bad commit
commit 99315692f699d89719dbc0fe8c0fc6f991e4843a
Author: Tim Steinbach <[email protected]>
Date:   Thu Dec 13 06:58:07 2018 -0500

    linux: 4.14.87 -> 4.14.88

    (cherry picked from commit f335fa6d74cf6ec7ace4d2fe7ef28e030126064e)

:040000 040000 603e84d9234cfb7bce102a4d33934d7bc564c947 3d9c8aed7a1dd5cf3b7902bc7cd00db8bb5216e8 M      pkgs

As you expected, it is a kernel issue.

@NeQuissimus hello, it looks like this is your commit.

Thank you very much for looking into the issue yourself. There has since been a new kernel bump on the 18.09 branch to 4.14.89: 5bfaa294ea375e1b2759e90c88ec50335bdbcc9a

Does this maybe fix your issue?

I seem to have run into this too on 4.14.88.
X failed to start and the logs show:

Dec 18 10:46:48 fangorn kernel: amdgpu 0000:01:00.0: Direct firmware load for amdgpu/polaris11_k_mc.bin failed with error -2
Dec 18 10:46:48 fangorn kernel: mc: Failed to load firmware "amdgpu/polaris11_k_mc.bin"
Dec 18 10:46:48 fangorn kernel: [drm:gmc_v8_0_sw_init [amdgpu]] *ERROR* Failed to load mc firmware!
Dec 18 10:46:48 fangorn kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -2
Dec 18 10:46:48 fangorn kernel: amdgpu 0000:01:00.0: amdgpu_init failed
Dec 18 10:46:48 fangorn kernel: amdgpu 0000:01:00.0: Fatal error during GPU init
Dec 18 10:46:48 fangorn kernel: [drm] amdgpu: finishing device.
Dec 18 10:46:48 fangorn kernel: [TTM] Memory type 2 has not been initialized
Dec 18 10:46:48 fangorn kernel: amdgpu: probe of 0000:01:00.0 failed with error -2

Sadly, 4.14.89 from https://github.com/NixOS/nixpkgs/commit/5bfaa294ea375e1b2759e90c88ec50335bdbcc9a did not resolve it for me:

Dec 18 12:51:28 fangorn kernel: Linux version 4.14.89 (nixbld@localhost) (gcc version 7.3.0 (GCC)) #1-NixOS SMP Mon Dec 17 08:28:56 UTC 2018
...
Dec 18 12:51:29 fangorn kernel: amdgpu 0000:01:00.0: Direct firmware load for amdgpu/polaris11_k_mc.bin failed with error -2
Dec 18 12:51:29 fangorn kernel: mc: Failed to load firmware "amdgpu/polaris11_k_mc.bin"
Dec 18 12:51:29 fangorn kernel: [drm:gmc_v8_0_sw_init [amdgpu]] *ERROR* Failed to load mc firmware!
Dec 18 12:51:29 fangorn kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -2
Dec 18 12:51:29 fangorn kernel: amdgpu 0000:01:00.0: amdgpu_init failed
Dec 18 12:51:29 fangorn kernel: amdgpu 0000:01:00.0: Fatal error during GPU init
Dec 18 12:51:29 fangorn kernel: [drm] amdgpu: finishing device.
Dec 18 12:51:29 fangorn kernel: [TTM] Memory type 2 has not been initialized
Dec 18 12:51:29 fangorn kernel: amdgpu: probe of 0000:01:00.0 failed with error -2
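Error -2 in these lines is -ENOENT, meaning the kernel could not find the requested file. As a rough sketch, the missing file names can be pulled out of a kernel log like the ones above with a small filter; the function name missing_firmware is made up for this example.

```shell
# Read kernel log text on stdin and print each firmware file the kernel
# failed to load, one per line, without duplicates.
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error.*/\1/p' | sort -u
}

# Example: journalctl -k -b | missing_firmware
```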

@fpletz 5bfaa294ea375e1b2759e90c88ec50335bdbcc9a changes nothing; the driver still doesn't work.

Trying 4.19.9 with boot.kernelPackages = pkgs.linuxPackages_latest didn't work either.

Looks like linux-firmware might be out of date: my broken amdgpu.ko references firmware=amdgpu/polaris11_k_mc.bin which AFAICT was added in https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=6cca1381f328e7df55ae8bb8ac515b945d35f9f5.
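A quick way to confirm this kind of mismatch is to compare the firmware names a module declares against what is actually installed. The sketch below is only an illustration: report_missing is a hypothetical helper, and /run/current-system/firmware is assumed to be the firmware directory on the NixOS system in question (adjust it if yours differs, or if firmware files are stored compressed).

```shell
# Read firmware file names on stdin (one per line) and report those that do
# not exist under the firmware directory given as $1.
report_missing() {
    fwdir=$1
    while IFS= read -r fw; do
        [ -e "$fwdir/$fw" ] || printf 'missing: %s\n' "$fw"
    done
}

# Example (firmware directory is an assumption, see above):
#   modinfo -F firmware amdgpu | report_missing /run/current-system/firmware
```

If the module lists a file such as amdgpu/polaris11_k_mc.bin and the check reports it missing, the installed linux-firmware predates what the kernel expects.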

Yep, updating the firmware derivation to use the above commit worked for me.

Thanks! We've bumped the firmware on master a few days ago anyway. I'll backport to 18.09 to fix the issue.

Ok, the firmware updates shouldn't be that intrusive. Backported to 18.09. Thank you for reporting and debugging this! :+1:

Commit fbb7dbdb95d04f1517ec8011a3332a8c4c0e86e6 works. Thanks.
