Describe the bug
X11 as a build dependency causes run-time functionality to fail "out of the box" on a largely vanilla NixOS installation (running a Skylake chip).
To Reproduce
Steps to reproduce the behavior:
The samples don't work because there is an actual bona fide implementation of PutSurface available, but it's failing (possibly a transitive dependency, possibly an environmental issue, possibly even a bug in the driver itself - I haven't got time today to debug this - my NixOS setup is new and clean though, so I'd expect this to work out of the box).
libva info: VA-API version 1.5.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /run/opengl-driver/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_5
libva error: No valid vtable entry for vaPutSurface
libva error: /run/opengl-driver/lib/dri/iHD_drv_video.so init failed
libva info: va_openDriver() returns -1
Expected behavior
Additional context
(My daily job involves using quicksync/etc for development purposes, so I'll have to get to the bottom of this one way or another, but documenting the rough problem for others googling will have to do for now.)
Metadata
[robashton@ashton-nuc:~]$ nix run nixpkgs.nix-info -c nix-info -m
"x86_64-linux"
Linux 4.19.82, NixOS, 19.09.1149.107e2b7b29f (Loris)
yes
yes
nix-env (Nix) 2.3
/nix/var/nix/profiles/per-user/root/channels/nixos
Maintainer information:
# a list of nixos modules affected by the problem
module: intel-media-driver
Removing the X11 dependency means that the offending function isn't used and is replaced with a dummy - my override looks like this and I'm able to use quicksync effectively.
  (pkgs.intel-media-driver.overrideAttrs (oldAttrs: {
    name = "intel-media-driver";
    buildInputs = [ libva-full libdrm libva-utils xorg.libpciaccess intel-gmmlib ];
    cmakeFlags = [
      "-DINSTALL_DRIVER_SYSCONF=OFF"
      "-DLIBVA_DRIVERS_PATH=${placeholder "out"}/lib/dri"
    ];
  }))
];
Note: I brought back in the tests (the tests failing are a sure-fire way of knowing you shouldn't be running this driver in the first place or it's broken), and libdrm is specified as a dependency on the repo so I dragged that in as well for completeness.
The next debug step will probably be to bring in X11 again, and strace or even debug the driver code to find out why it doesn't work. I'll report back if I learn more.
(Should also add that I'm 80% sure that this is down to my unfamiliarity with NixOS and it'll turn out I'm building the samples incorrectly.)
So - I guess we're out of "Bug" and more into "What is our expected behaviour/usage" here.
I can get code that uses the driver working if I add libX11 as a dependency to my nix-shell and add it to LD_LIBRARY_PATH (taking advantage of the fact that it's in CMAKE_LIBRARY_PATH).
Nowhere does ldd tell me that any of the X libraries are a dependency, or no doubt they'd be munged with absolute paths too - I guess there is a canonical way of dealing with these transitive dynamic dependencies that I'll find or somebody will explain to me in due course.
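The nix-shell workaround described above could look roughly like the following. This is only a sketch, not the reporter's actual file: the file name shell.nix, the use of pkgs.mkShell, and libva as a build input are assumptions made for illustration.

```nix
# shell.nix (hypothetical): a development shell in which libX11 can be
# found by dlopen() at run time.
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  # Assumed inputs for code that talks to VA-API; adjust to the real project.
  buildInputs = [ pkgs.libva pkgs.xorg.libX11 ];

  # ldd does not list libX11 because it is loaded dynamically, so its
  # directory is exposed to the dynamic loader through LD_LIBRARY_PATH.
  LD_LIBRARY_PATH = pkgs.lib.makeLibraryPath [ pkgs.xorg.libX11 ];
}
```

Entering the shell with nix-shell and running the program from there should then let the dlopen call resolve libX11, at the cost of relying on an environment variable rather than on paths baked into the binary.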
Do you know what the "driver" is trying to do around this? What I can see easily is that vaPutSurface symbol is defined in libva-x11.so – and that one is linked with libX11 through correct absolute paths.
I have looked further into it - and it looks like the driver helpfully calls out to dlopen for a few of these libraries, which are not linked - they're just made available during the build process.
I think the correct approach, from my reading - is to add options to the nix package for enableX11 (and enableDrm - libva-drm being loaded in much the same way), and then to explicitly add them to the linking process via additional flags to the build process if they've been enabled.
In this way, they'll actually be linked sensibly and calls to dlopen will then succeed (in theory, from my reading, etc etc - I'm sure there is a gap between what I've understood and how it works in practice, but it's in this neighborhood for sure?)
Yes, it is possible to use patchelf to extend the search path list so that dlopen calls succeed – ideally in postFixup phase, as "unused" components get auto-pruned by default (dynamic dlopen isn't detected).
I'll do some learning, and then submit a PR which
a) Makes the dlopened libraries optional
b) sorts out their opening successfully if included (ideally in that postFixup phase that I'm going to learn about)
Typically you use patchelf in this way:
{
  postFixup = ''
    patchelf --set-rpath "$(patchelf --print-rpath $out/.../some-ELF):${stdenv.lib.makeLibraryPath [ package1 package2 ]}" \
      $out/.../that-same-ELF
  '';
}
Ah great - I've been reading examples using GH's code search, but that's a lot clearer, thanks.
Well, I've got something that works locally. I haven't worked out how to get the iteration time down: the examples with nix-env I saw don't let me do the test I want to do (postFixup), so I have to keep building the darned thing from scratch to get that step to run in a directory where it's allowed. I'm sure I'll work that out at some point.
Anyway, the culprit seems to be libigfxcmrt.so, and I can probably get away with just patching that library (rpaths and such not being inherited for dynamically loaded libraries). I've got a nice simple test set up once the build takes place, so I just need to narrow down which libraries are dynamically loaded and under what circumstances, and then make those libraries optional if they are. I'm not convinced libdrm is optional, so that might just be X11 in the end.
After playing around and repeatedly running my little test, I narrowed it down and everything actually works just fine patching the iHD driver.
The intel samples themselves need re-writing with patchelf if they're to run, but my own code works fine out of the box - presumably because I'm not doing anything special and everything is being built sensibly within a nix-shell context.
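For readers who land here, patching the iHD driver as described in this comment might be sketched as below. This is an assumption-laden illustration, not the change that was actually submitted: the enableX11 argument is hypothetical, and the driver path is inferred from the LIBVA_DRIVERS_PATH flag and the iHD_drv_video.so name in the log earlier in this issue.

```nix
# Hypothetical override: append libX11 to the RPATH of the iHD driver so
# that dlopen() calls made from inside the driver can find it.
{ pkgs ? import <nixpkgs> { }, enableX11 ? true }:

pkgs.intel-media-driver.overrideAttrs (old: {
  # Keep any existing postFixup and only add the patch when X11 is wanted.
  postFixup = (old.postFixup or "") + pkgs.lib.optionalString enableX11 ''
    # Assumed install location of the driver inside the output.
    driver="$out/lib/dri/iHD_drv_video.so"
    # Preserve the current RPATH and add libX11's lib directory after it.
    patchelf --set-rpath \
      "$(patchelf --print-rpath "$driver"):${pkgs.lib.makeLibraryPath [ pkgs.xorg.libX11 ]}" \
      "$driver"
  '';
})
```

Doing this in postFixup matters because, as noted earlier in the thread, the default fixup step prunes RPATH entries it considers unused, and libraries reached only through dlopen are not detected as used.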
Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
This is still important to me...
@primeos ? (the bot told me to)
Yes, I'm the new maintainer. Unfortunately I'm already a bit busy with other stuff atm and haven't read the whole issue yet, but patches would be welcome.
Does vainfo from libva-utils work correctly on your system? On my setup (KBL CPU) VA-API works fine, so I'm not sure if I could reproduce this (but I also never tested the intel-media-sdk samples - only vainfo, mpv and Chromium).
To be clear, there's a PR already: https://github.com/NixOS/nixpkgs/pull/73277
sorry, I should have led with that :)