Describe the bug
Pytorch fails to build
```
Executing pythonImportsCheckPhase
Check whether the following modules can be imported: torch
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <lambda>
  File "/nix/store/2dcsn57cgaxs92ha5swihrab0g3l2h6g-python3-3.7.7/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/nix/store/i6fkycsn1q8vk4190hslrvh256wilw5i-python3.7-pytorch-1.4.1/lib/python3.7/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: /nix/store/i6fkycsn1q8vk4190hslrvh256wilw5i-python3.7-pytorch-1.4.1/lib/python3.7/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t
builder for '/nix/store/lia86jj07nygz23ca3s1gxahx62ipls2-python3.7-pytorch-1.4.1.drv' failed with exit code 1
cannot build derivation '/nix/store/ia0jgpq5w2myqpp8zha2xj7h3f5cj5nn-python3-3.7.7-env.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/hxp00cp2lvhhw677d1v8zbp5iv3cqpy1-home-manager-path.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/gcpmscdc0d5h3fg5m6ik411q6hhsa0mz-home-manager-generation.drv': 1 dependencies couldn't be built
error: build of '/nix/store/gcpmscdc0d5h3fg5m6ik411q6hhsa0mz-home-manager-generation.drv' failed
```
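Note that the compilation itself succeeded; the failure comes from the import check that runs afterwards. Judging from the traceback frames (`<string>`, `<lambda>`, `importlib.import_module`), that phase simply tries to import each listed module. The snippet below is a rough approximation for re-running the check by hand, not the actual nixpkgs hook. The function name `check_imports` is hypothetical, and unlike the real phase (which stops at the first failure) it collects every failing module.

```python
import importlib


def check_imports(modules):
    """Try to import each module name.

    Returns a dict mapping each module that failed to import to its error
    message. ModuleNotFoundError is a subclass of ImportError, so both a
    missing module and a broken shared library (such as the undefined
    symbol above) end up in the result.
    """
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures
```

Calling `check_imports(["torch"])` from a Python environment containing the affected build should report the same `undefined symbol` ImportError shown in the log.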
To Reproduce
Steps to reproduce the behavior:
1. Set .config/nixpkgs/config.nix to:
```nix
{
  allowUnfree = true;
  cudaSupport = true;
}
```
2. Run `nix-build '<nixpkgs>' -A python3Packages.pytorch`

Expected behavior
Pytorch should build without failure.
Notify maintainers
@teh @thoughtpolice @tscholak
Metadata
$ nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
- host os: `Linux 5.4.35, NixOS, 20.09pre227577.135073a87b7 (Nightingale)`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.3.5`
- channels(theo): `"home-manager"`
- channels(root): `"nixos-20.09pre227577.135073a87b7"`
- channels(arsleust): `"home-manager"`
- nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2

Maintainer information:
# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
I wonder if this is the same issue I was seeing during my many nixpkgs-review runs for #77714.
Attaching the error I encountered just in case:
python37Packages.pytorchWithCuda.log
There were also some failures in a nixpkgs-review log that @jonringer posted, but I wasn't sure what to make of them, or whether they were further builds that failed when run with 128 cores, as mentioned here.
Your log (python37Packages.pytorchWithCuda.log) shows exactly the same error, for the same symbol...
So if you now say:
Looks like pytorch and batchgenerators build fine on master with Pillow 7.1.2.
I'll try to rerun a build now
I need to stop posting so late.
I did not realize the same symbol was there.
And yes, everything seemed to build fine on Hydra.
I suppose there's a chance this is an issue with the build environment, though I can't imagine what that might be.
Same error on 20.09pre228204.467ce5a9f45
To my knowledge, Hydra builds PyTorch without CUDA; I believe @tbenst was working on a Hydra for CUDA-related libraries. Since `cuda` and `nccl` both appear in the name of the missing symbol `_ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t`, I think this issue is related to CUDA.
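The symbol is an Itanium-ABI mangled C++ name, and demangling it makes the CUDA/NCCL connection explicit: it decodes to `torch::cuda::nccl::detail::throw_nccl_error(ncclResult_t)`. The standard tool for this is `c++filt` from GNU binutils. The sketch below is only a toy decoder for the narrow pattern used here (a `_ZN...E` nested name followed by length-prefixed parameter type names); the function name is hypothetical and it rejects anything outside that subset.

```python
import re


def demangle_nested_function(symbol: str) -> str:
    """Decode a tiny subset of Itanium C++ ABI mangled names.

    Supports only _ZN<len><id>...E followed by parameters that are plain
    length-prefixed names (classes/typedefs). Raises ValueError otherwise.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...) are supported")

    def read_source_name(pos):
        # A source name is a decimal length followed by that many characters.
        m = re.match(r"\d+", symbol[pos:])
        if not m:
            raise ValueError(f"expected a length prefix at offset {pos}")
        length = int(m.group(0))
        start = pos + len(m.group(0))
        name = symbol[start:start + length]
        if len(name) != length:
            raise ValueError("truncated identifier")
        return name, start + length

    pos = 3
    scopes = []
    # Read namespace/class components until the 'E' that closes the nested name.
    while pos < len(symbol) and symbol[pos] != "E":
        name, pos = read_source_name(pos)
        scopes.append(name)
    if pos >= len(symbol):
        raise ValueError("unterminated nested name")
    pos += 1  # skip the closing 'E'

    # Whatever follows is the parameter list (only named types supported).
    params = []
    while pos < len(symbol):
        name, pos = read_source_name(pos)
        params.append(name)
    return "::".join(scopes) + "(" + ", ".join(params) + ")"
```

For example, `demangle_nested_function("_ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t")` yields `torch::cuda::nccl::detail::throw_nccl_error(ncclResult_t)`, i.e. an NCCL error helper in PyTorch's CUDA code that `libtorch_python.so` expects but no loaded library provides.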
OK, I have found this: https://github.com/pytorch/pytorch/issues/32638
It was corrected by https://github.com/pytorch/pytorch/commit/58cffbff91844217b9f4546e50ea0ab95544cb6a
which is part of milestone 1.4.1: https://github.com/pytorch/pytorch/milestone/14
but 1.5.0 seems to be on the way and the issue is still not closed.
Anyway, I will investigate upstream and open a PR to update accordingly.
Since https://github.com/NixOS/nixpkgs/pull/89802 was merged and I'm able to build this on master:
```
[16:46:18] jon@nixos ~/projects/nixpkgs (master)
$ nix-build -A python3Packages.pytorch
/nix/store/nr2p2w6mrs8rqnj43ngzw3jq5gixsjai-python3.7-pytorch-1.5.0
```
I'm going to consider this issue resolved.