Nixpkgs: Can't build Pytorch

Created on 3 Jun 2020 · 6 Comments · Source: NixOS/nixpkgs

Describe the bug
Pytorch fails to build

Executing pythonImportsCheckPhase
Check whether the following modules can be imported: torch
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <lambda>
  File "/nix/store/2dcsn57cgaxs92ha5swihrab0g3l2h6g-python3-3.7.7/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/nix/store/i6fkycsn1q8vk4190hslrvh256wilw5i-python3.7-pytorch-1.4.1/lib/python3.7/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: /nix/store/i6fkycsn1q8vk4190hslrvh256wilw5i-python3.7-pytorch-1.4.1/lib/python3.7/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t
builder for '/nix/store/lia86jj07nygz23ca3s1gxahx62ipls2-python3.7-pytorch-1.4.1.drv' failed with exit code 1
cannot build derivation '/nix/store/ia0jgpq5w2myqpp8zha2xj7h3f5cj5nn-python3-3.7.7-env.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/hxp00cp2lvhhw677d1v8zbp5iv3cqpy1-home-manager-path.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/gcpmscdc0d5h3fg5m6ik411q6hhsa0mz-home-manager-generation.drv': 1 dependencies couldn't be built
error: build of '/nix/store/gcpmscdc0d5h3fg5m6ik411q6hhsa0mz-home-manager-generation.drv' failed
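
For context, the failing phase only tries to import the listed modules (here, torch) and aborts the build if any import raises. The following is a rough sketch of an equivalent check, not the nixpkgs implementation; the helper name check_imports is hypothetical and chosen only for illustration.

```python
import importlib


def check_imports(modules):
    """Try to import each named module.

    Returns a dict mapping each module that failed to import to its error
    message. An empty dict means every import succeeded.
    """
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # A missing module and an unresolved shared-library symbol
            # (as in the log above) both surface as ImportError.
            failures[name] = str(exc)
    return failures
```

Running check_imports(["torch"]) inside the broken environment would be expected to report the same undefined-symbol message as the log, while an empty result would indicate the import succeeded.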

To Reproduce
Steps to reproduce the behavior:

  1. Set .config/nixpkgs/config.nix to
{
    allowUnfree = true;
    cudaSupport = true;
}
  2. Switch to nixpkgs version 20.09pre227577.135073a87b7
  3. nix-build '<nixpkgs>' -A python3Packages.pytorch

Expected behavior
Pytorch should build without failure.

Notify maintainers

@teh @thoughtpolice @tscholak

Metadata

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.4.35, NixOS, 20.09pre227577.135073a87b7 (Nightingale)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.5`
 - channels(theo): `"home-manager"`
 - channels(root): `"nixos-20.09pre227577.135073a87b7"`
 - channels(arsleust): `"home-manager"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
  • NVIDIA SMI working
    NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
bug python

All 6 comments

I wonder if this is the sort of issue I was seeing during my many nixpkgs-review runs for #77714.
Attaching the error I encountered just in case.
python37Packages.pytorchWithCuda.log

There were also some failures in a nixpkgs-review log that @jonringer posted, but I wasn't sure what to make of them, or whether they were further builds that failed when building with 128 cores, as mentioned here.

Your log (python37Packages.pytorchWithCuda.log) gives exactly the same error for the same symbol...

So if you now say "Looks like pytorch and batchgenerators build fine on master with Pillow 7.1.2.", I'll try to rerun a build now.

I need to stop posting so late.
I did not realize the same symbol was there.

And yes, everything seemed to build fine on hydra.
I suppose there's a chance this is an issue with the build environment, though I can't imagine what that might be.

Same error on 20.09pre228204.467ce5a9f45

To my knowledge, Hydra builds pytorch without CUDA; I believe @tbenst was working on a Hydra for CUDA-related libs. Since cuda and nccl are part of the symbol name _ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t, I think this issue is related to CUDA.
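
The namespaces can be read straight out of the mangled name. In practice a tool such as c++filt from binutils does this; the sketch below is only a minimal illustration that handles the simple length-prefixed nested-name form seen here (no templates or substitutions), and decode_nested_name is a hypothetical helper.

```python
import re


def decode_nested_name(symbol):
    """Decode a simple Itanium-mangled nested name.

    Handles only symbols of the form _ZN<len><name>...E<rest>, where every
    component is a plain length-prefixed identifier. Returns the list of
    name components and the still-encoded remainder after the closing 'E'.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError("unsupported mangling construct")
        length = int(match.group())
        start = pos + match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    if pos >= len(symbol):
        raise ValueError("missing closing 'E'")
    return parts, symbol[pos + 1:]


symbol = "_ZN5torch4cuda4nccl6detail16throw_nccl_errorE12ncclResult_t"
parts, rest = decode_nested_name(symbol)
qualified = "::".join(parts)
print(qualified)  # torch::cuda::nccl::detail::throw_nccl_error
```

The remainder, 12ncclResult_t, encodes the single parameter type ncclResult_t, so the missing function lives in PyTorch's NCCL wrapper, which is consistent with the failure appearing only in CUDA-enabled builds.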

OK, I found this upstream issue: https://github.com/pytorch/pytorch/issues/32638

It was fixed by https://github.com/pytorch/pytorch/commit/58cffbff91844217b9f4546e50ea0ab95544cb6a

That commit is part of milestone 1.4.1: https://github.com/pytorch/pytorch/milestone/14
but 1.5.0 seems to be on the way and the issue is still not closed.

Anyway, I will investigate upstream and open a PR to update accordingly.

Since https://github.com/NixOS/nixpkgs/pull/89802 was merged, I'm able to build this on master:

[16:46:18] jon@nixos ~/projects/nixpkgs (master)
$ nix-build -A python3Packages.pytorch
/nix/store/nr2p2w6mrs8rqnj43ngzw3jq5gixsjai-python3.7-pytorch-1.5.0

I'm going to consider this issue resolved.
