I have successfully built PyTorch from source for my legacy hardware using the following build parameters:
export USE_CUDA=1 USE_CUDNN=1 USE_MKLDNN=1 TORCH_CUDA_ARCH_LIST="3.0"
Now when I try to build TorchVision with `~/vision$ python setup.py install`, I get the following error:
Traceback (most recent call last):
File "setup.py", line 222, in <module>
'clean': clean,
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 172, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 158, in call_command
self.run_command(cmdname)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 369, in build_extensions
build_ext.build_extensions(self)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 205, in build_extension
_build_ext.build_extension(self, ext)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 285, in unix_wrap_compile
"'-fPIC'"] + cflags + _get_cuda_arch_flags(cflags)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1036, in _get_cuda_arch_flags
raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
ValueError: Unknown CUDA arch (3.0) or GPU not supported
So, is there a way to build TorchVision for GPUs with CUDA compute capability 3.0, or can a simple pip install save me from that work?
PS: installing TorchVision from conda also pulls in PyTorch, so I can't install it that way.
Does adding 3.0 to this line help?
@peterjc123 seems like that's where the problem is, but would I need to build PyTorch from source again for this to work correctly?
No, you don't need to do that. Just apply the changes to /home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py.
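For reference, the check that raises the error works roughly like this (a simplified sketch, not the actual PyTorch code; names and values are approximate and differ by PyTorch version). Adding `'3.0'` to the list of supported architectures in `_get_cuda_arch_flags` is the edit in question:

```python
# Simplified sketch of the arch check in torch/utils/cpp_extension.py's
# _get_cuda_arch_flags (hypothetical reimplementation for illustration;
# the real code differs by version). Adding '3.0' here is the relevant edit.
supported_arches = ['3.0', '3.5', '5.0', '5.2', '6.0', '6.1', '7.0', '7.5']

def get_cuda_arch_flags(arch_list):
    flags = []
    for arch in arch_list.replace(' ', ';').split(';'):
        if arch not in supported_arches:
            raise ValueError(
                "Unknown CUDA arch ({}) or GPU not supported".format(arch))
        num = arch.replace('.', '')
        flags.append('-gencode=arch=compute_{},code=sm_{}'.format(num, num))
    return flags

print(get_cuda_arch_flags('3.0'))  # → ['-gencode=arch=compute_30,code=sm_30']
```

With `'3.0'` removed from `supported_arches`, the same call raises the exact `ValueError` shown in the traceback above.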
@peterjc123 Thanks a bunch, it worked! I'll probably also have to do something similar for torchaudio.
Before reading this issue, I had built PyTorch from source with TORCH_CUDA_ARCH_LIST=3.0, and this error occurs when building torchvision against it. Editing the cpp_extension.py file as described above to add '3.0' for consumer-grade Kepler cards clears it. It would be nice if that file picked up TORCH_CUDA_ARCH_LIST when building torchvision.
@qhaas That would be nice!
@qhaas I agree it would be nice if PyTorch stored this information somehow, so that the user doesn't need to specify it themselves. Let me see if others think there is a way of doing this on PyTorch
@fmassa Actually, that is not exactly the problem. It does accept TORCH_CUDA_ARCH_LIST as input; it just rejects old architectures like CC 3.0.
@peterjc123 Got it. Still, I've seen many times that users need to manually specify TORCH_CUDA_ARCH_LIST when compiling on one machine and then executing the code on another (with a different compute capability). It would be nicer if compiling for all the architectures your PyTorch distribution supports were the default, with an opt-out.
@fmassa You mean a switch for compiling for all available architectures? Or we just do that by default?
Thanks for the explanation @peterjc123; odd that it would blacklist consumer Titan/GeForce Keplers (3.0) but not enterprise Tesla Keplers (3.5)...
@peterjc123 I meant to take whatever flags were used for compiling PyTorch and use them for torchvision. It doesn't make much sense from a user perspective to have to worry about TORCH_CUDA_ARCH_LIST, except if they want to compile for only a single architecture to reduce binary size.
So my thinking was that we could have something like torch.__config__.show() also display the TORCH_CUDA_ARCH_LIST that was used to compile it (and have an easy way to query it, like torch.__config__.get_cuda_arch_list()), and let the C++ extensions in PyTorch automatically query this information and put it in the env var (if not already present) while compiling extensions.
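A rough sketch of what that fallback could look like on the extension-building side (hypothetical helper names; torch.__config__.get_cuda_arch_list() did not exist at the time of this thread):

```python
import os

def resolve_arch_list(compiled_arch_list):
    """Hypothetical sketch of the proposed behavior: honor an explicit
    TORCH_CUDA_ARCH_LIST set by the user, otherwise fall back to the
    arch list PyTorch itself was compiled with (which the proposal
    would expose via something like torch.__config__.get_cuda_arch_list())."""
    env = os.environ.get('TORCH_CUDA_ARCH_LIST')
    if env:
        return env  # explicit user override wins
    # Seed the env var so downstream build steps see the same list.
    os.environ['TORCH_CUDA_ARCH_LIST'] = compiled_arch_list
    return compiled_arch_list
```

With this, a user who built PyTorch with `TORCH_CUDA_ARCH_LIST="3.0"` would get the same arch flags for torchvision without re-exporting anything.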
LGTM. cc @ezyang
Yes this seems fine. Maybe will require a little work to setup, but certainly seems like useful information to retain.
Created an issue in PyTorch to track this down https://github.com/pytorch/pytorch/issues/38229