I have successfully built PyTorch from source for my legacy hardware using the following build parameters:
export USE_CUDA=1 USE_CUDNN=1 USE_MKLDNN=1 TORCH_CUDA_ARCH_LIST="3.0"
Now when I try to build TorchVision with `~/vision$ python setup.py install`, I get the following error:
Traceback (most recent call last):
File "setup.py", line 222, in <module>
'clean': clean,
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 172, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 158, in call_command
self.run_command(cmdname)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 369, in build_extensions
build_ext.build_extensions(self)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 205, in build_extension
_build_ext.build_extension(self, ext)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 285, in unix_wrap_compile
"'-fPIC'"] + cflags + _get_cuda_arch_flags(cflags)
File "/home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1036, in _get_cuda_arch_flags
raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
ValueError: Unknown CUDA arch (3.0) or GPU not supported
So, is there a way to build TorchVision for GPUs with CUDA compute capability 3.0, or can a simple pip install save me from that work?
PS: installing TorchVision from conda also pulls in PyTorch, so I can't install it that way.
Does adding 3.0 to this line help?
@peterjc123 seems like that's where the problem is, but would I need to build PyTorch from source again for this to work correctly?
No, you don't need to do that. Just apply the changes to /home/rafay/anaconda3/envs/pytorch-build/lib/python3.7/site-packages/torch/utils/cpp_extension.py.
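For reference, the check that raises the error works roughly like this (a simplified sketch, not the actual PyTorch code; names and values are approximate and differ by PyTorch version). Adding `'3.0'` to the list of supported architectures in `_get_cuda_arch_flags` is the edit in question:

```python
# Simplified sketch of the arch check in torch/utils/cpp_extension.py's
# _get_cuda_arch_flags (hypothetical reimplementation for illustration;
# the real code differs by version). Adding '3.0' here is the relevant edit.
supported_arches = ['3.0', '3.5', '5.0', '5.2', '6.0', '6.1', '7.0', '7.5']

def get_cuda_arch_flags(arch_list):
    flags = []
    for arch in arch_list.replace(' ', ';').split(';'):
        if arch not in supported_arches:
            raise ValueError(
                "Unknown CUDA arch ({}) or GPU not supported".format(arch))
        num = arch.replace('.', '')
        flags.append('-gencode=arch=compute_{},code=sm_{}'.format(num, num))
    return flags

print(get_cuda_arch_flags('3.0'))  # → ['-gencode=arch=compute_30,code=sm_30']
```

With `'3.0'` removed from `supported_arches`, the same call raises the exact `ValueError` shown in the traceback above.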
@peterjc123 Thanks a bunch, it worked! I'll probably also have to do something similar for torchaudio.
Before reading this issue, I had built PyTorch from source with TORCH_CUDA_ARCH_LIST=3.0, and this error occurs when building torchvision against it. Editing the cpp_extension.py file as described above to add '3.0' for consumer-grade Kepler cards clears it. It would be nice if that file picked up TORCH_CUDA_ARCH_LIST when building torchvision.
@qhaas That would be nice!
@qhaas I agree it would be nice if PyTorch stored this information somehow, so that the user doesn't need to specify it themselves. Let me see if others think there is a way of doing this on PyTorch
@fmassa Actually, that is not exactly the problem. It does accept TORCH_CUDA_ARCH_LIST as input; it just rejects old architectures like CC 3.0.
@peterjc123 Got it. Still, I've seen many times that users need to manually specify TORCH_CUDA_ARCH_LIST when compiling on one machine and then executing the code on another (with a different compute capability). It would be nicer if compiling for all the architectures your PyTorch distribution supports were the default, with an opt-out.
@fmassa You mean a switch for compiling for all available architectures? Or we just do that by default?
Thanks for the explanation @peterjc123; odd that it would blacklist consumer Titan/GeForce Keplers (3.0) but not enterprise Tesla Keplers (3.5)...
@peterjc123 I meant to take whatever flags were used for compiling PyTorch and use them for torchvision. It doesn't make much sense from a user perspective to have to worry about TORCH_CUDA_ARCH_LIST, except if they want to compile for only a single architecture to reduce binary size.
So my thinking was that we could have something like torch.__config__.show() also display the TORCH_CUDA_ARCH_LIST that was used to compile it (and have an easy way to query it, like torch.__config__.get_cuda_arch_list()), and let the C++ extensions in PyTorch automatically query this information and put it in the env var (if not already present) while compiling extensions.
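A rough sketch of what that fallback could look like on the extension-building side (hypothetical helper names; torch.__config__.get_cuda_arch_list() did not exist at the time of this thread):

```python
import os

def resolve_arch_list(compiled_arch_list):
    """Hypothetical sketch of the proposed behavior: honor an explicit
    TORCH_CUDA_ARCH_LIST set by the user, otherwise fall back to the
    arch list PyTorch itself was compiled with (which the proposal
    would expose via something like torch.__config__.get_cuda_arch_list())."""
    env = os.environ.get('TORCH_CUDA_ARCH_LIST')
    if env:
        return env  # explicit user override wins
    # Seed the env var so downstream build steps see the same list.
    os.environ['TORCH_CUDA_ARCH_LIST'] = compiled_arch_list
    return compiled_arch_list
```

With this, a user who built PyTorch with `TORCH_CUDA_ARCH_LIST="3.0"` would get the same arch flags for torchvision without re-exporting anything.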
LGTM. cc @ezyang
Yes this seems fine. Maybe will require a little work to setup, but certainly seems like useful information to retain.
Created an issue in PyTorch to track this down https://github.com/pytorch/pytorch/issues/38229