I am using a Tesla K40m, installed pytorch 1.3 with conda, using CUDA 10.1
Steps to reproduce the behavior:
conda install pytorch cudatoolkit -c pytorch
python -c 'import torch; print(torch.cuda.is_available());'
>>> True
.forward()
Traceback (most recent call last):
File "./baselines/get_results.py", line 395, in <module>
main(args)
File "./baselines/get_results.py", line 325, in main
log_info = eval_main(eval_args)
File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/baselines/eval_task.py", line 165, in main
log_info = trainer.test(0, evaluate=True)
File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_trainers.py", line 110, in test
evaluate=evaluate)
File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_trainers.py", line 220, in iteration
model_output = self.model.forward(input_data, input_lengths)
File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_models.py", line 49, in forward
self.hidden = self.init_hidden(batch_size, device=device)
File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_models.py", line 40, in init_hidden
return (torch.randn(1, batch_size, self.hidden_dim, device=device),
RuntimeError: CUDA error: no kernel image is available for execution on the device
I first tried downgrading to cudatoolkit=10.0; that exhibited the same issue.
The code runs fine if you repeat the steps above but instead run conda install pytorch=1.2 cudatoolkit=10.0 -c pytorch.
If a specific GPU is no longer supported, please bomb out upon load with a useful error message.
Unfortunately I ran your script after I 'fixed' the problem, so the pytorch version shown here is 1.2; the issue was encountered with version 1.3.
Collecting environment information...
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130
OS: Scientific Linux release 7.6 (Nitrogen)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
CMake version: version 2.8.12.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K40m
Nvidia driver version: 430.50
cuDNN version: /usr/lib64/libcudnn.so.6.5.18
Versions of relevant libraries:
[pip3] numpy==1.16.3
[pip3] numpydoc==0.8.0
[conda] blas 1.0 mkl
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] pytorch 1.2.0 py3.7_cuda10.0.130_cudnn7.6.2_0 pytorch
[conda] torchvision 0.4.0 py37_cu100 pytorch
cc @ezyang @gchanan @zou3519 @jerryzh168 @ngimel
Just to be sure, were you using 1.3.0 or 1.3.1?
1.3.1
conda list 'pytorch|cuda'
>>> # packages in environment at /home/s0816700/miniconda3/envs/mdtk:
>>> #
>>> # Name Version Build Channel
>>> cudatoolkit 10.1.243 h6bb024c_0
>>> pytorch 1.3.1 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
Was the env at point of failure.
cc @ngimel
K40m has a compute capability of 3.5, which I believe we have dropped support for.
Ok. Could you please implement a useful "oldgpu" warning? Like here: https://github.com/pytorch/pytorch/issues/6529
The error at the moment is very unclear to a casual user like me.
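A load-time check of the kind requested here could look roughly like the sketch below. It assumes the architectures a binary was compiled for are available as strings such as sm_37 or compute_50; the helper names parse_arch and is_device_supported and the example list are hypothetical and not part of PyTorch. The rule it encodes is the general CUDA one: machine code built for sm_XY runs only on devices with the same major version and a minor version of at least Y, while embedded PTX (compute_XY) can be JIT-compiled for any device of capability X.Y or higher.

```python
from typing import Iterable, Tuple


def parse_arch(arch: str) -> Tuple[str, Tuple[int, int]]:
    """Split 'sm_37' or 'compute_50' into (kind, (major, minor))."""
    kind, _, digits = arch.partition("_")
    return kind, (int(digits[:-1]), int(digits[-1]))


def is_device_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if a device of this capability can run code from arch_list."""
    major, minor = capability
    for arch in arch_list:
        kind, (a_major, a_minor) = parse_arch(arch)
        # Binary code: same major version, device minor >= compiled minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: any device at or above the PTX's capability can JIT it.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


# Illustrative arch list only (an assumption, not read from any real build).
ARCHS = ["sm_37", "sm_50", "compute_50"]
print(is_device_supported((3, 5), ARCHS))  # compute capability 3.5 → False
print(is_device_supported((3, 7), ARCHS))  # compute capability 3.7 → True
```

With a check like this run at import time, an unsupported card could produce a clear message up front instead of a "no kernel image" error deep inside a forward pass.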
--- EDIT ---
Would also be great to link users:
Struggl(ed/ing) to find both of those things!
As an aside, @SsnL - possibly this line needs updating if you are correct:
https://github.com/pytorch/pytorch/blob/bf61405ed61816b23c57718722145f26f217666a/torch/__init__.py#L10. Where did you get your information about minimal compute capability support?
@JamesOwers If I'm not mistaken, this commit bumped the minimal compute capability to 3.7.
There's no technical reason for it to be changed to 3.7 right?
The code still supports 3.5 (and even 3.0 again).
This is just for Conda? Looks like it went from 3.5 and 5.0+ to 3.7 and 5.0+ so it was always missing either 3.5 or 3.7. I suppose it takes too long/becomes too large to support more than 2 built architectures.
@soumith might correct me, but I think the main reason is the growing size of the binaries.
@ptrblck that is the reason but it is strange it went from supporting K40 (+ several consumer cards) and not K80 to supporting K80 and not K40 (+ several consumer cards).
on an NVIDIA GPU with compute capability >= 3.0.
I also wish there was a way for the message to reflect the minimum cuda arch from the cuda arch list for when it was compiled. This would make it easier when it gets changed to 3.7, for example. Or when a user supports 3.0 by compiling it themselves.
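One way to get such a message is to derive the minimum from the architecture list the binary was built with instead of hard-coding it. The sketch below assumes that list is available as a TORCH_CUDA_ARCH_LIST-style string of numeric entries such as 3.7;5.0 (semicolon- or space-separated, optionally with a +PTX suffix). min_capability and old_gpu_message are hypothetical helpers, not PyTorch functions, and the comparison against the minimum is a simplification of the real compatibility rules.

```python
import re


def min_capability(arch_list: str):
    """Return the smallest (major, minor) in a '3.7;5.0+PTX'-style list."""
    caps = []
    for item in re.split(r"[;\s]+", arch_list.strip()):
        if not item:
            continue
        number = item.split("+")[0]  # drop an optional '+PTX' suffix
        major, minor = number.split(".")
        caps.append((int(major), int(minor)))
    return min(caps)


def old_gpu_message(device_name, capability, arch_list):
    """Build a warning naming the real minimum, or None if the GPU is new enough."""
    lowest = min_capability(arch_list)
    if tuple(capability) >= lowest:
        return None
    return (
        f"Found GPU {device_name} with compute capability "
        f"{capability[0]}.{capability[1]}, but this build was compiled for "
        f"compute capability >= {lowest[0]}.{lowest[1]}."
    )


print(old_gpu_message("Tesla K40m", (3, 5), "3.7;5.0"))
```

Because the minimum is computed from the list, the same message would stay correct for a self-compiled build that includes 3.0 or for a binary whose minimum moves to 3.7.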
This is also being discussed at https://github.com/pytorch/pytorch/issues/24205#issuecomment-560185215
I'd just like to suggest that the compatible compute capabilities for the precompiled binaries be added somewhere to the documentation, especially when providing installation instructions for the binaries. That information does not appear to be readily available anywhere.
K40m with CUDA 10.0 gets the same error!!!
Building from source gives even more errors!!!
hi guys, i have made a python 3.6 pytorch 1.3.1 linux_x86_64 wheel without restriction on compute capability, and it's working on my 3.5 GPU. i would be more than happy to build wheels for different python and pytorch versions if someone can tell me a proper distribution channel (i.e. not google drive).
@jayenashar Are you able to provide instruction to build Pytorch version 1.3.1 for a specific GPU (NVIDIA Tesla K20 GPU) & Python 3.6.8? I've attempted to build a compatible version but am still having hardware compatibility issues:
[W NNPACK.cpp:77] Could not initialize NNPACK! Reason: Unsupported hardware.
@anowlan123 I don't see a reason to build for a specific GPU, but I believe you can export the environment variable TORCH_CUDA_ARCH_LIST for your specific compute capability (3.5), then use the build-from-source instructions for pytorch.
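As a rough illustration of that suggestion, the variable can also be set from Python before launching the build. This is only a sketch: build_env is a hypothetical helper, the semicolon separator is an assumption about the accepted list format, and the commented-out build command assumes you are inside a PyTorch source checkout.

```python
import os


def build_env(capabilities, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = ";".join(capabilities)
    return env


env = build_env(["3.5"])
print(env["TORCH_CUDA_ARCH_LIST"])  # → 3.5

# The build itself would then be launched with this environment, e.g.:
#   import subprocess
#   subprocess.run(["python", "setup.py", "bdist_wheel"], env=env, check=True)
```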
The pytorch 1.3.1 wheel I made should work for you (python 3.6.9, NVIDIA Tesla K20 GPU). I setup a pypi account to try and distribute it, but it seems there is a 60MB limit, and my wheel is 139MB. So I have uploaded it here: https://github.com/UNSWComputing/pytorch/releases/download/v1.3.1/torch-1.3.1-cp36-cp36m-linux_x86_64.whl
Dear @anowlan123,
I would be very interested in a wheel of pytorch 1.4 that works with a Kepler K40 and cuda9.2. Would you be able to help out? I am thinking about installing that via miniconda.
@PeteKey you didn't specify a python version, but i made a wheel with python 3.6, pytorch 1.4.1, and magma-cuda92. please try it here: https://github.com/UNSWComputing/pytorch/releases/download/v1.4.1/torch-1.4.1-cp36-cp36m-linux_x86_64.whl if you have any issues, please upgrade to cuda10.2.
@anowlan123, python 3.6 is fine but I guess something is not quite working yet. Any idea how to fix this? I am running this on ubuntu 14.04 if that matters.
File "/home/pk/miniconda3/envs/pytorch1.4py36_unsw_anowlan123/lib/python3.6/site-packages/torch/__init__.py", line 81, in
from torch._C import *
ImportError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory
@jayenashar Thanks, still having the same compatibility issues though. @PeteKey once i created a conda environment, i used this script to build from source.
cd /home/user
conda activate env
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi
conda install -c pytorch magma-cuda101
cd ~/anaconda3/envs/env/compiler_compat
mv ld ld-old
cd /home/user/Downloads
git clone --recursive https://github.com/pytorch/pytorch
cd /home/user/Downloads/pytorch
git submodule sync
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export PYTORCH_BUILD_VERSION=1.3.1
export PYTORCH_BUILD_NUMBER=1
export TORCH_CUDA_ARCH_LIST=3.5
python setup.py install
python setup.py clean --all
cd ~/anaconda3/envs/env/compiler_compat
mv ld-old ld
@PeteKey i think it's because you are using the wheel in conda. try installing mkl in your conda env.
also, upgrade your ubuntu
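To see whether the missing MKL library is actually present in the active conda environment, a quick check like the following can help. It is a minimal sketch that assumes the environment keeps its shared libraries under $CONDA_PREFIX/lib (the usual layout on Linux); find_library_files is a hypothetical helper, and finding the file does not by itself guarantee that the loader will pick it up.

```python
import os
from pathlib import Path


def find_library_files(library_name, prefix):
    """List files under <prefix>/lib whose names start with library_name."""
    lib_dir = Path(prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p for p in lib_dir.iterdir() if p.name.startswith(library_name))


prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
if prefix is None:
    print("No active conda environment detected")
else:
    matches = find_library_files("libmkl_intel_lp64.so", prefix)
    print(matches or "libmkl_intel_lp64.so not found; try `conda install mkl`")
```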
@anowlan123 i don't see where you checkout v1.3.1 in the git repo. also i'm not sure where your install installs to. here is the script i used, it only makes the wheel, then i used pip to install it.
#!/bin/bash
CONDA_ENV=py369
PYTHON_VERSION=3.6.9
CUDA_VERSION=102
PYTORCH_BUILD_VERSION=1.3.1
set -e
set -u
conda create --yes --name $CONDA_ENV python=$PYTHON_VERSION
conda activate $CONDA_ENV
conda install --yes numpy ninja pyyaml mkl mkl-include setuptools cmake cffi
conda install --yes --channel pytorch magma-cuda$CUDA_VERSION
cd ~/pytorch
git checkout v$PYTORCH_BUILD_VERSION
git submodule sync
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export PYTORCH_BUILD_VERSION
PYTORCH_BUILD_NUMBER=0 python setup.py bdist_wheel
@PeteKey i just uploaded files for conda (py3.5 and py3.6) to https://github.com/UNSWComputing/pytorch/releases/tag/v1.4.1 . i think you can download the files and install them with conda install $filename but i'm not sure how to tell conda to install the dependencies. maybe with --only-deps? here are the dependencies of the py3.6 tarball if you need them:
"blas * mkl",
"cudatoolkit >=9.2,<9.3",
"mkl >=2018",
"ninja",
"numpy >=1.11",
"python >=3.6,<3.7.0a0"
I just got it compiled once and it worked with K40 (so that's the proof of concept done), but then I did something and it did get messed up so I am starting from scratch.
What would be best steps to install torchvision for cuda 10.0 and cudatoolkit 10.0, and then compile the pytorch 1.4? I suppose the question is in which order I should do things in order not to end up with mess as miniconda install likes to 'resolve' lots of things and my feeling is that causes mess later on as it wants to download pytorch 1.4 from the channel etc.
maybe the best thing is to install the upstream pytorch 1.4.1 for cuda 10.0 and then install the package i can make for you. this is how i am making the conda tarballs with pytorch/builder:
export PYTORCH_REPO=pytorch
export PYTORCH_BRANCH=v1.4.1
export PYTORCH_BUILD_VERSION=1.4.1
export PYTORCH_BUILD_NUMBER=0
export TORCH_CONDA_BUILD_FOLDER=pytorch-nightly
export TORCH_PACKAGE_NAME=torch
export PIP_UPLOAD_FOLDER=""
export NIGHTLIES_ROOT_FOLDER="$HOME/local/builder/binaries_v1.4.1"
cd pytorch-builder/cron
./build_multiple.sh conda 3.6 cu92
@jayenashar, I am very close but one thing that is still weird is that even when I export CUDA_VERSION=100, the torch.version.cuda still shows cuda 10.2 ... which is a bit too new for some codes I am trying to run.
If you tell me how to force compilation to use cuda 10 and then have pytorch with cuda 10, this will solve my aches. Below I list my packages (I can see cuda100 and cudatoolkit 10.0 there) so I am confused.
_libgcc_mutex 0.1 main
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2020.1.1 0
certifi 2020.4.5.1 py38_0
cffi 1.14.0 py38he30daa8_1
cloudpickle 1.4.1 py_0 conda-forge
cmake 3.14.0 h52cb24c_0
cuda100 1.0 0 pytorch
cudatoolkit 10.0.130 0
cycler 0.10.0 py_2 conda-forge
cytoolz 0.10.1 py38h516909a_0 conda-forge
dask-core 2.16.0 py_0 conda-forge
decorator 4.4.2 py_0 conda-forge
expat 2.2.6 he6710b0_0
freetype 2.9.1 h8a8886c_1
icu 58.2 hf484d3e_1000 conda-forge
imagecodecs-lite 2019.12.3 py38h1e0a361_0 conda-forge
imageio 2.8.0 py_0 conda-forge
intel-openmp 2020.1 217
joblib 0.15.1 py_0 conda-forge
jpeg 9b h024ee3a_2
kiwisolver 1.2.0 py38hbf85e49_0 conda-forge
krb5 1.17.1 h173b8e3_0
ld_impl_linux-64 2.33.1 h53a641e_7
libblas 3.8.0 15_mkl conda-forge
libcblas 3.8.0 15_mkl conda-forge
libcurl 7.69.1 h20c2e04_0
libedit 3.1.20181209 hc058e9b_0
libffi 3.3 he6710b0_1
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
liblapack 3.8.0 15_mkl conda-forge
libpng 1.6.37 hbc83047_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_0
magma-cuda100 2.5.2 1 pytorch
matplotlib-base 3.1.3 py38hef1b27d_0
mkl 2020.1 217
mkl-include 2020.1 217
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.0.15 py38ha843d7b_0
mkl_random 1.1.0 py38h962f231_0
ncurses 6.2 he6710b0_1
networkx 2.4 py_1 conda-forge
ninja 1.9.0 py38hfd86e86_0
numpy 1.18.1 py38h4f9e942_0
numpy-base 1.18.1 py38hde5b4d6_1
olefile 0.46 py_0
openssl 1.1.1g h7b6447c_0
pillow 7.1.2 py38hb39fc2d_0
pip 20.0.2 py38_3
pycparser 2.20 py_0
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
python 3.8.2 hcff3b4d_14
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.8 1_cp38 conda-forge
pytorch 1.4.0 py3.8_cuda10.0.130_cudnn7.6.3_0 pytorch
pywavelets 1.1.1 py38h8790de6_1 conda-forge
pyyaml 5.3.1 py38h7b6447c_0
readline 8.0 h7b6447c_0
rhash 1.3.8 h1ba5d50_0
scikit-image 0.17.2 py38hcb8c335_0 conda-forge
scikit-learn 0.23.0 py38h3a94b23_0 conda-forge
scipy 1.4.1 py38h18bccfc_3 conda-forge
setuptools 46.4.0 py38_0
six 1.14.0 py38_0
sqlite 3.31.1 h62c20be_1
threadpoolctl 2.0.0 pyh5ca1d4c_0 conda-forge
tifffile 2020.5.11 py_0 conda-forge
tk 8.6.8 hbc83047_0
toolz 0.10.0 py_0 conda-forge
torch 1.4.1 pypi_0 pypi
torchvision 0.5.0 py38_cu100 pytorch
tornado 6.0.4 py38h1e0a361_1 conda-forge
wheel 0.34.2 py38_0
xz 5.2.5 h7b6447c_0
yaml 0.1.7 had09818_2
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
@PeteKey I'm not 100% sure the CUDA_VERSION in my script actually works. It may only set the package version numbers and not set what version of CUDA it builds with. if you run nvcc --version on the machine that you are using to build from source, that probably needs to be 10.0.
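A small script can read the toolkit release that nvcc reports on the build machine, so it can be compared by eye with torch.version.cuda after the build. This is a sketch under the assumption that nvcc prints a line containing "release X.Y", as in the "release 9.2, V9.2.148" output quoted in this thread; parse_nvcc_release and nvcc_release are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output: str):
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def nvcc_release():
    """Return the release reported by nvcc on PATH, or None if nvcc is absent."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_nvcc_release(result.stdout)


print(parse_nvcc_release("Cuda compilation tools, release 9.2, V9.2.148"))  # → 9.2
print(nvcc_release())
```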
@jayenashar, @anowlan123, thanks for help. In the end I have compiled pytorch 1.5 for cuda10.2 and installed other packages via pip install, and finally the codes I wanted to run kicked off. So, my old K40 gets a bit more life.
@jayenashar, @anowlan123, one last question... bl...y pytorch somehow got compiled without magma... Do you know if I need to set any flags for that? I had magma installed already...
@jayenashar, @anowlan123, solved, export MAGMA_HOME=/home/pk/miniconda3/pkgs/magma-cuda102-2.5.2-1/
Guys, thank you for providing the solutions to get torch running with K40! 👍🏻
I seem to have built torch from source fine, but it fails to import the extension module: from torch.utils.cpp_extension import xxx
My env:
K40, export *ARCH=3.5
Torch v1.5.0, build finished ok and I can import torch
Note, the source built torch==1.5.0 did not replace conda-installed pytorch==1.3.1, which I uninstalled along with its dependencies.
Do I need to specify in the torch source build to use CUDA and cpp_extension ?
Thank you
@breznak
There are some differences between conda and pip: the package names differ (pytorch vs torch), hence the old version was not replaced.
from torch.utils import cpp_extension works for me with my 1.3.1 pip package, so i'm not sure what is the issue with yours.
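When a conda-installed pytorch and a pip-installed or source-built torch may both be around, it can help to confirm which files Python would actually import. The following is a minimal sketch using only the standard library; module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of torch/__init__.py for whichever install wins on sys.path,
# or None if no torch is importable in this environment.
print(module_origin("torch"))
```

If the printed path points into an unexpected site-packages directory, the environment is probably still picking up a different install than the one just built.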
works for me too, but I need torch >= 1.4, and I need to build from source (K40). Those are the differences.
do any of the files i uploaded to https://github.com/UNSWComputing/pytorch/releases/tag/v1.4.1 work for you?
thanks! I'll test but guessing from the names I don't think so, unfortunately.
My constraints are:
py>= 3.6
cuda >=10.1
torch>=1.4.1
I assume since you tried pytorch 1.5.0 first, that is your preferred one, so try this one: https://github.com/UNSWComputing/pytorch/releases/tag/v1.5.0 Also assumed the minimums py & cuda are what you already have installed, so built based on those assumptions.
@jayenashar thank you very much for supporting this!
I've tried your repo, but import torch is failing for me after install with
from torch._C import *
ImportError: /home/xxx/miniconda3/envs/detectron2-env/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
EDIT: This looks like a python issue. I was wrong about the default python: python 3.7 is the default in my conda env. I was able to install python==3.6 from conda-forge, but I'm not sure if this error is related to the use of that python?
Are you on Python 3.6.0? I believe there's an issue with that and you need any other 3.6 (i.e., 3.6.1- 3.6.10)
If you run python3.7, it shouldn't try to use this version of torch, so i would guess you have python 3.6.0
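The undefined symbol in that traceback, PySlice_Unpack, is documented in the Python C API as new in version 3.6.1, which fits the 3.6.0 guess. A quick way to check the running interpreter is sketched below; has_pyslice_unpack is a hypothetical helper name.

```python
import sys


def has_pyslice_unpack(version_info=sys.version_info) -> bool:
    """PySlice_Unpack is documented as new in CPython 3.6.1."""
    return tuple(version_info[:3]) >= (3, 6, 1)


# Show the interpreter version and whether it should export the symbol.
print(sys.version.split()[0], has_pyslice_unpack())
```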
you're correct. (I had default python 3.7 installed in conda env, since I mistakenly told you I have 3.6, I installed python 3.6.0 available at conda-forge.) I'll see if there's later version of 3.6.x available at conda.
Dear @jayenashar Can you please build PyTorch 1.5.0 for Python 3.7? I was trying to build from source, but met some unexpected errors. Really appreciate your effort!
My settings are:
Python 3.7
CUDA 10.1
Update: I setup a Python 3.6 environment and the package provided by @jayenashar (https://github.com/UNSWComputing/pytorch/releases/tag/v1.5.0) works well.
The latest torchvision (0.6.0) from Conda also seems to work.
No problem. For anyone making future requests, I would like the following info in your request:
I do not need the exact GPU or compute capability. If your chosen version of pytorch supports it, it will be included.
I do not need the OS. I am only building linux packages.
@wydwww check https://github.com/UNSWComputing/pytorch/releases/tag/v1.5.0 for the latest upload. :)
Thank you @jayenashar , I really appreciate your work! Thanks to your prebuilt packages, I'm able to run pytorch (torchvision) on our cluster :+1:
believe there's an issue with that and you need any other 3.6 (i.e., 3.6.1- 3.6.10)
everything works fine with py 3.7.
so try this one: https://github.com/UNSWComputing/pytorch/releases/tag/v1.5.0
for reference, people installing from these binaries should use conda (not python, pip) to install these packages.
conda install pytorch-....tar.gz
@jayenashar Any luck building pytorch 1.6? I'm running into compilation errors at the moment, and am trying to debug it. I was able to build 1.5.1 so I'm not sure what's going on.
sadly my [remote] machine for building is down, and no ETA on when i can get to it to bring it back up, sorry. haven't tried 1.6.
So the "bug" was pretty silly. I forgot to clean the build folder before trying to build 1.6.0. After I cleaned it, the build was successful.
Here's a whl file in case anyone needs it. It was built without tests or caffe2 operators: https://github.com/KevinMusgrave/pytorch/releases/tag/v1.6.0-compute-capability-3.5
@jayenashar Just commenting here to thank you. Managed to make 1.5 work in a K40 :)
@jayenashar could be possible for you to compile a version for a tesla k40c with NVCC release 9.2, V9.2.148?
pytorch version - closer to 1.6.0
conda or pip - any
cuda version - 10
python version - 3.8.5 or other
When i use nvcc --version it throws 9.2, but when i do nvidia-smi it outputs: Driver Version: 410.104 CUDA Version: 10.0
So I'm not sure of the cuda version. Any ideas why these versions don't match?
Edit: I managed to run PyTorch 1.4.1 with Torchvision 0.5.0 on Tesla K40c with the following procedure:
Download pytorch-1.4.1-py3.7_cuda9.2.148_cudnn7.6.3_0.tar.bz2 from @jayenashar's solution at https://github.com/UNSWComputing/pytorch/releases/tag/v1.4.1
conda install torchvision=0.5.0=py37_cu92
conda install pytorch-1.4.1-py3.7_cuda9.2.148_cudnn7.6.3_0.tar.bz2
@Vichoko https://github.com/UNSWComputing/pytorch/releases/tag/v1.6.0
you may have installed cuda-nvcc-9-2 and a precompiled driver. probably easier to upgrade nvcc.
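On the mismatch itself: nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc --version reports the toolkit that is installed, so the two numbers need not agree. The sketch below only illustrates that relationship; the helper names are hypothetical, and the plain "toolkit version at most driver version" rule is a simplification.

```python
def as_tuple(version: str):
    """Turn '10.0' into (10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def toolkit_supported_by_driver(toolkit_version: str, driver_cuda_version: str) -> bool:
    """True if the toolkit is no newer than what the driver reports supporting."""
    return as_tuple(toolkit_version) <= as_tuple(driver_cuda_version)


# nvcc says 9.2, nvidia-smi says 10.0: an older toolkit on a newer driver.
print(toolkit_supported_by_driver("9.2", "10.0"))   # → True
print(toolkit_supported_by_driver("10.2", "10.0"))  # → False
```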
@jayenashar Can you compile a version for Tesla K40c with the following:
Many thanks!
sorry @sophiaas it looks like i can't do CUDA 10.2 so easily. @KevinMusgrave has 10.1 at https://github.com/KevinMusgrave/pytorch/releases/tag/v1.6.0-compute-capability-3.5 in a wheel and i'm trying to build a conda version now.
@sophiaas i uploaded a 10.1 conda package to https://github.com/UNSWComputing/pytorch/releases/tag/v1.6.0
i'll keep trying to make a 10.2 but it keeps giving me a CPU-only build.
@jayenashar Thank you! Really appreciate it.
If it helps, I put up some K40-compatible pip binaries at https://nelsonliu.me/files/pytorch/whl/torch_stable.html . Versions 1.3.1 to 1.6.0, hoping to keep them updated for new releases.
You can pip-install them with (change desired versions as necessary):
pip install torch==1.3.1+cu92 -f https://nelsonliu.me/files/pytorch/whl/torch_stable.html
I tested all of these binaries (except the CUDA 10.2 ones, hoping to get to those soon) by running the word-level language modeling example on a K40 and manually verifying that the perplexity was the same within versions and generally reasonable.
@nelson-liu that's great. then i only need to worry about conda packages.
this is how i test for cuda: python -c 'import torch; torch.randn([3,5]).cuda()'
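A slightly more defensive version of that one-liner is sketched below. It reports a short status string instead of a traceback, and it also runs a small arithmetic kernel, which is an addition to the original test on the assumption that a plain copy to the GPU might not exercise any compiled kernels. cuda_smoke_test is a hypothetical helper name.

```python
def cuda_smoke_test() -> str:
    """Try a tiny CUDA operation and report what happened as a short string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available"
    try:
        # Same operation as the one-liner: allocate on the CPU, copy to the GPU.
        x = torch.randn([3, 5]).cuda()
        # Extra step (not in the one-liner): launch a kernel and read the result.
        (x + 1).sum().item()
    except RuntimeError as exc:
        return f"CUDA operation failed: {exc}"
    return "ok"


print(cuda_smoke_test())
```

As the original report shows, torch.cuda.is_available() can return True on a card the binary has no kernels for, so the test needs to run a real operation.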
@jayenashar
That's so great to see your releases!
I'm planning to subscribe to your future releases as well, if you still plan to keep updating for later pytorch releases. Do you plan to always release to that forked repo? Where should I request a specific build, here or elsewhere?
@Guptajakala yes i will release to that forked repo, unless someone knows a better place. i tried pypi but it seems they have a file size limit and that is the reason the official builds don't support old GPUs. i can try an anaconda channel.
right now i'm taking requests here as it seems to be the discoverable place.
@jayenashar
Hi, does this work with python 3.6.9?
I downloaded this one and run command
conda install ./pytorch-1.6.0-py3.7_cuda10.1.243_cudnn7.6.3_0.tar.bz2
After installation, I import it but it says
No module named 'torch'
conda list shows this item
pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 <unknown>
@Guptajakala no you can't use a py3.7 pytorch with python 3.6.
do you want me to build you one for python 3.6?
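The Python version a package targets is visible in its build string, so a mismatch like this can be caught before installing. The sketch below assumes build strings follow the py<X.Y>_cuda<ver>_cudnn<ver>_<n> pattern seen in this thread; parse_build_string and matches_running_python are hypothetical helpers, not conda APIs.

```python
import re
import sys

# Pattern inferred from names in this thread, e.g. py3.7_cuda10.1.243_cudnn7.6.3_0
BUILD_RE = re.compile(
    r"py(?P<python>\d+\.\d+)_cuda(?P<cuda>[\d.]+)_cudnn(?P<cudnn>[\d.]+)_"
)


def parse_build_string(build: str):
    """Return {'python', 'cuda', 'cudnn'} parsed from a build string, or None."""
    match = BUILD_RE.search(build)
    return match.groupdict() if match else None


def matches_running_python(build: str) -> bool:
    """True if the build's Python version equals the running interpreter's."""
    info = parse_build_string(build)
    if info is None:
        return False
    return info["python"] == f"{sys.version_info.major}.{sys.version_info.minor}"


name = "pytorch-1.6.0-py3.7_cuda10.1.243_cudnn7.6.3_0.tar.bz2"
print(parse_build_string(name))
print(matches_running_python(name))
```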
@jayenashar
that would be great, thank you!
@jayenashar awesome!