Apex: Can't install on Google Colab

Created on 24 Nov 2019 · 5 Comments · Source: NVIDIA/apex

Most helpful comment

The default PyTorch version in Google Colab has been updated to 1.3.1, which is compiled with CUDA 10.1.
However, the CUDA toolkit in the Colab runtime is 10.0.130 (you can check it with this command: !cat /usr/local/cuda/version.txt).

So, when you install apex, you may see the following error message (run the pip command without -q):

...
Compiling cuda extensions with
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    from /usr/local/cuda/bin

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-_saohd3c/setup.py", line 100, in <module>
        check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
      File "/tmp/pip-req-build-_saohd3c/setup.py", line 77, in check_cuda_torch_binary_vs_bare_metal
        "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
    RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 10.1.243.
    In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
...
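The error comes from a check that compares the CUDA version PyTorch was built with against the version nvcc reports. Here is a minimal sketch of that kind of comparison. It is not apex's actual code: the helper names `nvcc_release` and `versions_match` are hypothetical, and it only compares the major.minor part of each version.

```python
import re


def nvcc_release(nvcc_output):
    """Extract the 'major.minor' release from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def versions_match(torch_cuda, nvcc_output):
    """True if the PyTorch build's CUDA major.minor equals the toolkit's."""
    torch_major_minor = ".".join(torch_cuda.split(".")[:2])
    return torch_major_minor == nvcc_release(nvcc_output)


# Line taken from the error log above.
NVCC_OUTPUT = "Cuda compilation tools, release 10.0, V10.0.130"

print(versions_match("10.1.243", NVCC_OUTPUT))  # default PyTorch 1.3.1 build -> False
print(versions_match("10.0.130", NVCC_OUTPUT))  # a cu100 build -> True
```

In a notebook you could pass `torch.version.cuda` as the first argument and the decoded output of `nvcc -V` as the second.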

To solve this problem, you can install another version of PyTorch that was compiled with CUDA 10.0.
In my case, I chose torch 1.2.0 + torchvision 0.4.0:

!pip install https://download.pytorch.org/whl/cu100/torch-1.2.0-cp36-cp36m-manylinux1_x86_64.whl && pip install https://download.pytorch.org/whl/cu100/torchvision-0.4.0-cp36-cp36m-manylinux1_x86_64.whl

Alternatively, you can pick the version you want from the official archive (choose links with the prefix cu100 and the suffix cp36-cp36m-manylinux1_x86_64): https://download.pytorch.org/whl/torch_stable.html
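The two links above follow the same naming pattern, so a small helper can assemble the URL for another package or version. This is only a sketch: `wheel_url` is a hypothetical helper, the pattern is inferred from the two links in this comment, and not every combination exists on the server, so check the result against the archive page before using it.

```python
def wheel_url(package, version, cuda_tag="cu100", py_tag="cp36"):
    """Build a download URL following the pattern of the two links above.

    Assumption: the ABI tag is the Python tag plus "m", as in cp36-cp36m.
    """
    filename = f"{package}-{version}-{py_tag}-{py_tag}m-manylinux1_x86_64.whl"
    return f"https://download.pytorch.org/whl/{cuda_tag}/{filename}"


print(wheel_url("torch", "1.2.0"))
print(wheel_url("torchvision", "0.4.0"))
```

The defaults match the Colab runtime described here (CUDA 10.0, Python 3.6). Change `cuda_tag` or `py_tag` if your runtime differs.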

After reinstalling PyTorch, you can install apex again (there is no need to omit the extension options --global-option="--cpp_ext" --global-option="--cuda_ext"), and it should work.
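To confirm that the extensions were actually built, you can check whether Python can locate the compiled modules. This is a rough sketch: `installed` is a hypothetical helper, and the module names `apex_C` (from --cpp_ext) and `amp_C` (from --cuda_ext) are assumptions about how apex names its compiled extensions, so adjust them if your apex version differs.

```python
import importlib.util


def installed(module_names):
    """Map each module name to whether Python can locate it (nothing is imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


# "apex_C" and "amp_C" are assumed names for the compiled extensions.
for name, found in installed(["apex", "apex_C", "amp_C"]).items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If `apex` is found but the other two are missing, the install most likely fell back to the Python-only build.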

All 5 comments

Same here. FYI, it worked when I omitted --global-option="--cpp_ext" --global-option="--cuda_ext".

Downgrading and restarting the runtime works fine! Thanks.
https://colab.research.google.com/drive/1drodd29aL2B8ufcb0gwrBBhGDvPBaDha

I guess you can close this issue

My Colab has CUDA 10.1.243. Which PyTorch version should I install?

Hi @kushagra1198, apex should work fine without doing any further configuration now.

Currently, PyTorch on Colab is also compiled with CUDA 10.1.243. You can check the toolkit version with the following code snippet:

# Adapted from `apex/setup.py`
import subprocess
from torch.utils.cpp_extension import CUDA_HOME

# Print the version of the nvcc compiler found under CUDA_HOME
print(subprocess.check_output([CUDA_HOME + "/bin/nvcc", "-V"]))

# You should see this:
# b'nvcc: NVIDIA (R) Cuda compiler driver\nCopyright (c) 2005-2019 NVIDIA Corporation\nBuilt on Sun_Jul_28_19:07:16_PDT_2019\nCuda compilation tools, release 10.1, V10.1.243\n'

Feel free to let me know if it doesn't work.
