Maskrcnn-benchmark: build error: apex

Created on 23 Apr 2019  ·  31 Comments  ·  Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

When I install apex using the command "python setup.py install --cuda_ext --cpp_ext",
I get this error:

torch.__version__  =  1.1.0.dev20190422
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
from /usr/local/cuda/bin

Pytorch binaries were compiled with Cuda 9.0.176

running install
running bdist_egg
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'amp_C' extension
gcc -pthread -B /home/dy113/anaconda3/envs/maskrcnn_benchmark/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/TH -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/TH -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
csrc/type_shim.h(13): error: class "at::Type" has no member "scalarType"

1 error detected in the compilation of "/tmp/tmpxft_0000270c_00000000-6_multi_tensor_scale_kernel.cpp1.ii".
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1

Can anyone help me? Thank you very much!

dependency bug

All 31 comments

I ran into the same problem, too.

what is your gcc version?

mine is gcc-5.2

According to this issue, it seems that apex can only be installed with CUDA 10. My gcc version is 7.3, Python is 3.6, and my PyTorch version is 1.0.0. It works.

Same error with CUDA 10 on Ubuntu 16.
Tried with gcc 5 and gcc 7, Python 2.7 and Python 3.6.

A workaround is to downgrade to a PyTorch nightly from a few days ago:
conda install pytorch-nightly=1.0.0.dev20190404 cudatoolkit=10.0 -c pytorch
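
To tell whether an installed nightly is newer than the build this workaround pins, the date embedded in the version string can be compared. This is only a rough sketch: the helper names are made up for illustration, and the thread does not say which nightly introduced the breakage, only that 1.0.0.dev20190404 was reported to work and 1.1.0.dev20190422 failed.

```python
import re

# Build date of the nightly reported as working in this thread.
KNOWN_GOOD_NIGHTLY = 20190404


def nightly_date(version):
    """Return the YYYYMMDD build date of a nightly version string, or None."""
    match = re.search(r"\.dev(\d{8})", version)
    return int(match.group(1)) if match else None


def is_newer_than_known_good(version):
    """True if the version is a nightly built after the known-good date."""
    date = nightly_date(version)
    return date is not None and date > KNOWN_GOOD_NIGHTLY


if __name__ == "__main__":
    # In practice the string would come from torch.__version__.
    for version in ("1.0.0.dev20190404", "1.1.0.dev20190422"):
        print(version, is_newer_than_known_good(version))
```

A result of True only means the build is newer than the pinned one, not that it is necessarily broken.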

cc @mcarilli are you aware of any recent breakages of apex with latest PyTorch nightly?

There's already an issue at Apex (#267) with a PR (#272) that fixes it that apparently will be merged soon.

So if you need an immediate fix use the scalar_type branch of ptrblk's fork.

Thanks @mdsmith-cim for the concise, correct summary. Our fix will be merged tomorrow at the latest (I have some other commitments so I may not have time to review it in detail today).

@mdsmith-cim Hi, sorry to bother you. I still get the same error after cloning your apex. Why is this? I look forward to your reply.

@zskadazhang Did you checkout the scalar_type branch?

@DavidSPumpkins Oh! It works. Thank you very much!

@zskadazhang Good to hear it's working!
Please tag me in case you are running into issues related to this branch.

However, we should merge it to apex/master today so you can pull from the master branch again.

The PR was merged so the build should work again using apex master. :)

With torch 1.1.0.dev20190425, and the latest apex fix, I still get an error when I try to compile with python setup.py install --cuda_ext --cpp_ext. I'm using gcc 5.5. Can anyone please help? Much appreciated!
```
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11080): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11089): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11100): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11109): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11120): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11129): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11140): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11149): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11160): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11169): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11180): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11189): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11200): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11209): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11220): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11229): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11240): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11249): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11260): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11269): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11280): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11289): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11300): error: argument of type "void *" is incompatible with parameter of type "long long *"

92 errors detected in the compilation of "/tmp/tmpxft_00004034_00000000-6_multi_tensor_scale_kernel.cpp1.ii".
error: command '/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1
```

Sorry, I do not know why. My environment is CUDA 10.1, GCC 7.3, and I got apex from @mdsmith-cim. Maybe you can ask him.

@chengruizhe I've never seen this error before. Does it point to a particular line in the file?

@chengruizhe @mcarilli
Could it be related to the gcc version?
Based on this information, Ubuntu 16.04, for example, should use GCC 5.3.1 for CUDA 9.0.
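
Since several replies hinge on which CUDA release and gcc version are actually in use, a small script can report both side by side. This is a minimal sketch, assuming `nvcc` and `gcc` are on the PATH; the function names are made up for illustration, and it deliberately does not encode any compatibility table.

```python
import re
import shutil
import subprocess


def cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def gcc_version(dumpversion_output):
    """Parse `gcc -dumpversion` output such as '5.5.0' or '7' into a tuple."""
    parts = dumpversion_output.strip().split(".")
    return tuple(int(p) for p in parts if p.isdigit()) or None


def run(cmd):
    """Run a command and return its combined output, or None if it is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    nvcc_out = run(["nvcc", "--version"])
    gcc_out = run(["gcc", "-dumpversion"])
    print("CUDA release:", cuda_release(nvcc_out) if nvcc_out else "nvcc not found")
    print("gcc version:", gcc_version(gcc_out) if gcc_out else "gcc not found")
```

The two values can then be checked against the supported host compiler list for that CUDA release.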

AttributeError: 'AmpState' object has no attribute 'opt_properties'

Has anyone else run into this problem? I built apex and maskrcnn-benchmark successfully, without any errors.
My version information is:
CUDA 9.0, gcc 5.2, pytorch-nightly 1.1 (CentOS)
(I can run it successfully under Ubuntu with CUDA 9.0, gcc 5.2, pytorch-nightly 1.0.0.)

@Tegala Are you building apex from source or are you using an older version of apex?

Thanks for your reply!
I use these commands to build apex:

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

And I am sure I am using the latest version of apex. This is strange...

The error output is:

2019-04-30 06:11:31,877 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 177, in <module>
    main()
  File "tools/train_net.py", line 170, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 76, in train
    arguments,
  File "/home/hjz/projects/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 79, in do_train
    with amp.scale_loss(losses, optimizer) as scaled_losses:
  File "/home/hjz/perl5/anaconda3/envs/deephaj-env/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/hjz/perl5/anaconda3/envs/deephaj-env/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/handle.py", line 78, in scale_loss
    if not _amp_state.opt_properties.enabled:
AttributeError: 'AmpState' object has no attribute 'opt_properties'

Thanks for the information!
I'm trying to reproduce this issue. CC @mcarilli

@Tegala Amp requires that

model, optimizer = amp.initialize(model, optimizer, opt_level=...)

be called before any invocation of

with amp.scale_loss(losses, optimizer) as scaled_loss:

If your code is somehow invoking with amp.scale_loss without ever invoking amp.initialize, the above error will result.
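
The required call order can be sketched as a single training step. This is a minimal illustration rather than maskrcnn-benchmark's actual training loop: the toy model, the SGD optimizer, and the `opt_level="O1"` choice are assumptions, and it needs a CUDA-capable GPU with torch and apex installed.

```python
def train_one_step():
    """Run one mixed-precision step with amp.initialize called first."""
    # Imported here so this file can be loaded without torch or apex installed.
    import torch
    from apex import amp

    # Toy model and optimizer, used only for illustration.
    model = torch.nn.Linear(8, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Must run once, before the first amp.scale_loss call.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    inputs = torch.randn(4, 8, device="cuda")
    loss = model(inputs).float().sum()

    optimizer.zero_grad()
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
    return loss.item()
```

Calling `train_one_step()` with the `amp.initialize` line removed should reproduce the AttributeError from the traceback above.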

Thanks so much!
I checked it again and found that it is just as you said. Now it works!

https://github.com/SeanNaren/deepspeech.pytorch/issues/376

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

This worked for me.

@aashokvardhan pip install . will perform a Python-only install, which is not ideal for performance. You should install with cuda and c++ extensions via

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

and only fall back to pip install . if the extension build doesn't work.
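
One way to see whether an install ended up Python-only is to check if the compiled extension modules can be found. This is a rough sketch: `amp_C` appears in the build log earlier in this thread, while `apex_C` is assumed to be the module name produced by `--cpp_ext`; the helper names are made up for illustration.

```python
import importlib.util

# amp_C is taken from the build log in this thread; apex_C is an assumption.
EXTENSION_MODULES = ("amp_C", "apex_C")


def is_importable(name):
    """Return True if a top-level module with this name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def extension_report(names=EXTENSION_MODULES):
    """Map each extension module name to whether it was found."""
    return {name: is_importable(name) for name in names}


if __name__ == "__main__":
    for name, found in extension_report().items():
        print(name, "found" if found else "missing (Python-only install?)")
```

The check only locates the modules without loading them, so it does not confirm that they import cleanly against the installed PyTorch.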

Encountered this when using Ubuntu 18.04 | CUDA 9.0 and the default GCC/G++ in Ubuntu, version 7. The CUDA compiler is incompatible with GCC >= 6.4.

Solved it by installing GCC-5 and G++-5 ( sudo apt install gcc-5 g++-5 ), and setting them as higher priority using update-alternatives:

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 10

After this, Apex installed fine using the default instructions.

My solution is:
sudo ln -sf /usr/bin/gcc-5 /usr/local/cuda-9.0/bin/gcc
sudo ln -sf /usr/bin/g++-5 /usr/local/cuda-9.0/bin/g++

This is on Ubuntu 18.04, CUDA 9.0, PyTorch 1.1.0, Python 3.6.
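
To confirm which compiler binaries such symlinks or alternatives actually resolve to, the real paths can be printed. This is a minimal sketch with a made-up helper name; the search order (the CUDA bin directory first, then PATH) is an assumption about how the host compiler is picked up, not something confirmed in this thread.

```python
import os


def resolve_compilers(directories, names=("gcc", "g++")):
    """Map each compiler name to the real path of its first match."""
    resolved = {}
    for name in names:
        for directory in directories:
            candidate = os.path.join(directory, name)
            if os.path.exists(candidate):
                # realpath follows symlinks, e.g. gcc -> /usr/bin/gcc-5
                resolved[name] = os.path.realpath(candidate)
                break
    return resolved


if __name__ == "__main__":
    # Assumed search order: CUDA bin directory first, then PATH entries.
    path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    search = ["/usr/local/cuda-9.0/bin"] + path_dirs
    for name, path in resolve_compilers(search).items():
        print(name, "->", path)
```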

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

I am trying to install apex on Windows 10. I cloned apex from its repo, and when I run the above command, I get this error: ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found. Do you have any idea how to resolve the issue?
python 3.6
gcc 5.3.0
torch 1.0.1

@Mahhos Could you check your current working directory for the setup.py file?
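
A quick way to do that check is to look for setup.py in the current directory and one level below it, since a common cause is running pip from the parent of the cloned folder. This is a small sketch with a made-up helper name, not part of apex or pip.

```python
from pathlib import Path


def find_setup_py(start="."):
    """Return directories at or one level below `start` that contain setup.py."""
    root = Path(start).resolve()
    candidates = [root] + sorted(p for p in root.iterdir() if p.is_dir())
    return [d for d in candidates if (d / "setup.py").is_file()]


if __name__ == "__main__":
    hits = find_setup_py()
    if hits:
        for directory in hits:
            print("setup.py found in:", directory)
    else:
        print("No setup.py here or one level down; cd into the cloned apex folder.")
```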
