Apex: Building with cpp and cuda causes distributed to crash

Created on 16 May 2019 · 7 Comments · Source: NVIDIA/apex

Hi,

I can successfully build and use the Python-only Apex. However, when I build with the cpp and cuda extensions, the compilation succeeds but the simple example crashes with the following error:

File "//anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in
main()
File "//anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main cmd=process.args)

subprocess.CalledProcessError: Command '['//bin/python', '-u', 'distributed_data_parallel.py', '--local_rank=0']' died with .

Or is it the case that for pytorch > 4.0 we should not be building with cuda_ext and cpp_ext, and the Python-only install is sufficient?

Msabih
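
The launcher message above ends in "died with ." so the actual exit reason is not visible in this report. As a minimal sketch (not part of Apex or PyTorch; `describe_returncode` and `run_single_rank` are hypothetical helper names), one could run a single rank directly and translate its exit code. On POSIX, a negative `subprocess` return code is the number of the signal that killed the child.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        try:
            return "killed by signal {}".format(signal.Signals(-returncode).name)
        except ValueError:
            return "killed by signal number {}".format(-returncode)
    return "exited with status {}".format(returncode)


def run_single_rank(script="distributed_data_parallel.py"):
    """Run one rank directly, without torch.distributed.launch.

    The script name and the --local_rank=0 argument are taken from the
    command shown in the error message above.
    """
    cmd = [sys.executable, "-u", script, "--local_rank=0"]
    result = subprocess.run(cmd)
    return describe_returncode(result.returncode)
```

Running `print(run_single_rank())` from the example directory reports whether the script exited with an ordinary error status or was killed by a signal, which is the detail missing from the launcher output.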

All 7 comments

What version of PyTorch are you using? Extensions should only be used with PyTorch 1.0 and later. Also, are you referring to examples/simple/distributed, or something else?

mcarilli on 16 May 2019

@mcarilli
I am using 1.1.0, and yes, I am talking about this example.

Msabih on 16 May 2019

I tried to reproduce this by installing Apex in the official 1.1 container (pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-devel) from Pytorch dockerhub. Install via

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

succeeded and

# cd apex/examples/simple/distributed
# bash run.sh

ran without any problems.

Several pieces of information might help with further debugging:

Does your backtrace show any additional detail?
Can you report the output of nvidia-smi and nvcc -V in your environment?
Does the same error occur if you comment out Apex DDP, uncomment to use Torch DDP, and try bash run.sh again?
Does the same error also occur if you run the script in a single-process way (simply python distributed_data_parallel.py instead of bash run.sh)?

mcarilli on 24 May 2019
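
The environment details requested above (nvidia-smi, nvcc -V, plus the compiler and PyTorch versions that come up later in the thread) can be gathered in one go. The following is only a sketch under stated assumptions: `run_tool`, `torch_info`, and `collect_report` are hypothetical helper names, and the tools are assumed to be on PATH if installed.

```python
import shutil
import subprocess


def run_tool(args):
    """Return a tool's combined output, or a note if it is not on PATH."""
    if shutil.which(args[0]) is None:
        return "{}: not found on PATH".format(args[0])
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()


def torch_info():
    """Report the installed torch version and the CUDA version it was built for."""
    try:
        import torch
    except Exception as exc:  # torch missing or failing to load
        return "torch: import failed ({})".format(exc)
    return "torch {} (built for CUDA {}), cuda available: {}".format(
        torch.__version__, torch.version.cuda, torch.cuda.is_available()
    )


def collect_report():
    """Collect the diagnostics asked for in this thread into one dict."""
    return {
        "nvidia-smi": run_tool(["nvidia-smi"]),
        "nvcc": run_tool(["nvcc", "-V"]),
        "gcc": run_tool(["gcc", "--version"]),
        "torch": torch_info(),
    }
```

Printing each key and value of `collect_report()` gives a block of text that can be pasted into a reply. Comparing the CUDA version in the torch line against the nvcc release is a quick way to spot a toolkit mismatch.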

I met the same problem. After upgrading gcc from 4.8 to 5.4 and recompiling apex, the problem was solved.

yinxiaochuan on 11 Jul 2019

👍5

In my case, upgrading gcc from 4.8 to 5.4 and installing the cuda10 version of torch (https://pytorch.org/get-started/locally/#cuda-100-1) solved the problem.

parksunwoo on 22 Sep 2019
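
Since two commenters report that moving from gcc 4.8 to 5.4 fixed the crash, it may be worth checking the compiler before rebuilding. The sketch below is an assumption-laden illustration: the 5.4 threshold comes only from the reports in this thread, not from documented Apex requirements, and `parse_gcc_version`, `installed_gcc_version`, and `older_than_reported` are hypothetical helper names. The parser is a heuristic that reads the number following the parenthesised vendor tag on the first line of `gcc --version`.

```python
import re
import subprocess

# Assumption: 5.4 is the version commenters in this thread reported as
# working; it is not an officially documented minimum.
REPORTED_WORKING_GCC = (5, 4)


def parse_gcc_version(first_line):
    """Extract (major, minor) from a line such as 'gcc (GCC) 4.8.5 ...'.

    Returns None if the line does not look like gcc output.
    """
    match = re.search(r"\)\s+(\d+)\.(\d+)", first_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_gcc_version(compiler="gcc"):
    """Run '<compiler> --version' and parse its first line."""
    output = subprocess.check_output([compiler, "--version"], universal_newlines=True)
    return parse_gcc_version(output.splitlines()[0])


def older_than_reported(version):
    """True if the version is below the one reported as working in the thread."""
    return version is not None and version < REPORTED_WORKING_GCC
```

If `older_than_reported(installed_gcc_version())` is true, upgrading the compiler and recompiling apex, as described in the comments above, is the fix that worked for these users.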

@yinxiaochuan @parksunwoo Did you guys upgrade gcc from 4.8 to 5.4 on an anaconda environment? Could you please share the steps you followed?

I'm using an AWS EC2 instance with the pytorch_p36 conda environment. I see that my current gcc version is 4.8.5.

cc @mcarilli

g-karthik on 26 Jan 2020

Update for anyone reading this thread: here are the commands I followed to set up gcc 5.4.

wget http://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-5.4.0/gcc-5.4.0.tar.gz 
tar zxf gcc-5.4.0.tar.gz
cd gcc-5.4.0
yum -y install bzip2
./contrib/download_prerequisites
./configure --disable-multilib --enable-languages=c,c++
make -j 4
make install

I recommend creating a screen and running all these commands within that screen, because the second-last command make -j 4 takes a long time.

After doing the above, I rebuilt the apex library in my pytorch_p36 conda env by running the following command:

pip install -v --no-cache-dir --global-option="--pyprof" --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Then I tried running the run.sh script that @mcarilli was referring to and got the following output:

Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
final loss =  tensor(0.5258, device='cuda:0', grad_fn=<MseLossBackward>)

g-karthik on 27 Jan 2020

👍1
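
After a rebuild like the one above, a quick way to confirm that the compiled extensions load is to import them one by one. This is a hedged sketch: the module names in `ASSUMED_EXTENSION_MODULES` are an assumption about what an apex build with --cpp_ext and --cuda_ext installed around the time of this thread, the set varies between apex versions, and `check_modules` is a hypothetical helper.

```python
import importlib

try:
    # Compiled extensions link against torch, so torch is loaded first.
    import torch  # noqa: F401
except Exception:
    torch = None

# Assumption: names of compiled modules from an apex build of this era.
ASSUMED_EXTENSION_MODULES = ["apex_C", "amp_C", "syncbn", "fused_layer_norm_cuda"]


def check_modules(names):
    """Try to import each module and record 'ok' or the failure reason."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:
            results[name] = "failed: {}: {}".format(type(exc).__name__, exc)
    return results
```

Calling `check_modules(ASSUMED_EXTENSION_MODULES)` lists which extensions import cleanly. A hard crash of the interpreter during one of these imports cannot be caught this way, but it does narrow the problem down to the extension being loaded at that moment.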
