I have installed NCCL 2 and Open MPI but am unable to install Horovod with GPU support. As specified in the documentation, I am using the following command:
HOROVOD_NCCL_HOME=/usr/local/nccl_2.0.5-3+cuda8.0_amd64 HOROVOD_GPU_ALLREDUCE=NCCL pip install --no-cache-dir horovod
This results in the following output:
Collecting horovod
Downloading horovod-0.9.10.tar.gz (64kB)
100% |████████████████████████████████| 71kB 510kB/s
Installing collected packages: horovod
Running setup.py install for horovod ... error
Complete output from command /opt/anaconda2/envs/tensorflow/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-61FZkn/horovod/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ItuZD1-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/horovod
copying horovod/__init__.py -> build/lib.linux-x86_64-2.7/horovod
creating build/lib.linux-x86_64-2.7/horovod/tensorflow
copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-2.7/horovod/tensorflow
copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-2.7/horovod/tensorflow
copying horovod/tensorflow/mpi_ops_test.py -> build/lib.linux-x86_64-2.7/horovod/tensorflow
running build_ext
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -std=c++11 -I/opt/anaconda2/envs/tensorflow/include/python2.7 -c build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.cc -o build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
gcc -pthread -shared -L/opt/anaconda2/envs/tensorflow/lib -Wl,-rpath=/opt/anaconda2/envs/tensorflow/lib,--no-as-needed build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.o -L/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/core -L/opt/anaconda2/envs/tensorflow/lib -ltensorflow_framework -o build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.so
/bin/ld: cannot find -ltensorflow_framework
collect2: error: ld returned 1 exit status
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -std=c++11 -I/opt/anaconda2/envs/tensorflow/include/python2.7 -c build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.cc -o build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
gcc -pthread -shared -L/opt/anaconda2/envs/tensorflow/lib -Wl,-rpath=/opt/anaconda2/envs/tensorflow/lib,--no-as-needed build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.o -L/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/core -L/opt/anaconda2/envs/tensorflow/lib -o build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_libs.so
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -I/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include -I/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/external/nsync/public -I/opt/anaconda2/envs/tensorflow/include/python2.7 -c build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_abi.cc -o build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_abi.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
gcc -pthread -shared -L/opt/anaconda2/envs/tensorflow/lib -Wl,-rpath=/opt/anaconda2/envs/tensorflow/lib,--no-as-needed build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_abi.o -L/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/core -L/opt/anaconda2/envs/tensorflow/lib -o build/temp.linux-x86_64-2.7/test_compile/test_tensorflow_abi.so
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -std=c++11 -I/usr/local/cuda/include -I/opt/anaconda2/envs/tensorflow/include/python2.7 -c build/temp.linux-x86_64-2.7/test_compile/test_cuda.cc -o build/temp.linux-x86_64-2.7/test_compile/test_cuda.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
gcc -pthread -shared -L/opt/anaconda2/envs/tensorflow/lib -Wl,-rpath=/opt/anaconda2/envs/tensorflow/lib,--no-as-needed build/temp.linux-x86_64-2.7/test_compile/test_cuda.o -L/usr/local/cuda/lib -L/usr/local/cuda/lib64 -L/opt/anaconda2/envs/tensorflow/lib -lcudart -o build/temp.linux-x86_64-2.7/test_compile/test_cuda.so
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -std=c++11 -I/usr/local/nccl_2.0.5-3+cuda8.0_amd64/include -I/usr/local/cuda/include -I/opt/anaconda2/envs/tensorflow/include/python2.7 -c build/temp.linux-x86_64-2.7/test_compile/test_nccl.cc -o build/temp.linux-x86_64-2.7/test_compile/test_nccl.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
gcc -pthread -shared -L/opt/anaconda2/envs/tensorflow/lib -Wl,-rpath=/opt/anaconda2/envs/tensorflow/lib,--no-as-needed build/temp.linux-x86_64-2.7/test_compile/test_nccl.o -L/usr/local/nccl_2.0.5-3+cuda8.0_amd64/lib -L/usr/local/nccl_2.0.5-3+cuda8.0_amd64/lib64 -L/usr/local/cuda/lib -L/usr/local/cuda/lib64 -L/opt/anaconda2/envs/tensorflow/lib -lnccl -o build/temp.linux-x86_64-2.7/test_compile/test_nccl.so
building 'horovod.tensorflow.mpi_lib' extension
creating build/temp.linux-x86_64-2.7/horovod
creating build/temp.linux-x86_64-2.7/horovod/tensorflow
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DHAVE_CUDA=1 -DHAVE_NCCL=1 -DHOROVOD_GPU_ALLREDUCE='N' -I/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include -I/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/external/nsync/public -I/usr/local/cuda/include -I/usr/local/nccl_2.0.5-3+cuda8.0_amd64/include -I/opt/anaconda2/envs/tensorflow/include/python2.7 -c horovod/tensorflow/mpi_message.cc -o build/temp.linux-x86_64-2.7/horovod/tensorflow/mpi_message.o -std=c++11 -fPIC -O2 -I/usr/local/openmpi/include -pthread -Wl,-rpath -Wl,/usr/local/openmpi/lib -Wl,--enable-new-dtags -L/usr/local/openmpi/lib -lmpi_cxx -lmpi -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DHAVE_CUDA=1 -DHAVE_NCCL=1 -DHOROVOD_GPU_ALLREDUCE='N' -I/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include -I/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/external/nsync/public -I/usr/local/cuda/include -I/usr/local/nccl_2.0.5-3+cuda8.0_amd64/include -I/opt/anaconda2/envs/tensorflow/include/python2.7 -c horovod/tensorflow/mpi_ops.cc -o build/temp.linux-x86_64-2.7/horovod/tensorflow/mpi_ops.o -std=c++11 -fPIC -O2 -I/usr/local/openmpi/include -pthread -Wl,-rpath -Wl,/usr/local/openmpi/lib -Wl,--enable-new-dtags -L/usr/local/openmpi/lib -lmpi_cxx -lmpi -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
horovod/tensorflow/mpi_ops.cc: In function ‘int horovod::tensorflow::{anonymous}::GetDeviceID(tensorflow::OpKernelContext)’:
horovod/tensorflow/mpi_ops.cc:1723:63: error: ‘const struct tensorflow::DeviceBase::GpuDeviceInfo’ has no member named ‘gpu_id’
device = context->device()->tensorflow_gpu_device_info()->gpu_id;
^
In file included from /opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h:22:0,
from horovod/tensorflow/mpi_ops.cc:22:
/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/allocator.h: In member function ‘virtual std::size_t tensorflow::Allocator::RequestedSize(void)’:
/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/allocator.h:155:3: warning: control reaches end of non-void function [-Wreturn-type]
}
^
In file included from /opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h:25:0,
from horovod/tensorflow/mpi_ops.cc:22:
/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/device_base.h: In member function ‘virtual tensorflow::Allocator* tensorflow::DeviceBase::GetAllocator(tensorflow::AllocatorAttributes)’:
/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/device_base.h:152:3: warning: control reaches end of non-void function [-Wreturn-type]
}
^
/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/device_base.h: In member function ‘virtual const tensorflow::DeviceAttributes& tensorflow::DeviceBase::attributes() const’:
/opt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/device_base.h:183:3: warning: control reaches end of non-void function [-Wreturn-type]
}
^
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/opt/anaconda2/envs/tensorflow/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-61FZkn/horovod/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ItuZD1-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-61FZkn/horovod/
How do I address these errors? Do I also need to make any changes to handle the warnings which state "valid for C/ObjC but not for C++"?
@dhaners, this is the real error:
horovod/tensorflow/mpi_ops.cc:1723:63: error: ‘const struct tensorflow::DeviceBase::GpuDeviceInfo’ has no member named ‘gpu_id’
Can you upgrade TensorFlow to 1.1.0 or later?
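For a log this long, one way to find lines like the one quoted above is to keep only those that gcc tags as `error:`. The sketch below is only an illustration: `find_errors` is a hypothetical helper, not part of pip or Horovod, and it assumes Python 3 and gcc-style diagnostics. The `-Wstrict-prototypes` lines are tagged `warning:` and are not matched. gcc prints them because that flag applies only to C and is ignored when compiling C++, so they do not stop the build.

```python
import re

# gcc-style diagnostics carry an "error:" tag, e.g. "file.cc:12:3: error: ...".
# Lines tagged "warning:" are deliberately not matched.
ERROR_RE = re.compile(r"\berror:")


def find_errors(log_text):
    """Return (line_number, text) pairs for log lines tagged as errors."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), 1)
        if ERROR_RE.search(line)
    ]


# Three lines copied from the build output in this thread.
SAMPLE_LOG = "\n".join([
    "cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]",
    "horovod/tensorflow/mpi_ops.cc:1723:63: error: ‘const struct tensorflow::DeviceBase::GpuDeviceInfo’ has no member named ‘gpu_id’",
    "error: command 'gcc' failed with exit status 1",
])

for number, text in find_errors(SAMPLE_LOG):
    print(number, text)
```

Not every match is fatal. In the full log, the `collect2: error:` line comes from a link probe that the build retries without `-ltensorflow_framework`, so the matches are candidates to read, not a verdict.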
I upgraded TensorFlow to 1.3.0, and Horovod now installs successfully. Thanks for the help and for the quick response!
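Before rerunning the install, the TensorFlow version in the environment can be compared against the 1.1.0 minimum suggested earlier in the thread. This is a minimal sketch: `parse_version` and `meets_minimum` are hypothetical helper names, and the only assumption about TensorFlow is that the version string (as reported by `tensorflow.__version__`) starts with dotted numbers.

```python
import re

# Minimum TensorFlow version suggested in this thread.
MIN_TF = (1, 1, 0)


def parse_version(version):
    """Return the leading numeric part of a version string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum=MIN_TF):
    """True if `version` is at least `minimum`, compared numerically."""
    return parse_version(version) >= minimum


# Literal strings keep the example independent of any installed TensorFlow.
print(meets_minimum("1.3.0"))
print(meets_minimum("1.0.1"))
```

In the target environment, the same check would be `meets_minimum(tensorflow.__version__)` after `import tensorflow`. The comparison uses integer tuples, not strings, so a version such as 1.10.0 sorts after 1.9.0.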
Closing this issue. Feel free to reopen if you have more questions.