When I build MXNet with NCCL using cmake, it fails with "Could not find NCCL libraries" even though my NCCL is installed at /usr/local/cuda/include
cmake -GNinja -DUSE_CUDA=ON -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_BUILD_TYPE=Release -DUSE_CUDNN=ON -DUSE_NCCL=ON ..
CMake Warning at CMakeLists.txt:299 (message):
Could not find NCCL libraries
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
----------Python Info----------
Version : 3.6.6
Compiler : GCC 7.2.0
Build : ('default', 'Jun 28 2018 17:14:51')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.3.1
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform : Linux-4.4.0-1096-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-20-50
release : 4.4.0-1096-aws
version : #107-Ubuntu SMP Thu Oct 3 01:51:58 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.984
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.11
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0060 sec, LOAD: 0.5026 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0011 sec, LOAD: 0.5116 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1051 sec, LOAD: 0.3917 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0108 sec, LOAD: 0.2085 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.1761 sec, LOAD: 0.1178 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1306 sec, LOAD: 0.1471 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0123 sec, LOAD: 0.4014 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0120 sec, LOAD: 0.0739 sec.
We may refactor https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindNCCL.cmake to improve autodetection. In the meantime, see the variables it uses for searching. If you set one of them to your NCCL base directory, it should find NCCL successfully.
I experienced this too ... try using -DUSE_NCCL=1 -DUSE_NCCL_PATH=/usr/local/cuda/include (or, as @leezu said, your own NCCL path)
@mjsML Thanks, using that flag worked for me. @guanxinq or @ChaiBapchya interested in fixing FindNCCL.cmake as suggested? :)
I took a look at this auto-detection issue.
To solve this particular case, I added a check for symlink (if UNIX) - https://github.com/ChaiBapchya/incubator-mxnet/blob/nccl_autodetect/cmake/Modules/FindNCCL.cmake
If this is enough, I can submit a PR.
However, I'm not sure it is complete, because I took a look at https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindCUDAToolkit.cmake
It has a fairly long-drawn-out way of finding the CUDA Toolkit.
Is this what's needed? @leezu @apeforest
In that case it makes sense to factor out this check, as it will be used in two places (FindCUDAToolkit and FindNCCL).
@apeforest could you provide some background if NCCL is installed at /usr/local/cuda/include by default?
@ChaiBapchya your change seems to rely on CUDA_TOOLKIT_ROOT_DIR, but this variable is not among the variables exported by FindCUDAToolkit. In fact, you can see it's explicitly unset:
https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindCUDAToolkit.cmake#L708
Instead, let's use the result variables, specifically CUDAToolkit_INCLUDE_DIRS and CUDAToolkit_LIBRARY_DIR. Or would the nccl library not be in CUDAToolkit_LIBRARY_DIR?
Besides using the CUDAToolkit variables as additional defaults to find nccl, the NCCL_ROOT variable needs to be examined as per https://cmake.org/cmake/help/latest/policy/CMP0074.html
(which I think is currently done correctly)
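A minimal sketch of what that suggestion could look like in a find module. This is not the actual FindNCCL.cmake from MXNet or the change in the linked branch. It assumes FindCUDAToolkit has already run and set its result variables, and that the calling project enables CMP0074 (CMake 3.12 or newer). The output variable names NCCL_INCLUDE_DIRS and NCCL_LIBRARIES are assumptions chosen for illustration.

```cmake
# Hypothetical sketch of a FindNCCL.cmake; names of the output variables
# (NCCL_INCLUDE_DIRS, NCCL_LIBRARIES) are assumptions for illustration.
#
# With policy CMP0074 set to NEW in the calling project, the find_* calls
# below also search the NCCL_ROOT CMake variable and the NCCL_ROOT
# environment variable when invoked through find_package(NCCL), so
# NCCL_ROOT does not need to be handled explicitly here.

# Look for the header, using the CUDA toolkit include dirs as extra hints.
find_path(NCCL_INCLUDE_DIR
  NAMES nccl.h
  HINTS ${CUDAToolkit_INCLUDE_DIRS}
  PATH_SUFFIXES include)

# Look for the library, using the CUDA toolkit library dir as an extra hint.
find_library(NCCL_LIBRARY
  NAMES nccl
  HINTS ${CUDAToolkit_LIBRARY_DIR}
  PATH_SUFFIXES lib lib64)

# Sets NCCL_FOUND and prints a standard found / not-found message.
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(NCCL
  REQUIRED_VARS NCCL_LIBRARY NCCL_INCLUDE_DIR)

if(NCCL_FOUND)
  set(NCCL_INCLUDE_DIRS ${NCCL_INCLUDE_DIR})
  set(NCCL_LIBRARIES ${NCCL_LIBRARY})
endif()
```

With a module along these lines, a user whose NCCL lives outside the CUDA directory would pass -DNCCL_ROOT=<path> on the cmake command line, while an install under the CUDA toolkit directory would be picked up through the hints.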
In DLAMI, nccl is installed by default in the cuda directory: /usr/local/cuda/include/nccl.h
However, if users installed nccl manually themselves (sudo apt install libnccl2 libnccl-dev), they can use sudo dpkg-query -L libnccl-dev to find where it is.
https://askubuntu.com/questions/1134732/where-is-nccl-h
I would suggest that @ChaiBapchya first search /usr/local/cuda/include/ and, if NCCL is not found there, try sudo dpkg-query -L libnccl-dev instead. Would that work?
Thanks @ChaiBapchya for volunteering to work on this!
"If not found, try sudo dpkg-query -L libnccl-dev instead."
That would only work on Debian-based platforms, and only for one particular way of installing nccl on these systems. I think it's safe to require users to set NCCL_ROOT if they manually installed nccl to a different path.
To improve the user experience, we may fall-back to building nccl ourselves if nccl is required and not found. Pytorch does that for example.
Yes. I also looked at the CMake autodetection files used in various other open-source frameworks.
They take a similar approach: look in the default path, the env var (NCCL_ROOT), or /usr/local/cuda.
Agree with @leezu. I haven't seen "dpkg-query" or equivalent "find" commands used in cmake. They are more for command-line searches. In cmake, find_path and find_library do a similar job.
Thanks @apeforest @leezu for chiming in!
@ChaiBapchya BTW, unfortunately a lot of CMake usage out in the wild does not meet the modern CMake bar but is leftover from the early days of CMake. While not covering all use-cases of MXNet, sometimes we can refer to https://cliutils.gitlab.io/modern-cmake/ for best practices