Describe the bug
Building v0.9.0 from source fails as some dependencies are missing or not fetched.
Steps/Code to reproduce bug
v0.9.0: bash build.sh libcudf
-- RMM: RMM_LIBRARY set to RMM_LIBRARY-NOTFOUND
-- RMM: RMM_INCLUDE set to RMM_INCLUDE-NOTFOUND
-- DLPACK: DLPACK_INCLUDE set to DLPACK_INCLUDE-NOTFOUND
-- NVSTRINGS: NVSTRINGS_INCLUDE set to NVSTRINGS_INCLUDE-NOTFOUND
-- NVSTRINGS: NVSTRINGS_LIBRARY set to NVSTRINGS_LIBRARY-NOTFOUND
-- NVSTRINGS: NVCATEGORY_LIBRARY set to NVCATEGORY_LIBRARY-NOTFOUND
-- NVSTRINGS: NVTEXT_LIBRARY set to NVTEXT_LIBRARY-NOTFOUND
Expected behavior
Build succeeds without missing deps.
Environment overview (please complete the following information)
Additional context
The build fails to find dlpack, rmm, or nvstrings as dependencies. From the RMM README: "RMM currently must be built from source. This happens automatically in a submodule when you build or install cuDF or RAPIDS containers."
Users should therefore expect these deps to be pulled automatically.
Hello @ccoulombe ,
I just installed cuDF without conda and, as you said, dlpack and rmm are missing.
Here's my method to get it solved:
# Download dlpack from conda repo
mkdir /src/dlpack
cd /src/dlpack
wget https://anaconda.org/conda-forge/dlpack/0.2/download/linux-64/dlpack-0.2-he1b5a44_1.tar.bz2
tar xf dlpack-0.2-he1b5a44_1.tar.bz2
ln -s /src/dlpack $CUDF_HOME/cpp/thirdparty/dlpack
# Download librmm from the rapidsai-nightly conda repo
mkdir /src/librmm
cd /src/librmm
wget https://anaconda.org/rapidsai-nightly/librmm/0.10.0a190925/download/linux-64/librmm-0.10.0a190925-cuda10.1_51.tar.bz2
tar xf librmm-0.10.0a190925-cuda10.1_51.tar.bz2
# Now run compilation
export DLPACK_ROOT=/src/dlpack
export RMM_ROOT=/src/librmm
cd $CUDF_HOME/cpp/build/
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/lib/cudf/ -DCMAKE_CXX11_ABI=ON -DRMM_INCLUDE=/src/librmm/include/ -DDLPACK_INCLUDE=/src/dlpack/include/
make -j
make install
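As a quick sanity check afterwards (a sketch; the library path assumes the install prefix used above), you can confirm the linker actually resolved librmm:
# hedged check: list libcudf's resolved shared-library deps and flag unresolved ones
ldd /usr/local/lib/cudf/lib/libcudf.so | grep -E 'rmm|not found'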
It works with:
The real solution would be to add the deps as submodules, as is already done for a few deps.
I ended up building rmm manually, pointing cmake to the rmm include folder and lib; likewise for dlpack.
# RMM
git clone --recurse-submodules https://github.com/rapidsai/rmm.git -b branch-0.10 && cd rmm
mkdir build && cd build
# install into the clone itself so RMM_ROOT can point at it
cmake .. -DCMAKE_INSTALL_PREFIX=..
make -j
make install
cd .. && export RMM_ROOT=$PWD
# DLPACK
git clone https://github.com/dmlc/dlpack.git
export DLPACK_ROOT=$PWD/dlpack
# CUDF
cmake .. -DCMAKE_CXX11_ABI=ON -DRMM_INCLUDE=$RMM_ROOT/include/ -DRMM_LIBRARY=$RMM_ROOT/lib/librmm.so -DDLPACK_INCLUDE=$DLPACK_ROOT/include
But then it fails with a truncated option: nvcc fatal : Unknown option 'Wl,--whole-archive'. The option should be -Wl,--whole-archive. Seems related to CMake... still investigating.
You should be able to install dlpack similarly to RMM (if you want) and then both will be found as part of the system prefix. The --whole-archive option I believe is from building the Arrow CUDA extensions as a static lib and linking all of the symbols into libcudf (for use by Cython). If you have a log from the build failure, I may be able to better help troubleshoot this.
The real solution would be to add the deps as submodules, as is already done for a few deps.
We currently only do that for header only libraries and the rest of our dependencies are shared libraries that are dynamically linked. We don't want to have to build all of these shared libraries dependencies as part of the cuDF build so we expect them to either be installed somewhere in the search path or pointed to via the CMake variables / use LD_LIBRARY_PATH.
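For example (a sketch reusing the /src paths from the comment above), either approach works:
# point CMake directly at the prebuilt RMM artifacts
cmake .. -DRMM_INCLUDE=/src/librmm/include -DRMM_LIBRARY=/src/librmm/lib/librmm.so
# or make the shared libraries discoverable at link/run time
export LD_LIBRARY_PATH=/src/librmm/lib:$LD_LIBRARY_PATH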
We absolutely need to improve our docs regarding these dependencies; if you'd like to put in a pull request adding this, it would be greatly appreciated.
@kkraus14 Thanks, I'll give a deeper look.
Regarding your second comment: I totally agree, the deps need to be clearly outlined in the README.md.
Yes, I can certainly make such a PR!
I'm sorry... I choked a little at
wget https://anaconda.org/rapidsai-nightly/...
Guys... we are trying to do research here, which means downloading stable release versions of the dependencies that are needed. We download source tarballs, and keep their hash to make sure they don't change.
Can't those be documented and findable?
Git-cloning submodules is not a good solution either! Those are not reproducible.
It was also pretty easy to just
conda install -y dlpack -c conda-forge
if you're using a conda-based environment as well. But yes, some better documentation here on building from source would be helpful.
Yeah, no conda on our clusters. conda creates more problems than it solves. https://docs.computecanada.ca/wiki/Anaconda/en
I don't know who thought it was a good idea to enable conda to install things such as OpenMPI, GCC, CUDA or R...
@mboisson: Yeah, I'm not a huge conda fan either. It's a quick fix for users on our systems though. And when all of these data science / machine learning packages rely on it, it's difficult to avoid sometimes. For example, even PyTorch documentation on building from source strongly recommends using conda for certain packages. So I do. Doesn't mean you have to use conda for GCC, OpenMPI, or CUDA though [1].
[1]
P.S. Any chance you guys would let some Americans try out your cvmfs repo? ;-)
@mboisson: Yeah, I'm not a huge conda fan either. It's a quick fix for users on our systems though. And when all of these data science / machine learning packages rely on it, it's difficult to avoid sometimes. For example, even PyTorch documentation on building from source strongly recommends using conda for certain packages. So I do. Doesn't mean you have to use conda for GCC, OpenMPI, or CUDA though [1].
No, you don't.... but as long as you give users enough rope to hang themselves... they will hang themselves.
P.S. Any chance you guys would let some Americans try out your cvmfs repo? ;-)
See https://docs.computecanada.ca/wiki/Accessing_CVMFS (you can contact us if you have questions).
For what it's worth, trying to get a manylinux-compliant pip wheel of cudf is basically impossible. We currently link against: libcuda, libcudart, libnvrtc, librmm, libNVCategory, libNVStrings, libarrow, libboost_filesystem, and libpthread, along with a few other standard libraries. None of the above libraries are whitelisted by manylinux, which means we'd either need to statically link them or ship them in the wheel. Note that there is no static library for libcuda or libnvrtc.
This basically leaves us with the option of either dynamically loading or shipping the libcuda / libnvrtc libraries in the wheel, and statically linking most of the other libraries. This would still blow our package size up beyond what can be uploaded to PyPI, and the best we could do is provide a .whl file that could be downloaded via a URL.
This would be a huge maintenance burden on us, whereas producing conda packages is basically entirely straightforward since we can just build native code packages that other packages can depend on nicely.
@mboisson: Ha. You can't save them all, especially the ones still using python2.
@kkraus14 I understand providing a wheel might not be straightforward.
I remember that PyPI increased their limit to accommodate PyTorch's wheel size. PyTorch is now a fat wheel of ~700 MB.
Still, providing a wheel from an alternative source would be a good option, as NVIDIA DALI and detectron2 do.
But building from source should be relatively easy and well documented. Then users can build it themselves when conda is not an option. This ensures a build coherent with the libs on the system.
For what it's worth, trying to get a manylinux-compliant pip wheel of cudf is basically impossible. We currently link against: libcuda, libcudart, libnvrtc, librmm, libNVCategory, libNVStrings, libarrow, libboost_filesystem, and libpthread, along with a few other standard libraries. None of the above libraries are whitelisted by manylinux, which means we'd either need to statically link them or ship them in the wheel. Note that there is no static library for libcuda or libnvrtc.
This basically leaves us with the option of either dynamically loading or shipping the libcuda / libnvrtc libraries in the wheel, and statically linking most of the other libraries. This would still blow our package size up beyond what can be uploaded to PyPI, and the best we could do is provide a .whl file that could be downloaded via a URL.
This would be a huge maintenance burden on us, whereas producing conda packages is basically entirely straightforward since we can just build native code packages that other packages can depend on nicely.
Or just make it easy to compile from source and link against libraries installed on the system. We don't typically use manylinux wheels either. Compiling from source ensures everything is coherent.
@mboisson: Ha. You can't save them all, especially the ones still using python2.
We still have python2 on our clusters. We have no reason to remove old software; Python 2 is neither the first nor the last piece of old software we have available.
We will improve the documentation as well as streamline the CMake to make building from source as easy and painless as possible.
Great! :)
Update for v0.13.
nvstrings with bash build.sh libnvstrings nvstrings -v fails with:
-- NVSTRINGS: NVSTRINGS_LIBRARY set to NVSTRINGS_LIBRARY-NOTFOUND
-- NVSTRINGS: NVCATEGORY_LIBRARY set to NVCATEGORY_LIBRARY-NOTFOUND
-- NVSTRINGS: NVTEXT_LIBRARY set to NVTEXT_LIBRARY-NOTFOUND
I had to export NVSTRINGS_ROOT=$INSTALL_PREFIX before building in order for nvstrings to be found.
https://github.com/rapidsai/cudf/blob/be6b00b1cae8cdda838257b30ba8a085b07d4238/python/nvstrings/cpp/CMakeLists.txt#L111
Then I am getting a bunch of error: ‘PyUnicode_AsUTF8’ was not declared in this scope. Investigating this...
cudf with bash build.sh libcudf cudf -v fails because many pxd files are not found, yielding many Cython errors. Examples:
cudf/_libxx/cpp/column/column.pxd:7:0: 'rmm/_lib/device_buffer.pxd' not found
cudf/_libxx/move.pxd:21:0: 'pyarrow/includes/libarrow/CMessageReader.pxd' not found
cudf/_libxx/transform.pyx:50:33: unknown type in template argument
Haven't looked further into these pxd issues yet.
If you have any suggestions, thanks!
NVStrings is in the process of being deprecated, so those problems will go away with it being dropped 😅. But it sounds like it's likely finding an incorrect Python version from somewhere, or you have a Python version installed that we don't support. We generally only build and test for Python 3.6 and 3.7 currently, with 3.8 in the process of being spun up.
The Cython .pxd file errors seem to be from missing an installation of the rmm Python library and the pyarrow Python library. Both of those are Cython headers shipped as parts of those packages.
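Concretely (a sketch, assuming the rmm checkout from earlier in the thread and a pyarrow version matching the Arrow that cudf was built against):
# install the rmm Python bindings so Cython can import rmm/_lib/*.pxd
cd rmm/python
python setup.py build_ext --inplace
pip install -e .
# install pyarrow so pyarrow/includes/*.pxd is on the Cython include path
pip install pyarrow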
cc @mt-jones as things to try to capture in better install docs 😄
Ah! (for NVStrings)
I built with Python 3.7. Have not investigated further yet.
The pxd files exist in rmm:
~ $ find rmm -name "*.pxd"
/home/coulombc/rmm/python/rmm/_lib/__init__.pxd
/home/coulombc/rmm/python/rmm/_lib/device_buffer.pxd
/home/coulombc/rmm/python/rmm/_lib/device_pointer.pxd
/home/coulombc/rmm/python/rmm/_lib/lib.pxd
As they do for pyarrow:
~ $ find cudf/cpp/build/arrow/ -name "*.pxd"
cudf/cpp/build/arrow/arrow/python/pyarrow/__init__.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/_cuda.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/_orc.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/_parquet.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/__init__.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/common.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow_cuda.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow_flight.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow_fs.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libgandiva.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libplasma.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/lib.pxd
Looks like it needs some hinting to find those.
How did you install those libraries? Generally my process is:
# build the Cython extensions in place
python setup.py build_ext --inplace
# then install the package in editable/development mode
pip install -e .
Before building:
module load gcc/7.3 cuda/10.1 python/3.7.4 cmake/3.16 boost
virtualenv ~/.envs/cudf && source ~/.envs/cudf/bin/activate
pip install cmake_setuptools
For RMM, I used:
git clone --recurse-submodules https://github.com/rapidsai/rmm.git -b branch-0.13 && cd rmm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=..
make -j
make install
cd .. && export RMM_ROOT=$PWD
Equivalent for DLPACK.
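i.e., roughly (dlpack is header-only, so no build step; same as my earlier comment):
# DLPACK (header-only)
git clone https://github.com/dmlc/dlpack.git
export DLPACK_ROOT=$PWD/dlpack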
For cudf, I used the build script from the repo as I wanted to ensure everything was building correctly first (instead of manually setting cmake).
git clone --recursive https://github.com/rapidsai/cudf.git --branch branch-0.13 && cd cudf
export INSTALL_PREFIX=$PWD
export NVSTRINGS_ROOT=$INSTALL_PREFIX
PARALLEL_LEVEL=16 bash build.sh -v |& tee build.out
Then it fails with the PyUnicode_AsUTF8 errors when building.
Edit: the nvstrings cmake finds the system Python 2.7 instead of Python 3.7. I'll see if I can succeed by manually building the parts then.
Edit: the nvstrings cmake finds the system Python 2.7 instead of Python 3.7. I'll see if I can succeed by manually building the parts then.
That was what I was expecting. The CMake isn't really set up to find a virtualenv Python library and I'm not sure the cmake for the pyniNV* libraries allows you to easily override where to search for the Python libraries.
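A workaround worth trying (untested here; assumes the CMake in question honors the standard FindPythonInterp/FindPythonLibs hints) is to point the Python detection at the virtualenv explicitly:
# hedged workaround: force CMake's Python detection onto the virtualenv interpreter
cmake .. \
  -DPYTHON_EXECUTABLE="$(which python)" \
  -DPYTHON_INCLUDE_DIR="$(python -c 'import sysconfig; print(sysconfig.get_paths()["include"])')"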
Is this still an issue?
The NVStrings parts are no longer an issue, but the RMM handling is still an issue but is being handled in #6350
Last time I checked, yes. I tried to manipulate the cmake cache in between to help it find the correct Python, without success. I'll find some time to test again with 0.15...
Arrow is a dependency that is not automatically pulled now (it was previously); which version should be used? It does not build with the latest Arrow (v1.0.0), but it does with 0.17.0.
Referencing https://github.com/rapidsai/rmm/issues/584, which does not help with building cudf from source. The steps I used to build rmm v0.15.0 from source are documented there.
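For reference, the rmm v0.15.0 build followed the same pattern as the 0.13 steps earlier in this thread (a sketch; branch name assumed):
git clone --recurse-submodules https://github.com/rapidsai/rmm.git -b branch-0.15 && cd rmm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=..
make -j && make install
cd .. && export RMM_ROOT=$PWD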
After a lot of love, I finally succeeded in building cudf v0.15.0! :tada: The main issues still present when building from source are:
Arrow is a dependency that is not automatically pulled now (it was previously); which version should be used?
For cudf 0.15, Arrow 0.17 is required. For cudf 0.16, Arrow 1.0.1 is required. We're actively working on cleaning up the CMake to handle this much more smoothly, and we'll make sure to update the docs in the process. cc @millerhooks
Great! It would be nice, though, to be able to use the locally installed Arrow instead of fetching and rebuilding it.
Great! It would be nice, though, to be able to use the locally installed Arrow instead of fetching and rebuilding it.
That will be handled. In general we're going to be using CPMFindPackage, which first calls find_package and, if it can't find the package, fetches it itself. If Arrow or another dependency isn't installed in a standard location, you can set a variable like Arrow_ROOT to hint CMake at the path to look for the dependency.
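For example (hypothetical install path):
# hint find_package/CPM at a locally installed Arrow instead of letting it fetch one
cmake .. -DArrow_ROOT=/opt/arrow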