Describe the bug
Building v0.9.0 from source fails as some dependencies are missing or not fetched.
Steps/Code to reproduce bug
v0.9.0: bash build.sh libcudf
-- RMM: RMM_LIBRARY set to RMM_LIBRARY-NOTFOUND
-- RMM: RMM_INCLUDE set to RMM_INCLUDE-NOTFOUND
-- DLPACK: DLPACK_INCLUDE set to DLPACK_INCLUDE-NOTFOUND
-- NVSTRINGS: NVSTRINGS_INCLUDE set to NVSTRINGS_INCLUDE-NOTFOUND
-- NVSTRINGS: NVSTRINGS_LIBRARY set to NVSTRINGS_LIBRARY-NOTFOUND
-- NVSTRINGS: NVCATEGORY_LIBRARY set to NVCATEGORY_LIBRARY-NOTFOUND
-- NVSTRINGS: NVTEXT_LIBRARY set to NVTEXT_LIBRARY-NOTFOUND
Expected behavior
Build succeeds without missing deps.
Environment overview (please complete the following information)
Additional context
The build fails to find dlpack, rmm, or nvstrings as dependencies. From the RMM README: "RMM currently must be built from source. This happens automatically in a submodule when you build or install cuDF or RAPIDS containers."
Users should therefore expect these deps to be pulled automatically.
Hello @ccoulombe ,
I just installed cuDF without conda and, as you said, dlpack and rmm are missing.
Here's my method to get it solved:
# Download dlpack from conda repo
mkdir /src/dlpack
cd /src/dlpack
wget https://anaconda.org/conda-forge/dlpack/0.2/download/linux-64/dlpack-0.2-he1b5a44_1.tar.bz2
tar xf dlpack-0.2-he1b5a44_1.tar.bz2
ln -s /src/dlpack $CUDF_HOME/cpp/thirdparty/dlpack
# Download librmm from the rapidsai-nightly conda repo
mkdir /src/librmm
cd /src/librmm
wget https://anaconda.org/rapidsai-nightly/librmm/0.10.0a190925/download/linux-64/librmm-0.10.0a190925-cuda10.1_51.tar.bz2
tar xf librmm-0.10.0a190925-cuda10.1_51.tar.bz2
# Now run compilation
export DLPACK_ROOT=/src/dlpack
export RMM_ROOT=/src/librmm
cd $CUDF_HOME/cpp/build/
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/lib/cudf/ -DCMAKE_CXX11_ABI=ON -DRMM_INCLUDE=/src/librmm/include/ -DDLPACK_INCLUDE=/src/dlpack/include/
make -j
make install
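As a quick sanity check afterwards (a sketch; the library path assumes the install prefix used above), you can confirm the linker actually resolved librmm:
# hedged check: list libcudf's resolved shared-library deps and flag unresolved ones
ldd /usr/local/lib/cudf/lib/libcudf.so | grep -E 'rmm|not found'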
It works with:
The real solution would be to add the deps as submodules, as is already done for a few deps.
I ended up building rmm manually, pointing cmake to the rmm include folder and lib; likewise for dlpack.
# RMM
git clone --recurse-submodules https://github.com/rapidsai/rmm.git -b branch-0.10 && cd rmm
mkdir build && cd build
# install into the clone itself so RMM_ROOT can point at it
cmake .. -DCMAKE_INSTALL_PREFIX=..
make -j
make install
cd .. && export RMM_ROOT=$PWD
# DLPACK
git clone https://github.com/dmlc/dlpack.git
export DLPACK_ROOT=$PWD/dlpack
# CUDF
cmake .. -DCMAKE_CXX11_ABI=ON -DRMM_INCLUDE=$RMM_ROOT/include/ -DRMM_LIBRARY=$RMM_ROOT/lib/librmm.so -DDLPACK_INCLUDE=$DLPACK_ROOT/include
But then it fails with a truncated option: nvcc fatal : Unknown option 'Wl,--whole-archive'. The option should be -Wl,--whole-archive. Seems related to CMake... still investigating.
You should be able to install dlpack similarly to RMM (if you want) and then both will be found as part of the system prefix. The --whole-archive option I believe is from building the Arrow CUDA extensions as a static lib and linking all of the symbols into libcudf (for use by Cython). If you have a log from the build failure, I may be able to better help troubleshoot this.
The real solution would be to add the deps as submodules, as is already done for a few deps.
We currently only do that for header only libraries and the rest of our dependencies are shared libraries that are dynamically linked. We don't want to have to build all of these shared libraries dependencies as part of the cuDF build so we expect them to either be installed somewhere in the search path or pointed to via the CMake variables / use LD_LIBRARY_PATH.
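For example (a sketch reusing the /src paths from the comment above), either approach works:
# point CMake directly at the prebuilt RMM artifacts
cmake .. -DRMM_INCLUDE=/src/librmm/include -DRMM_LIBRARY=/src/librmm/lib/librmm.so
# or make the shared libraries discoverable at link/run time
export LD_LIBRARY_PATH=/src/librmm/lib:$LD_LIBRARY_PATH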
We absolutely need to improve our docs regarding these dependencies; if you'd like to put in a pull request adding this, it would be greatly appreciated.
@kkraus14 Thanks, I'll give a deeper look.
Regarding your second comment: I totally agree, the deps need to be clearly outlined in the README.md.
Yes, I can certainly make such a PR!
I'm sorry... I choked a little at
wget https://anaconda.org/rapidsai-nightly/...
Guys... we are trying to do research here, which means downloading stable release versions of the dependencies that are needed. We download source tarballs, and keep their hash to make sure they don't change.
Can't those be documented and findable?
Git-cloning submodules is not a good solution either! Those are not reproducible.
It was also pretty easy to just
conda install -y dlpack -c conda-forge
if you're using a conda-based environment as well. But yes, some better documentation here on building from source would be helpful.
Yeah, no conda on our clusters. conda creates more problems than it solves. https://docs.computecanada.ca/wiki/Anaconda/en
I don't know who thought it was a good idea to enable conda to install things such as OpenMPI, GCC, CUDA or R...
@mboisson: Yeah, I'm not a huge conda fan either. It's a quick fix for users on our systems though. And when all of these data science / machine learning packages rely on it, it's difficult to avoid sometimes. For example, even PyTorch documentation on building from source strongly recommends using conda for certain packages. So I do. Doesn't mean you have to use conda for GCC, OpenMPI, or CUDA though [1].
[1]
P.S. Any chance you guys would let some Americans try out your cvmfs repo? ;-)
@mboisson: Yeah, I'm not a huge conda fan either. It's a quick fix for users on our systems though. And when all of these data science / machine learning packages rely on it, it's difficult to avoid sometimes. For example, even PyTorch documentation on building from source strongly recommends using conda for certain packages. So I do. Doesn't mean you have to use conda for GCC, OpenMPI, or CUDA though [1].
No, you don't.... but as long as you give users enough rope to hang themselves... they will hang themselves.
P.S. Any chance you guys would let some Americans try out your cvmfs repo? ;-)
See https://docs.computecanada.ca/wiki/Accessing_CVMFS (you can contact us if you have questions).
For what it's worth, trying to get a manylinux-compliant pip wheel of cudf is basically impossible. We currently link against: libcuda, libcudart, libnvrtc, librmm, libNVCategory, libNVStrings, libarrow, libboost_filesystem, and libpthread, along with a few other standard libraries. None of the above libraries are whitelisted by manylinux, which means we'd either need to statically link them or ship them in the wheel. Note that there is no static library for libcuda or libnvrtc.
This basically leaves us with the option of either dynamically loading or shipping the libcuda / libnvrtc libraries in the wheel, and statically linking most of the other libraries. This would still blow our package size up beyond what can be uploaded to PyPI, and the best we could do is provide a .whl file that could be downloaded via a URL.
This would be a huge maintenance burden on us, whereas producing conda packages is basically entirely straightforward since we can just build native code packages that other packages can depend on nicely.
@mboisson: Ha. You can't save them all, especially the ones still using python2.
@kkraus14 I understand providing a wheel might not be straightforward.
I remember that PyPI increased their limit to accommodate PyTorch's wheel size. PyTorch is now a fat wheel of ~700 MB.
Still, providing a wheel from an alternative source would be a good option, as NVIDIA DALI and detectron2 do.
But building from source should be relatively easy and well documented. Then users can build it themselves when conda is not an option. This ensures a build coherent with the libs on the system.
For what it's worth, trying to get a manylinux-compliant pip wheel of cudf is basically impossible. We currently link against: libcuda, libcudart, libnvrtc, librmm, libNVCategory, libNVStrings, libarrow, libboost_filesystem, and libpthread, along with a few other standard libraries. None of the above libraries are whitelisted by manylinux, which means we'd either need to statically link them or ship them in the wheel. Note that there is no static library for libcuda or libnvrtc.
This basically leaves us with the option of either dynamically loading or shipping the libcuda / libnvrtc libraries in the wheel, and statically linking most of the other libraries. This would still blow our package size up beyond what can be uploaded to PyPI, and the best we could do is provide a .whl file that could be downloaded via a URL.
This would be a huge maintenance burden on us, whereas producing conda packages is basically entirely straightforward since we can just build native code packages that other packages can depend on nicely.
Or just make it easy to compile from source and link against libraries installed on the system. We don't typically use manylinux wheels either. Compiling from source ensures everything is coherent.
@mboisson: Ha. You can't save them all, especially the ones still using python2.
We still have python2 on our clusters. We have no reason to remove old software; Python 2 is neither the first nor the last piece of old software we have available.
We will improve the documentation as well as streamline the CMake to make building from source as easy and painless as possible.
Great! :)
Update for v0.13.
nvstrings with bash build.sh libnvstrings nvstrings -v fails with:
-- NVSTRINGS: NVSTRINGS_LIBRARY set to NVSTRINGS_LIBRARY-NOTFOUND
-- NVSTRINGS: NVCATEGORY_LIBRARY set to NVCATEGORY_LIBRARY-NOTFOUND
-- NVSTRINGS: NVTEXT_LIBRARY set to NVTEXT_LIBRARY-NOTFOUND
I had to export NVSTRINGS_ROOT=$INSTALL_PREFIX before building in order for nvstrings to be found.
https://github.com/rapidsai/cudf/blob/be6b00b1cae8cdda838257b30ba8a085b07d4238/python/nvstrings/cpp/CMakeLists.txt#L111
Then I am getting a bunch of error: ‘PyUnicode_AsUTF8’ was not declared in this scope. Investigating this...
cudf with bash build.sh libcudf cudf -v fails because many pxd files are not found, yielding many Cython errors. Examples:
cudf/_libxx/cpp/column/column.pxd:7:0: 'rmm/_lib/device_buffer.pxd' not found
cudf/_libxx/move.pxd:21:0: 'pyarrow/includes/libarrow/CMessageReader.pxd' not found
cudf/_libxx/transform.pyx:50:33: unknown type in template argument
Haven't looked further into these pxd issues yet.
If you have any suggestions, thanks!
NVStrings is in the process of being deprecated, so those problems will go away with it being dropped 😅. But it sounds like it's likely finding an incorrect Python version from somewhere, or you have a Python version installed that we don't support. We generally only build and test for Python 3.6 and 3.7 currently, with 3.8 in the process of being spun up.
The Cython .pxd file errors seem to be from missing an installation of the rmm Python library and the pyarrow Python library. Both of those are Cython headers shipped as parts of those packages.
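Concretely (a sketch, assuming the rmm checkout from earlier in the thread and a pyarrow version matching the Arrow that cudf was built against):
# install the rmm Python bindings so Cython can import rmm/_lib/*.pxd
cd rmm/python
python setup.py build_ext --inplace
pip install -e .
# install pyarrow so pyarrow/includes/*.pxd is on the Cython include path
pip install pyarrow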
cc @mt-jones as things to try to capture in better install docs 😄
Ah! (for NVStrings)
I built with Python 3.7. Have not investigated further yet.
The pxd files exist in rmm:
~ $ find rmm -name "*.pxd"
/home/coulombc/rmm/python/rmm/_lib/__init__.pxd
/home/coulombc/rmm/python/rmm/_lib/device_buffer.pxd
/home/coulombc/rmm/python/rmm/_lib/device_pointer.pxd
/home/coulombc/rmm/python/rmm/_lib/lib.pxd
As they do for pyarrow:
~ $ find cudf/cpp/build/arrow/ -name "*.pxd"
cudf/cpp/build/arrow/arrow/python/pyarrow/__init__.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/_cuda.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/_orc.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/_parquet.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/__init__.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/common.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow_cuda.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow_flight.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libarrow_fs.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libgandiva.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/includes/libplasma.pxd
cudf/cpp/build/arrow/arrow/python/pyarrow/lib.pxd
Looks like it needs some hinting to find those.
How did you install those libraries? Generally my process is:
# build the Cython extensions in place
python setup.py build_ext --inplace
# then install the package in editable/development mode
pip install -e .
Before building:
module load gcc/7.3 cuda/10.1 python/3.7.4 cmake/3.16 boost
virtualenv ~/.envs/cudf && source ~/.envs/cudf/bin/activate
pip install cmake_setuptools
For RMM, I used:
git clone --recurse-submodules https://github.com/rapidsai/rmm.git -b branch-0.13 && cd rmm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=..
make -j
make install
cd .. && export RMM_ROOT=$PWD
Equivalent for DLPACK.
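i.e., roughly (dlpack is header-only, so no build step; same as my earlier comment):
# DLPACK (header-only)
git clone https://github.com/dmlc/dlpack.git
export DLPACK_ROOT=$PWD/dlpack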
For cudf, I used the build script from the repo as I wanted to ensure everything was building correctly first (instead of manually setting cmake).
git clone --recursive https://github.com/rapidsai/cudf.git --branch branch-0.13 && cd cudf
export INSTALL_PREFIX=$PWD
export NVSTRINGS_ROOT=$INSTALL_PREFIX
PARALLEL_LEVEL=16 bash build.sh -v |& tee build.out
Then it fails with the PyUnicode_AsUTF8 errors when building.
Edit: the nvstrings cmake finds the system Python 2.7 instead of Python 3.7. I'll see if I can succeed by manually building the parts then.
Edit: the nvstrings cmake finds the system Python 2.7 instead of Python 3.7. I'll see if I can succeed by manually building the parts then.
That was what I was expecting. The CMake isn't really set up to find a virtualenv Python library and I'm not sure the cmake for the pyniNV* libraries allows you to easily override where to search for the Python libraries.
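A workaround worth trying (untested here; assumes the CMake in question honors the standard FindPythonInterp/FindPythonLibs hints) is to point the Python detection at the virtualenv explicitly:
# hedged workaround: force CMake's Python detection onto the virtualenv interpreter
cmake .. \
  -DPYTHON_EXECUTABLE="$(which python)" \
  -DPYTHON_INCLUDE_DIR="$(python -c 'import sysconfig; print(sysconfig.get_paths()["include"])')"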
Is this still an issue?
The NVStrings parts are no longer an issue, but the RMM handling is still an issue but is being handled in #6350
Last time I checked, yes. I tried to manipulate the cmake cache in between to help it find the correct Python, without success. I'll find some time to test again with 0.15...
Arrow is a dependency that is not automatically pulled now (it was previously); which version should be used? It does not build with the latest Arrow (v1.0.0), but it does with 0.17.0.
Referencing https://github.com/rapidsai/rmm/issues/584, which does not help with building cudf from source. The steps I used to build rmm v0.15.0 from source are documented there.
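For reference, the rmm v0.15.0 build followed the same pattern as the 0.13 steps earlier in this thread (a sketch; branch name assumed):
git clone --recurse-submodules https://github.com/rapidsai/rmm.git -b branch-0.15 && cd rmm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=..
make -j && make install
cd .. && export RMM_ROOT=$PWD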
After a lot of love, I finally succeeded in building cudf v0.15.0! :tada: The main issues still present when building from source are:
Arrow is a dependency that is not automatically pulled now (it was previously); which version should be used?
For cudf 0.15, Arrow 0.17 is required. For cudf 0.16, Arrow 1.0.1 is required. We're actively working on cleaning up the CMake to handle this much more smoothly, and we'll make sure to update the docs in the process. cc @millerhooks
Great! It would be nice, though, to be able to use the locally installed Arrow instead of fetching and rebuilding it.
Great! It would be nice, though, to be able to use the locally installed Arrow instead of fetching and rebuilding it.
That will be handled. In general we're going to be using CPMFindPackage, which first calls find_package and, if it can't find the package, fetches it itself. If Arrow or another dependency isn't installed in a standard location, you can set a variable like Arrow_ROOT to hint CMake at the path to look for the dependency.
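For example (hypothetical install path):
# hint find_package/CPM at a locally installed Arrow instead of letting it fetch one
cmake .. -DArrow_ROOT=/opt/arrow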