Cudf: [FEA] Expose libcudf C++ internal code (thirdparty headers, common utilities, testing tools)

Created on 18 Jun 2020  路  18Comments  路  Source: rapidsai/cudf

Is your feature request related to a problem? Please describe.
When we build BlazingSQL code we need to use:

  • Header only dependencies (your 3rdparty stuff): libcudacxx, etc
  • Internal libcudf common/utility C++ code related to unit tests
    We wish:
  • You provide a conda package for your 3rdparty libs (specially for libcudacxx and cub) so any C++ project that builds using cudf doesn't have build issues.
  • You expose your internal common/utility code under an experimental namespace as part of you libcudf release

Describe the solution you'd like

  • 1 new rapids conda artifact: libcudf thirdparty
  • You libcudf package should have all the necessary c++ headers to expose you internal code as a public API (under some sort of experimental namespace)

Describe alternatives you've considered

  • We have been creating a conda package that has your libcudacxx and cub third party but now we are getting those headers as part of our gpuci process
  • For the internal code, we have been copying (literally) your code from cudf to our internal code base, and we do this every time your internal code has changes.

Additional context

conda feature request libcudf

Most helpful comment

But i still need the headers for the files in cudf/cpp/tests/utilities/

@jrhemstad should we expose / ship those headers? Otherwise people will have to clone cudf in order to get them.

We found it in include/libcudf/ we are not checking if we going to need cub there?

cub REALLY shouldn't be exposed by any libcudf APIs that you'd need it to link against libcudf. If you're seeing otherwise please let us know.

All 18 comments

@williamBlazing @rommelDB

You expose your internal common/utility code under an experimental namespace as part of you libcudf release

Exactly what internal code are we talking about here? The whole point of internal code is that its... internal :)

We use a lot of the stuff that is in cudf/cpp/tests/utilities/ for our unit tests and also in our regular code

I regular code we use:
column_wrapper.hpp functions like fixed_width_column_wrapper to create cudf_columns from std::vectors, we also use make_counting_transform_iterator
column_utilities.hpp functions like to_host to take gpu column data and put it into a std::vector and to_string for debugging

Perhaps you can suggest other things to use instead.

For unit tests we use a lot of the same ones and more. I can list them all out if you really need to know, but I fell like in general a lot of those testing utility functions should be exposed so that other people like us (BlazingSQL) can more easily build unit tests for things that they build using cudf

We use a lot of the stuff that is in cudf/cpp/tests/utilities/ for our unit tests and also in our regular code

Our test utilities are already built as a static lib: https://github.com/rapidsai/cudf/blob/branch-0.15/cpp/tests/CMakeLists.txt#L27

I don't have much problem with other libs using that stuff (cuSpatial does). However, I _strongly_ advise you do not use those utilities in performance sensitive code. Test utilities are written without any consideration of performance.

Was there anything else other than test utility code?

Its just the utility code and we dont use it for anything performance sensitive

So is there still any change needed for you to use the test utility code? Can you just do what cuSpatial does?

Hi @jrhemstad thanks,

  • Header only dependencies (your 3rdparty stuff): libcudacxx, etc
    what about
  • 1 new rapids conda artifact: libcudf thirdparty

what about this requirement? Do you suggest we take the same approach of cuspatial? like here https://github.com/rapidsai/cuspatial/blob/branch-0.15/cpp/cmake/thirdparty/CMakeLists.txt ?

I'm not going to be much help with build related stuff. @kkraus14 or @mt-jones may be better able to answer that question.

All exposed third party headers are already shipped in the libcudf conda package / libcudf cmake install, minus thrust / cub (which I'm not sure are exposed or not, we've generally avoided exposing them). https://github.com/rapidsai/cudf/blob/branch-0.15/cpp/CMakeLists.txt#L740-L746

In general we want to avoid exposing the third party headers in public APIs so that you don't need those headers in order to build a library which links libcudf. If you're seeing issues from this and could point us to the functions leaking those headers it would be appreciated.

As @jrhemstad also said, the test utility static lib is also shipped as part of the libcudf conda package as well as is installed via the cmake install. https://github.com/rapidsai/cudf/blob/branch-0.15/cpp/tests/CMakeLists.txt#L38-L40

Hi @kkraus14

I just built blazing only with cudf headers and I get

/include/cudf/wrappers/durations.hpp:24:23: fatal error: simt/chrono: No such file or directory

All exposed third party headers are already shipped in the libcudf conda package / libcudf cmake >install, minus thrust / cub (which I'm not sure are exposed or not, we've generally avoided exposing >them). https://github.com/rapidsai/cudf/blob/branch-0.15/cpp/CMakeLists.txt#L740-L746

Are we sure libcudacxx is part of the libcudf package?

@kkraus14 We found it in include/libcudf/ we are not checking if we going to need cub there?

@kkraus14 @jrhemstad so, i see that
"""Our test utilities are already built as a static lib: https://github.com/rapidsai/cudf/blob/branch-0.15/cpp/tests/CMakeLists.txt#L27"""

But i still need the headers for the files in cudf/cpp/tests/utilities/

Any ideas as to how cuSpatial is handling that?

It seems like in cuSpatial they are getting those headers by cloning cudf in their build.sh
and then doing this:
set(CUDF_TEST_INCLUDE "$ENV{CUDF_HOME}/cpp")

It would be nicer if those headers were exposed

But i still need the headers for the files in cudf/cpp/tests/utilities/

@jrhemstad should we expose / ship those headers? Otherwise people will have to clone cudf in order to get them.

We found it in include/libcudf/ we are not checking if we going to need cub there?

cub REALLY shouldn't be exposed by any libcudf APIs that you'd need it to link against libcudf. If you're seeing otherwise please let us know.

I think we dont need cub exposed
For now, we will be cloning cudf to get those headers the way that cuSpatial does. We were already cloning cudf, but will change it up to look more like cuSpatial
But I do think that those test utility headers should be exposed

Let's fix this for 0.16 (a bit late for 0.15).

I started on this.

Note updated cuspatial based on this in rapidsai/cuspatial#292

Was this page helpful?
0 / 5 - 0 ratings