Cudf: [FEA] Port contents of functions.h to new cudf::column and associated types

Created on 3 Oct 2019  路  10Comments  路  Source: rapidsai/cudf

Is your feature request related to a problem? Please describe.

Now that #2207 is merged we need to start porting functionality to use it. This issue covers functionality defined in cudf/functions.h.

Describe the solution you'd like

Specifically:

  • [x] gdf_order_by (already implemented in #2207, so this is simply a header/namespace change)
  • [x] gdf_context_view (not sure what to do with this -- need to look at usage
    Also, functionality in functions.h largely uses the legacy "gdf" file structure and
  • [x] gdf_valid_allocation_size and gdf_num_bitmask_elements won't be needed once the transition is complete, so for now they should just move to a header in the legacy folder and legacy namespace.
  • [x] nvtx functions: useful to keep, should just be updated to throw exceptions and moved to their own header and in the cudf::utility namespace or similar.
  • [x] Error handling functions gdf_error_get_name etc.
  • [x] gdf_extract_datetime_second and similar function may need to be handled in a separate refactor of date time functions.
  • [x] gdf_digitize
  • [x] gdf_transpose
  • [x] gdf_hash
  • [x] gdf_hash_partition
  • [x] gdf_to_dlpack
  • [x] gdf_from_dlpack

Not all of the above needs to be done in a single PR. In fact, I recommend grouping it logically into multiple PRs to make reviewing and merging easy (and to enable collaboration).

Spark feature request libcudf tech debt

Most helpful comment

All PRs merged. Closing!

All 10 comments

gdf_order_by can be removed entirely.

gdf_error_get_name and error handling functions as well as gdf_context_view can/should just be removed all together.

gdf_valid_allocation_size and gdf_num_bitmask_elements are superseded by https://github.com/rapidsai/cudf/blob/branch-0.10/cpp/include/cudf/null_mask.hpp#L47

I started on this today

Please follow the two-step approach (Start with a PR that only moves files to legacy and points include paths at them)

Why are the gdf_nvtx_range_* functions declared in functions.h instead of nvtx_utils.h? These don't have anything to do with columns, but I could update them to CUDF_EXPECTS instead of returning gdf_error.
EDIT: Oh, I missed this at the bottom of your list, Mark. Will do just that.

Why are the gdf_nvtx_range_* functions declared in functions.h instead of nvtx_utils.h?

Because _all_ external APIs used to be defined in functions.h :)

datetime_ops functions are covered by https://github.com/rapidsai/cudf/pull/3201

digitize is being deprecated by upper_bound and lower_bound: https://github.com/rapidsai/cudf/issues/3258

All PRs merged. Closing!

Congrats and great work @trevorsm7, this was a tall order.

Was this page helpful?
0 / 5 - 0 ratings