Is your feature request related to a problem? Please describe.
libcudf has a unique_count function, but it only works on individual columns.
Describe the solution you'd like
I want a function that tells me the number of unique rows in a table.
So it looks like the internal `unique_count` function _already_ works on tables, but the top-level API only accepts a column, which it then wraps in a table.
What is driving this need (so I can prioritize)?
This allows computing the output size (in # of rows) of groupby without requiring a special groupby API.
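To illustrate why a table-level unique count gives the groupby output size: every distinct row of the key columns becomes exactly one group. The following is a minimal host-side sketch using only the C++ standard library, not libcudf code; `Column`, `Table`, and `unique_row_count` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a table of key columns (not libcudf types).
using Column = std::vector<int>;
using Table  = std::vector<Column>;

// Counts distinct rows across all key columns by collecting each row
// into an ordered set. The result equals the number of groups a groupby
// on these keys would produce.
std::size_t unique_row_count(Table const& keys)
{
  if (keys.empty()) { return 0; }
  std::size_t const num_rows = keys.front().size();
  std::set<std::vector<int>> seen;
  for (std::size_t r = 0; r < num_rows; ++r) {
    std::vector<int> row;
    row.reserve(keys.size());
    for (Column const& col : keys) {
      assert(col.size() == num_rows);  // all columns must have equal length
      row.push_back(col[r]);
    }
    seen.insert(std::move(row));
  }
  return seen.size();
}
```

For two key columns `{1, 1, 2, 2, 1}` and `{10, 10, 20, 30, 10}`, the distinct rows are (1, 10), (2, 20), and (2, 30), so a groupby on those keys would have three output rows.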
> So it looks like the internal `unique_count` function _already_ works on tables, but the top-level API only accepts a column, which is wrapped in a table.
A table API was not required earlier: pandas `DataFrame.nunique` calculates a _per-column unique count_.
Would simply exposing the detail API and a public API on table be enough?
> table API was not required earlier. pandas `DataFrame.nunique` calculates _per column unique_count_.
Remember that libcudf is not Pandas.
> Would simply exposing the detail API and a public API on table be enough?
Changing the existing `column_view` public API to work on `table_view` is what I suggest.
@jrhemstad do you need the `null_policy::INCLUDE/EXCLUDE` and `nan_policy::NAN_IS_NULL/NAN_IS_VALID` functionality as well for this public API?
(Implementing these for `table_view` needs some extra code and special cases.)
If they are not required, exposing the detail API should be the better option:
```cpp
cudf::size_type unique_count(table_view const& keys,
                             null_equality nulls_equal = null_equality::EQUAL);
```
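As a rough illustration of what the `nulls_equal` parameter in this signature could control, here is a host-side sketch using only the C++ standard library. It assumes that under `UNEQUAL` a null never compares equal to anything, so every row containing a null counts as its own distinct row; that reading is an assumption, not confirmed libcudf behavior. The local `null_equality` enum only mirrors the names above, and `Cell`, `Row`, and `unique_row_count_with_nulls` are hypothetical. Rows are stored row-major here for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Local enum mirroring the names in the proposed signature (illustration only).
enum class null_equality { EQUAL, UNEQUAL };

using Cell = std::optional<int>;  // an empty optional models a null element
using Row  = std::vector<Cell>;

// Counts distinct rows. With EQUAL, nulls compare equal to each other.
// With UNEQUAL (assumed semantics), a row containing any null is never
// equal to another row, so each such row is counted separately.
std::size_t unique_row_count_with_nulls(std::vector<Row> const& rows,
                                        null_equality nulls_equal = null_equality::EQUAL)
{
  std::set<Row> seen;
  std::size_t rows_with_null = 0;
  for (Row const& row : rows) {
    bool const has_null = std::any_of(
      row.begin(), row.end(), [](Cell const& c) { return !c.has_value(); });
    if (has_null && nulls_equal == null_equality::UNEQUAL) {
      ++rows_with_null;
    } else {
      seen.insert(row);
    }
  }
  return seen.size() + rows_with_null;
}
```

Under these assumptions, the choice of `nulls_equal` changes the count only when at least two rows contain a null, which is also the case where groupby would have to decide whether null keys share a group.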
> @jrhemstad do you need `null_policy::INCLUDE/EXCLUDE` and `nan_policy::NAN_IS_NULL/NAN_IS_VALID` functionality as well for this public API?
> (this needs some extra code & special cases for implementing for `table_view`)
> If they are not required, exposing detail API should be a better option.
> `cudf::size_type unique_count(table_view const& keys, null_equality nulls_equal = null_equality::EQUAL)`
We need whatever options are required such that unique_count can be used to compute the output size (in number of rows) of any groupby operation.