Is your feature request related to a problem? Please describe.
I'd like to request a libcudf API that returns a Scalar for a given input Column and an index.
Describe alternatives you've considered
Since nvstrings is going away, it would be helpful to have this API so that we can pick the element at a given index in a string column.
Additional context
Functions like these use nvstrings for now: https://github.com/rapidsai/cudf/blob/branch-0.13/python/cudf/cudf/core/column/column.py#L403-L424
cc: @jrhemstad, @kkraus14
Something like:
unique_ptr<scalar> get_element(column_view const& c, size_type index, device_memory_resource* mr);
Yup, exactly. I think we can work around this for now by slicing the column from index to index + 1 and then using the resulting length-1 column_view as needed.
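To make the workaround concrete, here is a minimal host-only sketch. It does not use libcudf at all: `toy_column_view`, `toy_slice`, and `toy_element_via_slice` are hypothetical stand-ins, and the slice is assumed to cover the half-open range from `index` up to but excluding `index + 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Toy, host-only stand-in for a non-owning column view (NOT the libcudf type).
template <typename T>
struct toy_column_view {
  T const* data;     // first element of the viewed range
  std::size_t size;  // number of elements in the view
};

// Toy slice: returns a view over the half-open range [begin, end).
template <typename T>
toy_column_view<T> toy_slice(toy_column_view<T> const& c,
                             std::size_t begin,
                             std::size_t end)
{
  if (begin > end || end > c.size) { throw std::out_of_range("slice bounds"); }
  return toy_column_view<T>{c.data + begin, end - begin};
}

// The workaround: slice [index, index + 1) and read the only row of the result.
template <typename T>
T toy_element_via_slice(toy_column_view<T> const& c, std::size_t index)
{
  auto const one = toy_slice(c, index, index + 1);
  assert(one.size == 1);  // a successful slice of this shape always has one row
  return one.data[0];
}
```

With a view over `{10, 20, 30}`, `toy_element_via_slice(view, 1)` reads the middle element. The real workaround would still leave the value in a length-1 column_view on the device, which is why a dedicated API that returns a Scalar is nicer.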
I presume for a dictionary column, we want to return a scalar that contains the key pointed to by index i? So if it's a dictionary column like this:
keys = ([1.0, 2.2, 2.5], FLOAT64)
indices = [0, 1, 1, 0, 2]
c = dict_col(keys, indices);
s = get_element(c, 2);
// s.type() is FLOAT64
// s.value() is 2.2
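The same lookup can be written out as a small host-only sketch. This assumes a toy dictionary layout with the keys and indices held in `std::vector`s; `toy_dictionary_column` and `toy_get_element` are hypothetical names, not libcudf API, and the result is a plain `double` rather than a real scalar object.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy host-side dictionary column (hypothetical; not the libcudf layout).
struct toy_dictionary_column {
  std::vector<double> keys;          // the distinct values
  std::vector<std::size_t> indices;  // one entry per row, each a position in keys
};

// Returns the key that row `index` points to, as a plain value.
double toy_get_element(toy_dictionary_column const& c, std::size_t index)
{
  if (index >= c.indices.size()) { throw std::out_of_range("row index"); }
  std::size_t const key_pos = c.indices[index];  // row -> position in keys
  assert(key_pos < c.keys.size());               // indices must refer to valid keys
  return c.keys[key_pos];                        // position -> decoded value
}
```

For the column above, row 2 holds index 1, and key 1 is 2.2, so the returned value has the key type (FLOAT64) rather than the index type.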
Yes, that's correct, unless we plan to have dictionary-encoded Scalars, in which case getting back a dictionary-encoded Scalar would be even better.
This isn't really going to be possible because there isn't a way to share dictionaries. So a dictionary encoded scalar would require deep copying the entire column of keys.
What makes sharing dictionaries hard?
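I believe the dictionaries of cudf::Column objects are held by unique_ptr in libcudf, and the Scalar is an owning object, so a dictionary-encoded Scalar could not share the column's keys and would need its own copy.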
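A toy model of that ownership argument, as a sketch only: the types below are hypothetical and merely assume that the column holds its keys through a `std::unique_ptr` and that the scalar must own whatever it refers to. They are not the actual libcudf classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical column that is the sole owner of its dictionary keys.
struct toy_dict_column {
  std::unique_ptr<std::vector<double>> keys;  // unique ownership: cannot be shared
  std::vector<int> indices;                   // one entry per row
};

// Hypothetical dictionary-encoded scalar that owns everything it refers to.
struct toy_dict_scalar {
  std::unique_ptr<std::vector<double>> keys;  // needs its own keys
  int index;                                  // position of the value in keys
};

toy_dict_scalar toy_get_dict_element(toy_dict_column const& c, std::size_t row)
{
  // A unique_ptr cannot be copied, and moving it would strip the column of its
  // keys, so the only option left here is a deep copy of the whole key vector.
  return toy_dict_scalar{std::make_unique<std::vector<double>>(*c.keys),
                         c.indices.at(row)};
}
```

Every scalar produced this way carries a full copy of the keys, which is the cost described above. Shared ownership (for example a `std::shared_ptr` to the keys) would avoid the copy in this toy model, but that would be a different design from the one assumed here.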