Is your feature request related to a problem? Please describe.
I would like to be able to cast a column of elements from one type to another compatible type without having to materialize a deep copy. E.g., casting a timestamp_D to int32.
Currently, libcudf's cast APIs require materializing a deep copy of the input column.
Describe the solution you'd like
libcudf could provide a logical_cast API like so:
column_view logical_cast(column_view const& c, data_type d);
This function should perform type safety checking to make sure the cast is valid, e.g., casting a FLOAT64 column to INT32 wouldn't be valid for a logical_cast.
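To make the proposal concrete, here is a minimal sketch of what the check and the zero-copy cast could look like. It does not use libcudf: `type_id`, `column_view`, `rep_type`, and the rule "two types are castable when they share the same underlying representation" are simplified, hypothetical stand-ins chosen for illustration, and the real API takes a `data_type` rather than a bare enum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified stand-ins for libcudf's type_id and column_view.
enum class type_id { INT32, INT64, FLOAT64, TIMESTAMP_DAYS, DECIMAL32 };

struct column_view {
  type_id type;
  void const* data;  // non-owning pointer to the element buffer
  std::size_t size;  // number of elements
};

// Maps each logical type to the type that describes its storage.
// Assumption: day timestamps and decimal32 values are stored as int32.
inline type_id rep_type(type_id t) {
  switch (t) {
    case type_id::TIMESTAMP_DAYS:
    case type_id::DECIMAL32: return type_id::INT32;
    default: return t;
  }
}

// Assumed rule: a logical cast is valid only when both types share the
// same underlying representation, so the bytes can be reinterpreted as-is.
inline bool is_logically_castable(type_id from, type_id to) {
  return rep_type(from) == rep_type(to);
}

// Returns a view over the same buffer with a different logical type.
// No element data is copied.
inline column_view logical_cast(column_view const& c, type_id to) {
  if (!is_logically_castable(c.type, to)) {
    throw std::invalid_argument("types are not logically castable");
  }
  return column_view{to, c.data, c.size};
}

// Usage: view a column of day timestamps as plain int32 without copying.
inline void example() {
  std::int32_t days[] = {0, 1, 18262};
  column_view ts{type_id::TIMESTAMP_DAYS, days, 3};
  column_view ints = logical_cast(ts, type_id::INT32);
  assert(ints.data == ts.data);  // same buffer, only the type changed
}
```

In this sketch a FLOAT64 to INT32 request is rejected because the two types do not share a representation, which matches the example of an invalid cast given above.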
Describe alternatives you've considered
This is purely an optimization and does not enable any new functionality: a deep-copy cast works today, it is just slower.
Talking with @trevorsm7 @codereport @harrism, this would also be useful for the fixed_point/decimal32 type in situations where you want to operate on the underlying integer data and not have to go through the fixed_point type machinery, e.g., in sorting.
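As a rough illustration of the sorting case, the sketch below orders a decimal column by looking only at its underlying integers. `decimal32_column` and `sorted_order` are hypothetical names, not libcudf types, and the sketch assumes every element in the column shares one scale, so that comparing the stored integers gives the same order as comparing the decimal values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a decimal32 column: each value is rep * 10^scale.
struct decimal32_column {
  std::vector<std::int32_t> reps;  // underlying integer data
  int scale;                       // shared by every element
};

// Returns the row indices that would sort the column in ascending order.
// Because all rows share one scale, ordering the raw integers is enough.
inline std::vector<std::size_t> sorted_order(decimal32_column const& col) {
  // The "logical cast": treat the column as plain int32 data, no copy.
  std::int32_t const* ints = col.reps.data();

  std::vector<std::size_t> order(col.reps.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [ints](std::size_t a, std::size_t b) { return ints[a] < ints[b]; });

  assert(order.size() == col.reps.size());
  return order;
}
```

For example, a column holding 1.50, -0.25, 10.00, and 0.00 at scale -2 is stored as 150, -25, 1000, and 0, and the comparison never touches any fixed-point arithmetic.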
Based on the above, making this a priority for the 0.15 release in service of fixed-point columns. Assigning to @codereport and @trevorsm7 for now.
I've opened a PR for the changes in the libcudf API. Does this need to be exposed in the Python cudf API?
@kkraus14 @shwina I suspect this would be an implementation detail of the cuDF Python casting interface. I'm not sure when a cast in Pandas is zero copy vs. deep copy.
Pandas does a deep copy by default, but it looks like we do a shallow copy by default (https://github.com/rapidsai/cudf/blob/branch-0.15/python/cudf/cudf/core/series.py#L1867).
How is the shallow copy casting being done today? Just manually?
We decided on this behavior for performance reasons IIRC.
Yea, just logical Python operations of changing the dtype wrapping the Buffer.
In that case you probably don't care about using this logical cast, unless you want to defer type/safety checking to the C++ function.
Would the is_logically_castable function be more useful then?