Pandas DataFrames have a DataFrame.to_csv method.
Now that cuStrings supports creating strings from several numeric types, cuDF can create a GPU accelerated cudf.DataFrame.to_csv method.
Pandas's to_csv has many parameters, which cuDF can implement over time in separate PRs.
For an initial implementation, I suggest cuDF support the following arguments:
path_or_buf
sep
na_rep
columns
header
index
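To make the intended semantics of these six arguments concrete, here is a minimal CPU-only sketch using just the Python standard library. The helper name `to_csv_sketch` and the dict-of-lists input are hypothetical, chosen only for illustration. It mirrors the pandas parameter names and defaults as I understand them, and says nothing about how the GPU writer would be implemented.

```python
import csv
import io


def to_csv_sketch(data, path_or_buf=None, sep=",", na_rep="",
                  columns=None, header=True, index=True):
    """Hypothetical CPU-only helper; `data` maps column name -> list of values."""
    # `columns` selects (and orders) the columns to write; default is all of them.
    names = list(columns) if columns is not None else list(data)
    n_rows = len(data[names[0]]) if names else 0

    out = io.StringIO()
    writer = csv.writer(out, delimiter=sep, lineterminator="\n")
    if header:
        # With index=True the index column gets an empty header label.
        writer.writerow(([""] if index else []) + names)
    for i in range(n_rows):
        # Missing values (None here) are written as `na_rep`.
        row = [na_rep if data[name][i] is None else data[name][i] for name in names]
        writer.writerow(([i] if index else []) + row)

    text = out.getvalue()
    if path_or_buf is None:
        return text                  # no target given: return the CSV text
    if hasattr(path_or_buf, "write"):
        path_or_buf.write(text)      # file-like buffer
    else:
        with open(path_or_buf, "w", newline="") as f:
            f.write(text)            # filesystem path
    return None


table = {"a": [1, 2, 3], "b": [0.5, None, 1.5], "c": ["x", "y", "z"]}
csv_text = to_csv_sketch(table, sep="|", na_rep="NULL",
                         columns=["a", "b"], index=False)
print(csv_text)
```

With a real cudf.DataFrame the call would presumably look the same as in pandas, e.g. df.to_csv(path, sep="|", na_rep="NULL", columns=["a", "b"], index=False), once the Python bindings exist.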
This is implemented through https://github.com/rapidsai/cudf/pull/1419.
The PR didn't include "Fixes issue ..." in its description, which is why this issue wasn't closed automatically.
@mjsamoht I'm going to reopen this issue and rename it to reflect Python bindings as the PR was C++ only.
@ayushdg I see you have a WIP PR. Are you going to implement this feature?
If yes, I will assign it to you. Thanks.
@mjsamoht I will be implementing it for 0.8. You can assign it to me.
@ayushdg are you still targeting this for 0.8?
@mjsamoht My work on the PR is more or less done (a few more unit tests and doc updates remain). I am blocked by #1541, and I'm also waiting on a discussion of the timeline for #1737 and a response on rapidsai/custrings#305 before the PR (#1542) can be moved to review.
Thanks, @ayushdg. It doesn't look like your dependencies will be resolved for v0.8 so we may have to push this out to next release.
@kkraus14 I believe you had a workaround in mind that would avoid the dependency on #1541 ?
I may have spoken too soon, but if the CSV writer can accept a gdf_column with dtype GDF_STRING and its data pointer pointing to an nvstrings object, then we can manually construct the gdf_column in the Cython of this function, without the helper function, as a workaround for now.
If the CSV writer can't accept the GDF_STRING gdf_column then we'd need to support GDF_STRING_CATEGORY per #1737.
Looks like it can handle it based on https://github.com/rapidsai/cudf/blob/bddd8322fdae3dfafe481f89f4bc012856cafd9b/cpp/src/io/csv/csv_writer.cu#L46 so the workaround would be to manually create the column using the nvstrings object for now.
I can commit to implementing #1737 and fixing rapidsai/custrings#305 for 0.8 for this.
Thanks @davidwend!
I tried the workaround suggested by @kkraus14 and from my initial assessment it seemed to work fine :D
@mjsamoht Let us target it for 0.8 for now. We can reevaluate the state of the PR and its dependencies sometime next week and see whether it is still viable for 0.8.