Cudf: [FEA] Implement a fluent builder API for cuIO *_args types

Created on 13 Aug 2020  路  4Comments  路  Source: rapidsai/cudf

Some of the parameter types of the cuIO APIs have many members and are thus hard to initialize, making the API hard to use:

  • user needs to know the parameter order in constructors with many default values;
  • many member cannot be set in the constructor because they are preceded with many defaulted values;

Sample constructor: https://github.com/rapidsai/cudf/blob/3cf6c25714e6d0b5ede8a80e6440b647b8f87c35/cpp/include/cudf/io/functions.hpp#L281-L290
Sample calling site:
https://github.com/rapidsai/cudf/blob/9a89887dd34b479cc8e1939af0b506692d6a28dd/cpp/tests/io/csv_test.cpp#L248-L249
Proposed fluent builder API:

write_csv_args const write_args = write_csv_args::make(table, sink_info{filename})
                                                         .with_metadata(metadata)
                                                         .include_header(include_header);
  • [x] Avro reader
  • [x] CSV reader
  • [x] CSV writer
  • [x] JSON reader
  • [x] Parquet reader
  • [x] Parquet writer
  • [x] ORC reader
  • [x] ORC writer
code quality cuIO feature request good first issue libcudf tech debt

Most helpful comment

Open question: do we keep the current way to set the *_args data members, or encapsulate them?

Rip the bandaid off and do it the right way.

All 4 comments

Open question: do we keep the current way to set the *_args data members, or encapsulate them?

Keeping the option to directly set the members limits changes in the Python layer, but it's generally better to have the data members be private and use the setters (such as the ones we are adding).

Open question: do we keep the current way to set the *_args data members, or encapsulate them?

Rip the bandaid off and do it the right way.

A prototype API for the Parquet writer:
https://godbolt.org/z/xjq7ej

All API changes have been merged, closing :)

Was this page helpful?
0 / 5 - 0 ratings