Apache Parquet is a popular data format in the industry. It is used in scikit-learn, Spark and many other ML and big-data related software. Currently, ParquetReader is an internal class that is not exposed to end users.
Feature Request: expose ParquetReader as a public class.
This would significantly simplify integration with various other ML and ETL processes.
This would also provide a good industry-standard data interop between ML.NET and other ML and data tools, as an alternative to CSVs.
Hi @GKrivosheev-rms, thanks for the suggestion :).
If I might add a suggestion tied to this: Parquet supports many kinds of compression. The big ones are Snappy (often the default in Python packages) and gzip, but others are gaining traction as well for space/size reasons. Some others that are supported by pyarrow are brotli, lz4, and zstd.
(forgive the link to my own blog)