Describe the bug
I am trying to import Parquet files, and some files are only partially imported because of an encoding error.
The files are provided by an external provider. I don't know which library was used to generate them.
ClickHouse client version 20.6.3.28 (official build).
Error message and/or stacktrace
Code: 33. DB::Exception: Error while reading Parquet data: IOError: Not yet implemented: Unsupported encoding.
The current master behaves the same way:
https://github.com/apache/arrow/blob/8f3b029122a8f7149e2061c17a74f4add2677a6e/cpp/src/parquet/column_reader.cc#L713-L716
Please confirm that your file uses one of these encodings:
case Encoding::DELTA_BINARY_PACKED:
case Encoding::DELTA_LENGTH_BYTE_ARRAY:
case Encoding::DELTA_BYTE_ARRAY:
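To illustrate why these three cases produce the reported error, here is a simplified, hypothetical sketch of decoder selection. The `Encoding` enum and `SelectDecoder` function are illustrative assumptions, not Arrow's actual code in `column_reader.cc`; the only point is that the delta encodings reach a "not yet implemented" error instead of a decoder.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified subset of Parquet page encodings (not Arrow's real enum).
enum class Encoding {
  PLAIN,
  RLE_DICTIONARY,
  DELTA_BINARY_PACKED,
  DELTA_LENGTH_BYTE_ARRAY,
  DELTA_BYTE_ARRAY
};

// Hypothetical stand-in for a column reader's decoder selection:
// returns a description of the chosen decoder, or throws for encodings
// that have no decoder yet.
std::string SelectDecoder(Encoding encoding) {
  switch (encoding) {
    case Encoding::PLAIN:
      return "plain decoder";
    case Encoding::RLE_DICTIONARY:
      return "dictionary decoder";
    case Encoding::DELTA_BINARY_PACKED:
    case Encoding::DELTA_LENGTH_BYTE_ARRAY:
    case Encoding::DELTA_BYTE_ARRAY:
      // Mirrors the behaviour reported in this issue: no decoder is implemented.
      throw std::runtime_error("Not yet implemented: Unsupported encoding.");
  }
  // Unreachable for valid enum values; keeps every control path returning.
  throw std::logic_error("unknown encoding");
}
```

Calling `SelectDecoder(Encoding::DELTA_BYTE_ARRAY)` throws, which is the same kind of failure ClickHouse surfaces as `Code: 33 ... Unsupported encoding`.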
You can use parquet-tools ( https://github.com/apache/parquet-mr/tree/master/parquet-tools ) to inspect your Parquet file and see which encodings it uses.
Hello,
It is DELTA_BYTE_ARRAY that causes the error.
Do you have any idea how to avoid this error?
Thanks.
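For context, the Parquet format specification describes DELTA_BYTE_ARRAY as incremental (front) encoding: each string is stored as the length of the prefix it shares with the previous value plus the remaining suffix. The sketch below shows only that logical reconstruction step, assuming the prefix lengths and suffixes have already been decoded (in a real page they are themselves stored with DELTA_BINARY_PACKED and DELTA_LENGTH_BYTE_ARRAY, which is not shown). `DecodeFrontCoded` is a hypothetical helper, not part of Arrow or ClickHouse.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: rebuilds values from (shared prefix length, suffix) pairs,
// the logical layout of DELTA_BYTE_ARRAY. Bit-level decoding of the two streams
// is assumed to have happened already.
std::vector<std::string> DecodeFrontCoded(const std::vector<std::size_t>& prefix_lengths,
                                          const std::vector<std::string>& suffixes) {
  if (prefix_lengths.size() != suffixes.size()) {
    throw std::invalid_argument("prefix_lengths and suffixes must have the same size");
  }
  std::vector<std::string> values;
  values.reserve(suffixes.size());
  std::string previous;  // the first value has no predecessor, so its prefix length must be 0
  for (std::size_t i = 0; i < suffixes.size(); ++i) {
    if (prefix_lengths[i] > previous.size()) {
      throw std::invalid_argument("prefix length exceeds previous value");
    }
    // Reuse the first prefix_lengths[i] bytes of the previous value, then append the suffix.
    std::string current = previous.substr(0, prefix_lengths[i]) + suffixes[i];
    values.push_back(current);
    previous = std::move(current);
  }
  return values;
}
```

For example, prefix lengths `{0, 5, 5}` with suffixes `{"clickhouse", "stream", ""}` decode to `"clickhouse"`, `"clickstream"` and `"click"`. A reader without this logic cannot recover the strings, so there is no client-side setting that works around it.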
There is no workaround. The library ClickHouse uses to read Parquet simply does not support that encoding; you can open an issue in the upstream project: https://issues.apache.org/jira/projects/ARROW/issues/
Many thanks for the help. I will open an issue in the Apache Arrow project.