# Any DataFrame with a timedelta64[ns] column reproduces the error
import pandas as pd
df = pd.DataFrame({"elapsed_time": pd.to_timedelta([1, 2, 3], unit="s")})
df.to_parquet('/tmp/tmp.parquet.gzip', compression='gzip')
versions:
Python 3.7.6
pandas==0.25.3
pyarrow==0.15.1
Traceback (most recent call last):
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pandas/core/frame.py", line 2237, in to_parquet
**kwargs
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pandas/io/parquet.py", line 254, in to_parquet
**kwargs
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pandas/io/parquet.py", line 101, in write
table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 560, in dataframe_to_arrays
convert_fields))
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
yield fs.pop().result()
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 546, in convert_column
raise e
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 540, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow/array.pxi", line 207, in pyarrow.lib.array
File "pyarrow/array.pxi", line 74, in pyarrow.lib._ndarray_to_array
File "pyarrow/array.pxi", line 62, in pyarrow.lib._ndarray_to_type
File "pyarrow/error.pxi", line 86, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: ('Unsupported numpy type 22', 'Conversion failed for column elapsed_time with type timedelta64[ns]')
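Until duration support lands in the parquet writer, one possible workaround is to serialize the timedelta column as plain numbers and restore the dtype after reading. A minimal sketch, assuming an elapsed_time column as in the traceback above (the toy values and file path are just for illustration):

import pandas as pd

df = pd.DataFrame({"elapsed_time": pd.to_timedelta([1, 2, 3], unit="s")})

# write: store the timedelta as float seconds (or use .astype("int64") for nanoseconds)
df["elapsed_time"] = df["elapsed_time"].dt.total_seconds()
df.to_parquet('/tmp/tmp.parquet.gzip', compression='gzip')

# read: restore the timedelta64[ns] dtype
out = pd.read_parquet('/tmp/tmp.parquet.gzip')
out["elapsed_time"] = pd.to_timedelta(out["elapsed_time"], unit="s")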
PS: Submitting an issue on pyarrow in JIRA is a complete pain 😥
You should try with pyarrow 0.16.
In any event, this should be reported to the pyarrow tracker.
pyarrow == 0.16.* does not fix this. Since the problem sits at the intersection of pandas (NumPy) and pyarrow, the issue could reasonably be filed in either tracker, because responsibility for mapping types during serialization is shared between them. Hence, I respectfully request clarification on where that responsibility lies. The call starts with pandas and to_parquet is a DataFrame method, so pandas seems the natural place for the report, especially since the pandas unit tests do not appear to catch this. It is also perfectly reasonable for pyarrow to raise a NotImplemented error, which suggests that pandas may not be using the pyarrow API correctly.
Traceback (most recent call last):
...
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pandas/core/frame.py", line 2237, in to_parquet
**kwargs
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pandas/io/parquet.py", line 254, in to_parquet
**kwargs
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pandas/io/parquet.py", line 117, in write
**kwargs
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pyarrow/parquet.py", line 1343, in write_table
**kwargs) as writer:
File "/opt/conda/envs/gis-dataprocessing/lib/python3.7/site-packages/pyarrow/parquet.py", line 448, in __init__
**options)
File "pyarrow/_parquet.pyx", line 1279, in pyarrow._parquet.ParquetWriter.__cinit__
File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: duration[ns]
@dazza-codes Sure, pandas can add more testing, but this is a pyarrow traceback and that's where it is handled. You should open an issue there.
As you can see from the different error message with pyarrow 0.16, it now fails while writing to Parquet, and no longer while converting from pandas to Arrow (that conversion was implemented in pyarrow 0.16).
An issue for timedelta (duration in pyarrow) support in parquet writing already exists: https://issues.apache.org/jira/browse/ARROW-6780
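For reference, a minimal pyarrow-only sketch of the situation described above (the toy elapsed_time column and path are just for illustration): on pyarrow 0.16, Table.from_pandas should now succeed and produce a duration[ns] column, and it is the parquet write step that raises.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"elapsed_time": pd.to_timedelta([1, 2, 3], unit="s")})

# pandas -> Arrow conversion works on pyarrow 0.16 and yields a duration[ns] column
table = pa.Table.from_pandas(df)

# writing that table is the step that raises ArrowNotImplementedError for duration[ns]
pq.write_table(table, '/tmp/tmp.parquet')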