Kedro: [KED-2292] Updating pyarrow version constraint

Created on 19 Nov 2020 · 5 Comments · Source: quantumblacklabs/kedro

Description

The pyarrow version constraint for ParquetDataSets is currently set to >=0.12.0, <1.0.0. pyarrow has since had several major version upgrades (the current release is 2.0.0). I am wondering if the version constraint on pyarrow could be relaxed, so we could get more recent versions of pyarrow.

Context

Many other Python packages already depend on pyarrow >= 1.0.0, or even >= 2.0.0; awswrangler is one example. Restricting pyarrow to <1.0.0 means we either get version conflicts or have to use increasingly outdated packages.

Possible Implementation

I would suggest relaxing the pyarrow version constraint to >=0.12.0, <3.0.0. As per the arrow documentation, files created with any pyarrow version since 0.8.0 should stay readable in versions >= 1.0.0. Files created with pyarrow >= 1.0.0 are, however, not readable by versions < 1.0.0. Version 2.0.0 does not change the data format at all. It does, however, deprecate some functionality in the library (pyarrow.filesystem, pyarrow.serialize, pyarrow.deserialize). I'm not sure if Kedro uses this functionality.
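To make the effect of the suggested range concrete, here is a minimal sketch that compares the current constraint with the proposed one. It is not how pip evaluates specifiers: the helpers `parse` and `allowed` are hypothetical, they handle plain numeric release versions only, and the candidate versions are illustrative picks.

```python
def parse(version):
    """Turn '1.0.0' into (1, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def allowed(version, lower, upper):
    """Return True if lower <= version < upper.

    Simplification: only plain numeric releases are supported
    (no pre-release or dev suffixes).
    """
    return parse(lower) <= parse(version) < parse(upper)


# Illustrative candidate versions, not an exhaustive release list.
candidates = ["0.12.0", "0.17.1", "1.0.0", "2.0.0", "3.0.0"]

# Current constraint from the issue: >=0.12.0, <1.0.0
current = {v: allowed(v, "0.12.0", "1.0.0") for v in candidates}

# Constraint suggested in the issue: >=0.12.0, <3.0.0
proposed = {v: allowed(v, "0.12.0", "3.0.0") for v in candidates}

for v in candidates:
    print(f"pyarrow {v}: current={current[v]}, proposed={proposed[v]}")
```

Under these assumptions, the proposed range admits 1.0.0 and 2.0.0 while still excluding 3.0.0, which is the point of keeping an upper bound.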

Feature Request

All 5 comments

Hi @sndrtj thanks a lot for bringing this to our attention! That sounds like a very reasonable request to me. Happy for you to make this contribution if you like, otherwise someone in the team will pick it up. 😊

Closing this as resolved in https://github.com/quantumblacklabs/kedro/commit/f3fcd56b6e07292590e4d68142aca10efc517a4b

Thanks again for raising this!

Hi @lorenabalan I am unable to pip-compile kedro[pandas]==0.17.0. I am getting the following error:

  pyarrow<4.0dev,>=1.0.0 (from google-cloud-bigquery[bqstorage,pandas]==2.7.0->pandas-gbq==0.14.1->kedro[pandas]==0.17.0->-r src/requirements.in (line 8))
  pyarrow<1.0.0,>=0.12.0 (from kedro[pandas]==0.17.0->-r src/requirements.in (line 8)) 

pandas.ParquetDataSet in setup.py is restricting it to be <1.0.0
"pandas.ParquetDataSet": [PANDAS, "pyarrow>=0.12.0, <1.0.0"]

This PR updated the constraint in test_requirements.txt, but setup.py still says <1.0.0.
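The conflict in the error above can be sketched as an interval intersection: each requirement is a half-open range [lower, upper), and the two ranges share no version. This is a simplified illustration, not pip's resolver. The helpers `parse` and `intersect` are hypothetical, `4.0dev` is approximated as `4.0`, and the relaxed range is the one suggested in the issue description, which is not necessarily what the fix commit uses.

```python
def parse(version):
    """Turn '1.0.0' into (1, 0, 0).

    Simplification: a trailing 'dev' marker (as in '4.0dev') is dropped,
    so '4.0dev' is treated as plain 4.0.
    """
    digits = version.replace("dev", "")
    return tuple(int(part) for part in digits.split("."))


def intersect(a, b):
    """Intersect two half-open ranges (lower, upper); return None if empty."""
    lower = max(a[0], b[0])
    upper = min(a[1], b[1])
    return (lower, upper) if lower < upper else None


# pyarrow<4.0dev,>=1.0.0 (required via google-cloud-bigquery in the error)
bigquery_range = (parse("1.0.0"), parse("4.0dev"))

# pyarrow<1.0.0,>=0.12.0 (required by kedro[pandas]==0.17.0)
kedro_range = (parse("0.12.0"), parse("1.0.0"))

# No version satisfies both, hence the pip-compile failure.
overlap = intersect(bigquery_range, kedro_range)
print("overlap with 0.17.0 constraint:", overlap)

# Hypothetical relaxed range taken from the issue's suggestion (>=0.12.0, <3.0.0).
relaxed_range = (parse("0.12.0"), parse("3.0.0"))
relaxed_overlap = intersect(bigquery_range, relaxed_range)
print("overlap with relaxed constraint:", relaxed_overlap)
```

With the 0.17.0 constraint the intersection is empty; with the relaxed range from the issue description, versions from 1.0.0 up to (but not including) 3.0.0 would satisfy both requirements.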


Python: 3.8.5
OS: MacOS
Kedro: 0.17.0

Thanks for flagging! Will look at fixing it today.

@debugger24 it's been fixed by https://github.com/quantumblacklabs/kedro/commit/9acca4688389930b1744e241a94dd20cc5918bb3, will be available in 0.17.1. :)
