Package created by kedro new command seems to import pyarrow and cause a problem.
I tried to use a Python package called "ray" inside the package created by the kedro new
command, but import ray raises the following error.
import ray
File "/usr/local/lib/python3.6/dist-packages/ray/__init__.py", line 9, in <module>
raise ImportError("Ray must be imported before pyarrow because Ray "
ImportError: Ray must be imported before pyarrow because Ray requires a specific version of pyarrow (which is packaged along with Ray).
This issue was reported at ray's GitHub repo (https://github.com/ray-project/ray/issues/5497), but hasn't been resolved yet.
This error itself is ray's issue, but I found that it does not reproduce outside of a package
created by kedro new.
Here are 3 experiments:
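1. Inside the package created by kedro new (package_created_by_kedro_new.nodes.example.py):
import ray
ImportError: Ray must be imported before pyarrow ... as shown above.
2. Outside the package created by kedro new, importing pyarrow before ray:
import kedro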
print(">>> imported kedro: ", kedro.__version__)
import pyarrow
print(">>> pyarrow NOT imported by ray: ", pyarrow.__version__)
import ray
print(">>> imported ray: ", ray.__version__)
Output:
>>> imported kedro: 0.15.1
>>> pyarrow NOT imported by ray: 0.12.0
Traceback (most recent call last):
File "/home/u/Minyus/_Python_scratch/scratch.py", line 5, in <module>
import ray
File "/usr/local/lib/python3.6/dist-packages/ray/__init__.py", line 9, in <module>
raise ImportError("Ray must be imported before pyarrow because Ray "
ImportError: Ray must be imported before pyarrow because Ray requires a specific version of pyarrow (which is packaged along with Ray).
Process finished with exit code 1
3. Outside the package created by kedro new, importing ray before pyarrow:
import kedro
print(">>> imported kedro: ", kedro.__version__)
# import pyarrow
# print(">>> pyarrow NOT imported by ray: ", pyarrow.__version__)
import ray
print(">>> imported ray: ", ray.__version__)
import pyarrow
print(">>> pyarrow imported by ray: ", pyarrow.__version__)
print(">>> No error is returned!")
Output:
>>> imported kedro: 0.15.1
>>> imported ray: 0.7.3
>>> pyarrow imported by ray: 0.14.0.RAY
>>> No error is returned!
Process finished with exit code 0
In summary, importing kedro is harmless, but the package created by the kedro new command seems
to import pyarrow, which causes the problem.
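One way to check this from inside the generated package is to look at sys.modules at the top of the node module, before import ray runs. The following is a minimal sketch; the helper name loaded_module_origin is hypothetical and is not part of Kedro or Ray.

```python
import sys


def loaded_module_origin(name):
    """Return where an already-imported module was loaded from, or None.

    Hypothetical helper for illustration; it only reads sys.modules and
    never triggers an import itself.
    """
    module = sys.modules.get(name)
    if module is None:
        return None
    # Built-in or namespace modules may have no __file__; fall back to the repr.
    return getattr(module, "__file__", None) or repr(module)


if __name__ == "__main__":
    # Placed before `import ray`, this reports whether pyarrow is already loaded.
    print("pyarrow loaded from:", loaded_module_origin("pyarrow"))
```

If this prints a path rather than None, something earlier in the process has already imported pyarrow, and ray's import check would be expected to fail.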
Is there a workaround?
Kedro version (pip show kedro or kedro -V): 0.15.1 (installed by pip install kedro)
Python version (python -V): Python 3.6.8
Thanks @Minyus, could it possibly be a duplicate of this issue?
Hi @Pet3ris,
I believe my issue is technically different, although both are related to pyarrow. While https://github.com/quantumblacklabs/kedro/issues/94 is an installation failure when using pipenv, I do not use pipenv, and I can install and import Kedro without errors. I can use the kedro new command as well.
Hey @Minyus! I hope that you're well this week! Thanks for raising this issue. We've put it on our backlog to investigate.
Hi @Minyus, when trying to reproduce your issue, I've managed to get the same exception when importing pyarrow before ray even without Kedro being installed in the environment:
conda create -y -n test_pyarrow python=3.6
conda activate test_pyarrow
pip install pyarrow ray
Then in python:
import pyarrow
import ray
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dmitrii_deriabin/anaconda3/envs/test_pyarrow/lib/python3.6/site-packages/ray/__init__.py", line 35, in <module>
raise ImportError("Ray must be imported before pyarrow because Ray "
ImportError: Ray must be imported before pyarrow because Ray requires a specific version of pyarrow (which is packaged along with Ray).
ray/__init__.py contains the following, which is not Kedro specific:
# MUST import ray._raylet before pyarrow to initialize some global variables.
# It seems the library related to memory allocation in pyarrow will destroy the
# initialization of grpc if we import pyarrow at first.
# NOTE(JoeyJiang): See https://github.com/ray-project/ray/issues/5219 for more
# details.
import ray._raylet  # noqa: E402
if "pyarrow" in sys.modules:
    raise ImportError("Ray must be imported before pyarrow because Ray "
                      "requires a specific version of pyarrow (which is "
                      "packaged along with Ray).")
From what I can understand, the problem is not kedro new itself but running any code that includes import ray with kedro run. kedro/cli.py starts a chain of imports that eventually reaches kedro.io, which imports datasets, which imports s3fs, which imports fsspec, which imports pyarrow. So pyarrow will be present in sys.modules long before kedro run reaches import ray.
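The order dependency can be illustrated without installing ray or pyarrow. The sketch below mimics the sys.modules check quoted from ray/__init__.py against a simulated module table; require_not_imported and fake_modules are hypothetical names used only for illustration, not Ray's API.

```python
import sys
import types


def require_not_imported(blocker, modules=None):
    """Raise ImportError if `blocker` is already present in the module table."""
    modules = sys.modules if modules is None else modules
    if blocker in modules:
        raise ImportError(
            "{!r} was imported first; import order matters here".format(blocker)
        )


# Simulated module table, so nothing real is imported or modified.
fake_modules = {}
require_not_imported("pyarrow", fake_modules)  # passes: pyarrow not loaded yet

# Simulate some earlier import having already loaded pyarrow.
fake_modules["pyarrow"] = types.ModuleType("pyarrow")
try:
    require_not_imported("pyarrow", fake_modules)
except ImportError as exc:
    print("guard tripped:", exc)
```

The check only looks at whether the name is in the module table, so it does not matter which package performed the earlier import.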
We're restructuring our contrib and dependencies significantly, which would have potentially solved your problem, but we're also moving all our datasets to use fsspec (which imports pyarrow) -- so if you want to use any fsspec dataset, you'll be forced to have pyarrow.
I think this is really something on ray's side to fix.
Following my above reply, we'll likely be removing pyarrow from our core requirements.txt, but Kedro will still have an implicit pyarrow import when using the CLI because we have fsspec as part of our core io library, which imports pyarrow.
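To see which import brings pyarrow in for a particular environment, one option is to compare sys.modules before and after importing a module. This is a rough diagnostic sketch under that assumption; modules_added_by_import and import_pulls_in are hypothetical helper names, and the result depends on the installed package versions.

```python
import importlib
import sys


def modules_added_by_import(module_name):
    """Import `module_name` and return the names newly added to sys.modules."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return sorted(set(sys.modules) - before)


def import_pulls_in(module_name, target):
    """Return True if importing `module_name` newly loads `target` or a submodule."""
    added = modules_added_by_import(module_name)
    return any(name == target or name.startswith(target + ".") for name in added)
```

For example, calling import_pulls_in("kedro.io", "pyarrow") in a fresh interpreter would indicate whether that import alone loads pyarrow. It has to be a fresh interpreter, because modules that are already loaded do not show up in the difference.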
This should be brought up on Ray's side as an issue to fix; there isn't much we can do. If that's okay with you, I'll go ahead and close this issue. Thank you for raising it! :)