Ray: Import error for pyarrow when Ray built from source

Created on 25 Jul 2018 · 16 comments · Source: ray-project/ray

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu
  • Ray installed from (source or binary): source
  • Ray version: source
  • Python version: 3.5
  • Exact command to reproduce: from ray.pyarrow._parquet import (ParquetReader, FileMetaData)

Describe the problem


ImportError when importing the Parquet module from Ray's bundled copy of pyarrow. See the full error below.

Source code / logs


ImportError: /data/devin/ray/python/ray/pyarrow_files/pyarrow/libparquet.so.1: undefined symbol: _ZNK5arrow6Status8ToStringB5cxx11Ev
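To see which function the loader could not find, the mangled name can be demangled. This is a small sketch that assumes binutils' c++filt (or LLVM's compatible c++filt) is on PATH; the helper name demangle is hypothetical and not part of Ray or pyarrow.

```python
import shutil
import subprocess


def demangle(symbol):
    """Demangle a C++ symbol with c++filt, if it is installed.

    Returns None when no c++filt executable is found on PATH.
    """
    cxxfilt = shutil.which("c++filt")
    if cxxfilt is None:
        return None
    # c++filt accepts symbols as command-line arguments and prints one per line.
    result = subprocess.run(
        [cxxfilt, symbol],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


missing = "_ZNK5arrow6Status8ToStringB5cxx11Ev"
print(demangle(missing))
```

With binutils this should print arrow::Status::ToString[abi:cxx11]() const. The [abi:cxx11] tag is the clue that the library asking for this symbol was built against the new libstdc++ string ABI.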

All 16 comments

This is because of https://github.com/ray-project/ray/commit/4c82ac72df8f3dc667bbe5d4f724a6fa9bc24c6c. The _GLIBCXX_USE_CXX11_ABI=0 flag is not applied to the parquet build currently.
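A rough illustration of the mismatch: if libarrow is compiled with _GLIBCXX_USE_CXX11_ABI=0, it exports the old-ABI name for Status::ToString, while a libparquet compiled without the flag asks for the [abi:cxx11]-tagged name, so the dynamic loader cannot match them. The helpers below are a hypothetical heuristic, not a real ABI checker: they only look for the GCC abi-tag marker (B5cxx11) and the std::__cxx11 namespace marker (St7__cxx11) in Itanium-mangled names.

```python
# Markers that appear in Itanium-mangled names built with the new
# (C++11) libstdc++ ABI. This is a heuristic, not a full demangler.
CXX11_ABI_TAG = "B5cxx11"        # the [abi:cxx11] tag on a name
CXX11_NAMESPACE = "St7__cxx11"   # std::__cxx11::... in a type


def uses_cxx11_abi(symbol):
    """Return True if the mangled name looks like it needs the new ABI."""
    return CXX11_ABI_TAG in symbol or CXX11_NAMESPACE in symbol


def old_abi_name(symbol):
    """Guess the name the same function gets with _GLIBCXX_USE_CXX11_ABI=0.

    Only valid for simple cases like Status::ToString, where the tag is
    attached directly to the function name and appears nowhere else.
    """
    return symbol.replace(CXX11_ABI_TAG, "")


wanted = "_ZNK5arrow6Status8ToStringB5cxx11Ev"  # what libparquet asks for
print(uses_cxx11_abi(wanted), old_abi_name(wanted))
```

In practice, comparing the output of nm -D on libarrow.so and libparquet.so for ToString would show directly which variant each library exports or expects.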

This is starting to affect more and more dependencies, which is worrisome. Maybe we need to figure out a plan to keep the new ABI here.

This was never resolved, right? It came up recently when trying to compile master.

@devin-petersohn did you have a workaround or some way of fixing this? Is this easily reproducible? Somehow I've never seen it.

@pcmoritz now that parquet-cpp is part of Arrow, would updating Arrow potentially make this problem go away?

cc @vengroff

I have the same issue when I try to compile the current master branch.
@robertnishihara did you find a workaround for this issue?
@devin-petersohn does the problem remain for you in Ray release 0.5.3?

@robertnishihara @pethor I am getting a different error on 0.5.3:

In [1]: import ray

In [2]: ray.__version__
Out[2]: '0.5.3'

In [3]: from ray.pyarrow._parquet import ParquetReader
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-481ff70270a9> in <module>()
----> 1 from ray.pyarrow._parquet import ParquetReader

ImportError: No module named 'ray.pyarrow'

In [4]: import ray.pyarrow
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-33b14d007198> in <module>()
----> 1 import ray.pyarrow

ImportError: No module named 'ray.pyarrow'
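The two failures reported in this thread are different kinds of ImportError: an undefined symbol means the shared library was found but could not be linked, while "No module named" means Python never found the package. A small hypothetical helper (classify_import_error and try_import_parquet are illustrative names, not Ray APIs) that runs the import from the original report and sorts the failure:

```python
def classify_import_error(message):
    """Sort ImportError messages like the ones in this thread into rough buckets."""
    if "undefined symbol" in message:
        return "undefined-symbol"  # library found, but dynamic linking failed
    if "No module named" in message:
        return "missing-module"    # Python could not locate the module at all
    return "other"


def try_import_parquet():
    """Attempt the import from the report and classify any failure."""
    try:
        from ray.pyarrow._parquet import ParquetReader, FileMetaData  # noqa: F401
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        return classify_import_error(str(exc))
    return "ok"


print(try_import_parquet())
```

On 0.5.3 as shown above, this would report missing-module, which points at packaging rather than the ABI problem in the original report.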

As a workaround, we just use the user's installed Arrow library. This leads to some issues with deprecation, etc., but we are able to manage it.

@devin-petersohn I suspect that doesn't actually use the user's installed pyarrow, because import ray modifies the Python path to put Ray's copy of pyarrow first; see

https://github.com/ray-project/ray/blob/8aa736572bcdee4b8cb7700860e25e2fc0fdeca9/python/ray/__init__.py#L8-L17

For example, running pip uninstall pyarrow shouldn't change anything.
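The shadowing effect described above can be reproduced without Ray. This is a hypothetical, self-contained sketch, not Ray's actual code: it writes two copies of a throwaway module (named shadow_demo_mod here) into temporary directories, appends one to sys.path as the "user" copy, and inserts the other at position 0 as the "bundled" copy, mirroring how the linked __init__.py is described as prepending Ray's pyarrow.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_with_prepended_path(module_name="shadow_demo_mod"):
    """Show that a directory inserted at sys.path[0] shadows one appended later."""
    with tempfile.TemporaryDirectory() as bundled, \
            tempfile.TemporaryDirectory() as user:
        # Two copies of the same module name with different contents.
        Path(bundled, module_name + ".py").write_text('ORIGIN = "bundled"\n')
        Path(user, module_name + ".py").write_text('ORIGIN = "user"\n')

        saved_path = list(sys.path)
        try:
            sys.path.append(user)        # stands in for the user's install
            sys.path.insert(0, bundled)  # stands in for the bundled copy
            importlib.invalidate_caches()
            sys.modules.pop(module_name, None)
            module = importlib.import_module(module_name)
            return module.ORIGIN, module.__file__
        finally:
            # Restore the interpreter state so the demo leaves no trace.
            sys.path[:] = saved_path
            sys.modules.pop(module_name, None)


print(resolve_with_prepended_path()[0])
```

To check which pyarrow a real session picked up, printing pyarrow.__file__ after import ray shows the path it was actually loaded from.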

On release 0.5.3, I do not see the issue; it seems to have been resolved. I am closing this, but will reopen in case others experience it on 0.5.3.

@vengroff do you still see this issue?

@robertnishihara I still have a very similar issue, I will open a new issue

I am seeing this error on the AWS Ubuntu Deep Learning AMI with modin-0.5.4-py3 on Anaconda Python 3.6.7. I can't figure out a way around this. Anyone?

Hi @rjurney, is the error identical to the message above? Would you be able to post the error message?

I will try to reproduce the error on the AMI you mentioned, but if you have access to the error message, that would also be helpful.

The worrisome part is that this happened after installing via pip, which had not been a problem in the past.

I spun up the AMI you mentioned and everything seems to be working fine. If you have access to the error message @rjurney, or any more information, that would be very helpful.

@devin-petersohn It was using the tensorflow_p36 virtual environment.

Thanks @rjurney, but I still cannot replicate:

(tensorflow_p36) ubuntu@ip-172-31-30-78:~$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import modin.pandas as pd
>>> pd.read_parquet("test.parquet")
   col1  col2
0     1     4
1     2     5
2     3     6
3     4     7

If it were an ImportError, it would not be able to read the file at all. Did you do any other processing, or run import pyarrow, before read_parquet?

Seeing this with koalas on the Docker Hub image jupyter/all-spark-notebook:latest:

ImportError: /opt/conda/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK5arrow6Status8ToStringB5cxx11Ev

I'm seeing this on jupyter/all-spark-notebook too

Auto-closing stale issue.
