ray.init(address='auto') crashes *with* pytest

Created on 21 Oct 2020 · 5 Comments · Source: ray-project/ray

What is the problem?

Ray is successfully started on the head with

$ ~/anaconda3/envs/38env/bin/ray start --head --port=6379

and on another node with

$ ~/anaconda3/envs/38env/bin/ray start --address=XYZ --redis-password=XYZ

When ray.init(address='auto') is run on that second node, if https://github.com/ray-project/ray/issues/11530 has been resolved, the terminal is immediately hit with an avalanche of

(pid=raylet) Traceback (most recent call last):
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 319, in <module>
(pid=raylet)     raise e
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 309, in <module>
(pid=raylet)     loop.run_until_complete(agent.run())
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
(pid=raylet)     return future.result()
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 122, in run
(pid=raylet)     modules = self._load_modules()
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 76, in _load_modules
(pid=raylet)     agent_cls_list = dashboard_utils.get_all_modules(
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/utils.py", line 205, in get_all_modules
(pid=raylet)     importlib.import_module(name)
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/importlib/__init__.py", line 127, in import_module
(pid=raylet)     return _bootstrap._gcd_import(name[level:], package, level)
(pid=raylet)   File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
(pid=raylet)   File "<frozen importlib._bootstrap>", line 991, in _find_and_load
(pid=raylet)   File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
(pid=raylet)   File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
(pid=raylet)   File "<frozen importlib._bootstrap_external>", line 783, in exec_module
(pid=raylet)   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/modules/log/test_log.py", line 12, in <module>
(pid=raylet)     from ray.new_dashboard.tests.conftest import *  # noqa
(pid=raylet) ModuleNotFoundError: No module named 'ray.new_dashboard.tests'
(pid=raylet) Traceback (most recent call last):
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 319, in <module>
(pid=raylet)     raise e
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 309, in <module>
(pid=raylet)     loop.run_until_complete(agent.run())
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
(pid=raylet)     return future.result()
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 122, in run
(pid=raylet)     modules = self._load_modules()
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 76, in _load_modules
(pid=raylet)     agent_cls_list = dashboard_utils.get_all_modules(
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/utils.py", line 205, in get_all_modules
(pid=raylet)     importlib.import_module(name)
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/importlib/__init__.py", line 127, in import_module
(pid=raylet)     return _bootstrap._gcd_import(name[level:], package, level)
(pid=raylet)   File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
(pid=raylet)   File "<frozen importlib._bootstrap>", line 991, in _find_and_load
(pid=raylet)   File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
(pid=raylet)   File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
(pid=raylet)   File "<frozen importlib._bootstrap_external>", line 783, in exec_module
(pid=raylet)   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
(pid=raylet)   File "/home/david/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/modules/log/test_log.py", line 12, in <module>
(pid=raylet)     from ray.new_dashboard.tests.conftest import *  # noqa
(pid=raylet) ModuleNotFoundError: No module named 'ray.new_dashboard.tests'

If that is fixed (or "fixed", see below), the same thing happens with ModuleNotFoundError: No module named 'ray.tests'.

Ray version and other system information (Python version, TensorFlow version, OS):
Ray is installed with:

$ ~/anaconda3/bin/conda create --name 38env python=3.8
$ ~/anaconda3/envs/38env/bin/pip install --upgrade https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-1.1.0.dev0-cp38-cp38-manylinux1_x86_64.whl

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

import ray
ray.init(address='auto')

If we cannot run your script, we cannot fix your issue.

  • [X] I have verified my script runs in a clean environment and reproduces the issue.
  • [X] I have verified the issue also occurs with the latest wheels.

Fixing this is not as simple as fixing https://github.com/ray-project/ray/issues/11530. Leaving aside whatever's going on with new_dashboard, ModuleNotFoundError: No module named 'ray.tests' happens because ray.tests is in fact left out of the installation, as you can see by checking site-packages/ray.
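
The same check can be done from Python instead of browsing site-packages. This is a minimal sketch using only the standard library; the helper name missing_modules is illustrative and not part of Ray.

```python
import importlib.util


def missing_modules(dotted_names):
    """Return the names from dotted_names that cannot be located for import."""
    missing = []
    for name in dotted_names:
        try:
            # find_spec imports the parent package, then searches it for the child.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is itself missing or malformed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The two subpackages named in the tracebacks above.
    print(missing_modules(["ray.tests", "ray.new_dashboard.tests"]))
```

Run it with the same interpreter that starts Ray (here, the one in the 38env environment); any name it prints is absent from that installation.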

As a manual hack, I restored the missing subpackages by directly copying them from the git repository:

~/anaconda/envs/38env/lib/python3.8/site-packages/ray$ cp -r /tmp/ray/python/ray/tests .
~/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard$ cp -r /tmp/ray/dashboard/tests/ .

With those manually copied in, Ray was able to start and run just fine.

For a real fix, the obvious solution is to add an __init__.py to ray.tests. That will (I think?) cause it to be included. (Though I haven't been able to understand your wheel-building scripts all that well.) It looks like that would also work for 'ray.new_dashboard.tests'.
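
To illustrate why an __init__.py might matter: I have not verified how Ray's wheel scripts collect packages, but if they follow the usual setuptools convention, where a directory is picked up as a package only when it contains an __init__.py, the effect would look roughly like this. The sketch is stdlib-only and merely imitates that rule; discover_packages and demo are hypothetical names, not setuptools or Ray code.

```python
import os
import tempfile


def discover_packages(root):
    """List dotted package names under root; a directory counts only if it has __init__.py."""
    found = []

    def walk(directory, prefix):
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            if os.path.isfile(os.path.join(path, "__init__.py")):
                found.append(prefix + entry)
                # Only descend into directories that are themselves packages.
                walk(path, prefix + entry + ".")

    walk(root, "")
    return found


def demo():
    """Build a throwaway tree; return the packages found before and after adding __init__.py."""
    with tempfile.TemporaryDirectory() as root:
        tests_dir = os.path.join(root, "ray", "tests")
        os.makedirs(tests_dir)
        open(os.path.join(root, "ray", "__init__.py"), "w").close()
        open(os.path.join(tests_dir, "conftest.py"), "w").close()
        before = discover_packages(root)  # tests/ has no __init__.py yet
        open(os.path.join(tests_dir, "__init__.py"), "w").close()
        after = discover_packages(root)
    return before, after


if __name__ == "__main__":
    print(demo())
```

Under that assumption, ray.tests only shows up in the second listing, which matches the directory being absent from the installed wheel.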

bug triage

All 5 comments

Hmm, thanks a bunch for opening these issues. Adding __init__.py to ray.new_dashboard.tests to capture conftest sounds good.

I'm not so sure about adding __init__.py to ray.tests, can you post the stacktrace for that module error?

Also, both this and #11530 seem to be dependency problems introduced by the new_dashboard code.

cc @mfitton

> I'm not so sure about adding __init__.py to ray.tests, can you post the stacktrace for that module error?

E.g.

WARNING worker.py:1091 -- The agent on node <name of head node or other node as appropriate> failed with the following error:
Traceback (most recent call last):
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 309, in <module>
    loop.run_until_complete(agent.run())
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 122, in run
    modules = self._load_modules()
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 76, in _load_modules
    agent_cls_list = dashboard_utils.get_all_modules(
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/utils.py", line 205, in get_all_modules
    importlib.import_module(name)
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/modules/log/test_log.py", line 12, in <module>
    from ray.new_dashboard.tests.conftest import *  # noqa
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/tests/conftest.py", line 3, in <module>
    from ray.tests.conftest import *  # noqa
ModuleNotFoundError: No module named 'ray.tests'

Like the other two, this error appears when ray.init(address='auto') is run, alternating for each node including the head node, and repeats until ray stop.
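
The traceback shows the chain: the agent imports every module it finds under new_dashboard/modules, that includes test_log.py, and test_log.py pulls in the conftest files. Purely as an illustration of the mechanism (this is not Ray's code, and not how the problem was eventually addressed), a loader that skips pytest-style modules and collects import failures instead of crashing could look like the sketch below. The names is_test_module and import_all are hypothetical.

```python
import importlib


def is_test_module(dotted_name):
    """Heuristic: pytest-style files and anything inside a 'tests' package count as test code."""
    parts = dotted_name.split(".")
    leaf = parts[-1]
    return leaf.startswith("test_") or leaf == "conftest" or "tests" in parts


def import_all(dotted_names):
    """Import each non-test module; return (imported, skipped, failed)."""
    imported, skipped, failed = [], [], {}
    for name in dotted_names:
        if is_test_module(name):
            skipped.append(name)
            continue
        try:
            importlib.import_module(name)
            imported.append(name)
        except ImportError as exc:
            # Record the failure rather than letting one bad module stop the loader.
            failed[name] = str(exc)
    return imported, skipped, failed
```

With a filter like this, ray.new_dashboard.modules.log.test_log would land in the skipped list and never trigger the import of ray.tests.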

Hey, just some additional context: the PR that caused this error has been reverted, so it shouldn't be happening. However, there's currently an issue with our Linux wheel builds such that no builds are completing, so the latest released wheel is still a broken one.

Ah, I see that https://github.com/ray-project/ray/pull/11510/files already has dashboard/tests/__init__.py.

Well, python/ray/tests/__init__.py is in https://github.com/ray-project/ray/pull/11536 if you want it in view of that stacktrace. (Not sure what you were looking for in that stacktrace, so I'm not sure what conclusion you'll draw from it.)

In the meantime,

~/anaconda/envs/38env/lib/python3.8/site-packages/ray$ cp -r /tmp/gitclone/ray/python/ray/tests .
~/anaconda/envs/38env/lib/python3.8/site-packages/ray/new_dashboard$ cp -r /tmp/gitclone/ray/dashboard/tests/ .

~seems to work as a workaround to get Ray working with the current wheel.~ Ah, never mind, it worked when running two Ray instances on a single machine but not when actually distributing across two machines. It gets past the ModuleNotFoundErrors it was getting before, but still fails for unrelated reasons, which I'm guessing you already know about.

>>> ray.init(address='auto')
WARNING worker.py:1091 -- The agent on node <name of head node or other node as appropriate> failed with the following error:
Traceback (most recent call last):
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 309, in <module>
    loop.run_until_complete(agent.run())
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 167, in run
    await raylet_stub.RegisterAgent(
  File "/ascldap/users/dahanna/anaconda3/envs/38env/lib/python3.8/site-packages/grpc/aio/_call.py", line 285, in __await__
    raise _create_rpc_error(self._cython_call._initial_metadata,
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1603310089.237882562","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4133,"referenced_errors":[{"created":"@1603310089.237868910","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"
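
StatusCode.UNAVAILABLE with "failed to connect to all addresses" means gRPC could not open a connection to the address it was given. When this only shows up across two machines, one thing worth ruling out is plain TCP reachability between them. A minimal stdlib sketch follows; can_connect is a hypothetical helper, and the log above does not show which port the agent was trying to reach, so the port you pass in is an assumption you have to fill in yourself (6379 is only the port given to ray start --head).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, running can_connect("XYZ", 6379) on the second node, with XYZ replaced by the head node's address, tells you whether the head's Redis port is reachable from there at all. A True result does not prove that the agent's own gRPC port is open.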

Should be fixed on the latest nightly (by revert).
