Ray: Ray hangs when machine is disconnected from network

Created on 22 Mar 2020 · 4 Comments · Source: ray-project/ray

What is the problem?

When I disconnect my machine from the internet (e.g. by unplugging the ethernet cable) in the middle of a Tune training, the trials hang forever. This seems unexpected when running things locally. If it's by design, then this would be a feature request and not a bug report 🙂

Ray version and other system information (Python version, TensorFlow version, OS): 0.8.2

Reproduction (REQUIRED)

Run the script below and disconnect your machine from the network after the first result.

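import time

from ray import tune


class MyTrainableClass(tune.Trainable):
    # _setup/_train are the Trainable hooks used with Ray 0.8.2.
    def _setup(self, config):
        self.timestep = 0

    def _train(self):
        self.timestep += 1
        result = {"episode_reward_mean": self.timestep}
        # Stop the trial after 100 steps.
        if self.timestep == 100:
            result["done"] = True
        # Slow each step down so there is time to unplug the network
        # after the first result is reported.
        time.sleep(5)
        return result


tune.run(
    MyTrainableClass,
    name="network-test",
    num_samples=1,
    config={"a": 1})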
  • [x] I have verified my script runs in a clean environment and reproduces the issue.
  • [x] I have verified the issue also occurs with the latest wheels.
bug

All 4 comments

Oh wow ... I think I know what the issue is (we look for the IP address at each step).
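For readers unfamiliar with what "looking for the IP address" can involve, here is a minimal sketch using only the Python standard library. It is not taken from Ray's source: the function name `get_local_ip`, the probe address `8.8.8.8:80`, and the loopback fallback are all assumptions made for illustration. It shows how a per-step lookup of this kind depends on having a network route, which is why a disconnect could plausibly affect it.

```python
import socket


def get_local_ip(probe_host="8.8.8.8", probe_port=80, fallback="127.0.0.1"):
    """Return the IP of the interface that would be used to reach probe_host.

    Hypothetical helper for illustration; not Ray's actual implementation.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only asks the OS
        # to pick a route, which determines the local address.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (e.g. the cable is unplugged): use loopback.
        return fallback
    finally:
        sock.close()


if __name__ == "__main__":
    print(get_local_ip())
```

Running this once with the cable plugged in and once without should show whether the reported address changes on your machine.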

Actually, doesn't seem to be the case. This seems to be a Ray issue.
[Screenshot 2020-03-22 17 34 44]

After internet shutoff:

[Screenshot 2020-03-22 17 34 56]

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

I believe this is still an issue and should not be closed.
