Ray: Check failed: assigned_port != -1 on virtual Python environments on Windows

Created on 24 Jun 2020 · 11 Comments · Source: ray-project/ray

What is the problem?

Running Ray on virtual Python 3.7 or 3.8 environments on Windows sometimes triggers an error.

This does not appear to occur with Python 3.6, nor with standard or Anaconda installations.

It is unclear whether this issue is related to #9083, but it might be.

Reproduction (REQUIRED)

Try this on a machine with multiple CPUs (e.g. 8) with Python 3.7 or 3.8.

Note that passing a low value for num_cpus may avoid triggering this error.

> python -m venv myenv
> myenv\Scripts\activate
> python -m pip install ray
> python -c "import ray; ray.init()"
[(pid=7956) F0623 14:08:16.723009  7956  4168 core_worker.cc:294]  Check failed: assigned_port != -1 Failed to allocate a port for the worker. Please specify a wider port range using the '--min-worker-port' and '--max-worker-port' arguments to 'ray start'.
...
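The note above about num_cpus can be tried from Python as well. The following is a minimal sketch, assuming that capping the CPU count also reduces how many workers need a port at startup; the helper name `init_ray_with_cpu_cap` and the default cap of 2 are hypothetical and not part of Ray. Only `ray.init(num_cpus=...)` is Ray's own API.

```python
import os


def init_ray_with_cpu_cap(cap=2):
    """Start Ray with at most `cap` CPUs (hypothetical helper, not part of Ray)."""
    # Imported lazily so this module can be loaded even where Ray is not installed.
    import ray

    # os.cpu_count() may return None, so fall back to 1 in that case.
    num_cpus = max(1, min(cap, os.cpu_count() or 1))
    ray.init(num_cpus=num_cpus)
    return num_cpus
```

Calling `init_ray_with_cpu_cap(2)` in place of a bare `ray.init()` may avoid the failure on an 8-CPU machine, but this is a workaround under the stated assumption, not a fix for the underlying cause.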

Related issue number

9114

[x] I have verified my script runs in a clean environment and reproduces the issue.
[x] I have verified the issue also occurs with the latest wheels.

P2 bug core

Source

mehrdadn

👍5

All 11 comments

I get this same behavior on Linux if I attempt to schedule tasks immediately after init. It's not a problem if I wait.

2020-06-25 20:00:01,072 WARNING worker.py:1047 -- The actor or task with ID 45b95b1c8bd3a9c4ffffffff0100 is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {node:10.44.81.203: 1.000000}, {CPU: 96.000000}, {memory: 38.134766 GiB}, {object_store_memory: 13.134766 GiB}. In total there are 1 pending tasks and 0 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
2020-06-25 20:00:03,272 INFO (unknown file):0 -- gc.collect() freed 16 refs in 1.603849448962137 seconds
(pid=19312) F0625 20:00:18.973336 19312 19312 core_worker.cc:294]  Check failed: assigned_port != -1 Failed to allocate a port for the worker. Please specify a wider port range using the '--min-worker-port' and '--max-worker-port' arguments to 'ray start'.

laxatives on 25 Jun 2020
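Since the error message suggests widening the range given by '--min-worker-port' and '--max-worker-port', it may help to check how many ports in a candidate range can be bound at all. The sketch below uses only the standard library and is not Ray's own allocation logic; the function name `count_bindable_ports` and the example range are hypothetical. The result is a snapshot, so other processes can still take ports afterwards.

```python
import socket


def count_bindable_ports(start, end, host="127.0.0.1"):
    """Count ports in [start, end] that can be bound on `host` right now."""
    free = 0
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is in use or binding is not permitted; skip it.
                continue
            free += 1
    return free


if __name__ == "__main__":
    # Hypothetical range, chosen only for illustration.
    print(count_bindable_ports(50000, 50009))
```

If the count is much lower than the number of workers Ray tries to start, a wider range is worth trying; if plenty of ports are free, the cause probably lies elsewhere.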

@laxatives Is it also happening in the virtualenv?

rkooo567 on 25 Jun 2020

No, this is on a notebook server with a dedicated k8s image. My problem was resolved after waiting a few seconds before attempting to use the server.

laxatives on 25 Jun 2020

Hmm, the wait doesn't seem to consistently prevent the issue. It resolves after manually retrying to run a task/actor, but I'm not sure what's going on. I'm trying to run in a notebook, so it's possible I'm not doing a clean teardown between attempts.

laxatives on 25 Jun 2020
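The manual "wait, then retry" approach described above could be automated roughly as follows. This is a generic sketch, not something Ray provides: `retry_with_delay` and its parameters are hypothetical, and it assumes the failure reaches the driver as a catchable Python exception, which this thread does not confirm.

```python
import time


def retry_with_delay(fn, attempts=3, delay_s=2.0, exceptions=(Exception,)):
    """Call `fn` until it succeeds, sleeping `delay_s` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions as err:
            last_error = err
            # Only wait if another attempt is still coming.
            if attempt < attempts:
                time.sleep(delay_s)
    # Every attempt failed; surface the most recent error.
    raise last_error
```

A task submission could then be wrapped as `retry_with_delay(lambda: ray.get(some_task.remote()))`, where `some_task` stands for any existing remote function. As noted above, waiting did not consistently prevent the issue, so this only reduces manual effort.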

I assume there's a race condition that is triggered only in certain environments. This should be resolved when @mehrdadn fixes the Windows issue. I will bump up the priority level.

rkooo567 on 25 Jun 2020

@mehrdadn Can you prioritize the fix for this issue?

rkooo567 on 25 Jun 2020

@rkooo567 I'm not sure—I can try to diagnose it, but from the looks of it, there's a chance I might not be able to find a fix soon. I can get back to you after doing some more diagnosis, but my guess is someone more familiar with the Ray core would be able to handle this much faster than I could.

mehrdadn on 26 Jun 2020

I managed to run this locally with ray.init(local_mode=True)

lelayf on 16 Aug 2020

👍2
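For reference, the local_mode workaround might look like the sketch below, assuming a Ray version that accepts the `local_mode` argument, as the versions discussed in this thread do. The function `run_square_in_local_mode` and the `square` task are hypothetical examples. In local mode Ray executes tasks serially in the driver process, which is meant for debugging, so parallel speed-up should not be expected.

```python
def run_square_in_local_mode(x):
    """Run one trivial remote task with Ray in local mode (illustrative only)."""
    # Imported lazily so this module can be loaded even where Ray is not installed.
    import ray

    ray.init(local_mode=True)
    try:
        @ray.remote
        def square(value):
            return value * value

        # In local mode the task runs in this process rather than in a worker.
        return ray.get(square.remote(x))
    finally:
        # Shut down so a later ray.init() starts from a clean state.
        ray.shutdown()
```

Because no separate worker processes are started in this mode, the worker port allocation that fails in this issue is presumably never reached, which would explain why the workaround helps.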

@lelayf Thanks heaps - this resolved this issue for me running Py 3.8 on Win10 in a venv.

cfculhane on 28 Aug 2020

I have the same issue. With "local_mode" it runs, but I do seem to have major performance issues.
Does "local_mode" impact performance? (RLlib on one computer with 8 CPUs and 1 GPU)