Ray: [Core] Allow user to specify the port for the core worker server.

Created on 6 Apr 2020 · 5 comments · Source: ray-project/ray

Describe your feature request

Being able to manually set the core worker server's port would be great for use cases in which the driver's core worker port needs to be explicitly exposed (e.g. from a container).

Use Case Details

I'm running the Ray driver in a separate network namespace from the Ray worker; specifically, I'm running the Ray driver and a node-colocated Ray worker in two different Kubernetes pods, the former in a regular k8s Deployment and the latter in a DaemonSet (so any Ray driver is guaranteed to have at least one Ray worker on the same node). In order for the Raylet in the DaemonSet Ray worker pod to make an RPC to the Ray driver's core worker server, that server's port needs to be exposed at the container/Kubernetes level. However, this isn't possible when the core worker server's port is always randomly chosen by the kernel.

Possible Implementation

  • Add a core_worker_server_port argument to the CoreWorker constructor, keeping a default of 0, and create the core worker gRPC server on that port. Defaulting to 0 would allow a more gradual porting of language workers other than the Python worker, and wouldn't require an API change for the creation of non-driver workers.
  • Modify the Python API, i.e. add a core worker server port argument to ray.init(), with a default of 0 at the Python API level. This could be exposed at the ray.init() level as driver_core_worker_server_port, so that it only applies to the driver (see the sketch after this list). For our use case, having this exposed only at the ray.worker.connect() level would suffice; we construct a ray.node.Node() and call ray.worker.connect() directly to allow for customization that isn't exposed at ray.init() (yet).
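
For concreteness, here is a minimal sketch of what the proposed Python-level API could look like from the driver's side. The driver_core_worker_server_port keyword does not exist today; it is purely illustrative of the change described above.

```python
import ray

# Hypothetical keyword from the proposal above -- not an existing ray.init()
# argument. A fixed, known port lets the driver container expose it explicitly.
ray.init(
    address="auto",
    driver_core_worker_server_port=10555,
)
```

With a default of 0, omitting the argument would preserve today's behavior of letting the kernel pick a random port.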

I know that this might overlap with some existing work on offering a client-server model for Ray drivers, but this stopgap would be very useful for our use case, in which we _do_ still have a Ray worker colocated on the same machine, just under a different network namespace (a common side effect of the Ray worker DaemonSet deployment pattern, which has a lot of advantages for us). I'd be willing to submit a PR with the requisite changes if this seems acceptable.

enhancement

All 5 comments

I guess this PR is related? https://github.com/ray-project/ray/pull/7833

Thanks @rkooo567, that's exactly what I'm looking for! Even better, in fact. Not a crazy ask after all, eh?

@edoakes Lmk if you'd like me to close this issue.

Glad it helped! Let us know if specifying an individual port (rather than a range) is a crucial feature for you!

Being able to specify the worker server port for the driver (driver_port in that PR) is what I'm really interested in, and being able to specify a range for the driver port would be an even bigger help. I plan to define a driver port range and keep track of unused driver ports at the application level, since we can have multiple concurrent Ray drivers within the same container, which would produce port conflicts if they all attempted to use the same driver port (a rough sketch of that bookkeeping follows below). It'd be awesome if Ray handled this so we wouldn't have to maintain the port range and the set of unused driver ports ourselves, but it's by no means a blocker.
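
A minimal sketch of that application-level bookkeeping, assuming the driver port ends up being user-specifiable; the class name, port range, and Ray-side wiring are all mine, not Ray APIs:

```python
import socket
import threading

class DriverPortPool:
    """Hands out driver ports from a fixed range so that concurrent Ray
    drivers in the same container don't try to use the same port."""

    def __init__(self, start=10200, end=10219):
        self._free = set(range(start, end + 1))
        self._lock = threading.Lock()

    def acquire(self):
        """Return an unused, currently bindable port from the range."""
        with self._lock:
            while self._free:
                port = self._free.pop()
                # Double-check that nothing outside our bookkeeping grabbed it.
                with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                    try:
                        s.bind(("", port))
                    except OSError:
                        continue  # in use elsewhere; try the next candidate
                return port
            raise RuntimeError("no free driver ports left in the pool")

    def release(self, port):
        """Return a port to the pool once its driver has shut down."""
        with self._lock:
            self._free.add(port)
```

Each driver would then acquire() a port, pass it to whatever driver-port argument the PR ends up exposing, and release() it on shutdown.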

@ClarkZinzow https://github.com/ray-project/ray/pull/7833 now allocates both worker and driver ports from the same pool (set by --min-worker-port and --max-worker-port) - hopefully that works for your use case.
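
For completeness, a rough sketch of how that could fit the containerized setup above; the `ray start` flags are quoted from the comment, while the port numbers and the `ray-head` address are assumptions:

```python
# Nodes are started with a bounded pool, e.g.:
#   ray start --head --min-worker-port=10002 --max-worker-port=10005
# so only ports 10002-10005 would need to be exposed from the driver container.
import ray

# The driver connects to the existing cluster; per the comment above, its
# core worker port should then be allocated from the same bounded pool.
ray.init(address="ray-head:6379")
```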
