Ray: Multiple Ray actors on a single machine fight for CPUs with PyTorch

Created on 22 Dec 2018 · 2 comments · Source: ray-project/ray

System information

  • OS Platform and Distribution: Ubuntu 16.04; Ubuntu 18.04; Amazon Linux
  • Ray installed from (source or binary): pip and wheel from #3520
  • Ray version: 0.4.0; 0.6.0 from #3520
  • Python version: 3.6.6 for ray 0.4.0 and 3.7 for ray 0.6.0 from #3520

Describe the problem

I want to run a Ray program with many 2-CPU actors on a single m5.24xlarge instance on AWS to avoid network communication delays, but Ray gets horribly slow when PyTorch calls are executed concurrently by multiple actors on the same machine. I tested this on my local machine and on two remote Ubuntu machines, and it seems to be true for all of them.

In the System Monitor, I can see all CPUs shooting up to close to 100% even when the actor is limited to 1 CPU (that is, when there is just one actor, of course!).
I am not sure whether this is a Ray or a PyTorch problem, but I hope someone can help.

Note: on many separate AWS m5.large instances (each has 2 CPUs, i.e. one actor on each machine), my program scales very well, so that is not the cause.

Below is toy code that, when run on a single multi-CPU machine, runs slower if the jobs are split among several actors than if a single actor does all of them:

import time

import ray
import torch


class NeuralNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l = torch.nn.Linear(1000, 2048)
        self.l2 = torch.nn.Linear(2048, 2)

    def forward(self, x):
        return self.l2(self.l(x))


@ray.remote(num_cpus=1)
class TestActor:
    def __init__(self):
        self.net = NeuralNet()
        self.crit = torch.nn.MSELoss()

    def do_torch_stuff(self, batch_size):
        # A single forward pass; the output is intentionally discarded.
        self.net(torch.rand((batch_size, 1000)))


def _parallel_on_5_actors():
    t0 = time.time()

    ray.init()
    acs = [TestActor.remote() for _ in range(5)]
    for _ in range(1000):
        ray.get([ac.do_torch_stuff.remote(10) for ac in acs])

    print("With 5 actors: ", time.time() - t0)


def _all_on_1_actor():
    t0 = time.time()

    ray.init()
    ac = TestActor.remote()
    for _ in range(5000):
        ray.get(ac.do_torch_stuff.remote(10))

    print("With 1 actor: ", time.time() - t0)


if __name__ == '__main__':
    # Run only one of the two functions per process, since each calls ray.init().
    _all_on_1_actor()  # ~10 sec on my machine
    # _parallel_on_5_actors()  # ~18 sec on my machine. Shouldn't this be ~2 sec?!

All 2 comments

PyTorch already parallelizes internally using multiple threads. When you have multiple processes, this can cause excessive thrashing from context switching.

It looks like PyTorch doesn't let you set the number of threads explicitly (https://github.com/pytorch/pytorch/issues/975).
However, setting OMP_NUM_THREADS=1 prior to starting Ray should work.
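
For reference, a minimal sketch of that workaround applied to a trimmed-down variant of the toy code above (the actor class here is illustrative, not the original one). It assumes the script itself starts the local Ray cluster, so the worker processes inherit the environment variable, and it sets the variable before torch is imported so the OpenMP/MKL thread pool is sized accordingly. Running the original script as OMP_NUM_THREADS=1 python script.py from the shell should have the same effect.

import os

# Restrict PyTorch's intra-op (OpenMP/MKL) threading to one thread per process.
# Set this before importing torch and before ray.init(), so the Ray worker
# processes started from this driver inherit it.
os.environ["OMP_NUM_THREADS"] = "1"

import ray
import torch


@ray.remote(num_cpus=1)
class SingleThreadedActor:
    def __init__(self):
        self.net = torch.nn.Linear(1000, 2)

    def do_torch_stuff(self, batch_size):
        # One forward pass; result is discarded, this is only a load generator.
        self.net(torch.rand((batch_size, 1000)))


if __name__ == "__main__":
    ray.init()
    actors = [SingleThreadedActor.remote() for _ in range(5)]
    ray.get([a.do_torch_stuff.remote(10) for a in actors])

With each process pinned to one OpenMP thread, the five actor processes should no longer oversubscribe the machine's cores.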

Thank you for your quick response! That does the job!
