Ray: Multiple Ray actors on a single machine fight for CPUs with PyTorch

Created on 22 Dec 2018 · 2 comments · Source: ray-project/ray

System information

  • OS Platform and Distribution: Ubuntu 16.04; Ubuntu 18.04; Amazon Linux
  • Ray installed from (source or binary): pip and wheel from #3520
  • Ray version: 0.4.0; 0.6.0 from #3520
  • Python version: 3.6.6 for ray 0.4.0 and 3.7 for ray 0.6.0 from #3520

Describe the problem

I want to run a Ray program with many 2-CPU actors on a single m5.24xlarge instance on AWS to avoid network communication delays, but Ray gets horribly slow when PyTorch calls are executed concurrently by multiple actors on the same machine. I tested this on my local machine and on two remote Ubuntu machines, and it seems to be true for all of them.

In the System Monitor, I can see all CPUs shooting up to close to 100% even when the actor is limited to 1 CPU (that is, when there is just one actor, of course!).
I am not sure whether this is a Ray or a PyTorch problem, but I hope someone can help.

Note: on many separate AWS m5.large instances (each has 2 CPUs, i.e. one actor on each machine), my program scales very well, so that is not the cause.

Below is toy code that, when run on a single multi-CPU machine, runs slower if the jobs are split among several actors than if a single actor does all of them:

import time

import ray
import torch


class NeuralNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l = torch.nn.Linear(1000, 2048)
        self.l2 = torch.nn.Linear(2048, 2)

    def forward(self, x):
        return self.l2(self.l(x))


@ray.remote(num_cpus=1)
class TestActor:
    def __init__(self):
        self.net = NeuralNet()
        self.crit = torch.nn.MSELoss()

    def do_torch_stuff(self, batch_size):
        # A single forward pass; the output is intentionally discarded.
        self.net(torch.rand((batch_size, 1000)))


def _parallel_on_5_actors():
    t0 = time.time()

    ray.init()
    acs = [TestActor.remote() for _ in range(5)]
    for _ in range(1000):
        ray.get([ac.do_torch_stuff.remote(10) for ac in acs])

    print("With 5 actors: ", time.time() - t0)


def _all_on_1_actor():
    t0 = time.time()

    ray.init()
    ac = TestActor.remote()
    for _ in range(5000):
        ray.get(ac.do_torch_stuff.remote(10))

    print("With 1 actor: ", time.time() - t0)


if __name__ == '__main__':
    # Run only one of the two functions per process, since each calls ray.init().
    _all_on_1_actor()  # ~10 sec on my machine
    # _parallel_on_5_actors()  # ~18 sec on my machine. Shouldn't this be ~2 sec?!

All 2 comments

PyTorch already parallelizes internally using multiple threads. When you have multiple processes, this can cause excessive thrashing from context switching.

It looks like PyTorch doesn't let you set the number of threads explicitly (https://github.com/pytorch/pytorch/issues/975).
However, setting OMP_NUM_THREADS=1 prior to starting Ray should work.
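
For reference, a minimal sketch of that workaround applied to a trimmed-down variant of the toy code above (the actor class here is illustrative, not the original one). It assumes the script itself starts the local Ray cluster, so the worker processes inherit the environment variable, and it sets the variable before torch is imported so the OpenMP/MKL thread pool is sized accordingly. Running the original script as OMP_NUM_THREADS=1 python script.py from the shell should have the same effect.

import os

# Restrict PyTorch's intra-op (OpenMP/MKL) threading to one thread per process.
# Set this before importing torch and before ray.init(), so the Ray worker
# processes started from this driver inherit it.
os.environ["OMP_NUM_THREADS"] = "1"

import ray
import torch


@ray.remote(num_cpus=1)
class SingleThreadedActor:
    def __init__(self):
        self.net = torch.nn.Linear(1000, 2)

    def do_torch_stuff(self, batch_size):
        # One forward pass; result is discarded, this is only a load generator.
        self.net(torch.rand((batch_size, 1000)))


if __name__ == "__main__":
    ray.init()
    actors = [SingleThreadedActor.remote() for _ in range(5)]
    ray.get([a.do_torch_stuff.remote(10) for a in actors])

With each process pinned to one OpenMP thread, the five actor processes should no longer oversubscribe the machine's cores.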

Thank you for your quick response! That does the job!
