Currently Ray keeps track of the GPUs (and other resources) that _it_ is using. However, to my understanding, and based on the experiments I have run, it doesn't check whether _other_ users are using the GPUs before running on them. As a consequence, it is possible to start a GPU application and have it terminate immediately with a "CUDA out of memory" error when the GPU is already in heavy use.
One workaround would be to write the Ray remote function defensively: monitor (e.g. via GPUtil) the usage of the GPUs it has been assigned by Ray and only allocate memory once enough is free. However, it could potentially take a long time for the other applications using those GPUs to finish, and meanwhile there might be other nodes on the cluster with idle GPUs that could run the job right away.
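For illustration, a minimal sketch of that defensive pattern might look like the following (assuming the GPUtil package is installed; the function name, memory threshold, and polling interval are arbitrary placeholders):

import time

import GPUtil
import ray

@ray.remote(num_gpus=1)
def defensive_train(min_free_mb=8000, poll_seconds=30):
    # The GPU index that Ray assigned to this task.
    gpu_id = int(ray.get_gpu_ids()[0])
    # Wait until other users' processes leave enough free memory on that GPU.
    while True:
        gpu = next(g for g in GPUtil.getGPUs() if g.id == gpu_id)
        if gpu.memoryFree >= min_free_mb:
            break
        time.sleep(poll_seconds)
    # ... now it is (probably) safe to allocate tensors and build the model ...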
Is there a way to have ray check for current GPU usage before assigning GPUs to remote functions or actors? If not, this would be a very useful feature.
I do agree this is incredibly useful - are you using Ray primarily via RLlib or Tune?
How beneficial would a utility be that determines the available (unused) GPUs at the beginning of execution and then, for the rest of the execution, only draws from that subset of GPUs?
This is the easiest bandaid I can think of. Otherwise, if you need this to be continuously checked during execution (i.e., other users have varying workloads), then we should revisit the way we handle GPU scheduling to support this.
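As a rough sketch, such a start-of-execution utility can be approximated today outside of Ray (thresholds are illustrative; it relies on GPUtil and on the fact that Ray respects CUDA_VISIBLE_DEVICES when it is set before ray.init):

import os

import GPUtil
import ray

# Pick GPUs that look idle right now (low load and low memory use).
free_gpus = GPUtil.getAvailable(limit=8, maxLoad=0.1, maxMemory=0.1)
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in free_gpus)

# Ray will only see, and schedule onto, the GPUs selected above.
ray.init(num_gpus=len(free_gpus))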
I'm primarily interested in Tune.
I think it would definitely be an improvement to determine GPU usage at the beginning of execution. However, I work in an environment where GPU usage varies a lot, and so it is quite possible that at the beginning of execution many more GPUs would be in use compared with some time shortly after. In this case it would be ideal if Ray could detect the extra free GPUs and make use of them.
Personally, I have a workaround (outside of Ray) that is sufficient for me at the moment. However, I thought I would suggest this since I think Ray/Tune is a great framework, and I can see something like this being of interest to other users like myself.
I'm primarily interested in Tune, too.
I'm particularly interested in the scenario where multiple users cooperate to use Tune.
A flexible framework should be able to deal with the following scenario without pain:
There are 8 GPUs in a machine. At first, Alice and Bob share the machine. Alice mainly uses [0, 1, 2, 3] while Bob mainly uses [4, 5, 6, 7].
While Alice is on vacation, Bob uses all of the GPUs for extreme hyper-parameter search which may take weeks.
Alice returns because an excellent idea strikes her and she thinks it may win her the best paper prize. She asks for 6 GPUs to compensate for Bob's greediness.
Two days later, experiments show Alice's idea is worthless. Alice plans another trip out of disappointment. Thus Bob greedily takes all GPUs again.
By "without pain", I mean Bob doesn't need to kill his process throughout the whole story.
Here is a solution I came up with that Ray could support with a few modifications:
Let ray.init accept a list of GPU IDs (in addition to num_gpus). For example:
ray.init(gpu_ids=[0, 1, 2, 3])
ray.init(gpu_ids=[0, 1, 2, 3], include_webui=True)
In the web UI, we can then add GPUs or mark some GPUs as unavailable.
This way, Bob initializes Ray with all GPUs. When Alice returns, Bob marks 6 GPUs as unavailable. After the work running on those GPUs exits, Alice can pursue her best-paper plan. When Alice leaves again, Bob adds those GPUs back to Ray and carries on with his hyper-parameter search.
To achieve this, Ray only needs to maintain the IDs of the available GPUs, which I think can be implemented with a few modifications.
The modifications can be made backward-compatible, too. Just add a parameter to the init function:
def init(..., gpu_ids=None):
    # Default: preserve today's behaviour by taking the first num_gpus available GPUs.
    gpu_ids = gpu_ids or available_gpus[:num_gpus]
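To make the proposal concrete, usage on Bob's side might look like this (purely illustrative: the gpu_ids parameter is the proposal above, not an existing Ray API):

import ray

# Bob starts Ray with all eight GPUs and the web UI enabled. When Alice
# returns, he would mark GPUs 0-5 as unavailable from the web UI instead of
# restarting Ray.
ray.init(gpu_ids=[0, 1, 2, 3, 4, 5, 6, 7], include_webui=True)

# Backward compatibility: a caller that only passes num_gpus would get
# gpu_ids = available_gpus[:num_gpus], i.e. exactly today's behaviour.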
@richardliaw Do you have any suggestions on how to avoid CUDA out-of-memory errors when using ray.tune?
Smaller batch size maybe? Are you using fractional gpus?
@richardliaw No, I use multiple GPUs for a trial:
@ray.remote(num_gpus=8, max_calls=1)
def train_model():
    ...
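For reference, a rough sketch of the fractional-GPU suggestion above (the 0.5 value is a placeholder; in Tune the analogous knob is the gpu entry of resources_per_trial):

import ray

# With num_gpus=0.5, two such tasks can share one physical GPU, so each task
# must keep its own memory footprint (e.g. batch size) within its share.
@ray.remote(num_gpus=0.5, max_calls=1)
def train_model():
    gpu_id = ray.get_gpu_ids()[0]  # the (possibly shared) GPU assigned to this task
    # ... build the model with a batch size that fits within half the GPU memory ...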
Hi, I'm a bot from the Ray team :)
To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.