Even when num_gpus=0, Ray will find a GPU and place the model (e.g. FCNet) on it, while the input batches stay on the CPU, which leads to the crash below.
Stack Trace:
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer.py:520: in train
raise e
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer.py:506: in train
result = Trainable.train(self)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/tune/trainable.py:260: in train
result = self._train()
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py:139: in _train
return self._train_exec_impl()
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py:177: in _train_exec_impl
res = next(self.train_exec_impl)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/util/iter.py:731: in __next__
return next(self.built_iterator)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/util/iter.py:752: in apply_foreach
result = fn(item)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/maml/maml.py:143: in __call__
fetches = self.workers.local_worker().learn_on_batch(samples)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py:742: in learn_on_batch
.learn_on_batch(samples)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/policy/torch_policy.py:244: in learn_on_batch
self._loss(self, self.model, self.dist_class, train_batch))
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/maml/maml_torch_policy.py:329: in maml_loss
logits, state = model.from_batch(train_batch)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/models/modelv2.py:224: in from_batch
return self.__call__(input_dict, states, train_batch.get("seq_lens"))
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/models/modelv2.py:181: in __call__
res = self.forward(restored, state or [], seq_lens)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/models/torch/fcnet.py:118: in forward
self._features = self._hidden_layers(self._last_flat_in)
../../miniconda3/envs/ray/lib/python3.7/site-packages/torch/nn/modules/module.py:550: in __call__
result = self.forward(*input, **kwargs)
../../miniconda3/envs/ray/lib/python3.7/site-packages/torch/nn/modules/container.py:100: in forward
input = module(input)
../../miniconda3/envs/ray/lib/python3.7/site-packages/torch/nn/modules/module.py:550: in __call__
result = self.forward(*input, **kwargs)
../../miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/models/torch/misc.py:110: in forward
return self._model(x)
../../miniconda3/envs/ray/lib/python3.7/site-packages/torch/nn/modules/module.py:550: in __call__
result = self.forward(*input, **kwargs)
../../miniconda3/envs/ray/lib/python3.7/site-packages/torch/nn/modules/container.py:100: in forward
input = module(input)
../../miniconda3/envs/ray/lib/python3.7/site-packages/torch/nn/modules/module.py:550: in __call__
result = self.forward(*input, **kwargs)
../../miniconda3/envs/ray/lib/python3.7/site-packages/torch/nn/modules/linear.py:87: in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm
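The failure at the bottom of the trace is a plain PyTorch device mismatch: the linear layer's weights are on CUDA while the input tensor is on the CPU. A minimal sketch (independent of RLlib, only illustrating the same failure mode) that triggers it when a GPU is present:

```python
import torch
import torch.nn as nn

# A tiny stand-in for the FCNet hidden layers in the trace above.
model = nn.Linear(4, 2)
x = torch.randn(1, 4)  # input batch stays on the CPU, as in the bug

if torch.cuda.is_available():
    model = model.cuda()  # weights moved to the GPU, mirroring the misplacement
    try:
        model(x)  # CPU input against CUDA weights
    except RuntimeError as e:
        print("device mismatch:", e)
else:
    # On a CPU-only machine everything lives on the same device and this works.
    print("output shape:", tuple(model(x).shape))
```

On a machine with a GPU this raises the same kind of RuntimeError as `F.linear` does in the trace; on a CPU-only machine the forward pass succeeds, which is why the test only fails when CUDA is visible.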
To reproduce, run this in ray/rllib on a machine with a GPU:
pytest -v -s agents/maml/tests/test_maml.py
@michaelzhiluo Looking at this now ...