Is it possible to run parallel training on multiple machines? I use computation-heavy perception methods, and my training is much slower than in the example scenes. To speed up training I tried running multiple Unity instances, but it didn't help much. Is distributed training across different machines possible, and if not, will it be included in future releases?
From their blog post: _Our work doesn't stop here; we are also working on techniques to train multiple levels concurrently by scaling out training across multiple machines._
@roboserg You are right, we are still working on this.
Hi @ertugrulerdogan and @roboserg - I've documented this and will update when we make more progress.
@unityjeffrey The project I am currently working on requires this feature. Is there any ETA? If not, is it possible for me to work on this feature? Thank you so much :D
@Taikatou You can try using RLlib (https://ray.readthedocs.io/en/latest/rllib.html) along with the gym wrapper we provide for this. Our own parallel training across multiple machines will need more time.
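For anyone who wants to try the RLlib route, here is a minimal sketch of wiring the ML-Agents gym wrapper into RLlib. This is untested and makes assumptions: `ENV_PATH` is a placeholder for your built Unity executable, and you need `ray`, `gym_unity`, and a Unity build on each machine. The per-worker `worker_id` trick avoids port collisions between concurrent Unity instances.

```python
# Sketch (untested): training a Unity environment with RLlib's PPO.
# Assumes ray and gym_unity are installed; ENV_PATH is a placeholder.
import ray
from ray import tune
from ray.tune.registry import register_env
from gym_unity.envs import UnityEnv

ENV_PATH = "./builds/MyEnvironment"  # placeholder: path to your Unity build


def make_unity_env(env_config):
    # Each RLlib rollout worker gets a distinct worker_id so that the
    # Unity instances listen on different ports instead of colliding.
    return UnityEnv(ENV_PATH, worker_id=env_config.worker_index)


register_env("unity_env", make_unity_env)

ray.init()
tune.run(
    "PPO",
    config={
        "env": "unity_env",
        "num_workers": 4,  # one Unity instance per rollout worker
    },
)
```

With `ray.init(address=...)` pointed at a Ray cluster, the same script scales the rollout workers out across machines, which is effectively the distributed setup asked about above.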
Thank you for submitting this request. We've added it to our internal tracker. I'm going to close this issue for now, but we'll ping back with any updates.
Also note that parallel environments were improved in 0.9; they no longer block each other during training. Give it a go. Also, if the environments are the bottleneck, the SAC trainer in v0.10 should help quite a bit, even in single-machine training.
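For anyone trying this: parallel environments are enabled with the `--num-envs` option to `mlagents-learn`, and switching a brain to SAC is a small change in `trainer_config.yaml`. A minimal sketch of the config change (the brain name and hyperparameter values here are placeholders; check the trainer documentation for the full list of SAC options):

```yaml
MyBrain:
    trainer: sac        # was: ppo
    buffer_size: 50000  # SAC is off-policy, so it benefits from a large replay buffer
    batch_size: 128
```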