Tool looks great! I'm just wondering how you would go about running an experiment with trials distributed across GPUs (on a single machine). I am looking at the Service API / Developer API pages but cannot see how a client/server or queue structure would work (I have not dug through the code yet).
I'm after something like Ray to do optimisation.
I think distributed experiments are a pretty important feature, so I'm assuming support has to be there somewhere; a tutorial would be great. I'm interested in a single-host / multi-GPU environment, but I'm sure multi-host would also be of value to people.
Thanks for raising this issue - we definitely agree that distributed experimentation is a really important feature and there's actually an integration with Ray in-flight. See https://github.com/ray-project/ray/pull/4731 for the initial PR. There are still a couple of tweaks to make the parallelism work, but this should be available soon (hopefully by end of next week).
After this is available, we'll be sure to update the documentation to explicitly mention how to run experiments in a distributed manner.
BTW @gatapia, check out this tutorial notebook for using Ax and Tune. You'll have to add a ray.init line to connect to your Ray cluster. We'll also push this tutorial onto the Ray docs.
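For reference, the single-host / multi-GPU queue structure asked about above can be sketched with the Python standard library alone. This is only an illustration of the scheduling pattern, not the Ax or Ray Tune API: `suggest`, `evaluate`, `run_trials`, and `NUM_GPUS` are hypothetical names, `suggest` is plain random search standing in for the optimizer, and `evaluate` is a toy objective standing in for a training run pinned to one GPU.

```python
import queue
import random
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 2  # assumption: set this to the number of GPUs on your machine


def suggest(rng):
    """Hypothetical stand-in for the optimizer's 'ask' step (random search here)."""
    return {"lr": 10 ** rng.uniform(-4, -1)}


def evaluate(params, gpu_id):
    """Hypothetical stand-in for training; real code would run on GPU `gpu_id`."""
    # Toy objective with its minimum at lr = 1e-2; gpu_id is unused in this sketch.
    return (params["lr"] - 1e-2) ** 2


def run_trials(num_trials, num_gpus=NUM_GPUS, seed=0):
    rng = random.Random(seed)

    # Pool of free GPU ids; a trial borrows one and returns it when finished.
    free_gpus = queue.Queue()
    for gpu_id in range(num_gpus):
        free_gpus.put(gpu_id)

    def run_one(params):
        gpu_id = free_gpus.get()  # blocks until a GPU is free
        try:
            return params, gpu_id, evaluate(params, gpu_id)
        finally:
            free_gpus.put(gpu_id)  # hand the GPU back even if the trial fails

    # All candidates are drawn up front, so this is batch random search,
    # not sequential model-based optimization.
    candidates = [suggest(rng) for _ in range(num_trials)]

    # One worker per GPU, so at most num_gpus trials run at the same time.
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        return list(pool.map(run_one, candidates))


results = run_trials(num_trials=8)
best_params, best_gpu, best_loss = min(results, key=lambda r: r[2])
print(best_params, best_loss)
```

In a real setup each trial would more likely be a separate process with `CUDA_VISIBLE_DEVICES` set to the borrowed GPU id, and the optimizer would be told each result before proposing the next trial. The Ray Tune integration takes care of that scheduling for you.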
RayTune tutorial is now ready and checked in on master.
great tutorial. feel free to close.