Hi
There are two top-level directories, Slim and Inception. They contain some of the same functionality, but implemented differently.
I've noticed that the Inception directory has a very detailed, working tutorial on training Inception v3 on multi-GPU or distributed machines, but it only includes Inception v3 (plus CIFAR and a few other nets).
The Slim directory, on the other hand, has Inception v1, v2, v3 and v4, but no code for advanced training such as multi-GPU setups. I've been trying to reconcile the two, as I need to train GoogLeNet (aka Inception v1) on a multi-GPU setup, so the multi-GPU examples under Inception are useful.
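For context, the Slim side already exposes all four Inception variants through its nets factory, so the missing piece is really just the multi-GPU training loop. Here is a minimal sketch of pulling Inception v1 out of Slim; I'm writing the import path and signature from memory, so treat them as approximate:

```python
import tensorflow as tf
from nets import nets_factory  # slim/nets/nets_factory.py

# Build the Inception v1 (GoogLeNet) graph function from the Slim model zoo.
# 'inception_v1' is registered in nets_factory alongside inception_v2/v3/v4.
network_fn = nets_factory.get_network_fn(
    'inception_v1', num_classes=1001, is_training=True)

# A placeholder batch, just to show the expected input shape.
images = tf.placeholder(
    tf.float32,
    [None, network_fn.default_image_size, network_fn.default_image_size, 3])
logits, end_points = network_fn(images)
```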
Has anyone else tried this?
Is there a reason why the two top-level directories of this repository are separate? Could we combine them and bring everything under the umbrella of Slim, so that the multi-GPU functionality can be used with all the Slim models?
The models repository contains contributions from different people and teams, and there is no requirement to keep the models unique. However, @sguada: are there plans to reorganize this?
OK, thank you for getting back to me! If I get a nice multi-GPU version of the flowers demo in Slim working, I'll submit a pull request so that at least that part is complete.
OK, I've added a multi-GPU demo as a PR: https://github.com/tensorflow/models/pull/665
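For anyone who wants the gist without reading the PR: Slim's deployment module handles the GPU splitting by building one "clone" of the model per GPU and aggregating the gradients on a parameter device. Below is a minimal sketch along the lines of the example in slim/deployment/model_deploy.py; load_batch and build_network are placeholder helpers of mine, and the signatures are as I remember them, so double-check against the module itself:

```python
import tensorflow as tf
import tensorflow.contrib.slim as slim
from deployment import model_deploy  # slim/deployment/model_deploy.py

NUM_GPUS = 2  # one model "clone" per GPU


def load_batch(batch_size=32, image_size=224, num_classes=5):
  # Placeholder input pipeline: random tensors standing in for the flowers
  # TFRecords provider, just so the sketch builds end to end.
  images = tf.random_uniform([batch_size, image_size, image_size, 3])
  labels = tf.one_hot(
      tf.random_uniform([batch_size], maxval=num_classes, dtype=tf.int32),
      num_classes)
  return images, labels


def build_network(images, num_classes=5):
  # Tiny stand-in network; swap in nets_factory.get_network_fn('inception_v1', ...)
  net = slim.conv2d(images, 32, [3, 3])
  net = slim.flatten(slim.avg_pool2d(net, [7, 7], stride=7))
  logits = slim.fully_connected(net, num_classes, activation_fn=None)
  return logits


with tf.Graph().as_default():
  # One clone per GPU; variables live on the parameter device.
  config = model_deploy.DeploymentConfig(num_clones=NUM_GPUS)

  with tf.device(config.variables_device()):
    global_step = slim.create_global_step()

  with tf.device(config.inputs_device()):
    images, labels = load_batch()
    batch_queue = slim.prefetch_queue.prefetch_queue(
        [images, labels], capacity=2 * NUM_GPUS)

  def clone_fn(batch_queue):
    # Each clone dequeues its own batch and computes its own loss;
    # model_deploy then averages the gradients across clones.
    images, labels = batch_queue.dequeue()
    logits = build_network(images)
    slim.losses.softmax_cross_entropy(logits, labels)

  with tf.device(config.optimizer_device()):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

  deployed = model_deploy.deploy(
      config, clone_fn, [batch_queue], optimizer=optimizer)

  slim.learning.train(deployed.train_op, logdir='/tmp/flowers_multigpu',
                      summary_op=deployed.summary_op)
```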
I've run inception_v3 on the flowers dataset with Slim, using 1, 2, 3 and 4 GPUs respectively, but the training speed seems to be the same. In my experiments I only changed num_clones, and the specified GPUs were all being used. I'm wondering whether I missed some important parameters, and whether you see any speedup as the number of GPUs increases.
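One thing worth double-checking (this is my reading of the Slim trainer, not something I've verified against your run): --batch_size seems to be applied per clone, so every extra GPU adds another full batch to each step. If that's right, a roughly constant step time actually means the throughput is scaling, and examples/sec rather than sec/step is the number to compare. A quick back-of-the-envelope check, with made-up timings:

```python
# Rough throughput check, assuming --batch_size is per clone (my reading of
# slim's train_image_classifier.py; verify against your version).
batch_size_per_clone = 32

# The step times below are illustrative placeholders, not measured values.
for num_clones, sec_per_step in [(1, 0.60), (2, 0.62), (4, 0.65)]:
    examples_per_sec = num_clones * batch_size_per_clone / sec_per_step
    print('clones=%d  global_batch=%d  ~%.0f examples/sec'
          % (num_clones, num_clones * batch_size_per_clone, examples_per_sec))
```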
Is this problem solved?
Yes, I think so, as TF has moved on since I opened the issue and we are now at release 1.0. Thanks for your help.