Estimators seem to make data parallelism much easier with the replicate_model_fn and TowerOptimizer decorators. These don't seem to be used in the Estimator definitions in model_lib.py.
Could multi-GPU usage be clarified (if it is already supported)?
For my present use case, I am modifying the model_lib.py definition with the decorators to accommodate tower cloning.
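The tower-cloning idea behind replicate_model_fn and TowerOptimizer can be illustrated without TensorFlow. The sketch below is framework-free and hypothetical: a one-weight linear model, and made-up helper names (`shard`, `mse_grad`, `tower_step`). Each tower gets an equal slice of the global batch, computes its gradient against one shared weight, and the per-tower gradients are averaged before a single update. Whether the real decorators sum or average across towers depends on their loss-reduction setting, which this sketch does not model.

```python
"""Framework-free sketch of synchronous tower (clone) data parallelism.

Hypothetical illustration only: a 1-D linear model y ~ w * x trained with
mean-squared error. This is NOT TensorFlow's implementation; it only shows
the arithmetic of splitting a batch across towers that share one weight.
"""
from typing import List, Sequence, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def mse_grad(w: float, batch: Sequence[Tuple[float, float]]) -> float:
    """Gradient of mean((w*x - y)**2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(batch: Batch, num_towers: int) -> List[Batch]:
    """Split the global batch into equal, contiguous per-tower shards."""
    if len(batch) % num_towers != 0:
        raise ValueError("global batch size must be divisible by num_towers")
    size = len(batch) // num_towers
    return [batch[i * size:(i + 1) * size] for i in range(num_towers)]


def tower_step(w: float, batch: Batch, num_towers: int, lr: float) -> float:
    """One synchronous step: per-tower gradients, averaged, one update."""
    tower_grads = [mse_grad(w, s) for s in shard(batch, num_towers)]
    avg_grad = sum(tower_grads) / num_towers
    return w - lr * avg_grad


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3
    w = 0.0
    for _ in range(50):
        w = tower_step(w, data, num_towers=4, lr=0.01)
    print(round(w, 4))
```

With equal-size shards, the average of the per-tower gradients equals the full-batch gradient, so the towers only change where the work runs, not the update itself.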
_System Information doesn't seem relevant to this, but included nevertheless_
python object_detection/model_main.py
The recommended API for parallelizing estimators is Distribution Strategies.
https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/distribute
and for examples in official models you can follow:
https://github.com/tensorflow/models/blob/master/official/utils/misc/distribution_utils.py
I thought Estimators were meant to make scaling much easier than the tf.contrib.distribute API.
Does tf.contrib.distribute (Distribution Strategies) work with Estimators too?
And are these (whether decorators or function wrappers) baked into the Object Detection API (akin to the previous train script based on slim.training, which allowed multiple clones)?
Yes, you simply pass a distribution strategy to tf.estimator.RunConfig() and the estimator handles the rest when passed a config with distribution set. Currently only OneDeviceStrategy (single CPU or GPU) and MirroredStrategy (Multi GPU, single node) are implemented, but more are in development. official/resnet, official/wide_deep, and official/transformer all use this API, so you can check them for details.
It doesn't appear that research/object_detection uses Distribution Strategies right now.
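As a rough mental model of what MirroredStrategy does once it is passed in through RunConfig, the framework-free sketch below keeps one weight copy per replica, all-reduces the gradients, and applies the same update on every replica so the copies stay identical. The helper names (`local_grad`, `all_reduce_mean`, `mirrored_step`) are hypothetical and this is not TensorFlow code; real implementations use device-level collectives (such as NCCL on GPUs), which the list-based `all_reduce_mean` only stands in for.

```python
"""Framework-free sketch of mirrored, synchronous data parallelism.

Hypothetical illustration of the idea behind MirroredStrategy: every
replica keeps its own copy of the weight, computes a gradient on its own
shard, the gradients are combined with an all-reduce (mean here), and each
replica applies the identical update.
"""
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y)


def local_grad(w: float, shard: Sequence[Example]) -> float:
    """Gradient of mean((w*x - y)**2) w.r.t. w on one replica's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: Sequence[float]) -> List[float]:
    """Every replica receives the mean of all replicas' values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def mirrored_step(weights: List[float], shards: Sequence[Sequence[Example]],
                  lr: float) -> List[float]:
    """One synchronous step; returns the updated per-replica weight copies."""
    grads = [local_grad(w, s) for w, s in zip(weights, shards)]
    reduced = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, reduced)]


if __name__ == "__main__":
    shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]  # true w = 2
    replicas = [0.0, 0.0]  # one mirrored copy per "device"
    for _ in range(100):
        replicas = mirrored_step(replicas, shards, lr=0.02)
    print(replicas)
```

Because every replica starts from the same value and applies the same reduced gradient, the copies never diverge; that invariant is what lets each device read its weights locally.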
Thanks! That's helpful.
Keeping this open, however, as a feature request (for the API).
@varun19299 if the changes are simple, would you mind sharing your modifications to support multi-gpu training?
@robieta so does this mean the update to estimator-based object detection effectively removed multi GPU support?
No, it just means that that isn't how they are implementing multi-gpu support.
Hmm, when switching to Estimator-based training, there no longer seem to be options for selecting the number of GPUs in the new model_main.py, as there were in the past with legacy/train.py.
@pkulzc
@varun19299 Distribution strategies that @robieta mentioned currently do not work with models constructed using tf.contrib.slim layers, and all models in the TensorFlow Object Detection API use tf.contrib.slim.
We are evaluating changing model construction to be based on tf.layers or tf.keras after which we should be able to support all distribution strategies.
For now we only support
It would be great if these two could be clarified:
tf.contrib.slim has moved its layers to tf.contrib.layers and slim.arg_scope to tf.contrib.framework.arg_scope. These do work with Estimators (I've tried the decorators, similar to the mnist example at models/examples). Also, going by issue #16182 on tensorflow/tensorflow, I thought there were plans in the works to shift to these two or to better-supported APIs (slim was supposed to be deprecated soon).
Multiworker asynchronous GPU training via model_main.py
Could the same be shown as an example? I'm not sure the current model_main code supports clone-based data parallelism (I'm not sure which distribution strategy slim uses, but I assume it is asynchronous; it is certainly not as extensive as what Estimators offer).
Could you please clarify these? Thanks a lot!
It would be great if these two could be clarified:
@tombstone if I'm not wrong, tf.contrib.slim has moved its layers to tf.contrib.layers and slim.arg_scope to tf.contrib.framework.arg_scope. These do work with Estimators (I've tried the decorators, similar to the mnist example at models/examples).
You are right. But it is the new distribution strategies in estimators that don't work well with tf.contrib.layers or tf.contrib.slim
Also, going by issue #16182 on tensorflow/tensorflow, I thought there were plans in the works to shift to these two or to better-supported APIs (slim was supposed to be deprecated soon).
Regarding
Multiworker asynchronous GPU training via model_main.py
Could the same be shown as an example? I'm not sure the current model_main code supports clone-based data parallelism (I'm not sure which distribution strategy slim uses, but I assume it is asynchronous; it is certainly not as extensive as what Estimators offer).
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_pets.md shows an example of running multi-worker asynchronous jobs.
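To make "asynchronous" concrete: unlike synchronous replication, where gradients from all replicas are combined before a single update, asynchronous multi-worker training lets each worker push its update to a parameter server as soon as it is done. The framework-free sketch below is a hypothetical illustration (the `ParameterServer` class and `run_async` helper are made up, not TensorFlow APIs); it interleaves workers deterministically so you can see that later pushes are computed on stale weights.

```python
"""Framework-free sketch of asynchronous parameter-server training.

Hypothetical illustration: a parameter server holds the only copy of the
weight. Workers pull the current weight, compute a gradient on their own
data, and push it back without waiting for each other, so a gradient may
be computed on a weight that is already out of date.
"""
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y)


class ParameterServer:
    """Holds the single shared weight and applies pushes immediately."""

    def __init__(self, w: float, lr: float) -> None:
        self.w = w
        self.lr = lr
        self.version = 0  # number of updates applied so far

    def pull(self) -> Tuple[float, int]:
        return self.w, self.version

    def push(self, grad: float) -> None:
        # No averaging with other workers: each push is its own update.
        self.w -= self.lr * grad
        self.version += 1


def grad(w: float, data: Sequence[Example]) -> float:
    """Gradient of mean((w*x - y)**2) w.r.t. w on one worker's data."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def run_async(ps: ParameterServer, worker_data: Sequence[Sequence[Example]],
              rounds: int) -> List[int]:
    """Each round, all workers pull first, then push one after another.

    Returns the staleness of every push: how many updates were applied
    between the worker's pull and its push.
    """
    staleness = []
    for _ in range(rounds):
        snapshots = [ps.pull() for _ in worker_data]
        for (w_seen, v_seen), data in zip(snapshots, worker_data):
            staleness.append(ps.version - v_seen)
            ps.push(grad(w_seen, data))
    return staleness


if __name__ == "__main__":
    ps = ParameterServer(w=0.0, lr=0.02)
    workers = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
    print(run_async(ps, workers, rounds=3), ps.w)
```

With three workers, the last push in each round would be two updates stale. Synchronous replication avoids staleness at the cost of waiting for the slowest replica on every step.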
If you were using the clone mechanism for single worker multi-gpu training before, please continue to use legacy/train.py. It should work.
Could you please clarify these? Thanks a lot!
You are right. But it is the new distribution strategies in estimators that don't work well with tf.contrib.layers or tf.contrib.slim
That's interesting. Any particular reason why? (It would be great if you could explain a bit about how the backend for these decorators works; that's quite a dark area for me.)
I had a partially similar question. Thanks for asking this @varun19299
Hi there,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.