Models: Object detection API num_clones must be 1 when using synchronous training

Created on 29 Jun 2018 · 3 Comments · Source: tensorflow/models

In the recent update of the object detection API, these lines of code were modified:
https://github.com/tensorflow/models/blob/2dc6b914b5ce8b98451b48c291ad26c8be3afdc4/research/object_detection/trainer.py#L262-L264

However, I can't figure out the reason for this modification. Naturally, we have multiple workers to synchronize, and there are multiple clones on each worker.
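To make the constraint in the title concrete, here is a minimal sketch of the kind of check being asked about. It is not a copy of the linked lines in trainer.py: the helper name `check_clone_config` and the exact error text are assumptions for illustration, and only `num_clones` and the idea of a synchronous-training switch come from this issue.

```python
def check_clone_config(sync_replicas: bool, num_clones: int) -> None:
    """Hypothetical helper mirroring the constraint named in the issue title.

    Raises ValueError when synchronous training is requested together
    with more than one clone per worker.
    """
    if sync_replicas and num_clones != 1:
        raise ValueError(
            "In synchronous training mode num_clones must be 1. "
            "Found num_clones: {}".format(num_clones)
        )


# Asynchronous training: several clones per worker pass the check.
check_clone_config(sync_replicas=False, num_clones=4)

# Synchronous training with one clone per worker passes the check.
check_clone_config(sync_replicas=True, num_clones=1)

# Synchronous training with several clones per worker is rejected.
try:
    check_clone_config(sync_replicas=True, num_clones=2)
except ValueError as err:
    print(err)
```

Under this sketch, the combination the question describes (multiple synchronized workers, each with multiple clones) is exactly the case that gets rejected.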

docs feature

twangnh

👍1

All 3 comments

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

tensorflowbutler on 30 Jun 2018

You may want to look into worker_replicas or replicas_to_aggregate instead. I'm adding a feature request tag for better documentation of the different terms used in distributed training (this could also apply to TensorFlow in general; documentation for distributed training is a bit lacking now).
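As a rough illustration of what aggregating across replicas means in synchronous training, the sketch below averages the gradients reported by a number of workers before a single update is applied. This is a conceptual, pure-Python example and not the object detection API or TensorFlow implementation; the function `aggregate_gradients` and its signature are hypothetical, and only the name `replicas_to_aggregate` is taken from the comment above.

```python
from typing import List, Sequence


def aggregate_gradients(
    worker_grads: Sequence[Sequence[float]],
    replicas_to_aggregate: int,
) -> List[float]:
    """Average the gradients of the first `replicas_to_aggregate` workers.

    Hypothetical helper: each inner sequence holds one worker's gradient,
    with one entry per model parameter.
    """
    if not 1 <= replicas_to_aggregate <= len(worker_grads):
        raise ValueError(
            "replicas_to_aggregate must be between 1 and the number of workers"
        )
    selected = worker_grads[:replicas_to_aggregate]
    # zip(*selected) groups the values that belong to the same parameter.
    return [sum(values) / replicas_to_aggregate for values in zip(*selected)]


# Three workers, each reporting a gradient for two parameters.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(aggregate_gradients(grads, replicas_to_aggregate=2))
```

In this picture each worker contributes one gradient per step, which is the single-clone-per-worker arrangement that the check under discussion enforces.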

k-w-w on 4 Jul 2018

👎2 👍1

Hi there,
We are checking to see if you still need help with this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help with this issue any more, please consider closing it.

tensorflowbutler on 30 Jan 2020
