Models: TensorFlow logs everything twice while training

Created on 13 Jun 2018 · 9 comments · Source: tensorflow/models

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Sierra version 10.12.6
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 1.8.0
Python version: 3.6.5
Bazel version (if compiling from source): 0.13.0
GCC/Compiler version (if compiling from source): GCC 4.2.1

Describe the problem

I'm training an object detection model using the new ssdlite_mobilenet_v2_coco_2018_05_09 checkpoint with its configuration file ssdlite_mobilenet_v2_coco.config, and TensorFlow installed from source. When I launch the training, TensorFlow prints every log line twice.

This problem didn't happen when I trained the same network with a different model (checkpoint), ssd_mobilenet_v1_coco_2017_11_17, the configuration file ssd_mobilenet_v1_pets.config, and TensorFlow installed from pip (I tested versions 1.6.0 and 1.8.0).

NOTE: I didn't change the code in either case, and I wonder what the cause is.
I'm using CPU only, and the command to run the training is (for both cases):

python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/name_of_config_file.config

Source code / logs

INFO:tensorflow:global step 3292: loss = 3.2832 (2.960 sec/step)
INFO:tensorflow:global step 3292: loss = 3.2832 (2.960 sec/step)
INFO:tensorflow:global step 3293: loss = 3.5285 (3.675 sec/step)
INFO:tensorflow:global step 3293: loss = 3.5285 (3.675 sec/step)
INFO:tensorflow:global step 3294: loss = 2.3972 (3.564 sec/step)
INFO:tensorflow:global step 3294: loss = 2.3972 (3.564 sec/step)
INFO:tensorflow:Recording summary at step 3294.
INFO:tensorflow:Recording summary at step 3294.
INFO:tensorflow:global_step/sec: 0.294019
INFO:tensorflow:global_step/sec: 0.294019

awaiting response

achraf-boussaada

All 9 comments

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

tensorflowbutler on 13 Jun 2018

The OP specifically stated "I'm using CPU only", so what is the point of the fields "CUDA/cuDNN version" and "GPU model and memory"? Similarly, doesn't "the command to execute the training is (for both cases)" count as the "Exact command to reproduce"?

I am having the same problem with slightly modified code (nothing in the core of the training, though) on Ubuntu 16.04.

My GPU (if it makes any difference) is an Asus Cerberus GTX-1070TI-A8G, and I am also on TensorFlow 1.8. My TensorFlow was installed from a binary, and the Python version is 3.5.2.

eypros on 20 Jun 2018

This problem appears in my project too. The funny thing: everything was normal yesterday, and the only change since then is that the server got a new IP address and was moved to a different room.

HalfLemon01 on 22 Jun 2018

👍1

I have exactly the same problem. Did anyone find a solution?

Evaggelou on 25 Jul 2018

In my case, I got the same problem when I set the pre-trained model in the config file, e.g.:

fine_tune_checkpoint: "~/models/research/object_detection/checkpoints/resnet_v1_101.ckpt"

jicheol93 on 26 Jul 2018

You can set logger.propagate = False to prevent the logger's messages from propagating to the root logger's handlers.
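
As a rough illustration of that suggestion, here is a minimal sketch using only Python's standard logging module. The logger name "tensorflow_demo" and the helper emit_once are hypothetical stand-ins, not part of TensorFlow. The sketch assumes the duplication comes from one handler attached to the library's own logger plus a second handler on the root logger, which is not confirmed in this thread.

```python
import io
import logging


def emit_once(propagate: bool) -> str:
    """Log one message through a child logger and return everything written."""
    stream = io.StringIO()
    fmt = logging.Formatter("%(levelname)s:%(name)s:%(message)s")

    # Handler attached directly to the child logger (assumed to mirror how
    # a library might attach a handler to its own logger).
    own_handler = logging.StreamHandler(stream)
    own_handler.setFormatter(fmt)

    # Second handler on the root logger, as some other module might install.
    root_handler = logging.StreamHandler(stream)
    root_handler.setFormatter(fmt)

    logger = logging.getLogger("tensorflow_demo")  # hypothetical name
    root = logging.getLogger()
    saved_propagate = logger.propagate
    logger.setLevel(logging.INFO)
    logger.addHandler(own_handler)
    root.addHandler(root_handler)
    logger.propagate = propagate
    try:
        logger.info("global step 3292: loss = 3.2832")
    finally:
        # Undo every change so the demo leaves global logging state intact.
        logger.removeHandler(own_handler)
        root.removeHandler(root_handler)
        logger.propagate = saved_propagate
    return stream.getvalue()


duplicated = emit_once(propagate=True)
single = emit_once(propagate=False)
print(duplicated.count("global step"), single.count("global step"))  # prints: 2 1
```

In a TensorFlow 1.x training script the equivalent would presumably be logging.getLogger("tensorflow").propagate = False, set near the top of the script before training starts; the logger name "tensorflow" is an assumption about TF 1.x internals.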

huangynn on 29 Aug 2018

@huangynn Hi, I am new to TensorFlow, so could you please give more detail? For example, where should logger.propagate be set? Thanks.

swg209 on 4 Sep 2018

👍1

Open variables_helper.py (models/research/object_detection/utils/variables_helper.py) and change the imports like this:

import re
import tensorflow as tf
from tensorflow import logging  # replaces the module's previous logging import
slim = tf.contrib.slim
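
The thread does not say why this import change helps. One plausible reading, offered here only as an assumption, is that the module previously called the standard library's module-level logging functions. Those implicitly run logging.basicConfig() when the root logger has no handlers, which would add a second handler that repeats every propagated message. The standard-library part of that behavior can be sketched as follows (the logger name "object_detection_demo" is hypothetical):

```python
import logging

root = logging.getLogger()
saved_handlers = root.handlers[:]
root.handlers = []  # start from a root logger with no handlers

try:
    # Logging through a named logger never touches the root's handler list.
    logging.getLogger("object_detection_demo").warning("via named logger")
    after_named = len(root.handlers)

    # A module-level call implicitly runs logging.basicConfig() when the
    # root has no handlers, which attaches a StreamHandler to the root.
    logging.warning("via module-level call")
    after_module_level = len(root.handlers)
finally:
    # Remove the implicitly added handler and restore the original list.
    for handler in root.handlers[:]:
        root.removeHandler(handler)
        handler.close()
    root.handlers = saved_handlers

print(after_named, after_module_level)  # prints: 0 1
```

If that reading is right, routing the module's messages through TensorFlow's own logging wrapper would avoid creating the extra root handler in the first place.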

QuickLearner171998 on 28 Jun 2019

👍1

Hi There,
We are checking to see if you still need help with this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help with this issue any more, please consider closing it.

tensorflowbutler on 30 Jan 2020
