URL:https://github.com/tensorflow/models/tree/master/research/object_detection
Training stopped soon after it began. Is this a bug in TensorFlow 1.15.0?
I0608 08:32:28.426066 2940 basic_session_run_hooks.py:262] loss = 1.0046523, step = 140854
INFO:tensorflow:global_step/sec: 1.97309
I0608 08:33:19.126070 2940 basic_session_run_hooks.py:692] global_step/sec: 1.97309
INFO:tensorflow:loss = 1.2582233, step = 140954 (50.702 sec)
I0608 08:33:19.128067 2940 basic_session_run_hooks.py:260] loss = 1.2582233, step = 140954 (50.702 sec)
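As a rough cross-check of the log above, here is a minimal sketch that pulls the step counter out of the two `loss = ..., step = ...` lines and works out how far training advanced between them. The regular expression and the helper name `parse_progress` are illustrative assumptions, not part of TensorFlow.

```python
import re

# The two progress lines copied from the log above.
LOG_LINES = [
    "I0608 08:32:28.426066 2940 basic_session_run_hooks.py:262] loss = 1.0046523, step = 140854",
    "I0608 08:33:19.128067 2940 basic_session_run_hooks.py:260] loss = 1.2582233, step = 140954 (50.702 sec)",
]

# Hypothetical pattern for this log format; the "(N sec)" suffix is optional.
PATTERN = re.compile(
    r"loss = (?P<loss>[\d.]+), step = (?P<step>\d+)(?: \((?P<sec>[\d.]+) sec\))?"
)


def parse_progress(lines):
    """Return a list of (step, loss, seconds_or_None) tuples found in the lines."""
    records = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            sec = match.group("sec")
            records.append(
                (int(match.group("step")),
                 float(match.group("loss")),
                 float(sec) if sec else None)
            )
    return records


records = parse_progress(LOG_LINES)
first, last = records[0], records[-1]
steps_done = last[0] - first[0]   # steps completed between the two log lines
rate = steps_done / last[2]       # steps per second over the reported interval
print("steps between log lines:", steps_done)
print("approx. steps/sec: %.3f" % rate)
```

The 100-step gap matches the default logging interval, and the computed rate is close to the `global_step/sec` value reported by the hook, so the log alone only shows that at least 100 steps ran in this session.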
It seems that the training step starts at 140954, which can happen when you train your network starting from a pre-trained model. In that case, the training step variable is initialized with the last step of the pre-trained model.
I would suggest setting training_steps to 200000 and seeing if it still happens.
@abenbihi I set training_steps to 3000000.
How do you pass the training steps to the Python script?
What value shows up if you print your train_steps variable inside your script?
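The reasoning behind these questions can be sketched as follows, assuming the configured number of training steps acts as a cap on the global step (as `max_steps` does in `tf.estimator.TrainSpec`). The helper `remaining_steps` and the 100000 target are hypothetical and only illustrate the arithmetic. The other values are the ones mentioned in this thread.

```python
def remaining_steps(global_step, max_steps):
    """Steps left before training ends, if max_steps caps the global step."""
    return max(0, max_steps - global_step)


restored_step = 140954  # step value seen in the log above

# 100000 is a made-up target below the restored step; 200000 and 3000000
# are the values discussed in this thread.
for target in (100000, 200000, 3000000):
    print(target, "->", remaining_steps(restored_step, target), "steps left")
```

If the target is lower than the restored step, nothing is left to train and the run ends almost immediately. With a target of 3000000 there are still millions of steps left, so under this assumption the step limit alone would not explain an early stop.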
@abenbihi

Do you use model_main.py?
What shows up if you print train_steps after the line train_steps = train_and_eval_dict['train_steps'] in model_main.py?
@abenbihi print:

Even when I don't resume training from a checkpoint, the result is the same. The model never trains, but the program keeps running.


Thank you, this is useful information for understanding your issue.
It seems that the model trains starting from step=0 to step=140854.
So what observation suggests it stops automatically soon after it started?
What do you mean by "the model never trains"?
@abenbihi
I was able to train to 140854 steps because I was using TensorFlow 1.14.0.
Ah, you mean that training now stops at step=100?
@abenbihi
Yes, if I use TensorFlow 1.15, training stops at step=100.
@abenbihi
No matter which step training starts from, it stops after 100 steps (TensorFlow 1.15).
@abenbihi
I tried ssdlite_mobilenet_v3_small_320x320_coco.config, and the problem did not occur.
With which config file does the problem occur?
ssdlite_mobilenet_v2_coco.config
The problem also occurs with ssdlite_mobilenet_v3_large_320x320_coco.config.
@dreamitpossible1 I don't know why.
tensorflow-gpu1.15,
@dreamitpossible1 Me too. Maybe you can use Ubuntu to train.
Thank you, I'll try it.