TensorFlow installed from (source or binary):
pip installed
TensorFlow version (use command below):
1.12
Bazel version (if compiling from source):
CUDA/cuDNN version:
CUDA 9.0
GPU model and memory:
GTX 1060 6 GB
Exact command to reproduce:
python E:\Documents\Projects\tensorflow\models\research\object_detection\model_main.py --alsologtostderr --pipeline_config_path=experiments/training/ssdlite_mobilenet_v2_coco.config --model_dir=/experiments/training/ --num_train_steps=50000 --NUM_EVAL_STEPS=2000
Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.
model_main doesn't save training checkpoints. I see the status (see below), but I don't see any checkpoints being saved during training. What's going on?
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
I see this, but it's not saving any checkpoint to this directory:
tensorflow:Saving 'checkpoint_path' summary for global step 40469: /experiments/training/model.ckpt-40469
I1125 05:27:49.819430 7500 tf_logging.py:115] Saving 'checkpoint_path' summary for global step 40469: /experiments/training/model.ckpt-40469
I think the problem is with the model_dir=/experiments/training/ flag. Because you put / before experiments, you're pointing it at the root directory and not at the project's directory.
Try model_dir=experiments/training/ instead of model_dir=/experiments/training/
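The difference between the two spellings can be sketched with Python's `pathlib`, which models Windows path rules without touching the disk. This is only an illustration of how a rooted path resolves against a working directory; the project directory below is hypothetical, and whether TensorFlow resolves `model_dir` the same way is an assumption.

```python
from pathlib import PureWindowsPath

# Hypothetical working directory, used only for illustration.
project_dir = PureWindowsPath(r"E:\work\project")

# A leading slash makes the path rooted: it keeps the drive but
# discards the rest of the working directory.
rooted = project_dir / "/experiments/training/"

# Without the leading slash the path stays inside the project.
relative = project_dir / "experiments/training/"

print(rooted)
print(relative)
```

With the leading slash, the checkpoints would land under the root of the drive rather than next to the project, which is worth checking before assuming nothing was saved.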
I can't remember exactly, but it was giving me a different issue and someone suggested that I use this.
How do you use it?
Shouldn't it at least complain about it?
Is there any other flag that I'm unaware of?
I'll check my last run command when I get back home.
Thanks, please do.
@zubairahmed-ai I think you're missing the --checkpoint_dir=yourDir flag. When I compared my command with yours, it was the only one missing.
@saleem-hadad Thanks Saleem, the training has actually started. After some tests, it turns out I needed to create an export folder and, within that, a servo folder in order to let TensorFlow save the checkpoint. Don't ask me why, it's just too weird. I tested it with fewer steps and it started saving checkpoints too, see my comment https://github.com/tensorflow/models/issues/2984#issuecomment-441422918
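For anyone wanting to try the same workaround, a minimal sketch of creating those folders before training might look like this. The folder names come from the comment above; the capitalisation `Servo`, the helper name `ensure_export_dirs`, and the temporary `model_dir` are assumptions for illustration, not something confirmed by the TensorFlow code.

```python
import os
import tempfile


def ensure_export_dirs(model_dir):
    """Create <model_dir>/export/Servo if missing and return its path."""
    servo_dir = os.path.join(model_dir, "export", "Servo")
    # exist_ok=True makes the call safe to repeat before every run.
    os.makedirs(servo_dir, exist_ok=True)
    return servo_dir


# Hypothetical model_dir inside a temp folder; substitute your own path.
model_dir = os.path.join(tempfile.mkdtemp(), "experiments", "training")
servo_dir = ensure_export_dirs(model_dir)
print(servo_dir)
```

Pass the same `model_dir` to `--model_dir` afterwards so the training run and the created folders agree.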
Haha, that's great 🤣 All the best, dude
Thanks :)
Please, how can I calculate the mAP for these results?
INFO:tensorflow:Restoring parameters from test_image1/model.ckpt-50000
INFO:tensorflow:Restoring parameters from test_image1/model.ckpt-50000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Performing evaluation on 4 images.
INFO:tensorflow:Performing evaluation on 4 images.
creating index...
index created!
INFO:tensorflow:Loading and preparing annotation results...
INFO:tensorflow:Loading and preparing annotation results...
INFO:tensorflow:DONE (t=0.00s)
INFO:tensorflow:DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.06s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.211
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.380
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.187
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.003
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.233
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.485
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.067
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.241
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.261
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.020
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.570
INFO:tensorflow:Finished evaluation at 2019-10-28-14:06:58
INFO:tensorflow:Finished evaluation at 2019-10-28-14:06:58
INFO:tensorflow:Saving dict for global step 50000: DetectionBoxes_Precision/mAP = 0.21072435, DetectionBoxes_Precision/mAP (large) = 0.485231, DetectionBoxes_Precision/mAP (medium) = 0.23299623, DetectionBoxes_Precision/mAP (small) = 0.0032204273, DetectionBoxes_Precision/mAP@.50IOU = 0.38036388, DetectionBoxes_Precision/mAP@.75IOU = 0.1866721, DetectionBoxes_Recall/AR@1 = 0.06658986, DetectionBoxes_Recall/AR@10 = 0.24055299, DetectionBoxes_Recall/AR@100 = 0.26059908, DetectionBoxes_Recall/AR@100 (large) = 0.57045454, DetectionBoxes_Recall/AR@100 (medium) = 0.2825, DetectionBoxes_Recall/AR@100 (small) = 0.02, Loss/BoxClassifierLoss/classification_loss = 0.5046802, Loss/BoxClassifierLoss/localization_loss = 0.3415953, Loss/RPNLoss/localization_loss = 0.54075974, Loss/RPNLoss/objectness_loss = 0.46926486, Loss/total_loss = 1.8563001, global_step = 50000, learning_rate = 0.0002, loss = 1.8563001
INFO:tensorflow:Saving dict for global step 50000: DetectionBoxes_Precision/mAP = 0.21072435, DetectionBoxes_Precision/mAP (large) = 0.485231, DetectionBoxes_Precision/mAP (medium) = 0.23299623, DetectionBoxes_Precision/mAP (small) = 0.0032204273, DetectionBoxes_Precision/mAP@.50IOU = 0.38036388, DetectionBoxes_Precision/mAP@.75IOU = 0.1866721, DetectionBoxes_Recall/AR@1 = 0.06658986, DetectionBoxes_Recall/AR@10 = 0.24055299, DetectionBoxes_Recall/AR@100 = 0.26059908, DetectionBoxes_Recall/AR@100 (large) = 0.57045454, DetectionBoxes_Recall/AR@100 (medium) = 0.2825, DetectionBoxes_Recall/AR@100 (small) = 0.02, Loss/BoxClassifierLoss/classification_loss = 0.5046802, Loss/BoxClassifierLoss/localization_loss = 0.3415953, Loss/RPNLoss/localization_loss = 0.54075974, Loss/RPNLoss/objectness_loss = 0.46926486, Loss/total_loss = 1.8563001, global_step = 50000, learning_rate = 0.0002, loss = 1.8563001
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 50000: test_image1/model.ckpt-50000
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 50000: test_image1/model.ckpt-50000
INFO:tensorflow:Performing the final export in the end of training.
INFO:tensorflow:Performing the final export in the end of training.
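Note that the log above already reports the mAP: the first "Average Precision" line (IoU=0.50:0.95, area=all) and the `DetectionBoxes_Precision/mAP` entry in the "Saving dict" line are the same value, about 0.211. If you want to pull it out of a saved log programmatically, here is a rough sketch. The helper name `parse_eval_dict` is made up, the sample string is an abbreviated copy of the line above, and the parsing assumes the `key = value, key = value` layout shown in this log.

```python
def parse_eval_dict(line):
    """Parse a 'Saving dict for global step N: k = v, ...' log line."""
    _, _, payload = line.partition("Saving dict for global step ")
    step_text, _, pairs = payload.partition(": ")
    metrics = {}
    for pair in pairs.split(", "):
        # Split on the last ' = ' so keys such as 'mAP (large)' stay intact.
        key, _, value = pair.rpartition(" = ")
        metrics[key] = float(value)
    return int(step_text), metrics


# Abbreviated sample copied from the log in this thread.
sample = (
    "INFO:tensorflow:Saving dict for global step 50000: "
    "DetectionBoxes_Precision/mAP = 0.21072435, "
    "DetectionBoxes_Precision/mAP (large) = 0.485231, "
    "Loss/total_loss = 1.8563001, global_step = 50000"
)

step, metrics = parse_eval_dict(sample)
print(step, metrics["DetectionBoxes_Precision/mAP"])
```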
Hello @zubairahmed-ai, I am having the same issue, but I don't really understand your solution. Can you explain exactly what your solution is?