I got the following result after running:
PATH_TO_YOUR_PIPELINE_CONFIG="object_detection/samples/configs/faster_rcnn_resnet101_voc07.config"
PATH_TO_TRAIN_DIR="out"
PATH_TO_EVAL_DIR="out"
GPU_ID=1
CUDA_VISIBLE_DEVICES=${GPU_ID} python object_detection/eval.py --debug \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--checkpoint_dir=${PATH_TO_TRAIN_DIR} \
--eval_dir=${PATH_TO_EVAL_DIR}
WARNING:root:The following classes have no ground truth examples: [ 0 2 3 8 9 11 13 14 16 17 18]
/local/home/cpchung/software/models/object_detection/utils/metrics.py:145: RuntimeWarning: invalid value encountered in true_divide
num_images_correctly_detected_per_class / num_gt_imgs_per_class)
Where can I find the result of this evaluation?
I got the same warning:
WARNING:root:The following classes have no ground truth examples: 0
My dataset has only one class, with id 1 (I noticed in your code that class ids should start from 1).
But I cannot finish evaluation since there are no ground truth examples for class 0. According to your updated label_map.pbtxt, there shouldn't be any class id starting with 0.
Then the evaluation got stuck on:
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
2017-07-12 13:55:33.344616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:06:00.0)
INFO:tensorflow:Restoring parameters from /scratch2/wangxiny2/workspace/models/object_detection/train_car_Jul_12_2/model.ckpt-5351
INFO:tensorflow:Restoring parameters from /scratch2/wangxiny2/workspace/models/object_detection/train_car_Jul_12_2/model.ckpt-5351
When I hit Ctrl+C, I noticed the warning:
WARNING:root:The following classes have no ground truth examples: 0
Traceback (most recent call last):
  File "eval.py", line 162, in <module>
    tf.app.run()
  File "/scratch2/wangxiny2/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "eval.py", line 158, in main
    FLAGS.checkpoint_dir, FLAGS.eval_dir)
  File "/scratch2/wangxiny2/workspace/models/object_detection/evaluator.py", line 211, in evaluate
    save_graph_dir=(eval_dir if eval_config.save_graph else ''))
  File "/scratch2/wangxiny2/workspace/models/object_detection/eval_util.py", line 515, in repeated_checkpoint_run
    keys_to_exclude_from_results)
  File "/scratch2/wangxiny2/workspace/models/object_detection/eval_util.py", line 393, in run_checkpoint_once
    tensor_dict, sess, batch, counters, update_op)
  File "/scratch2/wangxiny2/workspace/models/object_detection/evaluator.py", line 160, in _process_batch
    (result_dict, _) = sess.run([tensor_dict, update_op])
  File "/scratch2/wangxiny2/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/scratch2/wangxiny2/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/scratch2/wangxiny2/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/scratch2/wangxiny2/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/scratch2/wangxiny2/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
KeyboardInterrupt
I could use TensorBoard to load the evaluation results after I shut down the program.
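For reference, that just means pointing TensorBoard at the eval_dir passed to eval.py; a sketch using the out directory from the first command in this thread:
tensorboard --logdir=out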
My label map is:
item {
  id: 1
  name: 'car'
}
I had the same problem:
WARNING:root:The following classes have no ground truth examples: [ 5 10 11 12 13 14 15 16 17 19]
Found the solution to my problem. In the default config file, eval_config.num_examples=2000 and eval_input_reader.shuffle=false. So if your validation set has more than 2000 images and they are sorted by class, the classes that only appear after the first 2000 images won't be evaluated. The solution is to change eval_config.num_examples to the size of your validation set, or to change eval_input_reader.shuffle to true.
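For reference, the relevant pieces of the pipeline config would look something like this (the 4952 value and the paths are placeholders for your own validation set):
eval_config: {
  # Set this to the number of images in your validation set.
  num_examples: 4952
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/val.record"
  }
  label_map_path: "data/label_map.pbtxt"
  # Or shuffle the eval input so every class gets sampled.
  shuffle: true
}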
I am having the same issue as @protossw512
WARNING:root:The following classes have no ground truth examples: 0
/usr/local/lib/python3.5/dist-packages/tensorflow/models/object_detection/utils/metrics.py:145: RuntimeWarning: invalid value encountered in true_divide
num_images_correctly_detected_per_class / num_gt_imgs_per_class)
The problem @bclyc encountered is not the same, since 0 is not listed in your warning. Your solution seems to work for your case, but it is not valid for ours, because we don't have any class with id 0 at all; the documentation says ids should start from 1.
@protossw512, the evaluation process can be ended by adding the max_evals parameter to eval_config, so the warning we encounter is not what causes the termination issue.
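For reference, a minimal sketch of that setting (the num_examples value is a placeholder):
eval_config: {
  num_examples: 2000
  # Run one evaluation pass and exit instead of waiting for new checkpoints.
  max_evals: 1
}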
I ignored the warning at first, but it turned out the eval script stops doing its job after a couple of images. @bclyc, could you please give more details about your label_map.pbtxt file? I am guessing that might help us see the problem.
@FurkanKyo
Got the same problem. I only have 2 classes: background, which refers to id 0, and text, which refers to id 1. Note that id 0 should not be listed in the pbtxt file.
After the warning appears, the process seems to be stuck.
I think I found the solution to my problem: I have to run train.py and eval.py at the same time, because eval.py waits for new checkpoints to evaluate. I can use TensorBoard to monitor the process.
Hi @deatherving
How do you run train.py and eval.py at the same time? I have 8 GB of GPU memory, and when I run them simultaneously it fails because there is not enough memory; train.py takes all of it.
@Abduoit
You can set gpu_options when creating the session, so that the program takes only part of the memory:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
Thanks @bclyc. How do I set the gpu_options? Should I modify the trainer.py file, and if so, how?
@Abduoit I added those lines to the train.py file, as the first 2 lines in main:
def main(_):
  gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
  sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
  assert FLAGS.train_dir, '`train_dir` is missing.'
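For what it's worth, a self-contained variant of the same idea; the 0.5 fraction is illustrative, and allow_growth is an alternative that lets the process allocate GPU memory on demand instead of reserving a fixed share:

import tensorflow as tf

# Cap this process at roughly half the GPU memory, and also let it
# allocate incrementally rather than grabbing its share up front.
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=0.5,
    allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))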
Thanks @ncaadam.
Now I am running train.py and eval.py simultaneously. However, I am still getting this warning; I thought I could avoid it by running them at the same time.
WARNING:root:The following classes have no ground truth examples: 0
@Abduoit This warning is fine. It means that your training set has no samples of class 0 as defined in the pbtxt file. I think you are not running the current version of object detection. Update the git repository and you'll find that the code now forbids class 0 (class 0 was usually the background id in the old version).
@deatherving
I ran train.py and eval.py, and the total loss and mAP look good. However, when I test the output file (after converting it to .pb) in object_detection_tutorial.ipynb, I can't detect anything. I think I made a mistake somewhere when generating the TFRecords. I am still confused about /models/object_detection/VOCdevkit/VOC2012/ImageSets/Main. How do I deal with the files inside the /Main directory?
@Abduoit are your mAP graphs converging for all of your classes? Don't just rely on the total loss decreasing and the mAP increasing. Confirm that every class's mAP has _taken off_ and is converging.
Regarding /models/object_detection/VOCdevkit/VOC2012/ImageSets/Main: I think I explained it in my answer and even linked the official Pascal VOC doc. Refer to it to understand what the different sets, i.e. _train_, _val_, _train_val_, mean.
If your question was how TensorFlow uses it, Jonathan answered my question here a while ago.
During and after your training, you don't really need to do anything to that directory. After exporting the model, you're supposed to feed it with data it has never seen before to gauge its accuracy.
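A minimal sketch of that last step, with a placeholder path; this mirrors what object_detection_tutorial.ipynb does when loading an exported model:

import tensorflow as tf

# Load the exported frozen inference graph (path is a placeholder).
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('exported/frozen_inference_graph.pb', 'rb') as fid:
        graph_def.ParseFromString(fid.read())
    tf.import_graph_def(graph_def, name='')

Unseen test images are then fed to the graph's image_tensor:0 input, and detections are read back from detection_boxes:0, detection_scores:0, and detection_classes:0, as in the notebook.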
@eshirima thanks for your patience.
I realized that I made a mistake in generating the TFRecord files; I will explain what I did.
I placed my images and annotations, and prepared the files in the /Main directory as raccoon_train.txt, raccoon_val.txt, and raccoon_trainval.txt. All of my images contain the class name that I want to detect, which is raccoon. Therefore, the file raccoon_train.txt looks something like this:
raccoon-1 1
raccoon-2 1
raccoon-3 1
.
.
raccoon-200 1
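As an aside, in the standard Pascal VOC ImageSets format each line is an image id followed by a 1/-1 presence flag, and as far as I can tell the converter only keeps the id; read_examples_list in object_detection/utils/dataset_util.py does roughly this:

import tensorflow as tf

def read_examples_list(path):
  # Keep only the first token of each line (the image id);
  # any trailing 1/-1 flag is ignored.
  with tf.gfile.GFile(path) as fid:
    lines = fid.readlines()
  return [line.strip().split(' ')[0] for line in lines]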
In addition, I edited the example path: I changed the aeroplane_ prefix to my class name, raccoon_.
Based on this link, I arranged my data structure as follows:
1. models
   1.1 model
       1.1.1 ssd_mobilenet_v1_pets.config
       1.1.2 train
       1.1.3 evaluation
       1.1.4 ssd_mobilenet_v1_coco_11_06_2017/model.ckpt
   1.2 object_detection
       1.2.1 data (pascal_train.record, pascal_val.record, and pascal_label_map.pbtxt)
       1.2.2 VOCdevkit
             1.2.2.1 VOC2012
                     1.2.2.1.1 JPEGImages (my own images)
                     1.2.2.1.2 Annotations (my own annotations)
                     1.2.2.1.3 ImageSets
                               1.2.2.1.3.1 Main (raccoon_train.txt, raccoon_val.txt, raccoon_trainval.txt)
When I try to generate the TFRecord files based on this link, I get the following errors; please tell me where I went wrong. Do I have to list the training images in raccoon_train.txt, a different set of evaluation images in raccoon_val.txt, and all of the images (train + eval) together in raccoon_trainval.txt?
(abdu-py2) jesse@jesse-System-Product-Name:~/abdu-py2/models$ python object_detection/create_pascal_tf_record.py --label_map_path=object_detection/data/pascal_label_map.pbtxt --data_dir=object_detection/VOCdevkit --year=VOC2012 --set=train --output_path=object_detection/data/pascal_train.record
/home/jesse/abdu-py2/models/object_detection/utils/dataset_util.py:75: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  if not xml:
Traceback (most recent call last):
  File "object_detection/create_pascal_tf_record.py", line 183, in <module>
    tf.app.run()
  File "/home/jesse/abdu-py2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/create_pascal_tf_record.py", line 176, in main
    FLAGS.ignore_difficult_instances)
  File "object_detection/create_pascal_tf_record.py", line 87, in dict_to_tf_example
    encoded_jpg = fid.read()
  File "/home/jesse/abdu-py2/local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 118, in read
    self._preread_check()
  File "/home/jesse/abdu-py2/local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 78, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/jesse/abdu-py2/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: object_detection/VOCdevkit/images/JPEGImages/raccoon-1.png
@Abduoit The error is telling you that it can't find the image _raccoon-1.png_ in the path _object_detection/VOCdevkit/images/JPEGImages/_.
According to your structure, your images are inside _object_detection/VOCdevkit/VOC2012/JPEGImages/_.
See the problem? :wink:
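To see where that path comes from: dict_to_tf_example in create_pascal_tf_record.py joins the <folder> element of each annotation XML with the image filename, roughly like this (paraphrased; the placeholder values reproduce the failing path above):

import os

# Stand-ins for the parsed annotation XML and script arguments.
data = {'folder': 'images', 'filename': 'raccoon-1.png'}
image_subdirectory = 'JPEGImages'                 # script default
dataset_directory = 'object_detection/VOCdevkit'  # --data_dir flag

img_path = os.path.join(data['folder'], image_subdirectory, data['filename'])
full_path = os.path.join(dataset_directory, img_path)
print(full_path)  # object_detection/VOCdevkit/images/JPEGImages/raccoon-1.png

So a <folder> entry of 'images' instead of 'VOC2012' sends the script to the wrong directory, which matches the NotFoundError above.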
@eshirima
To check the image paths, I ran vim classname.xml and found that the .xml files had wrong paths, even though I had placed the annotations and images in the correct directories. For some reason, the .xml files that I used from this link have wrong paths. Anyway, I re-annotated the images with the correct paths (using my own annotations) and successfully re-generated the TFRecord files. Now I am running train.py and eval.py and everything looks fine. Later I will try to train my model with multiple classes; I think I will be back with new questions.
I appreciate your time and kind help, @eshirima.
I'm also having a problem. My eval.py run gives me the "following classes have no ground truth" message, and furthermore, when I look in TensorBoard I see no bounding boxes at all. There is an "Images" tab showing "image-9" with a slider for the steps, but as I move the slider it just displays different images; no bounding boxes are drawn. Does anyone know what is going on?
Same error here.
I am using the faster_rcnn_inception_resnet_v2_atrous_coco model. In the configuration file I modified only the paths and the number of classes. Here is my configuration file:
model {
  faster_rcnn {
    num_classes: 3
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 0
            learning_rate: .0003
          }
          #schedule {
          #  step: 900000
          #  learning_rate: .00003
          #}
          #schedule {
          #  step: 1200000
          #  learning_rate: .000003
          #}
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08/model.ckpt"
  from_detection_checkpoint: true
  #num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "data/object-detection.pbtxt"
}
eval_config: {
  num_examples: 10
  max_evals: 1
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "data/object-detection.pbtxt"
  shuffle: true
  num_readers: 1
  num_epochs: 1
}
Here is my object-detection.pbtxt file:
item {
  id: 1
  name: 'one'
}
item {
  id: 2
  name: 'two'
}
item {
  id: 3
  name: 'three'
}
I am using CentOS 7 and TensorFlow 1.2.1. I made changes in train.py, adding two lines:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
I am running train.py and eval.py in parallel and get WARNING:root:The following classes have no ground truth examples: 0, after which the process terminates. I am stuck looking for a solution.
Error output:
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
2017-11-20 14:53:04.346351: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-20 14:53:04.346405: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-20 14:53:04.346422: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-20 14:53:04.692485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 1.24GiB
2017-11-20 14:53:04.971799: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x9336b80 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-20 14:53:04.972495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:83:00.0
Total memory: 7.92GiB
Free memory: 1.36GiB
2017-11-20 14:53:04.972746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2017-11-20 14:53:04.972801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2017-11-20 14:53:04.972841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2017-11-20 14:53:04.972875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2017-11-20 14:53:04.972890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2017-11-20 14:53:04.972924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
2017-11-20 14:53:04.972943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:83:00.0)
INFO:tensorflow:Restoring parameters from training/model.ckpt-7625
INFO:tensorflow:Restoring parameters from training/model.ckpt-7625
2017-11-20 14:53:11.174053: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_2_parallel_read/common_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: parallel_read/common_queue_Dequeue = QueueDequeueV2[component_types=[DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/common_queue)]]
2017-11-20 14:53:11.174148: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_2_parallel_read/common_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: parallel_read/common_queue_Dequeue = QueueDequeueV2[component_types=[DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/common_queue)]]
2017-11-20 14:53:11.877137: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:53:12.453296: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1011.45MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:53:12.833347: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.33GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:53:13.993989: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 762.01MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:53:13.994115: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.22GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:53:13.994158: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 985.90MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:53:14.183787: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 696.98MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:53:14.207104: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 836.34MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 14:54:00.632728: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.632887: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.632940: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.632988: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.633118: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.647563: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.647839: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.648038: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.648145: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-11-20 14:54:00.648225: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_BOOL, DT_FLOAT, DT_UINT8, DT_BOOL, DT_INT64, DT_FLOAT, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
WARNING:root:The following classes have no ground truth examples: 0
[renuka@testlab-gpgpu object_detection]$ python eval.py --logtostderr --pipeline_config_path=training/faster_rcnn_inception_resnet_v2_atrous_coco.config --checkpoint_dir=training --eval_dir=eval_log
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
2017-11-20 15:04:33.209218: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-20 15:04:33.209279: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-20 15:04:33.209320: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-20 15:04:33.507031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 1.24GiB
2017-11-20 15:04:33.760744: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x15149560 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-20 15:04:33.761509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:83:00.0
Total memory: 7.92GiB
Free memory: 1.36GiB
2017-11-20 15:04:33.761819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2017-11-20 15:04:33.761885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2017-11-20 15:04:33.761934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2017-11-20 15:04:33.761954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2017-11-20 15:04:33.761969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2017-11-20 15:04:33.762030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
2017-11-20 15:04:33.762076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:83:00.0)
INFO:tensorflow:Restoring parameters from training/model.ckpt-8096
INFO:tensorflow:Restoring parameters from training/model.ckpt-8096
2017-11-20 15:04:40.731353: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: RandomShuffleQueue '_2_parallel_read/common_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: parallel_read/common_queue_Dequeue = QueueDequeueV2[component_types=[DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/common_queue)]]
2017-11-20 15:04:41.359647: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 15:04:41.715801: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1011.45MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 15:04:42.099301: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.33GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 15:04:43.177028: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 762.01MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 15:04:43.177129: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.22GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 15:04:43.177188: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 985.90MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 15:04:43.353473: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 696.98MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-20 15:04:43.376426: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 836.34MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
WARNING:root:The following classes have no ground truth examples: 0
I've since realized I was using TensorFlow 1.3. Updating to 1.4 (with CUDA 8.0 on Windows 10) and recompiling the protos seems to have fixed the issue for me.
I had the same error as described in the initial post. My mistake was that I misspelled some class names in the ..._label_map.pbtxt. It seems the code is case-sensitive when mapping labels against the label_map, and it does not show errors, or even warnings, when it cannot map a label. In my case it wasn't able to map ANY label but just kept trying to learn them without telling me. I hope this helps some of you.
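A quick sanity check along those lines: a sketch with placeholder paths, assuming the object_detection package is importable; it compares the class-name strings stored in a TFRecord against the label map (the matching is case-sensitive):

import tensorflow as tf
from object_detection.utils import label_map_util

# name -> id mapping parsed from the label map.
label_map_dict = label_map_util.get_label_map_dict('data/label_map.pbtxt')

# Scan every example and flag class names the label map cannot resolve.
for record in tf.python_io.tf_record_iterator('data/val.record'):
    example = tf.train.Example.FromString(record)
    texts = example.features.feature['image/object/class/text'].bytes_list.value
    for name in texts:
        name = name.decode('utf-8')
        if name not in label_map_dict:
            print('Unmapped class name:', name)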
Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.