Models: Error when running eval.py: WARNING:root:The following classes have no ground truth examples: 0

Created on 20 Jun 2017 · 36 Comments · Source: tensorflow/models

When I run the TensorFlow Object Detection API locally, as https://github.com/tensorflow/models/blob/9c17823e147ff2893427b47cb57d171da9350d20/object_detection/g3doc/running_locally.md suggests, everything goes well when I run

$ python object_detection/train.py -logtostderr --pipeline_config_path=object_detection/mymodels/model/faster_rcnn_resnet101_voc07.config --train_dir=object_detection/mymodels/model/train/

and it trains correctly. But when I try to evaluate by running

python object_detection/eval.py --logtostderr --pipeline_config_path=object_detection/mymodels/model/faster_rcnn_resnet101_voc07.config --checkpoint_dir=object_detection/mymodels/model/train/ --eval_dir=object_detection/mymodels/model/eval/

it shows:
WARNING:root:The following classes have no ground truth examples: 0
/home/yanliang/.conda/envs/tensorflow/models/object_detection/utils/metrics.py:144: RuntimeWarning: invalid value encountered in true_divide
  num_images_correctly_detected_per_class / num_gt_imgs_per_class)
^CTraceback (most recent call last):
  File "object_detection/eval.py", line 162, in <module>
    tf.app.run()
  File "/home/yanliang/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/eval.py", line 158, in main
    FLAGS.checkpoint_dir, FLAGS.eval_dir)
  File "/home/yanliang/.conda/envs/tensorflow/models/object_detection/evaluator.py", line 211, in evaluate
    save_graph_dir=(eval_dir if eval_config.save_graph else ''))
  File "/home/yanliang/.conda/envs/tensorflow/models/object_detection/eval_util.py", line 524, in repeated_checkpoint_run
    time.sleep(time_to_next_eval)
KeyboardInterrupt

The dataset I use is pascal_voc_2012, and I followed the tutorial as well. My directory layout is:
+data
  -pascal_label_map.pbtxt
  -pascal_train.record
  -pascal_voc.record
+models
  +model
    -faster_rcnn_resnet101_voc07.config
    +train
    +eval

Can anybody give me some suggestions? Thanks!


All 36 comments

I have the same issue.

@ahmetkucuk Did your training work well? Here is part of my training log:
INFO:tensorflow:Restoring parameters from /home/yanliang/.conda/envs/tensorflow/models/object_detection/mymodels/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path object_detection/mymodels/model/train/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 4.3562 (6.369 sec/step)
2017-06-20 10:50:49.153778: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2383 get requests, put_count=1971 evicted_count=1000 eviction_rate=0.507357 and unsatisfied allocation rate=0.634494
2017-06-20 10:50:49.153983: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:global step 2: loss = 4.5299 (1.051 sec/step)
INFO:tensorflow:global step 3: loss = 4.3959 (0.363 sec/step)
INFO:tensorflow:global step 4: loss = 5.5421 (0.799 sec/step)
INFO:tensorflow:global step 5: loss = 3.9413 (1.042 sec/step)
INFO:tensorflow:global step 6: loss = 3.6625 (0.354 sec/step)
INFO:tensorflow:global step 7: loss = 3.6821 (0.364 sec/step)
INFO:tensorflow:global step 8: loss = 3.4374 (0.355 sec/step)
INFO:tensorflow:global step 9: loss = 3.3901 (0.359 sec/step)
INFO:tensorflow:global step 10: loss = 3.1503 (1.024 sec/step)
INFO:tensorflow:global step 11: loss = 3.2978 (0.360 sec/step)
INFO:tensorflow:global step 12: loss = 2.8448 (1.055 sec/step)
INFO:tensorflow:global step 13: loss = 3.2599 (0.470 sec/step)
INFO:tensorflow:global step 14: loss = 2.5151 (0.359 sec/step)
INFO:tensorflow:global step 15: loss = 2.2614 (0.358 sec/step)
INFO:tensorflow:global step 16: loss = 2.2486 (0.355 sec/step)
INFO:tensorflow:global step 17: loss = 2.2398 (0.810 sec/step)
2017-06-20 10:50:58.253875: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2110 get requests, put_count=2065 evicted_count=1000 eviction_rate=0.484262 and unsatisfied allocation rate=0.506161
2017-06-20 10:50:58.253938: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
INFO:tensorflow:global step 18: loss = 2.1277 (0.360 sec/step)
INFO:tensorflow:global step 19: loss = 2.9921 (0.349 sec/step)
INFO:tensorflow:global step 20: loss = 2.0339 (0.353 sec/step)
INFO:tensorflow:global step 21: loss = 2.6191 (0.347 sec/step)
INFO:tensorflow:global step 22: loss = 3.0585 (0.359 sec/step)
INFO:tensorflow:global step 23: loss = 1.1144 (0.976 sec/step)
INFO:tensorflow:global step 24: loss = 1.7001 (0.382 sec/step)
INFO:tensorflow:global step 25: loss = 1.3169 (0.347 sec/step)
INFO:tensorflow:global step 26: loss = 1.2461 (0.368 sec/step)
INFO:tensorflow:global step 27: loss = 1.9536 (0.370 sec/step)
INFO:tensorflow:global step 28: loss = 1.7631 (0.376 sec/step)
INFO:tensorflow:global step 29: loss = 2.2164 (1.042 sec/step)
INFO:tensorflow:global step 30: loss = 0.9388 (0.353 sec/step)
INFO:tensorflow:global step 31: loss = 2.1595 (0.362 sec/step)
INFO:tensorflow:global step 32: loss = 1.9991 (0.352 sec/step)
INFO:tensorflow:global step 33: loss = 2.1409 (0.365 sec/step)
INFO:tensorflow:global step 34: loss = 3.0498 (0.361 sec/step)
INFO:tensorflow:global step 35: loss = 1.7767 (0.355 sec/step)
INFO:tensorflow:global step 36: loss = 1.3106 (0.354 sec/step)
INFO:tensorflow:global step 37: loss = 1.3067 (0.357 sec/step)
INFO:tensorflow:global step 38: loss = 4.0444 (0.785 sec/step)
INFO:tensorflow:global step 39: loss = 1.9622 (1.082 sec/step)
INFO:tensorflow:global step 40: loss = 2.8836 (1.094 sec/step)
INFO:tensorflow:global step 41: loss = 2.6982 (0.382 sec/step)
INFO:tensorflow:global step 42: loss = 1.6046 (0.359 sec/step)
INFO:tensorflow:global step 43: loss = 1.1759 (1.070 sec/step)
INFO:tensorflow:global step 44: loss = 0.9371 (0.377 sec/step)
INFO:tensorflow:global step 45: loss = 1.4666 (0.377 sec/step)
INFO:tensorflow:global step 46: loss = 2.4793 (1.080 sec/step)
INFO:tensorflow:global step 47: loss = 2.8852 (0.379 sec/step)
INFO:tensorflow:global step 48: loss = 1.8985 (0.380 sec/step)
INFO:tensorflow:global step 49: loss = 1.8162 (0.638 sec/step)
INFO:tensorflow:global step 50: loss = 0.9691 (0.357 sec/step)
INFO:tensorflow:global step 51: loss = 1.2954 (0.437 sec/step)
INFO:tensorflow:global step 52: loss = 2.8442 (0.644 sec/step)

@YanLiang0813 Yes, the total loss decreases gradually in my case as well.

Having the same issue as well!

@sguada I really need your help. Could you give some suggestions on how to solve this problem? Thanks!!!

@YanLiang0813 You can ignore the error. The class at index 0 is 'none_of_the_above' for both PASCAL and pet datasets and is a placeholder index. The TFRecords will contain no instances of this placeholder class.
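Here is a rough illustration of why index 0 always shows up. Suppose per-class ground-truth counts are kept in an array indexed directly by label-map id, and ids start at 1. Then slot 0 can never be filled. The sketch below is a simplified, hypothetical model of that bookkeeping, not the evaluator's actual code. The function name `classes_without_ground_truth` is made up, and the real internals may differ.

```python
from typing import Iterable, List


def classes_without_ground_truth(gt_labels: Iterable[int], num_slots: int) -> List[int]:
    """Return the class indices that never appear in the ground truth.

    Hypothetical simplification: counts live in a list indexed directly by
    label-map id, so slot 0 (the 'none_of_the_above' placeholder) stays at
    zero whenever real ids start at 1.
    """
    counts = [0] * num_slots
    for label in gt_labels:
        counts[label] += 1
    return [idx for idx, count in enumerate(counts) if count == 0]


# Three real classes with ids 1..3, plus the placeholder slot 0.
print(classes_without_ground_truth([1, 2, 2, 3], num_slots=4))  # [0]
```

Under this model, a warning that lists only 0 is expected. A warning that also lists ids above your real classes is a different case. The 2-class report later in this thread lists 0 and 3 to 255, which suggests the evaluator was sized for more classes than the label map defines. In that case it may be worth double-checking `num_classes` in the pipeline config.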

@derekjchow How do I ignore the error? I commented out these lines in object_detection_evaluation.py (https://github.com/tensorflow/models/blob/a4944a57ad2811e1f6a7a87589a9fc8a776e8d3c/object_detection/utils/object_detection_evaluation.py#L197):

if (self.num_gt_instances_per_class == 0).any():
  logging.warn(
      'The following classes have no ground truth examples: %s',
      np.squeeze(np.argwhere(self.num_gt_instances_per_class == 0)))

but it doesn't help; the error still exists:

/home/yanliang/.conda/envs/tensorflow/models/object_detection/utils/metrics.py:144: RuntimeWarning: invalid value encountered in true_divide
  num_images_correctly_detected_per_class / num_gt_imgs_per_class)

Could you give me some suggestions on how to ignore the error? And has anyone solved it?
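Commenting out the `logging.warn` call cannot silence the second message, because that message comes from somewhere else. It is a NumPy `RuntimeWarning` raised by the division in metrics.py itself: the placeholder class has zero ground-truth images, so its ratio is 0/0, which NumPy turns into NaN with a warning. The pure-Python sketch below does the same arithmetic with an explicit guard, so empty classes become NaN without any warning. The helper `safe_per_class_ratio` is hypothetical and not part of the object detection API.

```python
import math
from typing import List, Sequence


def safe_per_class_ratio(correct: Sequence[float], totals: Sequence[float]) -> List[float]:
    """Divide element-wise, returning NaN (silently) where the total is zero."""
    ratios = []
    for num, den in zip(correct, totals):
        # A class with no ground-truth images has no defined ratio.
        ratios.append(num / den if den else float("nan"))
    return ratios


ratios = safe_per_class_ratio([0, 3, 1], [0, 4, 2])
print(ratios)  # [nan, 0.75, 0.5]
# Downstream averages should skip the undefined (placeholder) entries.
print([r for r in ratios if not math.isnan(r)])  # [0.75, 0.5]
```

You can also silence the NumPy warning in place by wrapping the division in `np.errstate(divide='ignore', invalid='ignore')`. The NaN result for class 0 is the same either way.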

@jaydee713 did you solve this problem?

@YanLiang0813 I didn't; I decided to just ignore it since it is only a warning :P It doesn't seem to have caused me any problems yet...

@jaydee713 Yes, I know now. We just ignore it and run train.py and eval.py concurrently, so we can see the precision on TensorBoard.
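A minimal Python 3 sketch of running the two jobs side by side from one script. The paths are copied from the commands in the original report, and the function names are hypothetical. It only builds the same two command lines shown at the top of the thread and starts them with `subprocess.Popen`. It does not change how either script behaves.

```python
import subprocess
import sys
from typing import List, Tuple

# Paths copied from the commands in the original report; adjust to your layout.
CONFIG = "object_detection/mymodels/model/faster_rcnn_resnet101_voc07.config"
TRAIN_DIR = "object_detection/mymodels/model/train/"
EVAL_DIR = "object_detection/mymodels/model/eval/"


def build_commands() -> Tuple[List[str], List[str]]:
    """Return the argv lists for train.py and eval.py."""
    train_cmd = [
        sys.executable, "object_detection/train.py", "--logtostderr",
        f"--pipeline_config_path={CONFIG}",
        f"--train_dir={TRAIN_DIR}",
    ]
    eval_cmd = [
        sys.executable, "object_detection/eval.py", "--logtostderr",
        f"--pipeline_config_path={CONFIG}",
        f"--checkpoint_dir={TRAIN_DIR}",  # eval reads the checkpoints train writes
        f"--eval_dir={EVAL_DIR}",
    ]
    return train_cmd, eval_cmd


def launch_concurrently() -> int:
    """Start both jobs side by side and return train.py's exit code."""
    train_cmd, eval_cmd = build_commands()
    train_proc = subprocess.Popen(train_cmd)
    eval_proc = subprocess.Popen(eval_cmd)
    try:
        return train_proc.wait()
    finally:
        # eval.py keeps waiting for new checkpoints, so stop it explicitly.
        eval_proc.terminate()
        eval_proc.wait()
```

Calling `launch_concurrently()` starts both jobs. You can then point TensorBoard at the parent directory, e.g. `tensorboard --logdir=object_detection/mymodels/model/`, to see the training loss and the eval precision together. Both processes share the GPU unless memory is limited, which comes up later in this thread.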

@YanLiang0813 But after this warning, eval.py seems to hang. Or does it just take a long time??

It just takes a long time; you can open TensorBoard to monitor the results.
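The traceback in the original report fits this reading. It was interrupted (^C) inside `eval_util.repeated_checkpoint_run` at `time.sleep(time_to_next_eval)`. In other words, eval.py had finished a pass and was waiting before checking for a newer checkpoint. The loop below is a simplified, hypothetical sketch of that pattern. The names `evaluate_repeatedly`, `get_latest_checkpoint` and `run_eval` are illustrative, not the library's API. With no limit on the number of evaluations, the loop never returns on its own.

```python
import time
from typing import Callable, List, Optional


def evaluate_repeatedly(
    get_latest_checkpoint: Callable[[], Optional[str]],
    run_eval: Callable[[str], None],
    interval_secs: float,
    max_evals: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Evaluate each new checkpoint, sleeping between checks.

    With max_evals=None this loops forever, which looks like a hang from
    the terminal even though the process is only waiting.
    """
    evaluated: List[str] = []
    last_seen: Optional[str] = None
    while max_evals is None or len(evaluated) < max_evals:
        start = time.time()
        checkpoint = get_latest_checkpoint()
        if checkpoint is not None and checkpoint != last_seen:
            run_eval(checkpoint)
            evaluated.append(checkpoint)
            last_seen = checkpoint
        # Wait out the rest of the interval before polling again.
        time_to_next_eval = start + interval_secs - time.time()
        if time_to_next_eval > 0:
            sleep(time_to_next_eval)
    return evaluated
```

The 2017-era `eval_config` appears to expose `eval_interval_secs` and `max_evals` fields; please verify this against eval.proto in your checkout. If `max_evals` is there, setting it to 1 should make eval.py exit after a single pass instead of waiting.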

Looks like this is resolved. This is just a warning that is safe to ignore. Closing this issue.

I'm getting the same error. I think it crashes the evaluation.

@ali01 The eval directory is being populated with new tfrecords up until this warning/error comes up. Maybe reopen the issue?

@alexalemi It's a warning; just wait for it to complete. It takes a while. I don't think this will crash the app.

I am encountering the same issue, but mine does not wait; it exits after printing a traceback. How did you ignore the error (what changes, if any)?

I did not change anything; I just ran train and eval at the same time.

Oh, my training runs successfully, but when I run eval.py, I get the warning and the program quits by itself instead of continuing. Any idea why?

Can I label objects with the placeholder class 0, and treat these images as true negatives to improve my model?

@SriramGS Did you solve the problem? I have this same issue

It's not a problem; it just takes a long time. Just wait for the result.

I made another try with just a few iterations. It took a minute, then I left my computer for 30 minutes and nothing happened. I will try again. Thanks!

@szymonk92 I was not able to solve it. I am still looking for a solution. Let me know if you find anything.

I have also received this error. I'm waiting to see if it continues after the message.

Some people in this thread solved the issue by running train.py and eval.py at the same time. I tried this suggestion too, but it fails because there is not enough memory, even though I have 8 GB of GPU memory.

I built TensorFlow from source and I still have the same problem, on both computers. I can see the evaluation results (images) after a few seconds, but the terminal stays frozen for an hour.

Any ideas? Can I force close the terminal?

I would like to run training and evaluation at the same time; however, my computer (12 GB GPU) doesn't have enough memory to run them simultaneously using Faster RCNN with Inception v2.

@szymonk92

You need to divide your GPU into two parts: 50% for training and 50% for evaluation.

And don't worry about this warning; see this discussion.

Add these lines to train.py, as the first two lines in main:

def main(_):
  gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)  
  sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))  
  assert FLAGS.train_dir, '`train_dir` is missing.'

@Abduoit Thanks for the tip. I tried with 6 GB and it seems I don't have enough memory. I will try again on Monday with 12 GB.

@szymonk92

Even if you tried with 6 GB, it should allocate 50% of the GPU to train.py and the other 50% to eval.py.

Please make sure you add the following lines correctly in train.py. The two lines should come right after def main(_):

def main(_):
  gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)  
  sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

I have 2 classes in my label_map.pbtxt, yet I get the warning:

The following classes have no ground truth examples: [ 0 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255]

Also, the precision when I evaluate is always 0 (Precision/mAP@0.5IOU: 0.000000), even after 500k training steps. I couldn't find any solutions so far, so any help would be appreciated. Thanks.

@Abduoit My train.py takes 50%, but eval.py takes almost 100% of my GPU memory and runs out of memory. It is possible to limit the memory allocation for train.py, but how do I do it for eval.py? Thanks.
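One workaround that does not touch eval.py at all is to hide the GPU from the evaluation process, so it runs on the CPU. That is slower, but it leaves the whole GPU to train.py. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for choosing visible devices. From a shell, the equivalent is prefixing the eval command with `CUDA_VISIBLE_DEVICES=""`. The helper names below are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def cpu_only_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with every CUDA GPU hidden."""
    env = dict(os.environ if base is None else base)
    # With no visible CUDA devices, TensorFlow places all ops on the CPU,
    # so this process does not reserve any GPU memory.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def launch_eval_on_cpu(eval_cmd: List[str]) -> subprocess.Popen:
    """Start eval.py (or any command) with the GPU hidden from it."""
    return subprocess.Popen(eval_cmd, env=cpu_only_env())
```

The copy leaves the parent's environment untouched, so train.py launched from the same script still sees the GPU.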

@YanLiang0813, what's your GPU? I can't fine-tune faster_rcnn_resnet101_coco on PASCAL 2007 with a 1080.

I used transfer learning with the ssd_mobilenet_v1_coco_11_06_2017 model to detect objects in my own dataset.
I trained the model on Google Cloud as a training job submitted through Cloud Shell. Training was successful, and I exported the model to my local machine. I then ran the evaluation locally with eval.py, but the command got stuck after this:
[screenshot not preserved]
I have only 3 classes. Here's my object-detection.pbtxt file:

 {
  id: 1
  name: 'tree'

  id: 2
  name: 'water body'

  id: 3
  name: 'building'
}

Please help.

Hey, I was able to resolve the error and hence successfully run my model by changing my label pbtxt file (object-detection.pbtxt in my case).
Earlier my file was:

{
  id: 1
  name: 'tree'

  id: 2
  name: 'water body'

  id: 3
  name: 'building'
}

I changed that to:

item {
  id: 1
  name: 'tree'
}

item {
  id: 2
  name: 'water body'
}

item {
  id: 3
  name: 'building'
}
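The fix is that every entry has to be wrapped in its own `item { ... }` block. The API's own loader (`label_map_util` in object_detection/utils) parses the file as a protobuf text message. As a quick pre-flight check, here is a hypothetical, regex-based sketch; it is not the library's parser, and it assumes simple files with no nested braces inside an item. It extracts `(id, name)` pairs and rejects three kinds of files: those with no `item` blocks, those with ids below 1 (0 is the placeholder index per the maintainer's comment above), and those with duplicate ids.

```python
import re
from typing import List, Tuple

# One `item { ... }` block; assumes no nested braces inside an item.
ITEM_RE = re.compile(r"\bitem\s*\{([^{}]*)\}")
ID_RE = re.compile(r"\bid\s*:\s*(\d+)")
# `\bname` does not match inside `display_name`, because `_` is a word character.
NAME_RE = re.compile(r"\bname\s*:\s*['\"]([^'\"]*)['\"]")


def check_label_map(text: str) -> List[Tuple[int, str]]:
    """Return (id, name) pairs, raising ValueError on obvious mistakes."""
    items: List[Tuple[int, str]] = []
    for body in ITEM_RE.findall(text):
        id_match = ID_RE.search(body)
        name_match = NAME_RE.search(body)
        if id_match is None or name_match is None:
            raise ValueError(f"item without id or name: {body.strip()!r}")
        items.append((int(id_match.group(1)), name_match.group(1)))
    if not items:
        raise ValueError("no `item { ... }` blocks found")
    ids = [item_id for item_id, _ in items]
    if min(ids) < 1:
        raise ValueError("id 0 is reserved for the placeholder class")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate ids: {sorted(ids)}")
    return items
```

Run on the corrected file above, this returns `[(1, 'tree'), (2, 'water body'), (3, 'building')]`. On the earlier version without `item` wrappers, it raises `ValueError`.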

I had the same issue; you need to check your label map (.pbtxt) file.
