liliangqi@liliangqi-workstation:models$ python object_detection/eval.py --logtostderr --pipeline_config_path='/home/liliangqi/google_research/models/object_detection/models/model/faster_rcnn_resnet101_voc07.config' --checkpoint_dir='/home/liliangqi/google_research/models/object_detection/models/model/train/' --eval_dir='/home/liliangqi/google_research/models/object_detection/models/model/eval'
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
2017-06-26 22:13:55.627264: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 22:13:55.627281: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 22:13:55.627285: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 22:13:55.627289: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 22:13:55.627292: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 22:13:55.749328: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-26 22:13:55.749717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:01:00.0
Total memory: 11.90GiB
Free memory: 11.51GiB
2017-06-26 22:13:55.749742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-26 22:13:55.749747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-06-26 22:13:55.749753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0)
INFO:tensorflow:Restoring parameters from /home/liliangqi/google_research/models/object_detection/models/model/train/model.ckpt-140000
INFO:tensorflow:Restoring parameters from /home/liliangqi/google_research/models/object_detection/models/model/train/model.ckpt-140000
WARNING:root:The following classes have no ground truth examples: 0
/home/liliangqi/google_research/models/object_detection/utils/metrics.py:144: RuntimeWarning: invalid value encountered in true_divide
num_images_correctly_detected_per_class / num_gt_imgs_per_class)
Please provide details about the platform you are using (operating system, architecture), your TensorFlow version, and whether you compiled from source or installed a binary. If possible, also include the exact command that produces the output in your test case. If you are unsure what to include, see the GitHub new-issue template.
We ask for this in the issue submission template because it is really difficult to help without that information. Thanks!
I also have this question when evaluating the trained model. Operating system: Ubuntu 14.04, TensorFlow version 1.0, installed via Anaconda. I can train on the VOC2007 dataset with train.py, though.
I also encountered this issue, and I think it is not actually stuck. Look at the repeated_checkpoint_run(..) function in eval_util.py: one of its arguments is named max_number_of_evaluations, and its default value is None. If max_number_of_evaluations is None, the evaluation continues indefinitely, as the sketch below shows.
So you should set a value for eval_config.max_evals. For example, my eval_config.max_evals is 1.
Once the evaluation finishes, you can confirm the result with TensorBoard.
(Sorry, my English is not very good.)
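Here is a rough sketch of that control flow, paraphrased for illustration (the function and argument names come from object_detection/eval_util.py, but the body is a simplified stand-in, not the library's actual code):

def repeated_checkpoint_run(evaluate_fn, max_number_of_evaluations=None):
    # evaluate_fn stands in for loading and evaluating the latest checkpoint
    number_of_evaluations = 0
    while True:
        evaluate_fn()
        number_of_evaluations += 1
        if (max_number_of_evaluations is not None and
                number_of_evaluations >= max_number_of_evaluations):
            break  # only reachable when a limit is set
        # with the default max_number_of_evaluations=None the loop never
        # exits: it just waits for the next checkpoint and evaluates again

repeated_checkpoint_run(lambda: print('evaluated'), max_number_of_evaluations=1)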
@tlsgb456 Thank you very much!
@gdelab Thanks
It looks as if this is resolved, so I am closing for now, but please reopen with more details (per the issue template) if not. Thanks!
I had the same issue.
Changing max_number_of_evaluations=None to 1 doesn't solve the problem.
After tracing, the problem appears to be inside run_checkpoint_once(), because everything before that function still works for the first iteration.
@chakpongchung How about editing the .config file? The eval_config section of my config file is:
eval_config: {
num_examples: 4952
max_evals: 1 # I added this line
}
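Once eval.py exits, you can inspect the results by pointing TensorBoard at the --eval_dir from the command at the top of this issue (adjust the path to your own setup):

tensorboard --logdir=/home/liliangqi/google_research/models/object_detection/models/model/eval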
@tlsgb456 this works!
but I still get this warning:
WARNING:root:The following classes have no ground truth examples: 0
/home/liliangqi/google_research/models/object_detection/utils/metrics.py:144: RuntimeWarning: invalid value encountered in true_divide
num_images_correctly_detected_per_class / num_gt_imgs_per_class)
@chakpongchung Sorry... I don't know yet why that warning occurs. :(
@tlsgb456 @chakpongchung That warning occurs because there are no examples of class 0 in your evaluation set. Class 0 is usually '__none of above__' (you can check this in your label_map.pbtxt), so there's nothing to worry about.
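For reference, here is a minimal standalone reproduction of that warning (the counts are made-up numbers; metrics.py divides per-class detection counts by per-class ground-truth counts, so a class with zero ground-truth images produces 0/0 = NaN):

import numpy as np

# Hypothetical per-class counts; class 0 has no ground truth examples.
num_images_correctly_detected_per_class = np.array([0., 120., 95.])
num_gt_imgs_per_class = np.array([0., 150., 100.])

# Same division as object_detection/utils/metrics.py line 144: the 0/0 for
# class 0 yields nan, and numpy emits the RuntimeWarning about an invalid
# value encountered in true_divide.
recall = num_images_correctly_detected_per_class / num_gt_imgs_per_class
print(recall)  # [ nan  0.8   0.95]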
I also added max_evals: 1, but now I am getting the error "local variable 'metrics' referenced before assignment". I have checked eval_util.py, where the error comes from (line 393), but I still couldn't understand the reason behind it.
I still have the same problem with max_evals: 1; I am still getting UnboundLocalError: local variable 'metrics' referenced before assignment.
@vinay0410 Check this thread: https://stackoverflow.com/questions/48163085/tensorflow-object-detection-api-eval-py-metrics-referenced-before-assignment - it may help you.
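For anyone still hitting that error: per the StackOverflow thread above, a common cause is that no checkpoint ever gets evaluated (for example, a wrong --checkpoint_dir), so the metrics variable in eval_util.py is never assigned before it is returned. A minimal illustration of the failure pattern (stand-in code, not the library's source):

def run_checkpoint_once_stub(checkpoint):
    # stand-in for evaluating a single checkpoint
    return {'mAP': 0.0}

def evaluate_all(checkpoints):
    # 'metrics' is only bound inside the loop, so if the loop body never
    # executes (empty checkpoint list), the return line raises
    # UnboundLocalError: local variable 'metrics' referenced before assignment
    for checkpoint in checkpoints:
        metrics = run_checkpoint_once_stub(checkpoint)
    return metrics

evaluate_all([])  # raises UnboundLocalError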
I still have this problem.