CPU: Intel x86-64 Intel Core i7-7700K @ 4.20GHz x 8, 32GB memory
Exact command to reproduce:
python train.py --logtostderr --train_dir=/media/raul/RD750G/highwai/tfrecords/checkpoint --pipeline_config_path=/media/raul/RD750G/highwai/tfrecords/models/model/highwai_data.config
I run the object_detection train.py script, which I have done many times successfully. However, after a git pull today, I get the error below. I have not changed any code in object_detection at all.
python train.py --logtostderr --train_dir=/media/raul/RD750G/highwai/tfrecords/checkpoint --pipeline_config_path=/media/raul/RD750G/highwai/tfrecords/models/model/highwai_data.config
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
Traceback (most recent call last):
File "train.py", line 163, in <module>
tf.app.run()
File "/home/raul/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/raul/tensorflow/models/research/object_detection/trainer.py", line 217, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "/home/raul/tensorflow/models/research/object_detection/trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "/home/raul/tensorflow/models/research/object_detection/builders/input_reader_builder.py", line 72, in build
label_map_proto_file=label_map_proto_file)
File "/home/raul/tensorflow/models/research/object_detection/data_decoders/tf_example_decoder.py", line 128, in __init__
label_handler = slim_example_decoder.BackupHandler(
AttributeError: module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler'
I have the same error. I searched for BackupHandler in the TensorFlow source code from r1.0 to r1.4 and could not find it.
I also ran the unit test from the research directory:
python object_detection/data_decoders/tf_example_decoder_test.py
There are 12 errors, all with similar messages. Here is the first one:
======================================================================
ERROR: testDecodeBoundingBox (__main__.TfExampleDecoderTest)
Traceback (most recent call last):
File "object_detection/data_decoders/tf_example_decoder_test.py", line 128, in testDecodeBoundingBox
'image/format': self._BytesFeature('jpeg'),
File "object_detection/data_decoders/tf_example_decoder_test.py", line 57, in _BytesFeature
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
TypeError: 'jpeg' has type str, but expected one of: bytes
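For anyone hitting the TypeError above: on Python 3, 'jpeg' is a str, while the protobuf BytesList only accepts bytes. A minimal sketch of a workaround follows. The helper name as_bytes is hypothetical and is not part of object_detection (TensorFlow itself ships a similar tf.compat.as_bytes); it is meant to be applied to the value before it is wrapped in tf.train.BytesList.

```python
def as_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding str input (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# On Python 3 the literal 'jpeg' is str, so encode it before building the feature.
image_format = as_bytes("jpeg")
print(image_format)
```

Writing the literal as b'jpeg' in the test would presumably have the same effect; the helper is only useful when the value arrives as a variable.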
I have the same error too.
I ran object_detection/train.py today, and the error is as follows:
Traceback (most recent call last):
File "object_detection/train.py", line 163, in <module>
tf.app.run()
File "/home/caffe/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/caffe/models/research/object_detection/trainer.py", line 217, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "/home/caffe/models/research/object_detection/trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "/home/caffe/models/research/object_detection/builders/input_reader_builder.py", line 72, in build
label_map_proto_file=label_map_proto_file)
File "/home/caffe/models/research/object_detection/data_decoders/tf_example_decoder.py", line 128, in __init__
label_handler = slim_example_decoder.BackupHandler(
AttributeError: 'module' object has no attribute 'BackupHandler'
I ran this many times before and it worked well. But today I started from a new virtual machine with the same CUDA 8.0, cuDNN 6.0 and other dependencies. TensorFlow was installed with "pip install tensorflow-gpu", version 1.3.0, under Python 2.7.12.
The error happens in object_detection/data_decoders/tf_example_decoder.py, line 128, where slim_example_decoder = tf.contrib.slim.tfexample_decoder and label_handler = slim_example_decoder.BackupHandler( ..........
I checked the TensorFlow 1.3.0 source code: https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py
That file, tfexample_decoder.py, does not actually define 'BackupHandler', so I don't know why object_detection/data_decoders/tf_example_decoder.py contains 'label_handler = slim_example_decoder.BackupHandler'.
I checked all TensorFlow branches from r0.7 to r1.4, and none of them has 'BackupHandler' in tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py. Only the master branch has it.
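Instead of reading the source on GitHub, the same check can be run against whatever TensorFlow is actually installed. This is a minimal sketch; the function names are hypothetical, and the module path is the one reported in the tracebacks in this thread.

```python
import importlib

# Module path as it appears in the AttributeError messages in this thread.
DECODER_MODULE = "tensorflow.contrib.slim.python.slim.data.tfexample_decoder"


def has_backup_handler(module):
    """True if the given decoder module defines BackupHandler."""
    return hasattr(module, "BackupHandler")


def check_installed_decoder(module_name=DECODER_MODULE):
    """Return True/False for the installed module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return has_backup_handler(module)
```

Calling check_installed_decoder() in the same environment that runs train.py should return False on the releases described above and True on a build that includes the class.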
Me too:
root@fc086fe4f615:/notebooks/models/research# python object_detection/train.py --logtostderr /notebooks/katakana --pipeline_config_path=/notebooks/katakana/data/ssd_mobilenet_v1_katakana.config --train_dir=/notebooks/katakana/
Traceback (most recent call last):
File "object_detection/train.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/notebooks/models/research/object_detection/trainer.py", line 217, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "/notebooks/models/research/object_detection/trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "/notebooks/models/research/object_detection/builders/input_reader_builder.py", line 72, in build
label_map_proto_file=label_map_proto_file)
File "/notebooks/models/research/object_detection/data_decoders/tf_example_decoder.py", line 128, in __init__
label_handler = slim_example_decoder.BackupHandler(
AttributeError: 'module' object has no attribute 'BackupHandler'
I am facing the same issue today. Yesterday I was able to run it on my personal machine, but today I am trying on AWS.
My command is
python object_detection/train.py --logtostderr --pipeline_config_path=/home/ubuntu/od/models/model/ssd_mobilenet_v1_pets.config --train_dir=/home/ubuntu/od/models/model/train
I get this error.
Traceback (most recent call last):
File "object_detection/train.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/ubuntu/models/research/object_detection/trainer.py", line 217, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "/home/ubuntu/models/research/object_detection/trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "/home/ubuntu/models/research/object_detection/builders/input_reader_builder.py", line 72, in build
label_map_proto_file=label_map_proto_file)
File "/home/ubuntu/models/research/object_detection/data_decoders/tf_example_decoder.py", line 128, in __init__
label_handler = slim_example_decoder.BackupHandler(
AttributeError: 'module' object has no attribute 'BackupHandler'
However, tfexample_decoder.py on master does have the class BackupHandler:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py
@chihyuwang When you do pip install tensorflow-gpu, the 'BackupHandler' class is not there in the installed TensorFlow code. Maybe we need to build TensorFlow from source, I guess. I'm trying that now and will report back here.
I reverted to the previous commit, and now it works.
Temporary solution (on Ubuntu 16.04 with TensorFlow r1.4):
1: Download the new tfexample_decoder.py:
$ cd $home
$ wget https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py
2: Replace the installed file:
$ cd /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/data
$ sudo cp ./tfexample_decoder.py tfexample_decoder-backup.py
$ sudo rm ./tfexample_decoder.py
$ sudo cp /home/wpq/tfexample_decoder.py ./tfexample_decoder.py
ok!
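One caveat about step 1: wget on a github.com/.../blob/... URL normally saves the HTML page that GitHub renders, not the Python source, so the copied file may fail to import. The raw file lives under raw.githubusercontent.com. A minimal sketch of the URL conversion, with a hypothetical helper name, and assuming the branch name contains no slash:

```python
from urllib.parse import urlsplit


def to_raw_url(blob_url):
    """Convert a github.com '/blob/' page URL to its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a github.com blob URL: %s" % blob_url)
    owner, repo, _, ref = segments[:4]
    file_path = "/".join(segments[4:])
    return "https://raw.githubusercontent.com/%s/%s/%s/%s" % (owner, repo, ref, file_path)


print(to_raw_url(
    "https://github.com/tensorflow/tensorflow/blob/master/"
    "tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py"))
```

The printed URL is the one to pass to wget. Overwriting a file inside an installed package remains a stopgap, so keeping the backup copy from step 2 is a good idea.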
File "/home/wpq/models-master/research/object_detection/utils/variables_helper.py", line 122, in get_variables_available_in_checkpoint
ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 150, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern), status)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
Error:
tensorflow.python.framework.errors_impl.DataLossError:
Unable to open table file /home/wpq/data/potato/data/model.ckpt.data-00000-of-00001:
Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Help me: what is the reason, and how do I solve it?
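A common cause of this "not an sstable (bad magic number)" error is pointing the restore path (for example fine_tune_checkpoint in the pipeline config) at the .data-00000-of-00001 shard instead of the checkpoint prefix, model.ckpt. Whether that is the cause here is an assumption, since the config is not shown. A minimal sketch with a hypothetical helper that strips the shard suffix:

```python
import re

# Suffixes that TensorFlow adds to the files of one checkpoint.
_SUFFIX = re.compile(r"\.(data-\d{5}-of-\d{5}|index|meta)$")


def checkpoint_prefix(path):
    """Strip a .data-XXXXX-of-XXXXX, .index or .meta suffix, if present."""
    return _SUFFIX.sub("", path)


print(checkpoint_prefix("/home/wpq/data/potato/data/model.ckpt.data-00000-of-00001"))
```

The resulting prefix is what tf.train.NewCheckpointReader and the config field expect; the .data, .index and .meta files need to sit next to each other in that directory.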
The class in tfexample_decoder.py is named TFExampleDecoder, while the constructor calling it uses the name TfExampleDecoder.
Hi all,
I have the same problem. I installed TensorFlow using pip install tensorflow-gpu.
Does installing from source or a previous commit solve the problem?
If so, how do I do that?
Thanks
The new ones have models/research/object_detection. They are still being updated. I am at least able to train with this repo.
https://github.com/tensorflow/models/tree/0375c800c767db2ef070cee1529d8a50f42d1042
The older version seems to work.
@tombstone I was able to run the latest repo after installing TensorFlow from the source code of the master branch. But the loss looks really bad: it is 10 digits long. I will attach a screenshot.
Wow, BackupHandler only exists in master, not even in TensorFlow 1.4. I know this is research code, but it would be great if we could stick with an official TF release, given that the latest supported TF version on Google Cloud is only 1.2.
To resolve this issue you'll have to update your TensorFlow version. You can install a nightly build here for GPU, or here for CPU.
Any TensorFlow version after October 24th 2017 should work (which is when this method was added).
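To check whether a given nightly meets that date, the build date can be read from the version string. This is a minimal sketch with hypothetical helper names. The .devYYYYMMDD form matches the wheel filenames quoted later in this thread; accepting a hyphen before dev is an assumption about how tf.__version__ may report it.

```python
import re
from datetime import date

# Date given above for when the method was added to TensorFlow.
CUTOFF = date(2017, 10, 24)


def nightly_build_date(version):
    """Extract the build date from a version like '1.5.0.dev20171102', else None."""
    match = re.search(r"[.-]dev(\d{4})(\d{2})(\d{2})$", version)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def nightly_is_new_enough(version, cutoff=CUTOFF):
    """True if the version is a nightly built after the cutoff date."""
    built = nightly_build_date(version)
    return built is not None and built > cutoff
```

Release versions such as 1.4.0 carry no date, so the function returns False for them and they have to be judged separately.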
Hello AlgorithmR: it works! The nightly build works with object_detection/train.py.
HOWEVER, I installed this in a clean ubuntu linux env, and now tensorboard is missing. Is there something different in the nightly builds that requires a separate installation for tensorboard? And if so, do I need to build tensorboard from source?
Thank you!
@radzfoto I used this nightly build and this error is gone, but I get another one. From the message, I can tell that TensorFlow can't restore the variables from the checkpoint. I use the SSD MobileNet v1 model, which I downloaded from the model zoo. Which one do you use?
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/scott/github/models/research/object_detection/ssd_mobilenet_v1_coco_11_06_2017/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
hi @scotthuang1989: I use faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017. I re-downloaded it just in case, but it looks to be the same as before. I did not have problems with this frozen proto.
@radzfoto Tensorboard is not included in the nightly builds of TensorFlow. You can install it manually with pip like so: pip install tensorflow-tensorboard.
@radzfoto What is your TensorFlow version?
Can you share the contents of the pipeline_config file?
my faster_rcnn_inception_resnet_v2_atrous_pets.config:
model {
  faster_rcnn {
    num_classes: 37
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
I have the same issue. Has anyone found a solution yet? I tried going back to previous commits, but to no avail.
@mpeniak You'll have to update TensorFlow using their nightly builds as specified here: https://github.com/tensorflow/models/issues/2653#issuecomment-340885146.
Alternatively you could revert to an earlier version of object_detection, but I'm not entirely sure when this was added. Best guess would be that anything before October 27th is compatible with the main version of TF.
@wpq3142 , my pipeline config is moderately complicated and very specific to my project, so sharing it wouldn't help you. As per the instructions from @AlgorithmR above, I installed the nightly build two nights ago. That solved the problem for me.
Based on AlgorithmR's comment above, I downloaded one of the linux wheel files from today (11/02). But,
pip install tf_nightly_gpu-1.5.0.dev20171102-cp36-cp36m-manylinux1_x86_64.whl
failed with:
tf_nightly_gpu-1.5.0.dev20171102-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
I am on an x86 machine (not amd) running Ubuntu 16.04. Is this to be done differently?
Oops! I hadn't paid attention to the fact that there are several wheel files for Linux that differ very little in their naming. This one worked for Ubuntu:
tf_nightly_gpu-1.5.0.dev20171102-cp27-cp27mu-manylinux1_x86_64.whl
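The two filenames differ in their Python and ABI tags (cp36-cp36m versus cp27-cp27mu), and pip rejects a wheel whose tags do not match the running interpreter. A minimal sketch, with hypothetical helper names, that compares a wheel's Python tag with the current CPython version. It ignores the ABI and platform tags, which pip also checks.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: %s" % filename)
    return tuple(parts[-3:])


def interpreter_tag():
    """CPython tag of the running interpreter, e.g. 'cp27' or 'cp36'."""
    return "cp%d%d" % sys.version_info[:2]


def matches_interpreter(filename):
    """True if the wheel's python tag equals this interpreter's CPython tag."""
    return wheel_tags(filename)[0] == interpreter_tag()
```

Running matches_interpreter on each candidate filename with the same python that pip uses shows which wheel to download.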
If you need to run object detection on Google ML Engine, where only TF 1.2 is available, I have made a fork of this project and copied the missing handler code over from current master. https://github.com/mrfortynine/models/commit/6b940fb8980cf2ddc944f0cdb0a831dc5ec02bea
https://github.com/tensorflow/models/pull/2692 should fix this.
Hi, what do I need to do to solve this issue on a Windows machine?
I ran the command:
python train.py --logtostderr --train_dir=path-to-my-training-directory --pipeline_config_path=path-to-my-ssd_mobilenet_v1_pets.config
I get a similar error:
label_handler = slim_example_decoder.BackupHandler(
AttributeError: module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler'
I am running the program in anaconda environment.
Still getting this error with the latest version of the master branch.
Is there any other cause? It seems the fix was merged to master, but the error still happens here.
I'm getting the same error
$ python train.py --logtostderr --train_dir=./models/train --pipeline_config_path=ssd_mobilenet_v1_coco.config
WARNING:tensorflow:From C:\Users\wyujia\AppData\Local\Continuum\anaconda3\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\trainer.py:228: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
Traceback (most recent call last):
File "train.py", line 169, in <module>
tf.app.run()
File "C:\Users\wyujia\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 165, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "C:\Users\wyujia\AppData\Local\Continuum\anaconda3\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\trainer.py", line 235, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "C:\Users\wyujia\AppData\Local\Continuum\anaconda3\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "train.py", line 122, in get_next
worker_index=FLAGS.task)).get_next()
File "C:\Users\wyujia\AppData\Local\Continuum\anaconda3\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\builders\dataset_builder.py", line 140, in build
label_map_proto_file=label_map_proto_file)
File "C:\Users\wyujia\AppData\Local\Continuum\anaconda3\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\data_decoders\tf_example_decoder.py", line 153, in __init__
label_handler = slim_example_decoder.BackupHandler(
AttributeError: module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler'
Please help. Many thanks in advance.
Seems like the commit https://github.com/tensorflow/models/commit/7a9934df2afdf95be9405b4e9f1f2480d748dc40 broke PR https://github.com/tensorflow/models/pull/2692
which was referenced earlier here by @tombstone and subsequently issue closed by @jch1
Should this issue be re-opened or should a new issue be created?
Reproducible with master of this repo on Google Cloud ML with TensorFlow 1.4. Compiled with TensorFlow 1.5 locally, but failing in Google Cloud ML, which I assume is stuck on TensorFlow 1.4.
Is this essentially because the underlying TensorFlow issue was only recently fixed? If so, do Google Cloud ML users need to keep using the workaround provided by @mrfortynine?
@caroline6927 I can also confirm the same issue from my side. I have the latest tensorflow/model repo cloned and I am using Tensorflow 1.4.0 in the installation.
UPDATE: the OS is Ubuntu 16.04
I am getting the following error on GCP. It works locally. Any ideas?
The replica master 0 exited with a non-zero status of 1. Termination reason: Error. Traceback (most recent call last): File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 167, in
In my case there were no files in the /usr/local/lib/python3.5/dist-packages directory,
so I assumed that TensorFlow had not been installed successfully.
I am working within an Anaconda environment, so I updated TensorFlow.
Here is the command that I used:
conda install -c conda-forge tensorflow
Problem solved!
@CHAMOD I am facing problem on google cloud platform. No issues when I run locally.
@yogeshl18 I have no experience yet running it on Google Cloud Platform.
I am also getting this kind of error:
C:\tensorflow1\models\research\object_detection>python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
Traceback (most recent call last):
File "train.py", line 49, in <module>
from object_detection.builders import dataset_builder
File "C:\Users\irijdm\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py", line 27, in <module>
from object_detection.data_decoders import tf_example_decoder
File "C:\Users\irijdm\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\data_decoders\tf_example_decoder.py", line 27, in <module>
slim_example_decoder = tf.contrib.slim.tfexample_decoder
AttributeError: module 'tensorflow' has no attribute 'contrib'
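This last traceback is a different problem from the BackupHandler one: tf.contrib was removed in TensorFlow 2.x, so the legacy train.py, which relies on tf.contrib.slim, needs a 1.x install. A minimal sketch with a hypothetical helper that makes the check explicit from a version string such as tf.__version__:

```python
def has_contrib(version):
    """True for 1.x releases, which shipped tf.contrib; it was removed in 2.0."""
    major = int(version.split(".")[0])
    return major == 1


for candidate in ("1.15.0", "2.3.0"):
    print(candidate, has_contrib(candidate))
```

If the check fails for the environment in use, installing a 1.x release of TensorFlow into a separate environment is the usual way to run this script.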
Command: python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
Error:
Traceback (most recent call last):
File "train.py", line 51, in <module>
from object_detection.builders import model_builder
File "/usr/local/lib/python2.7/dist-packages/object_detection-0.1-py2.7.egg/object_detection/builders/model_builder.py", line 20, in <module>
from object_detection.builders import anchor_generator_builder
File "/usr/local/lib/python2.7/dist-packages/object_detection-0.1-py2.7.egg/object_detection/builders/anchor_generator_builder.py", line 18, in <module>
from object_detection.anchor_generators import flexible_grid_anchor_generator
File "/usr/local/lib/python2.7/dist-packages/object_detection-0.1-py2.7.egg/object_detection/anchor_generators/flexible_grid_anchor_generator.py", line 19, in <module>
from object_detection.anchor_generators import grid_anchor_generator
File "/usr/local/lib/python2.7/dist-packages/object_detection-0.1-py2.7.egg/object_detection/anchor_generators/grid_anchor_generator.py", line 1
/#Copyright 2017 The TensorFlow Authors. All Rights Reserved.
^
I am getting this error. Can you all please help me with it?