Models: im2txt - build_mscoco_data.py: all output files matching the patterns train-?????-of-00256, val-?????-of-00004 and test-?????-of-00008 are 0 KB

Created on 9 Mar 2017 · 7 Comments · Source: tensorflow/models

======== RESTART: E:\my_code\im2txt\im2txt\data\build_mscoco_data.py ========
Loaded caption metadata for 82783 images from F:/tmp/annotations/captions_train2014.json
Proccessing captions.
Finished processing 414113 captions for 82783 images in F:/tmp/annotations/captions_train2014.json
Loaded caption metadata for 40504 images from F:/tmp/annotations/captions_val2014.json
Proccessing captions.
Finished processing 202654 captions for 40504 images in F:/tmp/annotations/captions_val2014.json
Creating vocabulary.
Total words: 29415
Words in vocabulary: 11519
Wrote vocabulary file: F:/tmp/txt/word_counts.txt
Launching 8 threads for spacings: [[0, 73296], [73296, 146592], [146592, 219888], [219888, 293184], [293184, 366480], [366480, 439776], [439776, 513072], [513072, 586368]]
Exception in thread Thread-7:
Traceback (most recent call last):
File "F:\Program Files\Python35\lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "F:\Program Files\Python35\lib\threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "E:\my_code\im2txt\im2txt\data\build_mscoco_data.py", line 278, in _process_image_files
sequence_example = _to_sequence_example(image, decoder, vocab)
File "E:\my_code\im2txt\im2txt\data\build_mscoco_data.py", line 224, in _to_sequence_example
"image/data": _bytes_feature(encoded_image),
File "E:\my_code\im2txt\im2txt\data\build_mscoco_data.py", line 189, in _bytes_feature
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
File "F:\Program Files\Python35\lib\site-packages\google\protobuf\internal\python_message.py", line 522, in init
copy.extend(field_value)
File "F:\Program Files\Python35\lib\site-packages\google\protobuf\internal\containers.py", line 275, in extend
new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
File "F:\Program Files\Python35\lib\site-packages\google\protobuf\internal\containers.py", line 275, in <listcomp>
new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
File "F:\Program Files\Python35\lib\site-packages\google\protobuf\internal\type_checkers.py", line 108, in CheckValue
raise TypeError(message)
TypeError: 'b\'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00H\x00H\x00\x00\xff\xdb\x00C\x00\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x02\x02\x03\x02\x02\x02\x02\x02\x04\x03\x03\x02\x03\x05\x04\x05\x05\x05\x04\x04\x04\x05\x06\x07\x06\x05\x05\x07\x06\x04\x04\x06\t\x06\x07\x08\x08\x08\x08\x08\x05\x06\t\n\t\x08\n\x07\x08\x08\x08\xff\xdb\x00C\x01\x01\x01\x01\x02\x02\x02\x04\x02\x02\x04\x08\x05\x04\x05\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\xff\xc0\x00\x11\x08\x01\xf4\x01w\x03\x01\x11\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x04\x02\x03\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\x05\x06\x08\t\x03\x04\x00\x02\n\x01\x0b\xff\xc4\x00G\x1 has type <class 'str'>, but expected one of: ((<class 'bytes'>,),)

2017-03-09 11:14:46.377473: Finished processing all 586368 image-caption pairs in data set 'train'.
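
The TypeError in the log is consistent with how Python 3 treats str() applied to a bytes object: the result is the textual repr of the bytes (starting with b'), not the bytes themselves, so a bytes-only field rejects it. A minimal sketch of that behaviour, assuming Python 3 and using a shortened JPEG header as stand-in data (no TensorFlow involved):

```python
# Shortened JPEG header, standing in for the encoded image in the traceback.
encoded_image = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# On Python 3, str() of a bytes object yields its repr as text.
as_text = str(encoded_image)

print(type(as_text).__name__)        # the type that BytesList rejected
print(as_text.startswith("b'"))      # the repr prefix seen in the error message
print(isinstance(as_text, bytes))    # not bytes, hence the TypeError
```

On Python 2, str and bytes are the same type, which may explain why the same script ran without this error in other setups.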

All 7 comments

@wuluhui I just ran the script on Ubuntu and it worked for me. Are you using Windows?

At a guess, does it work if you change "str" to "bytes" in this line?

@cshallue Yes, I use Windows. I changed "str" to "bytes", but it did not work. I think I should change my operating system.

@cshallue Thank you very much

@cshallue I solved this problem by removing the "str" call in the following code:
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
and by changing _bytes_feature(v) to _bytes_feature(bytes(v, encoding='utf-8')) in the following code:
def _bytes_feature_list(values):
    return tf.train.FeatureList(feature=[_bytes_feature(v) for v in values])
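
The two changes above amount to one rule: image data that is already bytes is passed through untouched, while caption words that are str are encoded to UTF-8 first. A minimal sketch of that rule, assuming Python 3; to_bytes is a hypothetical helper for illustration and is not part of build_mscoco_data.py:

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes: pass bytes through, encode str (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# Encoded image data is already bytes and is left unchanged.
image_payload = to_bytes(b"\xff\xd8\xff\xe0")

# Caption words are str and are encoded, mirroring bytes(v, encoding='utf-8').
caption_words = [to_bytes(w) for w in ["a", "dog", "runs"]]

print(image_payload)
print(caption_words)
```

Each resulting value is the kind of object that would be placed in tf.train.BytesList(value=[...]) by _bytes_feature.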

@wuluhui
Could you share your captions_train2014.json and captions_val2014.json with me?
It looks like the captions in your JSON files are not in English.
Thank you in advance!

@j2233ack Could you give me your email?
