Models: [im2txt] Unable to preprocess mscoco dataset for show and tell

Created on 25 Mar 2017 · 9 Comments · Source: tensorflow/models

I'm trying to preprocess the MSCOCO dataset for use in the im2txt model.
I'm using TensorFlow 1.0 for GPU on a GTX 1070 with 16 GB RAM.
Python version: 3.5.3

(tensorflow) timberners@galileo:/media/timberners/magicae/models/im2txt$ bazel-bin/im2txt/download_and_preprocess_mscoco "${MSCOCO_DIR}"
/media/timberners/magicae/models/im2txt
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Loaded caption metadata for 82783 images from /media/timberners/magicae/models/im2txt/data/mscoco/raw-data/annotations/captions_train2014.json
Processing captions.
Finished processing 414113 captions for 82783 images in /media/timberners/magicae/models/im2txt/data/mscoco/raw-data/annotations/captions_train2014.json
Loaded caption metadata for 40504 images from /media/timberners/magicae/models/im2txt/data/mscoco/raw-data/annotations/captions_val2014.json
Processing captions.
Finished processing 202654 captions for 40504 images in /media/timberners/magicae/models/im2txt/data/mscoco/raw-data/annotations/captions_val2014.json
Creating vocabulary.
Total words: 29415
Words in vocabulary: 11519
Wrote vocabulary file: /media/timberners/magicae/models/im2txt/data/mscoco/word_counts.txt
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7465
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.47GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Launching 8 threads for spacings: [[0, 73296], [73296, 146592], [146592, 219888], [219888, 293184], [293184, 366480], [366480, 439776], [439776, 513072], [513072, 586368]]
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00\\xb4\\x00\\xb4\\x00\\x00\\xff\\xe2\\ has type str, but expected one of: bytes

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe2\\x0cXICC_ has type str, but expected one of: bytes

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe2\\x0cXICC_ has type str, but expected one of: bytes
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe2\\x0cTICC_ has type str, but expected one of: bytes


Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe1\\x0b\\x0e has type str, but expected one of: bytes


Exception in thread Thread-8:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x01\\xce\\x01\\xce\\x00\\x00\\xff\\xe2\\ has type str, but expected one of: bytes

2017-03-26 01:57:03.430551: Finished processing all 586368 image-caption pairs in data set 'train'.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Launching 4 threads for spacings: [[0, 2533], [2533, 5066], [5066, 7599], [7599, 10132]]
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

Exception in thread Thread-11:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

Exception in thread Thread-9:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe2\\x0cXICC_ has type str, but expected one of: bytes

Exception in thread Thread-12:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x01,\\x01,\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

2017-03-26 01:57:04.812453: Finished processing all 10132 image-caption pairs in data set 'val'.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Launching 8 threads for spacings: [[0, 2533], [2533, 5066], [5066, 7600], [7600, 10133], [10133, 12666], [12666, 15200], [15200, 17733], [17733, 20267]]
Exception in thread Thread-16:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xfe\\x00\\x0c has type str, but expected one of: bytes

Exception in thread Thread-19:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

Exception in thread Thread-13:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

Exception in thread Thread-15:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00`\\x00`\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

Exception in thread Thread-20:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe1\\n\\x00XM has type str, but expected one of: bytes
Exception in thread Thread-18:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe1\\x01\\xc7 has type str, but expected one of: bytes

Exception in thread Thread-14:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x01,\\x01,\\x00\\x00\\xff\\xdb\\x00C\\x0 has type str, but expected one of: bytes

Exception in thread Thread-17:
Traceback (most recent call last):
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/timberners/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 281, in _process_image_files
    sequence_example = _to_sequence_example(image, decoder, vocab)
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 227, in _to_sequence_example
    "image/data": _bytes_feature(encoded_image),
  File "/media/timberners/magicae/models/im2txt/bazel-bin/im2txt/download_and_preprocess_mscoco.runfiles/im2txt/im2txt/data/build_mscoco_data.py", line 192, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))
TypeError: 'b\'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xe2\\x0cXICC_ has type str, but expected one of: bytes


2017-03-26 01:57:05.852854: Finished processing all 20267 image-caption pairs in data set 'test'.

help wanted bug

All 9 comments

I don't think that the model works on Python 3.

Yes, this is indeed a Python 2/Python 3 compatibility issue. It would be great if you could attempt to fix it for Python 3. For example, you might try using bytes(value) instead of str(value) on the line that raises the error and see if it works.

#1133 may be helpful here.

It is definitely a Python 3 issue. I realized it after successfully running the training using Python 2.
I would attempt to fix it for Python 3 and send a PR. I believe the issue can be closed until then.

@KranthiGV thanks for looking into that and volunteering to make the code Python 3 compatible. I'll mark it contributions welcome for now, but keep it open. Please reference it in your commit with a message like "Fixes #thisbug" and it will automatically close. Thanks!

In Python 2, bytes is a string (just an alias for the str type), while in Python 3, bytes is a separate type. So, in short:

Line 378 is the problem:

```
import codecs  # at the beginning of the file

def _load_and_process_metadata(captions_file, image_dir):
    with tf.gfile.FastGFile(captions_file, "r") as f:
        # WRONG on Python 3: f is bytes, but json expects str here:
        #   caption_data = json.load(f)
        # Correct: wrap f in a UTF-8 reader before parsing.
        reader = codecs.getreader('utf-8')
        f = reader(f)
        caption_data = json.load(f)

# And change the method on line 169.
# Before:
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))

# After the change:
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode('utf-8') if type(value)==str else value]))
```

Explanation: tf.train.BytesList expects bytes, so converting a value that is already bytes to str makes no sense. Since value can sometimes be str and sometimes bytes, change the method so that it always passes the correct type; then the whole preprocessing script will work on Python 3.
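To illustrate the point above, here is a minimal sketch in plain Python 3, without TensorFlow. The helper name `to_bytes` and the sample values are assumptions made for illustration and are not part of build_mscoco_data.py. It shows that calling str() on bytes produces the textual repr of the bytes rather than the data, which matches the value shown in the TypeError, and that a conditional encode leaves bytes untouched.

```python
def to_bytes(value):
    """Return value as bytes, encoding str as UTF-8 (hypothetical helper)."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value


# The first bytes of a JPEG file, as seen in the error messages.
jpeg_header = b"\xff\xd8\xff\xe0"

# On Python 3, str() of a bytes object gives its repr as text, not the data.
as_text = str(jpeg_header)
print(type(as_text).__name__, len(as_text), len(jpeg_header))

# The conditional conversion passes bytes through and encodes str.
print(to_bytes(jpeg_header) == jpeg_header)
print(type(to_bytes("a caption")).__name__)
```

The same check is what the modified `_bytes_feature` performs before handing the value to `tf.train.BytesList`.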

@Tiyanak I wasn't getting any error on the caption_data variable, only in the _bytes_feature() method (exactly the one posted by @KranthiGV above). However, I made both changes as you did. Doing this solved the second error, but gave a new one on caption_data:

Traceback (most recent call last):
  File "build_mscoco_data.py", line 486, in <module>
    tf.app.run()
  File "/home/aayush/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "build_mscoco_data.py", line 462, in main
    FLAGS.train_image_dir)
  File "build_mscoco_data.py", line 412, in _load_and_process_metadata
    caption_data = json.load(f)
  File "/home/aayush/anaconda3/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/aayush/anaconda3/lib/python3.6/codecs.py", line 497, in read
    data = self.bytebuffer + newdata
TypeError: can't concat bytes to str

I reverted the change on caption_data but kept changes on _bytes_feature(), and all errors were gone. Why do you think this happened?
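One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the file object already returns str from read(), wrapping it in a codecs UTF-8 reader makes the reader try to join its internal bytes buffer with a str, which raises a TypeError like the one above. The sketch below reproduces this with standard-library streams only; `io.BytesIO` and `io.StringIO` stand in for the real file object, and the function name is hypothetical.

```python
import codecs
import io
import json


def load_json_via_utf8_reader(stream):
    """Wrap stream in a UTF-8 StreamReader and parse it as JSON."""
    reader = codecs.getreader("utf-8")
    return json.load(reader(stream))


# A stream that yields bytes: the reader decodes it and parsing succeeds.
byte_stream = io.BytesIO(b'{"annotations": []}')
print(load_json_via_utf8_reader(byte_stream))

# A stream that already yields str: the reader cannot join bytes and str.
text_stream = io.StringIO('{"annotations": []}')
try:
    load_json_via_utf8_reader(text_stream)
except TypeError as err:
    print("TypeError:", err)
```

If this is what happened, the codecs wrapper is only needed when the file object returns bytes, which would explain why reverting that change removed the error.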

@aayushARM

I've also spent several days trying to figure this out, and I found something.
With the code that @Tiyanak wrote, I got the same result as you, but when I applied only the last part (after reverting all the other modifications),

before:
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(value)]))

after change:
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode('utf-8') if type(value)==str else value]))

it started to work like a charm. It's still running, so I can't say that this works perfectly, but for now this is the only way to make it work in my case.

I tried all the solutions above, but I still get the error "'utf-8' codec can't decode byte 0xff".

Changing all "r" to "rb" solved my problem.
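A minimal sketch of why the mode matters, using the built-in open() as a stand-in for the file API used in the script (an assumption; the helper names and the temporary file are illustrative only). JPEG data starts with the byte 0xff, which is not valid UTF-8, so reading it in text mode fails while binary mode returns the raw bytes.

```python
import os
import tempfile


def read_text(path):
    """Read a file in text mode, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def read_binary(path):
    """Read a file in binary mode, returning raw bytes."""
    with open(path, "rb") as f:
        return f.read()


# Write a few JPEG-like header bytes to a temporary file (not a real image).
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"\xff\xd8\xff\xe0\x00\x10JFIF")
    path = tmp.name

try:
    data = read_binary(path)  # succeeds and returns bytes
    try:
        read_text(path)
        text_error = None
    except UnicodeDecodeError as err:
        text_error = err  # text mode cannot decode the 0xff byte
finally:
    os.remove(path)

print(type(data).__name__, text_error)
```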
