Models: im2txt error on DecodeJpeg

Created on 30 Oct 2016 · 16 Comments · Source: tensorflow/models

I am trying im2txt on a custom dataset and am facing the issue below. All files are JPEGs; none of them are PNGs.
Is there an easy way to find which file is causing this DecodeJpeg error?

Not a JPEG file: starts with 0x00 0x10
Not a JPEG file: starts with 0x00 0x10
Not a JPEG file: starts with 0x00 0x10
Not a JPEG file: starts with 0x00 0x10
INFO:tensorflow:Error reported to Coordinator: , Invalid JPEG data, size 71580
[[Node: decode/DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](ParseSingleSequenceExample/ParseSingleSequenceExample)]]
[[Node: distort_color/adjust_hue/Mod/_1795 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_163_distort_color/adjust_hue/Mod", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'decode/DecodeJpeg', defined at:
File "/data/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 114, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/data/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 65, in main
model.build()
File "/data/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/show_and_tell_model.py", line 353, in build
self.build_inputs()
File "/data/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/show_and_tell_model.py", line 166, in build_inputs
image = self.process_image(encoded_image, thread_id=thread_id)
File "/data/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/show_and_tell_model.py", line 120, in process_image
image_format=self.config.image_format)
File "/data/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/ops/image_processing.py", line 100, in process_image
image = tf.image.decode_jpeg(encoded_image, channels=3)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_image_ops.py", line 283, in decode_jpeg
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2380, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1298, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Invalid JPEG data, size 71580
[[Node: decode/DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](ParseSingleSequenceExample/ParseSingleSequenceExample)]]
[[Node: distort_color/adjust_hue/Mod/_1795 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_163_distort_color/adjust_hue/Mod", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]


All 16 comments

Could you take a look at the files themselves?

file path/to/image.jpg

and

vi -b path/to/image.jpg

should show you the magic
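
The same check can be scripted for a whole directory. Below is a minimal sketch, not part of im2txt, that reads the leading "magic" bytes of every file with a .jpg or .jpeg extension and reports those that do not start with the JPEG signature. The function names and the MAGIC_PREFIXES table are hypothetical, introduced only for illustration; the byte signatures themselves are the standard ones for each format.

```python
import os

# Leading "magic" bytes for a few common formats (well-known signatures).
MAGIC_PREFIXES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"RIFF", "riff"),
]


def sniff_format(path):
    """Return a format name guessed from the first bytes of the file."""
    with open(path, "rb") as f:  # binary mode, so the bytes are read unmodified
        header = f.read(8)
    for prefix, name in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return name
    return "unknown"


def find_mislabeled_jpegs(root):
    """Yield (path, detected_format) for .jpg/.jpeg files that are not JPEG."""
    for dirpath, _, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, filename)
                detected = sniff_format(path)
                if detected != "jpeg":
                    yield path, detected
```

For example, `for path, fmt in find_mislabeled_jpegs("path/to/images"): print(path, fmt)` would list the suspicious files. Note that this only inspects the files on disk, so it cannot catch data that was corrupted later, for instance while writing the training records.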

All of them show "JPEG image data, JFIF standard 1.01" when I run file path/to/image.jpg

If I run find -name '*.jpg' -exec identify -format "%f" {} \; 1>ok.txt 2>errors.txt (a command suggested elsewhere), I don't see any bad JPEG files.

Also, if I run the following, as mentioned in https://github.com/tensorflow/tensorflow/issues/4009, every file decodes without error:

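  import glob
  import os
  import tensorflow as tf

  # base_path is the dataset root directory, defined elsewhere.
  # Decode every training image on its own to see whether any single file fails.
  for i, image_name in enumerate(glob.glob(os.path.join(base_path, 'train',  '*.jpg'))):
    print i, image_name
    with tf.Graph().as_default():
      image_contents = tf.read_file(image_name)
      image = tf.image.decode_jpeg(image_contents, channels=3)
      init_op = tf.initialize_all_tables()
      with tf.Session() as sess:
        sess.run(init_op)
        tmp = sess.run(image)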

So now I am puzzled about how to find the problematic image file.

Interesting... perhaps try this: use binary search on your dataset to find the offending image. That is, make two directories that each hold half of your data and run im2txt on each of them. Whichever half fails, repeat the bisecting and testing procedure on it until a single file remains.
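
The bisection can be sketched in a few lines. This is only an illustration under stated assumptions: `fails` is a hypothetical callable, supplied by the user, that runs the pipeline on a subset of files and returns True if the error occurs, and failures are assumed to be deterministic and caused by individual files.

```python
def find_failing_item(items, fails):
    """Return one item that makes fails(subset) true, or None if nothing fails.

    Assumes fails(subset) is True whenever the subset contains a bad item.
    """
    items = list(items)
    if not items or not fails(items):
        return None
    while len(items) > 1:
        mid = len(items) // 2
        left = items[:mid]
        # If the left half fails, a bad item is in it; otherwise, under the
        # assumption above, a bad item must be in the right half.
        items = left if fails(left) else items[mid:]
    return items[0]
```

Each round halves the candidate set, so the number of pipeline runs grows with the logarithm of the dataset size. If several files are bad, this finds one of them; remove it and repeat.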

Closing due to lack of activity. Please let us know if you find the problematic file!

I faced this error on a file labeled as .jpg but upon checking file path/to/image.jpg I discovered it was actually RIFF (little-endian) data. Thanks for the support @drpngx and others.

When I run the im2txt model, the following image file causes the DecodeJpeg error (as do other files I have not yet identified) /cc @girving

./COCO_val2014_000000320612.jpg

It is a PNG. Verification:

In [10]: im_encoded = tf.gfile.GFile("./COCO_val2014_000000320612.jpg", "r").read()

In [11]: len(im_encoded)
Out[11]: 302086

In [12]: im_encoded[:4]
Out[12]: '\x89PNG'

In [13]: sess.run( tf.image.decode_jpeg(im_encoded, channels=3) )
Not a JPEG file: starts with 0x89 0x50
Not a JPEG file: starts with 0x89 0x50
---------------------------------------------------------------------------
InvalidArgumentError
...

This is a problem with the COCO dataset, but since TF does not inspect the image header (tensorflow#4009), I worked around the issue by manually converting those PNGs to JPGs.

@wookayin Use decode_image instead of decode_jpeg, and it'll handle both formats.

@girving Great, I just learned that it was finally added! I found the related PR (mentioning it here for reference): https://github.com/tensorflow/tensorflow/issues/4222
It is currently available only in master (not in TF 0.12; perhaps in TF 1.0 soon), but it will be very useful.

@girving
I am still having a similar issue. (I am running under Python 3.5.)

I processed a different dataset (Flickr8K) using this code. (Note the change from 'r' to 'rb', as suggested in other posts.)

with tf.gfile.FastGFile(image.filename, "rb") as f:
    encoded_image = f.read()

However, decoding fails with this error:
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Expected image (JPEG, PNG, or GIF), got unknown format starting with 'b\'\\xff\\xd8\\xff\\x'

May I ask how to call tf.InteractiveSession.run(tf.image.decode_image(image, channels=3)) if the graph is already finalized?
I got this error:

  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2290, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")

@Lalit7Jain
I encountered the same issue as you.
When I pre-processed the data with Python 3.6 and ran the training script, I got this error. When I rebuilt the data with Python 2.7 and trained again, there was no error. I think the im2txt model is not compatible with Python 3.
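
One plausible explanation, not confirmed in this thread, is consistent with the error text above: the decoder reports data starting with the literal characters b'\xff\xd8\xff, which is what results when a bytes object is passed through str() under Python 3 before being written out. The sketch below only demonstrates that Python behavior; that the preprocessing script performs such a conversion is an assumption, and both helper functions are hypothetical.

```python
import ast

jpeg_bytes = b"\xff\xd8\xff\xe0"  # first bytes of a typical JPEG file

# Under Python 3, str() on a bytes object returns its repr, not the raw data,
# so the stored "image" would begin with the characters b'\xff instead of
# the actual JPEG signature bytes.
as_text = str(jpeg_bytes)


def looks_like_stringified_bytes(data):
    """Return True if data looks like the text repr of a bytes object."""
    return data.startswith((b"b'", b'b"'))


def recover_bytes(data):
    """Turn the text repr of a bytes literal back into the original bytes."""
    return ast.literal_eval(data.decode("ascii"))
```

If this is indeed the cause, the cleaner fix is to rebuild the records with the raw bytes written unchanged, rather than to recover them afterwards.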

@ajkl, I ran into a similar problem when training models with the Google TensorFlow Object Detection API. May I ask whether the problem has been solved? What was the cause, and how did you solve it? Thank you!

@wookayin, you can try this:

import tensorflow as tf

def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    # Pick the decoder based on the file contents, not the file extension.
    image_decoded = tf.cond(
        tf.image.is_jpeg(image_string),
        lambda: tf.image.decode_jpeg(image_string, channels=3),
        lambda: tf.image.decode_png(image_string, channels=3))
    image_resized = tf.image.resize_images(image_decoded, [90, 90])
    # The dataset yields (filename, label) pairs, so pass the label through.
    return image_resized, label

filenames = ["/var/data/image1.jpg", "/var/data/image2.jpg", ...]
labels = [0, 37, 29, 1, ...]
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(_parse_function)
dataset = dataset.batch(batch_size)
iterator = dataset.make_one_shot_iterator()

tf.image.is_jpeg() was added in TensorFlow r1.7.

@ljain-chwy Have you been able to solve this issue? I also tried to generate TFRecords from TIF images. I loaded them into NumPy arrays (raw and segmented), converted the segmented ones into the desired format (as per the class labels), and generated the records. I am facing an issue during training: the TF Dataset is not able to decode the images from the TFRecords and throws a similar error.
