Pillow: UnidentifiedImageError when I try to train my model

Created on 7 Jun 2020 · 28 Comments · Source: python-pillow/Pillow

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x63a807fb0>

classifier.fit_generator(training_set,
                         steps_per_epoch = 914,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 237)

All 28 comments

Would you be able to provide the image that cannot be identified? Or a self-contained script? At the moment, there isn't much information to go on.

If I were you, I would open up ~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/keras_preprocessing/image/utils.py and print(path) just before the error so that I could figure out which image it is - but that's me.

I can't tell which image it is, because sometimes the error occurs after the first few steps and sometimes after 10-15 steps. Also, how can I share a self-contained script?
Also, about the print(path): where should I put it in my code? Before the classifier.fit_generator call?

I'm not suggesting that you modify your code. I'm suggesting that you make a temporary modification to the keras_preprocessing library - open ~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/keras_preprocessing/image/utils.py, the file mentioned in your traceback, go to line 113 and make this modification -

with open(path, 'rb') as f:
    print(path)
    img = pil_image.open(io.BytesIO(f.read()))
    if color_mode == 'grayscale':

Then, run your code. The last path printed before the error should be the one. Upload the image listed at that path here. After you are done, remove print(path) again to prevent it printing out paths in the future.
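
An alternative that leaves the installed library untouched could look like the sketch below. It walks a dataset directory and tries to open every file with the same call pattern as the modified lines above. The helper name find_unidentified is made up for this example, and 'dataset/training_set' is simply the directory named in the reporter's script; adjust it as needed.

```python
import io
import os

from PIL import Image, UnidentifiedImageError


def find_unidentified(root):
    """Return paths under root that Pillow cannot identify as images."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                data = f.read()
            try:
                # Same call pattern as the lines shown above from utils.py
                Image.open(io.BytesIO(data))
            except UnidentifiedImageError:
                bad.append(path)
    return bad


if __name__ == '__main__':
    # Assumed location, taken from the directory names mentioned in this thread
    for path in find_unidentified('dataset/training_set'):
        print('Cannot identify:', path)
```

Note that this reports every file Pillow cannot identify, including non-image files that happen to sit in the folder, so some hits may be files the training code would have skipped anyway.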

A self-contained script means giving us everything that we would need to trigger the error - ideally, a version of your code that is as simple as it can be, while still causing the problem. This means giving us any input arguments to your code, or any input files as well, so that we can run it just like you do.
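
As an illustration only, a self-contained script for this kind of error might look like the sketch below. It needs no external files: the payload is made up for the example and is simply not valid image data, so Pillow raises the same UnidentifiedImageError reported at the top of this issue.

```python
import io

from PIL import Image, UnidentifiedImageError

# Made-up payload standing in for a file that is not really an image
data = b'this is not image data'

try:
    Image.open(io.BytesIO(data))
    identified = True
except UnidentifiedImageError as exc:
    identified = False
    print('Reproduced:', exc)
```

Anyone can copy, paste and run this as-is, which is the point of a self-contained report. A real report would replace the made-up bytes with the actual file that fails.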

Thank you for reporting an issue.

Follow these guidelines to ensure your issue is handled properly.

If you have a ...

  1. General question: consider asking the question on Stack Overflow
    with the python-imaging-library tag:

    Do not ask a question in both places.

    If you think you have found a bug or have an unexplained exception
    then file a bug report here.

  2. Bug report: include a self-contained, copy-pastable example that
    generates the issue if possible. Be concise with code posted.
    Guidelines on how to provide a good bug report:

    Bug reports which follow these guidelines are easier to diagnose,
    and are often handled much more quickly.

  3. Feature request: do a quick search of existing issues
    to make sure this has not been asked before.

We know asking good questions takes effort, and we appreciate your time.
Thank you.

What did you do?

What did you expect to happen?

What actually happened?

What are your OS, Python and Pillow versions?

  • OS:
  • Python:
  • Pillow:

Please include code that reproduces the issue and whenever possible, an image that demonstrates the issue. Please upload images to GitHub, not to third-party file hosting sites. If necessary, add the image to a zip or tar archive.

The best reproductions are self-contained scripts with minimal dependencies. If you are using a framework such as Plone, Django, or Buildout, try to replicate the issue just using Pillow.

code goes here

@radarhere I have attached the model.py file and the image that caused the error
Archive.zip
Please have a look and let me know what the issue is.

There are also some more images that are causing the same error

Your script is not self-contained, because you haven't provided 'dataset/training_set', 'dataset/test_set' and 'best_model.hdf5'.

I tried the image you provided, and I'm able to open it without a problem.

import io
from PIL import Image as pil_image
path = 'chat (429).jpeg'
with open(path, 'rb') as f:
    img = pil_image.open(io.BytesIO(f.read()))

Well, I don't understand why I am getting an error at random for different images.

What version of Pillow are you using?

If you open the file on your machine, does it work? It's entirely possible there's a problem in your environment, rather than a problem in Pillow or in the image as such.

from PIL import Image
Image.open('chat (429).jpeg')

If the image does not open, then it is a problem with your environment. Please provide as much information as you can about your operating system, Python version, etc.
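
One way to gather those details is the short sketch below, which prints the three items the issue template asks for. The function name environment_report is made up for this example.

```python
import platform
import sys

import PIL


def environment_report():
    """Collect the OS, Python and Pillow versions as strings."""
    return {
        'OS': platform.platform(),
        'Python': sys.version.split()[0],
        'Pillow': PIL.__version__,
    }


for key, value in environment_report().items():
    print(f'{key}: {value}')
```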

If the image does open okay for you, then we still need to search for the problem image. It is unlikely, but theoretically possible, that the image file is being modified by the surrounding script. Could you open ~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/keras_preprocessing/image/utils.py and modify it from

with open(path, 'rb') as f:
    img = pil_image.open(io.BytesIO(f.read()))
    if color_mode == 'grayscale':

to

import os, PIL
with open(path, 'rb') as f:
    f_data = f.read()
    try:
        img = pil_image.open(io.BytesIO(f_data))
        print("Opened", path)
        print()
    except PIL.UnidentifiedImageError:
        if not os.path.exists('failed'):
            print("Failed to open", path)
            with open('failed', 'wb') as fp:
                fp.write(f_data)
            with open('failed', 'rb') as fp:
                try:
                    img = pil_image.open(io.BytesIO(fp.read()))
                    print("The file could be opened when re-saved. Something is very strange.")
                except PIL.UnidentifiedImageError:
                    print("Confirmed: This image cannot be opened. Please upload 'failed' to the Pillow issue. Don't worry about the lack of a file extension")
        else:
            print("Failed to open", path, "but a problem image has already been found")
    if color_mode == 'grayscale':
        pass

and run your code. A file called 'failed' should be generated, which you can upload here.

@radarhere, I could successfully open the image using Image.open('chat (429).jpeg')

then I modified the ~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/keras_preprocessing/image/utils.py like you said, and then I started receiving these errors

OS: macOS Catalina
Python version: 3.7.6
Pillow version: PIL 7.1.2
Keras version: 2.3.1

Found 914 images belonging to 2 classes.
Found 239 images belonging to 2 classes.
Epoch 1/10
OpenedOpened dataset/test_set/chats/chat (483).jpeg

 dataset/training_set/chats/chat (152).jpeg

OpenedFailed to open dataset/training_set/others/other (102).jpeg

 dataset/test_set/chats/chat (466).jpeg
Confirmed: This image cannot be opened. Please upload 'failed' to the Pillow issue. Don't worry about the lack of a file extension
Opened dataset/test_set/others/other (521).jpeg

Opened dataset/test_set/chats/chat (459).jpeg

Opened dataset/test_set/others/other (523).jpeg

Opened dataset/test_set/others/other (566).jpeg

Opened dataset/test_set/chats/chat (561).jpeg

Opened dataset/test_set/others/other (554).jpeg

Traceback (most recent call last):

  File "/Users/anis/Desktop/Projects/Machine learning/Chat Screenshots discriminator/model.py", line 54, in <module>
    callbacks=[checkpoint])

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras/engine/training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras/engine/training_generator.py", line 185, in fit_generator
    generator_output = next(output_generator)

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras/utils/data_utils.py", line 625, in get
    six.reraise(*sys.exc_info())

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras/utils/data_utils.py", line 610, in get
    inputs = future.get(timeout=30)

  File "/Users/anis/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value

  File "/Users/anis/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras/utils/data_utils.py", line 406, in get_index
    return _SHARED_SEQUENCES[uid][i]

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras_preprocessing/image/iterator.py", line 65, in __getitem__
    return self._get_batches_of_transformed_samples(index_array)

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras_preprocessing/image/iterator.py", line 231, in _get_batches_of_transformed_samples
    x = img_to_array(img, data_format=self.data_format)

  File "/Users/anis/anaconda3/lib/python3.7/site-packages/keras_preprocessing/image/utils.py", line 315, in img_to_array
    raise ValueError('Unsupported image shape: %s' % (x.shape,))

ValueError: Unsupported image shape: ()

@anis-agwan could you upload the 'failed' file that should have been saved by my code changes?

Do you mean the file 'chat (466).jpeg'?

If you can't open 'chat (466).jpeg' with Pillow, please upload it.

The intention of my code was for it to save a file called 'failed', which would be an image that you couldn't open with Pillow on your system and which you could then upload here.

At the moment, you haven't provided any images that you can't open by themselves with Pillow. Without that, there isn't a Pillow problem for us to help solve.

I can't run your code to help you try and figure out what those images are either, because I don't know what 'dataset/training_set', 'dataset/test_set' or 'best_model.hdf5' are in the code example you have provided.

@anis-agwan It would help us a lot if you could follow this guide:

Thanks!

@radarhere I am able to open with Pillow the image that failed while training the ML model. I don't think this is a Pillow issue. Can we connect somewhere so that we can close this issue?

I don't have any knowledge specific to keras_preprocessing. I could only contribute knowledge about general debugging, so it's possible that another community could help you faster. Certainly, keras has more questions on StackOverflow than Pillow does.

If this isn't a Pillow issue, then closing.

Hi @anis-agwan. I have the exact same issue when attempting to train an image classifier. Did you ever find a solution? Thanks!

I didn't find a solution as such; I happened to find the problem in my folders. I had written a Python script to rename all the files in the folder to .jpeg. One file had a different extension, was renamed to .jpeg along with the rest, and that file caused the error. I would suggest going through your images to look for a corrupt file.
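
A rough way to look for files like that is to compare the extension with the first bytes of the file. The sketch below uses only the standard library and the well-known signatures of JPEG, PNG and GIF files; the helper names sniff_format and mismatched_files are made up for this example. Pillow identifies images by content rather than by name, so a PNG renamed to .jpeg will usually still open, while a renamed file that is not an image at all will not.

```python
import os

# File signatures (magic numbers) for a few common image formats
SIGNATURES = {
    'jpeg': (b'\xff\xd8\xff',),
    'png': (b'\x89PNG\r\n\x1a\n',),
    'gif': (b'GIF87a', b'GIF89a'),
}


def sniff_format(path):
    """Guess the format from the first bytes of the file, or return None."""
    with open(path, 'rb') as f:
        header = f.read(8)
    for fmt, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return fmt
    return None


def mismatched_files(root):
    """Yield (path, sniffed_format) for .jpg/.jpeg files that are not JPEG data."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(('.jpg', '.jpeg')):
                path = os.path.join(dirpath, name)
                fmt = sniff_format(path)
                if fmt != 'jpeg':
                    yield path, fmt
```

A sniffed format of None is the most suspicious result: the file carries an image extension but does not start like any of the formats listed here.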

Okay, thanks, I'll try that.

Check your images.

Hi, @anis-agwan @ahkm1234, I am facing the same issue.
The error shows up when the first epoch completes. I tried different batch sizes and found that the error is thrown as soon as the first epoch is over.
If you solved the problem and were able to complete the training, please share your remarks, as they would be helpful for us.
Thanks!

Hello, please do check your image files; the error is caused by a corrupt file.

Sure, I'll check.

Same here: corrupt image files. Although I had run several different scripts to detect corrupt files (and found none), it turned out there were indeed a couple of corrupt images in the dataset. I only realized this after running a PyTorch classifier, which gave me the specific names of the images that the model wasn't able to read (although these images looked completely fine to me, which was very strange). After deleting these files, it ran without a problem.
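
One possible reason a detection script can miss damaged files: Image.open is lazy and reads only enough of the file to identify it, so a truncated file can pass an open-only check and fail later, when the pixel data is actually decoded. A stricter check, sketched below with a made-up helper name, forces a full decode with load(). This is an illustration, not necessarily what the scripts mentioned above did.

```python
from PIL import Image


def is_fully_readable(path):
    """Return True only if Pillow can identify and fully decode the file."""
    try:
        with Image.open(path) as img:
            img.load()  # forces decoding of the pixel data
        return True
    except Exception:
        # Pillow raises several exception types for damaged files
        # (mostly OSError); any of them counts as unreadable here.
        return False
```

Running this over every file in the dataset and listing the paths for which it returns False should give a more complete picture than opening the files alone.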

That's nice.
I used the Keras ImageDataGenerator for data augmentation before feeding the images into the model. All the images even display correctly.
I'll also check with pytorch. Thanks for your responses!

Hi @ahkm1234, @anis-agwan
Just to share one more point here: I was able to train the model without augmentation, but with augmentation I run into the issue described above.
@ahkm1234 could you please share how you found the corrupt images with the PyTorch classifier?

I believe it was during data augmentation that the corrupted files were revealed, but I might be remembering wrong. The code is super messy, but looks something like this:

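import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Location of data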

datadir = '...'
traindir = datadir + 'train/'
validdir = datadir + 'val/'
testdir = datadir + 'test/'
#preddir = datadir + 'Predict/'

save_file_name = '.pt'
#path = F"...t"
checkpoint_path = '.pth'

# Empty lists
categories = []
img_categories = []
n_train = []
n_valid = []
n_test = []
hs = []
ws = []

# Iterate through each category
for d in os.listdir(traindir):
    categories.append(d)

    # Number of each image
    train_imgs = os.listdir(traindir + d)
    valid_imgs = os.listdir(validdir + d)
    test_imgs = os.listdir(testdir + d)
    n_train.append(len(train_imgs))
    n_valid.append(len(valid_imgs))
    n_test.append(len(test_imgs))

    # Find stats for train images
    for i in train_imgs:
        img_categories.append(d)
        img = Image.open(traindir + d + '/' + i)
        img_array = np.array(img)
        # Shape
        hs.append(img_array.shape[0])
        ws.append(img_array.shape[1])

# Dataframe of categories
cat_df = pd.DataFrame({'category': categories,
                       'n_train': n_train,
                       'n_valid': n_valid, 'n_test': n_test}).\
    sort_values('category')

# Dataframe of training images
image_df = pd.DataFrame({
    'category': img_categories,
    'height': hs,
    'width': ws
})

cat_df.sort_values('n_train', ascending=False, inplace=True)
cat_df.head()
cat_df.tail()

# A function that will plot a tensor as an image
def imshow_tensor(image, ax=None, title=None):
    """Imshow for Tensor."""

    if ax is None:
        fig, ax = plt.subplots()

    # Set the color channel as the third dimension
    image = image.numpy().transpose((1, 2, 0))

    # Reverse the preprocessing steps
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean

    # Clip the image pixel values
    image = np.clip(image, 0, 1)

    ax.imshow(image)
    plt.axis('off')

    return ax, image

# Image transformations
image_transforms = {
    # Train uses data augmentation
    'train':
    transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),  # Image net standards
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])  # Imagenet standards
    ]),
    # Validation does not use augmentation
    'val':
    transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    # Test does not use augmentation
    'test':
    transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

# Datasets from each folder
data = {
    'train':
    datasets.ImageFolder(root=traindir, transform=image_transforms['train']),
    'val':
    datasets.ImageFolder(root=validdir, transform=image_transforms['val']),
    'test':
    datasets.ImageFolder(root=testdir, transform=image_transforms['test']) 
}

# Dataloader iterators (batch_size is not defined in this snippet; set it before this point)
dataloaders = {
    'train': DataLoader(data['train'], batch_size=batch_size, shuffle=True),
    'val': DataLoader(data['val'], batch_size=batch_size, shuffle=True),
    'test': DataLoader(data['test'], batch_size=batch_size, shuffle=True)
}

Hope it works!
