Pillow: No n_frames, or bad value for n_frames

Created on 2 Jan 2016 · 21Comments · Source: python-pillow/Pillow

When I feed the test file flower2.jpg into this code (from #1630)

im = Image.open( fn )
imgcnt = im.n_frames
colors = im.getcolors( im.width * im.height )
if args.hist:
    for cnt, col in colors:
        allcolors[ col ] += cnt
    for iz in range( 1, imgcnt ):
        im = Image.open( fn ) # does getcolors implicitly close????
                              # without the open, get "seek of closed
                              # file" error on line below.
        im.seek( iz )
        colors = im.getcolors( im.width * im.height )
        for cnt, col in colors:
            allcolors[ col ] += cnt

I get "AttributeError: n_frames"

But other .jpg files do not get that error... this one: http://nevcal.com/temporary/20151110-105826gl.jpg has no problem with the attribute error on that line, but it gets a value of 2, apparently handles the seek OK, but dies in the second call to getcolors, with "OSError: image file is truncated (0 bytes not processed)".

Bug

Source

v-python

Most helpful comment

I have created PR #3588 to address this.

@jmetz are you able to verify that it solves your issue? When I try to replicate your situation, I only get an OSError, not a segfault, but I'm still inclined to believe that this should work as a fix.

radarhere on 17 Jan 2019

👍2

All 21 comments

I'd expect .jpg files to always return 1 as the n_frames attribute; some from some cameras have an embedded thumbnail in the metadata, but I don't think that is particularly standard?

Same environment as for #1630: Windows 10, Python 3.5.0, Pillow 3.0.0.

So I've added a hasattr workaround to avoid the Attribute error, but when the attribute exists, but is wrong? Harder to deal with...

v-python on 2 Jan 2016

Hi. I think the simple answer to your question is that 20151110-105826gl.jpg is not actually a JPEG file. It's an MPO, and Pillow reads it as such, giving it an n_frames attribute. This is a case of the file extension not reflecting the type of file.

JPEGs do not have multiple frames, and so currently don't have the n_frames attribute.

If you think that a change should be made to Pillow to clarify this somehow, feel free to say.

radarhere on 2 Jan 2016

With regards to OSError: image file is truncated (0 bytes not processed), the error can replicated with -

from PIL import Image
im = Image.open('test.jpg')
im.seek(1)
im.load()

So it has nothing to do with n_frames. Is there any reason you think that the file isn't truncated as the error suggests?

radarhere on 2 Jan 2016

Re: no n_frames for JPEG... I thought I saw some comments here and a pull request for adding n_frames to file types that didn't have them, for uniformity. Maybe that isn't released yet.

Re: MPO support. I had never heard of MPO until you mentioned them. I found a little info on Wikipedia, and my camera is listed as one that uses them. The picture is one I took with my camera. All consistent so far. I copied the file from the camera. I didn't truncate it. I don't know why the camera would. I haven't found good documentation for MPO yet, that explains what EXIF fields distinguish them from JPG. None of my other JPG software has complained about them not being JPG. This error may have nothing to do with n_frames, except that one shouldn't seek(1) if there are not n_frames, and n_frames produces 2.

Thanks again for your analysis. I have some things to learn now... but there may be a bug, or at least a necessary workaround, for MPO files named JPG that come off standard consumer cameras.

v-python on 2 Jan 2016

OK, I pulled a picture directly from the camera card. I found a version of the MPO spec at https://web.archive.org/web/20190227081740/http://www.cipa.jp/std/documents/e/DC-007_E.pdf and manually decoded my first MPO headers using a HEX editor, and as far as I can tell, it says 2 pictures. But also, as far as I (or PILLOW) can tell, there is only one picture.

So it would seem that 1. My camera lies OR 2. there is a newer version of the spec that says that cameras are allowed to lie.

It would seem that because my camera lies, that maybe other cameras lie also. So it would seem that maybe, in addition to checking the MPO image count field, that the 5.2.3.3.3 Individual Image Data Offset fields should be checked against EOF to determine the validity of the count.

Sad, when cameras lie.

v-python on 2 Jan 2016

Although, it seems (from Page 56 of the above standard) that files named .JPG are supposed to be treated as .JPG, and multi-image files are supposed to be named as .MPO, so perhaps even if a .JPG contains APP2 extension headers (MPO headers), that it should still be treated as a single image file format!

v-python on 2 Jan 2016

Here's my work-around code, if others bump in to this...

    # some formats don't get n_frames
    # some .jpg (having APP2/MPO headers) get n_frames != 1 !! (typically 2)
    if hasattr( im, 'n_frames'):
        imgcnt = im.n_frames
         # if named .jpg, only one image, really. Multi-image named .mpo
        if fnext.lower() == '.jpg':
            imgcnt = 1

v-python on 2 Jan 2016

Oh, fnext -- File Name EXTension -- a variable containing that, for checking the name. And im is the Image that corresponds.

v-python on 2 Jan 2016

From my reading of specification, extended multipicture files (multiple primary images) should have the extension MPO, and baseline multipicture files (single primary image) should have the extension JPG.

However, I believe that while multipicture files with the extension JPEG should only contain one primary image, they may still have 'MPO' features, and so should be recognised as MPOs. I don't believe that Pillow supports any of these additional features at the moment, but I think it's still better to recognise that it's not an ordinary JPG.

So the only way to stick further to the specification that I can see would be to drop any subsequent images, which seems unhelpful.

radarhere on 22 Jan 2016

Yes, I agree with your analysis of the specification, that is what I found too.

I really can't say why the JPG from my camera claims to have a second image. There is no second image... but the MPO data say there is.

There is no subsequent image to drop, so it would be quite impossible for Pillow to drop any subsequent images! However, there might be a compromise solution that would work better inside Pillow than my workaround hack above: checking the MPO data pointer for file size (already being done, maybe) and if the pointer is out of bounds _and_ the file name is JPG, then adjusting n_frames. This is more work, indeed, to calculate a lesser value for n_frames based on other data, rather than using the n_frames value that the MPO headers supply, but we have here a situation where the n_frames data is clearly wrong, from a camera that is "in the wild" busily creating files that have this inconsistency. Mine probably isn't the only model camera from that vendor that is doing that sort of thing.

This solution would allow future image files with JPG extension and multiple actual images to still work with Pillow, and allow access to all the images, yet would not require all Pillow programmers to be aware of the need to trap the "File is truncated" error as possibly spurious in the presence of JPG with n_frames > 1. Nor would they have to use my workaround above.

v-python on 22 Jan 2016

There is a property in ImageFile, LOAD_TRUNCATED_IMAGES, that will allow you to load the second image regardless. If you'd like to see the second image -

from PIL import Image, ImageFile
im = Image.open('test.jpg')
ImageFile.LOAD_TRUNCATED_IMAGES = True
im.seek(1)
im.show()

This means that to implement your suggestion, the value of n_frames would have to change based on the value of LOAD_TRUNCATED_IMAGES, and that behaviour sounds confusing to me.

radarhere on 22 Jan 2016

I was unaware of LOAD_TRUNCATED_IMAGES. But in this case, there is _no_ (second) image, not a truncated one.

Meaning that the "start" offset is already past EOF, rather than there being some bytes available to read.

v-python on 22 Jan 2016

Is this a bug then or not an issue?

aclark4life on 6 Jan 2017

Once radar educated me to the existence of MPO files, I would say the bug is really in the JPG file... but such JPG files are being created every click by lots of cameras. So that leaves Pillow a bunch of non-conformant JPG files to have thrown at it. The situation is detectable in two ways: the file extension, and the calculation of the second frame starting past EOF. So if it were to detect the condition, it could handle them like JPG, which would be helpful.

v-python on 6 Jan 2017

I've noticed in a similar image (available below) that the MPO issue causes a segfault when using PIL via the skimage.io module.

The image I used was the one referenced there (https://github.com/scikit-image/scikit-image/issues/2445):
https://cloud.githubusercontent.com/assets/13360214/21850830/6aadccd6-d847-11e6-84d6-e6e955592957.jpg

However, upon further digging, I found that while PIL segfaults if you try and seek to frame 1 and load, I was able to read the second image by reading in the binary data at the specified offset and loading that separately using PIL!

As far as I can tell this means that PIL just hasn't yet integrated the full MPO spec correctly, right?

Code to reproduce - the last line causes a segfault.

from PIL import Image, ImageFile
im = Image.open("./bad_image.jpg")
im.load()
print(im)
print("Size 1:", im.size)
print(im.n_frames)
print("In header thing:", im.mpinfo[0xB001])
offsets = [mp["DataOffset"] for mp in im.mpinfo[0xB002]]
print("Offsets:", offsets)
print(im._MpoImageFile__mpoffsets)

print("Reading starting from second offset...")

with open("./bad_image.jpg", "rb") as fin:
    #print(len(fin.read()))
    fin.seek(im._MpoImageFile__mpoffsets[1])
    #dat = fin.read()
    im2 = Image.open(fin)
    print("Size 2:", im2.size)
    im2.load()
    print("Load successful!")

print("Trying to load after seek(1)...")
im = Image.open("./bad_image.jpg")
im.load()
print(im.offset)
im.seek(1)
print("Seeked to 1, offset is:")
print(im.offset)
im.load()

jmetz on 8 May 2018

Looking at this further, MPO files contain multiple images. The images within the current Pillow test suite are 'Multi-Frame Image'. The two images causing a problem here have a 'Large Thumbnail' after the first image.

Are either of the images in this issue able to added as test images, under Pillow's license?

radarhere on 7 Jan 2019

Are either of the images in this issue able to added as test images, under Pillow's license?

@abuccts is the OP who posted the image in the issue (https://github.com/scikit-image/scikit-image/issues/2445) I referenced, they presumably will know about the usage rights for that image.

jmetz on 10 Jan 2019

I remember I crawled that image from Google Image Search two years ago but I cannot find the original link now. I think the usage rights is Fair Use.