Weasyprint: Pixbuf error: Unrecognized image file format

Created on 10 Apr 2014  ·  17 Comments  ·  Source: Kozea/WeasyPrint

Hi
I am running WeasyPrint 0.21 on Ubuntu (12.04) and installed all the dependencies according to http://weasyprint.org/docs/install/#debian-ubuntu including GdkPixbuf:
libgdk-pixbuf2.0-0 (2.26.1-1)
gir1.2-gdkpixbuf-2.0 (2.26.1-1)
libgdk-pixbuf2.0-dev (2.26.1-1)

Rendering HTML with PNG images to PDF works fine. I only run into problems when the page contains JPEG images. The error I get looks like this:

Pixbuf error: Unrecognized image file format

When I run gdk-pixbuf-query-loaders I can see:
"/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-jpeg.so"
"jpeg" 5 "gdk-pixbuf" "The JPEG image format" "LGPL"
"image/jpeg" ""
"jpeg" "jpe" "jpg" ""
"\377\330" "" 100

I also checked the jpg file with jpeginfo:
jpeginfo skateboard-1.jpg
skateboard-1.jpg 1200 x 1200 24bit JFIF P 64299

Am I missing anything needed for JPEG support in WeasyPrint?

All 17 comments

Do you get this problem with any JPEG image, or only some specific files? What about images in formats other than PNG or JPEG? Can you reproduce the issue using cairocffi.pixbuf directly, without WeasyPrint? Can you reproduce it on another machine?

Do you also get an exception with this snippet?

filename = b'./skateboard-1.jpg'

import cffi
ffi = cffi.FFI()
ffi.cdef('void* gdk_pixbuf_new_from_file (const char* filename, void** err);')
pixbuf = ffi.dlopen('gdk_pixbuf-2.0')
result = pixbuf.gdk_pixbuf_new_from_file(filename, ffi.NULL)
assert result != ffi.NULL

I have tried another JPEG and ended up with the same error. On my development machine (OSX, installed with Homebrew) everything works fine.

This is what I get when I run your snippet:

$ python ~/snippet.py 
Traceback (most recent call last):
  File "/home/mischa/snippet.py", line 8, in <module>
    assert result != ffi.NULL
AssertionError

Anything else I can try?

That this snippet fails shows that this is not a WeasyPrint bug, but something in your GDK-PixBuf install. Unfortunately I don’t know what else to suggest.

OK, no problem. Thanks for taking the time. I will post an update if I get this fixed.

Just an update on this: The pixbuf error was misleading. The problem is that the images inside the pdf are protected by a view that requires authentication.

I am using nginx as a proxy in front of Django. Nginx also serves the media folder (images for the PDF) but only internally. The images inside the PDF are accessed via a view which requires the user to be logged in. The view adds an X-Accel-Redirect header and requests the image from nginx.

This all works fine when the template is requested as a normal web page from an authenticated user.

The problem with the pdf is that the request from weasyprint to fetch the images is done as an AnonymousUser.

Is there a way for weasyprint to pass on the session when it needs to fetch protected resources like images?

> The problem is that the images inside the pdf are protected by a view that requires authentication.

Yeah, if fetching an image returns an HTML login form, "Unrecognized image file format" is expected.
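One way to confirm this diagnosis is to look at the first bytes of whatever the image URL actually returns. The following is a minimal sketch, not part of WeasyPrint: the helper name `sniff_payload` is hypothetical, the JPEG signature is the `"\377\330"` pattern listed in the gdk-pixbuf-query-loaders output above, and the PNG signature is the standard 8-byte PNG header.

```python
# Hypothetical helper: classify a fetched payload by its leading bytes.
JPEG_MAGIC = b'\xff\xd8'           # same bytes as "\377\330" in the loader listing
PNG_MAGIC = b'\x89PNG\r\n\x1a\n'   # standard PNG file signature


def sniff_payload(data):
    """Return 'jpeg', 'png', 'html' or 'unknown' for a bytes payload."""
    if data.startswith(JPEG_MAGIC):
        return 'jpeg'
    if data.startswith(PNG_MAGIC):
        return 'png'
    # A login page served in place of the image typically starts with markup.
    head = data.lstrip()[:64].lower()
    if head.startswith((b'<!doctype', b'<html')):
        return 'html'
    return 'unknown'
```

If the bytes fetched without a session classify as `'html'`, the image decoder is being handed a login page, and the Pixbuf message is a symptom rather than the cause.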

> Is there a way for weasyprint to pass on the session when it needs to fetch protected resources like images?

Not automatically, since WeasyPrint doesn’t know about your Django app and has no idea of what "the session" is.

However, you can write a custom "URL fetcher" function to tweak how WeasyPrint obtains the image, either by adding the appropriate cookies or other parameters to the HTTP request, or (if the image is stored on the same server that WeasyPrint is running on) by bypassing the network entirely and opening the file directly.

http://weasyprint.org/docs/tutorial/#url-fetchers
http://weasyprint.org/docs/api/#weasyprint.default_url_fetcher
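The cookie-based option could look roughly like the sketch below. It relies only on the documented fetcher contract (a callable that takes a URL and returns a dict with a `string` or `file_obj` key, optionally `mime_type`). The factory name `make_session_fetcher`, its parameters, and the injectable `opener` are assumptions made for illustration; `sessionid` is Django's default session cookie name and may differ in your settings.

```python
import urllib.request
from urllib.parse import urlparse


def make_session_fetcher(session_key, trusted_host, fallback,
                         cookie_name='sessionid',
                         opener=urllib.request.urlopen):
    """Build a URL fetcher that sends the session cookie to one host only.

    All names here are hypothetical; `fallback` would normally be
    weasyprint.default_url_fetcher.
    """
    def fetch(url):
        # Never leak the session cookie to third-party hosts.
        if urlparse(url).netloc != trusted_host:
            return fallback(url)
        request = urllib.request.Request(
            url, headers={'Cookie': '%s=%s' % (cookie_name, session_key)})
        response = opener(request)
        try:
            return {
                'string': response.read(),
                'mime_type': response.headers.get_content_type(),
            }
        finally:
            response.close()
    return fetch
```

In a Django view this might be wired up as `weasyprint.HTML(string=html, url_fetcher=make_session_fetcher(request.session.session_key, request.get_host(), weasyprint.default_url_fetcher))`, assuming the images are served from the same host as the view.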

Thanks. Yes, the "URL fetcher" did the job. I ended up opening the file directly. Here is the code for reference:


def photo_url_fetcher(url):
    """
    Custom way to fetch protected photos for weasyprint.
    """
    if '/media/photos/' in url:
        url = url.split('/media/')[1]
        with open(os.path.join(settings.MEDIA_ROOT, url)) as asset:
            contents = asset.read()
        return dict(string=contents)
    else:
        return weasyprint.default_url_fetcher(url)

You may save some memory by not loading the whole file in memory and using return dict(file_obj=open(…)). WeasyPrint will take care of closing the file. Though maybe the difference is not that big, since this data is short-lived anyway.
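A sketch of the `file_obj` variant suggested above follows. To keep it self-contained, the media root and the fallback fetcher are passed in as parameters instead of importing Django settings or WeasyPrint. The factory name `make_media_fetcher` and the containment check against the media root are additions for illustration and were not in the original code.

```python
import os


def make_media_fetcher(media_root, fallback, marker='/media/'):
    """Build a URL fetcher that serves files under `media_root` from disk.

    Hypothetical helper; `fallback` would normally be
    weasyprint.default_url_fetcher.
    """
    root = os.path.realpath(media_root)

    def fetch(url):
        if marker in url:
            relative = url.split(marker, 1)[1]
            path = os.path.realpath(os.path.join(root, relative))
            # Only serve files that really live under the media root.
            if os.path.commonpath([root, path]) == root:
                # The file is returned open and is not read into memory here;
                # per the comment above, WeasyPrint closes it afterwards.
                return dict(file_obj=open(path, 'rb'))
        return fallback(url)
    return fetch
```

Usage would be along the lines of `weasyprint.HTML(string=html, url_fetcher=make_media_fetcher(settings.MEDIA_ROOT, weasyprint.default_url_fetcher))`. The sketch does not decode percent-escapes or strip query strings from the URL, so real URLs may need that handled first.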

Good point. Thanks again and great project by the way :)

Please close the issue if you found a solution that works for you :)

Just to add to @mzu's thread: I had the same "Pixbuf error: Unrecognized image file format" error on a Gentoo machine, and it wasn't related to nginx.

To solve it, I had to recompile gdk-pixbuf with the following flags:

USE="jpeg jpeg2k tiff" emerge -av x11-libs/gdk-pixbuf

By the way, @SimonSapin, I've used several PDF-generating libraries so far (rst2pdf being the last one) and I only wish I had come across WeasyPrint earlier! Thanks.

@zzart, that sounds like a different issue. gdk-pixbuf-query-loaders would probably have given different results from @mzu’s, before you rebuilt PixBuf. I blame Gentoo’s over-zealousness in not enabling flags by default. “Who needs JPEG support in an image decoding library?”

I had the same "Pixbuf error: Unrecognized image file format" error. I updated shared-mime-info and the error went away.

I'm having the same issue on macOS. I'm new to WeasyPrint, and the links mentioned by @SimonSapin are broken. Any clues on how to fix this?

I have run into the same issue when using X-Accel-Redirect to protect uploaded files. I have slightly modified your code to work with Python 3 and Django 1.11.x:

def photo_url_fetcher(url):
    """
    Custom way to fetch protected photos for weasyprint.
    """
    if '/uploads/' in url:
        url = url.split('/uploads/')[1]
        with open(os.path.join(settings.MEDIA_ROOT, url), 'rb') as asset:
            contents = asset.read()
        return dict(string=contents)
    else:
        return weasyprint.default_url_fetcher(url)

Hello, I am getting the same error when I enable logging for WeasyPrint:

Failed to load image at "" (Pixbuf error: Unrecognized image file format)

I know what my issue is, but I don't know how to solve it with WeasyPrint. The image is served with an untrusted certificate, so it's an SSL issue, and I am not sure how to disable the SSL check in WeasyPrint.
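The thread does not answer this question. One possible approach, building on the custom URL fetcher mechanism described earlier, is to perform the HTTPS request yourself with an explicit `ssl` context. The sketch below uses only the Python standard library; the names `make_tls_context` and `make_tls_fetcher` are hypothetical. Passing the server's CA bundle via `cafile` keeps verification on and is preferable; with no `cafile`, the sketch disables verification entirely, which is only reasonable on a trusted network.

```python
import ssl
import urllib.request


def make_tls_context(cafile=None):
    """Return an SSL context: trust `cafile` if given, else verify nothing."""
    if cafile is not None:
        # Preferred: keep verification on, but trust the private CA.
        return ssl.create_default_context(cafile=cafile)
    context = ssl.create_default_context()
    # check_hostname must be switched off before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def make_tls_fetcher(context):
    """Build a URL fetcher that performs requests with the given context."""
    def fetch(url):
        with urllib.request.urlopen(url, context=context) as response:
            return {
                'string': response.read(),
                'mime_type': response.headers.get_content_type(),
            }
    return fetch
```

The resulting function could then be passed as `url_fetcher=make_tls_fetcher(make_tls_context('/path/to/ca.pem'))` when constructing `weasyprint.HTML`, where the CA path is a placeholder.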
