Pillow: Saving GIF uses huge amount of memory

Created on 28 Aug 2016 · 11 Comments · Source: python-pillow/Pillow

I'm hitting memory limits when processing GIFs. I've noticed that saving a GIF uses a massive amount of memory.

The example below opens a 6.5M GIF, rotates it, and saves it to a new 4.9M GIF. Somehow Image.save uses ~1G to get it done!

For the same image as a JPEG, the save step uses less than 1M.

Demonstration

The entire script is shown in the profiler output.

Profiler results

GIF

$ python -m memory_profiler pillow_issue_gif.py
Filename: pillow_issue_gif.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   18.934 MiB    0.000 MiB   @profile
     4                             def f():
     5   20.012 MiB    1.078 MiB       im = Image.open('MountWilson.gif')
     6                                 #im.rotate(45).show()
     7  129.625 MiB  109.613 MiB       im2 = im.rotate(45)
     8 1153.125 MiB 1023.500 MiB       im2.save('Test-new.gif', "GIF")

JPEG

$ python -m memory_profiler pillow_issue_jpeg.py
Filename: pillow_issue_jpeg.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   18.980 MiB    0.000 MiB   @profile
     4                             def f():
     5   20.055 MiB    1.074 MiB       im = Image.open('MountWilson.jpeg')
     6                                 #im.rotate(45).show()
     7  308.359 MiB  288.305 MiB       im2 = im.rotate(45)
     8  308.555 MiB    0.195 MiB       im2.save('Test-new.jpeg', "JPEG")

I'm kinda surprised the rotate operation uses so much memory too.

Input files

$ llg MountWilson
-rw-r--r--@  1 minmac  staff   6.5M 28 Aug 09:40 MountWilson.gif
-rw-r--r--@  1 minmac  staff   2.2M 28 Aug 09:43 MountWilson.jpeg

Output files

$ llg Test
-rw-r--r--   1 minmac  staff   1.7M 28 Aug 16:21 Test-new.jpeg
-rw-r--r--   1 minmac  staff   4.9M 28 Aug 16:24 Test-new.gif

Environment Information

OSX 10.11.6
python 2.7.12
pillow 3.1.1

I haven't profiled it on Linux, but I know it's happening there because I'm reaching the preset 1GB memory limit on AWS Lambda, which is running:

  • Linux kernel version – 4.1.27-25.49.amzn1.x86_64
  • python 2.7
  • pillow 3.3.0

All 11 comments

How big are the images in pixels?

FWIW, there are a couple of possibilities for the GIF taking more memory, but it's hard to tell at this granularity, since there are several things happening inside the save call. My guess is quantization, but that's with no data.

The images are 9566 × 3909.

~1e4 times is quite a lot _more_ though isn't it.

With that pixel size, the raw size of the image is h x w x 4 bytes, which is ... about 150 megs. So your peak memory usage on save is O(5x) uncompressed image size.
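
The arithmetic behind that estimate can be checked with a short sketch. The dimensions are the ones reported above; the 1-byte-per-pixel line is added on the assumption that a palette ("P" mode) image is stored as one 8-bit index per pixel.

```python
# Dimensions reported earlier in the thread.
width, height = 9566, 3909

pixels = width * height
raw_rgba_bytes = pixels * 4  # 4 bytes per pixel, as in the estimate above
raw_p_bytes = pixels * 1     # assumed: 1 byte per pixel for a "P" mode image

MIB = 1024 * 1024
print("pixels:        %d" % pixels)
print("4 bytes/pixel: %.1f MiB" % (raw_rgba_bytes / float(MIB)))
print("1 byte/pixel:  %.1f MiB" % (raw_p_bytes / float(MIB)))
```

The 4-bytes-per-pixel figure comes to roughly 143 MiB, which is where "about 150 megs" comes from.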

Stepping through this, when you open it, you're loading the header. Which is small.
Once you rotate it, you're loading the data into im, rotating it, and saving that copy into im2. That explains the 300 megs for the JPEG. The GIF is apparently taking a lot less memory in that step, which may mean that the interpolation in rotate is happening in the P mode, which would be... potentially bad for quality.

Here is where the processes diverge. The JPEG save is essentially a streamable function that compresses blocks as they come out of the image. It's got overhead, but (for the most part) it's not related to the entire image size. The GIF makes at least one copy, and potentially more in the quantization step.

There's probably some inefficiency in there, but I'd have to dig a little deeper to see exactly what modes you're getting and where the copies are coming from.

So, with a 36Mpx gif, I'm not seeing the same results. I've got a 140 Meg uncompressed size, both in P mode. It looks like it's taking the fast path, and I'm not even certain why it's apparently not copying the image in the _save unless the memory profiler just isn't picking that up.

Filename: test_large_gif.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   15.598 MiB    0.000 MiB   @profile
     4                             def main():
     5   15.766 MiB    0.168 MiB       im = Image.open('lg.gif')
     6   15.773 MiB    0.008 MiB       print im.mode, im.tile
     7   15.773 MiB    0.000 MiB       print im.size, (im.width * im.height * 4) / (1024 *1024)
     8   85.840 MiB   70.066 MiB       im.load()
     9  120.875 MiB   35.035 MiB       im2 = im.rotate(45)
    10  120.875 MiB    0.000 MiB       print im2.mode
    11  156.105 MiB   35.230 MiB       im2.save('lg_out.gif')

Valgrind is not showing anything remarkably different either:

--------------------------------------------------------------------------------
Command:            python test_large_gif.py
Massif arguments:   (none)
ms_print arguments: massif.out.20111
--------------------------------------------------------------------------------


    MB
149.3^                                                               ::       
     |                                           ####:::::::::@::::::: @::::::
     |                                           #   : : ::: :@:: :::: @: : ::
     |                                           #   : : ::: :@:: :::: @: : ::
     |                                           #   : : ::: :@:: :::: @: : ::
     |                                 @@@@@@@@@@#   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |                                 @         #   : : ::: :@:: :::: @: : ::
     |    @:::::::::@:::::::::::@::::::@         #   : : ::: :@:: :::: @: : ::
     |    @:: :::: :@::::::: :::@: ::: @         #   : : ::: :@:: :::: @: : ::
     |    @:: :::: :@::::::: :::@: ::: @         #   : : ::: :@:: :::: @: : ::
     |    @:: :::: :@::::::: :::@: ::: @         #   : : ::: :@:: :::: @: : ::
     |   :@:: :::: :@::::::: :::@: ::: @         #   : : ::: :@:: :::: @: : ::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   5.627

Number of snapshots: 56
 Detailed snapshots: [4, 12, 23, 28, 30 (peak), 37, 44]

Thanks @wiredfool. For now I'm happy with using JPEG, but I would like to know how my GIF differs from yours.

I've attached the input GIF here if you're interested.

[attached image: mount_wilson]

Ok, so there's something different about that gif (and consistent with what you're seeing) -- I'm seeing this in valgrind:

--------------------------------------------------------------------------------
Command:            python test_large_gif.py
Massif arguments:   (none)
ms_print arguments: massif.out.23666
--------------------------------------------------------------------------------


    GB
1.311^                                                                     #  
     |        @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#  
     |        @                                                            #  
     |        @                                                            #  
     |        @                                                            #  
     |       :@                                                            #::
     |       :@                                                            #::
     |       :@                                                            #::
     |       :@                                                            #::
     |       :@                                                            #::
     |      ::@                                                            #::
     |      ::@                                                            #::
     |      ::@                                                            #::
     |      ::@                                                            #::
     |      ::@                                                            #::
     |      ::@                                                            #::
     |      ::@                                                            #::
     |      ::@                                                            #::
     |     @::@                                                            #::
     |   @@@::@                                                            #::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   29.88

Looking at the memory allocations, it's certainly different. At the top level, there's:

->64.28% (904,546,272B) 0x572C27: ??? (in /home/erics/vpy27/bin/python)
->21.26% (299,168,952B) 0x4FA112: PyList_New (in /home/erics/vpy27/bin/python)
->10.63% (149,573,976B) 0x8020C0F: ImagingNewArray (Storage.c:315)

(The last entry, ImagingNewArray, is two copies and two new allocations.)

For reference, the allocations for my previous test looked more like this: 93% image storage, including two new allocations and two copies:

->93.92% (147,000,000B) 0x8020C0F: ImagingNewArray (Storage.c:315)

As to why it's happening... I still suspect quantization. I'm wondering if there's a different code path when the palette is full vs when the palette is empty.

Right. This is suboptimal. The image palette is remapped in Python by looping over every byte of the image, using a range, which gets very expensive when the number of pixels is large.

https://github.com/python-pillow/Pillow/blob/master/PIL/GifImagePlugin.py#L591

    if _get_optimize(im, info):
        used_palette_colors = _get_used_palette_colors(im)

        # create the new palette if not every color is used
        if len(used_palette_colors) < 256:
            palette_bytes = b""
            new_positions = {}

            i = 0
            # pick only the used colors from the palette
            for oldPosition in used_palette_colors:
                palette_bytes += source_palette[oldPosition*3:oldPosition*3+3]
                new_positions[oldPosition] = i
                i += 1

            # replace the palette color id of all pixel with the new id
            image_bytes = bytearray(im.tobytes())
            for i in range(len(image_bytes)):
                image_bytes[i] = new_positions[image_bytes[i]]
            im.frombytes(bytes(image_bytes))
            new_palette_bytes = (palette_bytes +
                                 (768 - len(palette_bytes)) * b'\x00')
            im.putpalette(new_palette_bytes)
            im.palette = ImagePalette.ImagePalette("RGB",
                                                   palette=palette_bytes,
                                                   size=len(palette_bytes))

It dates to this commit, which fixed #211: https://github.com/python-pillow/Pillow/commit/a466b3e09982ebcf5aee2cbe957ce90e6ddfd0ae
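
A rough accounting, under stated assumptions, of why that loop costs so much on Python 2: there, range(n) materializes a full list, so the loop needs one pointer per pixel for the list and one int object per pixel. The 8-byte pointer and 24-byte int sizes below are assumptions for a 64-bit CPython 2.7 build. Attributing specific valgrind entries to these allocations is a plausible reading, not something confirmed in the thread.

```python
# Pixel count of the reporter's image (9566 x 3909).
pixels = 9566 * 3909

POINTER_BYTES = 8      # assumed: 64-bit build, one list slot per element
INT_OBJECT_BYTES = 24  # assumed: size of a Python 2 int object on 64-bit

list_bytes = pixels * POINTER_BYTES    # the list created by range(n)
int_bytes = pixels * INT_OBJECT_BYTES  # the int objects the list refers to

print("list slots:  %d bytes" % list_bytes)
print("int objects: %d bytes" % int_bytes)
print("total:       %.2f GiB" % ((list_bytes + int_bytes) / float(1024 ** 3)))
```

Under these assumptions the two numbers land within a few percent of the PyList_New entry (299,168,952B) and the largest unnamed entry (904,546,272B) in the valgrind output above.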

So, for a workaround, when saving a largish gif:

im.save('foo.gif', optimize=False)
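
One way to apply this workaround only where it matters is a small wrapper that turns optimization off above a pixel threshold. This is a sketch: save_gif and max_optimize_pixels are hypothetical names, and the threshold value is arbitrary. The only Pillow behavior relied on is that Image.save accepts the optimize keyword for GIF, as shown above.

```python
def save_gif(im, path, max_optimize_pixels=4 * 1000 * 1000):
    """Save `im` as a GIF, skipping palette optimization for large images.

    `im` is expected to be a PIL.Image.Image; only `.size` and `.save()`
    are used. `max_optimize_pixels` is an arbitrary, hypothetical cutoff.
    Returns the value that was passed as `optimize`.
    """
    width, height = im.size
    optimize = (width * height) <= max_optimize_pixels
    im.save(path, "GIF", optimize=optimize)
    return optimize
```

For the script at the top of this issue, the call would be save_gif(im2, 'Test-new.gif'), which saves with optimize=False because the image is about 37 megapixels.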

Future possible fixes:
1) Don't optimize palettes for anything bigger than N mpix.
2) Don't optimize palettes if the palette doesn't change
3) Don't do it in python by looping over every pixel.
4) There should be a palette mapping function in the c layer of the palette code.
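
As an illustration of fixes 2 and 3 without touching the C layer, the per-pixel loop can be replaced by a 256-entry lookup table applied with bytes.translate, which does the remapping in C in a single pass. This is a sketch, not the change that went into Pillow: remap_indices and needs_remap are hypothetical helpers, and used_palette_colors is assumed to be a list of the palette indices in use, as in the snippet above.

```python
def needs_remap(used_palette_colors):
    """Return False when the used colors already sit at positions 0..N-1."""
    return any(old != new for new, old in enumerate(used_palette_colors))


def remap_indices(image_bytes, used_palette_colors):
    """Remap palette indices so the used colors occupy positions 0..N-1."""
    # Start from the identity mapping; entries for unused indices are never
    # looked up, because those indices do not occur in the image data.
    table = bytearray(range(256))
    for new_index, old_index in enumerate(used_palette_colors):
        table[old_index] = new_index
    # translate() walks the buffer once in C: no per-pixel Python loop and
    # no list of indices.
    return bytes(image_bytes).translate(bytes(table))


data = bytes(bytearray([5, 9, 5, 200, 9]))
print(list(bytearray(remap_indices(data, [5, 9, 200]))))
```

The extra memory is one copy of the image bytes plus a 256-byte table, instead of a list with one element per pixel.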

Also, FWIW, that image is probably best off as a PNG.

Right. Wow! Thanks @wiredfool.

Excuse my ignorance (and perhaps a link is better than an answer) but what makes PNG the best format in this case?

Roughly, if you've got continuous tone, like a photograph, JPEG is your best bet. Its lossy compression is tripped up by high-contrast edges, like text.

If you have images that aren't continuous tone, PNG is generally the best choice. It can do full color or palette images (though lossless compression of photos is a lot larger than JPEG's lossy compression), and it handles most sorts of transparency. Its compression is basically gzip.

GIF predates both of them, and is an 8-bit (or less) palette image. It's got 1 bit of transparency. I'd say the biggest use of GIF these days is for multiframe images.
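
To see why gzip-style (DEFLATE) compression suits flat, non-continuous images, here is a small sketch using Python's zlib module on raw bytes. It is not a PNG encoder, and the random buffer is only a crude stand-in for photographic detail.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 256 * 256           # one byte per "pixel"

flat = bytes(bytearray([7]) * n)                                # a single flat color
noisy = bytes(bytearray(rng.randrange(256) for _ in range(n)))  # no repetition to exploit

flat_size = len(zlib.compress(flat))
noisy_size = len(zlib.compress(noisy))
print("flat:  %d -> %d bytes" % (n, flat_size))
print("noisy: %d -> %d bytes" % (n, noisy_size))
```

The flat buffer shrinks to a tiny fraction of its size, while the noisy one barely shrinks at all, which is the trade-off described above.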
