I'm running into memory limits when processing GIFs. I've noticed that saving a GIF uses a massive amount of memory.
The example below opens a 6.5M GIF, rotates it, and saves it to a new 4.9M GIF. Somehow Image.save uses ~1G to get it done!
Doing the same save with a JPEG uses less than 1M.
The entire script is shown in the profiler output.
$ python -m memory_profiler pillow_issue_gif.py
Filename: pillow_issue_gif.py
Line #    Mem usage    Increment   Line Contents
================================================
     3   18.934 MiB    0.000 MiB   @profile
     4                             def f():
     5   20.012 MiB    1.078 MiB       im = Image.open('MountWilson.gif')
     6                                 #im.rotate(45).show()
     7  129.625 MiB  109.613 MiB       im2 = im.rotate(45)
     8 1153.125 MiB 1023.500 MiB       im2.save('Test-new.gif', "GIF")
$ python -m memory_profiler pillow_issue_jpeg.py
Filename: pillow_issue_jpeg.py
Line #    Mem usage    Increment   Line Contents
================================================
     3   18.980 MiB    0.000 MiB   @profile
     4                             def f():
     5   20.055 MiB    1.074 MiB       im = Image.open('MountWilson.jpeg')
     6                                 #im.rotate(45).show()
     7  308.359 MiB  288.305 MiB       im2 = im.rotate(45)
     8  308.555 MiB    0.195 MiB       im2.save('Test-new.jpeg', "JPEG")
I'm kinda surprised the rotate operation uses so much memory too.
$ llg MountWilson
-rw-r--r--@ 1 minmac staff 6.5M 28 Aug 09:40 MountWilson.gif
-rw-r--r--@ 1 minmac staff 2.2M 28 Aug 09:43 MountWilson.jpeg
$ llg Test
-rw-r--r-- 1 minmac staff 1.7M 28 Aug 16:21 Test-new.jpeg
-rw-r--r-- 1 minmac staff 4.9M 28 Aug 16:24 Test-new.gif
OSX 10.11.6
python 2.7.12
pillow 3.1.1
I haven't profiled it on Linux, but I know it's happening there because I'm hitting the preset 1GB memory limit on AWS Lambda, which is running:
How big are the images in pixels?
FWIW, there are a couple of possibilities for the GIF taking more memory, but it's hard to tell at this granularity, since there are several things happening in the save command. I'd have to guess that it's quantization, but that's with no data.
The images are 9566 × 3909.
~1e4 times is quite a lot _more_ though isn't it.
With that pixel size, the raw size of the image is h x w x 4 bytes, which is ... about 150 megs. So your peak memory usage on save is O(5x) uncompressed image size.
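The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: `raw_size_bytes` is a hypothetical helper, and the 1-byte-per-pixel figure assumes the image stays in palette (P) mode.

```python
def raw_size_bytes(width, height, bytes_per_pixel):
    """Uncompressed in-memory size of an image, ignoring any per-row overhead."""
    return width * height * bytes_per_pixel


WIDTH, HEIGHT = 9566, 3909  # dimensions reported in this thread

rgb_bytes = raw_size_bytes(WIDTH, HEIGHT, 4)      # 4 bytes per pixel, as used for RGB storage
palette_bytes = raw_size_bytes(WIDTH, HEIGHT, 1)  # assumption: 1 byte per pixel in P mode

for label, size in (("4 bytes/pixel", rgb_bytes), ("1 byte/pixel", palette_bytes)):
    print("%s: %d bytes (%.1f MiB)" % (label, size, size / 2.0 ** 20))
```

The 4-byte figure works out to 149,573,976 bytes, which is the same number that shows up for ImagingNewArray in the Massif output further down.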
Stepping through this, when you open it, you're loading the header. Which is small.
Once you rotate it, you're loading the data into im, rotating it, and saving that copy into im2. That explains the 300 megs for the JPEG. The GIF is apparently taking a lot less memory in that step, which may mean that the interpolation in rotate is happening in the P mode, which would be... potentially bad for quality.
Here is where the processes diverge. The JPEG save is essentially a streamable function that compresses blocks as they come out of the image. It's got overhead, but (for the most part) it's not related to the entire image size. The GIF makes at least one copy, and potentially more in the quantization step.
There's probably some inefficiency in there, but I'd have to dig a little deeper to see exactly what modes you're getting and where the copies are coming from.
So, with a 36Mpx gif, I'm not seeing the same results. I've got a 140 Meg uncompressed size, both in P mode. It looks like it's taking the fast path, and I'm not even certain why it's apparently not copying the image in the _save unless the memory profiler just isn't picking that up.
Filename: test_large_gif.py
Line #    Mem usage    Increment   Line Contents
================================================
     3   15.598 MiB    0.000 MiB   @profile
     4                             def main():
     5   15.766 MiB    0.168 MiB       im = Image.open('lg.gif')
     6   15.773 MiB    0.008 MiB       print im.mode, im.tile
     7   15.773 MiB    0.000 MiB       print im.size, (im.width * im.height * 4) / (1024 *1024)
     8   85.840 MiB   70.066 MiB       im.load()
     9  120.875 MiB   35.035 MiB       im2 = im.rotate(45)
    10  120.875 MiB    0.000 MiB       print im2.mode
    11  156.105 MiB   35.230 MiB       im2.save('lg_out.gif')
Valgrind is not showing anything remarkably different either:
--------------------------------------------------------------------------------
Command: python test_large_gif.py
Massif arguments: (none)
ms_print arguments: massif.out.20111
--------------------------------------------------------------------------------
[Massif heap graph: peak of 149.3 MB; x-axis from 0 to 5.627 Gi]
Number of snapshots: 56
Detailed snapshots: [4, 12, 23, 28, 30 (peak), 37, 44]
Thanks @wiredfool. For now I'm happy with using JPEG but I suppose I would like to know how my GIF differs to yours.
I've attached the input GIF here if you're interested.

Ok, so there's something different about that gif (and consistent with what you're seeing) -- I'm seeing this in valgrind:
--------------------------------------------------------------------------------
Command: python test_large_gif.py
Massif arguments: (none)
ms_print arguments: massif.out.23666
--------------------------------------------------------------------------------
[Massif heap graph: peak of 1.311 GB; x-axis from 0 to 29.88 Gi]
Looking at the memory allocations, it's certainly different. At the top level, there's:
->64.28% (904,546,272B) 0x572C27: ??? (in /home/erics/vpy27/bin/python)
->21.26% (299,168,952B) 0x4FA112: PyList_New (in /home/erics/vpy27/bin/python)
->10.63% (149,573,976B) 0x8020C0F: ImagingNewArray (Storage.c:315)
(and that one is two copies and two new allocations):
For reference, the allocations for my previous one looked more like this, which is 93% image storage, including two new allocations and two copies:
->93.92% (147,000,000B) 0x8020C0F: ImagingNewArray (Storage.c:315)
As to why it's happening... I still suspect quantization. I'm wondering if there's a different code path when the palette is full vs when the palette is empty.
Right. This is suboptimal: the image palette is remapped in Python by looping over every byte of the image, using a range, which gets expensive when the number of pixels is large.
https://github.com/python-pillow/Pillow/blob/master/PIL/GifImagePlugin.py#L591
    if _get_optimize(im, info):
        used_palette_colors = _get_used_palette_colors(im)

        # create the new palette if not every color is used
        if len(used_palette_colors) < 256:
            palette_bytes = b""
            new_positions = {}

            i = 0
            # pick only the used colors from the palette
            for oldPosition in used_palette_colors:
                palette_bytes += source_palette[oldPosition*3:oldPosition*3+3]
                new_positions[oldPosition] = i
                i += 1

            # replace the palette color id of all pixel with the new id
            image_bytes = bytearray(im.tobytes())
            for i in range(len(image_bytes)):
                image_bytes[i] = new_positions[image_bytes[i]]
            im.frombytes(bytes(image_bytes))
            new_palette_bytes = (palette_bytes +
                                 (768 - len(palette_bytes)) * b'\x00')
            im.putpalette(new_palette_bytes)
            im.palette = ImagePalette.ImagePalette("RGB",
                                                   palette=palette_bytes,
                                                   size=len(palette_bytes))
It dates to this commit, to fix #211 https://github.com/python-pillow/Pillow/commit/a466b3e09982ebcf5aee2cbe957ce90e6ddfd0ae
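A rough estimate suggests where the two largest allocations in the Massif output come from. On Python 2, range() builds a full list, so range(len(image_bytes)) needs one list slot and (beyond the small cached integers) one int object per pixel. The sketch below assumes 64-bit CPython 2.7 with 8-byte pointers and 24-byte int objects; those sizes are assumptions, not measurements from this thread, and `range_list_cost` is a hypothetical helper.

```python
# Assumed sizes for 64-bit CPython 2.7 (not measured in this thread).
POINTER_BYTES = 8      # one list slot per element
INT_OBJECT_BYTES = 24  # one int object per element


def range_list_cost(n_items):
    """Estimate the bytes used by the list and the int objects that a
    Python 2 range(n_items) call would create."""
    list_bytes = n_items * POINTER_BYTES
    int_bytes = n_items * INT_OBJECT_BYTES
    return list_bytes, int_bytes


n_pixels = 9566 * 3909  # one palette index byte per pixel
list_bytes, int_bytes = range_list_cost(n_pixels)
print("list slots:  %d bytes" % list_bytes)
print("int objects: %d bytes" % int_bytes)
```

Under these assumptions the list comes to 299,147,952 bytes and the int objects to 897,443,856 bytes, which is close to the 299,168,952 B attributed to PyList_New and the 904,546,272 B in the unnamed frame above.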
So, for a workaround, when saving a largish gif:
im.save('foo.gif', optimize=False)
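If small GIFs should still get an optimized palette, the workaround can be applied only above some size. The following is a minimal sketch, not part of Pillow: `save_gif` and `MAX_OPTIMIZE_PIXELS` are hypothetical names, and the cutoff value is an arbitrary assumption. It relies only on the image object having a `size` attribute and the `save(..., optimize=False)` call shown above.

```python
MAX_OPTIMIZE_PIXELS = 1000000  # hypothetical cutoff; tune for your memory budget


def save_gif(im, path, max_optimize_pixels=MAX_OPTIMIZE_PIXELS):
    """Save `im` as a GIF, turning palette optimization off for large images.

    Returns True if the library's default optimization setting was left alone,
    False if optimize=False was passed.
    """
    width, height = im.size
    kwargs = {}
    if width * height > max_optimize_pixels:
        # Skip the per-pixel palette remapping for big images.
        kwargs["optimize"] = False
    im.save(path, "GIF", **kwargs)
    return "optimize" not in kwargs


# Usage, assuming Pillow is installed:
#   from PIL import Image
#   im2 = Image.open('MountWilson.gif').rotate(45)
#   save_gif(im2, 'Test-new.gif')
```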
Future possible fixes:
1) Don't optimize palettes for anything bigger than N mpix.
2) Don't optimize palettes if the palette doesn't change
3) Don't do it in python by looping over every pixel.
4) There should be a palette mapping function in the c layer of the palette code.
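Option 3 could be sketched in pure Python with a 256-entry lookup table and bytes.translate, which does the remapping in C without creating an index per pixel. This is an illustration only, not the fix that went into Pillow; `remap_palette_indices` is a hypothetical name, and unused indices are mapped to themselves on the assumption that they never occur in the image data.

```python
def remap_palette_indices(image_bytes, used_palette_colors):
    """Remap palette indices so that the used colors become 0, 1, 2, ...

    image_bytes: raw palette-index bytes, one per pixel.
    used_palette_colors: the old palette indices that occur in the image.
    """
    # Start from the identity mapping; indices that are not in
    # used_palette_colors are assumed not to appear in image_bytes.
    table = bytearray(range(256))
    for new_index, old_index in enumerate(used_palette_colors):
        table[old_index] = new_index
    # translate() applies the lookup table to every byte in one pass.
    return bytes(image_bytes).translate(bytes(table))


pixels = b"\x05\x09\x05\xc8"  # palette indices 5, 9, 5, 200
remapped = remap_palette_indices(pixels, [5, 9, 200])
print(list(bytearray(remapped)))
```

The only per-image allocations here are the copy made by bytes() and the translated result, each one byte per pixel.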
Also, FWIW, that image is probably best off as a PNG.
Right. Wow! Thanks @wiredfool .
Excuse my ignorance (and perhaps a link is better than an answer) but what makes PNG the best format in this case?
Roughly, if you've got continuous tone, like a photograph, JPEG is your best bet. Its lossy compression is tripped up by high-contrast edges, like text.
If you have images which aren't continuous tone, PNG is generally the best choice. It can do full color or palette images (though lossless compression of photos is a lot larger than JPEG's lossy) and it handles most sorts of transparency. Its compression is basically gzip.
GIF predates both of them, and is an 8-bit (or less) palette image. It's got 1 bit of transparency. I'd say the biggest use of GIF these days is for multiframe images.