Memory is slowly leaking when processing many image files one by one.
Leak size:
100K images --> ~100MB
300K images --> ~200MB
1M images --> ~1GB
Here you can find reproducing code examples: https://github.com/PyWavelets/pywt/issues/180
@dmpetrov Please can you give a minimal code example reproducing the problem that uses Pillow but no third-party libraries like numpy or pywt? Thanks!
```python
import os
import PIL
from PIL import Image

dir = "/Volumes/Seagate/storage/kaggle/avito/images"

onlyfiles = []
dirs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
for d in dirs:
    filedir = os.path.join(dir, str(d))
    files = [f for f in os.listdir(filedir) if os.path.isfile(os.path.join(filedir, f))]
    filepaths = map(lambda x: os.path.join(filedir, x), files)
    onlyfiles += list(filepaths)

s = 0
for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    image = PIL.Image.open(fname)
    # Do something
    s += image.size[0] * image.size[1]
    if i % 10000 == 0:
        print('iteration ', i)

input("Press Enter to continue...")
```
Input data: image dataset (10M small images) from Kaggle competition https://www.kaggle.com/c/avito-duplicate-ads-detection
Result: It starts with 221MB for the file names.
500 images - 374MB
1M images - 510MB
Calling gc.collect() inside the loop did not help.
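For reference, the gc attempt would have looked roughly like this (my reconstruction, not the reporter's exact code; it mirrors the repro above by opening files without closing them, and the forced collection does not reclaim the leaked memory):

```python
import gc

def process(paths):
    # Mirrors the repro above: files are opened and never closed.
    total = 0
    for i, fname in enumerate(paths):
        f = open(fname, 'rb')
        total += len(f.read())
        f = None  # drop the reference without calling close()
        if i % 10000 == 0:
            gc.collect()  # forced full collection; it did not help here
    return total
```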
Do we know what release this started in? I'd like to pin to a lower release to mitigate any production impact without having to set memory scaling alarm/actions.
@dmpetrov I can't reproduce this on Ubuntu 14.04 with Python 2.7 on Pillow master. Could you tell your Pillow, Python, OS, libjpeg versions?
@damaestro This is not confirmed yet.
@dmpetrov I've tried Python 2.7 and 3.4 under OS X. I can confirm the leak in 3.4.
So, this is the minimal test case that I arrived at:
https://gist.github.com/homm/c6a79ce7b445f47a74a4d296742a5af8
As you can see, there is no Pillow at all! I'm pretty sure this is pure Python 3 memory leak.
The script accepts a path to a root folder containing tons of files as an argument and iterates over all files in the folder twice. The first time it opens each file and calls the .close() method; at this stage, memory usage doesn't grow. The second time the script doesn't call the .close() method, and memory usage constantly increases. I've tested Python 3.4 and 3.5 on OS X and Python 3.4 on Ubuntu 14.04. The leak doesn't appear on Python 2.7.
This is a memory leak in Python itself, because there are no references to the file objects left in application code. It is most likely only a memory leak, not a file descriptor leak: at least lsof doesn't show any opened files after the script ends. The leak appears only with distinct files; it doesn't occur if we reopen the same file again and again. Average memory consumption is 370 bytes per file.
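The gist linked above boils down to something like the following (a sketch, not the exact gist; the peak-RSS reporting via the Unix-only resource module is my addition):

```python
import os
import resource  # Unix-only; used here to report peak memory usage
import sys

def peak_rss():
    # ru_maxrss is in kilobytes on Linux and in bytes on OS X
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def scan(root, close):
    n = 0
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isfile(path):
            f = open(path, 'rb')
            if close:
                f.close()
            f = None  # drop the only reference either way
            n += 1
    return n

if __name__ == '__main__' and len(sys.argv) > 1:
    root = sys.argv[1]
    scan(root, close=True)
    print('after closed pass:', peak_rss())
    scan(root, close=False)  # memory grows here on Python 3.4/3.5
    print('after unclosed pass:', peak_rss())
```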
As I understand it, we can't fix this at Pillow's level, because we assume the file pointer may be shared with other code. There are two possibilities: it can be fixed in Python itself, or it can be worked around at the application level. Something like this:

```python
f = open(filename, 'rb')
image = Image.open(f)
image.load()
f.close()
```

instead of `Image.open(filename)`.
I want to invite @asvetlov to the thread. Maybe he can clarify something.
What happens when using `with open(file.path, 'rb', 0) as f:`?
@homm thank you for the investigation. It is becoming more and more interesting
Yes, I use Python 3.5. As far as I know, Python does not guarantee that a file will be closed. So it might be a Python 3+ "feature", not a bug.
It would be great to have opinions of Python experts.
Python does not guarantee that file will be closed.
Of course it guarantees that. Moreover, it guarantees that exactly the .close() method will be called; it just doesn't guarantee when this will happen, that is all.
But as I said, we are not talking about a file descriptor leak. All files are closed. The problem is that there is a memory leak.
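The "all files are closed" part can be checked from inside the process (a sketch with a helper of my own, not something from this thread; it probes the low descriptor range, where os.fstat raises OSError for a closed descriptor):

```python
import os

def open_fd_count(limit=256):
    # Count open file descriptors in the low fd range.
    # os.fstat on a closed descriptor raises OSError.
    count = 0
    for fd in range(limit):
        try:
            os.fstat(fd)
            count += 1
        except OSError:
            pass
    return count
```

If the count returns to its baseline after the loop, descriptors are not leaking, and only memory is.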
I'm seeing the issue with Py 2.7 running thumbor, so I don't think this is isolated to Py 3.x.

@damaestro
so I don't think this is isolated to Py 3.x.
Any reason why you think this is the same leak? Memory leaks are not isolated to Python 3, or to Python, or to any other platform or library. In this thread we are discussing a leak in Python 3 that affects Pillow users. You can report thumbor leaks in the appropriate place.
@hugovk The with statement correctly frees all memory, just as .close() does.
Guys, sorry.
I really don't know of any difference between f.close()/with f and del f, except that deleting a non-closed file should raise a warning along with actually closing the file.
But explicit resource cleanup (either via the with statement or by a .close() call) is considered good practice.
deleting of non-closed file should raise a warning
I haven't met this in the docs. Could you elaborate?
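The warning in question can be observed directly (a sketch; it relies on CPython's reference counting to finalize the file object immediately when the last reference is dropped):

```python
import os
import tempfile
import warnings

# Create a real file to open (path is illustrative, made in a temp dir)
path = os.path.join(tempfile.mkdtemp(), 'leak-demo.txt')
with open(path, 'w') as out:
    out.write('hello')

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    f = open(path, 'rb')
    del f  # last reference dropped without close(); CPython warns here

# A ResourceWarning ("unclosed file ...") was emitted during finalization
got_warning = any(issubclass(w.category, ResourceWarning) for w in caught)
```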
I'll try to simplify the test and file a report to the Python bug tracker during the holidays.
Also, I'm sure that we can at least explicitly close files that were opened in Image.open by filename. This should fix the bug on our side for most users.
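The idea could look roughly like this (a hypothetical sketch, not Pillow's actual code: the object remembers whether it opened the file itself, and only then closes it once the data has been consumed):

```python
class OpensByName:
    """Close the file ourselves only when we opened it by filename."""

    def __init__(self, fp_or_path):
        if isinstance(fp_or_path, str):
            self.fp = open(fp_or_path, 'rb')
            self._exclusive_fp = True    # nobody else holds this fp
        else:
            self.fp = fp_or_path
            self._exclusive_fp = False   # caller may still use the fp

    def load(self):
        data = self.fp.read()
        if self._exclusive_fp:
            self.fp.close()  # safe to close: the fp is not shared
        return data
```

A file passed in by the caller is left open, preserving the "file pointer may be shared" assumption discussed earlier in the thread.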
Looks like this is exactly a warnings issue. If I add f._dealloc_warn(f) right before close, the memory leaks as well.
Memory is allocated to store every warning that has already occurred, to prevent it from occurring a second time. A horrible decision, in my opinion: unclosed files might lead to memory leaks, while storing a warning message for every file is an absolutely guaranteed memory leak.
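This caching can be demonstrated directly with warn_explicit, which accepts an explicit registry dict standing in for a module's __warningregistry__ (a sketch of the mechanism under the 'default' filter action; the message text is a made-up stand-in for the real "unclosed file" message, which embeds each file object's repr and is therefore unique per file):

```python
import warnings

registry = {}  # stands in for a module's __warningregistry__
with warnings.catch_warnings(record=True):
    warnings.simplefilter('default')
    for i in range(1000):
        # Each unclosed file produces a distinct message, so each one
        # gets its own cache entry that is never released.
        warnings.warn_explicit(
            'unclosed file <_io.BufferedReader name=%d>' % i,
            ResourceWarning, 'example.py', 1, registry=registry)

# registry now holds roughly one entry per distinct message
```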
@homm no reason. I'll continue to follow this discussion and if I find something indicating it's an issue with Pillow I'll speak up. Thanks.
The issue in the Python tracker: http://bugs.python.org/issue27535
Sorry, moving this to 3.5.0 because of lack of time. The workaround is to use file objects instead of file names:
```python
with open(filename, 'rb') as f:
    image = Image.open(f)
    # We need to consume the whole file inside the `with` statement
    image.load()

...

# Unref here to be sure that nothing happens
# with the image after the file is closed
image = None
```
@homm Sorry, it's a little unclear what the status of this bug is. Can you confirm that the bug is still present in 3.4.2 and 4.0.0? Thank you
@wiredfool Is it fixed?
It was partly fixed in 4.1.0, but for some formats (GIF, for example) the problem still exists.
Is this now fixed on master?
If anyone in this issue has thoughts on #3577, that could be helpful.
The issue in the Python tracker: http://bugs.python.org/issue27535
This has been fixed in Python 3.7.
Note that implicitly closing an image's file has now been deprecated - https://pillow.readthedocs.io/en/stable/releasenotes/6.1.0.html#deprecations
Support has now been removed for implicitly closing an image's file - #3577
It's hard to follow this issue. Can someone explain what the suggested solution is?
@SaschaHeyer I think the problem discussed here is likely resolved. If you are still running into something, I'd recommend that you open a new issue with a self-contained example.
@radarhere
Thank you for your quick response.
I wanted to know how the issue is resolved.
What steps are required to solve it?
If you are using the latest version of Pillow, then make sure that you close images properly, either by explicitly calling close():

```python
im = Image.new("RGB", (100, 100))
im.close()
```

or by using a context manager:

```python
with Image.open("hopper.jpg") as im:
    pass
```
Let's close this, and we can re-open if needed, or open a new issue.