PyMuPDF: Memory Leak In getPixmap

Created on 29 Oct 2019 · 3 Comments · Source: pymupdf/PyMuPDF

_Please provide all mandatory information!_

Describe the bug (mandatory)

getPixmap causes a memory leak when the PDF file contains many images.

To Reproduce (mandatory)

Explain the steps to reproduce the behavior. For example, include a minimal code snippet, example files, etc.

import fitz
import tracemalloc

tracemalloc.start(1000)
t1 = tracemalloc.take_snapshot()
for i in range(100):
    # Raw string, so that \t and \b in the Windows path are not read as escapes.
    doc = fitz.open(r"c:\temp\bbb.pdf")
    page = doc.loadPage(0)
    mat = fitz.Matrix(3, 3)  # zoom factor 3 in both directions
    pix = page.getPixmap(matrix=mat)
    del pix
    del page
    doc.close()
t2 = tracemalloc.take_snapshot()

# Compare the two snapshots, grouped by source line.
stats = t2.compare_to(t1, 'lineno')

for stat in stats:
    print(stat)

bbb.pdf contains many bitmap objects.

Your configuration (mandatory)

Operating system, potentially version and bitness
Python version, bitness
PyMuPDF version, installation method (wheel or generated from source).

OS :
windows 10 / 64 bit

python :
C:\Python37>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

PyMuPDF
1.16.5

For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).

3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)]
win32

PyMuPDF 1.16.5: Python bindings for the MuPDF 1.16.0 library.
Version date: 2019-10-13 22:50:03.
Built for Python 3.7 on win32 (32-bit).

Additional context (optional)

tracemalloc result
C:\Python37\lib\site-packages\fitz\fitz.py:4766: size=210 MiB (+210 MiB), count=3203 (+3203), average=67.2 KiB
C:\Python37\lib\site-packages\fitz\fitz.py:3502: size=22.3 MiB (+22.3 MiB), count=3201 (+3201), average=7306 B

bug

cherryjo18

All 3 comments

I believe this is not the case.
The base library MuPDF, in contrast, deliberately keeps things in memory, up to some provided limit, and, for images, even after the document has been closed.
A realistic scenario for measuring memory occupation must therefore not only close the document, but also forcibly free this MuPDF buffer before measuring memory deltas.
PyMuPDF has tools to do exactly this. Here is a script to convince you. It reads a document given on the command line niter times and creates a pixmap for each of its pages.
For a 100-page scientific magazine with 1137 images, it produced the following output (niter = 10). The Adobe manual, with 180 images on 1310 pages, looks similar.

File 'sdw_2015_06.pdf'
Memory deltas (MB) for Page.getPixmap
0 stop delta 5.8
1 stop delta 0.9
2 stop delta 0.5
3 stop delta -0.1
4 stop delta 0.4
5 stop delta -0.7
6 stop delta 0.6
7 stop delta -0.3
8 stop delta 0.6
9 stop delta 0.1
================================================================================
Total pages processed: 1000
Duration: 34 sec
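
One way to read these figures, as a rough check rather than a formal analysis, is to add up the per-iteration deltas. The values below are copied from the output above; the variable names are only illustrative.

```python
# Per-iteration memory deltas (MB), copied from the sample output above.
deltas = [5.8, 0.9, 0.5, -0.1, 0.4, -0.7, 0.6, -0.3, 0.6, 0.1]

total = round(sum(deltas), 1)            # growth over all 10 passes
after_first = round(sum(deltas[1:]), 1)  # growth once the first pass is excluded

print(f"total growth: {total} MB")
print(f"growth after first pass: {after_first} MB")
print(f"average per later pass: {after_first / 9:.2f} MB")
```

Most of the growth falls in the first pass, and the later passes fluctuate around zero. The original report and this run use different files and measuring methods, so any comparison between them is only indicative.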

memory-pix.zip
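
The attached script is not reproduced here. As a minimal sketch of the approach the comment describes (render, close, shrink the MuPDF store, and only then measure), something like the following could be used. The function names `memory_deltas`, `render_all_pages` and `shrink_mupdf_store` are hypothetical, and the use of `tracemalloc` follows the original report rather than the attached script, which may measure memory differently. Note that `tracemalloc` only sees allocations made through Python's allocator.

```python
import tracemalloc


def memory_deltas(work, cleanup, niter=10):
    """Return the change in traced memory (MiB) across each of niter runs.

    work() performs one unit of rendering; cleanup() releases caches so
    that retained cache contents are not mistaken for a leak.
    """
    deltas = []
    tracemalloc.start()
    try:
        for _ in range(niter):
            before, _peak = tracemalloc.get_traced_memory()
            work()
            cleanup()
            after, _peak = tracemalloc.get_traced_memory()
            deltas.append((after - before) / 2**20)
    finally:
        tracemalloc.stop()
    return deltas


def render_all_pages(path, zoom=3):
    """Open path with PyMuPDF, render every page to a pixmap, close the file."""
    import fitz  # imported lazily so memory_deltas() works without PyMuPDF

    doc = fitz.open(path)
    mat = fitz.Matrix(zoom, zoom)
    for pno in range(len(doc)):
        page = doc.loadPage(pno)  # method names as used in the report (1.16.x)
        pix = page.getPixmap(matrix=mat)
        del pix, page
    doc.close()


def shrink_mupdf_store():
    """Ask MuPDF to release its whole resource store (100 percent)."""
    import fitz

    fitz.TOOLS.store_shrink(100)
```

A call might look like `memory_deltas(lambda: render_all_pages("some.pdf"), shrink_mupdf_store, niter=10)`, where the file name is a placeholder. Deltas that stay near zero after the first pass would suggest cached data rather than a leak.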