getPixmap causes a memory leak when the PDF file has many images
Steps to reproduce:
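import fitz
import tracemalloc

tracemalloc.start(1000)  # keep up to 1000 frames per allocation traceback
t1 = tracemalloc.take_snapshot()
for i in range(100):
    # raw string: in a normal string, "\t" and "\b" are escape sequences
    doc = fitz.open(r"c:\temp\bbb.pdf")
    page = doc.loadPage(0)
    mat = fitz.Matrix(3, 3)  # zoom factor 3 in both directions
    pix = page.getPixmap(matrix=mat)
    del pix
    del page
    doc.close()
t2 = tracemalloc.take_snapshot()
# allocations that grew between the two snapshots, grouped by source line
stats = t2.compare_to(t1, 'lineno')
for stat in stats:
    print(stat)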
bbb.pdf contains many bitmap objects.
OS: Windows 10, 64 bit
Python:
C:\Python37>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
PyMuPDF: 1.16.5
Output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__):
3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)]
win32
PyMuPDF 1.16.5: Python bindings for the MuPDF 1.16.0 library.
Version date: 2019-10-13 22:50:03.
Built for Python 3.7 on win32 (32-bit).
tracemalloc result
C:\Python37\lib\site-packages\fitz\fitz.py:4766: size=210 MiB (+210 MiB), count=3203 (+3203), average=67.2 KiB
C:\Python37\lib\site-packages\fitz\fitz.py:3502: size=22.3 MiB (+22.3 MiB), count=3201 (+3201), average=7306 B
I believe this is not a memory leak.
MuPDF, the base library, deliberately keeps things in memory, up to a configured limit, and for images even after the document has been closed.
A realistic measurement of memory consumption must therefore not only close the document but also forcibly free this MuPDF buffer before computing memory deltas.
PyMuPDF has tools to do exactly this. Here is a script to convince you. It reads a document given on the command line niter times and creates a pixmap for each of its pages.
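The script itself is not reproduced in this thread. The following is only a minimal sketch of the approach described, not the original: it renders every page, closes the document, empties the MuPDF store with TOOLS.store_shrink(100), and only then records the memory delta. The helper names (render_all_pages, free_mupdf_store, memory_deltas, main) are hypothetical, and using tracemalloc as the measuring tool is an assumption carried over from the report above; the original script may well have measured memory differently. Method names are those of PyMuPDF 1.16.x (loadPage, getPixmap); later releases renamed them to load_page and get_pixmap.

```python
import sys
import tracemalloc


def render_all_pages(path):
    """Open the document, create a pixmap for every page, then close it."""
    import fitz  # PyMuPDF, imported lazily so memory_deltas() works without it

    doc = fitz.open(path)
    count = 0
    for pno in range(len(doc)):
        page = doc.loadPage(pno)   # PyMuPDF 1.16.x name
        pix = page.getPixmap()     # PyMuPDF 1.16.x name
        del pix, page
        count += 1
    doc.close()
    return count


def free_mupdf_store():
    """Empty MuPDF's global store so cached images are not counted as leaked."""
    import fitz

    fitz.TOOLS.store_shrink(100)  # shrink the store by 100 percent


def memory_deltas(work, cleanup, niter):
    """Run work() then cleanup() niter times; return per-iteration deltas in MB.

    Assumption: tracemalloc sees the relevant allocations, as the
    tracemalloc result in the report above suggests.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    deltas = []
    for _ in range(niter):
        before, _peak = tracemalloc.get_traced_memory()
        work()
        cleanup()
        after, _peak = tracemalloc.get_traced_memory()
        deltas.append((after - before) / (1024 * 1024))
    if started_here:
        tracemalloc.stop()
    return deltas


def main(argv):
    if len(argv) < 2:
        print("usage: python script.py file.pdf [niter]")
        return
    path = argv[1]
    niter = int(argv[2]) if len(argv) > 2 else 10
    print("File '%s'" % path)
    print("Memory deltas (MB) for Page.getPixmap")
    deltas = memory_deltas(lambda: render_all_pages(path), free_mupdf_store, niter)
    for i, delta in enumerate(deltas):
        print("%i stop delta %.1f" % (i, delta))


if __name__ == "__main__":
    main(sys.argv)
```

If memory were really leaking, the deltas would stay clearly positive in every iteration; deltas that hover around zero after the first pass indicate that the growth was only the MuPDF store filling up.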
A 100-page scientific magazine with 1137 images produced this output (niter = 10). The Adobe manual, with 180 images on 1310 pages, looks similar.
File 'sdw_2015_06.pdf'
Memory deltas (MB) for Page.getPixmap
0 stop delta 5.8
1 stop delta 0.9
2 stop delta 0.5
3 stop delta -0.1
4 stop delta 0.4
5 stop delta -0.7
6 stop delta 0.6
7 stop delta -0.3
8 stop delta 0.6
9 stop delta 0.1
================================================================================
Total pages processed: 1000
Duration: 34 sec
Thanks for your help.
TOOLS.store_shrink(100) works well.
PyMuPDF is BEST!!
Thanks for the feedback!