Page 0 of pct_481288.pdf contains one embedded image; however, page.getText("dict", flags=fitz.TEXT_INHIBIT_SPACES) returns only the text blocks and no image block.
import fitz  # PyMuPDF

path = './pct_481288.pdf'
doc = fitz.open(path)
pg_0 = doc[0]
pg_dc = pg_0.getText("dict", flags=fitz.TEXT_INHIBIT_SPACES)  # could be any flags
for blk in pg_dc["blocks"]:
    if blk["type"] == 1:  # type 1 marks an image block, type 0 a text block
        print(blk)
Expected: one image block should be printed.
[GCC 8.3.0] linux
PyMuPDF 1.16.10: Python bindings for the MuPDF 1.16.0 library.
Version date: 2019-12-21 07:31:32.
Built for Python 3.7 on linux (64-bit).
Possible indentation issue at lines 465 and 466 of fitz/utils.py: when flags is not None, the line that adds TEXT_PRESERVE_IMAGES is skipped.
https://github.com/pymupdf/PyMuPDF/blob/4546862accd82f3b746578c2d8bab227229f6327/fitz/utils.py#L463-L466
I understand. It's very early in the morning here and I am still busy with my first cup of coffee, so maybe I am overlooking something.
But the intention is to _allow suppressing images_ for outputs supporting them (because images are so damn large). So there is a default flag combination for each output type, which is used if flags=None. If flags is not None, this means the developer knows what he is doing, and no further logic is applied.
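The default-versus-explicit rule described above can be sketched in plain Python. This is a minimal illustration, not PyMuPDF's actual code: the helper names choose_flags and keeps_images, the set of image-capable output types, and the numeric flag values are all assumptions made for the example. It shows why the call in the report loses the image block and how adding TEXT_PRESERVE_IMAGES to explicit flags keeps it.

```python
# Flag values assumed to mirror MuPDF's text-extraction bits as exposed by
# PyMuPDF 1.16; they are illustrative, not read from the library.
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4
TEXT_INHIBIT_SPACES = 8

# Output types assumed to be able to carry image blocks (hypothetical set).
IMAGE_OUTPUTS = {"html", "xhtml", "json", "dict", "rawdict"}


def choose_flags(output, flags=None):
    """Hypothetical helper: pick the flags used to build the text page.

    flags is None  -> per-output default; TEXT_PRESERVE_IMAGES is added only
                      for outputs that can represent images.
    flags is given -> used verbatim; no image flag is added for the caller.
    """
    if flags is not None:
        return flags
    default = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
    if output.lower() in IMAGE_OUTPUTS:
        default |= TEXT_PRESERVE_IMAGES
    return default


def keeps_images(flags):
    """True if the flag combination asks for image blocks to be kept."""
    return bool(flags & TEXT_PRESERVE_IMAGES)


if __name__ == "__main__":
    # Default for "dict": images are kept.
    print(keeps_images(choose_flags("dict")))  # True
    # The call from the report: explicit flags, so images are dropped.
    print(keeps_images(choose_flags("dict", TEXT_INHIBIT_SPACES)))  # False
    # Workaround: OR the image flag into the explicit flags.
    print(keeps_images(choose_flags("dict", TEXT_INHIBIT_SPACES | TEXT_PRESERVE_IMAGES)))  # True
```

With PyMuPDF itself, the corresponding workaround would presumably be pg_0.getText("dict", flags=fitz.TEXT_INHIBIT_SPACES | fitz.TEXT_PRESERVE_IMAGES), i.e. whoever passes explicit flags also takes responsibility for requesting images.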
Thanks @JorjMcKie. It is clear now.
I will add some comment in the doc to clarify a bit more.