PyMuPDF: page.drawRect drawing incorrectly

Created on 24 Sep 2019 · 5 comments · Source: pymupdf/PyMuPDF

Describe the bug (mandatory)

It's my understanding that in PyMuPDF the page origin is the top-left corner. However, in some PDFs that I am trying to read, the origin seems to be bottom-left, but only when it comes to drawing on the page.
Even though text extraction gives me the position as 133,61,354,81, when I call page.drawRect, the rectangle gets drawn at the bottom of the page instead of the top. The y coordinates all seem wrong, and consecutive rectangles are never drawn in the right place.

Any insight into why text extraction uses the top-left origin while drawing seems to use the bottom-left would be appreciated.
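The numbers below sketch one plausible mechanism, in plain Python without PyMuPDF. Assumptions, all hypothetical: an unrotated US Letter page (612 x 792 pt) whose MediaBox starts at (0, 0), and page content that flips the y axis with `1 0 0 -1 0 792 cm` without restoring it. PyMuPDF converts its top-left coordinates to PDF's bottom-left user space before writing the drawing commands after the existing content; if the original content left a modified transformation matrix in place, those commands are transformed a second time.

```python
# Hypothetical page: US Letter, unrotated, MediaBox starting at (0, 0).
PAGE_HEIGHT = 792.0
# Text rect from the report, in PyMuPDF's top-left coordinates (y grows down).
text_rect = (133.0, 61.0, 354.0, 81.0)  # x0, y0, x1, y1


def to_pdf_space(x, y, height=PAGE_HEIGHT):
    """Convert a top-left-origin point to PDF's bottom-left user space."""
    return x, height - y


def apply_cm(matrix, x, y):
    """Apply a PDF 'cm' matrix (a, b, c, d, e, f) to a point."""
    a, b, c, d, e, f = matrix
    return a * x + c * y + e, b * x + d * y + f


# Assumed leftover matrix: the page's own content ran "1 0 0 -1 0 792 cm"
# and never restored the graphics state with a q/Q pair.
leftover_cm = (1.0, 0.0, 0.0, -1.0, 0.0, PAGE_HEIGHT)

x_pdf, y_pdf = to_pdf_space(text_rect[0], text_rect[1])  # top edge of the rect
print("intended PDF y:", y_pdf)                           # 731.0, near the top
print("actual PDF y:", apply_cm(leftover_cm, x_pdf, y_pdf)[1])  # 61.0, near the bottom
```

With the hypothetical flip left active, the top edge lands 61 pt above the bottom of the page instead of 61 pt below the top, which matches the symptom described above.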

To Reproduce (mandatory)

Unfortunately, I cannot share the document I am working on. However, I am sure you have encountered documents like this before.

Expected behavior (optional)

Text extraction and the drawing utilities should use the same origin.

Screenshots (optional)

I am unable to share any at this point. If you absolutely need them, I will see what I can do.

Your configuration (mandatory)

  • Operating system - Mac OS 10.14.6
  • Python version - 3.7
  • PyMuPDF version - 1.16.2

3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
darwin

PyMuPDF 1.16.2: Python bindings for the MuPDF 1.16.0 library.
Version date: 2019-09-12 17:43:20.
Built for Python 3.7 on darwin (64-bit).

All 5 comments

This should be one of the commonly observed issues which can be solved in a standard way described here.

Jor, the issue is not with the wrapping of text. What I need to do is highlight some text by drawing red rectangles over it. When I get the coordinates of a text such as 10,10,30,40 and call page.drawRect(fitz.Rect(10,10,30,40), color=(1,0,0)), the rectangle is drawn at the bottom of the page rather than the top.
This happens only for some documents, not all. At the very least I need to know which PDFs/pages have a changed origin, so that I can adjust my drawing logic accordingly.
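At the PDF level, one common cause of a "changed origin" is page content that alters the current transformation matrix (the `cm` operator) without enclosing it in a `q`/`Q` (save/restore graphics state) pair. The sketch below is a rough, hypothetical heuristic in plain Python, not PyMuPDF's implementation of `_isWrapped`; the function names `needs_wrapping` and `wrap_contents` are illustrative only. It tokenizes naively on whitespace, so string operands or inline images could fool it.

```python
def needs_wrapping(stream: bytes) -> bool:
    """Hypothetical heuristic: True if graphics-state changes may leak out.

    Flags a content stream whose q/Q pairs are unbalanced, or which applies
    a 'cm' (transformation matrix) operator outside any q/Q pair.
    """
    depth = 0
    leaked = False
    for tok in stream.split():  # naive whitespace tokenization
        if tok == b"q":
            depth += 1
        elif tok == b"Q":
            depth = max(depth - 1, 0)
        elif tok == b"cm" and depth == 0:
            leaked = True  # matrix change at top level, never restored
    return leaked or depth != 0


def wrap_contents(stream: bytes) -> bytes:
    """Enclose the whole stream in q ... Q so its state changes stay local."""
    return b"q\n" + stream + b"\nQ\n"


# A flip without q/Q leaks; the same stream wrapped in q/Q does not:
#   needs_wrapping(b"1 0 0 -1 0 792 cm BT ET")                 -> True
#   needs_wrapping(wrap_contents(b"1 0 0 -1 0 792 cm BT ET"))  -> False
```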

My reference isn't about text wrapping either. It is about wrapping the page's /Contents stream to re-establish standard geometry where needed.
As described in the documentation:

  1. Check whether wrapping might be needed: page._isWrapped. I assume the answer will be False for your problematic pages.
  2. If it is False, execute page._wrapContents().
  3. Execute your draw methods
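The three steps can be combined into a small helper. This sketch assumes the PyMuPDF 1.16.x names used in this thread (`page._isWrapped`, `page._wrapContents()`, `page.drawRect()`); later releases switched to snake_case names such as `page.draw_rect()`, so check the documentation of the version you have installed. The helper name `highlight_rect` and the file names in the usage comment are hypothetical.

```python
def highlight_rect(page, rect, color=(1, 0, 0)):
    """Draw a colored rectangle border over `rect` on a PyMuPDF 1.16.x page.

    `rect` uses PyMuPDF's usual top-left coordinates, e.g. a rectangle
    obtained from text extraction.
    """
    # Step 1: False means the page's /Contents may leave a changed geometry.
    if not page._isWrapped:
        # Step 2: enclose the existing /Contents in "q" ... "Q".
        page._wrapContents()
    # Step 3: draw as usual; the original geometry no longer leaks through.
    page.drawRect(rect, color=color)


# Usage sketch (PyMuPDF 1.16.x, hypothetical file names):
#   import fitz
#   doc = fitz.open("input.pdf")
#   page = doc[0]
#   highlight_rect(page, fitz.Rect(10, 10, 30, 40))
#   doc.save("highlighted.pdf")
```

Wrapping is done once per page before drawing, so calling the helper repeatedly on the same page only wraps on the first call.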

Thanks, Jor. It worked. Amazing :)

You are welcome - have fun with the package!
