It's my understanding that for PyMuPDF the page origin is top left. However, in some PDFs that I am trying to read, the origin seems to be bottom left, but only when it comes to drawing.
Even though the text extracted from the document gives me the position as 133,61,354,81, when I call page.drawRect the rectangle gets drawn at the bottom instead of the top. The y coordinates all seem messed up. Consecutive rectangles never come out correctly.
Any insight into why text extraction keeps the top-left origin while drawing uses the bottom left would be appreciated.
Unfortunately, I cannot share the document I am working on. However, I am sure you have all encountered these kinds of documents somewhere.
Both text extraction and the drawing utilities should use the same origin reference.
Unable to share at this point. If you absolutely need it, I will see if I can do something.
3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
darwin
PyMuPDF 1.16.2: Python bindings for the MuPDF 1.16.0 library.
Version date: 2019-09-12 17:43:20.
Built for Python 3.7 on darwin (64-bit).
This should be one of the commonly observed issues which can be solved in a standard way described here.
Jor, the issue is not with the wrapping of text. What I need to do is highlight some text by drawing red rectangles over it. So when I get coordinates for text like 10,10,30,40 and call page.drawRect(fitz.Rect(10,10,30,40), color=(1,0,0)), it draws the rectangle at the bottom of the page rather than the top.
This happens only for some documents, not all. At the very least I need to know for which PDFs/pages the origin has changed, so I can change my drawing logic accordingly.
My reference isn't about text wrapping either. It is about wrapping the page's /Contents stream to re-establish standard geometry where needed.
Like it is written in the documentation:
Check page._isWrapped. I assume the answer will be False for your problematic pages.
If False, then execute page._wrapContents().
Thanks, Jor. It worked. Amazing :)
You are welcome - have fun with the package!
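For later readers, the check-then-wrap step from the accepted answer could be sketched roughly as below. This is only an illustration, not code from the thread: the helper names ensure_wrapped and draw_red_rect are hypothetical, and the fallback to is_wrapped / wrap_contents() / draw_rect() assumes the snake_case names used by later PyMuPDF releases. With PyMuPDF 1.16.2, as reported above, only _isWrapped, _wrapContents() and drawRect() apply.

```python
def ensure_wrapped(page):
    """Wrap the page's /Contents stream if it is not wrapped yet.

    Returns True if wrapping was applied, False if the page was
    already wrapped. (Hypothetical helper, not part of PyMuPDF.)
    """
    # Assumption: later PyMuPDF releases expose is_wrapped / wrap_contents();
    # version 1.16.x, as in this thread, uses _isWrapped / _wrapContents().
    if hasattr(page, "is_wrapped"):
        wrapped, wrap = page.is_wrapped, page.wrap_contents
    else:
        wrapped, wrap = page._isWrapped, page._wrapContents
    if wrapped:
        return False
    wrap()
    return True


def draw_red_rect(page, rect):
    """Draw a red rectangle after making sure the page geometry is standard.

    (Hypothetical helper; rect would normally be a fitz.Rect.)
    """
    ensure_wrapped(page)
    # Assumption: draw_rect is the newer spelling of drawRect.
    draw = getattr(page, "draw_rect", None) or page.drawRect
    draw(rect, color=(1, 0, 0))
```

With a real document this would be called as, for example, draw_red_rect(page, fitz.Rect(10, 10, 30, 40)) for each page before saving. The helpers only rely on the attributes named above, so they do not import fitz themselves.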