Pymupdf: Question: font replacement

Created on 23 Aug 2020 · 3 Comments · Source: pymupdf/PyMuPDF

Hi @JorjMcKie, I hope you are doing well today. I am a learner just out of college.

I have tried to build a piece of code with your fitz package. Could you take a look? I would be grateful for your suggestions.

Input - PDF(Helvetica,Helvetica-Bold)
Expected Output - PDF(Courier-Bold, Helvetica-Bold)
Actual Output - PDF(['Font Type: Type0, Font Name: Courier-Bold, Encoding: Identity-H', 'Font Type: Type0, Font Name: (null), Encoding: Identity-H']). Some characters are also missing compared to the input.

Can you help me successfully fetch all the data, including images and drawings?

Scenario: Given a PDF, read the current font embedding and convert its encoding from one format to another.
Using your awesome package I am able to read the current font embedding, but I am unsure how to change encodings. Any leads on how to implement this?

Scenario: Given a PDF, raise exceptions for irregular font encoding (custom encoding) or irregular font embedding (unembedded fonts) for further processing.

embeddscript.zip

thanks in advance.

duplicate question


harveyspecter09

All 3 comments

Tried your script: with a few minor fixes, it does work.
So I am not sure what you see as a problem ...
elsefontfile.zip

One or two comments:

- When dealing with subset font names, i.e. names that start like "ABCDEF+...", try to get rid of that prefix. It complicates things, and the next version of PyMuPDF will no longer include prefixes in span["font"].
- I am currently developing an almost complete font replacer tool. It consists of 2 scripts: (1) creates a CSV file with information on the existing fonts. The user can edit this file and enter new fonts for some or all of the existing fonts. (2) reads the CSV file and rewrites the PDF based on that information.
- Also included in that tool is an automatic reduction of font file size using fontTools: based on the characters actually used by each new font, font subsets are created and included in the output PDF. This significantly reduces the size of the resulting output.
- In addition, apart from the text, the rest of the page layout is retained: all images, graphics, etc. remain where they are.
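To illustrate the first point, here is a minimal sketch of stripping a subset prefix from a font name. It assumes the usual subset tag of six uppercase letters followed by "+"; the helper name `strip_subset_prefix` is made up for this example and is not part of PyMuPDF.

```python
import re

# Assumption: a subset tag is six uppercase letters followed by "+",
# e.g. "ABCDEF+Helvetica-Bold".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def strip_subset_prefix(fontname: str) -> str:
    """Return the font name without a leading subset tag, if one is present."""
    return SUBSET_PREFIX.sub("", fontname)


# Names without a subset tag are returned unchanged.
print(strip_subset_prefix("ABCDEF+Helvetica-Bold"))
print(strip_subset_prefix("Courier-Bold"))
```

Applied to span["font"] values, this gives the same name for a subset font and its full counterpart, which makes it easier to match fonts when deciding on replacements.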

JorjMcKie on 23 Aug 2020


@JorjMcKie thanks for the timely update, that is generous of you.

Scenario: Given a PDF, read the current font embedding and convert its encoding from one format to another.
Using your awesome package I am able to read the current font embedding, but I am unsure how to change encodings. Any leads on how to implement this?

Scenario: Given a PDF, raise exceptions for irregular font encoding (custom encoding) or irregular font embedding (unembedded fonts) for further processing.

Any leads as of now would be helpful.
I am eagerly waiting for your new release, which includes the complete font replacer.

harveyspecter09 on 24 Aug 2020

> i am eagerly waiting for your new release which includes complete font replacer.

To put things in the right perspective:
A font replacer will never become _part_ of PyMuPDF. The next version will just contain one or two new features that the font replacer script I am developing needs.
This script will always remain an _example_ and reside in a different repository. Also, like the rest of the scripts in that repo, it will not be subject to the official issue management: anyone who believes there is an error is instead asked to submit a pull request containing corrective changes.

I am planning to release the next PyMuPDF version 1.17.6 in the course of this week. Around this time I will also upload the font replacer script.
Looking at it will answer your above questions, so please be patient ...
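In the meantime, the second scenario (raising exceptions for irregular encodings or unembedded fonts) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the font replacer script: it works on font entries shaped like those returned by PyMuPDF's `Document.get_page_fonts(pno)`, i.e. `(xref, ext, type, basefont, name, encoding)`, treats the extension `"n/a"` as "no extractable font file", and uses an allowlist of encoding names chosen for this example. `FontProblem` and `check_fonts` are hypothetical names.

```python
from typing import Iterable, List, Tuple

# One font entry: (xref, ext, type, basefont, name, encoding)
FontEntry = Tuple[int, str, str, str, str, str]

# Assumption: encodings considered "regular" for this example.
KNOWN_ENCODINGS = {
    "WinAnsiEncoding",
    "MacRomanEncoding",
    "MacExpertEncoding",
    "StandardEncoding",
    "Identity-H",
    "Identity-V",
}


class FontProblem(Exception):
    """Hypothetical exception: some fonts need further processing."""


def check_fonts(fonts: Iterable[FontEntry]) -> None:
    """Raise FontProblem listing every font that looks irregular."""
    problems: List[str] = []
    for xref, ext, ftype, basefont, name, encoding in fonts:
        # "n/a" is taken to mean there is no font file to extract,
        # which includes fonts that are not embedded.
        if ext == "n/a":
            problems.append(f"{basefont} (xref {xref}): no extractable font file")
        # Anything outside the allowlist is flagged as a custom encoding.
        if encoding not in KNOWN_ENCODINGS:
            problems.append(f"{basefont} (xref {xref}): unusual encoding {encoding!r}")
    if problems:
        raise FontProblem("; ".join(problems))
```

With PyMuPDF installed, the entries for the first page of a document `doc` would come from `doc.get_page_fonts(0)` (named `getPageFontList` in older versions), and the result can be passed straight to `check_fonts`.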

JorjMcKie on 24 Aug 2020

