Attach (recommended) or Link to PDF file here:
EMG8 -Cambridge Essentail Gold Maths 8-B 117.pdf
Configuration:
Steps to reproduce the problem:
What is the expected behavior? (add screenshot)
On any other PDF reader it renders fine:
Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
Current github pages viewer: https://mozilla.github.io/pdf.js/web/viewer.html
Notes:
We've managed to make it work by editing the fonts within the PDF. The font is currently "Times New Roman PS"; re-rendering it with plain "Times New Roman" seems to fix it.
There are no console errors, nor any other visible hints at a cause, such as missing CMaps.
Unfortunately we are not allowed to alter PDFs, so re-rendering each one is not a viable solution.
If anyone can give any insight into this and a possible solution, that would be massively appreciated.
The PDF defines the same fonts many times. For example, font LULQLP+TimesLTStd-Roman is defined nine times. Each one refers to the same FontDescriptor and the same embedded CFF data stream.
There is a font hash computation in PartialEvaluator.prototype.preEvaluateFont() in core/evaluator.js. It adds entries Encoding, ToUnicode, and Widths in the hash. Some fonts in the PDF get identical hash codes because all the mentioned entries are identical, even Widths. Only entries FirstChar and LastChar differ. If fonts get identical hash codes, could it cause a font to be skipped so that it won't be converted to OpenType?
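To illustrate the collision described above, here is a minimal sketch. It is not the pdf.js code: `fontKey` is a hypothetical helper that simply serializes the chosen entries instead of hashing them, and the Encoding, FirstChar, LastChar, and Widths values are made up for the example.

```javascript
// Hypothetical stand-in for the font hash: serializes only the selected
// dictionary entries. This is NOT the pdf.js implementation.
function fontKey(fontDict, keys) {
  return keys.map(key => `${key}=${JSON.stringify(fontDict[key])}`).join(";");
}

// Two font dictionaries sharing Encoding and Widths (values are invented).
// Only FirstChar and LastChar differ, so Widths[0] applies to a different
// character code in each font.
const fontA = {
  BaseFont: "LULQLP+TimesLTStd-Roman",
  Encoding: "WinAnsiEncoding",
  FirstChar: 32,
  LastChar: 34,
  Widths: [250, 333, 408],
};
const fontB = { ...fontA, FirstChar: 65, LastChar: 67 };

const narrowKeys = ["Encoding", "ToUnicode", "Widths"];
const wideKeys = [...narrowKeys, "FirstChar", "LastChar"];

// With only the three entries, both fonts look identical.
const collides = fontKey(fontA, narrowKeys) === fontKey(fontB, narrowKeys);
// Adding FirstChar and LastChar tells them apart.
const distinct = fontKey(fontA, wideKeys) !== fontKey(fontB, wideKeys);

console.log({ collides, distinct });
```

Including FirstChar and LastChar in the key is one plausible way to separate such fonts; whether the actual fix took that form is not stated here.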
Here is a reduced PDF that contains two fonts from the original PDF: issue10665_reduced.pdf
There is a font hash computation in PartialEvaluator.prototype.preEvaluateFont() in core/evaluator.js. It adds entries Encoding, ToUnicode, and Widths in the hash. Some fonts in the PDF get identical hash codes because all the mentioned entries are identical, even Widths. Only entries FirstChar and LastChar differ.
Really excellent analysis, thank you; this made the bug easy to fix!
If fonts get identical hash codes, could it cause a font to be skipped so that it won't be converted to OpenType?
In some badly generated PDF files there can be huge numbers of identical fonts, and the purpose of preEvaluateFont was simply to avoid having to load/parse duplicate ones. Hence loadFont will compare hashes and, if possible, use an already loaded/parsed font.
Obviously this all hinges on the hashes actually being correct/unique, but fortunately there have been relatively few bugs in that code over the years.
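A rough sketch of why a non-unique hash leads to the wrong font being reused. The `FontCache` class and its `load` method are hypothetical names for illustration only; they do not mirror the real loadFont code.

```javascript
// Hypothetical cache keyed by font hash; names are illustrative, not pdf.js APIs.
class FontCache {
  constructor() {
    this.byHash = new Map();
    this.parseCount = 0;
  }

  // Returns the cached font for `hash`, calling `parse` only on a cache miss.
  load(hash, parse) {
    if (this.byHash.has(hash)) {
      return this.byHash.get(hash);
    }
    this.parseCount++;
    const font = parse();
    this.byHash.set(hash, font);
    return font;
  }
}

const cache = new FontCache();
// Two different fonts that (incorrectly) produced the same hash "abc".
const first = cache.load("abc", () => ({ firstChar: 32 }));
const second = cache.load("abc", () => ({ firstChar: 65 }));

// The second font is never parsed; the first one is handed back instead.
console.log(second === first, second.firstChar, cache.parseCount);
```

This is the intended saving when the fonts really are duplicates, and the source of the rendering bug when they only appear to be.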
This is awesome, thanks so much for the help!