Attach (recommended) or Link to PDF file here:
Sample file_from_docbase.pdf
Configuration:
Steps to reproduce the problem:
What is the expected behavior?
Characters rendered correctly as in Acrobat
What went wrong?
Incorrect character in viewer - Acrobat shows the correct one
Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
https://mozilla.github.io/pdf.js/web/viewer.html
The following is printed in the console:
PDF 91d7d110e0edd2aeed3b25db4e68f818 [1.7 www.adlibsoftware.com: CTP (5.4.0.30881) OS (Windows 2012,2,0,64); modified using iText 2.1.7 by 1T3XT / Microsoft Word(15.0)] (PDF.js: 2.0.203) viewer.js:1607:7
Warning: FormatError: Required "loca" table is not found pdf.worker.js:340:5
The error suggests either that the PDF file includes broken TrueType fonts that are missing the necessary tables required for glyph mapping to work, or that the PDF file contains inconsistent font data which "lies" about the type of the included font files.
Another possible explanation, based on a cursory look at the actual font files, would be that a number of the TrueType fonts contain bogus file header information.
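To make the "loca" warning more concrete, here is a minimal Python sketch (not PDF.js code) that reads an sfnt table directory and reports which glyph-related tables are absent. The function names and the `REQUIRED_TABLES` list are assumptions made for illustration; the list is meant to reflect what a glyf-based TrueType font normally needs. If the data actually begins with "ttcf", the bytes at the table-count position belong to a collection header instead, so the directory read this way is meaningless and "loca" is reported as missing.

```python
import struct

# Assumption: tables a glyf-based TrueType font program normally needs.
REQUIRED_TABLES = ("glyf", "head", "hhea", "hmtx", "loca", "maxp")


def read_table_tags(data: bytes, offset: int = 0) -> list[str]:
    """Return the table tags listed in the sfnt table directory at `offset`."""
    # Offset table: uint32 sfntVersion, uint16 numTables, then three more
    # uint16 fields (12 bytes in total), all big-endian.
    _sfnt_version, num_tables = struct.unpack_from(">IH", data, offset)
    tags = []
    for i in range(num_tables):
        record = offset + 12 + 16 * i
        if record + 16 > len(data):
            break  # directory claims more records than the data holds
        # Table record: 4-byte tag, uint32 checksum, uint32 offset, uint32 length.
        tag, _checksum, _table_offset, _length = struct.unpack_from(
            ">4sIII", data, record
        )
        tags.append(tag.decode("latin-1"))
    return tags


def missing_tables(data: bytes, offset: int = 0) -> list[str]:
    """List the required tables that the directory at `offset` does not mention."""
    present = set(read_table_tags(data, offset))
    return [tag for tag in REQUIRED_TABLES if tag not in present]
```

Calling `missing_tables(font_bytes)` on the raw embedded font stream would be one quick way to see whether the directory, as read from offset 0, really lacks "loca".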
@Snuffleupagus can you explain what that comment means? :) Thank you.
PDFBox shows a different error, "head is mandatory"; that table is required too. I can see "head" and "loca" in the byte sequence. Saving MSMincho and opening it with DTL OTMaster failed. However, it succeeded when I renamed the file to *.ttc. It's a font collection. (I noticed that the byte sequence started with "ttcf").
@generiscorp The software you have used ("www.adlibsoftware.com: CTP (5.4.0.30881) OS (Windows 2012,2,0,64); modified using iText 2.1.7 by 1T3XT") has embedded the whole TrueType collection instead of embedding just a single TrueType font or, better, a font subset. Check the options or contact their support. Font subsetting makes the files much smaller.
@THausherr thank you, but the PDF should still be displayed correctly in PDF.js as in Acrobat, right?
It's a font collection. (I noticed that the byte sequence started with "ttcf").
The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading.
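As a rough illustration of the "Font Collections" layout described in that specification, the following Python sketch reads the collection header and returns the byte offset of each member font's table directory. This is not the PDF.js implementation; the function name is hypothetical, and the sketch assumes only the common header fields (the "ttcf" tag, a major and minor version, the font count, and the offset array).

```python
import struct


def ttc_font_offsets(data: bytes) -> list[int]:
    """Return the offsets of the table directories of all fonts in `data`.

    For data that does not start with the "ttcf" tag, the file is treated
    as a single font whose table directory sits at offset 0.
    """
    if data[:4] != b"ttcf":
        return [0]
    # Header after the tag: uint16 majorVersion, uint16 minorVersion,
    # uint32 numFonts, followed by numFonts uint32 offsets (big-endian).
    _major, _minor, num_fonts = struct.unpack_from(">HHI", data, 4)
    return list(struct.unpack_from(f">{num_fonts}I", data, 12))
```

A viewer that receives a collection where a single font is expected would then still have to decide which of the returned offsets corresponds to the font the PDF refers to, for example by comparing names; that step is left out here.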
@generiscorp Adobe displays a lot of broken files. Nevertheless, your file is incorrect. You put a font collection at a place where a single font is expected. A one-page PDF with no images and just a few lines of text growing to 23 MB should be a hint that something is wrong with your PDF production and needs fixing. Such a file should be smaller than 100 KB.
@THausherr I understand. However, this file is generated by the Adlib rendering tool and we have no influence on its format. Anyway, since the font is embedded, all the characters should be displayed correctly, right? Is this going to be fixed, or is the only solution to fix the fonts being embedded?
@generiscorp You could complain to Adlib, or check whether you're using the correct options with their software, maybe also ask them why they are using iText 2.1.7 which is from 2009.
I don't speak for PDF.js. I work with a different project (pdfbox) and that one has the same problem.
The Chrome PDF viewer and Ghostscript have the same problem.
I haven't had time to run any tests yet, but it looks like this ought to work: https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:TrueType-Collection
One thing to note though is that the performance isn't so great, but that's probably to be expected since we're forced to parse more than 20 MB of font data for just one page (see also https://github.com/mozilla/pdf.js/issues/9262#issuecomment-351336064).
Edit: Also, @THausherr I just wanted to say thank you for helping to narrow down the root cause of this issue :-)