PDF.js: disableFontFace makes embedded-font and system-font rendering mutually exclusive

Created on 2 Jun 2020 · 23 Comments · Source: mozilla/pdf.js

Attach (recommended) or Link to PDF file here:
renderExample.pdf

Configuration:

  • Web browser and its version: Node 12.x
  • Operating system and its version: AmazonLinux2
  • PDF.js version: 2.3.200
  • Is a browser extension: No

Steps to reproduce the problem:

  1. Explicitly set "disableFontFace" to "false"
  2. Try to render the PDF

What is the expected behavior? (add screenshot)

What went wrong? (add screenshot)

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

I've managed - after a LOT of tinkering - to install some fonts on my Amazon Linux 2 instance. It was not fun. This means that the bulk of the text in that PDF, which is Helvetica, now renders properly (instead of not at all, as it did before I installed the fonts). Unfortunately, for whatever stupid reason, it also means that the fonts that RENDER CORRECTLY when "disableFontFace" is set to true (as it is by default in Node) stop rendering correctly. I know for a fact that the attached PDF embeds the subset of Arial it needs to render correctly, but instead we're getting weird... well, to be honest, I don't even know what they are.

In short: disableFontFace false gives me the Helvetica I want, but renders the PDF-embedded fonts incorrectly. disableFontFace true gives me the embedded fonts, but fails when it tries to fall back on system fonts.

Label: 4-font-conversion


All 23 comments

Duplicate of issues such as e.g. #11347, #11311, and #4244.

Hey, guys! I dug into the errors quite thoroughly before posting this one, and I actually disagree with the duplicate designation - though I'd agree that it seems to revolve around similar issues. #11347 is actually closed (and unresolved), so closing this as a duplicate of that doesn't accomplish much. #11311 is more "how do I determine which to use," which I am not asking, and #4244 is quite old and was created initially to resolve the issue of system fonts not rendering properly at all - a problem that seems (partially) resolved by "disableFontFace".

This issue is to address the problems with the "disableFontFace" option, specifically recognizing that (A) embedded fonts work properly with "disableFontFace" set to "true" in Node, and (B) system fonts work properly with "disableFontFace" set to "false", but there is no happy medium that makes use of BOTH embedded AND system fonts when rendering in Node.

I've provided a PDF that demonstrates this exact behavior, which should at least make debugging more straightforward (if not "easy" in a library as impressive as this). I'm willing to help out - I'll happily provide a Docker container with the exact environment, as well as my programming experience - but closing this as a duplicate of a 6-year-old, as-yet-unresolved issue seems a bit like a door slam. Is there anything I can do to help move this forward instead?

I think those are valid points. Let's reopen this.

What would help resolve this most? I can set up a Docker container that reproduces the exact environment in which I see the problem.

I am dealing with exactly the same problem and would love to see this issue resolved! Happy you reopened it.

Unfortunately https://github.com/mozilla/pdf.js/issues/4244#issuecomment-40354728 is still (mostly) accurate here, and the only real solution would be to start embedding (standard) fonts in the PDF.js library (there are potentially copyright/file-size reasons that would complicate doing that).
Since the code runs in a browser, we cannot really load font data directly from the system, which is why the PDF.js library would need to bundle font data such that src/core/fonts.js would be able to fetch fallback font data for fonts which do not include any font program.
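To make "fetch fallback font data for fonts which do not include any font program" more concrete, here is a minimal sketch of the lookup such bundling would need. Everything in it is hypothetical: the bundled file names and the alias table are invented for illustration, and PDF.js did not ship anything like this at the time. The one piece taken from the PDF format itself is the subset tag: subset fonts are named with six uppercase letters plus "+", e.g. "ABCDEF+Arial".

```javascript
// Hypothetical bundle of fallback font files (names invented for illustration).
const BUNDLED_FONTS = {
  Helvetica: "Helvetica.ttf",
  "Helvetica-Bold": "Helvetica-Bold.ttf",
  "Times-Roman": "Times-Roman.ttf",
  Courier: "Courier.ttf",
};

// Hypothetical aliases: non-standard names that viewers commonly substitute
// with a standard equivalent (Arial is metrically compatible with Helvetica).
const ALIASES = {
  Arial: "Helvetica",
  "Arial-Bold": "Helvetica-Bold",
  "Arial,Bold": "Helvetica-Bold",
  TimesNewRoman: "Times-Roman",
  CourierNew: "Courier",
};

// Strip a subset tag such as "ABCDEF+" and apply the alias table.
function normalizeFontName(pdfName) {
  const withoutTag = pdfName.replace(/^[A-Z]{6}\+/, "");
  return ALIASES[withoutTag] || withoutTag;
}

// Return the bundled file to use, or null when no fallback is needed/available.
function resolveFallbackFont(pdfName, hasEmbeddedProgram) {
  if (hasEmbeddedProgram) {
    return null; // the PDF's own font program can be used directly
  }
  return BUNDLED_FONTS[normalizeFontName(pdfName)] || null;
}
```

For example, `resolveFallbackFont("ABCDEF+Arial", false)` resolves to the Helvetica file, while an embedded subset returns `null` because its own data suffices.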

Hence the duplicate designation is still correct as far as I'm concerned, unfortunate as that may be.

I definitely think embedding (standard) fonts would solve the issue, but not head on - it more... avoids the issue. What I'm trying to focus on in this specific issue is the fact that "disableFontFace: false" will allow system fonts to render, while "disableFontFace: true" will allow embedded fonts to render - thus demonstrating that both types of fonts are capable of rendering/loading - but there is no setting that would allow both system AND embedded fonts to render - at least, not in node.

Pre-loading the fonts would be one solution, since then I could just rely on embedded fonts and know that it has them all.

Adding an entry point to force-embed fonts would be another solution, and that may well be a more general solution: it would mean that we could "fix" rendering for PDFs that never embedded necessary fonts in the first place, too. (This would have the added benefit of avoiding any sort of copyright/file size issue, too.)

But in this case, I KNOW that pdf.js is capable of rendering every part of the PDF I provided, it just can't seem to do them at the same time - so that's what this issue is about. Perhaps this issue could be addressed in those other ways, but...

That explains why, when "disableFontFace: true" (as is default for node) the system fonts don't render (unless embedded, of course). That makes total sense.

What doesn't make as much sense is that when "disableFontFace: false", the embedded fonts from the PDF suddenly fail to render properly, instead appearing as some strange font-point glyph. That suggests the "conver[sion] to OpenType fonts and load[ing] via font face rules" (referenced in the documentation you provided) is failing. I'll definitely dig a bit more into the code that manages that; IIRC I saw it doing some verification/loading via a data URL - unless I'm thinking of the wrong thing.
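For reference, "load via font face rules" with a data URL roughly means building an `@font-face` rule whose `src` contains the converted font bytes encoded as base64. The sketch below only shows that string construction; the family name and the `font/opentype` MIME type are illustrative assumptions, and this is not copied from PDF.js. In a browser, a rule like this is inserted into a stylesheet so canvas text can use the family; plain Node.js has no stylesheet to put it in, which fits the failure described here.

```javascript
// Build an @font-face rule whose source is an inline base64 data URL.
// `family` is a hypothetical loaded-font name; `fontBytes` is the converted font.
function buildFontFaceRule(family, fontBytes, mimeType = "font/opentype") {
  const base64 = Buffer.from(fontBytes).toString("base64");
  const src = `url(data:${mimeType};base64,${base64})`;
  // Quote the family name so names with digits or underscores stay valid CSS.
  return `@font-face {font-family:"${family}";src:${src}}`;
}
```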

Note how disableFontFace is described in the JSDocs:

https://github.com/mozilla/pdf.js/blob/96ad60f116f420945daa29dea185eac6e558d67e/src/display/api.js#L135-L138

As a consequence of drawing glyphs manually, there needs to be font data present to create said path commands from; see e.g. https://github.com/mozilla/pdf.js/blob/master/src/core/font_renderer.js

Unless the font program is embedded in the PDF file, we thus have no way of accessing the necessary data to build the paths; hence why we'd need to bundle (standard) font data in the library such that things would work.
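To illustrate why path drawing needs the font program, here is a simplified sketch (not PDF.js's font_renderer.js) that turns one TrueType-style glyph contour into canvas-like path commands. The input format, a list of `{x, y, onCurve}` points, is an assumption for the example. It follows the TrueType rule that two consecutive off-curve points imply an on-curve point at their midpoint. Without those outline points, as with a non-embedded font, there is nothing to build commands from.

```javascript
// Convert one closed contour of {x, y, onCurve} points into path commands.
function contourToCommands(points) {
  const n = points.length;
  if (n === 0) return [];
  const mid = (a, b) => ({ x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 });
  const s = points.findIndex((p) => p.onCurve);

  let start;
  let order; // remaining points, in contour order after `start`
  if (s === -1) {
    // All points off-curve: start at the implied point between last and first.
    start = mid(points[n - 1], points[0]);
    order = points;
  } else {
    start = points[s];
    order = [...points.slice(s + 1), ...points.slice(0, s)];
  }

  const cmds = [{ cmd: "moveTo", args: [start.x, start.y] }];
  let control = null; // pending off-curve control point
  // Visit every remaining point, then the start point again to close the contour.
  for (const p of [...order, { ...start, onCurve: true }]) {
    if (p.onCurve) {
      cmds.push(
        control
          ? { cmd: "quadraticCurveTo", args: [control.x, control.y, p.x, p.y] }
          : { cmd: "lineTo", args: [p.x, p.y] }
      );
      control = null;
    } else if (control) {
      // Two off-curve points in a row: curve to their implied midpoint.
      const implied = mid(control, p);
      cmds.push({ cmd: "quadraticCurveTo", args: [control.x, control.y, implied.x, implied.y] });
      control = p;
    } else {
      control = p;
    }
  }
  cmds.push({ cmd: "closePath", args: [] });
  return cmds;
}
```

The resulting list can be replayed on any 2D canvas context with `for (const { cmd, args } of cmds) ctx[cmd](...args)`, which is roughly the shape of drawing glyphs "manually".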


Finally, note that, when disableFontFace = false (i.e. the default value) then the environment itself (normally the browser) is then falling back on whatever fonts are available in the system. (As mentioned above, we cannot directly access fonts from the system.)

I just want to make sure I'm understanding the implications correctly here - when disableFontFace = false we fall back on system fonts, and don't even try to use the pdf-embedded fonts? Could we not try to use a pdf-embedded font first, and, if/when that fails, THEN 'fall back' to system?

when disableFontFace = false we fall back on system fonts,

Only for those fonts without embedded font data, since browsers are able to handle that situation (as opposed to Node.js).

It seems like something, then, is going awry:

In my environment - the one I described in this issue - I've INSTALLED system fonts. When disableFontFace = false (which I have to force, since node defaults it to true), the system fonts render just fine - so I know they're working. The embedded fonts, however - namely a subset of Arial for my PDF - only load weird glyphs/fontpoint icons. Perhaps this is because for the embedded fonts we NEED to use some sort of path-generation approach (at least, in node), but for system fonts we don't? Maybe we just need to create a hybrid mode that uses path generation for embedded fonts, and falls back for system fonts? I'm just trying to figure out where it's going wrong, honestly.

Please keep in mind that the PDF.js library was developed for use in browsers, and whatever Node.js support there is was "bolted on" afterwards so to speak; this obviously shows unfortunately :-(

I remember seeing other issues, which I (obviously) cannot find right now, where it was suggested that https://github.com/mozilla/pdf.js/blob/master/src/display/font_loader.js doesn't really support Node.js which is probably a fairly likely explanation for the troubles.

That reasoning makes sense - for both the difficulties with node and the explanation of the fontloader.

IIRC, when the environment is Node, "isFontLoadingAPISupported" is false, so it wouldn't surprise me if that's one of the problems. If we KNOW that the path-based approach works for the embedded fonts, and the system fonts/font face rules work for the non-embedded fonts, how difficult would it be to create a hybrid approach that checked the registry ("commonObjs" or something, IIRC from digging around) for an embedded font and used path-based rendering, and if that failed, tried the normal (font face) rendering?
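The hybrid idea amounts to a per-font decision. A minimal sketch, assuming a hypothetical font record with a `data` field (the font program bytes, or nothing) and a `fallbackName` field; these are not PDF.js's real objects, and the maintainer's reply below points toward extending the FontLoader instead.

```javascript
// Hypothetical per-font strategy: draw paths when a font program exists,
// otherwise hand text to the canvas by family name and let the environment
// pick a system font.
function chooseRenderStrategy(font) {
  if (font.data && font.data.length > 0) {
    return { mode: "paths", reason: "embedded font program available" };
  }
  return {
    mode: "system",
    family: font.fallbackName || "sans-serif",
    reason: "no font program; rely on environment fonts",
  };
}
```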

The correct approach would be to extend https://github.com/mozilla/pdf.js/blob/master/src/display/font_loader.js to be able to register custom fonts in Node.js environments.

Looks like some of the code is in place already - check out https://github.com/mozilla/pdf.js/blob/master/src/display/font_loader.js#L158 . It says you can treat node as if sync loading is supported.

I'm curious what advantage this would give - correct me if I'm wrong, but the font loader isn't used when we have disableFontFace = true, and that's when the embedded fonts work properly. Are you suggesting that by enabling the font loader, you could then load in system default fonts in addition to the embedded fonts?

but the font loader isn't used when we have disableFontFace = true

That's when the glyphs are rendered as path operators, as explained above, i.e. no font data is actually being loaded/registered in the browser/environment.
If the FontLoader supported Node.js properly, I'm assuming that things should "just work" (with the caveat that I don't know how feasible registering custom fonts is in Node.js environments).

Edit: Note how the code is essentially assuming a browser-compatible environment in e.g. https://github.com/mozilla/pdf.js/blob/96ad60f116f420945daa29dea185eac6e558d67e/src/display/font_loader.js#L44-L56
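A quick way to see what "assuming a browser-compatible environment" means in practice is to check for the pieces the browser path relies on. This is a rough feature-detection sketch, assuming the relevant pieces are `document` and the CSS Font Loading API (`FontFace`, `document.fonts`); it is an illustration, not the check PDF.js itself performs. In Node.js, both browser checks come back false.

```javascript
// Report which font-loading facilities the given global object offers.
function detectFontEnvironment(globalObj = globalThis) {
  const hasDocument = typeof globalObj.document !== "undefined";
  const hasFontLoadingAPI =
    hasDocument &&
    typeof globalObj.FontFace === "function" &&
    !!globalObj.document.fonts;
  const isNode =
    typeof globalObj.process === "object" &&
    !!(globalObj.process.versions && globalObj.process.versions.node);
  return { hasDocument, hasFontLoadingAPI, isNode };
}
```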

OK, so what you're saying - and correct me if I'm wrong, it's very early in the morning here - is this:

Right now, we don't do font loading. If we see embedded fonts, we can use the data from them to generate paths. This happens when disableFontFace = true. We cannot, however, generate paths for the system fonts, because we can't actually pull the font data down from the system - we can only humbly request that the system render things with its knowledge about its own fonts. Thus embedded fonts work because we can draw paths, and non-embedded fonts fail because we can't draw paths. Correct so far?

When disableFontFace = false, we ALWAYS just ask the system to render stuff. This means the embedded fonts, if they aren't also included in the system, will fail. The fontloader, however, allows us to take an embedded font and tell the system about it, so when we ask the system to render stuff it knows how.

That's my understanding, is that reasonable?

disableFontFace = true
[...]
we can only humbly request that the system render things with its knowledge about its own fonts.

There's no attempt to fallback in this mode, we'll only render glyphs as path commands (which requires an existing font program to generate) and nothing else; note https://github.com/mozilla/pdf.js/blob/96ad60f116f420945daa29dea185eac6e558d67e/src/display/font_loader.js#L365-L368 and https://github.com/mozilla/pdf.js/blob/96ad60f116f420945daa29dea185eac6e558d67e/src/display/font_loader.js#L377-L380 and finally https://github.com/mozilla/pdf.js/blob/96ad60f116f420945daa29dea185eac6e558d67e/src/display/canvas.js#L1503-L1554


The fontloader, however, allows us to take an embedded font and tell the system about it, so when we ask the system to render stuff it knows how.

That sounds about right, and it works perfectly well in browsers. Most likely, this part simply isn't working in Node.js environments.
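The behavior worked out above can be summarized as a small lookup. This encodes the explanation given in this thread for PDF.js 2.3.x; it is a summary, not a guarantee about other versions.

```javascript
// Expected outcome per environment, disableFontFace setting, and font type,
// as described in this thread.
function expectedOutcome({ environment, disableFontFace, embedded }) {
  if (disableFontFace) {
    // Glyphs are drawn as path commands, which requires a font program.
    return embedded ? "rendered as paths" : "not rendered (no font program)";
  }
  if (!embedded) {
    // Text is drawn by family name; the environment falls back to system fonts.
    return "system font fallback";
  }
  // The embedded font must be registered via the FontLoader first.
  return environment === "browser"
    ? "registered via @font-face"
    : "broken glyphs (FontLoader lacks Node.js support)";
}
```

In Node.js, no single `disableFontFace` value yields a good outcome for both `embedded: true` and `embedded: false`, which is exactly the mutual exclusivity reported in this issue.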

It may be time for me to sleep, but I'm still hung up on a couple things.

The fontloader serves to take the embedded font, and teach the system about it. Doesn't that mean, at the times we would be trying to use it, that "disableFontFace" would be false? So we should make it past those checks you mentioned.

The fontloader serves to take the embedded font, and teach the system about it.

Yes, but as already mentioned there's no Node.js-specific code in the FontLoader so it's not really surprising if things don't work :-)

I think I'm understanding. Was the font loader written specifically for this project, or was it a derivative of something else? I'm trying to figure out where I need to go to dig into it to see if I can find anybody who's got it (or something similar) working on Node. I really appreciate your feedback so far!

Was the font loader written specifically for this project,

Yes, at least as far as I know.


A quick search seems to suggest that loading fonts in Node.js, preferably in a way at least similar to what's possible in the browser, is perhaps not that straightforward in general (which is probably why there's no support for Node.js in the FontLoader).

The closest I can find is the node-canvas package, which apparently has some support for registering fonts. However, I've got absolutely no idea if that would be useful/sufficient for the PDF.js use-case.
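As a rough idea of what that could look like, node-canvas exposes `registerFont(path, { family })`, which takes a file path rather than raw bytes. So embedded font data would have to be written to disk first. The sketch below does that and then registers the file; the register function is injectable so it can be exercised without node-canvas installed. The family name is hypothetical, and whether node-canvas accepts the font data PDF.js produces (e.g. converted Type1/CFF fonts) is untested here.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write embedded font bytes to a temp file and register them under `family`.
// `registerFn` defaults to node-canvas's registerFont (loaded lazily).
function registerEmbeddedFont(family, fontBytes, registerFn) {
  const register =
    registerFn || ((file, opts) => require("canvas").registerFont(file, opts));
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pdf-fonts-"));
  // Sanitize the family name before using it as a file name.
  const file = path.join(dir, `${family.replace(/[^\w-]/g, "_")}.otf`);
  fs.writeFileSync(file, Buffer.from(fontBytes));
  register(file, { family });
  return file;
}
```

node-canvas documents that fonts must be registered before the canvas is created, so any such registration would have to happen before PDF.js starts drawing a page.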
