Pdf.js: Embedded fonts are not rendered in node when rendering to canvas

Created on 4 Jan 2017 · 9 Comments · Source: mozilla/pdf.js

Link to PDF file (or attach file here):
input.pdf

This document has an embedded subset of the Ballpark font:
[image]

Configuration:
Node.js v6.9.2
[email protected]
pdfjs-dist@^1.6.414

Steps to reproduce the problem:

  1. Download and extract the following node project.
    fontTest.zip
  2. Run "npm install" to install the pdfjs-dist and canvas node modules.

  3. Run "node fontTest.js" to load the input PDF, render it to a canvas, and then write the canvas to an image named out.png.

What is the expected behavior? (add screenshot)
The PDF text should be rendered using the embedded font. This works when the same PDF is rendered to a canvas in the browser.

Chrome 55.0.2883.95 (64-bit) screenshot:
[image]

What went wrong? (add screenshot)
The document is rendered using a default font. It appears to be Arial or Arial Black:

Node.js out.png:
[image: out.png]

The same happens with any font embedded in a PDF.

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

1-other 2-feature 4-node-specific

All 9 comments

Similar to #7798. PDF.js is not designed for use with the 'canvas' module.

As a workaround, try setting PDFJS.disableFontFace = true;

That works. That's kind of an odd parameter name to fix this issue, but it will work for my case.

Thanks for the quick workaround.

> odd parameter name to fix this issue

It's not fixing the issue; it just uses lines to draw the letters instead of fonts and fillText.

I understand now. Thanks again.

I'm trying to render the embedded fonts in PDFs using the node canvas library. It supports registering fonts, as explained here: https://github.com/Automattic/node-canvas#registerfont-for-bundled-fonts

I can rip the fonts from pdfjs via two paths:

  1. By declaring global stub elements, namely Document, DocumentElement, StyleElement and StyleSheetElement, chaining them together, and finally handling insertRule like this (it seems that all fonts are added to the HTML document as inline CSS font definitions):
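import * as fs from 'fs'
import { registerFont } from 'canvas'

// Stand-in for a browser style sheet: insertRule() receives one
// @font-face rule per embedded font.
export class StyleSheetElement {
  cssRules: any[] = []
  insertRule(rule: string, index: number) {
    console.log('Deal with rule', rule.substring(0, 100))
    // Non-greedy family match; the base64 capture stops at the first
    // character outside the base64 alphabet (e.g. the closing parenthesis).
    const fontFamilyMatch = /font-family\s*:\s*["'](.*?)["']/i.exec(rule)
    const base64Match = /base64\s*,\s*([A-Za-z0-9+\/=]+)/i.exec(rule)
    if (!fontFamilyMatch || !base64Match) return // not a @font-face data rule

    const fontFamily = fontFamilyMatch[1]
    const fileName = '/tmp/' + fontFamily + '.otf'

    // Decode the font to disk, then register it with node-canvas
    fs.writeFileSync(fileName, base64Match[1], 'base64')
    registerFont(fileName, {family: fontFamily})

    console.log('Family: ' + fontFamily + ' saved to ' + fileName)
  }
}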
  2. By declaring my own FontInspector like this:
const globalScope: any = typeof window !== 'undefined' && (window as any).Math === Math ? window : typeof global !== 'undefined' && global.Math === Math ? global : typeof self !== 'undefined' && (self as any).Math === Math ? self : {};

globalScope['pdfBug'] = true
globalScope['FontInspector'] = new FontInspector()

Again, it seems that every embedded font is transformed into a base64-encoded URL, which is also fed to the inspector.

Note that I'm running this with PDFJS.disableFontFace = false, so that the embedded fonts get registered.

The names of the fonts are really cryptic, and I'm not sure what to pass as the weight: 'normal' | 'bold' or style: 'italic' | 'normal' attributes.

Can anyone help with this? It would also be nice to be able to just call something like pdfjs.parseFonts and get back an array of objects containing the base64-encoded fonts and font family names. The above is quite a hack.

But the biggest problem is that the fonts are still not used. Is there anything obvious I have missed? Any pointers or help is really appreciated.

For example, for test.pdf the following is logged:

info: Deal with rule @font-face { font-family:"g_d0_f1";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGIAXwxrUAAAC
info: Family: g_d0_f1 saved to /tmp/g_d0_f1.otf
info: Deal with rule @font-face { font-family:"g_d0_f2";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGINdKoSwAAAC
info: Family: g_d0_f2 saved to /tmp/g_d0_f2.otf
info: Deal with rule @font-face { font-family:"g_d0_f3";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGIMz3BLwAAAC
info: Family: g_d0_f3 saved to /tmp/g_d0_f3.otf
info: Deal with rule @font-face { font-family:"g_d0_f4";src:url(data:font/opentype;base64,AAEAAAANAIAAAwBQT1MvMnhUdUoAAAD
info: Family: g_d0_f4 saved to /tmp/g_d0_f4.otf
info: Deal with rule @font-face { font-family:"g_d0_f5";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGINGlj9MAAAC
info: Family: g_d0_f5 saved to /tmp/g_d0_f5.otf
info: Deal with rule @font-face { font-family:"g_d0_f6";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGIIUnQ4AAAAC
info: Family: g_d0_f6 saved to /tmp/g_d0_f6.otf
info: Deal with rule @font-face { font-family:"g_d0_f7";src:url(data:font/opentype;base64,AAEAAAANAIAAAwBQT1MvMhpbWp4AAAD
info: Family: g_d0_f7 saved to /tmp/g_d0_f7.otf
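
The rule handling above can also be written as a pure function that only parses a rule and leaves writing and registering to the caller. This is a minimal sketch, not a pdf.js API: the names parseFontFaceRule and EmbeddedFont are hypothetical, and it assumes the rules look like the ones logged above (a quoted font-family plus a base64 data URL). It also inspects the first four bytes of the decoded data, since an sfnt file starting with "OTTO" carries CFF (OpenType) outlines and one starting with 00 01 00 00 carries TrueType outlines.

```typescript
// Hypothetical helper (not part of pdf.js): parse one @font-face rule string.
export interface EmbeddedFont {
  family: string                      // loaded name used in the rule, e.g. "g_d0_f1"
  data: Buffer                        // decoded font bytes
  format: 'otf' | 'ttf' | 'unknown'   // guessed from the sfnt signature
}

export function parseFontFaceRule(rule: string): EmbeddedFont | null {
  const family = /font-family\s*:\s*["']([^"']+)["']/i.exec(rule)
  const payload = /base64\s*,\s*([A-Za-z0-9+\/=]+)/i.exec(rule)
  if (!family || !payload) return null // not a base64 @font-face rule

  const data = Buffer.from(payload[1], 'base64')

  // The first four bytes of an sfnt file identify the outline flavour.
  const tag = data.toString('latin1', 0, 4)
  let format: EmbeddedFont['format'] = 'unknown'
  if (tag === 'OTTO') format = 'otf'
  else if (tag === '\u0000\u0001\u0000\u0000' || tag === 'true') format = 'ttf'

  return { family: family[1], data, format }
}
```

Applied to the log above, the rules whose payload starts with T1RUTw decode to "OTTO", while g_d0_f4 and g_d0_f7 start with AAEAAAAN, i.e. the TrueType signature, even though every file was saved with an .otf extension. Whether the extension matters to node-canvas is not established in this thread.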

And I get a warning from Pango:

(process:36223): Pango-WARNING **: 15:46:36.286: couldn't load font 
"ABBVIT+Impact,monospace Condensed Not-Rotated 16px", modified 
variant/weight/stretch as fallback, expect ugly output.

Looking at the PDF, I can't see any such embedded font, only these (maybe it's the Impact one without the ABBVIT prefix?):
[image: screenshot, 2018-3-27 15:54:51]

The output looks like this: [image: test_1]

The above output was with the StyleSheet approach. When I use the font inspector, I also have access to the font object, and from that I can get the real font name. Using that, I register the fonts under the following names:

info: Deal with font { name: 'KAFISZ+Lobster1.4', loadedName: 'g_d0_f1' }
info: Load 1/tmp/g_d0_f1.otf as g_d0_f1
info: Load 2/tmp/g_d0_f1.otf as Lobster1.4
info: Load 5/tmp/g_d0_f1.otf as KAFISZ+Lobster1.4

info: Deal with font { name: 'KIDDSZ+LibreBaskerville-Regular', loadedName: 'g_d0_f2' }
info: Load 1/tmp/g_d0_f2.otf as g_d0_f2
info: Load 2/tmp/g_d0_f2.otf as LibreBaskerville-Regular
info: Load 3/tmp/g_d0_f2.otf as KIDDSZ+LibreBaskerville
info: Load 4/tmp/g_d0_f2.otf as LibreBaskerville
info: Load 5/tmp/g_d0_f2.otf as KIDDSZ+LibreBaskerville-Regular

info: Deal with font { name: 'INCRQD+Intro-Inline', loadedName: 'g_d0_f3' }
info: Load 1/tmp/g_d0_f3.otf as g_d0_f3
info: Load 2/tmp/g_d0_f3.otf as Intro-Inline
info: Load 3/tmp/g_d0_f3.otf as INCRQD+Intro
info: Load 4/tmp/g_d0_f3.otf as Intro
info: Load 5/tmp/g_d0_f3.otf as INCRQD+Intro-Inline

info: Deal with font { name: 'ABBVIT+Impact', loadedName: 'g_d0_f4' }
info: Load 1/tmp/g_d0_f4.otf as g_d0_f4
info: Load 2/tmp/g_d0_f4.otf as Impact
info: Load 5/tmp/g_d0_f4.otf as ABBVIT+Impact

info: Deal with font { name: 'OECWWR+LibreBaskerville-Bold', loadedName: 'g_d0_f5' }
info: Load 1/tmp/g_d0_f5.otf as g_d0_f5
info: Load 2/tmp/g_d0_f5.otf as LibreBaskerville-Bold
info: Load 3/tmp/g_d0_f5.otf as OECWWR+LibreBaskerville
info: Load 4/tmp/g_d0_f5.otf as LibreBaskerville
info: Load 5/tmp/g_d0_f5.otf as OECWWR+LibreBaskerville-Bold

info: Deal with font { name: 'XUSDSZ+LibreBaskerville-Italic', loadedName: 'g_d0_f6' }
info: Load 1/tmp/g_d0_f6.otf as g_d0_f6
info: Load 2/tmp/g_d0_f6.otf as LibreBaskerville-Italic
info: Load 3/tmp/g_d0_f6.otf as XUSDSZ+LibreBaskerville
info: Load 4/tmp/g_d0_f6.otf as LibreBaskerville
info: Load 5/tmp/g_d0_f6.otf as XUSDSZ+LibreBaskerville-Italic

info: Deal with font { name: 'GSBAOH+TimesNewRomanPS-BoldMT', loadedName: 'g_d0_f7' }
info: Load 1/tmp/g_d0_f7.otf as g_d0_f7
info: Load 2/tmp/g_d0_f7.otf as TimesNewRomanPS-BoldMT
info: Load 3/tmp/g_d0_f7.otf as GSBAOH+TimesNewRomanPS
info: Load 4/tmp/g_d0_f7.otf as TimesNewRomanPS
info: Load 5/tmp/g_d0_f7.otf as GSBAOH+TimesNewRomanPS-BoldMT

Still no luck. There are no errors, but the output remains the same.

The good thing is I can now infer the bold / italic data from the name.
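
Inferring those attributes from the name could look roughly like the sketch below. The helper parsePdfFontName and its return shape are hypothetical. It assumes names follow the pattern seen in the log above: an optional six-uppercase-letter subset tag followed by "+", then a base name whose last hyphenated part hints at weight and style. Names that don't follow that pattern fall back to normal weight and style.

```typescript
// Hypothetical helper: derive registration attributes from a PDF font name
// such as "OECWWR+LibreBaskerville-Bold".
export interface FontNameInfo {
  subsetTag: string | null   // six-letter subset prefix, e.g. "OECWWR"
  family: string             // name without subset tag and style suffix
  weight: 'normal' | 'bold'
  style: 'normal' | 'italic'
}

export function parsePdfFontName(name: string): FontNameInfo {
  // Subset fonts are prefixed with six uppercase letters and a plus sign.
  const subset = /^([A-Z]{6})\+(.+)$/.exec(name)
  const subsetTag = subset ? subset[1] : null
  const baseName = subset ? subset[2] : name

  // Treat whatever follows the last hyphen as the style suffix.
  const dash = baseName.lastIndexOf('-')
  const family = dash > 0 ? baseName.slice(0, dash) : baseName
  const suffix = dash > 0 ? baseName.slice(dash + 1) : ''

  return {
    subsetTag,
    family,
    weight: /bold|black|heavy/i.test(suffix) ? 'bold' : 'normal',
    style: /italic|oblique/i.test(suffix) ? 'italic' : 'normal',
  }
}
```

The result could then be passed on as the family, weight and style options of registerFont. This only produces the attributes; as noted above, registering the fonts under these names did not change the rendered output.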

I actually need pdf.js to fall back to a default font. How can I do that?
