Link to PDF file (or attach file here):
input.pdf
This document has an embedded subset of the Ballpark font:
Configuration:
Node.js v6.9.2
[email protected]
pdfjs-dist@^1.6.414
Steps to reproduce the problem:
Run "npm install" to install pdfjs-dist and canvas node modules
Run "node fontTest.js" to load the input pdf and render it to a canvas and subsequently an image named out.png.
What is the expected behavior? (add screenshot)
The document will render the pdf characters using the embedded font. This works using the same pdf and converting to canvas in the browser.
Chrome 55.0.2883.95 (64-bit) screenshot:
What went wrong? (add screenshot)
The document is rendered using a default font. It appears to be Arial or ArialBlack:
Node.js out.png:
This is true of any fonts that are embedded in the pdfs.
Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
Similar to #7798. PDF.js is not designed for use with the 'canvas' module.
As a workaround, try using PDFJS.disableFontFace=true;
That works. That's kind of an odd parameter name to fix this issue, but it will work for my case.
Thanks for the quick workaround.
odd parameter name to fix this issue
It's not fixing the issue; it just uses lines to draw the letters instead of fonts and fillText.
I understand now. Thanks again.
I'm trying to render the embedded fonts in PDFs using the node canvas library. It has support to register fonts as explained here: https://github.com/Automattic/node-canvas#registerfont-for-bundled-fonts
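I can rip the fonts from pdfjs via two paths: a stubbed stylesheet, or the FontInspector hook.
For the first path, I stub Document, DocumentElement, StyleElement and StyleSheetElement, provide them in a chain, and finally handle each inserted rule like this (it seems that all fonts are added as inline CSS font definitions to the HTML document):
import * as fs from 'fs'
import { registerFont } from 'canvas'

export class StyleSheetElement {
  cssRules: any[] = []
  insertRule(rule: string, index: number) {
    console.log('Deal with rule', rule.substring(0, 100))
    // Non-greedy match so the family name stops at its closing quote
    const fontFamilyMatch = /font-family\s*:\s*["'](.*?)["']/i.exec(rule)
    // Capture only base64 characters so the trailing ")" of url(...) is excluded
    const base64Match = /base64\s*,\s*([A-Za-z0-9+\/=]+)/i.exec(rule)
    if (!fontFamilyMatch || !base64Match) return // not an embedded @font-face rule
    const fontFamily = fontFamilyMatch[1]
    const fileName = '/tmp/' + fontFamily + '.otf'
    // Decode the base64 payload to a font file and register it with node-canvas
    fs.writeFileSync(fileName, base64Match[1], 'base64')
    registerFont(fileName, {family: fontFamily})
    console.log('Family: ' + fontFamily)
  }
}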
The second path is the FontInspector hook, enabled through pdfBug on the global scope:
const globalScope: any =
  typeof window !== 'undefined' && (window as any).Math === Math ? window
  : typeof global !== 'undefined' && global.Math === Math ? global
  : typeof self !== 'undefined' && (self as any).Math === Math ? self
  : {}
globalScope['pdfBug'] = true
globalScope['FontInspector'] = new FontInspector()
Here too, it seems that every embedded font is transformed into a base64-encoded URL, which is also fed to the inspector.
Note that I'm running this with PDFJS.disableFontFace=false, to get the embedded fonts registered.
The names of the fonts are really cryptic, and I'm not sure what to pass as the weight: 'normal' | 'bold' or style: 'italic' | 'normal' attributes.
Can anyone help with this? It would also be nice to be able to just call pdfjs.parseFonts and get back an array of objects containing the base64 encoded fonts and font family names. The above is quite a hack..
But the biggest problem is that the fonts are still not used. Is there anything obvious I have missed? Any pointers or help is really appreciated.
For example, for test.pdf, the following is logged:
info: Deal with rule @font-face { font-family:"g_d0_f1";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGIAXwxrUAAAC
info: Family: g_d0_f1 saved to /tmp/g_d0_f1.otf
info: Deal with rule @font-face { font-family:"g_d0_f2";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGINdKoSwAAAC
info: Family: g_d0_f2 saved to /tmp/g_d0_f2.otf
info: Deal with rule @font-face { font-family:"g_d0_f3";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGIMz3BLwAAAC
info: Family: g_d0_f3 saved to /tmp/g_d0_f3.otf
info: Deal with rule @font-face { font-family:"g_d0_f4";src:url(data:font/opentype;base64,AAEAAAANAIAAAwBQT1MvMnhUdUoAAAD
info: Family: g_d0_f4 saved to /tmp/g_d0_f4.otf
info: Deal with rule @font-face { font-family:"g_d0_f5";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGINGlj9MAAAC
info: Family: g_d0_f5 saved to /tmp/g_d0_f5.otf
info: Deal with rule @font-face { font-family:"g_d0_f6";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGIIUnQ4AAAAC
info: Family: g_d0_f6 saved to /tmp/g_d0_f6.otf
info: Deal with rule @font-face { font-family:"g_d0_f7";src:url(data:font/opentype;base64,AAEAAAANAIAAAwBQT1MvMhpbWp4AAAD
info: Family: g_d0_f7 saved to /tmp/g_d0_f7.otf
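One detail worth noting in the log above: the base64 payloads start with either T1RUTw or AAEAAAAN. Decoded, those are the sfnt signatures "OTTO" (OpenType with CFF outlines) and 0x00010000 (TrueType outlines), so g_d0_f4 and g_d0_f7 are TrueType-flavoured even though they are written with an .otf extension. Whether that matters to node-canvas is not established in this thread; the sketch below only shows how the signature could be checked. decodeBase64Prefix and sniffSfntFlavor are hypothetical helper names, not part of pdf.js or node-canvas.

```typescript
// Hypothetical helpers for illustration; not part of pdf.js or node-canvas.
const B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode at most `byteCount` leading bytes of a base64 string.
function decodeBase64Prefix(base64: string, byteCount: number): number[] {
  const bytes: number[] = [];
  let buffer = 0;   // bits read but not yet emitted as a byte
  let bitCount = 0; // number of valid bits in `buffer`
  for (let i = 0; i < base64.length && bytes.length < byteCount; i++) {
    const value = B64_ALPHABET.indexOf(base64.charAt(i));
    if (value < 0) break; // padding or a non-base64 character ends the prefix
    buffer = (buffer << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((buffer >> bitCount) & 0xff);
      buffer &= (1 << bitCount) - 1; // keep only the bits not yet emitted
    }
  }
  return bytes;
}

type SfntFlavor = 'cff' | 'truetype' | 'unknown';

// Classify a font by the first four bytes of its sfnt header.
function sniffSfntFlavor(base64: string): SfntFlavor {
  const head = decodeBase64Prefix(base64, 4);
  if (head.length < 4) return 'unknown';
  const signature = head.map((b) => ('0' + b.toString(16)).slice(-2)).join('');
  if (signature === '4f54544f') return 'cff';      // "OTTO"
  if (signature === '00010000') return 'truetype'; // sfnt version 1.0
  if (signature === '74727565') return 'truetype'; // "true" (Apple TrueType)
  return 'unknown';
}

console.log(sniffSfntFlavor('T1RUTwAJAIAAAwAQQ0ZGIAXwxrUAAAC')); // g_d0_f1: cff
console.log(sniffSfntFlavor('AAEAAAANAIAAAwBQT1MvMnhUdUoAAAD')); // g_d0_f4: truetype
```

The decoder is written by hand only to keep the sketch free of runtime dependencies; in Node.js, Buffer.from(data, 'base64') would give the same bytes.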
And I get a warning from Pango:
(process:36223): Pango-WARNING **: 15:46:36.286: couldn't load font
"ABBVIT+Impact,monospace Condensed Not-Rotated 16px", modified
variant/weight/stretch as fallback, expect ugly output.
Looking at the PDF, I can't see such an embedded font at all, only these (maybe it's the Impact one without the ABBVIT prefix?):
The output looks like this:
The above output was with the stylesheet approach. When I use the font inspector, I also have access to the font object, and from that I can get the real font name. Using that, I register the fonts with the following names:
info: Deal with font { name: 'KAFISZ+Lobster1.4', loadedName: 'g_d0_f1' }
info: Load 1/tmp/g_d0_f1.otf as g_d0_f1
info: Load 2/tmp/g_d0_f1.otf as Lobster1.4
info: Load 5/tmp/g_d0_f1.otf as KAFISZ+Lobster1.4
info: Deal with font { name: 'KIDDSZ+LibreBaskerville-Regular', loadedName: 'g_d0_f2' }
info: Load 1/tmp/g_d0_f2.otf as g_d0_f2
info: Load 2/tmp/g_d0_f2.otf as LibreBaskerville-Regular
info: Load 3/tmp/g_d0_f2.otf as KIDDSZ+LibreBaskerville
info: Load 4/tmp/g_d0_f2.otf as LibreBaskerville
info: Load 5/tmp/g_d0_f2.otf as KIDDSZ+LibreBaskerville-Regular
info: Deal with font { name: 'INCRQD+Intro-Inline', loadedName: 'g_d0_f3' }
info: Load 1/tmp/g_d0_f3.otf as g_d0_f3
info: Load 2/tmp/g_d0_f3.otf as Intro-Inline
info: Load 3/tmp/g_d0_f3.otf as INCRQD+Intro
info: Load 4/tmp/g_d0_f3.otf as Intro
info: Load 5/tmp/g_d0_f3.otf as INCRQD+Intro-Inline
info: Deal with font { name: 'ABBVIT+Impact', loadedName: 'g_d0_f4' }
info: Load 1/tmp/g_d0_f4.otf as g_d0_f4
info: Load 2/tmp/g_d0_f4.otf as Impact
info: Load 5/tmp/g_d0_f4.otf as ABBVIT+Impact
info: Deal with font { name: 'OECWWR+LibreBaskerville-Bold', loadedName: 'g_d0_f5' }
info: Load 1/tmp/g_d0_f5.otf as g_d0_f5
info: Load 2/tmp/g_d0_f5.otf as LibreBaskerville-Bold
info: Load 3/tmp/g_d0_f5.otf as OECWWR+LibreBaskerville
info: Load 4/tmp/g_d0_f5.otf as LibreBaskerville
info: Load 5/tmp/g_d0_f5.otf as OECWWR+LibreBaskerville-Bold
info: Deal with font { name: 'XUSDSZ+LibreBaskerville-Italic', loadedName: 'g_d0_f6' }
info: Load 1/tmp/g_d0_f6.otf as g_d0_f6
info: Load 2/tmp/g_d0_f6.otf as LibreBaskerville-Italic
info: Load 3/tmp/g_d0_f6.otf as XUSDSZ+LibreBaskerville
info: Load 4/tmp/g_d0_f6.otf as LibreBaskerville
info: Load 5/tmp/g_d0_f6.otf as XUSDSZ+LibreBaskerville-Italic
info: Deal with font { name: 'GSBAOH+TimesNewRomanPS-BoldMT', loadedName: 'g_d0_f7' }
info: Load 1/tmp/g_d0_f7.otf as g_d0_f7
info: Load 2/tmp/g_d0_f7.otf as TimesNewRomanPS-BoldMT
info: Load 3/tmp/g_d0_f7.otf as GSBAOH+TimesNewRomanPS
info: Load 4/tmp/g_d0_f7.otf as TimesNewRomanPS
info: Load 5/tmp/g_d0_f7.otf as GSBAOH+TimesNewRomanPS-BoldMT
Still no luck. There are no errors, but the output remains the same.
The good thing is I can now infer the bold / italic data from the name.
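As a rough sketch of that inference: embedded subsets carry a tag of six uppercase letters followed by "+", and the weight/style usually sits after the last hyphen of the PostScript name. The helper below, describeFont, is a hypothetical name and a heuristic only; it assumes the suffix contains "Bold", "Italic", "Oblique", "Regular" or "Roman", and leaves any other suffix (such as "Inline") as part of the family.

```typescript
// Hypothetical helper; the field names mirror the options object of node-canvas registerFont.
interface FontDescriptor {
  family: string;
  weight: 'normal' | 'bold';
  style: 'normal' | 'italic';
}

function describeFont(postScriptName: string): FontDescriptor {
  // Drop the subset tag, e.g. "OECWWR+" in "OECWWR+LibreBaskerville-Bold".
  const withoutTag = postScriptName.replace(/^[A-Z]{6}\+/, '');
  // Look at whatever follows the last hyphen, e.g. "Bold" or "BoldMT".
  const hyphen = withoutTag.lastIndexOf('-');
  const suffix = hyphen > 0 ? withoutTag.slice(hyphen + 1) : '';
  const bold = /bold/i.test(suffix);
  const italic = /italic|oblique/i.test(suffix);
  const plain = /^(regular|roman)$/i.test(suffix);
  // Strip the suffix only when it is a recognised style word.
  const family = bold || italic || plain ? withoutTag.slice(0, hyphen) : withoutTag;
  return {
    family,
    weight: bold ? 'bold' : 'normal',
    style: italic ? 'italic' : 'normal',
  };
}

console.log(describeFont('OECWWR+LibreBaskerville-Bold'));
console.log(describeFont('XUSDSZ+LibreBaskerville-Italic'));
console.log(describeFont('KAFISZ+Lobster1.4'));
```

The resulting object has the shape that registerFont(fileName, {family, weight, style}) accepts. Whether registering under these names makes pdf.js pick the fonts up is a separate question that this sketch does not answer.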
I actually need pdf.js to fall back to a default font. How can I do that?