React-pdf: The text layer is not rendered with same font weight and color

Created on 18 Sep 2020 · 3 Comments · Source: wojtekmaj/react-pdf

Before you start - checklist

  • [x] I followed instructions in documentation written for my React-PDF version
  • [x] I have checked if this bug is not already reported
  • [x] I have checked if an issue is not listed in Known issues
  • [x] If I have a problem with PDF rendering, I checked if my PDF renders properly in Mozilla Firefox

Description
I changed the text layer that sits over the canvas from transparent to black, and I disabled text drawing on the canvas layer. I can see the text layer now, but the font weight and font color are not the same as in the PDF. I know that the text layer is rendered separately and has nothing to do with the canvas. So I was wondering whether we could somehow get these details for each textDiv created in the text layer. The font weight and color, at least, should be the same.

Steps to reproduce
Disable text drawing on the canvas by adding the following lines to canvas.js.

CanvasRenderingContext2D.prototype.strokeText = () => {};
CanvasRenderingContext2D.prototype.fillText = () => {};
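The two lines above replace the drawing methods for good. For anyone who wants to reproduce this and then switch the canvas text back on, here is a minimal sketch of a reversible version. `disableCanvasText` is a made-up helper name, and `FakeContext2D` is only a stand-in so the snippet runs outside a browser.

```javascript
// Stand-in class so the sketch runs outside a browser; in a browser you would
// pass CanvasRenderingContext2D.prototype instead (assumption: same method names).
class FakeContext2D {
  constructor() {
    this.drawn = [];
  }
  fillText(text) {
    this.drawn.push(`fill:${text}`);
  }
  strokeText(text) {
    this.drawn.push(`stroke:${text}`);
  }
}

// Hypothetical helper, not part of react-pdf or PDF.js.
// Replaces the text-drawing methods with no-ops and returns a function
// that puts the original methods back.
function disableCanvasText(proto) {
  const original = { fillText: proto.fillText, strokeText: proto.strokeText };
  proto.fillText = () => {};
  proto.strokeText = () => {};
  return function restore() {
    proto.fillText = original.fillText;
    proto.strokeText = original.strokeText;
  };
}

const ctx = new FakeContext2D();
const restore = disableCanvasText(FakeContext2D.prototype);
ctx.fillText('hidden'); // no-op while patched
restore();
ctx.fillText('visible'); // recorded again after restoring
console.log(ctx.drawn);
```

In a browser the call would be `disableCanvasText(CanvasRenderingContext2D.prototype)`, made before the page is rendered.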

Additional information

_BEFORE_
Screen Shot 2020-09-18 at 4 26 40 PM

_AFTER_
Screen Shot 2020-09-18 at 4 27 10 PM

Environment
Chrome

question

Most helpful comment

What confused me most was that I was comparing against other libraries (react-pdf-viewer and the vanilla PDF.js web viewer), and they seemed to render the text perfectly. But that wasn't really the case: they simply provide CSS rules that make the text layer spans' color transparent. I was able to get a lot closer with some CSS that forces the ::selection pseudo-element to inherit color: transparent. Without this extra rule, my Chrome was showing white text over the selection highlight.

.react-pdf__Page__textContent  {
  color: transparent;
  opacity: 0.5;
}
.react-pdf__Page__textContent ::selection {
  background: #0000ff;
}

It seems that this should be default behavior or documented in the FAQ on the wiki.

What I also noticed was that the PDF.js web viewer and react-pdf-viewer both seem to use the built-in renderTextLayer function from pdf.js's src/display/text_layer.js, and the highlight background for those seems to fit the text better vertically. It's just a tiny bit higher and looks better, in my opinion. Screen captures are below.

I'm curious: should you really be halving the translateY result you're using in TextLayerItem.alignTextItem? When I make this change locally, all the highlights appear nicely centered around the text.

| Library | Result |
| --- | ---|
| PDF.js web viewer | image |
| react-pdf-viewer | image |
| My CRA app using react-pdf | image |

All 3 comments

I'm seeing the same thing.

image

Text is off vertically. The generated spans in the text layer specify the font family from the PDF (g_d0_f3), but clearly it isn't being loaded. I'm using [email protected] ([email protected]) with Create React App (uses React 16.13.1). It is clearly falling back to the sans-serif font family, and the spans are likely being positioned incorrectly because of this. I had the same results with 4.2.0.

Is that font-family even correct? The pdf.js text-only example does a little more work to turn this internal font key into a real font name.

import React, { useCallback, useState } from 'react'
import { Document, Page, pdfjs } from 'react-pdf'
import 'react-pdf/dist/esm/Page/AnnotationLayer.css'
pdfjs.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.js`

const pdf = require('./SE3E-Ch02.pdf')

const App = () => {
  const [numPages, setNumPages] = useState(-1)
  const [pageNumber, setPageNumber] = useState(1)
  const [width, setWidth] = useState(400)
  function onDocumentLoadSuccess(pdf) {
    console.log('onDocumentLoadSuccess', pdf)
    setNumPages(pdf.numPages)
  }

  const previousPage = useCallback(() => {
    setPageNumber(pageNumber - 1)
  }, [pageNumber, setPageNumber])
  const nextPage = useCallback(() => {
    setPageNumber(pageNumber + 1)
  }, [pageNumber, setPageNumber])

  return (
    <>
      <div>
        {numPages >= 1 && pageNumber > 1 && <span onClick={previousPage}>&lt;&lt;</span>}
        <span>
          Page {pageNumber} of {numPages}
        </span>
        {numPages >= 1 && pageNumber < numPages && <span onClick={nextPage}>&gt;&gt;</span>}
      </div>
      <div style={{ border: '1px solid #dddddd', width: 500 }}>
        <Document
          file={pdf.default}
          onLoadSuccess={onDocumentLoadSuccess}
          options={{
            cMapUrl: `//cdn.jsdelivr.net/npm/pdfjs-dist@${pdfjs.version}/cmaps/`,
            cMapPacked: true
          }}
        >
          <Page pageNumber={pageNumber} renderTextLayer={true} width={500} />
        </Document>
      </div>
    </>
  )
}

export default App
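On the question of whether g_d0_f3 is a usable font-family: as far as I can tell, the text content that pdf.js returns from `page.getTextContent()` carries a `styles` map keyed by that internal font name, and each entry has a `fontFamily` (usually a generic family). A rough sketch of the lookup follows. `resolveFontFamilies` is a made-up helper name, and the sample data is hand-written, not taken from a real PDF.

```javascript
// Shape assumed to match what pdf.js's page.getTextContent() resolves to:
//   { items: [{ str, fontName, ... }],
//     styles: { [fontName]: { fontFamily, ascent, descent, vertical } } }
// resolveFontFamilies is a hypothetical helper, not part of pdf.js or react-pdf.
function resolveFontFamilies(textContent) {
  return textContent.items.map((item) => {
    const style = textContent.styles[item.fontName];
    return {
      str: item.str,
      fontKey: item.fontName,
      // Fall back to a generic family when the key has no style entry.
      fontFamily: style ? style.fontFamily : 'sans-serif',
    };
  });
}

// Hand-written sample data for illustration only.
const sampleTextContent = {
  items: [
    { str: 'Chapter 2', fontName: 'g_d0_f3' },
    { str: 'unknown font', fontName: 'g_d0_f9' },
  ],
  styles: {
    g_d0_f3: { fontFamily: 'serif', ascent: 0.9, descent: -0.2, vertical: false },
  },
};

const resolved = resolveFontFamilies(sampleTextContent);
console.log(resolved);
```

With real data, `textContent` would come from `await page.getTextContent()` on a loaded page. The internal key on its own is only meaningful as a CSS font-family if the embedded font was actually registered under that name.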

It's not always possible to reuse embedded fonts from the PDF to render the text layer. In such cases, PDF.js and React-PDF use generic fonts (serif, sans-serif, or monospace, depending on which is the most applicable) and scale them to match the original container, for an okay selection experience.

The text layer is not meant to be rendered for anything other than accessibility and text selection purposes.
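To make the "scale it to match the original container" part concrete, here is a minimal sketch of the idea: measure how wide the fallback font renders the text, then stretch or squeeze it horizontally to the width the PDF expects. `computeScaleX` is a made-up helper and the numbers are invented. This is not the exact code of PDF.js or React-PDF.

```javascript
// Hypothetical helper illustrating the scaling idea described above.
// targetWidth: width the text run should occupy according to the PDF (in px).
// measuredWidth: width the same string actually takes in the fallback font (in px).
function computeScaleX(targetWidth, measuredWidth) {
  // Guard against zero, negative, or NaN widths: leave the text unscaled.
  if (!(targetWidth > 0) || !(measuredWidth > 0)) {
    return 1;
  }
  return targetWidth / measuredWidth;
}

// Invented example: the PDF expects 120px, the fallback font renders 150px.
const scaleX = computeScaleX(120, 150);
const transform = `scaleX(${scaleX})`;
console.log(transform); // scaleX(0.8)
```

In a browser, `measuredWidth` would typically come from something like `element.getBoundingClientRect().width`. Only the width is corrected this way, so glyph shapes, weight, and vertical metrics still come from the fallback font, which is why the text layer looks different from the canvas.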
