Pdf.js: No spaces between words when copying text

Created on 12 Mar 2019  ·  13 Comments  ·  Source: mozilla/pdf.js

PDF files:
1.pdf
2.pdf
3.pdf

Configuration:

Steps to reproduce the problem:

  1. Select text
  2. Copy text

What went wrong?

Getting this:
OpenSansisahumanistsansseriftypefacedesignedbySteveMatteson.OpenSanswasdesignedwithanuprightstress,openformsandaneu-tral,yetfriendlyappearance.Itwasoptimizedforprint,web,andmobileinterfaces,andhasexcellentlegibilitycharacteristicsinitsletterforms(seefigure聛onthefollowingpage).ThisfontisavailablefromtheGoogleFontDirectory[聛]asTrueTypefileslicensedundertheApacheLicenseversion聜.聙.

Instead of this:
Open Sans is a humanist sans serif typeface designed by Steve Matteson. Open Sans was designed with an upright stress, open forms and a neutral, yet friendly appearance. It was optimized for print, web, and mobile interfaces, and has excellent legibility characteristics in its letterforms (see figure 1 on the following page). This font is available from the Google Font Directory [1] as TrueType files licensed under the Apache License version 2.0.

This is quite a significant problem: I would estimate that roughly 1 in 20 scientific papers is affected.

4-text-selection

All 13 comments

Is this still happening to you? I tested here and couldn't reproduce the bug.

Yeah, the problem still exists. Just open the attached 1.pdf in https://mozilla.github.io/pdf.js/web/viewer.html, then copy and paste the text. I tried it with the current master in Chrome on both Ubuntu and Android, and I think it should behave the same on any system.

Hi,
I am facing the same issue.
Is there any update on this?

Spaces between words are removed for paragraphs with style="text-align: justify;", where each word is rendered in a separate span. When I copy text from the PDF into Notepad, the spaces are gone.
It works fine without this style.
We are converting HTML from CKEditor to PDF, and users expect the PDF to render exactly as it is shown in CKEditor (WYSIWYG). I tried pdfjs-2.1.266-dist and the problem is consistent. Please help.
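For the justified-text case, where each word arrives as its own item, one possible workaround outside the viewer is to rebuild the line yourself and insert a space wherever the horizontal gap between consecutive items is large enough. The sketch below is only an illustration: `joinItemsWithSpaces` and `gapThreshold` are hypothetical names, the items are assumed to have the `{ str, width, transform }` shape of pdf.js text content items (with `transform[4]` as the x offset), and it assumes all items sit on the same left-to-right line.

```javascript
// Hypothetical helper, not part of pdf.js: rebuild one line of text from
// separately positioned items, adding a space where the gap is wide enough.
// Assumes items are on the same line, sorted left to right, and shaped like
// { str, width, transform } with transform[4] as the x offset.
function joinItemsWithSpaces(items, gapThreshold = 1) {
  let text = "";
  let previousEnd = null;
  for (const item of items) {
    const start = item.transform[4];
    const gap = previousEnd === null ? 0 : start - previousEnd;
    // Only add a space if the gap exceeds the threshold and none is present yet.
    if (gap > gapThreshold && !text.endsWith(" ")) {
      text += " ";
    }
    text += item.str;
    previousEnd = start + item.width;
  }
  return text;
}
```

The threshold is a guess that would need tuning per document; a fraction of the font size is probably a more robust choice than a fixed value.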

Still facing this issue.
Any updates or workarounds for this?

I should mention that this issue exists right in the default viewer.
No space is inserted between lines, and the text in the screenshot
[Screenshot (176)]
is pasted as: the systemidentifieshot(

How do Chrome and Acrobat handle this?

Any solution for that?

I have to open some PDFs in Okular (Linux), which copies the text just fine, because the same text is missing spaces in Firefox :(

Running 76.0.1 (64-bit) on macOS, and it still happens. If I download the PDF and open it in Preview, copy-paste includes the spaces, as expected.

Possible hotfix that _may_ help here https://github.com/mozilla/pdf.js/issues/7310#issuecomment-530713483 ? YMMV

I think I've found an issue in core/evaluator.js:2043-2056.

if (spaceWidth) {
  textContentItem.spaceWidth = spaceWidth;
  textContentItem.fakeSpaceMin = spaceWidth * SPACE_FACTOR;
  textContentItem.fakeMultiSpaceMin = spaceWidth * MULTI_SPACE_FACTOR;
  textContentItem.fakeMultiSpaceMax = spaceWidth * MULTI_SPACE_FACTOR_MAX;
  // It's okay for monospace fonts to fake as much space as needed.
  textContentItem.textRunBreakAllowed = !font.isMonospace;
} else {
  textContentItem.spaceWidth = 0;
  textContentItem.fakeSpaceMin = Infinity;
  textContentItem.fakeMultiSpaceMin = Infinity;
  textContentItem.fakeMultiSpaceMax = 0;
  textContentItem.textRunBreakAllowed = false;
}

I think this if-else is not correct: spaceWidth should always be greater than zero by definition. Setting a fallback value equal to the minimum width of the current font solves the issue for the test PDF.
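To see why the else branch suppresses every space, here is a simplified, standalone sketch of the threshold logic. It is not the actual evaluator code: the function names are hypothetical, the `SPACE_FACTOR` value is an assumption, and the real code compares glyph advances in more places than this single check.

```javascript
// Assumed value for illustration only; check the pdf.js source for the real constant.
const SPACE_FACTOR = 0.3;

// Hypothetical helper mirroring the fakeSpaceMin assignment in the snippet above.
function fakeSpaceMinFor(spaceWidth) {
  return spaceWidth ? spaceWidth * SPACE_FACTOR : Infinity;
}

// Simplified check: a gap is treated as a space only if it exceeds fakeSpaceMin.
function gapLooksLikeSpace(gap, spaceWidth) {
  return gap > fakeSpaceMinFor(spaceWidth);
}
```

With a spaceWidth of 0, the minimum becomes Infinity, so no gap can ever qualify as a space, however wide it is. That matches the symptom of words running together.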

The fix that worked for me:

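// Fallback: use the narrowest non-zero glyph width in the font as the space width.
var fontMinWidth = Math.min.apply(null, font.widths.filter(w => !!w));
// Divide by 1000 and scale by the current font size.
var spaceWidth = (fontMinWidth / 1000) * textState.fontSize;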
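The same fallback can be written as a standalone function that is easier to test in isolation. This is a sketch under assumptions: `fallbackSpaceWidth` is a hypothetical name, `widths` is assumed to be an array of glyph widths as in the fix above, and the guard for fonts with no non-zero widths is an addition of mine (without it, `Math.min` over an empty list returns `Infinity`).

```javascript
// Hypothetical standalone version of the fallback above.
// widths: array of glyph widths (may contain zeros or holes).
// fontSize: current font size, as in textState.fontSize.
function fallbackSpaceWidth(widths, fontSize) {
  const nonZero = widths.filter((w) => w > 0);
  if (nonZero.length === 0) {
    // No usable width: keep the original "no space width" behaviour.
    return 0;
  }
  return (Math.min(...nonZero) / 1000) * fontSize;
}
```

For example, `fallbackSpaceWidth([0, 500, 250, 0], 10)` picks 250 as the narrowest glyph and returns 2.5.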

Facing similar issue:

{
  str: 'wykonanaprzezBibliotekęNarodowązegzemplarzapochodzącegozezbiorówBN.',
  dir: 'ltr',
  width: 250.77697646469932,
  height: 8.012811977655979,
  transform: [Array],
  fontName: 'g_d0_f1'
},

Any ideas?
