Pdf.js: SVG path is incorrectly broken into 2 nodes

Created on 6 Feb 2019 · 4 Comments · Source: mozilla/pdf.js

Attach (recommended) or Link to PDF file here:
test4.pdf

Configuration:

  • Web browser and its version: Chrome 56.0.2924.76
  • Operating system and its version: Linux Mint 18.3
  • PDF.js version: commit c0d6e46e392b327996eb0964b7932cb5bdde1727
  • Is a browser extension: no

Steps to reproduce the problem:

  1. Run "node pdf2svg.js test4.pdf".
  2. Load the output file test4.1.svg into Chrome.
  3. Check the console error log. You will see this:
    [screenshot: chrome_console_error]

What is the expected behavior? (add screenshot)
No errors.

What went wrong? (add screenshot)
One path in the output got split into 2 parts.
The initial "M" is in one node and the following "L" is in the next node.
These should be in the same node.
[screenshot: bad_svg_node]

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

4-svg

All 4 comments

This may be related to bug #9167.

I think the culprit is here:
https://github.com/mozilla/pdf.js/blob/a045a00af34b764edda5991d2bcd18541ed60536/src/core/operator_list.js#L533-L534
I'm not very familiar with the workings of OperatorList but it looks like operator lists are split into chunks of about 1000 operators. Sometimes the chunk boundary is placed in the middle of a PDF path definition. This produces two OPS.constructPath operators and the latter one doesn't start with a moveTo.
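To illustrate the hypothesis, here is a toy model of the chunking, not the actual pdf.js code. The function `buildChunks`, the operator shape, and the rule that every path segment counts toward the chunk size are all assumptions made for this sketch. It only shows how a fixed-size flush can leave a `constructPath` that begins with a `lineTo`.

```javascript
// Toy model only: names and data shapes are hypothetical, not pdf.js internals.
// Each input item is a path segment ("moveTo", "lineTo") or the painting op "stroke".
function buildChunks(segments, chunkSize) {
  const chunks = [];
  let current = []; // operators collected for the chunk being filled
  let weight = 0;   // assumed: every segment counts toward the chunk size

  for (const seg of segments) {
    const last = current[current.length - 1];
    if (seg === "stroke") {
      current.push({ fn: "stroke" });
    } else if (last && last.fn === "constructPath") {
      // Extend the constructPath that is already open in this chunk.
      last.segments.push(seg);
    } else {
      // No open constructPath in this chunk, so a new one is started.
      current.push({ fn: "constructPath", segments: [seg] });
    }

    weight++;
    if (weight >= chunkSize) {
      // Flush: the next segment can no longer see the previous constructPath.
      chunks.push(current);
      current = [];
      weight = 0;
    }
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}

const path = ["moveTo", "lineTo", "lineTo", "lineTo", "stroke"];
const chunks = buildChunks(path, 2);       // tiny chunk size to force a split
const unsplit = buildChunks(path, 1000);   // large chunk size: no split
console.log(JSON.stringify(chunks));
console.log(JSON.stringify(unsplit));
```

In this model the second chunk holds a `constructPath` whose first segment is `lineTo`, which matches the broken node seen in the SVG output. With a large chunk size the path stays in one operator.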

Does this sound plausible?

Instead of modifying OperatorList and its constant CHUNK_SIZE, svg.js should be fixed. Maybe like this: Consecutive OPS.constructPath operators should be combined into one <svg:path> node if there is no intervening path painting operator...
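A minimal sketch of that idea, assuming a simplified operator shape that is not the real svg.js data model: each `constructPath` carries an already-formatted piece of path data in a hypothetical `d` field, and only `stroke` and `fill` are treated as painting operators. The function name `mergeConstructPaths` is made up for this example.

```javascript
// Sketch under stated assumptions: operator objects and field names are hypothetical.
// Consecutive constructPath operators are buffered and turned into a single
// path node only when a painting operator arrives.
function mergeConstructPaths(ops) {
  const nodes = [];
  let pending = []; // path data pieces waiting for a painting operator

  for (const op of ops) {
    if (op.fn === "constructPath") {
      pending.push(op.d);
    } else if (op.fn === "stroke" || op.fn === "fill") {
      if (pending.length > 0) {
        nodes.push({ tag: "svg:path", d: pending.join(" "), paint: op.fn });
        pending = [];
      }
    }
    // Any other operator is ignored in this sketch and does not end the path.
  }
  return nodes;
}

// A path that was split across a chunk boundary: the second piece has no "M".
const splitOps = [
  { fn: "constructPath", d: "M 0 0" },
  { fn: "constructPath", d: "L 10 0 L 10 10" },
  { fn: "stroke" },
];
const nodes = mergeConstructPaths(splitOps);
console.log(nodes);
```

Here both pieces end up in one node whose path data starts with the original `M`, so the viewer never sees a path that begins with `L`.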

> I'm not very familiar with the workings of OperatorList but it looks like operator lists are split into chunks of about 1000 operators.

Correct; this is to allow rendering to begin before the entire OperatorList (i.e. the page) has been parsed, thus reducing overall time needed from the page loading to it being fully rendered.

> Does this sound plausible?

Yes, and it should be easy to verify (just increase the value a lot, effectively disabling this chunking).

> Instead of modifying OperatorList and its constant CHUNK_SIZE, svg.js should be fixed.

Agreed, modifying the constant is definitely not an acceptable solution. First of all, it would do nothing more than move the error elsewhere. Second of all, and much more importantly, changing it could have far-reaching implications for the general rendering performance in the canvas back-end.

> Maybe like this: Consecutive OPS.constructPath operators should be combined into one <svg:path> node if there is no intervening path painting operator...

Again, that sounds totally reasonable.

I can confirm that if I increase CHUNK_SIZE to 10000000 then the problem goes away.
(and I agree that this isn't a proper solution)
Thanks for your help.
