Pdf.js: PDFJS.getDocument - Unable to catch thrown Error when file is corrupted

Created on 4 Jul 2017 · 10 Comments · Source: mozilla/pdf.js

Link to PDF file (or attach file here):
BrokenPdf.pdf

Configuration:

Web browser and its version: Chrome 59.0.3071.115
Operating system and its version: Mac OSX Sierra 10.12.5
PDF.js version: 1.7.225
Is an extension: NO

Steps to reproduce the problem:

Try PDFJS.getDocument(fileReader.result).then( ... ).catch( ... ) with the attached "corrupted file".

What is the expected behavior? (add screenshot)
As the file is corrupted, PDF.js throws an Error (Error: page dictionary kid reference points to wrong type of object).

What went wrong? (add screenshot)
I am trying to catch this error in the promise, but it is never caught.

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
To simplify, I have a similar scenario (found on the web):

http://jsbin.com/qulinaweho/1/edit?html,console,output

But when the file is corrupted, the library prints the error in the console log, and I am not able to catch it in order to handle the error.

Thank you.

1-other

ironal

All 10 comments

Try PDFJS.getDocument(fileReader.result).then( ... ).catch( ... ) with the attached "corrupted file"

I am trying to catch this error in the promise, but its never being caught.

Considering that the error in question, FormatError: page dictionary kid reference points to wrong type of object, isn't thrown when calling getDocument, it's expected that you cannot catch it like that.

Please note that the error originates in src/core/obj.js#L492-L493, i.e. as a result of a PDFDocumentProxy.getPage() call. In order to catch this error, you'll need to add catch handlers to those calls; e.g. something like this (using your sample code):

PDFJS.getDocument(fileReader.result).then((pdfDocument) => {
  // Fetching a page, e.g. the first.
  pdfDocument.getPage(1).then((pdfPage) => {
    // The page is now available for use...
  }).catch((ex) => {
    // This will catch errors such as:
    // "FormatError: page dictionary kid reference points to wrong type of object"
  });
}).catch( ... )

Edit: In order for this to work as described, you'll need to use PDF.js version 1.8.564 (or greater). The reason for this is that prior to PR #8684, we didn't correctly propagate these kinds of errors from the worker side to PDFDocumentProxy.getPage on the API side.
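
As a minimal sketch of the point above: a rejection only reaches catch handlers attached to its own promise chain. An alternative to nesting is to return the getPage() promise, so a single catch covers both steps. The fakeDocument and fakeGetDocument names below are hypothetical stand-ins that only mimic the shape of the calls; they are not the PDF.js API, and with the real library the version requirement above still applies.

```javascript
// Hypothetical stand-in for a loaded document; NOT the real PDF.js API.
// getPage() rejects the way a corrupted page tree might.
const fakeDocument = {
  getPage(pageNumber) {
    return Promise.reject(new Error('bad page ' + pageNumber));
  },
};

// Stand-in for getDocument(): loading itself succeeds.
function fakeGetDocument() {
  return Promise.resolve(fakeDocument);
}

// The inner promise is returned from the then() callback, so its rejection
// travels down the outer chain and one catch handler sees it.
function loadFirstPage() {
  return fakeGetDocument()
    .then((doc) => doc.getPage(1))
    .then(() => 'rendered')
    .catch((ex) => 'caught: ' + ex.message);
}

loadFirstPage().then((result) => console.log(result));
```

If the inner promise were neither returned nor given its own catch, the rejection would go unhandled and the outer catch would never run.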

Snuffleupagus on 3 Aug 2017

Hi @Snuffleupagus, thank you for your response.
Yes, I have tried putting the catch on getPage. In fact, while testing I put a catch on all the promises, but none of them caught the error; the code seemed to continue its normal flow (except that pdf.js printed the error in the console and stopped working). Unfortunately, execution never enters the catch, and the library throws an exception in pdf.worker.js:

// Fatal errors that should trigger the fallback UI and halt execution by
// throwing an exception.
function error(msg) {
  if (verbosity >= VERBOSITY_LEVELS.errors) {
    console.log('Error: ' + msg);
    console.log(backtrace());
  }
  throw new Error(msg);
}

Thanks.
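
To illustrate why a throw like the one in error() may never reach a promise's catch: an exception raised in a separate callback (or in a worker) sits outside the promise chain unless something converts it into a rejection. The sketch below uses setTimeout as a stand-in for the worker boundary. The runTask wrapper and the simplified error() are hypothetical and do not reflect how PDF.js implements this internally.

```javascript
// Simplified stand-in for a helper that throws on fatal parse errors.
function error(msg) {
  throw new Error(msg);
}

// Hypothetical wrapper: runs `task` in a later callback (a stand-in for work
// done on the worker side) and converts any throw into a rejection, so the
// caller's catch handler can observe it.
function runTask(task) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      try {
        resolve(task());
      } catch (ex) {
        reject(ex); // without this try/catch the throw would escape the chain
      }
    }, 0);
  });
}

const outcome = runTask(() =>
  error('page dictionary kid reference points to wrong type of object')
)
  .then(() => 'ok')
  .catch((ex) => 'caught: ' + ex.message);

outcome.then((text) => console.log(text));
```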

ironal on 3 Aug 2017

Do you have the same issue with a newer version of PDF.js?

yurydelendik on 3 Aug 2017

@yurydelendik I am using version 1.7.225, which I downloaded a couple of months ago. I haven't tried newer versions (if any).

ironal on 3 Aug 2017

Please also note that newer versions of PDF.js have a stopAtErrors setting.
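
A hedged sketch of how that setting might be passed, assuming stopAtErrors is accepted as a getDocument parameter in the version in use (worth verifying against that version's API docs). The buildLoadParams helper is hypothetical, and the getDocument call is left as a comment so the snippet runs without the library.

```javascript
// Hypothetical helper that builds the parameter object for getDocument.
// Assumption: stopAtErrors is accepted as a getDocument parameter in the
// PDF.js version in use; verify against that version's API docs.
function buildLoadParams(bytes) {
  return {
    data: bytes,        // e.g. the bytes read via fileReader.result
    stopAtErrors: true, // ask the library to reject instead of recovering
  };
}

// Four placeholder bytes spelling "%PDF", standing in for real file data.
const params = buildLoadParams(new Uint8Array([0x25, 0x50, 0x44, 0x46]));
console.log(params.stopAtErrors);

// With the library loaded, the call would look roughly like:
// PDFJS.getDocument(params).then( ... ).catch( ... );
```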

yurydelendik on 3 Aug 2017

I am using version 1.7.225, which I downloaded a couple of months ago. I haven't tried newer versions (if any).

Please note that I specifically mentioned at the end of https://github.com/mozilla/pdf.js/issues/8608#issuecomment-320069449 that you need at least version 1.8.564 for this to work :-)

Snuffleupagus on 3 Aug 2017

@ironal WFM at http://jsbin.com/quqinecojo/edit?html,console

yurydelendik on 3 Aug 2017

Thanks both for the info.
Looking at the documentation, I see that the beta version released today is 1.8.188 and that the stable version is the one I am using.
Would it be risky to use 1.8.564 in production? Would it be better to wait until it is released?

ironal on 3 Aug 2017