PDF.js: "Invalid or corrupted PDF file" is displayed

Created on 12 Mar 2019 · 11 Comments · Source: mozilla/pdf.js

Attach (recommended) or Link to PDF file here:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf

Configuration:

  • Web browser and its version: Chrome 72.0.3621.121 (Official Build) (64-bit)
  • Operating system and its version: macOS
  • PDF.js version: PDF.js v2.0.673 (build: 31012570)
  • Is a browser extension: Yes

Steps to reproduce the problem:

  1. Click the above link.

What is the expected behavior? (add screenshot)
This is what I see when I paste the chrome-extension-prefixed URL or reload the page showing the error:
(chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf)
[Screenshot 2019-03-12 at 17 20 26]

What went wrong? (add screenshot)
This is what I see when I click the above link:
[Screenshot 2019-03-12 at 17 19 38]

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm

Labels: 1-other, 4-chrome-specific

All 11 comments

Possibly a duplicate of #10562.

@Snuffleupagus I think this is a different issue from #10562.
This one is about some PDFs not opening properly, whereas #10562 is about PDFs being opened by Chrome's internal PDF viewer (PDFium?) instead of the PDF.js extension.

I found the cause of this issue.
The server is requested twice within a very short interval.
For the second request, the server redirects to downloadsexceeded.html instead of returning the PDF content.
So PDF.js complains that it's an invalid/corrupted PDF file.
I think this may be the server's DDoS protection.
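
To confirm this diagnosis, one can inspect the first bytes of whatever the second request returns: a real PDF file starts with a "%PDF-" header, whereas the downloadsexceeded.html page is HTML. A minimal sketch, assuming the body is available as a Uint8Array; looksLikePdf is a hypothetical helper (not a PDF.js API), and the 1024-byte search window is an assumption that allows for a few stray bytes before the header.

```javascript
// Hypothetical helper (not a PDF.js API): does this byte buffer look like a PDF?
// PDF files begin with a "%PDF-" header; the 1024-byte search window is an
// assumption that allows for a few stray bytes before the header.
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function looksLikePdf(bytes, limit = 1024) {
  // Last start index at which the whole signature still fits inside the window.
  const lastStart = Math.min(bytes.length, limit) - PDF_SIGNATURE.length;
  for (let i = 0; i <= lastStart; i++) {
    if (PDF_SIGNATURE.every((b, j) => bytes[i + j] === b)) {
      return true;
    }
  }
  return false;
}

const encoder = new TextEncoder();
console.log(looksLikePdf(encoder.encode("%PDF-1.4\n%...")));                 // true
console.log(looksLikePdf(encoder.encode("<!DOCTYPE html><html>...</html>"))); // false
```

An HTML error page served in place of the PDF fails this check immediately, which matches the "Invalid or corrupted PDF file" symptom.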

Why are two requests issued when the user clicks that link?
The first one is issued by the browser for the user's click.
Then the PDF.js extension intercepts the response headers and redirects to the extension URL.
Then PDF.js issues one more request.
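
Since the extension makes its decision when the response headers arrive, the check can rely on headers rather than the URL. A minimal sketch, assuming headers arrive as an array of { name, value } objects (the shape chrome.webRequest.onHeadersReceived gives its listeners); isPdfResponse and its application/octet-stream fallback are illustrative assumptions, not the extension's actual code.

```javascript
// Hypothetical sketch: decide from response headers whether a response is a PDF.
// `headers` is an array of { name, value } objects, the shape that
// chrome.webRequest.onHeadersReceived passes to its listeners.
function getHeader(headers, name) {
  const wanted = name.toLowerCase();
  const header = headers.find((h) => h.name.toLowerCase() === wanted);
  return header && header.value ? header.value : "";
}

function isPdfResponse(url, headers) {
  // Drop parameters such as "; charset=binary" before comparing the media type.
  const type = getHeader(headers, "Content-Type").split(";")[0].trim().toLowerCase();
  if (type === "application/pdf") {
    return true;
  }
  if (type === "application/octet-stream") {
    // Generic binary data (assumed fallback): look for a .pdf file name
    // in Content-Disposition, then in the URL path.
    const disposition = getHeader(headers, "Content-Disposition");
    if (/filename\*?=[^;]*\.pdf\b/i.test(disposition)) {
      return true;
    }
    return /\.pdf$/i.test(new URL(url).pathname);
  }
  return false;
}

console.log(isPdfResponse("https://example.com/download?type=pdf",
  [{ name: "Content-Type", value: "application/pdf" }]));         // true
console.log(isPdfResponse("https://example.com/downloadsexceeded.html",
  [{ name: "Content-Type", value: "text/html; charset=UTF-8" }])); // false
```

Because this looks at Content-Type rather than a URL suffix, it would also catch links like the CiteSeerX one above, provided the server labels the response as application/pdf.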

I think we can improve on this.
How about using the content received from the first request instead of requesting it again?
This is just an idea.
(Sorry if this idea doesn't make sense. I don't fully understand the pdf.js/extension implementation yet.)
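
The idea of reusing the first response can be sketched as a small per-URL cache that lets repeated requests within a short window share one network fetch. This is a hypothetical illustration only: createSharedFetcher, the injected fetchBytes function, and the 10-second window are assumptions, and the sketch does not address how an extension could get hold of the browser's original response body.

```javascript
// Hypothetical sketch of "reuse the first response instead of requesting again":
// requests for the same URL within `ttlMs` share one in-flight fetch.
// `fetchBytes` is any function url => Promise<Uint8Array>; injecting it keeps
// the sketch independent of a particular network API.
function createSharedFetcher(fetchBytes, ttlMs = 10000) {
  const cache = new Map(); // url -> { promise, expires }

  return function get(url) {
    const now = Date.now();
    const entry = cache.get(url);
    if (entry && entry.expires > now) {
      return entry.promise; // reuse the earlier request instead of hitting the server
    }
    const promise = fetchBytes(url);
    cache.set(url, { promise, expires: now + ttlMs });
    // Evict failures so a later call can retry rather than reuse the error.
    promise.catch(() => {
      const current = cache.get(url);
      if (current && current.promise === promise) {
        cache.delete(url);
      }
    });
    return promise;
  };
}

// Usage with a fake fetcher that counts how often the "server" is hit.
let hits = 0;
const get = createSharedFetcher(async (url) => {
  hits++;
  return new TextEncoder().encode("%PDF-1.4 " + url);
});

Promise.all([get("https://example.com/a.pdf"), get("https://example.com/a.pdf")])
  .then(([first, second]) => {
    console.log(hits, first === second); // 1 true
  });
```

With this design, a server that rejects a second request in quick succession (as described above) would only ever see one request per URL within the window.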

WDYT? @timvandermeij @Rob--W

@simonhong
Would this work as a temporary fix for this problem?

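// Temporary workaround: send .pdf links straight to the PDF.js extension viewer.
var VIEWER_PREFIX = 'chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/';
document.querySelectorAll('a[href]').forEach(function(a){
    // Only rewrite http(s) links whose path ends in ".pdf" (case-insensitive, query string ignored).
    if (/^https?:$/.test(a.protocol) && /\.pdf$/i.test(a.pathname)){
        a.setAttribute('href', VIEWER_PREFIX + a.href);
    }
});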

@shge Good try. I think it would work for links that end with the .pdf suffix.
However, it's easy to find PDF links that don't have that suffix.

Did you try reloading the web page after this error was displayed? Does that solve the issue? This exception is thrown when the PDF data is loaded for the first time.

@864534182 Yes, but it does not work on some pages that require referrer information.

I came across a website that blocks the extension's second request because it is "a direct request".
It lets me download the file only when I reach it by clicking a link on a specific web page (the referrer has to be that page).
Either way, the file should be requested only once, with the referrer information.
https://github.com/brave/brave-browser/issues/3474#issuecomment-473666538

The referrer problem is a regression caused by a change in Chrome; see https://github.com/mozilla/pdf.js/issues/10645

I will post this on the brave-browser repository too.
Browser: Brave.
On this page:
https://projecteuclid.org/euclid.rmjm/1181072068
there is a button linking to a PDF file. When I click the button, it shows the already mentioned "Invalid or corrupted PDF file" message.

  • The "reloading" workaround does not work.
  • Even after reloading the page with Ctrl+R, the download button in the upper-right corner only lets me download an HTML file, not a PDF file.
  • When I tried to load this supposedly "direct" link to the PDF file:
    https://projecteuclid.org/download/pdf_1/euclid.rmjm/1181072068
    it redirected me back to the original link posted above. In other words, there is no real direct link to the PDF file.
  • Exactly the same problem occurs with the following link:
    https://projecteuclid.org/euclid.rmjm/1181069828

My issues with this appear to have been resolved by the most recent update, which I installed today.

Closing since this seems to work again.
