Brave-browser: some PDFs can't be opened and generate "Invalid or corrupted PDF file." errors

Created on 24 Feb 2019 · 21 Comments · Source: brave/brave-browser

Test plan

See https://github.com/brave/brave-core/pull/2342

Description

Cannot open PDF in Brave

Steps to Reproduce

  1. Navigate to http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf

Actual result:

PDF is corrupted

Expected result:

PDF should be displayed

Reproduces how often:

Easily

Brave version (brave://version info)

0.62.5 Chromium: 73.0.3683.39 (Official Build) dev (64-bit)

Reproducible on current release:

  • Does it reproduce on brave-browser dev/beta builds? Dev

Website problems only:

  • Does the issue resolve itself when disabling Brave Shields? No
  • Is the issue reproducible on the latest version of Chrome? No
closed/duplicate closed/invalid extension/PDFJS webcompat

All 21 comments

Reproduced the above issue when I attempted to open several banking statements under https://easyweb.td.com. The PDF opened in a new window and displayed the following (same error that @jumde mentioned above):

PDF.js v2.0.673 (build: 31012570)
Message: Invalid PDF structure

Seeing the following in the terminal:

[6473:775:0226/221130.910014:ERROR:CONSOLE(1)] "Active tab not found", source: chrome-extension://mnojpmjdmbbfmejpflffifhffcmidifd/js/background.bundle.js (1)
[6473:775:0226/221133.424143:ERROR:CONSOLE(0)] "Unchecked runtime.lastError: This extension has no action specified.", source: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/pdfHandler.html (0)
[6473:775:0226/221133.424197:ERROR:CONSOLE(0)] "Unchecked runtime.lastError: This extension has no action specified.", source: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/pdfHandler.html (0)
[6473:775:0226/221133.726921:ERROR:CONSOLE(862)] "Uncaught (in promise) Error: Invalid or corrupted PDF file.", source: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/content/web/viewer.js (862)

Example of the browser console:

[Screenshot of the browser console, 2019-02-26 10:27:44 PM]

Checked the following versions:

  • 0.60.45 Chromium: 72.0.3626.109 (release) --> reproduced
  • 0.61.38 Chromium: 73.0.3683.39 (beta) --> reproduced
  • 0.62.8 Chromium: 73.0.3683.39 (dev) --> reproduced

Changing this to P2 so it has the same priority as #884. We can decide whether to close this one or #884. We should probably fix this sooner rather than later, as most banking and government websites generate their PDFs on demand.

With the link above, the PDF pages displayed after I reloaded several times.
The images below show the captured response headers for both cases (success and failure).
The difference is 200 OK vs. 302 Found, so I think pdf.js sometimes fails to handle the redirect properly.
I also reproduced this issue with pdf.js on Chrome stable.
Success:
[Screenshot of response headers, 2019-03-11 09:41:34]
Failure:
[Screenshot of response headers, 2019-03-11 09:43:59]

I found more reliable repro steps.
Opening the PDF by clicking the link always shows the invalid-PDF error; it displays properly after a reload.
Opening it with Cmd + click is also always invalid; after a reload the PDF loads fine.
Pasting the link URL into another tab (not the tab that already shows the invalid PDF) is invalid as well, and fine after a reload.
My assumption is that pdf.js does not work properly with a PDF link that redirects.

I can confirm that refreshing the page with the error does load the PDF correctly in my application. This is a workaround, of course, but a good step forward in identifying the real issue so it can be resolved.

I want to add that I also receive this error (PDF.js v2.0.673 (build: 31012570)
Message: Invalid PDF structure) from my EHR site (https://www.therapynotes.com/app/) when opening a PDF file.

  • It is a site that I am logged into securely.
  • I am not able to refresh and eventually get it to load correctly.
  • I am also not able to copy the link to another tab and have it open.
  • This happens with Brave Shields Down or Up.
  • The Download button (top right of PDF Viewer Window) will still download the pdf file correctly.
  • When I hit the print button (top right) it gives this warning: Warning: The PDF is not fully loaded for printing.
  • Browser is up to date: Version 0.60.48 Chromium: 72.0.3626.121 (Official Build) (64-bit)

VERY interesting. I'd never thought to try that.

What makes the difference between loading
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf and
chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf?
The latter is the same URL with the pdfjs extension ID prepended, and the PDF displays correctly when it is loaded that way.

After some debugging, I found that http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf redirects to https://citeseerx.ist.psu.edu/messages/downloadsexceeded.html when it is loaded twice in quick succession.
That is why pdf.js fails: it tries to parse the https://citeseerx.ist.psu.edu/messages/downloadsexceeded.html file.
This will always happen because, after the first load, the pdf.js extension hooks the navigation and requests the document again with the extension URL prefixed. So it always gets the exceed.html content instead of the PDF.

When I load chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf, it succeeds because it requests the PDF only once.
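The behavior described above can be modeled with a small offline sketch. This is a toy simulation, not Brave or pdf.js code: `ThrottlingServer`, the one-second interval, and the helper names are all assumptions made for illustration, and the only fact relied on is that PDF files normally begin with the `%PDF-` marker.

```python
from typing import Optional

# Toy model only: every name below is hypothetical, not Brave or pdf.js code.
PDF_BYTES = b"%PDF-1.4\n...document body..."
EXCEEDED_HTML = b"<html><body>Download limit exceeded</body></html>"


class ThrottlingServer:
    """Serves the PDF, but answers a repeat request made too soon with an HTML page."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval  # assumed throttle window, in seconds
        self.last_request_at: Optional[float] = None

    def get(self, now: float) -> bytes:
        too_soon = (
            self.last_request_at is not None
            and now - self.last_request_at < self.min_interval
        )
        self.last_request_at = now
        return EXCEEDED_HTML if too_soon else PDF_BYTES


def looks_like_pdf(body: bytes) -> bool:
    # PDF files normally begin with the "%PDF-" marker.
    return body.startswith(b"%PDF-")


def open_via_link_click(server: ThrottlingServer) -> bool:
    server.get(now=0.0)         # request 1: the navigation itself
    body = server.get(now=0.1)  # request 2: the viewer fetching the URL again
    return looks_like_pdf(body)


def open_via_extension_url(server: ThrottlingServer) -> bool:
    body = server.get(now=0.0)  # only the viewer requests the URL
    return looks_like_pdf(body)
```

In this model, `open_via_link_click` hands the viewer the HTML page, while `open_via_extension_url` hands it the PDF, which matches the two outcomes reported in this comment.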

I can reproduce this in Chrome with its built-in viewer (PDFium). When I reload quickly, I see the exceed.html page in Chrome.

I'm trying to find the way to resolve this, but I'm not sure it can be fixed with pdf.js because pdf.js should request pdf url twice.

> I'm trying to find the way to resolve this, but I'm not sure it can be fixed with pdf.js because pdf.js should request pdf url twice.

Do you mean it should request it once? It seems that it ought to request it once and parse it, but the second request for the attempted workaround is triggering the exceed.html, which is apparently normal behavior for that server, probably as a DDOS attack mitigation. The server self-protection behavior is stopping the workaround from working, but the root problem is still the same as what I and others have been experiencing.

Something else, if it can help.
A comparison of sites where I am logged in. Two that work and one that doesn't.

All of these begin with the standard: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/

Then two sites that work:

The site that doesn't work:

Is there something related to the site that doesn't work not specifically defining token=
Or that the one that doesn't work ends in a hash / ?

@SilverPuppy Brave sends two requests to that server when the user clicks a PDF link. The first is issued when the user clicks the link.
The second is issued by the pdfjs extension. This is unavoidable with the current pdfjs extension implementation. Because the two requests arrive within a very short interval, I think the server responded with exceed.html.

I assume that for any server that handles PDF requests this way, the current pdfjs extension will not work with that PDF link.

It would be nice if the pdfjs extension displayed the contents from the first request.
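As a rough illustration of that suggestion, the sketch below contrasts a viewer that fetches the URL a second time with one that parses the bytes already received. It is a minimal model under stated assumptions, not the pdfjs extension's actual code: `make_one_shot_fetch`, `parse_pdf`, and both `view_*` functions are hypothetical names, and the "server" is a stand-in that serves the PDF only once.

```python
from typing import Callable

PDF_BYTES = b"%PDF-1.4 example"
BLOCK_PAGE = b"<html>too many requests</html>"


def make_one_shot_fetch() -> Callable[[], bytes]:
    """Hypothetical server stand-in: only the first request gets the PDF."""
    calls = 0

    def fetch() -> bytes:
        nonlocal calls
        calls += 1
        return PDF_BYTES if calls == 1 else BLOCK_PAGE

    return fetch


def parse_pdf(body: bytes) -> str:
    # Stand-in for the viewer's parser: reject anything without the PDF marker.
    if not body.startswith(b"%PDF-"):
        raise ValueError("Invalid or corrupted PDF file.")
    return "rendered"


def view_with_refetch(fetch: Callable[[], bytes]) -> str:
    fetch()                    # first request: the user's navigation; body is dropped
    return parse_pdf(fetch())  # second request: issued by the viewer


def view_reusing_first_response(fetch: Callable[[], bytes]) -> str:
    body = fetch()             # single request
    return parse_pdf(body)     # viewer parses the bytes already received
```

With this stand-in server, `view_with_refetch` raises the parse error while `view_reusing_first_response` renders, which is the difference the comment is asking for.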

@tripp-lc Thanks for checking! If it works with the chrome-extension:// prefixed URL, that server might also be denying the extension's request (the second request; the first one is the user's click).

> Is there something related to the site that doesn't work not specifically defining token=
> Or that the one that doesn't work ends in a hash / ?

I don't think so.

I'll continue the discussion of this issue here: https://github.com/mozilla/pdf.js/issues/10639.
I think it's hard to fix this issue on the Brave side.
cc: @bbondy

> probably as a DDOS attack mitigation

> Because of this two requests in a very short interval, I think that server responded with exceed.html.

I came across a website that blocks the plugin's second request because it is "a direct request".
It lets me download the file only when I reach it by clicking a link on a specific webpage (the referer has to be that specific page).

Anyway, it should make a single request that carries the referer information.
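The referer case can be sketched the same way. The server rule below is an assumption made for illustration (the comment does not name the site or its exact check); `serve_pdf`, `ALLOWED_REFERER`, the example URL, and the 403 status are all hypothetical.

```python
from typing import Mapping, Tuple

# Hypothetical page that is allowed to link to the PDF.
ALLOWED_REFERER = "https://example.com/documents"


def serve_pdf(headers: Mapping[str, str]) -> Tuple[int, bytes]:
    """Assumed server rule: serve the PDF only when linked from the allowed page."""
    if headers.get("Referer") != ALLOWED_REFERER:
        return 403, b"<html>direct requests are not allowed</html>"
    return 200, b"%PDF-1.4 example"


# The user's click carries the Referer of the page that holds the link.
click_status, click_body = serve_pdf({"Referer": ALLOWED_REFERER})

# A follow-up request made without that header is treated as a direct request.
refetch_status, refetch_body = serve_pdf({})
```

Here the click receives PDF bytes and the header-less follow-up receives an HTML page, so a viewer that parses the second response would report an invalid PDF.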

I am experiencing a variant of this issue, with the same symptoms. In my case though, it's a result of the requests for the pdf file not including the cookies, resulting in a redirect to a login page.

I commented on this issue describing the same problem, which was closed a few months ago: https://github.com/brave/brave-browser/issues/2048

Version 0.61.51 Chromium: 73.0.3683.75 on macos

Edit: It works fine when downloading the PDF and loading it from a file:// url.

I want to confirm that I just updated my browser to:

Version 0.61.51 Chromium: 73.0.3683.75 (Official Build) (64-bit)

and the PDFs generated through my EHR site now render correctly in a new tab.

Thanks for your hard work.

> Edit: It works fine when downloading the PDF and loading it from a file:// url.

@gustavnikolaj Loading a local file works fine even though it also takes two requests, for the same reason as above. The local file system always returns the file whenever it is requested :)

Unfortunately, whatever changed is not a full fix, because Google Calendar's PDFs for printing are still problematic requiring the refresh workaround. I am also on .61.51
