Notebook: Hyperlink in markdown cell to pdf document stopped working

Created on 31 May 2018 · 11 Comments · Source: jupyter/notebook

I have a large number of Jupyter Notebooks, and many of them contain hyperlinks to locally stored PDF documents. Today the links stopped working on my iMac. Clicking a link opens a new tab with the correct address, but the page is just black. The same notebook works fine on my MacBook, and I had no problems until yesterday. I have tried a number of things to resolve this. I normally work in Google Chrome, but switching to Safari gave the same problem. Opening the PDF from Finder in either Chrome or Safari works fine, so it looks like a Jupyter Notebook issue. When I click the hyperlink in the notebook, I get the following entry in the log file:
[I 21:56:01.222 NotebookApp] 302 GET /notebooks/Cookbooks/Git%20%26%20GitHub/books/Pro_Git.pdf (::1) 1.01ms

I get the same entry on the MacBook, where it works fine.

A screenshot of the page after trying to load the PDF is attached (screen shot 2018-05-31 at 10 03 17 pm).

All 11 comments

Any messages in the browser's Javascript console?

I found this in the Javascript console:
Failed to load 'http://localhost:8888/files/Cookbooks/Git%20%26%20GitHub/books/Pro_Git.pdf' as a plugin, because the frame into which the plugin is loading is sandboxed.

This must be the cause of the problem. I have no idea how to address this. Can you help?

Same issue! No idea what's happening... I tried launching another simple HTTP server and PDF links worked just fine there, so it shouldn't be a browser issue. The PDF.js extension (Firefox) works fine, though.

jupyter-troubleshoot attached:
jupyter-troubleshoot.log

@takluyver I have zero experience in web development, but after some googling, I believe it's some kind of cross-origin request issue... This PR seems to be related: https://github.com/jupyter/notebook/pull/3341
@kdeleeuw11 Have you found any solution to this? PDF documents really matter to me too.

I got it to work in Google Chrome by installing the PDF Viewer extension. I am not very technical and I have no idea why it initially stopped working in Google Chrome and Safari. But at least I have it working again. Google Chrome is my default browser.

@takluyver Now I'm confident that this issue is indeed caused by https://github.com/jupyter/notebook/pull/3341. After manually removing the lines added by that PR from my conda installation ([...]/anaconda3/lib/python3.7/site-packages/notebook), my PDF links work perfectly again.

FYI, these are the lines I removed:

Subject: [PATCH] UN-patch #3341

---
 base/handlers.py  | 7 -------
 files/handlers.py | 7 -------
 2 files changed, 14 deletions(-)

diff --git a/base/handlers.py b/base/handlers.py
index e3fbddc..72677c9 100644
--- a/base/handlers.py
+++ b/base/handlers.py
@@ -640,13 +640,6 @@ class Template404(IPythonHandler):
 class AuthenticatedFileHandler(IPythonHandler, web.StaticFileHandler):
     """static files should only be accessible when logged in"""

-    @property
-    def content_security_policy(self):
-        # In case we're serving HTML/SVG, confine any Javascript to a unique
-        # origin so it can't interact with the notebook server.
-        return super(AuthenticatedFileHandler, self).content_security_policy + \
-                "; sandbox allow-scripts"
-
     @web.authenticated
     def get(self, path):
         if os.path.splitext(path)[1] == '.ipynb' or self.get_argument("download", False):
diff --git a/files/handlers.py b/files/handlers.py
index 7973fd6..b942149 100644
--- a/files/handlers.py
+++ b/files/handlers.py
@@ -26,13 +26,6 @@ class FilesHandler(IPythonHandler):
     a subclass of StaticFileHandler.
     """

-    @property
-    def content_security_policy(self):
-        # In case we're serving HTML/SVG, confine any Javascript to a unique
-        # origin so it can't interact with the notebook server.
-        return super(FilesHandler, self).content_security_policy + \
-               "; sandbox allow-scripts"
-
     @web.authenticated
     def head(self, path):
         self.get(path, include_body=False)
-- 
2.18.0

This works correctly for me in Firefox, but fails in Chromium with the error Failed to load 'http://localhost:8889/(...).pdf' as a plugin, because the frame into which the plugin is loading is sandboxed.

It is sandboxed, and quite deliberately so. And you're right that #3341 is where the sandboxing was introduced. This is a security measure, so we can't just disable it again. If you're interested, I'd suggest someone research what relaxations of the sandbox would be needed to let Chrome display a PDF.

CSP sandboxing docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/sandbox
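To see whether a given response is affected, it can help to look at the `Content-Security-Policy` header the server sends for the PDF URL and check for a `sandbox` directive. The following is a minimal sketch of such a check in plain Python. The helper names `parse_csp` and `sandbox_tokens` and the example header value are assumptions made for illustration; only the `sandbox allow-scripts` part comes from the patch above.

```python
from typing import Dict, List, Optional


def parse_csp(header: str) -> Dict[str, List[str]]:
    """Split a Content-Security-Policy header value into directives.

    Returns a mapping of directive name -> list of its tokens.
    If a directive name appears twice, the first occurrence is kept.
    """
    directives: Dict[str, List[str]] = {}
    for chunk in header.split(";"):
        parts = chunk.split()
        if not parts:
            continue  # skip empty chunks such as a trailing ';'
        directives.setdefault(parts[0].lower(), parts[1:])
    return directives


def sandbox_tokens(header: str) -> Optional[List[str]]:
    """Return the tokens of the sandbox directive, or None if there is none."""
    return parse_csp(header).get("sandbox")


# Hypothetical header value: the base policy is made up for this example,
# the "sandbox allow-scripts" suffix is the one shown in the patch.
example = "frame-ancestors 'self'; sandbox allow-scripts"
print(sandbox_tokens(example))
```

If the result is a list rather than `None`, the response is sandboxed, and the tokens in the list are the only relaxations granted to the page.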

I think this is also the case when trying to display a PDF inline in a notebook a la

from IPython.display import IFrame
IFrame("foo.pdf", width=900, height=800)

Could be nice if this worked again even in Chrome.

@takluyver I suppose that as a security measure this is somehow meaningful, but since we already allow the kernel (e.g. python) to do anything to the filesystem, isn't it sort of pointless to have this kind of sandboxing? :stuck_out_tongue_winking_eye:

I do hope this bug can be resolved soon. Sometimes the PDF.js extension feels too clumsy for me... Unfortunately I don't have the necessary expertise to contribute, but I was able to (_kind of?_) circumvent this by reading the PDF as binary from the Python kernel, then embedding it with a server-side PDF.js engine. That is even clumsier, but at least I don't have to ask every one of my collaborators to install a PDF.js extension. :wink:
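As a rough illustration of the first half of that workaround (reading the PDF as binary in the kernel and handing the bytes to the front end), here is a minimal sketch that base64-encodes a PDF into a `data:` URI and wraps it in an iframe. This is not the PDF.js setup described above. The function names are hypothetical, and whether a particular browser will actually render a PDF from a `data:` URI in this context is an assumption that would need testing.

```python
import base64
from pathlib import Path


def pdf_data_uri(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a base64 data: URI."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded


def pdf_embed_html(path: str, width: int = 900, height: int = 800) -> str:
    """Read a PDF from disk (in the kernel) and build an iframe that embeds it.

    The PDF content travels inside the notebook output itself, so no
    separate request for the file is made to the notebook server.
    """
    uri = pdf_data_uri(Path(path).read_bytes())
    return f'<iframe src="{uri}" width="{width}" height="{height}"></iframe>'
```

In a notebook cell, the resulting string could be shown with `IPython.display.HTML(pdf_embed_html("foo.pdf"))`. Note that large PDFs will inflate the size of the saved notebook, since the encoded bytes end up in the cell output.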

@matanster If you really want PDF in your ipynb, you can try something like this. :joy:

(previous post was wrong and was deleted)

Relevant Chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=413851
Note that it's currently WontFix.

It boils down to "there's nothing in the standard to allow plugins to operate in sandbox; there's no allow-plugins rule".

It seems like Chrome and Firefox take different approaches to handling this. Chrome just straight up disallows it.

since we already allow the kernel (e.g. python) to do anything to the filesystem, isn't it sort of pointless to have this kind of sandboxing? :stuck_out_tongue_winking_eye:

The model we've got is that code you deliberately run can do anything (within the context of where the kernel runs), but opening a file should never be able to execute arbitrary code on your system. People don't expect that opening a document (whether that's a notebook, an HTML page, or a PDF) can start running code outside a sandbox. See also: Word macro viruses.

The technical implication of this is that any pages served by the notebook server where we don't entirely control the content must either be sandboxed (so they can't talk to kernels) or sanitised (so they can't run Javascript).

We sanitise untrusted notebooks, because the notebook page has to be able to talk to the kernel. But sanitisation is tricky, edge cases can be missed (we had a CVE because of an interaction between our sanitisation engine and jQuery), and it breaks a lot of rich content. So we sandbox when serving (non-notebook) files - they can run Javascript, but the browser's cross-origin security mechanisms stop them talking to kernels.
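The sandboxing half of that model can be sketched in a few lines. The suffix below is the one from the patch earlier in the thread. The base policy string and the helper names are hypothetical. The key point is that a `sandbox` directive without the `allow-same-origin` token makes the browser treat the page as having a unique origin, which is why its scripts can run but cannot reach the server's kernel endpoints as a same-origin page could.

```python
# Suffix appended by the handlers shown in the patch above.
SANDBOX_SUFFIX = "; sandbox allow-scripts"


def policy_for_served_file(base_policy: str) -> str:
    """Build the CSP for a served (non-notebook) file: base policy plus sandbox."""
    return base_policy + SANDBOX_SUFFIX


def keeps_server_origin(policy: str) -> bool:
    """Return True if a page served with this policy keeps the server's origin.

    A sandbox directive lacking 'allow-same-origin' places the page in a
    unique origin; a policy with no sandbox directive leaves the origin alone.
    """
    for directive in policy.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "sandbox":
            return "allow-same-origin" in parts[1:]
    return True


base = "frame-ancestors 'self'"  # hypothetical base policy for illustration
print(keeps_server_origin(base))                          # True
print(keeps_server_origin(policy_for_served_file(base)))  # False
```

Adding `allow-same-origin` alongside `allow-scripts` would defeat the purpose, because the served page's scripts would then share the server's origin again.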
