Jupyter-book: Export notebooks as PDF with no page breaks (using HTML)

Created on 23 Aug 2020 · 7 Comments · Source: executablebooks/jupyter-book

Is your feature request related to a problem? Please describe.

Currently Jupyter Book's Print to PDF functionality relies on ~nbconvert to export to a pdf intended for _actual printing_~ UPDATE: See below, nbconvert isn't used here. Of course, most people do not actually want to print notebooks; they want to read or refer to them offline or on mobile devices.

@betatim has just created an extension that exports a notebook to PDF without LaTeX, and with the minimum amount of page-breaks.

Describe the solution you'd like

The nice thing is that the extension can also be used with nbconvert directly:

jupyter-nbconvert --to PDFviaHTML example.ipynb

So it's very possible that this can already be configured to work today (but I just don't know how).
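For a whole folder of notebooks, the command above could be wrapped in a small script. This is only a sketch: the helper names `pdf_via_html_command` and `convert_notebooks` and the `book_dir` argument are made up for illustration, and it assumes the notebook-as-pdf extension is installed so that `--to PDFviaHTML` is available.

```python
import subprocess
from pathlib import Path


def pdf_via_html_command(notebook):
    """Build the nbconvert command quoted above for a single notebook."""
    return ["jupyter-nbconvert", "--to", "PDFviaHTML", str(notebook)]


def convert_notebooks(book_dir, dry_run=True):
    """Build (and optionally run) one command per notebook under book_dir.

    book_dir is a hypothetical folder holding the book's .ipynb files.
    With dry_run=True nothing is executed; the commands are only returned.
    """
    commands = [
        pdf_via_html_command(path)
        for path in sorted(Path(book_dir).rglob("*.ipynb"))
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" not in path.parts
    ]
    if not dry_run:
        for command in commands:
            # Fails loudly if nbconvert or the PDFviaHTML exporter is missing.
            subprocess.run(command, check=True)
    return commands
```

Calling `convert_notebooks("my-book")` returns the commands without running them, which makes it easy to check what would be converted before passing `dry_run=False`.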

Describe alternatives you've considered

  • Using CSS to export to PDF (issue #761)
  • Also briefly looked at @phaustin and co's work in this issue

Additional context

I have tried it on a couple of notebooks and it works quite well. The only thing that is currently missing is the MyST-related directives image exports.

Below are two PDF exports from an example jupyter book that is not mine (ThreatHunter Playbook):

  1. Standard Jupyter Book PDF export

  2. Export using the notebook-as-pdf extension

Currently the extension only puts H1 level headers in the auto-generated TOC, but lower level headings may be coming soon.

enhancement

All 7 comments

> relies on nbconvert to export to a pdf intended for actual printing

Thanks @firasm!
Just to note, we don't actually use nbconvert for any conversions, since it is too limiting for our use case (i.e. it would not be possible to implement the MyST directive features).

That is not to say it may not be of use, either directly or by learning from its implementation.

Looking at the code though: https://github.com/betatim/notebook-as-pdf/blob/master/notebook_as_pdf/__init__.py, it seems mainly to just launch the HTML in a browser and print the page (with a few touch-ups), which doesn't seem too different from what we are doing now, unless I am missing something?

Also CC @mmcky and @AakashGfude, who are working on PDF generation as we speak and, particularly for this use case, single page PDFs 😄

Ahh - my mistake then. I didn't realize nbconvert wasn't being used here! Okay that likely makes using the extension not viable then...

The single most important thing I'm after is "no page-breaks" in the exported PDF. I'll see if I can implement this part of Tim's code using CSS:

    await page.pdf(
        {
            "path": pdf_file,
            "width": width,
            # Adobe can not display pages longer than 200inches. So we limit
            # ourselves to that and start a new page if needed.
            "height": min(height, 200 * 72),
            "printBackground": True,
            "margin": page_margins,
        }
    )
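
The sizing logic in that call can be pulled out into a plain function, which makes the "no page-breaks" behaviour easier to see. The following is a sketch rather than Tim's actual code: `single_page_pdf_options` is a made-up name, `width` and `height` are assumed to be the measured size of the rendered page in the same units as the snippet, and the page count is my own addition for illustration.

```python
import math

# Cap taken from the snippet above: Adobe cannot display pages longer
# than 200 inches, so a taller document has to spill onto more pages.
MAX_PAGE_HEIGHT = 200 * 72


def single_page_pdf_options(pdf_file, width, height, page_margins=None):
    """Return (options, pages) for a PDF whose page is as tall as the content.

    options mirrors the dictionary passed to page.pdf() in the snippet.
    pages is an estimate (an assumption of this sketch) of how many pages
    the content needs once the height cap is applied; it ignores margins.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    page_height = min(height, MAX_PAGE_HEIGHT)
    options = {
        "path": pdf_file,
        "width": width,
        "height": page_height,
        "printBackground": True,
        "margin": page_margins if page_margins is not None else {},
    }
    pages = math.ceil(height / page_height)
    return options, pages
```

The returned dictionary would then be handed to the browser page object, as in `await page.pdf(options)`. Content shorter than the cap ends up on exactly one page, so the only page breaks left are the ones forced by the 200 inch limit.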

A nice-to-have (but not a requirement) is bookmarks/table of contents as part of the PDF rather than taking up 1/5 of the page width on the first page. I have figured out how to remove the TOC sidebar completely using CSS, so this isn't essential.

> The single most important thing I'm after is "no page-breaks" in the exported PDF

👍
@mmcky and @AakashGfude, make it happen lol!

We are currently working on building pdf via LaTeX at the document level in this PR which is a different pathway than the one described here. In past experience on other projects -- if you can get the LaTeX route supported it produces nice pdf files. We are currently working on making the link to PDF via LaTeX available first -- and then will work on robustness.

@AakashGfude can you add to your to-do list to look at this suggestion as an alternative pathway for building PDF files via the browser? There is also this LaTeX.css styling that makes sites look like LaTeX documents, which may be promising as well.

How about we re-name this issue as something like "make PDF through HTML output no page breaks". That seems more specific and would be useful!

I think the fix would be somewhere around here:

https://github.com/executablebooks/jupyter-book/blob/master/jupyter_book/pdf.py#L42

Note that we're (I believe) using the same library as @betatim so perhaps we can just copy his solution straightaway?

It is open-source so as long as you follow the license you can copy the code :D but you can also use notebook-as-pdf as a library if coding by copy&paste isn't your thing.

I'd be happy to help on a PR that adds a nice public API to notebook-as-pdf so that there is a single function to call that does all the work.

Also -- as far as PDF with page breaks goes: we are definitely interested in implementing something like the pagedjs functionality of our sphinx extension for jupyter-book.
