Nbconvert: NBConvert 6.0 Planning

Created on 9 Jun 2019  Â·  62Comments  Â·  Source: jupyter/nbconvert

In light of several conversations around larger changes, I wanted to start the thread on 6.0 targets and outline / discuss what changes should or shouldn't be in the next major release. We should make issues for each of the following that we want addressed. Any of the topics below can be questioned and challenged or pushed until later releases. Champions for each one would be appreciated to help move key subjects forward. I'll keep this top ticket updated with links and additional requested items from the thread below.

  • [x] Move execute preprocessor code into separate thin project
    @mpacer first brought up the idea, I think it would greatly help the separation of concerns and let nbconvert focus on conversion rather than execution. Other projects (i.g. papermill) that just want execution ability could import this new module rather than the much larger dependency chain of nbconvert.
  • [x] Python 3 only
    Remove, or at a minimum not support, python 2 moving forward. Seems straight forward enough for the major version upgrade.
  • [x] Directory based template loading, following viola and other jupyter project patterns.
    First part for adding additional paths will be in #1028 . It's intended to make templates a directory with resources and multiple asset files rather than a single template file with the jump to 6.0.
  • [ ] Eliminating the majority of usability concerns with latex, pdf, and html conversion.
    This is a tall order, and we've been trying to get improvements made for the major latex / pdf issues, but there's a lot of these and I'm not sure we'll have them closed for the last 5.x release. Will probably require some significant improvements to how templates are loaded, executed, and tested to get improvements to more production quality levels.
    ~- [ ] Deprecate / remove unnecessary or complicated configuration options
    We have a lot of configuration options, some of them are old or could be unnecessary now. I'd like to prune options and figure out how templating improvements could help where flags were used as crutches in the past. Not sure what the full scope could be here for this.~
    ~- [ ] Consider lightening or removing traitlets
    Not going to be a popular topic, but traitlets causes a lot of support burden and debugging issues. It's also been cited as the part that's kept many strong developers I know from contributing because it's complicated and difficult to understand how it operates without a lot of digging. I don't expect this topic to make it in for 6.0 but I thought I might at least spark a conversation on the topic to see how we could make the repo more contributor friendly.~

There's probably some more I'm not thinking of atm. I'll add items from the thread that seem relevant.

Most helpful comment

Nbconvert 6.0 is out! :tada:

The conda-forge PR should follow shortly.

Closing this issue!

All 62 comments

Move execute preprocessor code into separate thin project

I agree. It's weird that a conversion utility deals with execution, but we should keep the --execute flag in some fashion even if deprecated. We could reference the new package. It would be nice to have them developed independently.

Python 3 only

Yes please. Note: Python 2 notebooks will still be able to be converted. Nbconvert itself will just need python 3 to run.

Eliminating the majority of usability concerns with latex, pdf, and html conversion.

This will have a more clear path forward after I do issue triaging.

Deprecate / remove unnecessary or complicated configuration options

I'd be willing to help audit some of the configuration options. We could definitely do with some pruning. I'd also like to re-write some of the docs to make the python API and configuration options more clear. This would also make the repo more contributor friendly.

Consider lightening or removing traitlets

As an example of bugs: #774. That said. Traitlets and python based config files really do allow for a lot awesome config. Refactoring and lightening would be appreciated. I feel like a big problem with traitlets is the documentation. I had to look through the code to find traitlets.config.loaders.PyFileConfigLoader in order to load the nbconvert config file whilst using the python API recently since I wasn't using NbConvertApp.

Should we do 5.6 release for some of the fixes that aren't API breaking? It would also be nice to have milestones for "must be in >6.0" and "could be released before 6.0". It would help with triaging issues.

Note on traitlets. It actually helped me get into contributing to nbconvert when I was making #889 once I realised that all I had to do was add 1 line to make something easily configurable. I really think the limitation is in documentation more than anything.

Don't get me wrong that the ability to configure is really nice -- it's the debugging it when it goes wrong (which I just spent a good 5 hours doing so last week) is where I find myself questioning if we need something so complicated.

For the tagging I say we add anything that should be > 6.0 as Milestone 6.0 for now. Anything not milestoned could be before or after.

Can we consider making notebook the default export format for nbconvert in 6.0? This allows

jupyter nbconvert notebook.ipynb --SomePreprocessor.enabled=True

to be the command for notebook -> notebook operations. This way if people just want to apply preprocessor operations without changing to a different format it doesn't need an explicit --to notebook. This seems like a less arbitrary default than html.

Can we consider making notebook the default export format for nbconvert in 6.0?

I'm +1 for this

Python 3 only

:+1:

Consider lightening or removing traitlets

I certainly think traitlets are an attractive nuisance. They're a whole pile of complexity which is, I believe, largely unnecessary for this kind of application (I believe traits were originally designed to support making GUIs quickly). And they tie your config to class names, making refactoring a pain.

The one thing in their favour is that it's really easy to add a new config option. But, I'd argue, it's kind of too easy. You wind up making everything configurable because why wouldn't you add config=True, but then two config options don't play well together, or some future change becomes 5x more complex because you have to accommodate a config option just in case someone is relying on it.

OK, enough ranting.

Can we consider making notebook the default export format for nbconvert in 6.0?

Maybe with a warning, in particular if there are no preprocessors configured? I suspect there are a lot of people who are used to not needing --to html to get HTML output.

If I can suggest one further idea: drop postprocessors as a general category? There's only one (serve), and that only really makes sense for one specific format (slides). Maybe there should be a separate little tool that converts a notebook to reveal slides and serves them?

Move execute preprocessor code into separate thin project

I think this could make sense for several of the generic preprocessors which manipulate the notebook data, e.g. removing cells, clearing metadata. Like with execution, there's no real reason that they need to depend on the conversion machinery.

Maybe they should live together in a package called something like nbdo, to avoid creating N separate tiny packages to do a single function each. Then again, @kynan's nbstripout is already a standalone thing - though that also includes functionality for integrating it with git.

If I can suggest one further idea: drop postprocessors as a general category? There's only one (serve), and that only really makes sense for one specific format (slides). Maybe there should be a separate little tool that converts a notebook to reveal slides and serves them?

I'd be down for dropping the postprocessors and just changing preprocessors to processors. I found the naming and convention therein super confusing when I first joined and I believe the confusion is partially keeping people from contributing more by not understanding the model. Let's make a new issue for this idea and copy the notes from here.

I think this could make sense for several of the generic preprocessors which manipulate the notebook data, e.g. removing cells, clearing metadata. Like with execution, there's no real reason that they need to depend on the conversion machinery.

We were thinking the execute preprocessor should have it's own isolated package because papermill, nbconvert, and possibly a future async client library would all make use of just that dependency. Currently papermill pulls a lot of extra dependencies it doesn't need to get nbconvert installed. Would other preprocessors you've read have other direct importers that wouldn't want the rest of nbconvert? I'd be inclined to leave processors that are only used by nbconvert in the library, as we already have a lot of problems with users having mismatched dependency versions that don't work together. Would like to iterate on the ideas more here.

Would other preprocessors you've read have other direct importers that wouldn't want the rest of nbconvert?

nbstripout is doing something a lot like the clearoutput and clearmetadata preprocessors, although they're trivial enough that it reimplements the functionality. I think it's easy to imagine preprocessors like regexremove and tagremove being used to modify notebooks without converting them, e.g. #1132.

This is really going one step further with @t-makaro's idea to make notebook the default output format. Any time you want to nbconvert --to notebook, you're only using the preprocessors, not the conversion machinery. And it's the conversion machinery that has all the dependencies.

Brainstorming: would it make sense to specify a kind of in-memory pipeline, something like this:

nbdo clearoutput -- removecells --tag debug -- execute --timeout 60 -- convert --to html

I think chaining processors would be a good move. It would make the ordering of conversions clear and allow for one call to do all conversions. It also makes pre vs post processor not matter anymore.

I would be in favor of tagging alpha releases before we check all the boxes here.

As soon as we have it, we will start targetting the alpha in a development branch of Voilà and other places where we consume the new template system.

I can look this weekend at any remaining python 3-only items and kick off an alpha release to that effort. @takluyver anything you're aware of that we should close before doing so?

I haven't been following closely, but I think you can make alpha releases whenever you think it's useful. Just be aware that anyone running pip install --pre might pick it up - they don't need to specify the precise version.

@SylvainCorlay pre-release 6.0.0a0 is now on PyPI. I tested that it's only getting picked up by pip install --pre nbconvert. Normal nbconvert installs will still pick up 5.6.1. I also made sure python 2 isn't picking up the new pre-release.

Unfortunately notebook tests have picked up the pre-release, and are failing with an error about not finding templates:

Traceback (most recent call last):
  File "/home/travis/build/jupyter/notebook/notebook/nbconvert/handlers.py", line 132, in get
    resources=resource_dict
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/html.py", line 99, in from_notebook_node
    self.register_filter('highlight_code', highlight_code)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 418, in register_filter
    return self._register_filter(self.environment, name, jinja_filter)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 154, in environment
    self._environment_cached = self._create_environment()
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 436, in _create_environment
    paths = self.get_template_paths()
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 465, in get_template_paths
    template_names = self.get_template_names()
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 512, in get_template_names
    raise ValueError('No template sub-directory with name %r found in the following paths:\n\t%s' % (template_name, paths))
ValueError: No template sub-directory with name 'classic' found in the following paths:
    /tmp/tmpjcjzbt1f/data
    /tmp/tmpjcjzbt1f/env/share/jupyter
    /tmp/tmpjcjzbt1f/share/jupyter

https://travis-ci.org/jupyter/notebook/jobs/616385738

I'm not sure that's unfortunate or not. Is the build purposefully pulling
any (not released) dependency versions?

It does tells us where we need to prep fixes for 6.0 branch. I am guessing
that code needs to be 5.x/6.0 compatible.

On Sun, 24 Nov 2019, 1:58 pm Thomas Kluyver, notifications@github.com
wrote:

Unfortunately notebook tests have picked up the pre-release, and are
failing with an error about not finding templates:

Traceback (most recent call last):
File "/home/travis/build/jupyter/notebook/notebook/nbconvert/handlers.py", line 132, in get
resources=resource_dict
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/html.py", line 99, in from_notebook_node
self.register_filter('highlight_code', highlight_code)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 418, in register_filter
return self._register_filter(self.environment, name, jinja_filter)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 154, in environment
self._environment_cached = self._create_environment()
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 436, in _create_environment
paths = self.get_template_paths()
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 465, in get_template_paths
template_names = self.get_template_names()
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 512, in get_template_names
raise ValueError('No template sub-directory with name %r found in the following paths:\n\t%s' % (template_name, paths))
ValueError: No template sub-directory with name 'classic' found in the following paths:
/tmp/tmpjcjzbt1f/data
/tmp/tmpjcjzbt1f/env/share/jupyter
/tmp/tmpjcjzbt1f/share/jupyter

https://travis-ci.org/jupyter/notebook/jobs/616385738

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/jupyter/nbconvert/issues/1045?email_source=notifications&email_token=AAOHBVA5EQQ77VDU7LPECATQVL2H5A5CNFSM4HWOEWPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFAV5KQ#issuecomment-557932202,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAOHBVFKAQKVLLAJDE5LLTTQVL2H5ANCNFSM4HWOEWPA
.

It deliberately uses pip install --pre, presumably to get advance warning of issues with dependencies. So as you say, it's fortunate that it could get picked up quickly.

I'm guessing the issue is that the template files are now found using the 'jupyter path' search path, but the notebook tests monkeypatch this to isolate themselves from the user's local setup.

Certainly the notebook code and test suite needs to remain compatible with nbconvert 5.x at least until nbconvert 6 is released (and really for a while afterwards too).

The notebook issue only seemed to be test run issue: https://github.com/jupyter/notebook/pull/5092 https://github.com/jupyter/notebook/issues/5086.
So far it seems solid.

I'm now working on voila and many of the templates, so far so good.

FYI, I released 6.0.0a1 today which has everything in master as the nbclient library usage is now merged. We should make an effort to burn through the remaining 6.0 milestone items (or remove things we aren't going to get to) and get a real release sometime soon.

One note, in case you missed it, we're now python 3.6+ as we had to drop 3.5 due to issues with async in upstream libraries.

This is great! We are now using nbconvert 6.0a1 in Voilà master, with great results?

Are there any blocker for anyone to move forwards towards a 6.0 release?

Also, after the release is tagged, it may be interesting to work together to draft a blog post for the Jupyter blog listing the main features of the release (nbclient, async features, new templates, dropping Python 2 and much more)!

Are there any blocker for anyone to move forwards towards a 6.0 release?

I think the only remaining items are to cleanup some usability issues (e.g. document new template directory patterns), squash the bugs labeled with 6.0 milestone, and do an overhaul on the latex/pdf conversion so we have fewer issues with attachments and character mappings. I was planning to tackle this last one soon as I want 6.0 truely released asap. But my time has been limited recently to get that last push. If folks could look through open issues in the milestone and help knock them out it'd help a lot here to getting a full release finalized.

Also, after the release is tagged, it may be interesting to work together to draft a blog post for the Jupyter blog listing the main features of the release (nbclient, async features, new templates, dropping Python 2 and much more)!

This sounds like an excellent idea. I'd be down to help.

If folks could look through open issues in the milestone and help knock them out it'd help a lot here to getting a full release finalized.

Ok I will see if we can put some cycles at QuantStack sort some of these out!

Maybe they should live together in a package called something like nbdo, to avoid creating N separate tiny packages to do a single function each. Then again, @kynan's nbstripout is already a standalone thing - though that also includes functionality for integrating it with git.

I'd be open to

  1. "donate" nbstripout to Jupyter i.e. move it into the jupyer org
  2. merge it into a larger package if that makes sense

Do you think there's a need to a more generic (base) package to facilitate integration with git?

I'd be open to

  1. "donate" nbstripout to Jupyter i.e. move it into the jupyer org
  2. merge it into a larger package if that makes sense

Do you think there's a need to a more generic (base) package to facilitate integration with git?

@kynan That's very gracious of you. I think it'd be fine as a stand-alone import at first. From reading the code in the repo there's sort of two functions present, utils.py strip_output and the nbstripout git install function. If those functions were to merge into jupyter I imagine that strip_output would belong in nbformat and the git integration would eventually combine with nbdime in a more unifed jupyter git package. For the short term though we could make it a dependency and promote into the import path of nbformat for strip functionality on notebook classes and address the shared git concerns separately. This would make the functionality native to current patterns in nbconvert for the concerns it has around cleaning notebook documents.

@SylvainCorlay I'll start working on the last major bullet this weekend (though I might not have a PR for that for a week or two).

@kynan what are your thoughts on us incorporating nbstripout into nbconvert/nbformat given the earlier response? I'd be happy to have it used in any form rather than re-implementing.

@t-makaro do you think you'd have some cycles in the next couple weeks to go through open 6.0 issues that you might be able to aid with or triage for us as 6.0 to later?

@t-makaro @SylvainCorlay I'd like to move 6.0 to the finish goal soon. I've started solving or closing the open 6.0 milestone issues. I think what's left is relatively mangeable. What's left in your minds of things that need addressing that we might have missed (e.g. reveal template state)?

What's left in your minds of things that need addressing that we might have missed (e.g. reveal template state)?

I think that we are nearly done.

  • The reveal template now works but is not ISO with the look of the previous version. I need to make a pass to review the CSS.
  • There is a PR coming up by @maartenbreddels to improve how we register preprocessors in conf.json (with a dict instead of a list) so that we can merge a hierarchy of conf.json through the regular configuration merge system.

We should probably tag a real beta then.

I am going to work on a documentation update tomorrow. I really want to do a complete overhaul of the python API docs. I'd also like to do some flattening of the latex templates but I might not get to it. I also want to take a stab at #911; we might want a behaviour change here (maybe not) so now is the time to look at it.

That sounds great. I was going to take a stab at remaining open 6.0 milestone issues to see what I can close out. Additional I was going to check in on how backwards compatability / warnings for old template patterns vs new are working to see if we can make the transition smoother there. I know @maartenbreddels was working on that but I don't remember where it ended up.

Let me know if I can help with specific issues, I'll try to make sure to set some time aside to get nbconvert over the 6.0 line.

We should probably tag a real beta then.

Should we just prep a jump to 6.0.0? I think a lot of people have been opted into the alphas and I'd like to not have another long cycle before it's default accessible, at the cost of maybe needing a 6.0.1 patch for a minor issue that might arise.

We have been using the alphas in the CI of Voilà for some time.

If we make a public beta and warn enough people that we are very close to a release, we may get some adoption. I think we need some improvement of slides stuff first though.

Alright, that works

FYI, I pushed another quiet alpha (6.0.0a4), so that we can start testing the latest changes in voila-reveal.

Sounds good, I think some rapid alpha releases to support testing the last changes is a good idea. Btw in helping with https://github.com/jupyter/nbconvert/pull/1292 I think I got most of the "backwards compatability / warnings for old template patterns" done if you wanted to take a look later.

I am thinking of kicking off a beta release and setting an explicit date for 6.0 official release. I'd prefer to do the full release in a week or two (at most).

The only things I have that I am queuing to complete are:

There's a number of bugs still left in https://github.com/jupyter/nbconvert/issues?q=is%3Aopen+is%3Aissue+milestone%3A6.0 but nothing I'd consider blocking a release. I am going to post to discord about seeing if anyone that's not a common contributor is open to helping wrap those up before a release goes out.

What are folks thoughts on kicking off a beta release now and a full release next ~weekend? Is there anything else that needs to be actively addressed @t-makaro @maartenbreddels @SylvainCorlay ?

Oh and https://github.com/jupyter/nbconvert/issues/582 perhaps if we want to take an action around PostProcessors or not.

I kicked off a 6.0.0b7 release for the beta and am going to announce it on the discourse so people are aware / ask for interested parties to contribute any of the open bug fixes they want in before the final release.

6.0.0b7

why not b0?

I think it is really weird to release the first beta with b7. I think that we should delete that release and make a b0 available before communicating about it. @MSeal are you ok with me going ahead with this?

It was the natural progression of pre-release numbering from a build number perspective after a6. It also showed that we had several prior builds, and the number doesn't really matter as it's not semver anyway. I'd prefer we just leave it to avoid confusing people who have already installed the latest pre-release because pip tooling doesn't do great with installing when you go backwards and not forwards with release progression alphanumerically.

OK, sounds good.

Nbconvert 6.0.0rc0 is out with the latest changelog. I sent another round of emails to downstream maintainers about it.

The only things I have that I am queuing to complete are:

1314

1350 (if possible before a release, not going to block on it)

If you don't have time to tackle this, I think it would be great to table it to the next release. We are blocking on nbconvert 6.0 for quite some time, and the more we wait, the more difficult it becomes...

Update README to reflect #1348 (comment)

I opened a PR to the readme to add the comment

@SylvainCorlay That's fair, I was just going to take some time today to see if I could get it resolved. I agree and would like to release 6.0 this week. Give me a couple hours and I should have that PR updated, but I agree that we don't need to block on it.

Great! let's aim for tomorrow (Tuesday)!

Excited for 6.0, thanks for working on this release!

~I don't want to delay, I just want to raise awareness about https://github.com/jupyter/nbconvert/issues/1358. If you can reproduce that issue, it might be something to consider fixing before 6.0 because it blocks HTML export, if you cannot reproduce I guess it is something local although I have tried it on two different laptops and had it reported by several of my students running a6 (on Windows).~

~I just tried with the rc0 zip and the same problem is there. In essence, I create a new conda env with only jupyterlab (and deps) installed, HTML export works fine until I do pip install https://github.com/jupyter/nbconvert/archive/6.0.0rc0.zip, then I encounter the issue reported in #1358.~

It seems like this is only an issue with installing nbconvert zip or tar.gz files from GitHub, not directly from a commit, so likely won't be a problem for the release. Sorry for the noise.

Np, thanks for reporting. I'll follow up on that oddity in #1358 but I agree I'm not hitting a blocking issue for release.

@SylvainCorlay Since we're in fairly different timezones how do you want to coordinate the release itself? I did a PR using the release automation tooling to capture the remaining authors for this release: https://github.com/jupyter/nbconvert/pull/1363

I'm happy with either of us running the release after that merges.

What about #1347?
This still seems to be broken, right?

Ahh we should check that and see if it's an issue or not (I won't have any time tomorrow to do that)

There is still an error in the example notebook mentioned in #1339 (see also https://github.com/jupyter/nbconvert/pull/1292#issuecomment-656599554).

This doesn't yet show up as a CI failure (see #1350), but I think the notebook itself (or whatever is causing the error) should be fixed before 6.0 because the failure will be visible in the docs, where the example output is much too long: https://nbconvert.readthedocs.io/en/latest/nbconvert_library.html#Example

@MSeal

we should check that and see if it's an issue or not

I've tried running the notebook docs/source/customizing.ipynb (which has been removed in #1354) and it still produces the errors mentioned in #1339.

The usage in this notebook might be wrong, though, I don't know.

There is still an error in the example notebook mentioned in #1339 (see also #1292 (comment)).

This is now fixed!

The usage in this notebook might be wrong, though, I don't know.

Yes, I think this was outdated.

@mseal @maartenbreddels I fixed the last issue reported by @mgeier.

There is the question of #1347, which is about supporting the cwd as a template path. I am not sure we should do this though.

OK, I added a comment about #1347.

If nobody objects by then, I will tag 6.0.0 tomorrow morning (CET)!

I think we're good for a release, thanks for offering to handle shipping @SylvainCorlay. If minor things come up after let's just be more proactive with patch releases with fixes than we were with 5.x.

Nbconvert 6.0 is out! :tada:

The conda-forge PR should follow shortly.

Closing this issue!

I can post on discourse later today or tomorrow about the event. Awesome work everyone and thanks for getting 6.0 across the finish line

Was this page helpful?
0 / 5 - 0 ratings