- The r.js optimizer would produce a concatenation of defines. It will include all runtime dependencies needed by the bundle. The plugin author must have node installed in order to run this script. The tool we use to generate this bundle is an implementation detail for all but the most advanced use cases.
- A RequireJS loader provides dynamic lookup of required modules using the semver library. Even if the same module is bundled multiple times, only one version of the module will ever be executed. Since all modules in the final bundle are registered before the bootstrapping code executes, we guarantee that we only load the most up-to-date version of the module that matches the semver requirement. This is effectively client-side deduping at runtime.
- The use of RequireJS is an implementation detail, and we make it clear that it is not intended to be used for dynamic imports.
The user does not need node or npm or internet access to install plugins.
This mimics the behavior of `npm dedupe`. Since the use of RequireJS and webpack are implementation details, most extension authors need only re-run our build script to generate a new version if we switch to a different implementation. Authors using a custom webpack configuration would need some manual rewrites in this case.
We are potentially shipping code to the client (duplicate modules) which will never be executed. Duplicate modules also increase the on-disk size of the extension.
This is a redirection and continuation of #224, where the final page including extensions is created using a single build step.
This allows us to hide the implementation details from extension authors, who simply have to make the extension available as an npm package written in CommonJS format.
- Config and data paths: most likely using a `jupyter labextension` endpoint.
- Atom packages use the atom public API but are otherwise completely stand-alone npm packages in `~/.atom/packages`.
Thought exercise for having packages that are self-contained and combined in a single build:
- node_modules included: after running `npm install --production` in jupyterlab/jupyterlab (and removing the enormous material-design-icons package, ~374MB), the node_modules is 34MB on disk, which would mean any self-contained extension would be at least that big.
- Ship only a package.json and run `npm dedupe && webpack` on the user's machine.
- A glaring issue with the above is that each package would ship with a locked version number of each of its dependencies, which would mean that JupyterLab would have to follow suit.
@SylvainCorlay brings up an interesting point - In this choice of assembling the javascript on the user's computer, we are shifting complexity from the plugin authors to the plugin users (plugin users now need npm installed). This is opposed to the plugin author providing dependencies (either as a bundle, or as a checked-in node directory). Some thoughts on this:
I'm sure @SylvainCorlay has more thoughts that he'll post.
We cannot have duplicate dependencies for phosphor at the very least, because it breaks drag-drop semantics. If we push more onto the plugin author, we are exposing our module loader implementation.
I have a very very strong preference for 2 in the list by Jason.
If extension installation requires building an npm package, this is going to be a huge cliff for the typical data-scientist who installs ipywidgets / bqplot.
I think that it is ok to have duplication between extension-provided bundles (that may bundle e.g. d3 multiple times) for now.
So let's not solve the general in-browser dedupe problem and let duplication happen. All we need to decide is what are the main dynamically loadable modules provided by jupyterlab for extension authors.
We cannot have duplicate dependencies for phosphor at the very least, because it breaks drag-drop semantics.
Yes, we'd have to provide a set of platform externals (e.g., phosphor), like we were doing. I don't like having special treatment for JupyterLab platform packages either, so I see this as a reason for deduping on the user's computer.
In what we were doing before, we were letting each package declare which dependencies it provided to others, but that leads down a terrible rabbit hole of basically writing a package manager, drawing lines between what should be provided and what should not, and still doesn't solve the problem of deduping between peer plugins.
CCing @ellisonbg and @sccolbert too...
It would be absolutely catastrophic to need to build npm modules when installing extensions; it would raise the bar so high for users that we would lose most of our user base.
More food for thought: currently we're running into issues with jupyter-js-widgets - what I _think_ is happening is that both the notebook and jupyter-js-widgets are bundling jquery-ui/jquery, and one is stomping on the other. I think we have this problem right now in the current notebook and current ipywidgets, where bundling things up before distribution creates problems.
cc @fperez
What if the stance were:
- JupyterLab would provide `jupyterlab/lib/foo`, `jupyterlab/externals/phosphor/foo/`, etc.
- ipywidgets would provide `ipywidgets/externals/backbone`.
- Extensions declare what they consume from others in an externals config.

Notice that the above approach mimics the behavior of `npm dedupe`, but at build time for the extension author, not for the user.
Peers that provide the same dependency would always be duplicated in this scenario.
I can dust off #720, which was well on its way toward implementing the above.
cc @bollwyvl @damianavila
An alternative, if we wanted to account for peer dependencies, would be to provide our own require() function that takes a semver string and gives back the appropriate module. This would mean that JupyterLab would provide `phosphor@<version>/lib/foo` and extensions would use `jupyter.require('phosphor@^0.6/lib/foo')`. The `jupyter.require()` function would find the appropriate version and load it.
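A rough sketch of how such a registry-plus-resolver could work. The `jupyter.register`/`jupyter.require` names and the caret-only matching below are assumptions for illustration, not an existing API; a real implementation would delegate range matching to the semver library.

```javascript
// Hypothetical semver-aware module registry. Every bundled copy of a
// module registers itself under its exact version; consumers receive
// only the best match for their requested range.
const jupyter = {
  _modules: {}, // name -> { version -> exports }

  register(name, version, exports) {
    (this._modules[name] = this._modules[name] || {})[version] = exports;
  },

  // Resolve e.g. 'phosphor@^0.6.0': same major as the requested floor,
  // highest (minor, patch) at or above it. Caret ranges only.
  require(spec) {
    const [name, range] = spec.split('@^');
    const floor = range.split('.').map(Number);
    const candidates = Object.keys(this._modules[name] || {})
      .map((v) => v.split('.').map(Number))
      .filter((v) => v[0] === floor[0] &&
        (v[1] > floor[1] || (v[1] === floor[1] && v[2] >= floor[2])))
      .sort((a, b) => a[1] - b[1] || a[2] - b[2]);
    if (!candidates.length) throw new Error('no match for ' + spec);
    return this._modules[name][candidates.pop().join('.')];
  },
};

// Two bundled copies of the same library; only the best match is handed out.
jupyter.register('phosphor', '0.6.1', { which: '0.6.1' });
jupyter.register('phosphor', '0.6.4', { which: '0.6.4' });
console.log(jupyter.require('phosphor@^0.6.0').which); // 0.6.4
```

Duplicate registrations are harmless here: they cost bytes on the wire, but only one copy within a compatible range is ever handed to consumers.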
The problem with the above is that you start having to morph the webpack bundle to get it to use this custom import (unless we explicitly override window.define), and it is harder for the extension author to reason about.
An extension provides all of its stuff that it wants to expose using AMD
Do you mean also all of the dependencies they don't expose too?
I don't know of any npm packages that bundle all of their dependencies in their distribution. It seems fraught with very subtle bugs. Especially if some things are duplicated and some are not. It seems that we're setting ourselves up for lots of pain in debugging and having weird interactions between plugins.
No, you'd have an explicit list of libraries you provide in a jupyter.lab key in package.json.
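As a sketch, the declaration could look like this in the extension's package.json; the field names under the `jupyter.lab` key are hypothetical:

```json
{
  "name": "ipywidgets-labextension",
  "version": "1.0.0",
  "jupyter": {
    "lab": {
      "provides": ["jupyter-js-widgets"],
      "externals": ["phosphor"]
    }
  }
}
```

Only the libraries listed under `provides` would be registered for downstream plugins; everything else stays private to the extension's bundle.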
Most extensions would be a single self-contained webpack bundle, if they are using our script.
An extension provides all of its stuff that it wants to expose using AMD
Do you mean also all of the dependencies they don't expose too?
An extension would have a webpack build step as part of their prepublish script. The resulting bundle would only depend on the external dependencies provided by the platform.
Like the bundle for bqplot available at https://npmcdn.com/bqplot@<version>/dist/index.js, which starts with `define(["jupyter-js-widgets"]`.
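The shape of such a bundle is a single AMD define whose only dependency is the platform-provided external; everything else is inlined. A minimal runnable mock, where the `define` shim and the `FigureView` name are stand-ins for illustration:

```javascript
// Minimal stand-in for the platform's AMD loader, just to make the
// example runnable: externals live in a registry the host controls.
const registry = { 'jupyter-js-widgets': { DOMWidgetView: class {} } };
function define(deps, factory) {
  define.exports = factory(...deps.map((d) => registry[d]));
}

// What a prebuilt extension bundle boils down to: one define whose only
// dependency is the platform external; all other code is inlined.
define(['jupyter-js-widgets'], function (widgets) {
  class FigureView extends widgets.DOMWidgetView {} // hypothetical name
  return { FigureView };
});

console.log(typeof define.exports.FigureView); // function
```

The important property is that the bundle never names its private dependencies at the define boundary, so the host only has to supply the agreed-upon externals.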
What Sylvain said.
If extension installation requires building an npm package, this is going to be a huge cliff for the typical data-scientist who installs ipywidgets / bqplot.
@SylvainCorlay - let's discuss this assumption. To use the packaging system "correctly", it just requires that they have npm installed.
Anaconda already has npm packaged, so I assume that that is painless in that situation - any third-party plugin's conda package would depend on npm, but jupyterlab itself wouldn't (since it distributes the prebuilt bundle for included plugins). Since ipywidgets is a third party plugin, this probably means that almost everyone will end up having npm installed.
For the case outside of Anaconda, that's where it gets trickier. A third party extension plugin won't _just_ be a pypi package anymore, since it depends on having npm installed as well, which is outside of the python ecosystem. I think your claim is basically that outside of Anaconda, it's tricky or impossible to get npm installed and configured (indeed, it does look tricky from a cursory search about installing node without installing it at the system level...). Is that the main issue?
An extension would have a webpack build step as part of their prepublish script. The resulting bundle would only depend on the external dependencies provided by the platform.
Like https://npmcdn.com/bqplot@<version>/dist/index.js, which starts with `define(["jupyter-js-widgets"]`.
Sounds like we are using the words "provide" differently. Perhaps this is clearer: "Do you mean an extension includes all of its dependencies' code, even the things it is not providing to downstream plugins"? I.e., the ipywidgets labextension package has two bundles it will install - jupyter-js-widgets and all of its dependencies, and jupyter-js-widgets-labextension and all of its dependencies (except jupyter-js-widgets). (Similar to how we did things in https://github.com/ipython/ipywidgets/pull/727/files#diff-eb0670367ac857fbcfb6904f6cbb8fdeR1)
The ipywidgets extension can choose to provide jupyter-js-widgets for downstream use. It would require jupyterlab things from the environment and shim jupyter-js-widgets for others to use.
bqplot would require jupyter-js-widgets from the environment and potentially provide d3.
For the case outside of Anaconda, [...] Is that the main issue?
That is a big one. But another issue is that I think it is illusory to expect that building on the end user's machine will work nicely. My opinion is that it will break in most cases.
Examples: various versions of npm installed on the user machine overriding the one of anaconda, path issues, discrepancies of behavior of npm depending on the platform (windows, debian), an extension that breaks your jupyterlab bundle, npm cache files in user npm directory owned by root because the user has run npm install -g with sudo for some other context, etc...
Another option is to go ahead and ship the entire node_modules folder with every plugin (perhaps except for some system-provided packages), and have a _python_ program that does the deduping, and optional bundling on the client side. (That also sounds painful.)
Crazy idea: what if we could dedup webpack bundles automatically, on the client side? Load a bunch of bundles at the start, dedup their provided modules on the client side. Bundles loaded after the initial start wouldn't be deduped against each other, but against the base system. This doesn't prevent shipping lots of extra code, but it doesn't require npm on the user's system either.
Everyone, let's discuss this more at the jupyterlab dev meeting tomorrow.
while sitting here, waiting for an `npm install` to finish installing over a spotty connection...
I have just been trying to play along and represent what _could_ happen in a few cases w/r/t anaconda/conda, but would really like it if we did _not_ end up with a runtime dependency on npm... _unless_ it is actually doing something magical like enabling multi-user collaboration.
One of the ideas we talked about a fair while ago, actually in the context of the original notebook extensions, was a shared static asset distribution namespace that ended up looking kinda like bower/npmcdn:
```
/share/whatever/
  jupyter-js-widgets/
    1.0.0/
      main.min.js
      package.json
```
Of course, this would be some unioned path between /share and ~/.jupyter/ and $PREFIX/share, but whatever, you'd end up with a big stack of AMD modules. Bleh. But whatever. Tornado would be able to handle some kind of redirection to latest (presumably by semverspecting the package.jsons).
Then, the runtime task becomes configuring the requirejs namespace to tie everything together, which is probably simpler than actually doing a build.
Everybody would have to do a lot of builds, as a number of folk would build jquery/d3/etc. But them's the breaks!
We could extend the "local npm" approach by using a python script to generate a requirejs bundle, which is just a concatenation of define() blocks.
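The generated bundle would then literally be named `define()` blocks back to back. A runnable sketch, with a toy named-define shim standing in for RequireJS and hypothetical module names:

```javascript
// Toy named-define shim standing in for RequireJS, so the concatenated
// bundle below can run: each define registers under its module id.
const modules = {};
function define(name, deps, factory) {
  modules[name] = factory(...deps.map((d) => modules[d]));
}

// --- begin generated bundle: a plain concatenation of defines ---
define('phosphor/lib/widget', [], function () {
  return { Widget: class Widget {} };
});
define('my-extension/index', ['phosphor/lib/widget'], function (phosphor) {
  return { activate: () => new phosphor.Widget() };
});
// --- end generated bundle ---

console.log(
  modules['my-extension/index'].activate() instanceof
    modules['phosphor/lib/widget'].Widget
); // true
```

Producing this file is pure string concatenation, which is why it could be done from a Python script with no node on the user's machine.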
And then when http 2.0 comes along we just stop bundling.
Yes, keep in mind that there is already a tornado http2 project: https://github.com/bdarnell/tornado_http2 (discussion at https://github.com/tornadoweb/tornado/issues/1438)
Or maybe we don't bundle by default (single-user case), but suggest that people running shared servers install npm and webpack everything up.
But a bigger issue than bundling is deduping. We run into the deduping problem in the single-user case even if we don't bundle.
Compared to the other problems, deduping should actually be "just" a data structures and algorithms exercise. We've got a python semver package.
deduping should actually be "just" a data structures and algorithms exercise
Agreed. That's why I wrote yesterday:
and have a python program that does the deduping
Another idea: to generate an extension, we create a build directory and install all of the dependencies except the ones that are also in upstream dependencies that have compatible semvers, and ship that folder. This would cut way down on duplication in shipped code. We then assemble all of this code on the user's machine (only when the plugin manifest changes), and handle dedupe/bundling/etc. from Python.
I was thinking that we'd ship the entire node_modules. Great idea to do partial deduping with what information we have at package build time.
I think we probably don't need to solve the bundling problem in python. The way I see it, we have two classes of users/use-cases:
I am feeling more and more comfortable with this approach. It is not "standard", but neither is allowing third party extensions to a Web app, and it is a lot cleaner than what Atom is doing.
I'm reading up on what VSCode does now.
VSCode has publish and package commands that generate custom build artifacts.
On disk, it ends up being the same thing as Atom packages, self-contained node installs.
Yep, both atom and vscode make the core library available as a global.
As a global, or as something you can require?
In the case of atom, it is a literal require(), looking at vscode now.
var vscode_1 = require('vscode');
Note: neither Atom nor VSCode are running in the browser.
Ugh, if we don't have a bundle stage, extensions have to write AMD-style code exclusively.
No, I suppose they just have to _ship_ an amd entry point.
For which we could provide a script.
Maybe not even that. I think require can load CommonJS modules.
On building npm packages
I think that for a scientist using the python scientific stack, having to build a node module to install an extension is a recipe for disaster. I have invoked multiple reasons earlier in the conversation.
For this sort of audience, `pip install pythreejs` or even `pip install ipywidgets` should work out of the box, which imposes some support for dynamic loading of javascript resources in jupyterlab.
Extensions
Here, by jupyterlab, I mean the conda/pip installable package that comes with a javascript bundle. Anyone could use the jupyter npm package to create _something else_, but by jupyterlab I now mean the pip / conda package that has both the Python code and the prebuilt javascript bundle.
The js bundle in jupyterlab should define a number of amd modules, such as phosphor, lab, and services, which can be loaded by the extensions.
An extension is a pip / conda package that comes with one or more javascript bundles (but let's assume one here). This javascript bundle declares the dependencies that are part of the base platform or provided by another extension as externals.
No version check of the dependencies of an extension needs to be done on the javascript side, since any external js dependency corresponds to a pip / conda package dependency.
For example, if both interact.jl and ipywidgets depend on the widgetsnbextension conda package, and the conda solver finds a version of widgetsnbextension that matches both requirements, then the js bundle for widgetsnbextension must have a compatible version.
This has a consequence for the version numbering of conda packages that ship js resources: anytime a conda package ships a new version of the js bundle with a change in its (major, minor, patch) version number, the conda package must also increase its own major / minor / patch version number accordingly.
Deduping
I propose that we don't try to do any deduping.
widgetsnbextension should be the only package allowed to define globally the jupyter-js-widgets symbol, and any package that does too is a bad citizen.

I think that for a scientist using the python scientific stack, having to build a node module to install an extension is a recipe for disaster. I have invoked multiple reasons earlier in the conversation.
I would expand the audience beyond the scientist to social scientists, data analysts, teachers esp. high school, college students, etc.
For this sort of audience, `pip install pythreejs` or even `pip install ipywidgets` should work out of the box, which imposes some support for dynamic loading of javascript resources in jupyterlab.
FWIW. The more complexity of install burden put on a user (i.e. installing node/npm), the more difficult the use becomes for teaching tutorials or workshops (i.e. installation takes longer and is more frustrating), and the more burden for support and maintenance put on our developers.
One comment that is separate from the design and implementation of third party package handling:
I am a little concerned about allocating too much developer time to solving this problem right now. We have a ton of interested users who are anxiously waiting on JupyterLab being ready for real usage. The amount of work we have to do on core features, UX/UI work, etc. is very significant.
I know that getting third party extensions working is _super_ important, but having them working won't make a difference to most users if JupyterLab itself isn't ready for real usage.
I know that different organizations involved in the development of JupyterLab have different priorities, and that those priorities may require solving the third party extensions issue sooner rather than later. I want to be sensitive to that, while at the same time encouraging us to focus on the core features and usability of JupyterLab.
I know that getting third party extensions working is super important, but having them working won't make a difference to most users if JupyterLab itself isn't ready for real usage.
Widgets libraries are one point of extension that is core to what the notebook is in my opinion.
Another comment about us running a build service. Here is a description of what I am thinking:
Now the details of the bundler service:
I want to acknowledge that there are many different issues under consideration in this discussion and that such a bundler service may not be needed. But I at least wanted to spell out in more detail what I was thinking.
widgetsnbextension should be the only package allowed to define globally the jupyter-js-widgets symbol and any package that does too is a bad citizen.
-1 on treating widgets differently than other packages. I think widgets is our chance to step into third party extension authors' shoes.
widgetsnbextension should be the only package allowed to define globally the jupyter-js-widgets symbol and any package that does too is a bad citizen.
-1 on treating widgets differently than other packages. I think widgets is our chance to step into third party extension authors' shoes.
That is not what I said. I just took widgetsnbextension as an example.
My plan is to not think about this until Monday morning and come at it again with fresh eyes, have a good weekend folks!
That is not what I said. I just took widgetsnbextension as an example.
Ah, I misunderstood you. What I understand now is that you are saying that no other package besides widgetslabextension should provide the jupyter-js-widgets package (as an example), and somehow everyone should know that. Is that right?
I am a little concerned about allocating too much developer time to solving this problem right now.
I think it is important to achieve consensus and a workable solution for this now, in the early stages; otherwise we are exposing ourselves to really problematic times in the future, when we will have a narrow space to implement alternatives without breaking backwards compatibility.
Since JupyterLab is a composable experience from the very beginning, and I think most (if not all) of us think of JupyterLab that way for the future as well, we should spend cycles on this discussion because of the large implications these discussions and the solutions achieved will have for JupyterLab's future and success.
Widgets libraries are one point of extension that is core to what the notebook is in my opinion.
This is just one example about why this is important regardless of different organizations interests... this is essential for ipywidgets which, probably, most of us consider something pretty close to the core of our desires about what Jupyter ecosystem should provide.
Here are a few examples of extensible applications in a browser:
Cloud9 Packages use require.js, but as a DI container where the plugin defines the list of plugin names it consumes and the main entry point is called with those plugins as attributes of an imports object. The plugin calls a function in its entry point to register the concrete object it provides.
https://cloud9-sdk.readme.io/v0.1/docs/create-a-package
Codiad plugins use a global.codiad attribute and plugins add themselves to that variable.
https://github.com/Codiad/Codiad-Plugin-Template/blob/master/init.js
Eclipse-Che plugins are written in Java and transpiled to independent bundles.
https://eclipse-che.readme.io/
All of the above, along with Atom and VSCode, are self-contained from the perspective of the plugin author, and there is no explicit concept of de-duplication or handling of multiple versions of external dependencies. Generally, plugins are given their own namespace (or entire directory), and are not encouraged or able to extend beyond it.
So they don't try to solve the problem as I was suggesting.
I understand what you mean @SylvainCorlay, but can you elaborate a bit on that statement?
(they don't try to dedupe and allow dynamic loading)
Speaking of, if we could just sandbox extensions from each other while exposing framework libraries, we would not need to worry about deduping between extensions.
A first solution would be to run each extension in a separate javascript context (for example via an iframe in the same domain) and pass the main modules provided by jupyter lab (phosphor etc). The extension author should only be careful about not duplicating things that are _in_ the main jupyterlab bundle.
Also there is this project: https://developers.google.com/caja/docs/about/
This could also be a good way to start addressing security concerns about extensions.
Last Friday, it seemed that there was some disagreement about fundamental assumptions. Can we vote on the next few statements to try to understand where we are really disagreeing? (You can use the thumbs-up or thumbs-down buttons, and elaborate in a comment if you want).
We _must_ have a way of installing and using extensions that does not require the user to have node (and thus npm too) installed.
Another question for vote:
There must be a way to distribute self-contained extensions that include their runtime dependencies that we don't know are available in the system. (i.e., it's okay to omit dependencies that we know are required by our dependencies, but we must include any other runtime dependencies).
There must be a way to distribute self-contained extensions that include their runtime dependencies that we don't know are available in the system. (i.e., it's okay to omit dependencies that we know are required by our dependencies, but we must include any other runtime dependencies).
Do you mean in a bundle or in the package as bare npm package to be built later on?
Sylvain, either way.
We must have a way of installing and using extensions that does not require the user to have node (and thus npm too) installed.
What would it be like to vendor node and npm with jupyterlab?
vendor node and npm with jupyterlab
Do you mean distributing node and npm with jupyterlab, so that a jupyterlab-specific local copy of node is installed with jupyterlab?
That is correct.
I haven't tried, but I imagine that the fact that node is compiled and platform-specific makes it way out of scope to distribute with jupyterlab.
The reason I bring it up is that as I read this discussion, there's a standstill over whether you can require users to have node available.
To agree on some points: I think it's pretty likely that each plugin has to have a separate context for the built asset. In the current lens, that means a webpack bundle for each plugin. It's also the only way to be free of conflict while letting people be dynamic.
I'll have to ask about this deduping, and why it's necessary because it forces your hand into having to reload the page with a brand new bundle (hot module reloading _can't_ work in this setup).
The way Atom and VS Code work is indeed by each plugin having its own context. It's fun looking back on their plugin history. Lots of packages are small, and some used to be lots of separate packages. Nuclide is the notable example - they tried to make smaller packages which relied on their own hooks, then realized that performance-wise it was better for them to have a big package. As time went on, Atom improved the packaging system so that plugins themselves could be providers to others (yes, plugin plugins). The best example of this is likely the atom linter setup. I'm amazed that the natural feeling I get in go-plus is similar to the one I get with JavaScript and TypeScript.
With each package separate, the little packages and the big packages don't collide, dependency-wise. They can still break each other on the DOM, of course, and we should expect this in jupyter environments as well.
Allowing duplicate instances of modules means that instanceof will break (among other things) as soon as you start sharing common objects across plugin boundaries. These "bugs" take _hours_ to debug if you haven't dealt with it before. I'm not sure that's feasible in the long-term.
As time went on Atom ended up improving the packaging system so that plugins themselves could be providers to others
FYI, that's in our design too - that's how widgets would work for example, as a plugin extending to the notebook plugin.
I don't think rebuilding the application bundle with extensions into a single bundle on every extension install/enable/uninstall is going to be tenable.
We have a plan incoming. Stay tuned while we write it up...
I don't think rebuilding the application bundle with extensions into a single bundle on every extension install/enable/uninstall is going to be tenable.
I think that is exactly what some usecases will want, for example, server admins serving up a single set of plugins as a hosted solution.
While the server admin use case is a reasonable one, end users will be unhappy with having to rebuild (and in many cases may not have the technical experience to do so).
I also built the legacy notebook on a Raspberry Pi and it took a few hours.
Pre-built browser-ready extensions are also important for such platforms.
S.
On Aug 29, 2016 8:34 PM, "Carol Willing" wrote:
While the server admin use case is a reasonable one, end users will be unhappy with having to rebuild (and in many cases may not have the technical experience to do so).
We think we have a solution which will handle all of these cases. We are in the process of writing it up. Will post here when done.
Chris, have you seen the discussion above about using multiple js contexts?
Extension would only have to be careful to not duplicate core packages.
S.
On Aug 29, 2016 8:40 PM, "Jason Grout" wrote:
I don't think rebuilding the application bundle with extensions into a single bundle on every extension install/enable/uninstall is going to be tenable.

I think that is exactly what some usecases will want, for example, server admins serving up a single set of plugins as a hosted solution.
I'm :-1: on multiple JS contexts. It's more complexity than is needed, and doesn't really solve the problem.
Probably worth clarifying here: in this case, it's multiple build contexts and not multiple JS VMs.
Sure. But I don't see what that buys us, or how it's different from independent modules, except that now there are also multiple global contexts. i.e. N-number of Array prototypes, for example.
How are there multiple Array prototypes? Something must be lost in translation here. It's all one global context here. These scripts are loaded separately.
I'm definitely not here to debate how JupyterLab is doing this. The reason I chimed in here is my interest in what's on the other side of the message spec because this is dictating the Jupyter frontend in general. I primarily want to collaborate on the shared spec between a backend library via a kernel to push code through on application/javascript or text/html on the message spec and have some expectation of what's on the other side.
Each <iframe> you put on the page has its own set of globals, so Array in frame 1 !== Array in frame 2. This is why Array.isArray exists: instanceof breaks across frame contexts.
@rgbkrk Ah okay. I view that as a somewhat separate problem. What we're trying to address at the moment, is effectively how to ship all of the static JS (plugins) for a given instance of JLab app. Dynamically loading JS via cell execution will necessarily need a different mechanism, as far as I can see (or at least warrants a separate discussion).
Since all modules in the final bundle are registered before the bootstrapping code executes, we guarantee that we only load the most up-to-date version of the module which matches the semver requirement.
If I have plugin A which requires library C, version 1, and plugin B which requires library C, version 2, then are we going to modify the define dependencies in A and B to include the version numbers (and load both copies of library C)?
If I have plugin A which requires library C, version 1, and plugin B which requires library C, version 2, then are we going to modify the define dependencies in A and B to include the version numbers (and load both copies of library C)?
Yes. This is precisely why the import statements will be augmented with semver info by the bundling tool.
Edit: Assuming versions 1 and 2 are major versions. i.e. version 2 is not semver compatible with version 1.
Even if the same module is bundled multiple times, only one version of the module will ever be executed.
So this statement is a bit more nuanced than just "only one version...will ever be executed"
(perhaps "only one copy of any specific version of a module will ever be executed")
Yeah: "subject to semver compatibility, only one copy of a module will ever be executed".
or even "the minimal working set of modules..." In any case, I think these comments now make it clear what is meant :)
Only one version for the entire subset of semantically versioned and compatible requires will be loaded. So if one extension wants foo-lib @ ^1.1.1, a second wants foo-lib @ ^1.2.0, and a third wants foo-lib @ ^1.3.5, we will execute v1.3.5.
But if a fourth extension wants foo-lib @ ^2.0.0, that will _also_ be executed.
^ this. Thanks @afshin!
Thank you all, this discussion really helped my understanding and direction. I've been hoping for a generic approach to asset loading from the kernels, primarily to handle the widgets. Since the widgets set up a custom mimetype and are expected to be loaded on the page, they skirt the fundamental issue there. It just means that libraries need to adopt similar standards, as well as the follow on with nbconvert and nbviewer.
we will load v1.3.5.
Or really, we'll load the maximum satisfied version that someone includes that matches ^1.3.5 (since our third library might be distributed with 1.3.7...)
Remember that semver ranges can have upper bounds, be discontinuous, etc., so there can also be more complicated situations. If a fifth library wants 1.3.0-1.3.4, but only bundles 1.3.2, then we could also load 1.3.2.
Luckily, there is a maxSatisfying function in the semver library that takes what versions you have, and a semver range, and does all the work for you :).
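For illustration, here is a simplified stand-in for that selection logic. A real implementation would use maxSatisfying() from the semver package; this hand-rolled version only understands caret ranges on versions with a nonzero major, purely to show the decision being made.

```javascript
// Simplified sketch of the client-side version choice described above.
// Not the real implementation: the semver package's maxSatisfying()
// handles the full range grammar (bounds, discontinuous ranges, etc.).
function parse(v) {
  return v.split('.').map(Number); // '1.3.5' -> [1, 3, 5]
}

function cmp(a, b) {
  for (var i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Does `version` satisfy a caret range like '^1.3.5'?
function satisfiesCaret(version, range) {
  var want = parse(range.slice(1));
  var have = parse(version);
  if (have[0] !== want[0]) return false; // caret pins the major...
  return cmp(have, want) >= 0;           // ...and allows upgrades within it
}

// Pick the highest bundled version satisfying the range, or null.
function maxSatisfying(available, range) {
  var best = null;
  available.forEach(function (v) {
    if (satisfiesCaret(v, range) && (best === null || cmp(parse(v), parse(best)) > 0)) {
      best = v;
    }
  });
  return best;
}

// The scenario from the discussion: plugins bundle 1.3.5, 1.3.7, 2.0.0.
var bundled = ['1.3.5', '1.3.7', '2.0.0'];
console.log(maxSatisfying(bundled, '^1.1.1')); // 1.3.7
console.log(maxSatisfying(bundled, '^2.0.0')); // 2.0.0
```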
I like how the proposal consolidates a number of the ideas we've been batting around. In essence, it's implementing a package cache like npmcdn, populated by the bundled packages, but using requirejs as the cache and AMD defines to request them, rather than url requests. The advantage with that is that in this pre-http2 world, server requests are minimized.
It would be far simpler to just ship the node_modules directory for each plugin (possibly deduped for upstream dependencies we know are already provided). But that involves many more server requests reading package.json files to get semver ranges, then deciding which copy of a module to load, and loading it.
All of these deduping strategies are swapping out dependencies at runtime, assuming that the semver ranges specified in all upstream dependencies are correct. This means that loading a new plugin could break all your existing plugins (for example, if the new plugin provided codemirror 5.18, which broke semver). But unfortunately, it seems impossible to fix this and get the deduping to fix other subtle bugs.
It would be far simpler to just ship the node_modules directory for each plugin (possibly deduped for upstream dependencies we know are already provided). But that involves many more server requests reading package.json files to get semver ranges, then deciding which copy of a module to load, and loading it.
This is pretty much a non-starter because we would have to implement all of that logic in Python. Effectively reimplementing webpack. The approach we describe removes that hurdle.
The advantage with that is that in this pre-http2 world, server requests are minimized.
And we don't have to crawl a package structure on the server-side, parse package.json files, or do any sort of semver matching in Python.
This is pretty much a non-starter because we would have to implement all of that logic in Python. Effectively reimplementing webpack. The approach we describe removes that hurdle.
Definitely a non-starter if you reimplement in python. Instead, you'd either do it from the client side or use webpack if you happen to have node.
All of these deduping strategies are swapping out dependencies at runtime, assuming that the semver ranges specified in all upstream dependencies are correct. This means that loading a new plugin could break all your existing plugins (for example, if the new plugin provided codemirror 5.18, which broke semver). But unfortunately, it seems impossible to fix this and get the deduping to fix other subtle bugs.
You would have this same problem with server-side webpack and NPM 3+. If someone breaks semver, all bets are off.
You would have this same problem with webpack and NPM 3+. If someone breaks semver, all bets are off.
Yep. No solution to this in sight, but still good to know where to look when encountering a subtle bug.
Definitely a non-starter if you reimplement in python. Instead, you'd either do it from the client side or use webpack if you happen to have node.
Doing it client-side is also a non-starter IMO. That's an http request for each file, of which there could be thousands. We tried that with StealJS, and load times were horrendous.
Doing it client-side is also a non-starter IMO. That's an http request for each file, of which there could be thousands. We tried that with StealJS, and load times were horrendous.
Hence my comments about us compromising in a pre-http2 world...
Hence my comments about us compromising in a pre-http2 world...
If and when HTTP2 becomes a broadly implemented, reliable alternative, we can switch over. I think this proposal gives us the best chance of moving forward with something that works well, at the minor cost of shipping some dead code to the client.
If and when HTTP2 becomes a broadly implemented, reliable alternative, we can switch over. I think this proposal gives us the best chance of moving forward with something that works well, at the minor cost of shipping some dead code to the client.
It sounds like we're debating, but I think in fact we are agreeing. It's a compromise that's important now, which is unfortunate as it involves writing a lot of code and solving part of the packaging problem.
but I think in fact we are agreeing.
I think so too.
unfortunate as it involves writing a lot of code and solving part of the packaging problem.
I don't think it's actually much code to write. On the plugin author's machine, we can use webpack internally. On the server, we literally just have to concat files together. The custom JS loader should be simple to write given that the semver library exists.
The advantage with that is that in this pre-http2 world, server requests are minimized.
And even in a post-http2 world, there still is value in doing what bundling we can at plugin build time to avoid a server hitting the disk to fetch potentially thousands of files, even if the network issues are not as much of a concern.
I like the idea so far. An interesting version of this would be to allow certain dependencies to not be provided at all and fetched from a service like npmcdn with the semver suffixed paths.
Enabling this could be configurable via some trait attribute of the LabApp. It could make sense for widget libraries on public deployments like binder where you don't want to enable a ton of extensions by default.
On the plugin author's machine, we can use webpack internally.
It may be better to use requirejs to convert commonjs dependency directories to amd modules (as an implementation detail). @blink1073 was running into issues before using webpack to expose all of the files in a library.
We provide an npm-installable script for plugin authors which generates a bundle with semver-suffixed AMD module paths.
npmcdn uses the format packagename@semver-range/path/to/file. Is that the path format you are envisioning?
It may be better to use requirejs to convert commonjs dependency directories to amd modules
AFAIU webpack supports generating amd-style bundles, and also understands how to traverse a node_modules hierarchy (I don't believe r.js can do this).
npmcdn uses the format packagename@semver-range/path/to/file. Is that the path format you are envisioning?
Yep. Something of that form which is yet-to-be-determined.
AFAIU webpack supports generating amd-style bundles
Yup, I've done that in production to help migrate a web app _off_ of require.js.
AFAIU webpack supports generating amd-style bundles, and also understands how to traverse a node_modules hierarchy (I don't believe r.js can do this).
AFAIU, the amd-style bundles webpack generates are just top-level package bundles, but it doesn't populate the require namespace with each module in the package (so you could require('phosphor'), but not require('phosphor/lib/ui/widget')). However, since we are already taking over the requirejs loading semantics on the client side, this may be a moot point - if we know how to get out a specific module from a package amd bundle, the problem may be solved. @blink1073 was already experimenting a lot with scripts to capture information about module paths in packages in a usable way.
Another thing with the webpack - we'll have to tell it to bundle nothing, but treat all dependencies as externals.
Point being that I think it's going to be a lot more complicated than just 'use webpack'. But at this point, we're expecting that :).
The nice thing about using r.js to convert a package from commonjs to amd is that it just very nicely transcribes each .js to wrap it in a define (as far as I can tell with a quick experiment), but it doesn't try to do a lot of other magic for bundling that we have to unconfigure.
(To make all of my comments clear, I think the proposal is a sound one, and nicely combines strengths of individual approaches we've experimented with or discussed, without the individual weaknesses that made each approach something we dreaded doing, so +1 on the overall approach.)
Another thing with the webpack - we'll have to tell it to bundle nothing, but treat all dependencies as externals.
This is incorrect. We can bundle _everything_ for each plugin, but only load _one_ of the N copies included in the final application bundle. As a trivial optimization, we can make the _core_ plugin packages externals. This is the critical advantage of deduping on the client-side: it guarantees that the plugin will have all of its dependencies, and won't have to rely on them being available as externals (which would require us to detect that, meaning implementing a dependency solver).
We might be talking past each other, @sccolbert, or perhaps you're thinking of something like the r.js optimizer instead of webpack. IIRC, if a plugin A has dependencies B and C, and we have webpack bundle everything for A, then a require call in A for B is converted to a sync function call to the bundled version of B in A's bundle. You can't intercept that call to interject your one global copy. If you declare B as an external, _then_ it loads B using AMD and _then_ we can interject our own copy. Am I missing something?
I was thinking that if we use webpack, we'll need to bundle each package separately from a generated configuration. Each package will have an externals configuration of _all_ of its runtime dependencies. Each package will also generate a manifest mapping paths to modules in the package, as well as the package version number. Somehow we'll also mangle external paths to include the appropriate semver range.
On the client side, we'll load each module and overload the require loader to be able to fetch the appropriate module, use the manifest to get the appropriate module, and return it.
Please tell me I'm wrong and it doesn't need to be so complicated if we use webpack :).
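For concreteness, a per-package configuration along the lines just described might look like the following (webpack 1-era API; the entry path, output layout, and semver-suffixed path scheme are assumptions for illustration, not a settled design):

```javascript
// Hypothetical per-package webpack config: bundle this package's own
// code, but treat every runtime dependency as an external whose request
// is rewritten to a semver-suffixed path for the client-side loader.
var path = require('path');
var pkg = require('./package.json');

module.exports = {
  entry: './lib/index.js',
  output: {
    path: path.join(__dirname, 'build'),
    filename: pkg.name + '.bundle.js',
    // Emit an AMD define so a custom loader can register the bundle.
    libraryTarget: 'amd',
    library: pkg.name + '@' + pkg.version
  },
  externals: function (context, request, callback) {
    var depName = request.split('/')[0];
    var range = pkg.dependencies && pkg.dependencies[depName];
    if (range) {
      // e.g. 'phosphor/lib/ui/widget' -> 'phosphor@^0.6.0/lib/ui/widget'
      var parts = request.split('/');
      parts[0] = parts[0] + '@' + range;
      return callback(null, 'amd ' + parts.join('/'));
    }
    callback(); // anything else is bundled normally
  }
};
```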
You can't intercept that call to interject your one global copy.
This is exactly what we are proposing to do. And AFAIK is possible using a custom requirejs loader.
This is exactly what we are proposing to do. And AFAIK is possible using a custom requirejs loader.
My point is you can't do that if webpack bundles the packages together into one bundle. It doesn't use require for things inside the bundle.
It doesn't use require for things inside the bundle
We'll figure out a way to make it work :) But at least now our intent is clear.
Edit: this would be the point where I would inject a definitive "yes it can", but I'm not positive. Steve would know better. If we _can't_ make it work with webpack, we'll come up with another bundling solution that has these semantics.
We'll figure out a way to make it work :)
Those are great words we've heard a lot in the last few weeks :).
@blink1073 and I were starting to pursue something like this (sans the versioning, which we were punting on while we were exploring feasibility). We kept fighting webpack to tell it _not_ to bundle so we could get at individual modules - good thing @blink1073 works out :).
On the other hand, the r.js bundler uses require through and through, so it seems like it would be much less of a fight to handle this approach of bundling, but at the same time exposing to the environment. Hence the comment https://github.com/jupyter/jupyterlab/pull/720#issuecomment-241725541
AFAIU the problem with the r.js bundler is that it doesn't understand node_modules semantics. I'd rather not reimplement that functionality if at all possible.
AFAIU the problem with the r.js bundler is that it doesn't understand node_modules semantics. I'd rather not reimplement that functionality if at all possible.
I think the r.js bundler can load commonjs modules in a node_modules directory structure, IIRC: http://requirejs.org/docs/api.html#packages (haven't tried, and haven't really studied those docs too deeply either, though, so take my statement with a pound of salt). I agree I'd rather not reimplement that too.
It does look like the plugin infrastructure of webpack goes pretty deep, so we might get away with writing a plugin or two for webpack.
The r.js bundler can load commonjs modules in a node_modules directory structure, IIRC:
That doesn't seem to indicate understanding of node_modules semantics, which necessitates automatic upward traversal of the directory structure, along with implicit node_modules directories.
It does look like the plugin infrastructure of webpack goes pretty deep, so we might get away with writing a plugin or two for webpack.
That's what my gut is telling me. If the group is in general agreement with the _semantics_ of our outlined approach, we (my team) can start sweating the details.
If the group is in general agreement with the semantics of our outlined approach, we (my team) can start sweating the details.
+1. I think it's a good evolution of where we were headed (by process of elimination of a lot of dead ends that were gradually getting better, and helping us understand the problem :), so it's not controversial to me. It seems like a good next thing to try (and hopefully the last!).
I think it would be great if we could avoid using the r.js bundler, if we can. The more we learn about webpack, the better we can do at figuring out the best way to use it. I do think the semantics of @sccolbert's proposal sound sensible, and it would be really great if we can get away with a jupyterlab-extension webpack plugin that just does the right thing. If we can get as far as telling extension authors:
make an npm package
use this webpack config
🍰

that would go a long way.
I did some more research last night. The client end looks to be fairly straightforward, in that we use our own RequireJS loader (e.g. 'jupyterlab!phosphor^0.6.0/lib/ui/widget'), that uses the semver package to find the right loaded module to hand back.
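A sketch of what such a loader plugin might look like (the plugin name, the id format, and the registeredVersions helper are all assumptions for illustration, not the actual implementation):

```javascript
// Sketch of a 'jupyterlab!' RequireJS loader plugin of the kind
// described above. The id format and registeredVersions() helper
// are hypothetical.

// Split 'phosphor^0.6.0/lib/ui/widget' into name, semver range, path.
function parseId(id) {
  var match = /^([^^~\/]+)([\^~][^\/]+)(\/.*)?$/.exec(id);
  return { name: match[1], range: match[2], path: match[3] || '' };
}

// Guarded so this file is also loadable outside an AMD environment.
if (typeof define === 'function' && define.amd) {
  define('jupyterlab', ['semver'], function (semver) {
    return {
      load: function (id, parentRequire, onload) {
        var req = parseId(id);
        // registeredVersions() (hypothetical) lists the versions of
        // req.name present in the concatenated bundle.
        var best = semver.maxSatisfying(registeredVersions(req.name), req.range);
        // Delegate to the underlying loader with a pinned versioned path.
        parentRequire([req.name + '@' + best + req.path], onload, onload.error);
      }
    };
  });
}
```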
Getting WebPack to cooperate is still going to be tricky. @jasongrout is right in that there is still some friction in getting WebPack to expose all of its internals and behave more like an r.js bundle. I am going to play more with the DLLPlugin and the auto-shim approach used in #720, but it is looking like we will have to write our own WebPack plugin that is as expansive as the DllPlugin itself. The docs on doing so are _okay_.
The DllPlugin exposes all of its internal modules but also calls internal modules explicitly, which means we cannot use it to generate a shim for more than one library.
I think it would be great if we could avoid using the r.js bundler, if we can.
I spent some time playing with it last night, and it easily gets us part of the way where we want to go (expose all of the modules in a package as amd modules in the global amd namespace), but between https://webpack.github.io/docs/comparison.html, and my own very initial reading into how to extend it to provide the versioning capabilities we want, I agree.
I think before, we (or more likely, just I) didn't realize how customizable webpack was, and we didn't dig deep enough and try to write our own extension, instead of cobbling together a solution using existing plugins.
make an npm package
use this webpack config
🍰
Even better I think is @sccolbert's proposal (for the vast majority of use cases, hopefully):
as it abstracts out the implementation detail of webpack, so we can move on to an es6 import system at some point in the future, if it makes sense.
I'm currently working on a WebPack Plugin :crossed_fingers:
Preliminary results:
I modified one of the example Webpack plugins and introspected the compilation object at the emit stage.
Here is the plan based on those results:
- Internal imports are rewritten by webpack as __webpack_require__(<import number>).
- External imports become calls to our custom loader, e.g. require('jupyterlab!phosphor^0.6.0/lib/ui/widget');
- Each module is wrapped in a versioned define, e.g. define('<package>@<version>/lib/application/index', function (require, exports, module) { ... });

@minrk, I'm pretty sure a variant of the above could be used to generate the code for jupyter/notebook; it would use unmodified module paths and not require the RequireJS plugin.
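For anyone curious what tapping the emit stage looks like, here is a minimal skeleton in the webpack-1-era plugin API (the plugin name and the commented-out rewriting step are placeholders, not the actual plugin under development):

```javascript
// Skeleton of a webpack (1.x-era API) plugin that inspects the
// compilation at the emit stage. The name and the rewriting step are
// placeholders, not the actual implementation.
function LabExtensionPlugin() {}

LabExtensionPlugin.prototype.apply = function (compiler) {
  compiler.plugin('emit', function (compilation, callback) {
    compilation.modules.forEach(function (mod) {
      // A real plugin would record each module's resource path here and
      // rewrite its id to the versioned '<package>@<version>/<path>'
      // form, consulting the relevant package.json for the version.
    });
    callback(); // hand control back so webpack emits its assets
  });
};

module.exports = LabExtensionPlugin;
```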
@blink1073 awesome, thanks!
Work is now continuing in a new repo on https://github.com/jupyter/jupyterlab-extension-builder/pull/1
This works great now! Thanks @blink1073!
From an extension builder's point of view, something that is exceptionally valuable is the webpack dev server. I don't understand a fraction of what is required to even know if it will be possible, but having the extensions built from within webpack's tooling might provide that amazing live, in-memory rebuild feature that makes finding one's feet in a new framework significantly easier. I can try something, see it break, try something else, see it break, etc.