Hi @rongmu! Our syntax highlighting is provided by the CodeMirror editor. This issue would be better suited there: https://github.com/codemirror/CodeMirror/issues
@blink1073 Thank you! I will file an issue there!
Unfortunately, the maintainer of CodeMirror says that they won't add any more language modes to the main distribution. The maintainer suggests that new language modes should be added as separate modules.
The issue in CodeMirror repo: https://github.com/codemirror/CodeMirror/issues/5623
Stan is really important for Bayesian data analysis, it has interfaces for Python, R, Matlab etc.; RStudio supports Stan syntax out-of-the-box.
It would be really great if JupyterLab supports Stan syntax.
Ah, understood. I marked it as help wanted since the core team is saturated with other tasks.
Yeah. So this raises again the need for a jupyter/syntax repo. Previous
considered entrants include graphviz and robotframework (the existing one
is long in the tooth), so maybe three is enough. Would this require a JEP,
or just someone with perms?
Ideally, this would initially consist of codemirror modes, packaged and
usable without lab, and labextensions that rely on them. We could also
incubate monaco and pygments modes there if it comes to it, though those
two communities are accepting new modes.
A related tool could be a "mimerenderer" for a JSON version of:
https://codemirror.net/demo/simplemode.html
Unfortunately it's not easy to encode regex flags in json, but perhaps even
that could be overcome with a bit of preprocessing. Anyhow, this could be a
way to make syntax more user serviceable.
On Wed, Oct 17, 2018, 21:21 Steven Silvester notifications@github.com
wrote:
Ah, understood. I marked it as help wanted since the core team is
saturated with other tasks.—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/jupyterlab/jupyterlab/issues/5504#issuecomment-430842831,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AACxRNUq4Th_kR0pqwmm29I22rN0dQFUks5ul9eOgaJpZM4Xj2QX
.
Ah, this looks very promising as tool, by the creator of codemirror:
>
https://github.com/codemirror/grammar-mode
It's set up for building js files from it's bnf-like syntax, but could be
made into something dynamic.... We'd need more metadata, at any rate.
Practically, robust languages like Stan includes a bnf description, so this
could be a good match.
There's also this which supports many more features, but looks more
complicated and less maintained:
https://github.com/foo123/codemirror-grammar/blob/master/README.md
At any rate, it seems this is a possible low-barrier (e.g no ts/webpack)
entry point for language users to contribute to the lab ecosystem!
Re: the above: there's a big stack of examples in:
https://github.com/codemirror/google-modes
Not quite as spare as I was hoping to get everything, but probably still a
good way to go.
Oh, and i'd also like to point out the current work on codemirror 6:
There's a lot to love from a lab perspective: it's typescript-native,
tree-shakeable, dual-licenced gpl/mit, mobile-enabled and has accessibility
as a high-level goal, and still fast like v5.
While other efforts (e.g monaco) present many interesting opportunities, CM
will likely continue to be important to our work.
While the first CM6 demos are up, they're asking for donations! As a
community, we've gotten so much value from marijn and the CM community at
large, and I encourage those who have the means to pitch in: I intended to!
But anybody can star/fork the repo:
Is trying to submit a PR to Codemirror still the best way to add a language's syntax highlighting for support in Jupyterlab? As mentioned above, it appears that no new language modes will be considered for inclusion in CodeMirror. And it was suggested to distribute a new language's mode as a separate package/module. Then how would I distribute that to Jupyterlab users? Would I need to create a jupyterlab extension in order for users to get the syntax highlighting?
For now the best way would be as a third party extension, perhaps with the mode itself in its own package to make it most useful to other codemirror consumers.
Yes, i think the best way to proceed is to write a codemirror mode in a JLab extension, as what I have done for sos.
I figured out how to put the codemirror mode in its own NPM package, but I'm struggling to figure out the lab extension side.
@BoPeng I've been looking through your code, but it's quite complex since you're working with so many languages.
It feels like I should need a total of like 5 lines of code in the labextension...:
import * as CodeMirror from 'codemirror';
import 'codemirror-mode-stata';
if kernel name == 'stata' {
change mode to 'stata'
}
Any advice?
You can create an extension using the jupyterlab extension cookiecutter following one of the examples and put your code in there. The code should have something like this to define the mode. Once the mode is loaded and listed as one of the JLab codemirror modes, you can associate .stan file type and the stan kernel with the mode.
Ok, thanks, I'll look into it. For the record, I'm working to add Stata language support, not Stan. I piggybacked this issue because it seemed like a good place for a general discussion of how to add syntax highlighting to Jupyter lab for new modes.
I see, you are the author of the Stata kernel, right? I have never used Stata but as the author of SoS, I am very interested in making Stata work with R and SAS in the same notebook. Do you know any known mechanism for Stata to import/export datasets in Python (DataFrame) or SAS?
Thanks. I have created a repo for Stata support for SoS and invited you as a collaborator in case I need your help on the use of the Stata kernel. :smile:
@BoPeng Feel free to ping me with questions with regards to SoS, but I really don't have time to invest in another project right now.
You can create an extension using the jupyterlab extension cookiecutter following one of the examples and put your code in there.
I have a repo started from the JS cookiecutter. I don't want to deal with TS right now.
The code should have something like this to define the mode.
That's included in the mode's NPM package. I assume when I run import 'codemirror-mode-stata' it runs those lines.
Once the mode is loaded and listed as one of the JLab codemirror modes
The mode should be loaded from import 'codemirror-mode-stata', but how do you list it as one of the Jupyterlab codemirror modes? Is that this? That didn't work.
you can associate
.stanfile type
I do that: https://github.com/kylebarron/jupyterlab-stata-highlight/blob/49453d54e6c0efb4b8626bb513504e0be2b8b245/lib/plugin.js#L14-L21
and the stan kernel with the mode.
I did that as well: https://github.com/kylebarron/stata_kernel/blob/0e4dffaad88c5d802fff5fca0ffe6b38dc916bf8/stata_kernel/kernel.py#L30
Is the codemirror mode not being loaded correctly?
Thanks. The extension should be easy to write after I get a trial version of Stata installed. I invited you to the project just to make pinging you a bit easier.
Your problem is most likely caused by the extension. You should check if the extension is correctly installed (jupyter labextension list), loaded (use a console.log line when package is loaded), and executed (set a break point in the browser). If all goes well, you should see the stata mode listed under View -> Text Editor Syntax Highlighting.
It appears that I need to do something else in order for Jupyter to see the third-party mode:

You also need to make sure that your extension depends on @jupyterlab/codemirror, not on codemirror because in my experience that will add your mode to your own instance of codemirror, not JupyterLab's codemirror. You might have to clean up jlab or your node_modules if that is already the case. I had a lot of confusion and trouble with the npm caches for this.
Your problem is most likely caused by the extension. You should check if the extension is correctly installed (
jupyter labextension list), loaded (use aconsole.logline when package is loaded),
It is loaded:

and executed (set a break point in the browser).
Sorry, I don't know how to do this. I was able to set a breakpoint in the browser console, but then I don't know how that helps me check that it's executed correctly.
If all goes well, you should see the stata mode listed under
View -> Text Editor Syntax Highlighting.
I still don't see that.
You also need to make sure that your extension depends on
@jupyterlab/codemirror, not oncodemirrorbecause in my experience that will add your mode to your own instance of codemirror, not JupyterLab's codemirror. You might have to clean up jlab or yournode_modulesif that is already the case. I had a lot of confusion and trouble with the npm caches for this.
It does use @jupyterlab/codemirror: https://github.com/kylebarron/jupyterlab-stata-highlight/blob/49453d54e6c0efb4b8626bb513504e0be2b8b245/package.json#L20
It appears it's loaded to the document registry in the console:

Well, I do not know if this line does its work as good as this line and this. I am not proficient in JS to identify the problem by checking your code so I would suggest that you copy/paste code from your npm package, make it work and then split.
To debug, open inspector, source, open file and find your source file, this should work at least for chrome.
DocumentRegistry is populated correctly does not mean the mode has been correctly inserted into codemirror.
You're right. The part that's not working is the import 'codemirror-mode-stata';. When I paste the contents of codemirror-mode-stata into plugin.js, it works correctly.
Great. Now you only need to figure out how to organize your npm package to work with your JLab extension. Frankly I do not know how to do this so I have the same code in both sos-notebook (for classic jupyter) and jupyterlab-sos (for JupyterLab). ðŸ˜
Well I gave up and copy-pasted the codemirror mode definition into the jupyterlab extension. In any case, it's functional, so thanks a lot and hopefully someone can use my example to help with Stan syntax highlighting.
That is unfortunate because I was hoping that you could find some way to do it. :smile:
@blink1073 Do you have any idea on how to use a npm defined codemirror mode in a JupyterLab extension? More specifically, @kylebarron used
import 'codemirror-mode-stata';
in his extension for Stata but it did not work, so he had to copy/paste the content of the mode definition into the extension.
I tested setting @jupyterlab/codemirror as the peerDependency in the NPM package's package.json, but that didn't work either.
I found that having these lines were essential: https://github.com/kylebarron/jupyterlab-stata-highlight/blob/86e6990a75f0305daf1a5d93992a4361f1b551b5/lib/stata.js#L496-L504, but that including those lines at the end of the file in the NPM package raised an error, like "can't push on an undefined object". Maybe Jupyterlab is using a different version of Codemirror?
@kylebarron, those lines require codemirror/mode/meta. Does it work if you import the mode from codemirror-mode-stata, require codemirror/mode/meta in your plugin, and then define the metadata?
I was never able to get it to work, even playing around with import codemirror/mode/meta
So far, this sounds awful. Is there another issue here dedicated to adding syntax highlighting for languages not currently in codemirror, or are the valiant efforts described here (which it seems like haven't worked yet) where we're at? FYI, I got here looking for Stan language support like @rongmu.
I kinda co-opted this issue to be a general discussion of adding syntax highlighting for languages not in codemirror.
It's working for me, but only with code duplication, where I include the codemirror language definition inside the Jupyterlab extension package itself.
I know we are pretty far off the original stan issue, but here is a rough cut at something suggested:
https://github.com/deathbeds/jupyterlab-simple-syntax
The basic idea is a mime "renderer" that dynamically makes a CM mode from simplemode json. It's not ready for primetime, but _could_ be adapted to be useful for a kernel to make something interesting happen without touching typescript. I haven't written a simple mode for Stan, though, so no closer to a solution for this issue. :crying_cat_face:
However, in stumbling around, I did find this chain:
This is probably _far_ more useful than inventing some new format, and probably worth investigating further, as we would inherit a decade or so worth of syntax highlighting knowledge. That being said, I really like simplemode, as it supports nice things like embedded other modes just by knowing their names. So there might be a place for both, potentially in the same (eventually in core) extension.
I hadn't seen codemirror-atom-modes, and it's a couple years old, so it would need to be tested, but if it worked, that would be a really great solution!
Well, it does appear the textmate route works out rather well:

so it would need to be tested
I wouldn't call it Old and Busted: however, the dependency on
https://github.com/bcoe/onigurumajs
...for the much better regexen it requires is a bit creaky, as it bundles in pretty heavy (no stats handy, sorry). There do appear to be wasm-based attempts at porting it.
Well I'm excited to hear your progress. My Atom syntax highlighting (for Stata) is a lot better than the simple codemirror mode I put together, so I'd love to use that too.
I added an example of getting stata thing, dumping it to JSON: there's definitely some wrongness:
which is handled just fine by the same code loaded from CSON into atom. So not really sure where this puts us. We'd probably need to get a look at getting https://github.com/NeekSandhu/onigasm into the mix (used by theia).
@bollwyvl - thanks for pursuing this. Those results look really nice!
I also just stumbled across https://github.com/neeksandhu/codemirror-textmate
I implemented pygments syntax-highlighting for SoS because jupyter uses it for static HTML output, then I went a long way to implement a codemirror mode for editing SoS code in Jupyter, still I found that I need to implement a text-mate version for the notebook to be displayed in github (https://github.com/vatlab/sos-notebook/issues/141), and I really did not have the energy for it. A quick glance at the codemirror-textmate project shows that it goes from textmate to codemirror, not the other way around, so I am still stuck. ðŸ˜
not the other way around
A textmate grammar is a regexp-based grammar, right? In general, a codemirror mode involves arbitrary code. I think it's probably impossible to convert an arbitrary codemirror mode to a textmate grammar.
it's probably impossible to convert an arbitrary codemirror mode to a textmate grammar.
Yes, that is exactly the problem. The SoS codemirror mode is quite involved as it uses other codemirror modes to highlight code in other languages, for example, use the R mode to highlight the R_code in a R block.
R:
R_code
I do not think textmate can do this kind of thing and it is one of the reasons why I have not done it.
Textmate can. In Atom, you'd just do include: 'source.r', to include all the R rules. That's what the markdown grammar does to include all the syntax highlighting of embedded code blocks.
See here for the Textmate specification: https://macromates.com/manual/en/language_grammars
@kylebarron That is indeed good to know. Can it also overlay modes as codemirror does? By overlay I mean highlighting the same code by two modes, for example highlighting by R and also by another mode that identifies and highlights expressions in python f-string in the following example. The tricky part is that the second mode in this example is triggered by the expand=True option so I thought that a regular expression based solution will not cut it.

But even without overlay the use of multiple modes might be good enough. I will have a look at the spec you sent when I get a chance.
There's an order of regex rules. You can set it to first try all the R rules and apply more rules if something isn't caught, or vice versa.
Well, I could certainly see multiple schema-based modes working together.
Perhaps a cm simple mode that is just a wrapper for other modes, some stock
cm, some from textmate grammars.
Another thing I've been wondering about... Writing regexen in JSON is
really a special place in hell. It would be interesting to have something
like...
https://github.com/VerbalExpressions/JSVerbalExpressions
...represented as validateable JSON. I've recently been working with some
really dynamic syntaxes, and it would be great to be able to support live
hacking what you see when you start making changes in a less arcane format.
indeed, codemirror-textmate and onigasm FTW. That fixed up the Stata syntax highlighting. Same binder as above:

Without reaching _way down_ into the guts of Lab's webpack (One Does Not Simply Add A Loader :ok_hand: ), in order to get the onigasm.wasm into the browser I had to add a serverextension that, _gasp_, monkeypatches mimetype on the python side so that tornado will serve the correct mime type. And They Call It a MIME!
The wasm itself is a "reasonable" 0.5mb, and is loaded/cached/parsed lazily, so doesn't _currently_ entail a startup hit... but could if it was wired up for real. Not sure how deeply I can hack the loading stuff.
This is actually A Good Thing, as no doubt this mime-type based syntax highlighting is not the end-all-be-all approach, and syntaxes will want to be installed and served statically. In fact, codemirror-textmate does not really permit re-loading syntax anyway. Also, we're gonna need more wasm, and Never Want To WebPack It.
Another thing: we really, really can't support CSON. I don't want to talk about it.
Most recent push on the demo adds a full serverextension (api, schemas for: tmlanguage.json, simplemode, and a rudimentary manifest) and a full labextension (manager, also nothing configurable yet). It also includes stan (!), stata (and therefore sql) as examples.
Note that the binder example notebooks are now kinda busted, because you can only register a tmlanguage once... and they'll already be registered! Still good for getting the example files, i guess! Probably still worth having the renderer.
This is pretty much working at this point: it allows purely static, validateable files, served/modifiable at runtime (maybe with some cache busting) from an nbextension-style search path, to provide new syntax highlighting for the two schema (well, almost) and could be extended for other JSON-native (or presumably other) formats.
here's the workflow
labextension install and serverextension install (or the package manager would do the equivalent)Mode.ensurejupyter_path('syntax') ({$PREFIX/share/jupyter,$HOME/.local/.jupyter/$THE_LIB}/syntax)modeInfo and makes IFileTypesensured, the labextension first checks its modesrequire error)The workflow for adding new modes:
$PREFIX/share/jupyter/syntax/my-lang and praylanguage-* repo and the mimetype(s) you want to register.The only upstream change required: in @jupyterlab/codemirror/lib/mode, adding a Mode.registerEnsurer would remove the need for the monkeypatch, and still let modes be lazily loaded.
Most helpful comment
Well, it does appear the textmate route works out rather well:
