I've been thinking a lot about ways to improve displaCy and displaCy ENT, and make the Jupyter extension more convenient. For many of our users, those demos and libraries have become developer tools, and the current setup is pretty suboptimal for this (using our hacky REST services to generate JSON, serving it, putting together the front-end with JavaScript etc.).
I originally started re-writing the visualisers in Python to combine the parsing, rendering and serving into one package. But the code turned out to be so lightweight that it can easily be shipped with spaCy. I'm currently working on porting it over to the develop branch, to be released with v2.0!
All of this is still a work in progress, so feedback is appreciated. I'm especially interested in how you are / would like to be using the visualisers as part of your development workflow.
from spacy import displacy
There are only two methods:
- displacy.render: render visualisation and return HTML markup
- displacy.serve: render visualisation and serve it locally

| Argument | Type | Description |
| --- | --- | --- |
| docs | list of docs / Doc | Doc(s) to be visualised. |
| style | unicode | 'dep' (dependency visualiser) or 'ent' (NER visualiser) |
| options | dict | Visualiser-specific options, e.g. colors. |
| page | bool | Render markup as full HTML page (default: False). |
| minify | bool | Minify HTML markup (default: True). |
| jupyter | bool | Experimental idea: Use Jupyter's display() to output markup, for easy use in notebooks. |
| port | unicode | Only in serve: Port to serve visualisation. |
- dep: text/arrow color, background color, font family, plus optional settings for spacings, arrow widths etc. (see here)
- ent: entity colors (dict of labels mapped to color values), selection of entity types to highlight (if not set, all entities are rendered)

I'm not sure if there's a need for this from a user's perspective, but if there is, I could also add an option to supply a custom CSS stylesheet. In general, all CSS styles will be inlined, so any markup you export can be used and rendered independently.
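To make the ent options a bit more concrete, here is a minimal, self-contained sketch of how a colors mapping and an entity-type filter could be applied while inlining all CSS. It is not displaCy's implementation: `render_ents`, `DEFAULT_COLOR` and the tuple format used for entities are assumptions made purely for illustration.

```python
from html import escape

DEFAULT_COLOR = "#ddd"  # hypothetical fallback colour for labels without an entry


def render_ents(text, ents, colors=None, show=None):
    """Wrap entities in <mark> tags that carry their own inline styles.

    ents:   list of (start_char, end_char, label) tuples, assumed non-overlapping
    colors: dict of labels mapped to CSS colour values
    show:   labels to highlight; None means every entity is rendered
    """
    colors = colors or {}
    parts, offset = [], 0
    for start, end, label in sorted(ents):
        if show is not None and label not in show:
            continue  # filtered-out entities stay plain text
        parts.append(escape(text[offset:start]))
        style = "background: {}; padding: 0.2em".format(
            colors.get(label, DEFAULT_COLOR))
        parts.append('<mark style="{}">{} <b>{}</b></mark>'.format(
            style, escape(text[start:end]), escape(label)))
        offset = end
    parts.append(escape(text[offset:]))
    return "".join(parts)


text = "Adobe-Hack: Facebook warnt betroffene Nutzer"
ents = [(0, 5, "ORG"), (12, 20, "ORG")]  # hand-written, not model output
markup = render_ents(text, ents, colors={"ORG": "#7aecec"}, show=["ORG"])
```

Because every style lives on the element itself, the resulting markup can be pasted into any page without a stylesheet, which is the property described above.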
displacy only performs the rendering on already constructed Doc objects, which means you can construct those however you like, using different models and configurations.
import spacy
from spacy import displacy


def visualise_a_parse():
    # Parse a sentence and serve the dependency visualisation locally
    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence.')
    displacy.serve(doc, style='dep')


def compare_two_models():
    # Render the same text as parsed by two different models
    text = u'This is a sentence.'
    nlp_sm = spacy.load('en_core_web_sm')
    nlp_md = spacy.load('en_core_web_md')
    displacy.serve([nlp_sm(text), nlp_md(text)], style='dep')


def write_to_file(output_file):
    # output_file is expected to be a pathlib.Path
    nlp = spacy.load('de')
    doc = nlp(u'Adobe-Hack: Facebook warnt betroffene Nutzer')
    html = displacy.render(doc, style='ent', page=True)
    with output_file.open('w', encoding='utf-8') as f:
        f.write(html)
- Doc (e.g. expected output vs. model).
- Doc for a sample text on every 1000th iteration during NER training and on completion, export an .html file with all NER visualisations (write output of render(page=True) to a file).
- cairosvg on the output of render() to export a PNG or PDF instead.)

Would love to see this!
This would be very powerful for online interactive use in Jupyter, similar to how Seaborn and Bokeh function. I've had cases where visualization during development is useful, but had to open up the website and copy the text there manually, which is a slow process.
When imported from within Jupyter, it's likely possible to auto-detect this and switch to "Jupyter mode" without a manual param, i.e. no need to specify the jupyter or port params, unless the user wants to render full HTML markup outside of Jupyter while calling from within Jupyter; that is useful for multi-cell or multi-notebook rendering (more uncommon). So, call displacy.render(doc) and get the image.
Rendering in Jupyter as SVG / static asset also allows for keeping the image with the notebook (so someone could see it in a committed notebook on GitHub or in a notebook PDF), and of course gives the ability to export that same image elsewhere.
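As a rough sketch of the "image stays with the notebook" idea: Jupyter's rich display protocol looks for a `_repr_svg_` method on the last expression of a cell and stores its return value in the cell output. The `SvgVisualisation` wrapper and its `save` method below are hypothetical names for illustration, not part of displaCy.

```python
class SvgVisualisation(object):
    """Hypothetical wrapper around already rendered SVG markup."""

    def __init__(self, svg_markup):
        self.svg_markup = svg_markup

    def _repr_svg_(self):
        # Jupyter calls this hook when the object is displayed in a cell,
        # and saves the returned SVG as part of the notebook output.
        return self.svg_markup

    def save(self, path):
        # Export the same image for use outside the notebook.
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.svg_markup)


# Placeholder SVG standing in for real visualiser output
vis = SvgVisualisation(
    '<svg xmlns="http://www.w3.org/2000/svg" width="160" height="20">'
    '<text y="15">This is a sentence.</text></svg>'
)
```

Leaving `vis` as the last line of a notebook cell would then show the image inline, while `vis.save(...)` writes the identical markup to disk.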
This will be a great feature to have! Will it be possible to pass a span to the displacy visualiser, or does the argument have to be the full doc object? I.e. can we visualise excerpts from a large doc after we have the full parse, or do we need to create separate docs of the subsections we want to visualise?
@kengz Thanks! "Auto-detecting" Jupyter is a good idea! The solutions for this all seem kinda hacky (or is there something obvious that I'm missing?)... but then again, as long as we avoid "false positives", there's not much that can go wrong, and we can always leave the explicit jupyter=True option if auto-detection fails (and let the user set jupyter=False explicitly to disable Jupyter mode).
Right now, the "Jupyter mode" looks like this and it's been working pretty well for me so far (and the HTML output is preserved when you export it; I just need to test the PDF export, didn't get that to work properly yet).

@nikeqiang In theory, yes! We still need to make some fixes to Span to make its interface consistent with Doc (which is how it should be).
The functions converting the input to renderable dicts are very simple and look like this. Essentially, the dependency visualiser can render any iterable of tokens containing a text, head, i, tag_ and dep_. The entity visualiser can render anything with an ents attribute containing entities with a start_char, end_char and label_.
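As a rough illustration of the conversion described above, the following sketch turns token-like and entity-like objects into plain dicts. The helper names (`to_dep_dict`, `to_ent_dict`), the stand-in classes (`FakeToken`, `FakeEnt`) and the exact dict layout are assumptions for illustration; the actual functions in spaCy may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for an entity span with the three required attributes
FakeEnt = namedtuple("FakeEnt", ["start_char", "end_char", "label_"])


class FakeToken(object):
    """Hypothetical stand-in exposing text, head, i, tag_ and dep_."""

    def __init__(self, i, text, tag_, dep_):
        self.i, self.text, self.tag_, self.dep_ = i, text, tag_, dep_
        self.head = self  # a root token is its own head


def to_dep_dict(tokens):
    """Convert an iterable of tokens into words plus labelled arcs."""
    words = [{"text": t.text, "tag": t.tag_} for t in tokens]
    arcs = []
    for t in tokens:
        if t.head.i == t.i:
            continue  # the root has no incoming arc
        start, end = sorted((t.i, t.head.i))
        direction = "left" if t.i < t.head.i else "right"
        arcs.append({"start": start, "end": end,
                     "label": t.dep_, "dir": direction})
    return {"words": words, "arcs": arcs}


def to_ent_dict(text, ents):
    """Convert a text and its entities into character-offset dicts."""
    return {"text": text,
            "ents": [{"start": e.start_char, "end": e.end_char,
                      "label": e.label_} for e in ents]}


# Hand-written analysis of the example sentence (not model output):
# (text, tag, dep, index of head)
rows = [("This", "DT", "nsubj", 1), ("is", "VBZ", "ROOT", 1),
        ("a", "DT", "det", 3), ("sentence", "NN", "attr", 1),
        (".", ".", "punct", 1)]
tokens = [FakeToken(i, text, tag, dep)
          for i, (text, tag, dep, _) in enumerate(rows)]
for token, row in zip(tokens, rows):
    token.head = tokens[row[3]]

parsed = to_dep_dict(tokens)
entities = to_ent_dict("Facebook warnt Nutzer", [FakeEnt(0, 8, "ORG")])
```

Since only attribute access is used, anything that quacks like a token or an entity can be passed in, which is what would make rendering a Span (or a hand-built analysis) possible in principle.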
@ines for detecting jupyter, the key is to detect the IPython kernel:
So, suppose the above is wrapped into a function is_in_jupyter(); the param setting would then look like:
# constant set when displacy is imported
IS_IN_JUPYTER = is_in_jupyter()

# so, when in jupyter, the default is jupyter=True; otherwise it's False.
def render(doc, style, ..., jupyter=IS_IN_JUPYTER, ...):
    ...
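A minimal runnable sketch of this idea, assuming the common trick of checking for IPython's `get_ipython()` and the `ZMQInteractiveShell` class used by notebook kernels. This is one possible heuristic rather than the detection code from the comment above, and the simplified `render` signature is a hypothetical stand-in that only shows how the default is resolved.

```python
def is_in_jupyter():
    """Best-effort check for a Jupyter (IPython kernel) session."""
    try:
        # get_ipython is injected into builtins by IPython;
        # in a plain Python process the name does not exist.
        shell = get_ipython()
    except NameError:
        return False
    # Notebook and qtconsole kernels use ZMQInteractiveShell; the terminal
    # `ipython` shell uses TerminalInteractiveShell and is not Jupyter.
    return shell.__class__.__name__ == "ZMQInteractiveShell"


# Constant evaluated once, at import time
IS_IN_JUPYTER = is_in_jupyter()


def render(docs, style="dep", jupyter=IS_IN_JUPYTER):
    """Hypothetical stand-in: reports which output mode would be used."""
    return "jupyter" if jupyter else "markup"
```

Passing `jupyter=True` or `jupyter=False` explicitly still overrides the detected default, so a false negative (or positive) from the heuristic is easy to work around.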
If these are not sufficient, i.e. if more logic needs to be executed for Jupyter (which I doubt given what you already have), use a Jupyter magic command. This would need an extra setup step, though. Just in case, here's an example: %matplotlib inline, commonly used to get matplotlib to render inside a notebook cell.
See the v2.0.0 alpha release notes and #1105