Spacy: 💫 Integrate displaCy visualisers with spaCy

Created on 14 May 2017 · 6 comments · Source: explosion/spaCy

I've been thinking a lot about ways to improve displaCy and displaCy ENT, and to make the Jupyter extension more convenient. For many of our users, those demos and libraries have become developer tools, and the current setup is pretty suboptimal for this (using our hacky REST services to generate JSON, serving it, putting together the front-end with JavaScript, etc.).

I originally started rewriting the visualisers in Python to combine the parsing, rendering and serving into one package. But the code turned out to be so lightweight that it can easily be shipped with spaCy itself. I'm currently working on porting it over to the develop branch, to be released with v2.0! 🎉

All of this is still a work in progress, so feedback is appreciated. I'm especially interested in how you are / would like to be using the visualisers as part of your development workflow.

How it's going to work

from spacy import displacy

There are only two functions:

displacy.render: render visualisation and return HTML markup
displacy.serve: render visualisation and serve it locally
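
To make the difference between the two concrete: serve is essentially render plus a tiny local web server. Here's a minimal sketch of that server part using only the standard library's wsgiref module. make_app and serve_html are hypothetical names, and the default port of 5000 is an assumption, not necessarily what displacy.serve will use.

```python
from wsgiref.simple_server import make_server


def make_app(html):
    """Return a WSGI app that answers every request with the given markup."""
    body = html.encode("utf-8")

    def app(environ, start_response):
        headers = [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]

    return app


def serve_html(html, port=5000):
    """Block and serve the markup on localhost until interrupted (Ctrl+C)."""
    httpd = make_server("localhost", port, make_app(html))
    print(f"Serving on http://localhost:{port} ...")
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        httpd.server_close()
```

Keeping the app factory separate from the blocking server loop means the markup can be tested without opening a port.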

| Argument | Type | Description |
| --- | --- | --- |
| docs | list of docs / Doc | Doc(s) to be visualised. |
| style | unicode | 'dep' (dependency visualiser) or 'ent' (NER visualiser) |
| options | dict | Visualiser-specific options, e.g. colors. |
| page | bool | Render markup as full HTML page (default: False). |
| minify | bool | Minify HTML markup (default: True). |
| jupyter | bool | Experimental idea: Use Jupyter's display() to output markup, for easy use in notebooks. |
| port | unicode | Only in serve: Port to serve visualisation. |
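
A rough sketch of what the page and minify flags could do to the generated markup, assuming a simple string template and regex-based whitespace stripping. finalize_markup and PAGE_TEMPLATE are hypothetical names. This naive minification would also collapse whitespace inside text nodes or <pre> blocks, so a real implementation would need to be more careful.

```python
import re

# Hypothetical minimal page wrapper used when page=True
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>displaCy</title>
    </head>
    <body>{content}</body>
</html>"""


def finalize_markup(markup, page=False, minify=True):
    """Optionally wrap markup in a full HTML page, then optionally minify it."""
    if page:
        markup = PAGE_TEMPLATE.format(content=markup)
    if minify:
        # Drop whitespace between tags, then collapse any remaining
        # whitespace runs to a single space.
        markup = re.sub(r">\s+<", "><", markup)
        markup = re.sub(r"\s+", " ", markup).strip()
    return markup
```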

Visualiser-specific options

dep: text/arrow color, background color, font family, plus optional settings for spacings, arrow widths etc. (see here)
ent: entity colors (dict of labels mapped to color values), selection of entity types to highlight (if not set, all entities are rendered)

I'm not sure if there's a need for this from a user's perspective, but if there is, I could also add an option to supply a custom CSS stylesheet. In general, all CSS styles will be inlined, so any markup you export can be used and rendered independently.
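
To illustrate the ent options (colors and a selection of entity types) together with inlined styles, here's a minimal sketch that takes plain (start_char, end_char, label) tuples instead of a Doc. render_ents_sketch, the fallback color and the exact markup are assumptions for illustration only, not the actual displacy output.

```python
import html

DEFAULT_COLOR = "#ddd"  # assumed fallback for labels without a configured color


def render_ents_sketch(text, ents, options=None):
    """Render entity spans as <mark> elements with all CSS inlined.

    `ents` is a list of (start_char, end_char, label) tuples. `options` may
    contain 'colors' (label -> CSS color) and 'ents' (labels to highlight;
    if unset, every entity is rendered).
    """
    options = options or {}
    colors = options.get("colors", {})
    selected = options.get("ents")  # None means "render every entity"
    parts = []
    offset = 0
    for start, end, label in sorted(ents):
        if selected is not None and label not in selected:
            continue  # unselected entities stay plain text
        parts.append(html.escape(text[offset:start]))
        color = colors.get(label, DEFAULT_COLOR)
        parts.append(
            f'<mark style="background: {color}; padding: 0 0.25em">'
            f"{html.escape(text[start:end])} <b>{html.escape(label)}</b></mark>"
        )
        offset = end
    parts.append(html.escape(text[offset:]))
    return '<div style="line-height: 2.5">' + "".join(parts) + "</div>"
```

Because every style sits in a style attribute, the returned string can be pasted anywhere without a separate stylesheet.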

Usage

displacy only performs the rendering on already constructed Doc objects – this means you can construct those however you like, using different models and configurations.

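import io
import spacy
from spacy import displacy

def visualise_a_parse():
    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence.')
    # Starts a local web server and blocks until interrupted
    displacy.serve(doc, style='dep')

def compare_two_models():
    text = u'This is a sentence.'
    nlp_sm = spacy.load('en_core_web_sm')
    nlp_md = spacy.load('en_core_web_md')
    # Pass a list of Docs to visualise both parses together for comparison
    displacy.serve([nlp_sm(text), nlp_md(text)], style='dep')

def write_to_file(output_file):
    nlp = spacy.load('de')
    doc = nlp(u'Adobe-Hack: Facebook warnt betroffene Nutzer')
    # page=True wraps the markup in a complete, standalone HTML page
    html = displacy.render(doc, style='ent', page=True)
    # The with-block makes sure the file is closed after writing
    with io.open(output_file, 'w', encoding='utf-8') as f:
        f.write(html)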

Usage example ideas

Compare a spaCy parse to a manually constructed Doc (e.g. expected output vs. model).
Generate visualisations in Jupyter notebooks without custom extensions.
Use it in a web application to recreate the experience of our demos.
During NER training, create a Doc for a sample text every 1000 iterations and, once training is complete, export an .html file with all NER visualisations (write the output of render(page=True) to a file).
Quickly and dynamically generate parse tree visualisations as SVG graphics to embed them online and in documents. (Pro tip: Use cairosvg on the output of render() to export a PNG or PDF instead.)
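
The training idea could be structured roughly like this. It's a sketch, not spaCy code: train_step and visualise are hypothetical callables you'd supply (e.g. visualise could return displacy.render(nlp(sample_text), style='ent')), and the last iteration is always captured so the report also shows the state on completion.

```python
def collect_snapshots(n_iter, every, train_step, visualise):
    """Run a training loop and keep an HTML snapshot every `every` iterations.

    `train_step(i)` performs one update and `visualise(i)` returns markup.
    Both are hypothetical placeholders for your own training code.
    """
    snapshots = []
    for i in range(1, n_iter + 1):
        train_step(i)
        # Snapshot on every `every`-th iteration and once more at the end
        if i % every == 0 or i == n_iter:
            snapshots.append(f"<h2>Iteration {i}</h2>" + visualise(i))
    return snapshots


def write_report(snapshots, path):
    """Write all snapshots into one standalone HTML page."""
    page = "<!DOCTYPE html><html><body>" + "".join(snapshots) + "</body></html>"
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
```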

enhancement ⚠️ wip 🌙 nightly

Source

ines


All 6 comments

Would love to see this!

This would be very powerful for online, interactive use in Jupyter, similar to how Seaborn and Bokeh work. I've had cases where visualization during development would have been useful, but I had to open the website and copy the text there manually, which is a slow process.

When displacy is imported from within Jupyter, it should be possible to detect this automatically and switch to "Jupyter mode", i.e. without having to specify the jupyter or port params. They'd only be needed if a user calling from within Jupyter wants full HTML markup for use outside of it, e.g. for multi-cell rendering or across multiple notebooks (less common). So you could just call displacy.render(doc) and get the image.

Rendering in Jupyter as SVG / a static asset also keeps the image in the notebook (so someone can still see it in a notebook committed to GitHub or exported as PDF), and of course makes it possible to export the same image for use elsewhere.

kengz on 15 May 2017

This will be a great feature to have! Will it be possible to pass a Span to the displacy visualiser, or does the argument have to be a full Doc object? I.e. can we visualise excerpts from a large doc after we have the full parse, or do we need to create separate Docs for the subsections we want to visualise?

nikeqiang on 15 May 2017

@kengz Thanks! "Auto-detecting" Jupyter is a good idea! The solutions for this all seem kinda hacky (or is there something obvious that I'm missing?)... but then again, as long as we avoid "false positives", there's not much that can go wrong and we can always leave the explicit jupyter=True option if auto-detection fails (and let user set jupyter=False explicitly to disable Jupyter mode).
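
One way to get that behaviour is a three-state flag: None means "auto-detect", while an explicit True or False always wins. In this sketch, resolve_jupyter, render_sketch and the detect callable are hypothetical (detect stands in for whatever auto-detection ends up being used), and the Jupyter branch relies on IPython.display.HTML and display. Unlike a flag computed once at import time, this re-checks on every call.

```python
def resolve_jupyter(jupyter=None, detect=lambda: False):
    """None -> use auto-detection; True/False -> explicit user choice wins."""
    if jupyter is None:
        return bool(detect())
    return bool(jupyter)


def render_sketch(markup, jupyter=None, detect=lambda: False):
    """Display markup inline in Jupyter, or return it as a string otherwise."""
    if resolve_jupyter(jupyter, detect):
        # Imported lazily so IPython is only required in Jupyter mode
        from IPython.display import HTML, display
        display(HTML(markup))
        return None
    return markup
```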

Right now, the "Jupyter mode" looks like this and it's been working pretty well for me so far (and the HTML output is preserved when you export it – just need to test the PDF export, didn't get that to work properly yet).

[Image: displacy_jupyter]

@nikeqiang In theory, yes! We still need to make some fixes to Span to make its interface consistent with Doc (which is how it should be).

The functions converting the input to renderable dicts are very simple and look like this. Essentially, the dependency visualiser can render any iterable of tokens containing a text, head, i, tag_ and dep_. The entity visualiser can render anything with an ents attribute containing entities with a start_char, end_char and label_.
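
For reference, converters along those lines might look like this sketch. parse_deps_sketch, parse_ents_sketch, make_tokens and the output dict layout are hypothetical; the entity converter also assumes a text attribute. They only rely on the attributes listed above, so plain stand-in objects work. Making indices relative to the first token is one way a slice (like the excerpt asked about above) could be rendered on its own.

```python
from types import SimpleNamespace


def parse_deps_sketch(tokens):
    """Convert tokens exposing text, i, head, tag_ and dep_ into words and arcs."""
    tokens = list(tokens)
    offset = tokens[0].i if tokens else 0  # index of the first token in the slice
    end_i = offset + len(tokens)
    words = [{"text": t.text, "tag": t.tag_} for t in tokens]
    arcs = []
    for t in tokens:
        head_i = t.head.i
        # Skip the root (its own head) and heads that fall outside the slice
        if head_i == t.i or not (offset <= head_i < end_i):
            continue
        start, end = sorted((t.i, head_i))
        arcs.append({
            "start": start - offset,
            "end": end - offset,
            "label": t.dep_,
            "dir": "left" if t.i < head_i else "right",
        })
    return {"words": words, "arcs": arcs}


def parse_ents_sketch(doc):
    """Convert anything with .text and .ents (start_char, end_char, label_)."""
    ents = [
        {"start": e.start_char, "end": e.end_char, "label": e.label_}
        for e in doc.ents
    ]
    return {"text": doc.text, "ents": ents}


def make_tokens(words, heads, deps):
    """Build duck-typed stand-in tokens (no spaCy needed) for demonstration."""
    tokens = [
        SimpleNamespace(text=w, i=i, tag_="", dep_=d)
        for i, (w, d) in enumerate(zip(words, deps))
    ]
    for token, head in zip(tokens, heads):
        token.head = tokens[head]
    return tokens


if __name__ == "__main__":
    toks = make_tokens(
        ["This", "is", "a", "sentence", "."],
        [1, 1, 3, 1, 1],
        ["nsubj", "ROOT", "det", "attr", "punct"],
    )
    print(parse_deps_sketch(toks)["arcs"])
    print(parse_deps_sketch(toks[2:4])["arcs"])  # only "a sentence"
```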

ines on 15 May 2017

@ines for detecting jupyter, the key is to detect the IPython kernel:
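
A minimal sketch of such a check (one common approach, not necessarily the snippet posted in this comment), assuming IPython's get_ipython() is available when IPython is installed. It errs on the side of avoiding false positives: notebook kernels run a ZMQInteractiveShell, so anything else, including the terminal IPython shell, counts as "not Jupyter". Environments with a differently named shell class may therefore go undetected.

```python
def is_in_jupyter():
    """Best-effort check for a Jupyter (IPython kernel) environment."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython isn't installed, so this can't be Jupyter
    shell = get_ipython()
    if shell is None:
        return False  # plain Python interpreter, no IPython shell running
    # Jupyter kernels use ZMQInteractiveShell; the terminal IPython REPL
    # uses TerminalInteractiveShell instead.
    return shell.__class__.__name__ == "ZMQInteractiveShell"
```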

So, suppose the above is wrapped into a function is_in_jupyter(); the param setting would then look like:

# constant set when displacy is imported
IS_IN_JUPYTER = is_in_jupyter()

# so, when in jupyter, the default is jupyter=True; otherwise it's False.
def render(doc, style, ..., jupyter=IS_IN_JUPYTER, ...):
    ...