Hi,
I am a PhD student, and I am considering including a comparison of my method for customized entity recognition with spaCy's NER. Could you provide a description, or point me to a paper, of the approach behind the NER implementation?
Thank you in advance,
Luiza
Hi Luiza,
There's no paper published, as the algorithm wasn't designed for an academic contribution, and details are subject to change. The overall approach is quite similar to the paper by Strubell et al. (2017): https://arxiv.org/abs/1702.02098 . The main differences are that we use a different embedding method, a transition-based framework to facilitate imitation learning, and convolutional layers with residual connections instead of dilation.
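To make the architectural difference concrete, here is a toy numpy sketch (not spaCy's actual implementation, which uses thinc) contrasting a residual convolutional block with a dilated one in the style of Strubell et al.:

```python
import numpy as np

def conv1d(x, W, dilation=1):
    """Same-padded 1D convolution over a (seq_len, width) sequence.
    W has shape (window, width, width); window is assumed odd."""
    window = W.shape[0]
    half = (window // 2) * dilation
    padded = np.pad(x, ((half, half), (0, 0)))
    out = np.zeros_like(x)
    for k in range(window):
        offset = k * dilation  # dilation spaces out the taps of the filter
        out += padded[offset:offset + len(x)] @ W[k]
    return out

def residual_block(x, W):
    # Residual connection: output = x + f(x); the shortcut lets depth
    # grow without widening each layer's receptive-field step.
    return x + np.maximum(conv1d(x, W), 0.0)

def dilated_block(x, W, dilation):
    # Dilated alternative: a wider receptive field per layer,
    # but no shortcut around the non-linearity.
    return np.maximum(conv1d(x, W, dilation=dilation), 0.0)
```

Both blocks see context beyond the current token; the residual variant stacks plain convolutions and adds the input back, while the dilated variant widens each layer instead.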
I think it's probably better to compare against Strubell's work. This will allow a cleaner comparison, as the system in that paper was designed to have fewer moving parts, to make it more obvious what matters and what doesn't. With spaCy, I designed for a mix of performance and usability concerns.
Edit: Actually I typed all that and realised I'm probably missing the point. If you're comparing on a non-English corpus, and need a tool you can actually run as a baseline, then yes it makes sense to compare against spaCy. This video is currently the best description of the NER: https://www.youtube.com/watch?v=sqDHBH9IjRU
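If spaCy is run as the baseline, the systems still need a common metric. A minimal entity-level F1 (a hypothetical helper, not part of spaCy; spaCy's predicted spans are available via `doc.ents` as `ent.start_char`, `ent.end_char`, `ent.label_`) could look like:

```python
def entity_f1(gold, pred):
    """Entity-level precision/recall/F1: an entity counts as correct only
    when its span and label both match exactly (CoNLL-style scoring).
    `gold` and `pred` are lists of per-document sets of
    (start, end, label) tuples."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Scoring both systems' output with the same exact-match function keeps the comparison fair even when the tools tokenize differently, since the spans are character offsets rather than token indices.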
Hi,
I understand that one might need to compromise between speed and accuracy. I will then choose Strubell's work as the baseline, but thank you a lot for the quick reply. I hope this can also be helpful for other researchers, since it gives some information about the NER implementation :)