For coreference resolution, an additional piece of software is currently needed. (It would be good to advertise this more: I've been using spaCy for a year now and only just found out about it. I was already thinking of using CoreNLP for this feature.) Are there plans to include this in spaCy's pipeline?
Thanks for preserving spaces, by the way. SPACE IS A WORD TOO. This makes detokenization easy! After switching to Google Cloud NLP for coreference resolution, I had to implement lots of questionable logic and make a lot of assumptions just to detokenize.
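To illustrate why preserved whitespace matters, here is a minimal sketch. It assumes only that spaCy is installed (a blank English pipeline needs no model download). Each token keeps its trailing whitespace in `whitespace_`, so joining `text_with_ws` reproduces the input exactly. The `detokenize` helper and its index-to-text `replacements` mapping are hypothetical illustrations, not part of spaCy's API; they show how a mention could be swapped for its antecedent without guessing where spaces go.

```python
import spacy

# A blank English pipeline contains only the tokenizer, so no model download is needed.
nlp = spacy.blank("en")
text = "Sarah said she would come."
doc = nlp(text)

# Every token stores its own trailing whitespace, so the original text round-trips exactly.
assert "".join(t.text_with_ws for t in doc) == text


def detokenize(doc, replacements):
    """Rebuild the text, optionally swapping token texts by index.

    `replacements` is a hypothetical {token index: new text} mapping.
    The original trailing whitespace of each token is always kept.
    """
    return "".join(replacements.get(t.i, t.text) + t.whitespace_ for t in doc)


# Replace the pronoun "she" (token index 2) with its antecedent.
print(detokenize(doc, {2: "Sarah"}))
```

Because the whitespace travels with each token, no heuristics about punctuation spacing are needed when the text is reassembled.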
PS:
This (a fake news generator) is what I plan to use spaCy for again, if anyone has any nice ideas.
We might add built-in coreference resolution later, but we're definitely encouraging an ecosystem of extension packages to be developed around spaCy. This lets other developers keep credit for their work, and lets new developments avoid worrying so much about backwards compatibility and maintainability.
The core library is still focused on stability and performance at the moment. We're especially working on our infrastructure and automation setups, to make sure we're testing everything reliably and that we can regenerate the whole model automatically, including long-running batch jobs that process lots of raw text.
Looks like @honnibal forgot to actually link the coref library – here it is:
https://github.com/huggingface/neuralcoref ✨
It integrates seamlessly with spaCy and comes with custom models, a pipeline component with extension attributes and training code. (It's also a great example of a spaCy plugin and how we imagine integrations of other libraries to work in the future. The team at @huggingface really did an amazing job here.)