This library is incredible. Do you have plans to introduce a co-referencing system?
I'm aware that coreference resolution is still an unsolved research problem, but the CoreNLP implementation is still useful for some folks :)
Coref is pretty difficult. A big problem is that the current evaluations are hard to interpret, which makes it difficult to figure out which aspects of system design actually matter. I find the current literature quite confusing.
My plan is to drive the named entity technologies through named entity linking — resolving entity mentions to a knowledge base, e.g. DBpedia IDs. I think this captures the most useful types of coreference resolution, where an entity is in the chain of reference. In these cases the semantics are usually relatively simple.
If an entity isn't involved, it's all very difficult. An arbitrary example: I clicked a random link near the top of Hacker News, and landed on this: http://www.nytimes.com/2015/08/16/opinion/sunday/oliver-sacks-sabbath.html . Consider the first paragraph:
MY mother and her 17 brothers and sisters had an Orthodox upbringing — all photographs of their father show him wearing a yarmulke, and I was told that he woke up if it fell off during the night.
Are {the yarmulke, it} coreferent? Probably it's not the self-same yarmulke throughout his father's life. There are good, internally consistent theories of how to answer all these difficult questions...but the theories are complicated. And the complexity is necessary. So the best simplification I can see is to just deal with references to named entities.
This feature is a bit down the queue, though. At the moment I'm working on re-launching the docs site, and launching features that promote user customization of spaCy. After that, I need to get spaCy working on more languages.
Yeah, that makes sense. I guess what you're working on right now is more important.
Named entity linking with DBPedia / freebase IDs will be great to see, but I agree cross language support sounds like higher priority.
Entity disambiguation is less interesting to me than correct "type": person, place, or thing. Disambiguation is a harder problem!
I think you really need an ontology to do the typing — and at that point you're pretty much doing disambiguation already.
An NER system uses three types of information. They're mostly independent:

1. The words of the entity itself
2. The surrounding context
3. World knowledge
There's only so much 1 and 2 can do for you, in terms of figuring out whether the entity is a product, organization, location, person etc. Sometimes you'll get lucky on 1, and some entities are easy via 2. But you'll always have contexts where you can't tell if you don't know — so you need 3.
Assigning a type to the entity is very easy once you've associated it with the knowledge base. And you can get very rich typing that way, with ~100 or more classes.
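As a rough sketch of what typing-via-linking looks like (the knowledge base here is a toy in-memory dict standing in for something like DBpedia, and the mention-to-ID mapping is assumed to come from an entity linker; none of this is real spaCy or DBpedia API):

```python
# Toy KB: once a mention is linked to a KB ID, typing is just a lookup.
KB = {
    "Q90":  {"name": "Paris", "type": "location"},
    "Q312": {"name": "Apple Inc.", "type": "organization"},
    "Q937": {"name": "Albert Einstein", "type": "person"},
}

# Hypothetical (mention -> KB id) output from an entity linker.
links = {"Paris": "Q90", "Apple": "Q312", "Einstein": "Q937"}

def entity_type(mention):
    """Return the KB type for a linked mention, or None if unlinked."""
    kb_id = links.get(mention)
    return KB[kb_id]["type"] if kb_id else None

print(entity_type("Paris"))     # location
print(entity_type("Einstein"))  # person
```

The point is that the hard work is in producing `links`; once that exists, arbitrarily fine-grained typing falls out of the KB for free.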
Now that DeepMind has won a game of Go, any news on a coreference system?
@honnibal some time has passed since your last comment. Has your attitude about coreference resolution changed, and if not, how would you recommend coreference resolution be done in Python?
You should read the recent research in the ACL Anthology and implement a system that looks like it offers a good accuracy/difficulty trade-off.
Hey @honnibal, is this still in "build it if you want it" territory?
I'm in the market for some simple coref, and was thinking about a bit of pragmatic hacking around the Stanford sieve ideas:
https://nlp.stanford.edu/pubs/conllst2011-coref.pdf
There's probably some really fancy neural stuff one can do these days, but starting with names and pronouns seems pretty sensible (staying well away from nominal mentions for now).
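A minimal sketch of the multi-pass sieve idea (my own toy reduction of the Stanford paper, not their implementation): high-precision passes run first, and each mention links to the nearest antecedent the current pass licenses. Mentions here are plain dicts; gender comes from a hand-written pronoun table.

```python
# Toy two-pass sieve over pre-extracted mentions.
PRONOUNS = {"he": "male", "him": "male", "she": "female", "her": "female"}

def sieve(mentions):
    """Assign each mention a cluster id; return the list of cluster ids."""
    clusters = list(range(len(mentions)))  # start with singletons

    # Pass 1: exact (case-insensitive) string match -- highest precision.
    for i, m in enumerate(mentions):
        for j in range(i - 1, -1, -1):
            if mentions[j]["text"].lower() == m["text"].lower():
                clusters[i] = clusters[j]
                break

    # Pass 2: pronouns link to the nearest gender-compatible antecedent.
    for i, m in enumerate(mentions):
        gender = PRONOUNS.get(m["text"].lower())
        if gender is None:
            continue
        for j in range(i - 1, -1, -1):
            if mentions[j].get("gender") == gender:
                clusters[i] = clusters[j]
                break
    return clusters

mentions = [
    {"text": "John", "gender": "male"},
    {"text": "Mary", "gender": "female"},
    {"text": "he"},
    {"text": "John", "gender": "male"},
]
print(sieve(mentions))  # [0, 1, 0, 0]
```

The real system has many more passes (head match, acronyms, appositives, etc.), but the structure is exactly this: ordered deterministic rules, each only allowed to merge clusters the earlier rules left apart.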
https://github.com/huggingface/neuralcoref
An implementation of coreference resolution on top of spaCy. Hope it's useful.
This looks great @DeepthiKarnam, I think I also saw the medium post. We ended up adapting the ideas in the sieve paper, but using lots of hashing; otherwise we hit some nasty performance issues when we directly implemented the _search for antecedent_ loop from the paper.
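A sketch of that hashing trick (my own illustration, not the poster's actual code): instead of scanning backwards over every prior mention for each new one, index antecedent candidates by a cheap key such as the lowercased mention text, so each lookup is a dict hit rather than a quadratic scan.

```python
def cluster_by_hash(mentions):
    """Cluster mentions by exact text match using a hash index,
    avoiding the quadratic backward scan over antecedents."""
    index = {}       # key -> cluster id of the first mention with that key
    clusters = []
    for text in mentions:
        key = text.lower()
        if key in index:
            clusters.append(index[key])   # O(1) antecedent lookup
        else:
            cid = len(index)
            index[key] = cid
            clusters.append(cid)
    return clusters

print(cluster_by_hash(["Obama", "the president", "obama", "He"]))
# [0, 1, 0, 2]
```

Richer sieve passes can use the same structure with looser keys (e.g. the head word) at the cost of a short within-bucket scan.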
@DeepthiKarnam
I checked the output. For the command below I got the following, but I didn't understand what it actually means:

```python
clusters = coref.one_shot_coref(utterances=u"She loves him.", context=u"My sister has a dog.")
print(clusters)
# {2: [2, 0], 3: [3, 1]}
```