This library is incredible. Do you have plans to introduce a co-referencing system?
I'm aware that coreference resolution is still an unsolved research problem, but the CoreNLP implementation is still useful for some folks :)
Coref is pretty difficult. A big problem is that the current evaluations are hard to interpret, which makes it difficult to figure out which aspects of system design actually matter. I find the current literature quite confusing.
My plan is to drive the named entity technologies through named entity linking — resolving entity mentions to a knowledge base, e.g. DBpedia IDs. I think this captures the most useful types of coreference resolution, where an entity is in the chain of reference. In these cases the semantics are usually relatively simple.
If an entity isn't involved, it's all very difficult. An arbitrary example: I clicked a random link near the top of Hacker News, and landed on this: http://www.nytimes.com/2015/08/16/opinion/sunday/oliver-sacks-sabbath.html . Consider the first paragraph:
MY mother and her 17 brothers and sisters had an Orthodox upbringing — all photographs of their father show him wearing a yarmulke, and I was told that he woke up if it fell off during the night.
Are {the yarmulke, it} coreferent? Probably it's not the self-same yarmulke throughout his father's life. There are good, internally consistent theories of how to answer all these difficult questions...but the theories are complicated. And the complexity is necessary. So the best simplification I can see is to just deal with references to named entities.
This feature is a bit down the queue, though. At the moment I'm working on re-launching the docs site, and launching features that promote user customization of spaCy. After that, I need to get spaCy working on more languages.
Yeah, that makes sense. I guess what you're working on right now is more important.
Named entity linking with DBPedia / freebase IDs will be great to see, but I agree cross language support sounds like higher priority.
Entity disambiguation is less interesting to me than correct "type": person, place, or thing. Disambiguation is a harder problem!
I think you really need an ontology to do the typing — and at that point you're pretty much doing disambiguation already.
An NER system uses three types of information. They're mostly independent:

1. The words of the entity itself
2. The surrounding context
3. World knowledge
There's only so much 1 and 2 can do for you, in terms of figuring out whether the entity is a product, organization, location, person etc. Sometimes you'll get lucky on 1, and some entities are easy via 2. But you'll always have contexts where you can't tell if you don't know — so you need 3.
Assigning a type to the entity is very easy once you've associated it with the knowledge base. And you can get very rich typing that way, with ~100 or more classes.
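As a rough sketch of what typing-via-linking looks like (the knowledge base here is a toy in-memory dict standing in for something like DBpedia, and the mention-to-ID mapping is assumed to come from an entity linker; none of this is real spaCy or DBpedia API):

```python
# Toy KB: once a mention is linked to a KB ID, typing is just a lookup.
KB = {
    "Q90":  {"name": "Paris", "type": "location"},
    "Q312": {"name": "Apple Inc.", "type": "organization"},
    "Q937": {"name": "Albert Einstein", "type": "person"},
}

# Hypothetical (mention -> KB id) output from an entity linker.
links = {"Paris": "Q90", "Apple": "Q312", "Einstein": "Q937"}

def entity_type(mention):
    """Return the KB type for a linked mention, or None if unlinked."""
    kb_id = links.get(mention)
    return KB[kb_id]["type"] if kb_id else None

print(entity_type("Paris"))     # location
print(entity_type("Einstein"))  # person
```

The point is that the hard work is in producing `links`; once that exists, arbitrarily fine-grained typing falls out of the KB for free.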
Now that DeepMind has won a game of Go, any news on a coreference system?
@honnibal some time has passed since your last comment. Has your attitude about coreference resolution changed, and if not, how would you recommend coreference resolution be done in Python?
You should read the recent research in the ACL Anthology and implement a system that looks like it offers a good accuracy/difficulty trade-off.
Hey @honnibal, is this still in "build it if you want it" territory?
I'm in the market for some simple coref, and was thinking about a bit of pragmatic hacking around the Stanford sieve ideas:
https://nlp.stanford.edu/pubs/conllst2011-coref.pdf
There's probably some really fancy neural stuff one can do these days, but starting with names and pronouns seems pretty sensible (staying well away from nominal mentions for now).
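A minimal sketch of the multi-pass sieve idea (my own toy reduction of the Stanford paper, not their implementation): high-precision passes run first, and each mention links to the nearest antecedent the current pass licenses. Mentions here are plain dicts; gender comes from a hand-written pronoun table.

```python
# Toy two-pass sieve over pre-extracted mentions.
PRONOUNS = {"he": "male", "him": "male", "she": "female", "her": "female"}

def sieve(mentions):
    """Assign each mention a cluster id; return the list of cluster ids."""
    clusters = list(range(len(mentions)))  # start with singletons

    # Pass 1: exact (case-insensitive) string match -- highest precision.
    for i, m in enumerate(mentions):
        for j in range(i - 1, -1, -1):
            if mentions[j]["text"].lower() == m["text"].lower():
                clusters[i] = clusters[j]
                break

    # Pass 2: pronouns link to the nearest gender-compatible antecedent.
    for i, m in enumerate(mentions):
        gender = PRONOUNS.get(m["text"].lower())
        if gender is None:
            continue
        for j in range(i - 1, -1, -1):
            if mentions[j].get("gender") == gender:
                clusters[i] = clusters[j]
                break
    return clusters

mentions = [
    {"text": "John", "gender": "male"},
    {"text": "Mary", "gender": "female"},
    {"text": "he"},
    {"text": "John", "gender": "male"},
]
print(sieve(mentions))  # [0, 1, 0, 0]
```

The real system has many more passes (head match, acronyms, appositives, etc.), but the structure is exactly this: ordered deterministic rules, each only allowed to merge clusters the earlier rules left apart.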
https://github.com/huggingface/neuralcoref
An implementation of coreference resolution on top of spaCy. Hope it's useful.
This looks great @DeepthiKarnam, I think I also saw the medium post. We ended up adapting the ideas in the sieve paper, but using lots of hashing; otherwise we hit some nasty performance issues when we directly implemented the _search for antecedent_ loop from the paper.
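A sketch of that hashing trick (my own illustration, not the poster's actual code): instead of scanning backwards over every prior mention for each new one, index antecedent candidates by a cheap key such as the lowercased mention text, so each lookup is a dict hit rather than a quadratic scan.

```python
def cluster_by_hash(mentions):
    """Cluster mentions by exact text match using a hash index,
    avoiding the quadratic backward scan over antecedents."""
    index = {}       # key -> cluster id of the first mention with that key
    clusters = []
    for text in mentions:
        key = text.lower()
        if key in index:
            clusters.append(index[key])   # O(1) antecedent lookup
        else:
            cid = len(index)
            index[key] = cid
            clusters.append(cid)
    return clusters

print(cluster_by_hash(["Obama", "the president", "obama", "He"]))
# [0, 1, 0, 2]
```

Richer sieve passes can use the same structure with looser keys (e.g. the head word) at the cost of a short within-bucket scan.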
@DeepthiKarnam
I checked the output. For the command below I got the following, but I didn't understand what it actually means:

```python
clusters = coref.one_shot_coref(utterances=u"She loves him.", context=u"My sister has a dog.")
print(clusters)
# {2: [2, 0], 3: [3, 1]}
```