Spacy: get_lca_matrix is incorrect

Created on 30 May 2018  路  4Comments  路  Source: explosion/spaCy

@ramananbalakrishnan

In spacy version 2.0.11

s = u"He created a test for spacy"
doc = nlp(s)
doc.get_lca_matrix()[3][5]

Expected: 3
Actual: 1

This branch of the recursion is incorrect:

https://github.com/explosion/spaCy/blob/master/spacy/tokens/doc.pyx#L732

It assumes that if n1 and n2 are not adjacent, then LCA(n1, n2) = LCA(n1.head, n2.head).

This is generally not true: there are many counterexamples, the above one with 3 nodes in a chain is the simplest.

bug feat / doc help wanted help wanted (easy)

Most helpful comment

Hi @slavi , thanks for catching that.

I had ported this section of the code during the transition from v1 to v2 (relevant PR https://github.com/explosion/spaCy/pull/1443)

Would you be able to make the required change and send a pull request?

All 4 comments

Hi @slavi , thanks for catching that.

I had ported this section of the code during the transition from v1 to v2 (relevant PR https://github.com/explosion/spaCy/pull/1443)

Would you be able to make the required change and send a pull request?

Thanks for the report, and yes -- a pull request would be much appreciated!

I reimplemented the function to solve this issue, PR here.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ajayrfhp picture ajayrfhp  路  3Comments

ahalterman picture ahalterman  路  3Comments

peterroelants picture peterroelants  路  3Comments

muzaluisa picture muzaluisa  路  3Comments

besirkurtulmus picture besirkurtulmus  路  3Comments