Spacy: Wrong ID for StringStore returned by Matcher using OP quantifiers

Created on 15 Aug 2018 · 4 Comments · Source: explosion/spaCy

Matchers appear to return incorrect match_id hashes for at least some patterns that use quantifiers. As a result, looking up the match_id in the nlp.vocab StringStore does not return the pattern ID; it returns one of the terms matched by the pattern instead. The bug can be triggered by the *, ? or + quantifiers.

Example:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

pattern = [{'LOWER': 'high'}, {'IS_PUNCT': True, 'OP': '?'}, {'LOWER': 'adrenaline'}]
matcher.add("test_pattern", None, pattern)

doc1 = nlp("This is a high-adrenaline situation.")
doc2 = nlp("This is a high adrenaline situation.")

def get_matches(doc):
    matches = matcher(doc)
    for match_id, start, end in matches:
        rule_id = nlp.vocab.strings[match_id]
        span = doc[start:end]
        print(f"{match_id}, Rule '{rule_id}', {start}:{end}, '{span.text}'")

# Works correctly
get_matches(doc1)
# > 5651646042889419180, Rule 'test_pattern', 4:7, 'high-adrenaline'

# Returns wrong pattern ID
get_matches(doc2)
# > 15052847843637698704, Rule 'adrenaline', 4:6, 'high adrenaline'
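
Until a fix lands, one way to catch this in a test suite is to check that every resolved match_id is a rule name that was actually registered. The following is only a minimal sketch, not part of spaCy: `find_suspect_matches`, `registered_rules`, and `resolve` are hypothetical names, and a plain dict stands in for nlp.vocab.strings so the snippet runs without a model. The two hashes are the ones printed in the report above.

```python
from typing import Callable, Iterable, List, Set, Tuple

# (match_id, start, end): the tuple shape the Matcher yields in the example above.
Match = Tuple[int, int, int]


def find_suspect_matches(
    matches: Iterable[Match],
    registered_rules: Set[str],
    resolve: Callable[[int], str],
) -> List[Tuple[Match, str]]:
    """Return (match, resolved_name) pairs whose name is not a registered rule.

    Hypothetical helper for illustration; `resolve` maps a hash to a string.
    """
    suspects = []
    for match in matches:
        match_id = match[0]
        try:
            name = resolve(match_id)
        except KeyError:
            # The dict stand-in raises KeyError for hashes it does not know.
            name = "<unknown hash>"
        if name not in registered_rules:
            suspects.append((match, name))
    return suspects


# Stand-in for nlp.vocab.strings, filled with the hashes from the report.
fake_strings = {
    5651646042889419180: "test_pattern",
    15052847843637698704: "adrenaline",
}
matches = [(5651646042889419180, 4, 7), (15052847843637698704, 4, 6)]

suspects = find_suspect_matches(matches, {"test_pattern"}, fake_strings.__getitem__)
print(suspects)
# [((15052847843637698704, 4, 6), 'adrenaline')]
```

With a real pipeline, the same check could presumably be run as `find_suspect_matches(matcher(doc2), {"test_pattern"}, lambda h: nlp.vocab.strings[h])`, which on an affected version would flag the 'adrenaline' match.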

Environment

  • Operating System: Ubuntu 18.04
  • Python Version Used: 3.6.6
  • spaCy Version Used: 2.1.0a0 (spaCy-nightly)
Labels: bug, feat / matcher, 🌙 nightly

All 4 comments

Thanks for the report – that's very interesting 🤔 I just tested it with the latest v2.0.x and it worked as expected there, so this might be related to some bug in the new matcher engine in v2.1.x.

Thanks for the test case! Confirmed.

Still not 100% on the root causes of this, but the fix makes the code a bit more readable, and resolves the issue.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
