`on_match` functions that are supplied to `Matcher` are not firing for matches of patterns that use the `*` and `+` OP quantifiers. The `?` quantifier seems to be OK. This behaviour is present in both spacy-nightly and 2.0.12.
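```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')

# Callback signature: (matcher, doc, i, matches), where i is the index
# of the current match within `matches`.
def on_match(matcher, doc, i, matches):
    print('Matched!', matches)

matcher = Matcher(nlp.vocab)
# {'OP': '*'} is a wildcard token that may repeat zero or more times.
matcher.add('JOHN', on_match, [{'LEMMA': 'invest'}, {'OP': '*'}, {'LOWER': 'china'}])

doc = nlp("John Doe invests in one more stock in China.")
for match in matcher(doc):
    print(match)
```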
The above snippet does successfully find a match, printing this:
```
(10603582739829208913, 2, 9)
```
However the on_match function does not fire, as nothing is printed.
The following snippet, on the other hand, which has the greedy wildcard removed (and an updated string that still matches), does cause the `on_match` function to fire:
```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')

# Same callback as above; i is the index of the current match.
def on_match(matcher, doc, i, matches):
    print('Matched!', matches)

matcher = Matcher(nlp.vocab)
# No wildcard: 'invest' must be immediately followed by 'china'.
matcher.add('JOHN', on_match, [{'LEMMA': 'invest'}, {'LOWER': 'china'}])

doc = nlp("John Doe invests China.")
for match in matcher(doc):
    print(match)
```
It outputs:
```
Matched! [(15211191707941042503, 2, 4)]
(15211191707941042503, 2, 4)
```
This has the same root cause as https://github.com/explosion/spaCy/issues/2671, which I've just fixed. Nice timing!
The problem was that in some situations the entity ID was returned incorrectly, and the entity ID is the key used to look up the `on_match` callbacks. With the wrong ID, no callback was found, so none was called. This should now be fixed.
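To make the failure mode concrete, here is a minimal pure-Python sketch of callback dispatch keyed by match ID. `ToyMatcher` and its `add`/`dispatch` methods are hypothetical and are not spaCy's actual implementation; the sketch only assumes, as described in the comment above, that callbacks are stored under the match ID and looked up with whatever ID the matching step reports. The two IDs are the ones printed in this thread; treating the second one as the correct key for `'JOHN'` is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A match is (match_id, start, end), like the tuples printed above.
Match = Tuple[int, int, int]
Callback = Callable[[object, object, int, List[Match]], None]


class ToyMatcher:
    """Hypothetical stand-in for a rule matcher; NOT spaCy's implementation.

    It models only the dispatch step: callbacks live in a dict keyed by
    match ID and are looked up with whatever ID the matching step returns.
    """

    def __init__(self) -> None:
        self._callbacks: Dict[int, Callback] = {}

    def add(self, key: int, on_match: Callback) -> None:
        self._callbacks[key] = on_match

    def dispatch(self, doc: object, matches: List[Match]) -> List[Match]:
        for i, (match_id, _start, _end) in enumerate(matches):
            # A wrong match_id makes .get() return None, so the callback is
            # silently skipped while the match itself is still returned.
            callback = self._callbacks.get(match_id)
            if callback is not None:
                callback(self, doc, i, matches)
        return matches


fired: List[Match] = []


def on_match(matcher, doc, i, matches):
    fired.append(matches[i])


# Assumption: this is the ID the 'JOHN' key was registered under.
JOHN = 15211191707941042503
matcher = ToyMatcher()
matcher.add(JOHN, on_match)

# Buggy case: the matching step reports a different ID, so nothing fires.
matcher.dispatch(None, [(10603582739829208913, 2, 9)])
print(fired)  # []

# Fixed case: the reported ID equals the registered key, so the callback fires.
matcher.dispatch(None, [(JOHN, 2, 9)])
print(fired)  # [(15211191707941042503, 2, 9)]
```

Because the lookup fails quietly rather than raising, the match is still handed back to the caller, which matches the symptom in the first snippet: a printed match tuple but no "Matched!" line.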
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.