As I understand from the documentation, we can match sentences with rules by adding patterns. For example:
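import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
doc = nlp("i want to buy an iPhone X")
pattern = [{"TEXT": "iPhone"}, {"TEXT": "X"}]
# spaCy v2.2.2+ / v3 signature: add(key, list_of_patterns).
# Older v2 releases used matcher.add("Phone", None, pattern), which v3 rejects.
matcher.add("Phone", [pattern])
matches = matcher(doc)
print("Total matches found:", len(matches))
for match_id, start, end in matches:
    print("Match found:", doc[start:end].text)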
The program prints:
Total matches found: 1
Match found: iPhone X
Now we want to tag each token of the matched span, so that the result looks like:
Phone Product: iPhone
Version: X
"Phone Product" and "Version" are tag names supplied by the user.
Is there a way to achieve this?
You could either iterate over the tokens and store the results in some structure like this (very naive approach):
result = {}
for match_id, start, end in matches:
    match = doc[start:end]
    print("Match found:", match.text)
    if len(match) == 2:
        result[match.text] = {}
        result[match.text]["Phone Product"] = match[0].text
        result[match.text]["Version"] = match[1].text
print(result)
# {'iPhone X': {'Phone Product': 'iPhone', 'Version': 'X'}}
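The same idea can be made a little more general by letting the user supply one tag per pattern token. Here is a sketch under a few assumptions. `PATTERN_LABELS` and `label_matches` are hypothetical names chosen for illustration. `spacy.blank("en")` replaces `en_core_web_sm`, which works here because the pattern only checks `TEXT` and needs no trained components.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline suffices, since the pattern only uses TEXT.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical user-supplied tags: one tag per token in the pattern, keyed by match key.
PATTERN_LABELS = {
    "Phone": ["Phone Product", "Version"],
}
matcher.add("Phone", [[{"TEXT": "iPhone"}, {"TEXT": "X"}]])


def label_matches(doc):
    """Return {matched text: {tag: token text}} for every match in doc (hypothetical helper)."""
    result = {}
    for match_id, start, end in matcher(doc):
        key = nlp.vocab.strings[match_id]  # e.g. "Phone"
        labels = PATTERN_LABELS[key]
        span = doc[start:end]
        # Only tag spans whose length equals the tag list, i.e. fixed-length
        # patterns without operators such as "?" or "+".
        if len(span) == len(labels):
            result[span.text] = {label: tok.text for label, tok in zip(labels, span)}
    return result


print(label_matches(nlp("i want to buy an iPhone X")))
# {'iPhone X': {'Phone Product': 'iPhone', 'Version': 'X'}}
```

Adding a new product type then only needs another `matcher.add` call and a matching entry in the tag dict.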
Or use spaCy's extension attributes. These let you add custom attributes to tokens and access them through the underscore property, e.g. token._.phone_product and token._.version.
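Here is a minimal sketch of the extension-attribute route. It assumes that boolean flags are enough to mark each token. `Token.set_extension` registers `phone_product` and `version` (the names used in the answer), and each match sets the flag on the corresponding token. The final loop prints the format requested in the question. `spacy.blank("en")` stands in for `en_core_web_sm`, and the code assumes the two-token pattern from the question.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Token

nlp = spacy.blank("en")  # assumption: no trained components are needed for a TEXT pattern
matcher = Matcher(nlp.vocab)
matcher.add("Phone", [[{"TEXT": "iPhone"}, {"TEXT": "X"}]])

# Register the custom token attributes; force=True allows re-running the
# script in the same session without an "extension already exists" error.
Token.set_extension("phone_product", default=False, force=True)
Token.set_extension("version", default=False, force=True)

doc = nlp("i want to buy an iPhone X")
for _match_id, start, end in matcher(doc):
    # The pattern is exactly two tokens: product first, then version.
    doc[start]._.phone_product = True
    doc[start + 1]._.version = True

# Print the tagged tokens in the format asked for in the question.
for token in doc:
    if token._.phone_product:
        print("Phone Product:", token.text)
    elif token._.version:
        print("Version:", token.text)
# Phone Product: iPhone
# Version: X
```

Because the tags live on the tokens, later pipeline steps can read them directly from `doc` without a separate result dict.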
Thanks for your advice. It was very helpful.