I would like to use the DependencyMatcher, which I learned exists by reading this issue. There, I also learned that there is no documentation for it.
I figured I would learn it from the code and tests, which I might end up doing, but I also figured I could try asking for some help. I'm having some difficulty understanding what each of the operators is supposed to do (for example ">>", ".", and "$+").
Once I understand it, I fully intend to help with this documentation as best I can. So, any help or advice about either the DependencyMatcher or how to contribute to the documentation is welcome.
I assume the correct place for this documentation would be the rule-based matching page.
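For later readers: the spaCy v3 documentation for the DependencyMatcher describes these operators as "A >> B: A is an ancestor of B", "A . B: A immediately precedes B" and "A $+ B: B is a right immediate sibling of A" (same head, B directly after A). The sketch below is a minimal pure-Python illustration of those three relations over a hand-written toy parse. The names `HEADS`, `is_ancestor`, `immediately_precedes` and `is_right_immediate_sibling` are hypothetical and not part of spaCy's API, and the sketch assumes a single sentence, so it ignores sentence boundaries.

```python
from typing import List

# Hypothetical hand-written parse of "She ate big red apples".
# HEADS[i] is the index of token i's head; the root points to itself,
# mirroring how spaCy sets token.head for the root token.
WORDS = ["She", "ate", "big", "red", "apples"]
HEADS = [1, 1, 4, 4, 1]


def is_root(heads: List[int], i: int) -> bool:
    return heads[i] == i


def is_ancestor(heads: List[int], a: int, b: int) -> bool:
    """A >> B: walk up from B towards the root and report whether we pass A."""
    node = b
    while not is_root(heads, node):
        node = heads[node]
        if node == a:
            return True
    return False


def immediately_precedes(a: int, b: int) -> bool:
    """A . B: A directly precedes B in the text (single sentence assumed)."""
    return b == a + 1


def is_right_immediate_sibling(heads: List[int], a: int, b: int) -> bool:
    """A $+ B: A and B share a head and B comes directly after A."""
    if is_root(heads, a) or is_root(heads, b):
        return False  # the root has no head, so it has no siblings
    return heads[a] == heads[b] and b == a + 1


if __name__ == "__main__":
    relations = [
        (">>", lambda a, b: is_ancestor(HEADS, a, b)),
        (".", immediately_precedes),
        ("$+", lambda a, b: is_right_immediate_sibling(HEADS, a, b)),
    ]
    for name, rel in relations:
        # List every (A, B) word pair for which the relation holds.
        pairs = [(WORDS[a], WORDS[b])
                 for a in range(len(WORDS))
                 for b in range(len(WORDS))
                 if rel(a, b)]
        print(name, pairs)
```

In this toy sentence only ("big", "red") satisfies "$+", because both words attach to "apples" and sit next to each other.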
Hi @fabio-reale : it would be awesome to get some help to get this properly documented. I haven't used the DependencyMatcher myself yet either, but I'd be happy to dig through this code together with you.
As I understand it, the different operators refer to the kinds of grammatical relations that can exist between tokens. For example, >, which maps to gov, refers to all the children of a node, while >>, which maps to gov_chain, refers to the full subtree of a node. You can find the definitions of e.g. subtree in token.pyx, which is what doc[node] refers to (a doc is a sequence of tokens).
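To make the difference between "children" and "subtree" concrete, here is a minimal pure-Python sketch that mimics what Token.children and Token.subtree return, using a hypothetical hand-written parse instead of a real spaCy Doc. The helper names are illustrative only. Note that Token.subtree includes the token itself. In the v3 DependencyMatcher, ">>" is documented as "A is an ancestor of B", which corresponds to the subtree minus the token itself.

```python
from typing import List

# Hypothetical hand-written parse of "She ate big red apples".
# HEADS[i] is the index of token i's head; the root token points to itself.
WORDS = ["She", "ate", "big", "red", "apples"]
HEADS = [1, 1, 4, 4, 1]


def children(heads: List[int], i: int) -> List[int]:
    """Immediate dependents of token i (what '>' / gov looks at)."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def subtree(heads: List[int], i: int) -> List[int]:
    """Token i plus all of its descendants, in document order (like Token.subtree)."""
    result = [i]
    for child in children(heads, i):
        result.extend(subtree(heads, child))
    return sorted(result)


print([WORDS[j] for j in children(HEADS, 4)])  # ['big', 'red']
print([WORDS[j] for j in subtree(HEADS, 1)])   # ['She', 'ate', 'big', 'red', 'apples']
```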
I also saw that there's a dependency_matcher pytest fixture defined here, which could be useful to look into as a first example. I agree with you that the patterns are a little hard to read with those various operators. Maybe there is a way to simplify those or make it more intuitive.
We should probably also look into the reason why test_dependency_matcher (in that same file) has been commented out.
A little more background is here: https://github.com/explosion/spaCy/pull/2836 and https://github.com/explosion/spaCy/pull/3465
@fabio-reale You could have a look at these pull requests and go over the referenced links within them.
@svlandeg Yes, we need to get this documented. We would need a few real-world examples to work with and to help make everything concrete.
test_dependency_matcher was commented out due to inconsistencies I had run into with the spaCy Matcher that it uses internally. I haven't tried using it recently, so we could probably uncomment it and see whether everything works now.
Hi @skrcode! Do you have some real-life examples yourself? I think it would be great to get this documented because I'm sure a lot of people could use this functionality. We should think of some example cases to include in the docs, and include the same cases in the test suite.
Over the past few months, we've fixed quite a few issues with the Matcher, so hopefully the inconsistencies you mentioned have been resolved.
@svlandeg Unfortunately, I do not have any real-life examples for this, although samples from Semgrex would probably work just as well for the purposes of documentation. Some analysis of run time, memory usage and correctness would also be required, and real-life examples would help a great deal here. I think @cyclecycle has been using this functionality and could probably give a much better idea of how it has been faring so far.
Hi @svlandeg and @skrcode,
Thanks to you both for your responses; they have been useful in understanding the DependencyMatcher. I'll write some tests to make sure I understand it well enough before trying to write any documentation for it.
About real-life examples, I might come up with a few, but they would all be for Portuguese.
I believe I can actually use the DependencyMatcher in a project I am currently working on.
In my case the texts are German medical notes, but it should be possible to find some English examples as well.
@DeNeutoy has recently created this whole blog post on the dependency matcher: http://markneumann.xyz/blog/dependency_matcher/
It would be great if we can get this distilled into a version for the spaCy docs. PRs welcome!
@svlandeg @DeNeutoy That looks awesome!
The upcoming spaCy v3, currently available as spacy-nightly, finally officially supports the DependencyMatcher! @adrianeboyd has redesigned the pattern specifications and fixed & extended operator implementations (PR https://github.com/explosion/spaCy/pull/6018).
The documentation is here: https://nightly.spacy.io/usage/rule-based-matching#dependencymatcher, and will be moved to the main docs once v3.0 is out officially!
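For anyone landing here, a minimal sketch of a pattern in the redesigned v3 format (RIGHT_ID, RIGHT_ATTRS, LEFT_ID, REL_OP), assuming spaCy v3 is installed. To avoid downloading a trained pipeline, the parse is supplied by hand through the Doc constructor. The sentence, its dependency labels and the match key EAT_MODIFIED_OBJECT are illustrative only and are not taken from the linked documentation.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

# A blank pipeline is enough because the parse is written by hand below;
# with a trained pipeline you would simply call nlp(text) instead.
nlp = spacy.blank("en")

# Hand-written parse of "She ate big red apples"; heads are absolute token indices.
doc = Doc(
    nlp.vocab,
    words=["She", "ate", "big", "red", "apples"],
    heads=[1, 1, 4, 4, 1],
    deps=["nsubj", "ROOT", "amod", "amod", "dobj"],
)

matcher = DependencyMatcher(nlp.vocab)
pattern = [
    # Anchor token: the verb "ate".
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "ate"}},
    # verb > object: the direct object is an immediate dependent of the verb.
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "object",
     "RIGHT_ATTRS": {"DEP": "dobj"}},
    # object > modifier: an adjectival modifier attached to the object.
    {"LEFT_ID": "object", "REL_OP": ">", "RIGHT_ID": "modifier",
     "RIGHT_ATTRS": {"DEP": "amod"}},
]
matcher.add("EAT_MODIFIED_OBJECT", [pattern])

for match_id, token_ids in matcher(doc):
    # token_ids follow the order of the pattern dicts: verb, object, modifier.
    print([doc[i].text for i in token_ids])
```

Each match is a (match_id, token_ids) pair, so one match is expected per adjective attached to "apples".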