Spacy: Annotation Specs for Syntactic Dependency Parsing are incomplete

Created on 20 Jan 2016 · 14 Comments · Source: explosion/spaCy

All 14 comments

We use the ClearNLP converter, which differs slightly from the Stanford one in some cases. The ClearNLP converter is generally more accurate and practical for our situation (i.e., we just want to convert treebanks into dependency parses). It increases accuracy by making use of the additional annotations in the treebank. In contrast, the Stanford converter has to support the use case of converting parser output into dependencies. These parsers don't have the additional annotations, so the Stanford converter uses less information than ClearNLP's.

If the ClearNLP docs really don't describe our dependencies, then okay, we have a problem, and I'll raise it with Jin-ho. But are you sure that's the case?

It may just be that the ClearNLP doc itself needs updating as it is rather old. Appendix B2 lists the Stanford dependencies, which also does not include all of the labels I've observed and differs from the doc I pointed to.

The following dependencies are described by the ClearNLP Doc and listed in Table 2:

ACOMP Adjectival complement
ADVCL Adverbial clause modifier
ADVMOD Adverbial modifier
AGENT Agent
NN Noun compound modifier
AMOD Adjectival modifier
APPOS Appositional modifier
ATTR Attribute
AUX Auxiliary
NUM Numeric modifier
AUXPASS Auxiliary (passive)
CC Coordinating conjunction
CCOMP Clausal complement
COMPLM Complementizer
CONJ Conjunct
CSUBJ Clausal subject
CSUBJPASS Clausal subject (passive)
DEP Unclassified dependent
DET Determiner
DOBJ Direct object
EXPL Expletive
HMOD Modifier in hyphenation
HYPH Hyphen
INFMOD Infinitival modifier
INTJ Interjection
IOBJ Indirect object
MARK Marker
META Meta modifier
NEG Negation modifier
NMOD Modifier of nominal
NPADVMOD Noun phrase as ADVMOD
NSUBJ Nominal subject
NSUBJPASS Nominal subject (passive)
NUMBER Number compound modifier
OPRD Object predicate
PARATAXIS Parataxis
PARTMOD Participial modifier
PCOMP Complement of a preposition
POBJ Object of a preposition
POSS Possession modifier
POSSESSIVE Possessive modifier
PRECONJ Pre-correlative conjunction
PREDET Predeterminer
PREP Prepositional modifier
PRT Particle
PUNCT Punctuation
QUANTMOD Quantifier phrase modifier
RCMOD Relative clause modifier
ROOT Root
XCOMP Open clausal complement

Here are the dependency labels generated by spaCy that I've observed while parsing my corpus. * denotes labels not in the ClearNLP doc (these are only what I've observed, so there may be more):

  • acl
    acomp
    advcl
    advmod
    agent
    amod
    appos
    attr
    aux
    auxpass
  • case
    cc
    ccomp
  • compound
    csubj
    csubjpass
  • dative
    dep
    det
    dobj
    expl
    intj
    iobj
    mark
    meta
    neg
    nmod
    npadvmod
    nsubj
    nsubjpass
  • nummod
    oprd
    parataxis
    pcomp
    pobj
    poss
    preconj
    predet
    prep
    prt
    punct
    quantmod
  • relcl
    xcomp
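
A minimal sketch of how a list like the one above could be collected, assuming each parsed document is an iterable of tokens that expose the label as a `dep_` string (as spaCy tokens do). The helper name `observed_dep_labels` and the stand-in tokens are hypothetical. They only mimic that one attribute so the example runs without a model installed.

```python
from types import SimpleNamespace


def observed_dep_labels(docs):
    """Return the sorted set of dependency labels seen across parsed docs.

    Each doc only needs to be iterable and yield objects with a `dep_`
    attribute holding the label as a string.
    """
    labels = set()
    for doc in docs:
        for token in doc:
            labels.add(token.dep_)
    return sorted(labels)


# Stand-in "documents" (hypothetical data, not real parser output).
fake_docs = [
    [SimpleNamespace(dep_="nsubj"), SimpleNamespace(dep_="ROOT")],
    [SimpleNamespace(dep_="compound"), SimpleNamespace(dep_="nsubj")],
]
print(observed_dep_labels(fake_docs))
```

With a real pipeline, the same function would be called on the parsed documents it returns. A list built this way only covers what happens to occur in the corpus, which is why it may be incomplete.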

Not sure what happened to the formatting on my last post after I submitted it. In the observed labels section each label was on its own line, and the asterisks are now replaced with bullets. So the following are observed but not documented:
acl
case
compound
dative
nummod
relcl
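
The comparison behind this list can be reproduced as a set difference. This is only a sketch: both sets are copied by hand from the two lists quoted in this thread (Table 2 lowercased), and the variable names are made up for illustration.

```python
# Labels from Table 2 of the ClearNLP doc, lowercased, as quoted in this thread.
DOCUMENTED = set("""
acomp advcl advmod agent nn amod appos attr aux num auxpass cc ccomp complm
conj csubj csubjpass dep det dobj expl hmod hyph infmod intj iobj mark meta
neg nmod npadvmod nsubj nsubjpass number oprd parataxis partmod pcomp pobj
poss possessive preconj predet prep prt punct quantmod rcmod root xcomp
""".split())

# Labels observed in parser output, as quoted in this thread.
OBSERVED = set("""
acl acomp advcl advmod agent amod appos attr aux auxpass case cc ccomp
compound csubj csubjpass dative dep det dobj expl intj iobj mark meta neg
nmod npadvmod nsubj nsubjpass nummod oprd parataxis pcomp pobj poss preconj
predet prep prt punct quantmod relcl xcomp
""".split())

# Observed in output but absent from the documentation.
undocumented = sorted(OBSERVED - DOCUMENTED)
# Documented but not seen in this particular corpus.
never_observed = sorted(DOCUMENTED - OBSERVED)

print(undocumented)
print(never_observed)
```

The second difference is worth a look as well: documented labels that never show up in the output may hint at renames, but the thread does not confirm any particular mapping.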

Hmm, okay. Thanks, I didn't realise those docs were out of date.

Hey @honnibal any chance we could get a full list of all possible dependency labels in SpaCy? Similar to spacy.parts_of_speech.NAMES?

From symbols.pyx:

    "acomp": acomp,
    "advcl": advcl,
    "advmod": advmod,
    "agent": agent,
    "amod": amod,
    "appos": appos,
    "attr": attr,
    "aux": aux,
    "auxpass": auxpass,
    "cc": cc,
    "ccomp": ccomp,
    "complm": complm,
    "conj": conj,
    "csubj": csubj,
    "csubjpass": csubjpass,
    "dep": dep,
    "det": det,
    "dobj": dobj,
    "expl": expl,
    "hmod": hmod,
    "hyph": hyph,
    "infmod": infmod,
    "intj": intj,
    "iobj": iobj,
    "mark": mark,
    "meta": meta,
    "neg": neg,
    "nmod": nmod,
    "nn": nn,
    "npadvmod": npadvmod,
    "nsubj": nsubj,
    "nsubjpass": nsubjpass,
    "num": num,
    "number": number,
    "oprd": oprd,
    "parataxis": parataxis,
    "partmod": partmod,
    "pcomp": pcomp,
    "pobj": pobj,
    "poss": poss,
    "possessive": possessive,
    "preconj": preconj,
    "prep": prep,
    "prt": prt,
    "punct": punct,
    "quantmod": quantmod,
    "rcmod": rcmod,
    "root": root,
    "xcomp": xcomp

I tried that list, but it seems to be incomplete. Some missing items
include, for example, compound, nummod and ROOT.
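
A quick way to check labels against the symbols.pyx keys quoted above, as a sketch. The key set is copied by hand from that comment and the helper name `lookup` is hypothetical. The check is done twice, once exactly and once ignoring case, because the quoted list contains a lowercase `root` key while the label reported missing is written `ROOT`. Whether spaCy treats those two spellings as the same symbol is not stated in this thread.

```python
# Keys copied from the symbols.pyx excerpt quoted earlier in this thread.
SYMBOL_NAMES = set("""
acomp advcl advmod agent amod appos attr aux auxpass cc ccomp complm conj
csubj csubjpass dep det dobj expl hmod hyph infmod intj iobj mark meta neg
nmod nn npadvmod nsubj nsubjpass num number oprd parataxis partmod pcomp
pobj poss possessive preconj prep prt punct quantmod rcmod root xcomp
""".split())


def lookup(label):
    """Return (exact_match, case_insensitive_match) for a label."""
    return (label in SYMBOL_NAMES, label.lower() in SYMBOL_NAMES)


for label in ["compound", "nummod", "ROOT"]:
    exact, folded = lookup(label)
    print(f"{label}: exact={exact}, ignoring case={folded}")
```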

Hello @honnibal,
I am parsing a German text using your new model and facing the same issue: the dependency tags are not clearly documented. Could you please fix that so that we can get the most out of your API? :)

UPDATE:
I figured out that the German model uses its own tags, specifically those of the TIGER Treebank, as described here: http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf

Nevertheless, I am looking forward to the description of the English labels :)

Would it be too much work to adapt spaCy to output Universal Dependencies for the English and German parser?

Apologies for commenting on a closed issue, but I was scouring GitHub (this issue and #676, #677) trying to figure out what the acl label is supposed to be, since it's not in the Stanford dependencies manual. After hopping around ClearNLP's (now _NLP4J's_) docs, I found the following page:

https://emorynlp.github.io/nlp4j/components/dependency-parsing.html

... which describes all of the mystery labels @sdenning helpfully posted above, except nummod. I post only in case this helps someone in the future.

Hi @honnibal
Could you please tell me how I can get a complete list of dependency relations in spaCy?

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
