Spacy: Annotation Specs for Syntactic Dependency Parsing are incomplete

Created on 20 Jan 2016 · 14 Comments · Source: explosion/spaCy

The ClearNLP doc pointed to doesn't include quite a few of the dependency tags. Here is a Stanford doc that has all of them except DATIVE:

http://nlp.stanford.edu/software/dependencies_manual.pdf

docs

sdenning

👍1

Most helpful comment

It may just be that the ClearNLP doc itself needs updating, as it is rather old. Appendix B2 lists the Stanford dependencies, but that list also omits some of the labels I've observed and differs from the doc I pointed to.

The following dependencies are described by the ClearNLP Doc and listed in Table 2:

ACOMP Adjectival complement
ADVCL Adverbial clause modifier
ADVMOD Adverbial modifier
AGENT Agent
AMOD Adjectival modifier
APPOS Appositional modifier
ATTR Attribute
AUX Auxiliary
AUXPASS Auxiliary (passive)
CC Coordinating conjunction
CCOMP Clausal complement
COMPLM Complementizer
CONJ Conjunct
CSUBJ Clausal subject
CSUBJPASS Clausal subject (passive)
DEP Unclassified dependent
DET Determiner
DOBJ Direct object
EXPL Expletive
HMOD Modifier in hyphenation
HYPH Hyphen
INFMOD Infinitival modifier
INTJ Interjection
IOBJ Indirect object
MARK Marker
META Meta modifier
NEG Negation modifier
NMOD Modifier of nominal
NN Noun compound modifier
NPADVMOD Noun phrase as ADVMOD
NSUBJ Nominal subject
NSUBJPASS Nominal subject (passive)
NUM Numeric modifier
NUMBER Number compound modifier
OPRD Object predicate
PARATAXIS Parataxis
PARTMOD Participial modifier
PCOMP Complement of a preposition
POBJ Object of a preposition
POSS Possession modifier
POSSESSIVE Possessive modifier
PRECONJ Pre-correlative conjunction
PREDET Predeterminer
PREP Prepositional modifier
PRT Particle
PUNCT Punctuation
QUANTMOD Quantifier phrase modifier
RCMOD Relative clause modifier
ROOT Root
XCOMP Open clausal complement

Here are the dependency labels generated by spaCy that I've observed while parsing my corpus; * denotes labels not in the ClearNLP doc. These are only the labels I've observed, so there may be more:

acl
acomp
advcl
advmod
agent
amod
appos
attr
aux
auxpass
case
cc
ccomp
compound
csubj
csubjpass
dative
dep
det
dobj
expl
intj
iobj
mark
meta
neg
nmod
npadvmod
nsubj
nsubjpass
nummod
oprd
parataxis
pcomp
pobj
poss
preconj
predet
prep
prt
punct
quantmod
relcl
xcomp

sdenning on 28 Jan 2016

👍4 ❤1

All 14 comments

We use the ClearNLP converter, which differs slightly from the Stanford one in some cases. The ClearNLP converter is generally more accurate and practical for our situation (i.e., we just want to convert treebanks into dependency parses). It increases accuracy by making use of the additional annotations in the treebank. In contrast, the Stanford converter has to support the use case of converting parser output into dependencies. These parsers don't have the additional annotations, so the Stanford converter uses less information than ClearNLP's.

If the ClearNLP docs really don't describe our dependencies, then okay, we have a problem, and I'll raise it with Jin-ho. But are you sure that's the case?

honnibal on 28 Jan 2016

Not sure what happened to the formatting of my last post after I submitted it. In the observed-labels section, each label was on its own line, and the asterisks have now been replaced with bullets. So the following labels are observed but not documented:
acl
case
compound
dative
nummod
relcl
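
The comparison above can be reproduced with a plain set difference. This is only a minimal sketch: the two label lists are copied by hand from this thread (Table 2 of the ClearNLP doc, lowercased, and the labels observed in the corpus), and the variable names are illustrative rather than anything spaCy provides.

```python
# Labels documented in Table 2 of the ClearNLP doc (lowercased), as listed in this thread.
documented = set("""
acomp advcl advmod agent amod appos attr aux auxpass cc ccomp complm conj
csubj csubjpass dep det dobj expl hmod hyph infmod intj iobj mark meta neg
nmod nn npadvmod nsubj nsubjpass num number oprd parataxis partmod pcomp
pobj poss possessive preconj predet prep prt punct quantmod rcmod root xcomp
""".split())

# Labels observed in parser output, as listed in this thread.
observed = set("""
acl acomp advcl advmod agent amod appos attr aux auxpass case cc ccomp
compound csubj csubjpass dative dep det dobj expl intj iobj mark meta neg
nmod npadvmod nsubj nsubjpass nummod oprd parataxis pcomp pobj poss preconj
predet prep prt punct quantmod relcl xcomp
""".split())

# Observed in parser output but missing from the documentation.
undocumented = sorted(observed - documented)
print(undocumented)
# ['acl', 'case', 'compound', 'dative', 'nummod', 'relcl']

# Documented but never observed in this particular corpus.
unobserved = sorted(documented - observed)
print(unobserved)
```

The second difference only shows what this one corpus happened not to produce; it says nothing definite about what the parser can emit.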

sdenning on 28 Jan 2016

Hmm, okay. Thanks, I didn't realise those docs were out of date.

honnibal on 28 Jan 2016

Hey @honnibal, any chance we could get a full list of all possible dependency labels in spaCy, similar to spacy.parts_of_speech.NAMES?

phdowling on 1 Sep 2016

👍1

From symbols.pyx:

    "acomp": acomp,
    "advcl": advcl,
    "advmod": advmod,
    "agent": agent,
    "amod": amod,
    "appos": appos,
    "attr": attr,
    "aux": aux,
    "auxpass": auxpass,
    "cc": cc,
    "ccomp": ccomp,
    "complm": complm,
    "conj": conj,
    "csubj": csubj,
    "csubjpass": csubjpass,
    "dep": dep,
    "det": det,
    "dobj": dobj,
    "expl": expl,
    "hmod": hmod,
    "hyph": hyph,
    "infmod": infmod,
    "intj": intj,
    "iobj": iobj,
    "mark": mark,
    "meta": meta,
    "neg": neg,
    "nmod": nmod,
    "nn": nn,
    "npadvmod": npadvmod,
    "nsubj": nsubj,
    "nsubjpass": nsubjpass,
    "num": num,
    "number": number,
    "oprd": oprd,
    "parataxis": parataxis,
    "partmod": partmod,
    "pcomp": pcomp,
    "pobj": pobj,
    "poss": poss,
    "possessive": possessive,
    "preconj": preconj,
    "prep": prep,
    "prt": prt,
    "punct": punct,
    "quantmod": quantmod,
    "rcmod": rcmod,
    "root": root,
    "xcomp": xcomp

honnibal on 1 Sep 2016

I tried that list, but it seems to be incomplete. Missing items include, for example, compound, nummod and ROOT.
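
When a static list turns out to be incomplete, one workaround is to tally the labels a model actually emits. The sketch below assumes only that each token exposes its label as a `dep_` string attribute, as spaCy tokens do; the helper name `count_dep_labels` and the stand-in tokens are hypothetical, so the example runs without any model installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_dep_labels(docs):
    """Tally dependency labels over an iterable of parsed documents.

    Each document is any iterable of tokens with a `dep_` string attribute.
    """
    return Counter(token.dep_ for doc in docs for token in doc)


# Stand-in tokens (not real parser output) so the sketch is self-contained.
fake_doc = [
    SimpleNamespace(dep_=label)
    for label in ("nsubj", "ROOT", "compound", "dobj", "punct")
]

counts = count_dep_labels([fake_doc])
print(sorted(counts))
```

With a loaded spaCy pipeline, the same helper could be fed real documents, for example `count_dep_labels(nlp.pipe(texts))`, where `nlp` and `texts` are whatever pipeline and corpus you have at hand. The result is still limited to labels that occur in that corpus. Later spaCy releases also offer `spacy.explain(label)` for a short description of a label.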

phdowling on 1 Sep 2016

Hello @honnibal,
I am parsing a German text using your new model and facing the same issue: the dependency tags are not clearly documented. Could you please fix that so that we can get the most out of your API? :)

tanya-h on 2 Sep 2016

UPDATE:
I figured it out: the German model uses its own tags, specifically those of the TIGER Treebank, as described here: http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf

Nevertheless, I am looking forward to the description of the English labels :)

tanya-h on 2 Sep 2016

Would it be too much work to adapt spaCy to output Universal Dependencies for the English and German parser?

davidsbatista on 1 Oct 2016

👍1

@tanya-h: you can find more info here, but it's in German

http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf

davidsbatista on 2 Oct 2016

Apologies for commenting on a closed issue, but I was scouring GitHub (this issue and #676, #677) trying to figure out what the acl label is supposed to be, since it's not in the Stanford dependencies manual. After hopping around ClearNLP's (now NLP4J's) docs, I found the following page:

https://emorynlp.github.io/nlp4j/components/dependency-parsing.html

... which describes all of the mystery labels @sdenning helpfully posted above, except nummod. I post only in case this helps someone in the future.