I'm not sure whether this issue is in scope for this project, since as far as I know the only way to tell whether the 'd contraction stands for had or would is from the context of the sentence. Still, most of the time spaCy handles contractions as expected, and it would be nice to be able to rely on it.
import spacy
nlp = spacy.load("en_core_web_lg")
doc = nlp("I'd a dream")
print(doc[1].lemma_)
> would
I'd expect this to print have rather than would.
Thanks for the report! This is coming from a rule (in the tokenizer exceptions) that assigns the lemma/tag would/MD to the contraction 'd. I think it would make sense to remove would/MD and let the tagger handle it instead. The tagger is still probably going to get this wrong a fair amount of the time (and the tagger will probably do better on 3rd person pronouns than 1st/2nd), but it doesn't make sense for a rule to say it's always would.
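To illustrate why a fixed rule can't be right here, below is a toy sketch of how the reading of 'd depends on what follows it. This is not spaCy's implementation or API: the function name `guess_d_lemma` and the tag-to-lemma mapping are assumptions made up for illustration, using Penn Treebank tags for the next token.

```python
from typing import Optional


def guess_d_lemma(next_tag: Optional[str]) -> str:
    """Guess the lemma of the contraction 'd from the tag of the next token.

    Hypothetical heuristic for illustration only; a trained tagger uses far
    more context than a single following tag.
    """
    if next_tag == "VB":
        # Base-form verb follows: "I'd go" -> "I would go"
        return "would"
    if next_tag == "VBN":
        # Past participle follows: "I'd gone" -> "I had gone"
        return "have"
    if next_tag in {"DT", "NN", "NNS", "PRP$"}:
        # Noun phrase follows: "I'd a dream" -> "I had a dream"
        return "have"
    # Anything else is left unresolved by this sketch; fall back to the
    # value the old fixed rule always produced.
    return "would"


# "I'd a dream": the next token "a" is a determiner (DT)
print(guess_d_lemma("DT"))
```

Even this sketch breaks down quickly: verbs like "come" or "put" have identical base and participle forms, and an adverb can sit between 'd and the verb ("I'd never seen it"), so the decision really does need a statistical tagger rather than a lookup.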